[jira] [Assigned] (YARN-7668) Remove unused variables from ContainerLocalizer

2018-04-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-7668:


Assignee: (was: Ray Chiang)

> Remove unused variables from ContainerLocalizer
> ---
>
> Key: YARN-7668
> URL: https://issues.apache.org/jira/browse/YARN-7668
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Ray Chiang
>Priority: Trivial
>  Labels: newbie
>
> While figuring out something else, I found two class constants in 
> ContainerLocalizer that look like they aren't being used anymore.
> {noformat}
>   public static final String OUTPUTDIR = "output";
>   public static final String WORKDIR = "work";
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2018-03-01 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3610:
-
Attachment: YARN-3610.003.patch

> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
>Priority: Major
> Attachments: YARN-3610.001.patch, YARN-3610.002.patch, 
> YARN-3610.003.patch
>
>
> YARN-1050 adds documentation for FairScheduler REST API, but is missing the 
> steady-fair-share.
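
For reference, the piece the docs need to cover is the steady fair share element in the scheduler REST response, roughly like the following (a sketch; the field name is assumed from FairSchedulerQueueInfo#steadyFairResources, and the values are illustrative):

{noformat}
"steadyFairResources": {
  "memory": 8192,
  "vCores": 8
}
{noformat}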



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7521) Add some missing @VisibleForTesting annotations

2018-02-16 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7521:
-
Summary: Add some missing @VisibleForTesting annotations   (was: Add some 
misisng @VisibleForTesting annotations )

> Add some missing @VisibleForTesting annotations 
> 
>
> Key: YARN-7521
> URL: https://issues.apache.org/jira/browse/YARN-7521
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Attachments: YARN-7521.001.patch
>
>
> While reviewing some other code, I ran into a few places where the 
> @VisibleForTesting annotation should be placed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4227) Ignore expired containers from removed nodes in FairScheduler

2018-01-08 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-4227:
-
Summary: Ignore expired containers from removed nodes in FairScheduler  
(was: FairScheduler: RM quits processing expired container from a removed node)

> Ignore expired containers from removed nodes in FairScheduler
> -
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.006.patch, YARN-4227.2.patch, 
> YARN-4227.3.patch, YARN-4227.4.patch, YARN-4227.5.patch, YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed, causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 
> and 2.6.0 by different customers.
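
A minimal sketch of the kind of guard the eventual fix implies (method names illustrative; not the exact patch):

{code}
// In FairScheduler#completedContainer: drop the event when the container's
// node has already been removed instead of dereferencing a null node.
FSSchedulerNode node =
    getFSSchedulerNode(rmContainer.getContainer().getNodeId());
if (node == null) {
  LOG.debug("Skipping completed container " + rmContainer.getContainerId()
      + ": its node has already been removed");
  return;
}
{code}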



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2018-01-08 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317212#comment-16317212
 ] 

Ray Chiang commented on YARN-4227:
--

+1.  LGTM.

I'll commit this upstream soon.

> FairScheduler: RM quits processing expired container from a removed node
> 
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.006.patch, YARN-4227.2.patch, 
> YARN-4227.3.patch, YARN-4227.4.patch, YARN-4227.5.patch, YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed, causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 
> and 2.6.0 by different customers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2018-01-05 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314170#comment-16314170
 ] 

Ray Chiang commented on YARN-4227:
--

Minor nit: The LOG.debug() calls for skipping containers aren't wrapped with 
LOG.isDebugEnabled().
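
For reference, the standard guard pattern being asked for (a generic sketch; the message text is illustrative):

{code}
// The guard avoids building the log message when debug logging is disabled.
if (LOG.isDebugEnabled()) {
  LOG.debug("Skipping completed container " + containerId
      + " because its node has already been removed");
}
{code}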

> FairScheduler: RM quits processing expired container from a removed node
> 
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.2.patch, YARN-4227.3.patch, YARN-4227.4.patch, 
> YARN-4227.5.patch, YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed, causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 
> and 2.6.0 by different customers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7645) TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers is flakey with FairScheduler

2018-01-05 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313888#comment-16313888
 ] 

Ray Chiang commented on YARN-7645:
--

+1.

I'm having difficulty reproducing the original error on my setup, but I'm not 
seeing any test issues with the new patch either.

> TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers is 
> flakey with FairScheduler
> -
>
> Key: YARN-7645
> URL: https://issues.apache.org/jira/browse/YARN-7645
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-7645.001.patch
>
>
> We've noticed some flakiness in 
> {{TestContainerResourceUsage#testUsageAfterAMRestartWithMultipleContainers}} 
> when using {{FairScheduler}}:
> {noformat}
> java.lang.AssertionError: Attempt state is not correct (timeout). 
> expected: but was:
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.amRestartTests(TestContainerResourceUsage.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers(TestContainerResourceUsage.java:254)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7668) Remove unused variables from ContainerLocalizer

2017-12-18 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-7668:


 Summary: Remove unused variables from ContainerLocalizer
 Key: YARN-7668
 URL: https://issues.apache.org/jira/browse/YARN-7668
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial


While figuring out something else, I found two class constants in 
ContainerLocalizer that look like they aren't being used anymore.

{noformat}
  public static final String OUTPUTDIR = "output";
  public static final String WORKDIR = "work";
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7668) Remove unused variables from ContainerLocalizer

2017-12-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7668:
-
Labels: newbie  (was: )

> Remove unused variables from ContainerLocalizer
> ---
>
> Key: YARN-7668
> URL: https://issues.apache.org/jira/browse/YARN-7668
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
>  Labels: newbie
>
> While figuring out something else, I found two class constants in 
> ContainerLocalizer that look like they aren't being used anymore.
> {noformat}
>   public static final String OUTPUTDIR = "output";
>   public static final String WORKDIR = "work";
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7363) ContainerLocalizer doesn't have a valid log4j config when using LinuxContainerExecutor

2017-12-06 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7363:
-
Fix Version/s: (was: 3.0.0)
   3.0.1

> ContainerLocalizer doesn't have a valid log4j config when using 
> LinuxContainerExecutor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 3.1.0, 2.10.0, 3.0.1
>
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch, YARN-7363.005.patch, 
> YARN-7363.branch-2.001.patch
>
>
> When using the LinuxContainerExecutor, the ContainerLocalizer runs as a separate 
> process. It can't access a valid log4j.properties when the application user 
> is not in the "hadoop" group. The NodeManager's log4j.properties is on its 
> classpath, but it isn't readable by users outside the hadoop group due to 
> security concerns. In that case, the ContainerLocalizer doesn't have a valid 
> log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7507) TestNodeLabelContainerAllocation failing in trunk

2017-12-04 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277193#comment-16277193
 ] 

Ray Chiang commented on YARN-7507:
--

I'm seeing the above error plus three more:

{noformat}
Error Message

expected:<5120> but was:<0>
Stacktrace

java.lang.AssertionError: expected:<5120> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.checkPendingResource(TestNodeLabelContainerAllocation.java:557)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.testPreferenceOfQueuesTowardsNodePartitions(TestNodeLabelContainerAllocation.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
{noformat}

{noformat}
Error Message

expected:<0> but was:<1024>
Stacktrace

java.lang.AssertionError: expected:<0> but was:<1024>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.testQueueMetricsWithLabels(TestNodeLabelContainerAllocation.java:1962)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at 

[jira] [Commented] (YARN-5594) Handle old RMDelegationToken format when recovering RM

2017-12-04 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277189#comment-16277189
 ] 

Ray Chiang commented on YARN-5594:
--

LGTM.  +1 (binding).

The test errors are identical to those in YARN-7507.

> Handle old RMDelegationToken format when recovering RM
> --
>
> Key: YARN-5594
> URL: https://issues.apache.org/jira/browse/YARN-5594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Tatyana But
>Assignee: Robert Kanter
>  Labels: oct16-medium
> Attachments: YARN-5594.001.patch, YARN-5594.002.patch, 
> YARN-5594.003.patch, YARN-5594.004.patch
>
>
> We got this error after upgrading the cluster from v2.5.1 to 2.7.0.
> {noformat}
> 2016-08-25 17:20:33,293 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to
> load/recover state
> com.google.protobuf.InvalidProtocolBufferException: Protocol message contained
> an invalid tag (zero).
> at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
> at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.(YarnServerResourceManagerRecoveryProtos.java:4680)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.(YarnServerResourceManagerRecoveryProtos.java:4644)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4740)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4735)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:5075)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:4955)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:337)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:210)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.records.RMDelegationTokenIdentifierData.readFields(RMDelegationTokenIdentifierData.java:43)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:355)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044
> {noformat}
> The reason for this problem is that these Hadoop versions use different formats 
> for the files 
> /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot/RMDelegationToken*.
> This fix handles the old data format during RM recovery if an 
> InvalidProtocolBufferException occurs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5594) Handle old RMDelegationToken format when recovering RM

2017-11-30 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273594#comment-16273594
 ] 

Ray Chiang commented on YARN-5594:
--

Looks pretty good.  The only question I have is about the visibility of 
RMStateStoreUtils#readRMDelegationTokenIdentifierData.  I'm not sure whether 
public/protected/package visibility makes any real difference.
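
For anyone skimming, a hedged sketch of the fallback shape under review (the class and method names come from this thread; the buffering details below are assumptions, not the actual patch):

{code}
public static RMDelegationTokenIdentifierData
    readRMDelegationTokenIdentifierData(DataInputStream ds) throws IOException {
  // Buffer the bytes so a failed protobuf parse can be retried as old format.
  byte[] bytes = IOUtils.toByteArray(ds);
  RMDelegationTokenIdentifierData data = new RMDelegationTokenIdentifierData();
  try {
    // New format: a protobuf-encoded RMDelegationTokenIdentifierData
    data.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
  } catch (InvalidProtocolBufferException e) {
    // Old (pre-2.7) format: the identifier and renew date were written directly
    DataInputStream old = new DataInputStream(new ByteArrayInputStream(bytes));
    RMDelegationTokenIdentifier id = new RMDelegationTokenIdentifier();
    id.readFields(old);
    data = new RMDelegationTokenIdentifierData(id, old.readLong());
  }
  return data;
}
{code}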

> Handle old RMDelegationToken format when recovering RM
> --
>
> Key: YARN-5594
> URL: https://issues.apache.org/jira/browse/YARN-5594
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Tatyana But
>Assignee: Robert Kanter
>  Labels: oct16-medium
> Attachments: YARN-5594.001.patch, YARN-5594.002.patch
>
>
> We got this error after upgrading the cluster from v2.5.1 to 2.7.0.
> {noformat}
> 2016-08-25 17:20:33,293 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to
> load/recover state
> com.google.protobuf.InvalidProtocolBufferException: Protocol message contained
> an invalid tag (zero).
> at 
> com.google.protobuf.InvalidProtocolBufferException.invalidTag(InvalidProtocolBufferException.java:89)
> at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:108)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.(YarnServerResourceManagerRecoveryProtos.java:4680)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto.(YarnServerResourceManagerRecoveryProtos.java:4644)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4740)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$1.parsePartialFrom(YarnServerResourceManagerRecoveryProtos.java:4735)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:5075)
> at 
> org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$RMDelegationTokenIdentifierDataProto$Builder.mergeFrom(YarnServerResourceManagerRecoveryProtos.java:4955)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:337)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:267)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:210)
> at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:904)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.records.RMDelegationTokenIdentifierData.readFields(RMDelegationTokenIdentifierData.java:43)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:355)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1007)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1048)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1044
> {noformat}
> The reason for this problem is that these Hadoop versions use different formats 
> for the files 
> /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot/RMDelegationToken*.
> This fix handles the old data format during RM recovery if an 
> InvalidProtocolBufferException occurs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7381) Enable the configuration: yarn.nodemanager.log-container-debug-info.enabled

2017-11-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272084#comment-16272084
 ] 

Ray Chiang commented on YARN-7381:
--

Also, it's my understanding that there will be an overhead of 2 files per 
container written to HDFS unless everyone is running the tool from MAPREDUCE-6415. 
So, processing ~1M mappers per day will add ~2M files to HDFS per day.  Leaving 
this on could be an issue for large or really busy clusters.

> Enable the configuration: yarn.nodemanager.log-container-debug-info.enabled
> ---
>
> Key: YARN-7381
> URL: https://issues.apache.org/jira/browse/YARN-7381
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-7381.1.patch
>
>
> Enable the configuration "yarn.nodemanager.log-container-debug-info.enabled", 
> so we can aggregate launch_container.sh and directory.info



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7381) Enable the configuration: yarn.nodemanager.log-container-debug-info.enabled

2017-11-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272056#comment-16272056
 ] 

Ray Chiang commented on YARN-7381:
--

[~leftnoteasy], I believe launch_container.sh contains all the environment 
variables.  If anyone has sensitive information there, then it will get exposed 
by turning on this debugging information, correct?  That's why we've had to 
use whitelist-style filters for environment variables before.
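
To make the exposure concrete: launch_container.sh is largely a series of export lines, so anything sensitive in the container environment would land in the aggregated logs. An illustrative excerpt (variable names invented for the example):

{noformat}
#!/bin/bash
export JAVA_HOME="/usr/java/default"
export HADOOP_USER_NAME="alice"
export MY_APP_DB_PASSWORD="..."   # values like this would be aggregated
{noformat}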

I can't think of any security risk related to the directory listing offhand, 
not that I'm any kind of security expert.

If we go forward with this change, I'd strongly recommend putting in detailed 
information about that in the Release Notes field.

> Enable the configuration: yarn.nodemanager.log-container-debug-info.enabled
> ---
>
> Key: YARN-7381
> URL: https://issues.apache.org/jira/browse/YARN-7381
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-7381.1.patch
>
>
> Enable the configuration "yarn.nodemanager.log-container-debug-info.enabled", 
> so we can aggregate launch_container.sh and directory.info



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7363) ContainerLocalizer doesn't have a valid log4j config when using LinuxContainerExecutor

2017-11-27 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7363:
-
Summary: ContainerLocalizer doesn't have a valid log4j config when using 
LinuxContainerExecutor  (was: ContainerLocalizer don't have a valid log4j 
config in case of Linux container executor)

> ContainerLocalizer doesn't have a valid log4j config when using 
> LinuxContainerExecutor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch, YARN-7363.005.patch
>
>
> When using the LinuxContainerExecutor, the ContainerLocalizer runs as a separate 
> process. It can't access a valid log4j.properties when the application user 
> is not in the "hadoop" group. The NodeManager's log4j.properties is on its 
> classpath, but it isn't readable by users outside the hadoop group due to 
> security concerns. In that case, the ContainerLocalizer doesn't have a valid 
> log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7363) ContainerLocalizer don't have a valid log4j config in case of Linux container executor

2017-11-27 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267618#comment-16267618
 ] 

Ray Chiang commented on YARN-7363:
--

Looks good to me [~yufeigu].  +1 (binding) pending Jenkins.  I'll commit this 
soon.

> ContainerLocalizer don't have a valid log4j config in case of Linux container 
> executor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch, YARN-7363.005.patch
>
>
> When using the LinuxContainerExecutor, the ContainerLocalizer runs as a separate 
> process. It can't access a valid log4j.properties when the application user 
> is not in the "hadoop" group. The NodeManager's log4j.properties is on its 
> classpath, but it isn't readable by users outside the hadoop group due to 
> security concerns. In that case, the ContainerLocalizer doesn't have a valid 
> log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7363) ContainerLocalizer don't have a valid log4j config in case of Linux container executor

2017-11-27 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267274#comment-16267274
 ] 

Ray Chiang edited comment on YARN-7363 at 11/27/17 7:08 PM:


Minor nit:
* The yarn.nodemanager.container-localizer.log.level property is missing a 
description.
* The method getContaierLogDir misspells the word "Container"


was (Author: rchiang):
Minor nit: The yarn.nodemanager.container-localizer.log.level property is 
missing a description.

> ContainerLocalizer don't have a valid log4j config in case of Linux container 
> executor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch
>
>
> When using the LinuxContainerExecutor, the ContainerLocalizer runs as a separate 
> process. It can't access a valid log4j.properties when the application user 
> is not in the "hadoop" group. The NodeManager's log4j.properties is on its 
> classpath, but it isn't readable by users outside the hadoop group due to 
> security concerns. In that case, the ContainerLocalizer doesn't have a valid 
> log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7363) ContainerLocalizer don't have a valid log4j config in case of Linux container executor

2017-11-27 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267274#comment-16267274
 ] 

Ray Chiang commented on YARN-7363:
--

Minor nit: The yarn.nodemanager.container-localizer.log.level property is 
missing a description.
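
For reference, the kind of yarn-default.xml entry that would address the nit (the description wording and default value here are assumptions):

{noformat}
<property>
  <description>The log level used by the ContainerLocalizer when it runs as a
  separate process, e.g. under the LinuxContainerExecutor.</description>
  <name>yarn.nodemanager.container-localizer.log.level</name>
  <value>INFO</value>
</property>
{noformat}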

> ContainerLocalizer don't have a valid log4j config in case of Linux container 
> executor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch
>
>
> When using the LinuxContainerExecutor, the ContainerLocalizer runs as a separate 
> process. It can't access a valid log4j.properties when the application user 
> is not in the "hadoop" group. The NodeManager's log4j.properties is on its 
> classpath, but it isn't readable by users outside the hadoop group due to 
> security concerns. In that case, the ContainerLocalizer doesn't have a valid 
> log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7551) yarn.resourcemanager.reservation-system.max-periodicity is not in yarn-default.xml

2017-11-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261600#comment-16261600
 ] 

Ray Chiang commented on YARN-7551:
--

I see this in TestYarnConfigurationFields.java:

{noformat}
configurationPropsToSkipCompare
.add(YarnConfiguration.RM_RESERVATION_SYSTEM_MAX_PERIODICITY);
{noformat}

Looks like we skip requiring this property in yarn-default.xml and it came from 
YARN-5328.  [~subru], what's the reason for this?  
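
For comparison, if the property were documented, the entry would look something like this (description and default are assumptions; 86400000 ms is one day):

{noformat}
<property>
  <description>Maximum periodicity, in milliseconds, that the reservation
  system accepts for recurring reservations.</description>
  <name>yarn.resourcemanager.reservation-system.max-periodicity</name>
  <value>86400000</value>
</property>
{noformat}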

> yarn.resourcemanager.reservation-system.max-periodicity is not in 
> yarn-default.xml
> --
>
> Key: YARN-7551
> URL: https://issues.apache.org/jira/browse/YARN-7551
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7521) Add some misisng @VisibleForTesting annotations

2017-11-16 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7521:
-
Attachment: YARN-7521.001.patch

> Add some misisng @VisibleForTesting annotations 
> 
>
> Key: YARN-7521
> URL: https://issues.apache.org/jira/browse/YARN-7521
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Trivial
> Attachments: YARN-7521.001.patch
>
>
> While reviewing some other code, I ran into a few places where the 
> @VisibleForTesting annotation should be placed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7521) Add some misisng @VisibleForTesting annotations

2017-11-16 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-7521:


 Summary: Add some misisng @VisibleForTesting annotations 
 Key: YARN-7521
 URL: https://issues.apache.org/jira/browse/YARN-7521
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial


While reviewing some other code, I ran into a few places where the 
@VisibleForTesting annotation should be placed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved YARN-6142.
--
   Resolution: Information Provided
Fix Version/s: 3.0.0

Protobuf and JACC analysis done.  Will continue rolling upgrade reviews at 
HDFS-11096.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213458#comment-16213458
 ] 

Ray Chiang commented on YARN-6142:
--

Minor issues found by JACC.

YARN-2696
- CapacityScheduler#getQueueComparator() split into partitioned/non-partitioned 
comparators

YARN-3139
- Removed synchronized from CapacityScheduler#getContainerTokenSecretManager()
- Removed synchronized from CapacityScheduler#getRMContext()
- Removed synchronized from CapacityScheduler#setRMContext()

YARN-3413
- YarnClient#getClusterNodeLabels() changed return type

YARN-3866
- Major refactor in Public APIs for AM-RM for handling container resizing.
- Change went into both 2.8.0 and 3.0.0.

YARN-3873
- CapacityScheduler#getApplicationComparator() removed

YARN-4593
- AbstractService#getConfig() removed synchronized

YARN-5077
- Removed SchedulingPolicy#checkIfAMResourceUsageOverLimit()

YARN-5221
- AllocateRequest / AllocateResponse has methods changed from Public/Stable to 
Public/Unstable

YARN-5713
- Update jackson affects TimelineUtils#dumpTimelineRecordtoJSON()
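
To illustrate how minor these are, the YARN-3413 item changes caller code roughly as follows (a sketch, assuming the return type moved from a set of label names to a list of NodeLabel objects):

{code}
// Before (2.x): label names only
Set<String> labelNames = yarnClient.getClusterNodeLabels();

// After: full NodeLabel objects, so older callers need a small source change
List<NodeLabel> nodeLabels = yarnClient.getClusterNodeLabels();
{code}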


> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-10-20 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213251#comment-16213251
 ] 

Ray Chiang commented on YARN-6142:
--

I'm done with the JACC analysis, but need to do the same type of writeup that 
was done for protobuf.

The quick answer is that we don't have any major red flags, but I'm going to 
note some potential incompatibilities that are very minor yet could affect 
some random API user out there.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes

2017-10-13 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204094#comment-16204094
 ] 

Ray Chiang commented on YARN-7322:
--

Yeah, but I'm not sure we're actually adhering to any of those 
annotations, especially the @Public ones.

> Remove annotations from org.apache.hadoop.yarn.server classes
> -
>
> Key: YARN-7322
> URL: https://issues.apache.org/jira/browse/YARN-7322
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-7322.001.patch
>
>
> The main hadoop pom.xml has this section in the javadoc plugin:
> {noformat}
> org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
> {noformat}
> Since the package org.apache.hadoop.yarn.server is ignored, the various @ 
> annotations should be removed from those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes

2017-10-13 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7322:
-
Attachment: YARN-7322.001.patch

> Remove annotations from org.apache.hadoop.yarn.server classes
> -
>
> Key: YARN-7322
> URL: https://issues.apache.org/jira/browse/YARN-7322
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
> Attachments: YARN-7322.001.patch
>
>
> The main hadoop pom.xml has this section in the javadoc plugin:
> {noformat}
> org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
> {noformat}
> Since the package org.apache.hadoop.yarn.server is ignored, the various @ 
> annotations should be removed from those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes

2017-10-12 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7322:
-
Labels: newbie  (was: )

> Remove annotations from org.apache.hadoop.yarn.server classes
> -
>
> Key: YARN-7322
> URL: https://issues.apache.org/jira/browse/YARN-7322
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>  Labels: newbie
>
> The main hadoop pom.xml has this section in the javadoc plugin:
> {noformat}
> org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
> {noformat}
> Since the package org.apache.hadoop.yarn.server is ignored, the various @ 
> annotations should be removed from those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7138) Fix incompatible API change for YarnScheduler involved by YARN-5221

2017-10-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202407#comment-16202407
 ] 

Ray Chiang commented on YARN-7138:
--

Thanks.  Filed YARN-7322.

> Fix incompatible API change for YarnScheduler involved by YARN-5221
> ---
>
> Key: YARN-7138
> URL: https://issues.apache.org/jira/browse/YARN-7138
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Junping Du
>Priority: Critical
>
> The JACC report for 2.8.2 against 2.7.4 indicates that we have an incompatible 
> change in YarnScheduler:
> {noformat}
> hadoop-yarn-server-resourcemanager-2.7.4.jar, YarnScheduler.class
> package org.apache.hadoop.yarn.server.resourcemanager.scheduler
> YarnScheduler.allocate ( ApplicationAttemptId p1, List<ResourceRequest> p2, 
> List<ContainerId> p3, List<String> p4, List<String> p5 ) [abstract]  :  
> Allocation 
> {noformat}
> The root cause is YARN-5221. We should change it back or work around this by 
> adding back the original API (marked as deprecated if it is no longer used).
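
A hedged sketch of the "add back the original API" option (the signature is reconstructed from the JACC output above; parameter names are guesses):

{code}
// Keep the old 2.7-era overload alongside the new one so existing callers
// still compile and link; new code should use the replacement signature.
@Deprecated
Allocation allocate(ApplicationAttemptId appAttemptId,
    List<ResourceRequest> ask, List<ContainerId> release,
    List<String> blacklistAdditions, List<String> blacklistRemovals);
{code}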



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes

2017-10-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202406#comment-16202406
 ] 

Ray Chiang commented on YARN-7322:
--

Link to YARN-7138.  This JIRA came about from a discussion there.

> Remove annotations from org.apache.hadoop.yarn.server classes
> -
>
> Key: YARN-7322
> URL: https://issues.apache.org/jira/browse/YARN-7322
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Minor
>
> The main hadoop pom.xml has this section in the javadoc plugin:
> {noformat}
> org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
> {noformat}
> Since the package org.apache.hadoop.yarn.server is ignored, the various @ 
> annotations should be removed from those classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7322) Remove annotations from org.apache.hadoop.yarn.server classes

2017-10-12 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-7322:


 Summary: Remove annotations from org.apache.hadoop.yarn.server 
classes
 Key: YARN-7322
 URL: https://issues.apache.org/jira/browse/YARN-7322
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor


The main hadoop pom.xml has this section in the javadoc plugin:

{noformat}
org.apache.hadoop.authentication*,org.apache.hadoop.mapreduce.v2.proto,org.apache.hadoop.yarn.proto,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
{noformat}

Since the package org.apache.hadoop.yarn.server is ignored, the various @ 
annotations should be removed from those classes.
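
For orientation, that list normally sits in the javadoc plugin configuration of the top-level pom.xml, roughly like this (a sketch; the value is abbreviated from the block above):

{noformat}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <excludePackageNames>
      org.apache.hadoop.authentication*,...,org.apache.hadoop.yarn.server*,org.apache.hadoop.yarn.webapp*
    </excludePackageNames>
  </configuration>
</plugin>
{noformat}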




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7138) Fix incompatible API change for YarnScheduler involved by YARN-5221

2017-10-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202352#comment-16202352
 ] 

Ray Chiang commented on YARN-7138:
--

Given that, then does it actually make sense to have such annotations on 
classes like YarnScheduler?  Would it be better to remove all such annotations 
then?

> Fix incompatible API change for YarnScheduler involved by YARN-5221
> ---
>
> Key: YARN-7138
> URL: https://issues.apache.org/jira/browse/YARN-7138
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Junping Du
>Priority: Critical
>
> The JACC report for 2.8.2 against 2.7.4 indicates that we have an incompatible 
> change in YarnScheduler:
> {noformat}
> hadoop-yarn-server-resourcemanager-2.7.4.jar, YarnScheduler.class
> package org.apache.hadoop.yarn.server.resourcemanager.scheduler
> YarnScheduler.allocate ( ApplicationAttemptId p1, List<ResourceRequest> p2, 
> List<ContainerId> p3, List<String> p4, List<String> p5 ) [abstract]  :  
> Allocation 
> {noformat}
> The root cause is YARN-5221. We should change it back or work around this by 
> adding back the original API (marked as deprecated if it is no longer used).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7138) Fix incompatible API change for YarnScheduler involved by YARN-5221

2017-10-10 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199176#comment-16199176
 ] 

Ray Chiang commented on YARN-7138:
--

Sorry to come late into this.  Should we add a Release Note to YARN-5221 to 
document the incompatibility and mark the JIRA as incompatible?

> Fix incompatible API change for YarnScheduler involved by YARN-5221
> ---
>
> Key: YARN-7138
> URL: https://issues.apache.org/jira/browse/YARN-7138
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Junping Du
>Priority: Critical
>
> From JACC report for 2.8.2 against 2.7.4, it indicates that we have 
> incompatible changes happen in YarnScheduler:
> {noformat}
> hadoop-yarn-server-resourcemanager-2.7.4.jar, YarnScheduler.class
> package org.apache.hadoop.yarn.server.resourcemanager.scheduler
> YarnScheduler.allocate ( ApplicationAttemptId p1, List<ResourceRequest> p2, 
> List<ContainerId> p3, List<String> p4, List<String> p5 ) [abstract]  :  
> Allocation 
> {noformat}
> The root cause is YARN-5221. We should change it back or workaround this by 
> adding back original API (mark as deprecated if not used any more).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7219) Make AllocateRequestProto compatible with branch-2/branch-2.8

2017-10-03 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190455#comment-16190455
 ] 

Ray Chiang edited comment on YARN-7219 at 10/3/17 10:16 PM:


Committed to trunk and branch-3.0.  Thanks [~asuresh] for reviewing!


was (Author: rchiang):
Committed to trunk.  Thanks [~asuresh] for reviewing!

> Make AllocateRequestProto compatible with branch-2/branch-2.8
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: YARN-7219.001.patch
>
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7219) Make AllocateRequestProto compatible with branch-2/branch-2.8

2017-10-03 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7219:
-
Summary: Make AllocateRequestProto compatible with branch-2/branch-2.8  
(was: Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk)

> Make AllocateRequestProto compatible with branch-2/branch-2.8
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
> Attachments: YARN-7219.001.patch
>
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-10-03 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190301#comment-16190301
 ] 

Ray Chiang commented on YARN-7219:
--

[~asuresh], let me know if this patch is okay.  Thanks.

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
> Attachments: YARN-7219.001.patch
>
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7260) yarn.router.pipeline.cache-max-size is missing in yarn-default.xml

2017-09-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184892#comment-16184892
 ] 

Ray Chiang edited comment on YARN-7260 at 9/28/17 11:13 PM:


Just as an FYI, when you run the test, maven does point you at the 
surefire-reports directory (as does the dev-support/verify-xml.sh script).  You 
can look at the resulting 
org.apache.hadoop.yarn.conf.TestYarnConfigurationFields-output.txt file for the 
full output of the unit tests.


was (Author: rchiang):
Just as an FYI, when you run the test, maven does point you at the 
surefire-reports directory.  You can look at the resulting 
org.apache.hadoop.yarn.conf.TestYarnConfigurationFields-output.txt file for the 
full output of the unit tests.

> yarn.router.pipeline.cache-max-size is missing in yarn-default.xml
> --
>
> Key: YARN-7260
> URL: https://issues.apache.org/jira/browse/YARN-7260
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
> Attachments: YARN-7260-branch-2.001.patch
>
>
> In branch-2 TestYarnConfigurationFields fails
> {code}
> Running org.apache.hadoop.yarn.api.records.TestURL Tests run: 1, Failures: 0, 
> Errors: 0, Skipped: 0, Time elapsed: 0.278 sec - in 
> org.apache.hadoop.yarn.api.records.TestURL Running 
> org.apache.hadoop.yarn.conf.TestYarnConfigurationFields Tests run: 4, 
> Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.539 sec <<< FAILURE! - in 
> org.apache.hadoop.yarn.conf.TestYarnConfigurationFields 
> testCompareXmlAgainstConfigurationClass(org.apache.hadoop.yarn.conf.TestYarnConfigurationFields)
>  Time elapsed: 0.296 sec <<< FAILURE! java.lang.AssertionError: 
> yarn-default.xml has 1 properties missing in class 
> org.apache.hadoop.yarn.conf.YarnConfiguration at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:588)
>  
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7268) testCompareXmlAgainstConfigurationClass fails due to 1 missing property from yarn-default

2017-09-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185080#comment-16185080
 ] 

Ray Chiang commented on YARN-7268:
--

What branch are you running in?  I can't duplicate this error, and in both 
branch-2 and trunk, I see this line in TestYarnConfigurationFields.java:

xmlPrefixToSkipCompare.add(
"yarn.log-aggregation.file-controller.TFile.class");


> testCompareXmlAgainstConfigurationClass fails due to 1 missing property from 
> yarn-default
> -
>
> Key: YARN-7268
> URL: https://issues.apache.org/jira/browse/YARN-7268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Yesha Vora
>
> {code}
> Error Message
> yarn-default.xml has 1 properties missing in  class 
> org.apache.hadoop.yarn.conf.YarnConfiguration
> Stacktrace
> java.lang.AssertionError: yarn-default.xml has 1 properties missing in  class 
> org.apache.hadoop.yarn.conf.YarnConfiguration
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:414)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Standard Output
> File yarn-default.xml (253 properties)
> yarn-default.xml has 1 properties missing in  class 
> org.apache.hadoop.yarn.conf.YarnConfiguration
>   yarn.log-aggregation.file-controller.TFile.class
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7260) yarn.router.pipeline.cache-max-size is missing in yarn-default.xml

2017-09-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184892#comment-16184892
 ] 

Ray Chiang commented on YARN-7260:
--

Just as an FYI, when you run the test, maven does point you at the 
surefire-reports directory.  You can look at the resulting 
org.apache.hadoop.yarn.conf.TestYarnConfigurationFields-output.txt file for the 
full output of the unit tests.

> yarn.router.pipeline.cache-max-size is missing in yarn-default.xml
> --
>
> Key: YARN-7260
> URL: https://issues.apache.org/jira/browse/YARN-7260
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Rohith Sharma K S
>Assignee: Jason Lowe
> Attachments: YARN-7260-branch-2.001.patch
>
>
> In branch-2 TestYarnConfigurationFields fails
> {code}
> Running org.apache.hadoop.yarn.api.records.TestURL Tests run: 1, Failures: 0, 
> Errors: 0, Skipped: 0, Time elapsed: 0.278 sec - in 
> org.apache.hadoop.yarn.api.records.TestURL Running 
> org.apache.hadoop.yarn.conf.TestYarnConfigurationFields Tests run: 4, 
> Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.539 sec <<< FAILURE! - in 
> org.apache.hadoop.yarn.conf.TestYarnConfigurationFields 
> testCompareXmlAgainstConfigurationClass(org.apache.hadoop.yarn.conf.TestYarnConfigurationFields)
>  Time elapsed: 0.296 sec <<< FAILURE! java.lang.AssertionError: 
> yarn-default.xml has 1 properties missing in class 
> org.apache.hadoop.yarn.conf.YarnConfiguration at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> org.apache.hadoop.conf.TestConfigurationFieldsBase.testCompareXmlAgainstConfigurationClass(TestConfigurationFieldsBase.java:588)
>  
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-22 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-7219:
-
Attachment: YARN-7219.001.patch

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
> Attachments: YARN-7219.001.patch
>
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-09-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175664#comment-16175664
 ] 

Ray Chiang commented on YARN-6142:
--

Quick summary.  Filed YARN-7219 for the only follow-up issue found while 
investigating the protobuf issues.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-09-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175663#comment-16175663
 ] 

Ray Chiang commented on YARN-6142:
--

Security.proto
* Two new messages

yarn_protos.proto
* ResourceUtilizationProto
** YARN-4293.  2.8.0 and later.
* ContainerStateProto added 2 new enum
** YARN-4597.  2.8.0 and later.
* ContainerProto added 3 optional fields
** execution_type
*** YARN-5127.  2.9.0 and later.
** allocation_request_id
*** YARN-4887.  2.9.0 and later.
** version
*** YARN-5221.  2.8.0 and later.
* FinalApplicationStatusProto added 1 new enum
** YARN-4207.  2.8.0 and later.
* ApplicationResourceUsageReportProto added 4 optional fields
** queue_usage_percentage
*** YARN-4285.  2.8.0 and later.
** cluster_usage_percentage
*** YARN-4285.  2.8.0 and later.
** preempted_memory_seconds
*** YARN-4218.  2.8.0 and later.
** preempted_vcore_seconds
*** YARN-4218.  2.8.0 and later.
* ApplicationReportProto added 6 optional fields
** log_aggregation_status
*** YARN-1402.  2.8.0 and later.
** unmanaged_application
*** YARN-3543.  2.8.0 and later.
** priority
*** YARN-3948.  2.8.0 and later.
** appNodeLabelExpression
*** YARN-3717.  2.8.0 and later.
** amNodeLabelExpression
*** YARN-3717.  2.8.0 and later.
** appTimeouts
*** YARN-5965.  2.9.0 and later.
* New message AppTimeoutsMapProto
** YARN-5965.  2.9.0 and later.
* New message ApplicationTimeoutProto
** YARN-5965.  2.9.0 and later.
* New enum LogAggregationStatusProto
** YARN-1402.  2.8.0 and later.
* ApplicationAttemptReportProto added 2 optional fields
** start_time
*** YARN-3451.  2.8.0 and later.
** finish_time
*** YARN-3451.  2.8.0 and later.
* NodeStateProto added 2 new enums
** NS_DECOMMISSIONING
*** YARN-3225.  2.8.0 and later.
** NS_SHUTDOWN
*** YARN-41.  2.8.0 and later.
* NodeReportProto added 2 optional fields
** containers_utilization
*** YARN-4293.  2.8.0 and later.
** node_utilization
*** YARN-4293.  2.8.0 and later.
* New message NodeLabelProto
* New enum ContainerTypeProto
* New enum ExecutionTypeProto
* ResourceRequestProto added 2 optional fields
** execution_type_request
*** YARN-5180.  2.9.0 and later.
** allocation_request_id
*** YARN-4888.  2.9.0 and later.
* New message ExecutionTypeRequestProto
** YARN-4888.  2.9.0 and later.
* ApplicationSubmissionContextProto changed 1 field to repeated, added 1 
optional field
** am_container_resource_request
*** YARN-6050.  2.9.0 and later.
** application_timeouts
*** YARN-4205.  2.9.0 and later.
* New enum ApplicationTimeoutTypeProto
** YARN-4205.  2.9.0 and later.
* New message ApplicationTimeoutMapProto
** YARN-4205.  2.9.0 and later.
* New message ApplicationUpdateTimeoutMapProto
** YARN-4205.  2.9.0 and later.
* LogAggregationContextProto added 2 optional fields
** log_aggregation_policy_class_name
*** YARN-221.  2.8.0 and later.
** log_aggregation_policy_parameters
*** YARN-221.  2.8.0 and later.
* YarnClusterMetricsProto added 5 optional fields
** All fields YARN-3348
* QueueStateProto added 1 new enum
** YARN-5756.  2.9.0 and later.
* New message QueueStatisticsProto
** YARN-3348.  2.8.0 and later.
* QueueInfoProto added 3 optional fields
** queueStatistics
** preemptionDisabled
** queueConfigurationsMap
* New message QueueConfigurationsProto
** YARN-6164.  2.9.0 and later.
* New message QueueConfigurationsMapProto
** YARN-6164.  2.9.0 and later.
* New enum SignalContainerCommandProto
** YARN-1897.  2.8.0 and later.
* ReservationDefinitionProto added 2 optional fields
** recurrence_expression
*** YARN-5327.  2.9.0 and later.
** priority
*** YARN-5384.  2.9.0 and later.
* New message ResourceAllocationRequestProto
** YARN-4340.  2.8.0 and later.
* New message ReservationAllocationStateProto
** YARN-4340.  2.8.0 and later.
* ContainerLaunchContextProto added 2 optional fields
** container_retry_context
*** YARN-3998.  2.9.0 and later.
** tokens_conf
*** YARN-5910.  2.9.0 and later.
* ContainerStatusProto added 3 new optional fields
** capability
*** YARN-3866.  2.8.0 and later.
** executionType
*** YARN-2882.  2.9.0 and later.
** container_attributes
*** YARN-5430.  2.9.0 and later.
* Message ContainerResourceIncreaseRequestProto moved to 
yarn_service_protos.proto
** YARN-3866.  2.8.0 and later.
*** Still in 2.8.x
* Message ContainerResourceIncreaseProto moved to yarn_service_protos.proto
** YARN-3866.  2.8.0 and later.
*** Still in 2.8.x
* Message ContainerResourceDecreaseProto moved to yarn_service_protos.proto
** YARN-3866.  2.8.0 and later.
*** Still in 2.8.x
* New message ContainerRetryContextProto
** YARN-3998.  2.9.0 and later.

yarn_server_common_service_protos.proto
* New message RemoteNodeProto
* New message RegisterDistributedSchedulingAMResponseProto
* New message DistributedSchedulingAllocateResponseProto
* New message DistributedSchedulingAllocateRequestProto
* New message NodeLabelsProto
* RegisterNodeManagerRequestProto added 2 optional fields
* RegisterNodeManagerResponseProto 

[jira] [Assigned] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-21 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-7219:


Assignee: Ray Chiang

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Critical
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-19 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172491#comment-16172491
 ] 

Ray Chiang commented on YARN-7219:
--

Similar fix to the one done in YARN-6071.

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Priority: Critical
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-19 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172328#comment-16172328
 ] 

Ray Chiang commented on YARN-7219:
--

Will updating the update_requests field to 7 be enough to fix the 
compatibility issue?  [~asuresh] or [~djp], any comment?
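
For context, protobuf wire compatibility hinges on field numbers rather than 
field names: each serialized field is keyed by (field_number << 3) | wire_type. 
A toy sketch (not Hadoop code) of why renumbering update_requests back to 7 
matters:

{noformat}
// Toy illustration of protobuf tag keys; wire type 2 = length-delimited,
// which is what embedded messages use.
public class ProtoTagDemo {
  static int key(int fieldNumber) {
    return (fieldNumber << 3) | 2;
  }

  public static void main(String[] args) {
    // branch-2 writes update_requests under field 7 (key 0x3a); an unfixed
    // trunk reader expects it under field 6 (key 0x32), the number branch-2
    // still uses for increase_request, so each side would decode the other's
    // bytes as the wrong message type.
    System.out.printf("field 6 key: 0x%x%n", key(6));
    System.out.printf("field 7 key: 0x%x%n", key(7));
  }
}
{noformat}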

> Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk
> -
>
> Key: YARN-7219
> URL: https://issues.apache.org/jira/browse/YARN-7219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Priority: Critical
>
> For yarn_service_protos.proto, we have the following code in
> (branch-2.8.0, branch-2.8, branch-2)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated ContainerResourceIncreaseRequestProto increase_request = 6;
>   repeated UpdateContainerRequestProto update_requests = 7;
> }
> {noformat}
> For yarn_service_protos.proto, we have the following code in
> (trunk)
> {noformat}
> message AllocateRequestProto {
>   repeated ResourceRequestProto ask = 1;
>   repeated ContainerIdProto release = 2;
>   optional ResourceBlacklistRequestProto blacklist_request = 3;
>   optional int32 response_id = 4;
>   optional float progress = 5;
>   repeated UpdateContainerRequestProto update_requests = 6;
> }
> {noformat}
> Notes
> * YARN-3866 was the original JIRA for container resizing.
> * YARN-5221 is what introduced the incompatible change.
> * In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
> "Addendum patch to YARN-3866: fix incompatible API change."
> * There was a similar API fix done in YARN-6071.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7219) Fix AllocateRequestProto difference between branch-2/branch-2.8 and trunk

2017-09-19 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-7219:


 Summary: Fix AllocateRequestProto difference between 
branch-2/branch-2.8 and trunk
 Key: YARN-7219
 URL: https://issues.apache.org/jira/browse/YARN-7219
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Priority: Critical


For yarn_service_protos.proto, we have the following code in
(branch-2.8.0, branch-2.8, branch-2)

{noformat}
message AllocateRequestProto {
  repeated ResourceRequestProto ask = 1;
  repeated ContainerIdProto release = 2;
  optional ResourceBlacklistRequestProto blacklist_request = 3;
  optional int32 response_id = 4;
  optional float progress = 5;
  repeated ContainerResourceIncreaseRequestProto increase_request = 6;
  repeated UpdateContainerRequestProto update_requests = 7;
}
{noformat}

For yarn_service_protos.proto, we have the following code in
(trunk)

{noformat}
message AllocateRequestProto {
  repeated ResourceRequestProto ask = 1;
  repeated ContainerIdProto release = 2;
  optional ResourceBlacklistRequestProto blacklist_request = 3;
  optional int32 response_id = 4;
  optional float progress = 5;
  repeated UpdateContainerRequestProto update_requests = 6;
}
{noformat}

Notes
* YARN-3866 was the original JIRA for container resizing.
* YARN-5221 is what introduced the incompatible change.
* In branch-2/branch-2.8/branch-2.8.0, this protobuf change was undone by 
"Addendum patch to YARN-3866: fix incompatible API change."
* There was a similar API fix done in YARN-6071.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-09-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170859#comment-16170859
 ] 

Ray Chiang commented on YARN-6142:
--

[~andrew.wang], I'll have to do the rest of the .proto files post beta1, but 
before GA.

For JACC output, I'm down to 217 issues out of the 3207 generated by JACC.  All 
the issues I've found so far are already documented as "Incompatible" in 
existing JIRAs.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7162) Remove XML excludes file format

2017-09-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166980#comment-16166980
 ] 

Ray Chiang commented on YARN-7162:
--

+1 (binding).  LGTM.

> Remove XML excludes file format
> ---
>
> Key: YARN-7162
> URL: https://issues.apache.org/jira/browse/YARN-7162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-7162.001.patch, YARN-7162.branch-2.001.patch
>
>
> YARN-5536 aims to replace the XML format for the excludes file with a JSON 
> format.  However, it looks like we won't have time for that for Hadoop 3 Beta 
> 1.  The concern is that if we release it as-is, we'll now have to support the 
> XML format as-is for all of Hadoop 3.x, which we're either planning on 
> removing, or rewriting using a pluggable framework.  
> [This comment in 
> YARN-5536|https://issues.apache.org/jira/browse/YARN-5536?focusedCommentId=16126194&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16126194]
>  proposed two quick solutions to prevent this compat issue.  In this JIRA, 
> we're going to remove the XML format.  If we later want to add it back in, 
> YARN-5536 can add it back, rewriting it to be in the pluggable framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6717) [Umbrella] API related cleanup for Hadoop 3

2017-09-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1618#comment-1618
 ] 

Ray Chiang commented on YARN-6717:
--

Known incompatibilities that are flagged in JACC:

[YARN-5713 Update jackson from 1.9.13 to 2.x in 
hadoop-yarn|https://issues.apache.org/jira/browse/YARN-5713]
[YARN-3866 AM-RM protocol changes to support container 
resizing|https://issues.apache.org/jira/browse/YARN-3866]
* Could cause rolling upgrade issues from earlier than 2.8.0.


> [Umbrella] API related cleanup for Hadoop 3
> ---
>
> Key: YARN-6717
> URL: https://issues.apache.org/jira/browse/YARN-6717
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>
> Creating this umbrella JIRA for tracking various API related issues that need 
> to be properly tracked, adjusted, or documented before Hadoop 3 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6142) Support rolling upgrade between 2.x and 3.x

2017-09-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166655#comment-16166655
 ] 

Ray Chiang commented on YARN-6142:
--

h2. Proto Changes

h3. ResourceTracker.proto
* Added unRegisterNodeManager.  Change due to YARN-41.  Confirmed in 2.8.0 and 
later.
** Potential issue with upgraded NM but not RM.  => RM should be upgraded first.

h3. RPCHeader.proto
* Not related to YARN.  Looks like support for HTrace.

h3. applicationclient_protocol.proto
* Added failApplicationAttempt
** YARN-261.  Confirmed in 2.8.0 and later.
* Added getNewReservation
** YARN-4957.  Confirmed in 2.8.0 and later.
* Added listReservations
** YARN-4340.  Confirmed in 2.8.0 and later.
* Added updateApplicationPriority
** YARN-4014.  Confirmed in 2.8.0 and later.
* Added signalToContainer
** MAPREDUCE-5044.  Confirmed in 2.8.0 and later.
* Added updateApplicationTimeouts
** YARN-5611.  Confirmed in 2.9.0 and later.

h3. collectornodemanager_protocol.proto
* Requires Timeline Service to be running first?

h3. containermanagement_protocol.proto
* Added increaseContainersResource
** YARN-1449.  2.8.0 and later
* Added updateContainer
** YARN-5977.  2.9.0 and later
* signalToContainer
** MAPREDUCE-5044.  Confirmed 2.8.0 and later.
* Added localizeContainer 
** YARN-5557.  2.9.0 and later
* Added reInitializeContainer/restartContainer/rollbackContainer/commitContainer
** YARN-5609.  2.9.0 and later

h3. distributed_scheduling_am_protocol.proto
* All for distributed scheduling support in the AM

h3. mr_protos.proto
* Two optional fields added.

h3. resourcemanager_administration_protocol.proto
* All methods for RMAdmin CLI

h3. yarn_security_token.proto
* Four optional fields added
* Added nodeLabelExpression
** YARN-3354.  Confirmed 2.8.0 and later.
* Added containerType
** YARN-3116.  Confirmed 2.8.0 and later.
* Added executionType
** YARN-2882.  Confirmed 2.9.0 and later.
* Added version
** YARN-5221.  Confirmed 2.8.0 and later.

h3. yarn_server_common_protos.proto
* Four optional fields added

h3. yarn_server_nodemanager_recovery.proto
* All optional fields and self-contained message changes

h3. yarn_server_resourcemanager_recovery.proto
* All optional fields and self-contained message changes



h2. New proto files (i.e. ignorable)
* yarn_server_federation_protos.proto


h2. Still to complete
* Security.proto
* yarn_protos.proto
* yarn_server_common_service_protos.proto
* yarn_server_resourcemanager_service_protos.proto
* yarn_service_protos.proto


> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: YARN-6142
> URL: https://issues.apache.org/jira/browse/YARN-6142
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: rolling upgrade
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Ray Chiang
>Priority: Blocker
>
> Counterpart JIRA to HDFS-11096. We need to:
> * examine YARN and MR's  JACC report for binary and source incompatibilities
> * run the [PB 
> differ|https://issues.apache.org/jira/browse/HDFS-11096?focusedCommentId=15816405&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15816405]
>  that Sean wrote for HDFS-11096 for the YARN PBs.
> * sanity test some rolling upgrades between 2.x and 3.x. Ideally these are 
> automated and something we can run upstream.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout

2017-08-22 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137326#comment-16137326
 ] 

Ray Chiang commented on YARN-5536:
--

I'm tracking blockers for beta1. What are the odds this is going to get done in 
time?

> Multiple format support (JSON, etc.) for exclude node file in NM graceful 
> decommission with timeout
> ---
>
> Key: YARN-5536
> URL: https://issues.apache.org/jira/browse/YARN-5536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Priority: Blocker
>
> Per discussion in YARN-4676, we agree that multiple format (other than xml) 
> should be supported to decommission nodes with timeout values.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6523) RM requires large memory in sending out security tokens as part of Node Heartbeat in large cluster

2017-08-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135951#comment-16135951
 ] 

Ray Chiang commented on YARN-6523:
--

Looks like there hasn't been much movement on this recently.  [~Naganarasimha], 
we're about a month away from beta1.  In case there isn't a second beta, what 
is the likelihood we can get this one done?

> RM requires large memory in sending out security tokens as part of Node 
> Heartbeat in large cluster
> --
>
> Key: YARN-6523
> URL: https://issues.apache.org/jira/browse/YARN-6523
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>
> Currently, as part of the heartbeat response, the RM sets all applications' 
> tokens even though not all applications may be active on the node. On top of 
> that, NodeHeartbeatResponsePBImpl converts the tokens for each app into a 
> SystemCredentialsForAppsProto. Hence, for each node and each heartbeat, too 
> many SystemCredentialsForAppsProto objects were getting created.
> We hit an OOM while testing with 2000 concurrent apps on a 500-node cluster 
> with 8GB RAM configured for the RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6996) Change javax.cache library implementation from JSR107 to Apache Geronimo

2017-08-15 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127671#comment-16127671
 ] 

Ray Chiang commented on YARN-6996:
--

Thanks [~subru] and [~busbey]!

> Change javax.cache library implementation from JSR107 to Apache Geronimo
> 
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>Priority: Blocker
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6996.001.patch
>
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5816) TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still flakey

2017-08-11 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124193#comment-16124193
 ] 

Ray Chiang commented on YARN-5816:
--

[~ajithshetty] or [~bibinchundatt], can we get this done in time for Hadoop 3 
beta 1?

> TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still 
> flakey
> ---
>
> Key: YARN-5816
> URL: https://issues.apache.org/jira/browse/YARN-5816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Reporter: Daniel Templeton
>Assignee: Ajith S
>Priority: Minor
> Attachments: 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer-output.txt,
>  
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.txt
>
>
> Even after YARN-5057, 
> TestDelegationTokenRenewer#testCancelWithMultipleAppSubmissions is still 
> flakey:
> {noformat}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.796 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
> testCancelWithMultipleAppSubmissions(org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer)
>   Time elapsed: 2.307 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer.testCancelWithMultipleAppSubmissions(TestDelegationTokenRenewer.java:1260)
> {noformat}
> Note that it's the same error as YARN-5057, but on a different line.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6996) Change javax.cache library implementation from JSR107 to Apache Geronimo

2017-08-11 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124162#comment-16124162
 ] 

Ray Chiang commented on YARN-6996:
--

Compiling works.  The unit test that exercises the class using the jcache 
library passes.

> Change javax.cache library implementation from JSR107 to Apache Geronimo
> 
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6996.001.patch
>
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6996) Discuss license issue with javax.cache library

2017-08-11 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124047#comment-16124047
 ] 

Ray Chiang commented on YARN-6996:
--

[~subru], let me know what you think about this change.

> Discuss license issue with javax.cache library
> --
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6996.001.patch
>
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6996) Change javax.cache library implementation from JSR107 to Apache Geronimo

2017-08-11 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6996:
-
Summary: Change javax.cache library implementation from JSR107 to Apache 
Geronimo  (was: Discuss license issue with javax.cache library)

> Change javax.cache library implementation from JSR107 to Apache Geronimo
> 
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6996.001.patch
>
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6996) Discuss license issue with javax.cache library

2017-08-11 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6996:
-
Attachment: YARN-6996.001.patch

Based on a comment from LEGAL-325, trying out a patch that swaps to the Apache 
Geronimo jcache implementation.
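
Since both artifacts supply the same javax.cache (JSR-107) API classes, code 
written against that API should compile unchanged after the swap. A small 
hedged usage sketch (a runtime CachingProvider implementation would still be 
needed to actually execute it):

{noformat}
import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;

// Sketch of the API surface in question: only javax.cache interfaces are
// used, so swapping the jar that provides them (JSR107's cache-api vs.
// Geronimo's spec jar) is a build-level change, not a code change.
public class JCacheSketch {
  public static void main(String[] args) {
    CacheManager manager = Caching.getCachingProvider().getCacheManager();
    Cache<String, String> cache = manager.createCache("demo",
        new MutableConfiguration<String, String>());
    cache.put("key", "value");
    System.out.println(cache.get("key"));
  }
}
{noformat}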

> Discuss license issue with javax.cache library
> --
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6996.001.patch
>
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6996) Discuss license issue with javax.cache library

2017-08-11 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-6996:


Assignee: Ray Chiang

> Discuss license issue with javax.cache library
> --
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Moved] (YARN-6996) Discuss license issue with javax.cache library

2017-08-11 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang moved HADOOP-14763 to YARN-6996:
---

Affects Version/s: (was: 3.0.0-beta1)
   3.0.0-beta1
  Key: YARN-6996  (was: HADOOP-14763)
  Project: Hadoop YARN  (was: Hadoop Common)

> Discuss license issue with javax.cache library
> --
>
> Key: YARN-6996
> URL: https://issues.apache.org/jira/browse/YARN-6996
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>
> With YARN Federation, we added YARN-3672, which adds the following to 
> {noformat}
> <dependency>
>   <groupId>javax.cache</groupId>
>   <artifactId>cache-api</artifactId>
> </dependency>
> {noformat}
> This third-party library has some murky license history, as documented in 
> this [really long comment 
> thread|https://github.com/jsr107/jsr107spec/issues/333].  The summary of the 
> thread is that "the library is officially APL (take our word for it), but 
> there hasn't been a subsequent release with the license file change".
> LEGAL-325 has been filed to discuss the validity of this license for Apache.
> Before we get to final Hadoop 3 release, I'm wondering if anyone else has 
> concerns about using this library.  Just from looking at the various javax 
> Maven artifacts in our pom.xml files, I see a lot of other javax.* library 
> entries (although we may not ship the .jars if they're part of the Java 
> runtime).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2017-08-03 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-3610:
-
Attachment: YARN-3610.002.patch

> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
> Attachments: YARN-3610.001.patch, YARN-3610.002.patch
>
>
> YARN-1050 adds documentation for FairScheduler REST API, but is missing the 
> steady-fair-share.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5728) TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization timeout

2017-07-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106014#comment-16106014
 ] 

Ray Chiang commented on YARN-5728:
--

Sorry for the delay.  The change looks fine to me.  +1

> TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization timeout
> 
>
> Key: YARN-5728
> URL: https://issues.apache.org/jira/browse/YARN-5728
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: YARN-5728.001.patch, YARN-5728.002.patch, 
> YARN-5728.01.patch
>
>
> TestMiniYARNClusterNodeUtilization.testUpdateNodeUtilization is failing due 
> to a timeout.
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/192/testReport/junit/org.apache.hadoop.yarn.server/TestMiniYarnClusterNodeUtilization/testUpdateNodeUtilization/
> {noformat}
> java.lang.Exception: test timed out after 60000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
>   at com.sun.proxy.$Proxy85.nodeHeartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:113)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2017-07-25 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100967#comment-16100967
 ] 

Ray Chiang commented on YARN-6868:
--

Unit test failure looks to be the same as YARN-5548.

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2017-07-25 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100396#comment-16100396
 ] 

Ray Chiang commented on YARN-6868:
--

[~haibo.chen] or [~sjlee0], can you let me know if the above is also no longer 
necessary?  I'm guessing it might be leftover from some test restructuring.

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2017-07-25 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100391#comment-16100391
 ] 

Ray Chiang commented on YARN-6868:
--

Oddly, I also see this entry as being unnecessary, or at least I can't figure 
out how to make this fail when I remove it.

{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-timelineservice</artifactId>
  <scope>test</scope>
  <type>test-jar</type>
</dependency>
{noformat}

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2017-07-25 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6868:
-
Attachment: YARN-6868.001.patch

> Add test scope to certain entries in hadoop-yarn-server-resourcemanager 
> pom.xml
> ---
>
> Key: YARN-6868
> URL: https://issues.apache.org/jira/browse/YARN-6868
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6868.001.patch
>
>
> The tag
> {noformat}
> <scope>test</scope>
> {noformat}
> is missing from a few entries in the pom.xml for 
> hadoop-yarn-server-resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6868) Add test scope to certain entries in hadoop-yarn-server-resourcemanager pom.xml

2017-07-25 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-6868:


 Summary: Add test scope to certain entries in 
hadoop-yarn-server-resourcemanager pom.xml
 Key: YARN-6868
 URL: https://issues.apache.org/jira/browse/YARN-6868
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: Ray Chiang
Assignee: Ray Chiang


The tag

{noformat}
<scope>test</scope>
{noformat}

is missing from a few entries in the pom.xml for 
hadoop-yarn-server-resourcemanager.
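
For illustration only (this snippet is not from the attached patch), here is the 
kind of entry involved, shown with the missing scope added; the artifact is just 
an example, and the exact entries the patch touches may differ:

{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-timelineservice</artifactId>
  <type>test-jar</type>
  <!-- added: keeps the dependency on the test classpath only -->
  <scope>test</scope>
</dependency>
{noformat}

Without the scope, a dependency defaults to compile scope and leaks onto the 
classpath of every module that depends on the resourcemanager.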



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-07-24 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098922#comment-16098922
 ] 

Ray Chiang commented on YARN-6150:
--

+1

Thanks [~ajisakaa] for digging into this.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch, YARN-6150.007.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} on the same 
> codebase can either pass or fail.  Also, the two runs (one in secure mode, one 
> without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5049) Extend NMStateStore to save queued container information

2017-07-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-5049:
-
Release Note: 
This breaks rolling upgrades because it changes the major version of the NM 
state store schema. Therefore when a new NM comes up on an old state store it 
crashes.

The state store versions for this change have been updated in YARN-6798.

  was:This breaks rolling upgrades because it changes the major version of the 
NM state store schema. Therefore when a new NM comes up on an old state store 
it crashes.


> Extend NMStateStore to save queued container information
> 
>
> Key: YARN-5049
> URL: https://issues.apache.org/jira/browse/YARN-5049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5049.001.patch, YARN-5049.002.patch, 
> YARN-5049.003.patch
>
>
> This JIRA is about extending the NMStateStore to save queued container 
> information whenever a new container is added to the NM queue. 
> It also removes the information from the state store when the queued 
> container starts its execution.
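
A minimal sketch of the two hooks the description implies (the method names are 
illustrative assumptions, not necessarily the ones the patch added):

{code}
import java.io.IOException;

// Hypothetical shape of the NMStateStore extension: persist a record when a
// container is queued, and drop it once the container actually launches.
interface QueuedContainerRecovery {
  void storeContainerQueued(String containerId) throws IOException;

  void removeContainerQueued(String containerId) throws IOException;
}
{code}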



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-07-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6127:
-
Hadoop Flags: Incompatible change
Release Note: 
This breaks rolling upgrades because it changes the major version of the NM 
state store schema. Therefore when a new NM comes up on an old state store it 
crashes.

The state store versions for this change have been updated in YARN-6798.

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, 
> YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6798) Fix NM startup failure with old state store due to version mismatch

2017-07-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6798:
-
Release Note: 
This fixes the LevelDB state store for the NodeManager.  As of this patch, the 
state store versions now correspond to the following table.

- Previous Patch: YARN-5049
-- LevelDB Key: queued
-- Hadoop Versions: 2.9.0, 3.0.0-alpha1
-- Corresponding LevelDB Version: 1.2
- Previous Patch: YARN-6127
-- LevelDB Key: AMRMProxy/NextMasterKey
-- Hadoop Versions: 2.9.0, 3.0.0-alpha4
-- Corresponding LevelDB Version: 1.1

  was:
This fixes the LevelDB state store for the NodeManager.  As of this patch, the 
state store versions now correspond to the following table.

|| Patch || LevelDBKey(s) || Hadoop Versions || NM LevelDB Version ||
| YARN-5049 | queued | (2.9.0, 3.0.0-alpha1) | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | 1.1 |


> Fix NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6798.v1.patch, YARN-6798.v2.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6798) Fix NM startup failure with old state store due to version mismatch

2017-07-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6798:
-
Summary: Fix NM startup failure with old state store due to version 
mismatch  (was: NM startup failure with old state store due to version mismatch)

> Fix NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch, YARN-6798.v2.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-17 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090778#comment-16090778
 ] 

Ray Chiang commented on YARN-6798:
--

+1

I'm going to commit this tomorrow unless I hear otherwise.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch, YARN-6798.v2.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-17 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6798:
-
Attachment: YARN-6798.v2.patch

Updated Botong's patch with the newer version numbering.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch, YARN-6798.v2.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088085#comment-16088085
 ] 

Ray Chiang edited comment on YARN-6798 at 7/14/17 9:17 PM:
---

Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB Version ||
| YARN-5049 | queued | (2.9.0, 3.0.0-alpha1) | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 1.1 |


was (Author: rchiang):
Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB Version ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 1.1 |

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088085#comment-16088085
 ] 

Ray Chiang commented on YARN-6798:
--

Updating the version table:

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date || NM LevelDB Version ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 | 1.2 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 | 1.1 |

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088081#comment-16088081
 ] 

Ray Chiang commented on YARN-6798:
--

Thanks [~asuresh]!  [~botong], it looks like we'll use 1.2 as our current 
version.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-6798:


Assignee: Botong Huang  (was: Ray Chiang)

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088015#comment-16088015
 ] 

Ray Chiang commented on YARN-6798:
--

Thanks [~asuresh].  That's what I get for relying on JIRA and forgetting to 
check git.

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087719#comment-16087719
 ] 

Ray Chiang commented on YARN-6798:
--

Finally got a bit of time to look at the previous patches.  I see a minor issue.

|| Patch || LevelDBKey(s) || Hadoop Versions || Commit Date ||
| YARN-5049 | queued | 3.0.0-alpha1 | May 11, 2016 |
| YARN-6127 | AMRMProxy/NextMasterKey | (2.9.0, 3.0.0-alpha4) | June 22, 2017 |

So, branch-2 has just YARN-6127, while trunk has YARN-5049 and YARN-6127.  If 
we label YARN-5049 as 1.1 and YARN-6127 as 1.2, then branch-2's having a 1.2 
version won't quite be accurate.  If we do the reverse, we'd be chronologically 
backward (which seems okay to me, but I'd like a second opinion).
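
To make the tradeoff concrete, here is a minimal sketch (simplified, assumed 
behavior modeled on the NM state store's Version check, not the actual 
NMLeveldbStateStoreService code) showing why only the major number matters for 
compatibility:

{code}
// Sketch only: mirrors the rule that a loaded schema version is accepted
// iff its major version matches the current one.  Under that rule, 1.1 and
// 1.2 can load each other's stores, while a 1.x store is rejected by a
// 2.0 or 3.0 NM (the crash described in this JIRA).
final class SchemaVersion {
  private final int major;
  private final int minor;

  SchemaVersion(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  boolean isCompatibleTo(SchemaVersion loaded) {
    // Minor-version differences are tolerated; the major must match.
    return this.major == loaded.major;
  }

  public static void main(String[] args) {
    SchemaVersion current = new SchemaVersion(1, 2);
    System.out.println(current.isCompatibleTo(new SchemaVersion(1, 1))); // true
    System.out.println(current.isCompatibleTo(new SchemaVersion(3, 0))); // false
  }
}
{code}

So whichever labeling we pick, upgrades between 1.1 and 1.2 stay compatible 
either way; the ordering question is purely cosmetic.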


> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> ************************************************************/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6822) TestContainerManagerSecurity tests fail on trunk

2017-07-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087482#comment-16087482
 ] 

Ray Chiang commented on YARN-6822:
--

[~Sonia], can you confirm whether this issue continues after applying YARN-6150?

> TestContainerManagerSecurity tests fail on trunk
> 
>
> Key: YARN-6822
> URL: https://issues.apache.org/jira/browse/YARN-6822
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
> Environment: Ubuntu 14.04 
> x86, ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>
> {code}
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[0]
> java.lang.Exception: test timed out after 120000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
>   at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
> {code}
> {code}
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager[1]
> java.lang.Exception: test timed out after 120000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.processWaitTimeAndRetryInfo(RetryInvocationHandler.java:130)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:107)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:348)
>   at com.sun.proxy.$Proxy91.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.startContainer(TestContainerManagerSecurity.java:557)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testStartContainer(TestContainerManagerSecurity.java:478)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
>   at 
> org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
> {code}
> Logs -
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_0_/
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/463/testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testContainerManager_1_/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084804#comment-16084804
 ] 

Ray Chiang commented on YARN-6798:
--

{quote}
It would be helpful to have a release note that calls out the incompatibility 
with 3.0-alpha releases and that users who are upgrading from one of those 
releases will need to erase the NM state store on each node before upgrading.
{quote}

Agreed.  I intend to modify the release notes for this JIRA and the previous 
two to make this versioning issue clear.


> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-12 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084537#comment-16084537
 ] 

Ray Chiang commented on YARN-6798:
--

The failed unit test looks like YARN-5857.

I'm okay with this update as it is.  This will be incompatible with the previous 
alphas and with anyone running directly from branch-2 builds.  Does anyone have any 
problems with that?


> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
> Attachments: YARN-6798.v1.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-10 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang reassigned YARN-6798:


Assignee: Ray Chiang

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-10 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081105#comment-16081105
 ] 

Ray Chiang commented on YARN-6798:
--

[~kkaranasos], [~subru], [~botong], [~asuresh], it looks like you bumped the 
NM version twice.  Is this behavior desirable, or is it preferable to keep 
state store versions more compatible (i.e. 1.0 -> 1.1 -> 1.2 instead of 
1.0 -> 2.0 -> 3.0)?

Plus, anyone else who has thoughts about NM rolling upgrade, please chime in.
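
For reference, the gate that produces the failure below usually has this shape 
(a minimal sketch assuming Hadoop's usual "same major version is compatible" 
rule; this is not the actual NMLeveldbStateStoreService code):

{code}
// Minimal sketch of a state store version gate. Assumes loadVersion(),
// storeVersion() and getCurrentVersion() helpers like the real service has;
// the exact compatibility policy here is an assumption.
private void checkVersion() throws IOException {
  Version loaded = loadVersion();          // version found on disk
  if (loaded == null) {                    // fresh state store
    storeVersion();                        // stamp it with the current version
    return;
  }
  if (loaded.getMajorVersion() == getCurrentVersion().getMajorVersion()) {
    return;                                // same major version: compatible
  }
  throw new IOException("Incompatible version for NM state: expecting NM state"
      + " version " + getCurrentVersion() + ", but loading version " + loaded);
}
{code}

Under a rule like this, 1.0 -> 1.1 -> 1.2 keeps old stores loadable, while 
each major bump (2.0, then 3.0) strands them.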

> NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6798) NM startup failure with old state store due to version mismatch

2017-07-10 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-6798:


 Summary: NM startup failure with old state store due to version 
mismatch
 Key: YARN-6798
 URL: https://issues.apache.org/jira/browse/YARN-6798
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Ray Chiang


YARN-6703 rolled back the state store version number for the RM from 2.0 to 1.4.

YARN-6127 bumped the version for the NM to 3.0

private static final Version CURRENT_VERSION_INFO = Version.newInstance(3, 
0);

YARN-5049 bumped the version for the NM to 2.0

private static final Version CURRENT_VERSION_INFO = Version.newInstance(2, 
0);

During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
alpha4.

{noformat}
2017-07-07 15:48:17,259 FATAL 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
NodeManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
Incompatible version for NM state: expecting NM state version 3.0, but loading 
version 2.0
at 
org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
Caused by: java.io.IOException: Incompatible version for NM state: expecting NM 
state version 3.0, but loading version 2.0
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
at 
org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
2017-07-07 15:48:17,277 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
/
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5067) Support specifying resources for AM containers in SLS

2017-07-07 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-5067:
-
Fix Version/s: 3.0.0-beta1

> Support specifying resources for AM containers in SLS
> -
>
> Key: YARN-5067
> URL: https://issues.apache.org/jira/browse/YARN-5067
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler-load-simulator
>Reporter: Wangda Tan
>Assignee: Yufei Gu
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-5067.001.patch, YARN-5067.002.patch, 
> YARN-5067.003.patch
>
>
> Currently the resource of application masters in SLS is hardcoded to mem=1024 vcores=1.
> We should be able to specify AM resources from the trace input file.
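
As an illustration of what the request amounts to (a sketch only; the field 
names "am.memory" and "am.vcores" are assumptions, not the final trace 
schema), the change boils down to reading optional AM resource fields from 
the parsed trace entry and falling back to the old hardcoded values:

{code}
// Hypothetical sketch: the trace entry is assumed to be parsed into a
// Map<String, Object>. Field names are illustrative only.
long amMemory = jsonJob.containsKey("am.memory")
    ? ((Number) jsonJob.get("am.memory")).longValue()
    : 1024L;                               // old hardcoded default
int amVCores = jsonJob.containsKey("am.vcores")
    ? ((Number) jsonJob.get("am.vcores")).intValue()
    : 1;                                   // old hardcoded default
Resource amResource = Resource.newInstance(amMemory, amVCores);
{code}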



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures

2017-06-29 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068796#comment-16068796
 ] 

Ray Chiang commented on YARN-6409:
--

What about making this a configuration setting?  It seems like this shows up 
more on larger clusters (higher chance of a network outage, more nodes to deal 
with, etc.).
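
Something along these lines, where the property name is purely a placeholder 
(a sketch of the idea, not a real YarnConfiguration key):

{code}
// Sketch only: gate blacklisting of AM launch failures behind an assumed
// config key, and count an unreachable NM the same way a post-launch AM
// failure is counted today.
boolean blacklistOnLaunchFailure = conf.getBoolean(
    "yarn.resourcemanager.am.blacklist-on-launch-failure", false);
if (blacklistOnLaunchFailure && launchException instanceof IOException) {
  blacklistManager.addNode(nodeId.getHost());
}
{code}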

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()).  However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive.  Because that case is not handled, the scheduler may continue 
> to allocate AM containers on that same NM for the following app attempts.
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 

[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067471#comment-16067471
 ] 

Ray Chiang commented on YARN-6150:
--

I thought I had done this before, but the latest version of this patch doesn't 
seem to fix the unit test in branch-2.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch, YARN-6150.007.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-28 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067193#comment-16067193
 ] 

Ray Chiang commented on YARN-6150:
--

Unit test failure looks identical to YARN-5728.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch, YARN-6150.007.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-28 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6150:
-
Attachment: YARN-6150.007.patch

Adding Akira's suggested change.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch, YARN-6150.007.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-26 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063735#comment-16063735
 ] 

Ray Chiang commented on YARN-6150:
--

[~sturman], please implement Akira's suggested code change.  That looks to be 
the last issue.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-23 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061222#comment-16061222
 ] 

Ray Chiang commented on YARN-6150:
--

Thanks for digging into the issue, [~ajisakaa].  I have a slight preference for 
the first solution, since that would make the test a bit more robust.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6717) [Umbrella] API related cleanup for Hadoop 3

2017-06-16 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-6717:


 Summary: [Umbrella] API related cleanup for Hadoop 3
 Key: YARN-6717
 URL: https://issues.apache.org/jira/browse/YARN-6717
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ray Chiang
Assignee: Ray Chiang


Creating this umbrella JIRA to track various API-related issues that need to 
be adjusted, documented, or otherwise resolved before the Hadoop 3 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-16 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052170#comment-16052170
 ] 

Ray Chiang commented on YARN-6150:
--

Interesting.  The failed TestContainerManagerSecurity tests are different.  The 
error in the logs is:

{quote}
Failed tests: 
  TestContainerManagerSecurity.testContainerManager:167->testNMTokens:268 In 
calling af73ca3dfb64:49984 exception was 'Invalid host name: local host is: 
(unknown); destination host is: "af73ca3dfb64":49984; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost' but doesn't contain 'SIMPLE 
authentication is not enabled.  Available:[TOKEN]'
  TestContainerManagerSecurity.testContainerManager:167->testNMTokens:268 In 
calling af73ca3dfb64:34648 exception was 'Invalid host name: local host is: 
(unknown); destination host is: "af73ca3dfb64":34648; 
java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost' but doesn't contain 'Client cannot 
authenticate via:[TOKEN]'
{quote}


> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-06-15 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051151#comment-16051151
 ] 

Ray Chiang commented on YARN-6150:
--

Just as an FYI, I'm seeing this fail with regularity on OS X and Linux in trunk 
with the error:

{quote}
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.waitForContainerToFinishOnNM(TestContainerManagerSecurity.java:398)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:341)
at 
org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:158)
{quote}

With this patch, I no longer see the error on either platform.
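
For anyone hitting the same NPE, the essence of the fix is to poll for the 
container rather than dereference a possibly-missing map entry right away.  
Roughly (a sketch, not the patch itself, assuming GenericTestUtils.waitFor 
and the NM-internal ContainerState):

{code}
// Null-safe wait: the container may not be registered in the NM context yet.
// Supplier is com.google.common.base.Supplier, as waitFor expects.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    Container c = nm.getNMContext().getContainers().get(containerId);
    return c != null && c.getContainerState() == ContainerState.DONE;
  }
}, 100, 10000);
{code}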

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one in secure mode, 
> one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5894) license warning in de.ruedigermoeller:fst:jar:2.24

2017-04-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973430#comment-15973430
 ] 

Ray Chiang commented on YARN-5894:
--

[~jeagles], it looks like this was brought in as part of YARN-3448.  Did we get 
some special permission for this license issue?  It seems like this would need 
to be corrected before the final 3.0 release.

> license warning in de.ruedigermoeller:fst:jar:2.24
> --
>
> Key: YARN-5894
> URL: https://issues.apache.org/jira/browse/YARN-5894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Priority: Blocker
>
> The artifact de.ruedigermoeller:fst:jar:2.24, that ApplicationHistoryService 
> depends on,  shows its license being LGPL 2.1 in our license checking.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6273) TestAMRMClient#testAllocationWithBlacklist fails intermittently

2017-03-16 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928638#comment-15928638
 ] 

Ray Chiang commented on YARN-6273:
--

Test

> TestAMRMClient#testAllocationWithBlacklist fails intermittently
> ---
>
> Key: YARN-6273
> URL: https://issues.apache.org/jira/browse/YARN-6273
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Ray Chiang
>
> I'm seeing this unit test fail in trunk:
> testAllocationWithBlacklist(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 0.738 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAllocationWithBlacklist(TestAMRMClient.java:721)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6331) Fix flakiness in TestFairScheduler#testDumpState

2017-03-14 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925110#comment-15925110
 ] 

Ray Chiang commented on YARN-6331:
--

Looks good to me [~yufeigu].  +1.  Will commit soon.

> Fix flakiness in TestFairScheduler#testDumpState
> 
>
> Key: YARN-6331
> URL: https://issues.apache.org/jira/browse/YARN-6331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6331.001.patch
>
>
> Flakiness could happen in TestFairScheduler#testDumpState due to the 
> unpredictable running time of FairScheduler#update(), since it updates the 
> demand of queues. Explicitly invoking Schedulable#updateDemand() solves the 
> issue.
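
In test form the fix amounts to something like this (a sketch, assuming the 
queue under test is reachable through the QueueManager):

{code}
// Drive the demand update deterministically instead of racing the
// background FairScheduler#update() thread.
FSQueue root = scheduler.getQueueManager().getRootQueue();
root.updateDemand();   // Schedulable#updateDemand(), invoked explicitly
{code}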



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6042) Dump scheduler and queue state information into FairScheduler DEBUG log

2017-03-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-6042:
-
Fix Version/s: (was: 3.0.0-alpha3)
   2.9.0

Pushed to branch-2.  Updated version to match.  Thanks [~yufeigu].

> Dump scheduler and queue state information into FairScheduler DEBUG log
> ---
>
> Key: YARN-6042
> URL: https://issues.apache.org/jira/browse/YARN-6042
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-6042.001.patch, YARN-6042.002.patch, 
> YARN-6042.003.patch, YARN-6042.004.patch, YARN-6042.005.patch, 
> YARN-6042.006.patch, YARN-6042.007.patch, YARN-6042.008.patch, 
> YARN-6042.009.patch, YARN-6042.010.patch, YARN-6042.branch-2.001.patch
>
>
> To improve the debugging of scheduler issues, it would be a big improvement 
> to be able to dump the scheduler state into a log on request. 
> Dumping the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but is also not assigning containers. 
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in, and we have to make assumptions or guess.
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share
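
To make the intent concrete, the dump might look roughly like this (a sketch 
only; the accessor names follow existing FSQueue methods, but this is not the 
committed patch):

{code}
// Guard on isDebugEnabled() so the string building costs nothing when the
// dump is not wanted.
if (LOG.isDebugEnabled()) {
  StringBuilder sb = new StringBuilder("FairScheduler state:");
  for (FSQueue queue : queueMgr.getQueues()) {
    sb.append("\n  ").append(queue.getName())
      .append(" fairShare=").append(queue.getFairShare())
      .append(" steadyFairShare=").append(queue.getSteadyFairShare())
      .append(" demand=").append(queue.getDemand());
  }
  LOG.debug(sb.toString());
}
{code}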



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


