[jira] [Created] (YARN-9848) revert YARN-4946

2019-09-19 Thread Steven Rand (Jira)
Steven Rand created YARN-9848:
-

 Summary: revert YARN-4946
 Key: YARN-9848
 URL: https://issues.apache.org/jira/browse/YARN-9848
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, resourcemanager
Reporter: Steven Rand


On YARN-4946, we've been discussing a revert because the patch can keep more 
applications in the state store than desired, and can greatly increase RM 
recovery times.

 

I'm in favor of reverting the patch, but other ideas along the lines of 
YARN-9571 would work as well.






[jira] [Created] (YARN-9847) ZKRMStateStore will cause zk connection loss when writing huge data into znode

2019-09-19 Thread Wang, Xinglong (Jira)
Wang, Xinglong created YARN-9847:


 Summary: ZKRMStateStore will cause zk connection loss when writing 
huge data into znode
 Key: YARN-9847
 URL: https://issues.apache.org/jira/browse/YARN-9847
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wang, Xinglong
Assignee: Wang, Xinglong


Recently, we encountered an RM ZK connection issue because the RM was trying 
to write very large data into a znode. This causes ZooKeeper to report a Len 
error and drop the session connection, and eventually the RM crashes due to 
the lost ZK connection.

*The fix*

To protect the ResourceManager from crashing in this situation, the fix limits 
the size of the data stored per attempt by truncating the diagnostic info when 
writing ApplicationAttemptStateData into the znode. The limit is governed by 
-Djute.maxbuffer set in yarn-env.sh; the same value is also used by the 
ZooKeeper server.
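
For illustration, here is a minimal sketch of the truncation idea. It is an 
assumption, not the actual patch: the class and method names are invented, and 
it budgets in characters for simplicity, whereas the real limit 
(jute.maxbuffer) is byte-based.

{code:java}
// Hypothetical sketch only; not the YARN-9847 patch itself.
public final class DiagnosticsLimiter {

  // ZooKeeper's default jute.maxbuffer is ~1 MB. Reading the same
  // -Djute.maxbuffer system property set in yarn-env.sh keeps the RM
  // client and the ZooKeeper server in agreement about the limit.
  private static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

  static int maxZnodeSize() {
    return Integer.getInteger("jute.maxbuffer", DEFAULT_JUTE_MAXBUFFER);
  }

  // Caps the diagnostics, keeping the tail where the latest errors live.
  static String truncateDiagnostics(String diagnostics, int reservedForOtherFields) {
    int budget = Math.max(0, maxZnodeSize() - reservedForOtherFields);
    if (diagnostics == null || diagnostics.length() <= budget) {
      return diagnostics;
    }
    String marker = "[diagnostics truncated] ... ";
    int keep = Math.max(0, budget - marker.length());
    return marker + diagnostics.substring(diagnostics.length() - keep);
  }
}
{code}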

*The story*

ResourceManager Log
{code:java}
2019-07-29 02:14:59,638 WARN org.apache.zookeeper.ClientCnxn: Session 0x36ab902369100a0 for server abc-zk-5.vip.ebay.com/10.210.82.29:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

2019-07-29 04:27:35,459 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Exception while executing a ZK operation.
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:935)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:998)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$5.run(ZKRMStateStore.java:995)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1174)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1207)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1001)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:1009)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.setDataWithRetries(ZKRMStateStore.java:1050)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:699)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:317)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:299)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:955)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1036)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1031)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}


The ResourceManager will retry the ZooKeeper connection until it exhausts the 
retry limit and then give up.

{code:java}
2019-07-29 02:25:06,404 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Retrying operation on ZK. Retry no. 999

2019-07-29 02:25:06,718 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
2019-07-29 02:25:06,718 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ...
{code}

[jira] [Resolved] (YARN-6684) TestAMRMClient tests fail on branch-2.7

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-6684.
-
Resolution: Won't Fix

branch-2.7 is EOL; closing as Won't Fix.

> TestAMRMClient tests fail on branch-2.7
> ---
>
> Key: YARN-6684
> URL: https://issues.apache.org/jira/browse/YARN-6684
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Priority: Major
>
> {noformat}
> 2017-06-01 19:10:44,362 INFO  capacity.CapacityScheduler (CapacityScheduler.java:addNode(1335)) - Added node jhung-ld2.linkedin.biz:58205 clusterResource: 
> 2017-06-01 19:10:44,370 INFO  server.MiniYARNCluster (MiniYARNCluster.java:waitForNodeManagersToConnect(657)) - All Node Managers connected in MiniYARNCluster
> 2017-06-01 19:10:44,376 INFO  client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at jhung-ld2.linkedin.biz/ipaddr:36167
> 2017-06-01 19:10:45,501 INFO  ipc.Client (Client.java:handleConnectionFailure(872)) - Retrying connect to server: jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2017-06-01 19:10:46,502 INFO  ipc.Client (Client.java:handleConnectionFailure(872)) - Retrying connect to server: jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2017-06-01 19:10:47,503 INFO  ipc.Client (Client.java:handleConnectionFailure(872)) - Retrying connect to server: jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2017-06-01 19:10:48,504 INFO  ipc.Client (Client.java:handleConnectionFailure(872)) - Retrying connect to server: jhung-ld2.linkedin.biz/ipaddr:36167. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> {noformat}
> After some investigation, it seems to be the same issue as described in 
> HDFS-11893.






[jira] [Resolved] (YARN-8825) Print application tags in ApplicationSummary

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-8825.
-
Resolution: Duplicate

> Print application tags in ApplicationSummary
> 
>
> Key: YARN-8825
> URL: https://issues.apache.org/jira/browse/YARN-8825
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Useful for tracking application tag metadata.






[jira] [Resolved] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2

2019-09-19 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-9844.
-
Resolution: Fixed

> TestCapacitySchedulerPerf test errors in branch-2
> -
>
> Key: YARN-9844
> URL: https://issues.apache.org/jira/browse/YARN-9844
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jonathan Hung
>Priority: Major
>
> These TestCapacitySchedulerPerf throughput tests are failing in branch-2:
> {{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}
> {{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}
> {{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}






Re: [VOTE] Release Hadoop-3.1.3-RC0

2019-09-19 Thread epa...@apache.org



+1 (binding)

Thanks Zhankun for all of your hard work on this release.

I downloaded and built the source and ran it on an insecure multi-node pseudo 
cluster.

I performed various YARN manual tests, including creating custom resources, 
creating queue submission ACLs, and queue refreshes.

One concern is that preemption does not seem to be working when only the custom 
resources are over the queue capacity, but I don't think this is something 
introduced with this release.

-Eric



On Thursday, September 12, 2019, 3:04:44 AM CDT, Zhankun Tang 
 wrote: 





Hi folks,

Thanks to everyone's help on this release. Special thanks to Rohith,
Wei-Chiu, Akira, Sunil, Wangda!

I have created a release candidate (RC0) for Apache Hadoop 3.1.3.

The RC release artifacts are available at:
http://home.apache.org/~ztang/hadoop-3.1.3-RC0/

The maven artifacts are staged at:
https://repository.apache.org/content/repositories/orgapachehadoop-1228/

The RC tag in git is here:
https://github.com/apache/hadoop/tree/release-3.1.3-RC0

And my public key is at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

*This vote will run for 7 days, ending on Sept. 19th at 11:59 pm PST.*

For the testing, I have run several Spark and distributed shell jobs in my
pseudo cluster.

My +1 (non-binding) to start.

BR,
Zhankun

On Wed, 4 Sep 2019 at 15:56, zhankun tang  wrote:

> Hi all,
>
> Thanks for everyone helping in resolving all the blockers targeting Hadoop
> 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers issues
> to 3.1.4.
>
> I'll cut the branch today and call a release vote soon. Thanks!
>
>
> [1]. https://s.apache.org/5hj5i
>
> BR,
> Zhankun
>
>
> On Wed, 21 Aug 2019 at 12:38, Zhankun Tang  wrote:
>
>> Hi folks,
>>
>> We have Apache Hadoop 3.1.2 released on Feb 2019.
>>
>> It's been more than 6 months since then, and there are
>>
>> 246 fixes [1], plus 2 blocker and 4 critical issues [2].
>>
>> (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker)
>>
>>
>> I propose my plan to do a maintenance release of 3.1.3 in the next few
>> (one or two) weeks.
>>
>> Hadoop 3.1.3 release plan:
>>
>> Code Freezing Date: *25th August 2019 PDT*
>>
>> Release Date: *31st August 2019 PDT*
>>
>>
>> Please feel free to share your insights on this. Thanks!
>>
>>
>> [1] https://s.apache.org/zw8l5
>>
>> [2] https://s.apache.org/fjol5
>>
>>
>> BR,
>>
>> Zhankun
>>
>




[jira] [Created] (YARN-9846) Use Finer-Grained Synchronization in ResourceLocalizationService.java

2019-09-19 Thread David Mollitor (Jira)
David Mollitor created YARN-9846:


 Summary: Use Finer-Grained Synchronization in 
ResourceLocalizationService.java
 Key: YARN-9846
 URL: https://issues.apache.org/jira/browse/YARN-9846
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java#L788

# Remove these synchronization blocks
# Ensure {{recentlyCleanedLocalizers}} is thread-safe
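
Below is a minimal sketch of the direction, under the assumption that 
{{recentlyCleanedLocalizers}} holds localizer IDs; the class and method names 
are invented for illustration, not the NodeManager's actual code.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: backing the collection with a concurrent set makes
// membership checks and inserts atomic, so the coarse synchronized
// blocks around them can be removed.
class LocalizerCleanupTracker {

  private final Set<String> recentlyCleanedLocalizers =
      ConcurrentHashMap.newKeySet();

  // Returns false if another thread already recorded this localizer.
  boolean markCleaned(String localizerId) {
    return recentlyCleanedLocalizers.add(localizerId);
  }

  boolean wasRecentlyCleaned(String localizerId) {
    return recentlyCleanedLocalizers.contains(localizerId);
  }
}
{code}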








Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-19 Thread Aaron Fabbri
+1 (binding)

Thanks to the Ozone folks for their efforts at maintaining good separation
with HDFS and common. I took a lot of heat for the unpopular opinion that
they should be separate, so I am glad the process has worked out well for
both codebases. It looks like my concerns were addressed and I appreciate
it. It is cool to see the evolution here.

Aaron


On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran 
wrote:

> in that case,
>
> +1 from me (binding)
>
> On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton  wrote:
>
> >  > one thing to consider here as you are giving up your ability to make
> >  > changes in hadoop-* modules, including hadoop-common, and their
> >  > dependencies, in sync with your own code. That goes for filesystem
> > contract
> >  > tests.
> >  >
> >  > are you happy with that?
> >
> >
> > Yes. I think we can live with it.
> >
> > Fortunately, the Hadoop parts used by Ozone (security + RPC) are stable
> > enough that we haven't needed bigger changes so far (small patches are
> > already included in 3.1/3.2).
> >
> > I think it's better to use released Hadoop bits in Ozone anyway, and
> > worst (best?) case we can try to do more frequent patch releases from
> > Hadoop (if required).
> >
> >
> > m.
> >
> >
> >
>


[jira] [Created] (YARN-9845) Update to Use Java 8 Map Concurrent API

2019-09-19 Thread David Mollitor (Jira)
David Mollitor created YARN-9845:


 Summary: Update to Use Java 8 Map Concurrent API
 Key: YARN-9845
 URL: https://issues.apache.org/jira/browse/YARN-9845
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java#L467

The class uses a {{ConcurrentHashMap}} but does not take advantage of its 
concurrent API.
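
As a hedged illustration of the pattern (the names below are invented, not 
{{LocalResourcesTrackerImpl}} internals), the Java 8 {{computeIfAbsent}} / 
{{compute}} / {{merge}} methods collapse the usual get/null-check/put sequence 
into a single atomic call:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: computeIfAbsent leaves no window between the
// lookup and the insert, so no external locking is needed.
class ResourceRefCounts {

  private final ConcurrentMap<String, AtomicLong> counts =
      new ConcurrentHashMap<>();

  void incrementRef(String resourceKey) {
    counts.computeIfAbsent(resourceKey, k -> new AtomicLong())
        .incrementAndGet();
  }

  long refCount(String resourceKey) {
    AtomicLong c = counts.get(resourceKey);
    return c == null ? 0L : c.get();
  }
}
{code}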






[jira] [Created] (YARN-9844) TestCapacitySchedulerPerf test errors in branch-2

2019-09-19 Thread Jim Brennan (Jira)
Jim Brennan created YARN-9844:
-

 Summary: TestCapacitySchedulerPerf test errors in branch-2
 Key: YARN-9844
 URL: https://issues.apache.org/jira/browse/YARN-9844
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test, yarn
Affects Versions: 2.10.0
Reporter: Jim Brennan


These TestCapacitySchedulerPerf throughput tests are failing in branch-2:

{{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForFiveResources:263->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}
{{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForFourResources:258->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}
{{[ERROR]   TestCapacitySchedulerPerf.testUserLimitThroughputForThreeResources:253->testUserLimitThroughputWithNumberOfResourceTypes:114 » ArrayIndexOutOfBounds}}






Re: [VOTE] Release Hadoop-3.1.3-RC0

2019-09-19 Thread Weiwei Yang
+1 (binding)

- Downloaded source, set up a single-node cluster
- Verified basic HDFS operations: put/get/cat, etc.
- Verified basic YARN RESTful APIs (cluster/nodes/scheduler); all seem good
- Ran several distributed shell jobs

Thanks
Weiwei
On Sep 19, 2019, 4:28 PM +0800, Sunil Govindan , wrote:
> +1 (binding)
>
> Thanks Zhankun for putting up the release. Thanks for leading this.
>
> - verified signature
> - ran a local cluster from tar ball
> - ran some MR jobs
> - performed CLI ops, and all looks good
> - UI seems fine
>
> Thanks
> Sunil
>
> On Thu, Sep 12, 2019 at 1:34 PM Zhankun Tang  wrote:
>
> > Hi folks,
> >
> > Thanks to everyone's help on this release. Special thanks to Rohith,
> > Wei-Chiu, Akira, Sunil, Wangda!
> >
> > I have created a release candidate (RC0) for Apache Hadoop 3.1.3.
> >
> > The RC release artifacts are available at:
> > http://home.apache.org/~ztang/hadoop-3.1.3-RC0/
> >
> > The maven artifacts are staged at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1228/
> >
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/tree/release-3.1.3-RC0
> >
> > And my public key is at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > *This vote will run for 7 days, ending on Sept. 19th at 11:59 pm PST.*
> >
> > For the testing, I have run several Spark and distributed shell jobs in my
> > pseudo cluster.
> >
> > My +1 (non-binding) to start.
> >
> > BR,
> > Zhankun
> >
> > On Wed, 4 Sep 2019 at 15:56, zhankun tang  wrote:
> >
> > > Hi all,
> > >
> > > Thanks for everyone helping in resolving all the blockers targeting
> > Hadoop
> > > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers
> > issues
> > > to 3.1.4.
> > >
> > > I'll cut the branch today and call a release vote soon. Thanks!
> > >
> > >
> > > [1]. https://s.apache.org/5hj5i
> > >
> > > BR,
> > > Zhankun
> > >
> > >
> > > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang  wrote:
> > >
> > > > Hi folks,
> > > >
> > > > We have Apache Hadoop 3.1.2 released on Feb 2019.
> > > >
> > > > It's been more than 6 months since then, and there are
> > > >
> > > > 246 fixes [1], plus 2 blocker and 4 critical issues [2].
> > > >
> > > > (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker)
> > > >
> > > >
> > > > I propose my plan to do a maintenance release of 3.1.3 in the next few
> > > > (one or two) weeks.
> > > >
> > > > Hadoop 3.1.3 release plan:
> > > >
> > > > Code Freezing Date: *25th August 2019 PDT*
> > > >
> > > > Release Date: *31st August 2019 PDT*
> > > >
> > > >
> > > > Please feel free to share your insights on this. Thanks!
> > > >
> > > >
> > > > [1] https://s.apache.org/zw8l5
> > > >
> > > > [2] https://s.apache.org/fjol5
> > > >
> > > >
> > > > BR,
> > > >
> > > > Zhankun
> > > >
> > >
> >


[jira] [Created] (YARN-9843) Test TestAMSimulator.testAMSimulator fails intermittently.

2019-09-19 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9843:
---

 Summary: Test TestAMSimulator.testAMSimulator fails intermittently.
 Key: YARN-9843
 URL: https://issues.apache.org/jira/browse/YARN-9843
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Stack trace for failure:

java.lang.AssertionError: java.io.IOException: Unable to delete directory /testptch/hadoop/hadoop-tools/hadoop-sls/target/test-dir/output4038286622450859971/metrics.
 at org.junit.Assert.fail(Assert.java:88)
 at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.deleteMetricOutputDir(TestAMSimulator.java:141)
 at org.apache.hadoop.yarn.sls.appmaster.TestAMSimulator.tearDown(TestAMSimulator.java:298)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runners.Suite.runChild(Suite.java:128)
 at org.junit.runners.Suite.runChild(Suite.java:27)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
 at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
 at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
 at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
 at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
 at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
 at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
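
One common remedy for this kind of flakiness, offered here only as a sketch 
under the assumption that a metrics writer is still flushing when {{tearDown}} 
runs (not as the actual fix), is to retry the delete briefly. The helper name 
is invented; {{FileUtils.deleteQuietly}} is from commons-io.

{code:java}
import java.io.File;
import org.apache.commons.io.FileUtils;

// Sketch of a more resilient teardown (not the actual YARN-9843 fix):
// retry the directory delete a few times before giving up.
final class TeardownUtil {

  static void deleteDirWithRetry(File dir) throws InterruptedException {
    for (int attempt = 0; attempt < 5; attempt++) {
      if (!dir.exists() || FileUtils.deleteQuietly(dir)) {
        return; // gone
      }
      Thread.sleep(200L); // brief back-off before the next attempt
    }
    throw new AssertionError("Unable to delete directory " + dir);
  }
}
{code}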






Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-19 Thread Steve Loughran
in that case,

+1 from me (binding)

On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton  wrote:

>  > one thing to consider here as you are giving up your ability to make
>  > changes in hadoop-* modules, including hadoop-common, and their
>  > dependencies, in sync with your own code. That goes for filesystem
> contract
>  > tests.
>  >
>  > are you happy with that?
>
>
> Yes. I think we can live with it.
>
> Fortunately, the Hadoop parts used by Ozone (security + RPC) are stable
> enough that we haven't needed bigger changes so far (small patches are
> already included in 3.1/3.2).
>
> I think it's better to use released Hadoop bits in Ozone anyway, and
> worst (best?) case we can try to do more frequent patch releases from
> Hadoop (if required).
>
>
> m.
>
>
>


[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-19 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9842:
---

 Summary: Port YARN-9608 DecommissioningNodesWatcher should get 
lists of running applications on node from RMNode to branch-3.0/branch-2
 Key: YARN-9842
 URL: https://issues.apache.org/jira/browse/YARN-9842
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

2019-09-19 Thread Abhishek Modi
Hi Rohith,

Thanks for driving this release.

+1 (binding)

- built from the source on a Windows machine.
- created a pseudo cluster.
- ran PI job.
- checked basic metrics with ATSv2 enabled.

On Thu, Sep 19, 2019 at 12:30 PM Sunil Govindan  wrote:

> Hi Rohith
>
> Thanks for putting this together, appreciate the same.
>
> +1 (binding)
>
> - verified signature
> - brought up a cluster from the tar ball
> - Ran some basic MR jobs
> - RM UI seems fine (old and new)
>
>
> Thanks
> Sunil
>
> On Wed, Sep 11, 2019 at 12:56 PM Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
> > Hi folks,
> >
> > I have put together a release candidate (RC0) for Apache Hadoop 3.2.1.
> >
> > The RC is available at:
> > http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/
> >
> > The RC tag in git is release-3.2.1-RC0:
> > https://github.com/apache/hadoop/tree/release-3.2.1-RC0
> >
> >
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1226/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> >
> > This vote will run for 7 days (5 weekdays), ending on 18th Sept at 11:59 
> > pm PST.
> >
> > I have done testing with a pseudo cluster and distributed shell job. My
> +1
> > to start.
> >
> > Thanks & Regards
> > Rohith Sharma K S
> >
>


-- 
Regards,
Abhishek Modi


Re: [VOTE] Release Hadoop-3.1.3-RC0

2019-09-19 Thread Sunil Govindan
+1 (binding)

Thanks Zhankun for putting up the release. Thanks for leading this.

- verified signature
- ran a local cluster from tar ball
- ran some MR jobs
- performed CLI ops, and all looks good
- UI seems fine

Thanks
Sunil

On Thu, Sep 12, 2019 at 1:34 PM Zhankun Tang  wrote:

> Hi folks,
>
> Thanks to everyone's help on this release. Special thanks to Rohith,
> Wei-Chiu, Akira, Sunil, Wangda!
>
> I have created a release candidate (RC0) for Apache Hadoop 3.1.3.
>
> The RC release artifacts are available at:
> http://home.apache.org/~ztang/hadoop-3.1.3-RC0/
>
> The maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1228/
>
> The RC tag in git is here:
> https://github.com/apache/hadoop/tree/release-3.1.3-RC0
>
> And my public key is at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> *This vote will run for 7 days, ending on Sept. 19th at 11:59 pm PST.*
>
> For the testing, I have run several Spark and distributed shell jobs in my
> pseudo cluster.
>
> My +1 (non-binding) to start.
>
> BR,
> Zhankun
>
> On Wed, 4 Sep 2019 at 15:56, zhankun tang  wrote:
>
> > Hi all,
> >
> > Thanks for everyone helping in resolving all the blockers targeting
> Hadoop
> > 3.1.3[1]. We've cleaned all the blockers and moved out non-blockers
> issues
> > to 3.1.4.
> >
> > I'll cut the branch today and call a release vote soon. Thanks!
> >
> >
> > [1]. https://s.apache.org/5hj5i
> >
> > BR,
> > Zhankun
> >
> >
> > On Wed, 21 Aug 2019 at 12:38, Zhankun Tang  wrote:
> >
> >> Hi folks,
> >>
> >> We have Apache Hadoop 3.1.2 released on Feb 2019.
> >>
> >> It's been more than 6 months since then, and there are
> >>
> >> 246 fixes [1], plus 2 blocker and 4 critical issues [2].
> >>
> >> (As Wei-Chiu Chuang mentioned, HDFS-13596 will be another blocker)
> >>
> >>
> >> I propose my plan to do a maintenance release of 3.1.3 in the next few
> >> (one or two) weeks.
> >>
> >> Hadoop 3.1.3 release plan:
> >>
> >> Code Freezing Date: *25th August 2019 PDT*
> >>
> >> Release Date: *31st August 2019 PDT*
> >>
> >>
> >> Please feel free to share your insights on this. Thanks!
> >>
> >>
> >> [1] https://s.apache.org/zw8l5
> >>
> >> [2] https://s.apache.org/fjol5
> >>
> >>
> >> BR,
> >>
> >> Zhankun
> >>
> >
>


[jira] [Created] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-19 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9841:
--

 Summary: Capacity scheduler: add support for combined %user + 
%primary_group mapping
 Key: YARN-9841
 URL: https://issues.apache.org/jira/browse/YARN-9841
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Right now in CS, using {{%primary_group}} with a parent queue is only possible 
this way:

{{u:%user:parentqueue.%primary_group}}

Looking at 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
 we cannot do something like:

{{u:%user:%primary_group.%user}}

Fair Scheduler supports a nested rule where such a placement/mapping rule is 
possible. This improvement would reduce this feature gap.
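
To make the gap concrete, here is a small demo (invented names, not scheduler 
code) of what the proposed nested rule {{u:%user:%primary_group.%user}} would 
resolve to:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Invented demo, not CS internals: u:%user:%primary_group.%user places
// each user under a parent queue named after their primary group.
public class NestedMappingDemo {

  public static void main(String[] args) {
    Map<String, String> primaryGroup = new HashMap<>();
    primaryGroup.put("alice", "engineers");
    primaryGroup.put("bob", "analysts");

    for (Map.Entry<String, String> e : primaryGroup.entrySet()) {
      // %primary_group becomes the parent queue, %user the leaf queue.
      String queue = "root." + e.getValue() + "." + e.getKey();
      System.out.println(e.getKey() + " -> " + queue);
      // alice -> root.engineers.alice, bob -> root.analysts.bob
    }
  }
}
{code}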






[jira] [Created] (YARN-9840) Capacity scheduler: add support for Secondary Group user mapping

2019-09-19 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-9840:
--

 Summary: Capacity scheduler: add support for Secondary Group user 
mapping
 Key: YARN-9840
 URL: https://issues.apache.org/jira/browse/YARN-9840
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Currently, Capacity Scheduler only supports primary group rule mapping like 
this:

{{u:%user:%primary_group}}

Fair Scheduler already supports a secondary group placement rule. Let's add 
this to CS to reduce the feature gap.

Class of interest: 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

2019-09-19 Thread Sunil Govindan
Hi Rohith

Thanks for putting this together, appreciate the same.

+1 (binding)

- verified signature
- brought up a cluster from the tar ball
- Ran some basic MR jobs
- RM UI seems fine (old and new)


Thanks
Sunil

On Wed, Sep 11, 2019 at 12:56 PM Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> Hi folks,
>
> I have put together a release candidate (RC0) for Apache Hadoop 3.2.1.
>
> The RC is available at:
> http://home.apache.org/~rohithsharmaks/hadoop-3.2.1-RC0/
>
> The RC tag in git is release-3.2.1-RC0:
> https://github.com/apache/hadoop/tree/release-3.2.1-RC0
>
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1226/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> This vote will run for 7 days (5 weekdays), ending on 18th Sept at 11:59 pm
> PST.
>
> I have done testing with a pseudo cluster and distributed shell job. My +1
> to start.
>
> Thanks & Regards
> Rohith Sharma K S
>


Re: [VOTE] Release Apache Hadoop 3.2.1 - RC0

2019-09-19 Thread Rohith Sharma K S
Thanks Brahma for voting and bringing this to my attention!

On Thu, 19 Sep 2019 at 11:28, Brahma Reddy Battula 
wrote:

> Rohith, thanks for driving the release.
>
> +1 (Binding).
>
> --Built from the source
> --Installed pseudo cluster
> --Verified Basic hdfs shell command
> --Ran Pi jobs
> --Browsed the UI
>
>
> *Rolling Upgrade:*
> The following issue could have been merged. Without it, tokens need to be
> disabled until the rolling upgrade is finalized (since one of the main
> rolling upgrade issues, HDFS-13596, has already been merged).
> https://issues.apache.org/jira/browse/HDFS-14509
>
This issue is marked as a blocker for 2.10 and is still open! Can any HDFS
folks confirm whether it is a blocker for the *hadoop-3.2.1* release?

-Rohith Sharma K S