Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Konstantinos Karanasos
+1 from me too.

Did the following:
1) set up a 9-node cluster;
2) ran some Gridmix jobs;
3) ran (2) after enabling opportunistic containers (used a mix of
guaranteed and opportunistic containers for each job);
4) ran (3) but this time enabling distributed scheduling of opportunistic
containers.

All the above worked with no issues.

Thanks for all the effort, guys!

Konstantinos

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
>
> > Sunil / Rohith,
> >
> > Could you check if your configs are same as Jonathan posted configs?
> > https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >
> > And could you try if using Jonathan's configs can still reproduce the
> > issue?
> >
> > Thanks,
> > Wangda
> >
> >
> > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
> >
> > > Thanks for testing Rohith and Sunil
> > >
> > > Can you please confirm if it is not a config issue at your end ?
> > > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > > (both automatic and manual) and we are not able to reproduce this. I've
> > > updated the YARN-7453 JIRA with details of testing.
> > >
> > > Cheers
> > > -Arun/Subru
> > >
> > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > rohithsharm...@apache.org
> > > > wrote:
> > >
> > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453 JIRA to
> > > > track this issue.
> > > >
> > > > - Rohith Sharma K S
> > > >
> > > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > > >
> > > >> Hi Subru and Arun.
> > > >>
> > > >> Thanks for driving 2.9 release. Great work!
> > > >>
> > > >> I installed cluster built from source.
> > > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > > >> - Accessed new UI and it also seems fine.
> > > >>
> > > >> However I am also getting same issue as Rohith reported.
> > > >> - Started an HA cluster
> > > >> - Pushed RM to standby
> > > >> - Pushed back RM to active then seeing an exception.
> > > >>
> > > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > >>
> > > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > >>
> > > >> Will check and post more details,
> > > >>
> > > >> - Sunil
> > > >>
> > > >>
> > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > >> rohithsharm...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Thanks Subru/Arun for the great work!
> > > >> >
> > > > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > > > >> > along with new YARN UI and ATSv2.
> > > > >> >
> > > > >> > I am facing basic RM HA switch issue after first time successful start.
> > > > >> > *Is anyone else facing this issue?*
> > > > >> >
> > > > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> > > > >> > active successfully. Exception trace I see from the log is
> > > > >> >
> > > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > > >> >

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Jonathan Hung
Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Set up RM HA
2) Verified leveldb/zookeeper scheduler configuration API works via REST/CLI
3) Verified configuration changes persist across restart
4) Verified yarn rmadmin -refreshQueues works when the scheduler configuration
API is disabled (and vice versa)


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger  wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are same as Jonathan posted configs?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you try if using Jonathan's configs can still reproduce the
>> issue?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm if it is not a config issue at your end ?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 JIRA with details of testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>> > rohithsharm...@apache.org
>> > > wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453 JIRA to
>> > > track this issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G  wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> > >> rohithsharm...@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
>> > >> > along with new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing basic RM HA switch issue after first time successful start.
>> > >> > *Is anyone else facing this issue?*
>> > >> >
>> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>> > >> > active successfully. Exception trace I see from the log is
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> > 

Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-11-07 Thread Apache Jenkins Server
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/33/

[Nov 6, 2017 5:39:41 PM] (bibinchundatt) Add containerId to Localizer failed logs. Contributed by Prabhu Joseph




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:16 
   bkjournal:8 
   hadoop-mapreduce-client-jobclient:14 
   hadoop-archives:1 
   hadoop-distcp:3 
   hadoop-extras:1 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-yarn-client:4 
   hadoop-yarn-server-timeline-pluginstorage:1 
   hadoop-yarn-server-timelineservice:1 

Failed junit tests :

   hadoop.hdfs.TestFSOutputSummer 
   hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd 
   hadoop.hdfs.TestAbandonBlock 
   hadoop.mapred.TestMROpportunisticMaps 
   hadoop.mapred.TestJobCounters 
   hadoop.mapred.TestJobCleanup 
   hadoop.mapred.TestNetworkedJob 
   hadoop.mapred.TestMiniMRClientCluster 
   hadoop.mapred.TestClusterMapReduceTestCase 
   hadoop.tools.TestDistCpViewFs 
   hadoop.tools.TestIntegration 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.yarn.sls.appmaster.TestAMSimulator
   hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
   TEST-cetest 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestDFSStartupVersions 
   org.apache.hadoop.hdfs.TestHdfsAdmin 
   org.apache.hadoop.fs.TestEnhancedByteBufferAccess 
   org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart 
   org.apache.hadoop.hdfs.server.namenode.TestQuotaByStorageType 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal 
   org.apache.hadoop.fs.viewfs.TestViewFileSystemWithTruncate 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderFactory 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocalLegacy 
   org.apache.hadoop.hdfs.TestFileConcurrentReader 
   org.apache.hadoop.hdfs.server.namenode.TestAddBlock 
   org.apache.hadoop.hdfs.server.namenode.TestEditLogAutoroll 
   org.apache.hadoop.fs.permission.TestStickyBit 
   org.apache.hadoop.hdfs.TestGetFileChecksum 
   org.apache.hadoop.cli.TestHDFSCLI 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperEditLogStreams 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.contrib.bkjournal.TestCurrentInprogress 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperConfiguration 
   org.apache.hadoop.mapred.TestClusterMRNotification 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   org.apache.hadoop.mapred.TestMiniMRClasspath 
   org.apache.hadoop.mapred.TestMRCJCFileInputFormat 
   org.apache.hadoop.mapred.TestMRIntermediateDataEncryption 
   org.apache.hadoop.mapred.TestJobSysDirWithDFS 
   org.apache.hadoop.mapred.TestMRTimelineEventHandling 
   org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath 
   org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers 
   org.apache.hadoop.mapred.TestReduceFetchFromPartialMem 
   org.apache.hadoop.mapred.TestReduceFetch 
   org.apache.hadoop.mapred.TestMerge 
   org.apache.hadoop.tools.TestHadoopArchives 
   org.apache.hadoop.tools.TestDistCpSync 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromTarget 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromSource 
   org.apache.hadoop.tools.TestCopyFiles
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClient 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
   org.apache.hadoop.yarn.server.timeline.TestLogInfo
   org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices

   cc:

   https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/33/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

   

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Eric Badger
+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs

Thanks,

Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:

> Sunil / Rohith,
>
> Could you check if your configs are same as Jonathan posted configs?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>
> And could you try if using Jonathan's configs can still reproduce the
> issue?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
>
> > Thanks for testing Rohith and Sunil
> >
> > Can you please confirm if it is not a config issue at your end ?
> > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > (both automatic and manual) and we are not able to reproduce this. I've
> > updated the YARN-7453 JIRA with details of testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > rohithsharm...@apache.org
> > > wrote:
> >
> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453 JIRA to
> > > track this issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > >
> > >> Hi Subru and Arun.
> > >>
> > >> Thanks for driving 2.9 release. Great work!
> > >>
> > >> I installed cluster built from source.
> > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > >> - Accessed new UI and it also seems fine.
> > >>
> > >> However I am also getting same issue as Rohith reported.
> > >> - Started an HA cluster
> > >> - Pushed RM to standby
> > >> - Pushed back RM to active then seeing an exception.
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > >> rohithsharm...@apache.org>
> > >> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > >> > along with new YARN UI and ATSv2.
> > >> >
> > >> > I am facing basic RM HA switch issue after first time successful start.
> > >> > *Is anyone else facing this issue?*
> > >> >
> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> > >> > active successfully. Exception trace I see from the log is
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > >> >     ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> > at
> > >> >
> > 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Wangda Tan
Sunil / Rohith,

Could you check whether your configs are the same as the configs Jonathan posted?
https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693

And could you check whether the issue still reproduces when you use
Jonathan's configs?

Thanks,
Wangda


On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:

> Thanks for testing Rohith and Sunil
>
> Can you please confirm if it is not a config issue at your end ?
> We (both Jonathan and myself) just tried testing this on a fresh cluster
> (both automatic and manual) and we are not able to reproduce this. I've
> updated the YARN-7453 JIRA with details of testing.
>
> Cheers
> -Arun/Subru
>
> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> rohithsharm...@apache.org
> > wrote:
>
> > Thanks Sunil for confirmation. Btw, I have raised YARN-7453 JIRA to
> > track this issue.
> >
> > - Rohith Sharma K S
> >
> > On 7 November 2017 at 16:44, Sunil G  wrote:
> >
> >> Hi Subru and Arun.
> >>
> >> Thanks for driving 2.9 release. Great work!
> >>
> >> I installed cluster built from source.
> >> - Ran few MR jobs with application priority enabled. Runs fine.
> >> - Accessed new UI and it also seems fine.
> >>
> >> However I am also getting same issue as Rohith reported.
> >> - Started an HA cluster
> >> - Pushed RM to standby
> >> - Pushed back RM to active then seeing an exception.
> >>
> >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>
> >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>
> >> Will check and post more details,
> >>
> >> - Sunil
> >>
> >>
> >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> >> rohithsharm...@apache.org>
> >> wrote:
> >>
> >> > Thanks Subru/Arun for the great work!
> >> >
> >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> >> > along with new YARN UI and ATSv2.
> >> >
> >> > I am facing basic RM HA switch issue after first time successful start.
> >> > *Is anyone else facing this issue?*
> >> >
> >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> >> > active successfully. Exception trace I see from the log is
> >> >
> >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >> >     ... 4 more
> >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >> > at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Arun Suresh
Thanks for testing, Rohith and Sunil.

Can you please confirm that it is not a config issue at your end?
We (both Jonathan and I) just tried testing this on a fresh cluster
(both automatic and manual) and we are not able to reproduce this. I've
updated the YARN-7453 JIRA with details of testing.

Cheers
-Arun/Subru

On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S  wrote:

> Thanks Sunil for confirmation. Btw, I have raised YARN-7453 JIRA to
> track this issue.
>
> - Rohith Sharma K S
>
> On 7 November 2017 at 16:44, Sunil G  wrote:
>
>> Hi Subru and Arun.
>>
>> Thanks for driving 2.9 release. Great work!
>>
>> I installed cluster built from source.
>> - Ran few MR jobs with application priority enabled. Runs fine.
>> - Accessed new UI and it also seems fine.
>>
>> However I am also getting same issue as Rohith reported.
>> - Started an HA cluster
>> - Pushed RM to standby
>> - Pushed back RM to active then seeing an exception.
>>
>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>
>> Will check and post more details,
>>
>> - Sunil
>>
>>
>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> rohithsharm...@apache.org>
>> wrote:
>>
>> > Thanks Subru/Arun for the great work!
>> >
>> > Downloaded source and built from it. Deployed RM HA non-secured cluster
>> > along with new YARN UI and ATSv2.
>> >
>> > I am facing basic RM HA switch issue after first time successful start.
>> > *Is anyone else facing this issue?*
>> >
>> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>> > active successfully. Exception trace I see from the log is
>> >
>> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> >     ... 4 more
>> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> >     ... 5 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > at
>> > 

[jira] [Created] (HADOOP-15023) ValueQueue should also validate (lowWatermark * numValues) > 0 on construction

2017-11-07 Thread Xiao Chen (JIRA)
Xiao Chen created HADOOP-15023:
--

 Summary: ValueQueue should also validate (lowWatermark * numValues) > 0 on construction
 Key: HADOOP-15023
 URL: https://issues.apache.org/jira/browse/HADOOP-15023
 Project: Hadoop Common
 Issue Type: Improvement
 Reporter: Xiao Chen
 Priority: Minor


ValueQueue has precondition checks for each item independently, but does not
check {{(int)(lowWatermark * numValues) > 0}}. If the product is less than 1,
casting to int truncates it to 0, causing problems later when filling /
getting from the queue.

[code|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java#L224]
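
As a rough illustration of the missing check (a sketch only, not the actual ValueQueue code), the snippet below shows how two arguments that each pass their own check can still produce a zero threshold after the int cast, and how a combined precondition would reject them. The class name WatermarkCheck, the method lowWatermarkCount, and the exact per-argument ranges are assumptions made for this example.

```java
// Illustrative sketch only: WatermarkCheck and lowWatermarkCount are
// hypothetical names and are not part of Hadoop's ValueQueue.
final class WatermarkCheck {

    // Validates each argument on its own, then validates the derived threshold.
    static int lowWatermarkCount(int numValues, float lowWatermark) {
        // Assumed per-argument ranges, chosen for the example.
        if (numValues <= 0) {
            throw new IllegalArgumentException("numValues must be > 0");
        }
        if (lowWatermark <= 0f || lowWatermark > 1f) {
            throw new IllegalArgumentException("lowWatermark must be in (0, 1]");
        }
        // The cast truncates toward zero, so any product below 1 becomes 0.
        int threshold = (int) (lowWatermark * numValues);
        if (threshold <= 0) {
            throw new IllegalArgumentException(
                "(int)(lowWatermark * numValues) must be > 0");
        }
        return threshold;
    }

    public static void main(String[] args) {
        // Both arguments are individually valid and the product is large enough.
        System.out.println(lowWatermarkCount(10, 0.5f));
        try {
            // Individually valid, but the truncated product is 0, so it is rejected.
            lowWatermarkCount(1, 0.5f);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor, as sketched here, would surface the bad combination at configuration time instead of later during queue fills.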






Re: [DISCUSS] A final minor release off branch-2?

2017-11-07 Thread Sean Mackrory
>> You mentioned rolling-upgrades: It will be good to exactly outline the
>> type of testing. For e.g., the rolling-upgrades orchestration order has
>> direct implication on the testing done.

Complete details are available in HDFS-11096, where I'm trying to get
scripts that automate these tests committed so we can run them on Jenkins.
For HDFS, I follow the same order as the documentation. I did not see any
documentation indicating when to upgrade the zkfc daemons, so that is done at
the end. I also did not see any documentation about a rolling upgrade for YARN,
so I'm doing ResourceManagers first, then NodeManagers, basically following
the pattern used in HDFS.

I can't speak much about app compatibility in YARN, etc., but the rolling
upgrade runs Terasuite from Hadoop 2 continually while doing the upgrade
and for some time afterward. One incompatibility was found and fixed in trunk
quite a while ago; that part of the test has been working well for quite a
while now.

>> Copying data between 2.x clusters and 3.x clusters: Does this work
>> already? Is it broken anywhere that we cannot fix? Do we need bridging
>> features for this work?

HDFS-11096 also includes tests that data can be distcp'd over
webhdfs:// to and from old and new clusters regardless of where the distcp
job is launched from. I'll try a test run that uses hdfs:// this week, too.

As part of that JIRA I also looked through all the protobufs for any
discrepancies / incompatibilities. One was found and fixed, but the rest
looked good to me.



On Mon, Nov 6, 2017 at 6:42 PM, Vinod Kumar Vavilapalli wrote:

> The main goal of the bridging release is to ease transition on stuff that
> is guaranteed to be broken.
>
> Off the top of my head, one of the biggest areas is application
> compatibility. When folks move from 2.x to 3.x, are their apps binary
> compatible? Source compatible? Or need changes?
>
> In 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps be
> source compatible. This means relooking at the API compatibility in 3.x and
> their impact of migrating applications. We will have to revisit and
> un-deprecate old APIs, un-delete old APIs and write documentation on how
> apps can be migrated.
>
> Most of this work will be in 3.x line. The bridging release on the other
> hand will have deprecation for APIs that cannot be undeleted. This may
> already have been done in many places. But we need to make sure and fill
> gaps if any.
>
> Other areas that I can recall from the old days
>  - Config migration: Many configs are deprecated or deleted. We need
> documentation to help folks to move. We also need deprecations in the
> bridging release for configs that cannot be undeleted.
>  - You mentioned rolling-upgrades: It will be good to exactly outline the
> type of testing. For e.g., the rolling-upgrades orchestration order has
> direct implication on the testing done.
>  - Story for downgrades?
>  - Copying data between 2.x clusters and 3.x clusters: Does this work
> already? Is it broken anywhere that we cannot fix? Do we need bridging
> features for this work?
>
> +Vinod
>
> > On Nov 6, 2017, at 12:49 PM, Andrew Wang 
> wrote:
> >
> > What are the known gaps that need bridging between 2.x and 3.x?
> >
> > From an HDFS perspective, we've tested wire compat, rolling upgrade, and
> > rollback.
> >
> > From a YARN perspective, we've tested wire compat and rolling upgrade.
> > Arun just mentioned an NM rollback issue that I'm not familiar with.
> >
> > Anything else? External to this discussion, these should be documented as
> > known issues for 3.0.
> >
> > Best.
> > Andrew
> >
> > On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh  wrote:
> >
> >> Thanks for starting this discussion Vinod.
> >>
> >> I agree (C) is a bad idea.
> >> I would prefer (A) given that ATM, branch-2 is still very close to
> >> branch-2.9 - and it is a good time to make a collective decision to lock
> >> down commits to branch-2.
> >>
> >> I think we should also clearly define what the 'bridging' release should
> >> be.
> >> I assume it means the following:
> >> * Any 2.x user wanting to move to 3.x must first upgrade to the bridging
> >> release and then upgrade to the 3.x release.
> >> * With regard to state store upgrades (at least NM state stores) the
> >> bridging state stores should be aware of all new 3.x keys, so the implicit
> >> assumption would be that a user can only roll back from the 3.x release to
> >> the bridging release and not to the old 2.x release.
> >> * Use the opportunity to clean up deprecated API ?
> >> * Do we even want to consider a separate bridging release for 2.7, 2.8 and
> >> 2.9 lines ?
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <
> >> vino...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC out
> >>> (tx Arun / Subru!) and 2.8.2 (tx 

Re: [DISCUSS] A final minor release off branch-2?

2017-11-07 Thread Vinod Kumar Vavilapalli
Thanks for your comments, Zheng. Replies inline.


> On the other hand, I've discussed with quite a few 3.0 potential users, it 
> looks like most of them are interested in the erasure coding feature and a 
> major scenario for that is to back up their large volume of data to save 
> storage cost. They might run analytics workload using Hive, Spark, Impala and 
> Kylin on the new cluster based on the version, but it's not a must at the 
> first time. They understand there might be some gaps so they'd migrate their 
> workloads incrementally. For the major analytics workloads, we've performed
> lots of benchmark and integration tests, as have others I believe; we did
> find some issues, but they should be fixed in downstream projects. I think
> the GA release will accelerate progress and expose the issues, if any. We
> can't wait for it to fully mature. Nothing is perfect.


3.0 is a GA release from the Apache Hadoop community. So, we cannot assume that 
all usages in the short term are *only* going to be for storage optimization 
features and only on dedicated clusters. We have to make sure that the 
workloads can be migrated right now and/or that existing clusters can be 
upgraded in-place. If not, we shouldn't be calling it GA.


> This sounds like a good consideration. I'm thinking: if I'm a Hadoop user
> on, for example, 2.7.4 or 2.8.2 or whatever 2.x version, would I first
> upgrade to this bridging release and then use the bridge support to upgrade
> to a 3.x version? I'm not sure. On the other hand, I might tend to look for
> some guides or support in the 3.x docs about how to upgrade from 2.7 to 3.x.



Arun Suresh also asked this same question earlier. I think this will really 
depend on what we discover as part of the migration and user-acceptance 
testing. If we don't find major issues, you are right, folks can jump directly 
from one of 2.7, 2.8 or 2.9 to 3.0.



> Frankly speaking, working on a bridging release that doesn't target any
> feature isn't so attractive to me as a contributor. Overall, a final minor
> release off branch-2 is good; we should also give 3.x more time to evolve and
> mature, so it looks to me like we would have to work on two release lines in
> parallel for some time. I'd like option C), and suggest we focus on the
> recent releases.



Answering this question is also one of the goals of my starting this thread.
Collectively we need to conclude if we are okay or not okay with no longer
putting any new feature work in general on the 2.x line after the 2.9.0
release and moving our focus over to 3.0.


Thanks
+Vinod

> -----Original Message-----
> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
> Sent: Tuesday, November 07, 2017 9:43 AM
> To: Andrew Wang 
> Cc: Arun Suresh ; common-dev@hadoop.apache.org; 
> yarn-...@hadoop.apache.org; Hdfs-dev ; 
> mapreduce-...@hadoop.apache.org
> Subject: Re: [DISCUSS] A final minor release off branch-2?
> 
> The main goal of the bridging release is to ease transition on stuff that is 
> guaranteed to be broken.
> 
> Off the top of my head, one of the biggest areas is application compatibility.
> When folks move from 2.x to 3.x, are their apps binary compatible? Source
> compatible? Or need changes?
>
> In 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps be
> source compatible. This means relooking at the API compatibility in 3.x and
> its impact on migrating applications. We will have to revisit and
> un-deprecate old APIs, un-delete old APIs and write documentation on how apps
> can be migrated.
>
> Most of this work will be in the 3.x line. The bridging release on the other
> hand will have deprecation for APIs that cannot be undeleted. This may already
> have been done in many places. But we need to make sure and fill gaps if any.
> 
> Other areas that I can recall from the old days
> - Config migration: Many configs are deprecated or deleted. We need 
> documentation to help folks to move. We also need deprecations in the 
> bridging release for configs that cannot be undeleted.
> - You mentioned rolling-upgrades: It will be good to exactly outline the type 
> of testing. For e.g., the rolling-upgrades orchestration order has direct 
> implication on the testing done.
> - Story for downgrades?
> - Copying data between 2.x clusters and 3.x clusters: Does this work already? 
> Is it broken anywhere that we cannot fix? Do we need bridging features for 
> this work?
> 
> +Vinod
> 
>> On Nov 6, 2017, at 12:49 PM, Andrew Wang  wrote:
>> 
>> What are the known gaps that need bridging between 2.x and 3.x?
>> 
>> From an HDFS perspective, we've tested wire compat, rolling upgrade, 
>> and rollback.
>> 
>> From a YARN perspective, we've tested wire compat and rolling upgrade. 
>> Arun just mentioned an NM rollback issue that I'm not familiar with.
>> 
>> Anything else? External to 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-11-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/583/

[Nov 6, 2017 5:09:10 PM] (bibinchundatt) Add containerId to Localizer failed 
logs. Contributed by Prabhu Joseph
[Nov 6, 2017 9:28:31 PM] (jianhe) YARN-5461. Initial code ported from 
slider-core module. (jianhe)
[Nov 6, 2017 9:28:32 PM] (jianhe) Rename org.apache.slider.core.build to 
org.apache.slider.core.buildutils
[Nov 6, 2017 9:28:32 PM] (jianhe) Modify pom file for slider
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5513. Move Java only tests from slider 
develop to
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5538. Apply SLIDER-875 to 
yarn-native-services. Contributed by
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5505. Create an agent-less docker 
provider in the native-services
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5623. Apply SLIDER-1166 to 
yarn-native-services branch. Contributed
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5610. Initial code for native services 
REST API. Contributed by
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5675. Swagger definition for YARN 
service API. Contributed by Gour
[Nov 6, 2017 9:28:33 PM] (jianhe) Addendum patch for YARN-5610. Contributed by 
Gour Saha
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5689. Update native services REST API to 
use agentless docker
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5729. Bug fixes for the service Rest 
API. Contributed by Gour Saha
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5735. Make the service REST API use the 
app timeout feature
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5701. Fix issues in yarn native services 
apps-of-apps. Contributed
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5778. Add .keep file for yarn native 
services AM web app.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5775. Convert enums in swagger 
definition to uppercase. Contributed
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5680. Add 2 new fields in Slider status 
output - image-name and
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5770. Performance improvement of 
native-services REST API service.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5690. Integrate native services modules 
into maven build.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5796. Convert enums values in service 
code to upper case and
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5813. Slider should not try to set a 
negative lifetime timeout
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5812. Exception during GET call - 
"Failed to retrieve application:
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5828. Native services client errors out 
when config formats are
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5808. Add gc log options to the yarn 
daemon script when starting
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5909. Remove agent related code in 
slider AM. Contributed by Jian
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5883 Avoid or eliminate expensive YARN 
get all applications call.
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5943. Write native services container 
stderr file to log directory.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5941. Slider handles "per.component" for 
multiple components
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5769. Integrate update app lifetime 
using feature implemented in
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5944. Native services AM should remain 
up if RM is down.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5958. Fix ASF license warnings for 
slider core module. Contributed
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5961. Generate native services protobuf 
classes during build.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5975. Remove the agent - slider AM ssl 
related code. Contributed by
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5740. Add a new field in Slider status 
output - lifetime
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5996. Native services AM kills app on 
AMRMClientAsync onError call.
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5967. Fix slider core module findbugs 
warnings. Contributed by Jian
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5968. Fix slider core module javadocs. 
Contributed by Billie
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-6010. Fix findbugs, site warnings in 
yarn-services-api module.
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-6014. Followup fix for slider core 
module findbugs. Contributed by
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-5218. Initial core change for DNS for 
YARN. Contributed by Jonathan
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-4757. Add the ability to split reverse 
zone subnets. Contributed by
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-5993. Allow native services quicklinks 
to be exported for each
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6115. Few additional paths in Slider 
client still uses get all
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6132. SliderClient bondToCluster should 
call findInstance with live
[Nov 6, 2017 9:28:40 PM] (jianhe) Updated pom to point to 3.0.0-alpha3-SNAPSHOT
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6173. Add artifact info and privileged 
container details to the
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6186 Handle 
InvalidResourceRequestException in 

[jira] [Created] (HADOOP-15022) s3guard IT tests increase R/W capacity of the test table by 1

2017-11-07 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15022:
---

 Summary: s3guard IT tests increase R/W capacity of the test table 
by 1
 Key: HADOOP-15022
 URL: https://issues.apache.org/jira/browse/HADOOP-15022
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.0.0
Reporter: Steve Loughran
Priority: Minor


Just noticed, while playing with the CLI, that my allocated IOPs was 153; reset
it to 10 R & 10 W; after a few of the IT test runs it is now 13 each

assumption: every test run of the S3Guard CLI is increasing the allocated IO



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15021) Excluding private and limitiedprivate from javadoc causes broken links

2017-11-07 Thread Andras Bokor (JIRA)
Andras Bokor created HADOOP-15021:
-

 Summary: Excluding private and limitiedprivate from javadoc causes 
broken links
 Key: HADOOP-15021
 URL: https://issues.apache.org/jira/browse/HADOOP-15021
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Andras Bokor
Priority: Minor


Examples:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataInputStream.html
Check "All Implemented Interfaces" section

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/TaskAttemptContext.html
Same section

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Cluster.html#renewDelegationToken-org.apache.hadoop.security.token.Token-
Method parameters

I am not sure about the correct solution. Waiting for ideas or something.






Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Rohith Sharma K S
Thanks Sunil for confirmation. Btw, I have raised the YARN-7453
JIRA to track this issue.

- Rohith Sharma K S

On 7 November 2017 at 16:44, Sunil G  wrote:

> Hi Subru and Arun.
>
> Thanks for driving 2.9 release. Great work!
>
> I installed a cluster built from source.
> - Ran a few MR jobs with application priority enabled. Runs fine.
> - Accessed the new UI and it also seems fine.
>
> However, I am also getting the same issue that Rohith reported.
> - Started an HA cluster
> - Pushed RM to standby
> - Pushed RM back to active, then saw an exception.
>
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>
> Will check and post more details,
>
> - Sunil
>
>
> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> rohithsharm...@apache.org>
> wrote:
>
> > Thanks Subru/Arun for the great work!
> >
> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > along with new YARN UI and ATSv2.
> >
> > I am facing a basic RM HA switch issue after the first successful start.
> > *Is anyone else facing this issue?*
> >
> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> > active successfully. The exception trace I see in the log is
> >
> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >     ... 4 more
> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >     ... 5 more
> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > at
> >
> > 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Sunil G
Hi Subru and Arun.

Thanks for driving 2.9 release. Great work!

I installed a cluster built from source.
- Ran a few MR jobs with application priority enabled. Runs fine.
- Accessed the new UI and it also seems fine.

However, I am also getting the same issue that Rohith reported.
- Started an HA cluster
- Pushed RM to standby
- Pushed RM back to active, then saw an exception.

org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)

Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
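
For anyone scanning RM logs for the same failure, a small sketch that pulls
the innermost "Caused by" entry out of a Java stack trace. The function name
and the sample text are illustrative, and it assumes each "Caused by" entry
sits on a single line (unlike the mail-wrapped copies in this thread):

```python
from typing import Optional, Tuple


def root_cause(trace: str) -> Optional[Tuple[str, str]]:
    """Return (exception class, message) of the last 'Caused by:' line, if any."""
    cause = None
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith("Caused by:"):
            # Later entries are deeper causes, so keep overwriting.
            cause = line[len("Caused by:"):].strip()
    if cause is None:
        return None
    exc_class, _, message = cause.partition(":")
    return exc_class.strip(), message.strip()


SAMPLE = """\
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
"""

print(root_cause(SAMPLE))
```

Here the innermost cause is the ZooKeeper NoAuth error, which ZooKeeper
returns when the client's credentials do not satisfy the ACL on a znode.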

Will check and post more details,

- Sunil


On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S 
wrote:

> Thanks Subru/Arun for the great work!
>
> Downloaded source and built from it. Deployed RM HA non-secured cluster
> along with new YARN UI and ATSv2.
>
> I am facing a basic RM HA switch issue after the first successful start.
> *Is anyone else facing this issue?*
>
> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
> active successfully. The exception trace I see in the log is
>
> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> at
>
>