Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Konstantinos Karanasos
+1 from me too.

Did the following:
1) set up a 9-node cluster;
2) ran some Gridmix jobs;
3) ran (2) after enabling opportunistic containers (used a mix of
guaranteed and opportunistic containers for each job);
4) ran (3) but this time enabling distributed scheduling of opportunistic
containers.

All the above worked with no issues.
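For anyone wanting to repeat steps 3) and 4), the relevant yarn-site.xml switches look roughly like the fragment below. The property names are my recollection of the 2.9 opportunistic-container work, not something stated in this thread, so verify them against the release documentation:

```xml
<!-- Sketch only: verify property names against the 2.9 documentation. -->
<property>
  <!-- Step 3: let the RM allocate OPPORTUNISTIC containers alongside GUARANTEED ones -->
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Cap on opportunistic containers queued at each NodeManager -->
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>10</value>
</property>
<property>
  <!-- Step 4: allocate opportunistic containers via per-node distributed scheduling -->
  <name>yarn.nodemanager.distributed-scheduling.enabled</name>
  <value>true</value>
</property>
```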

Thanks for all the effort guys!

Konstantinos

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger 
wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
>
> > Sunil / Rohith,
> >
> > Could you check if your configs are the same as the configs Jonathan posted?
> > https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >
> > And could you check whether you can still reproduce the issue using Jonathan's configs?
> >
> > Thanks,
> > Wangda
> >
> >
> > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
> >
> > > Thanks for testing, Rohith and Sunil.
> > >
> > > Can you please confirm that it is not a config issue at your end?
> > > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > > (both automatic and manual) and we are not able to reproduce it. I've
> > > updated the YARN-7453 JIRA with details of the testing.
> > >
> > > Cheers
> > > -Arun/Subru
> > >
> > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > rohithsharm...@apache.org
> > > > wrote:
> > >
> > > > Thanks Sunil for the confirmation. Btw, I have raised the YARN-7453
> > > > JIRA to track this issue.
> > > >
> > > > - Rohith Sharma K S
> > > >
> > > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > > >
> > > >> Hi Subru and Arun.
> > > >>
> > > >> Thanks for driving the 2.9 release. Great work!
> > > >>
> > > >> I installed a cluster built from source.
> > > >> - Ran a few MR jobs with application priority enabled. Runs fine.
> > > >> - Accessed the new UI, and it also seems fine.
> > > >>
> > > >> However, I am also getting the same issue Rohith reported:
> > > >> - Started an HA cluster
> > > >> - Pushed the RM to standby
> > > >> - Pushed the RM back to active, then saw an exception:
> > > >>
> > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > >>
> > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > >>
> > > >> Will check and post more details,
> > > >>
> > > >> - Sunil
> > > >>
> > > >>
> > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > >> rohithsharm...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Thanks Subru/Arun for the great work!
> > > >> >
> > > >> > Downloaded the source and built from it. Deployed an RM HA non-secured
> > > >> > cluster along with the new YARN UI and ATSv2.
> > > >> >
> > > >> > I am facing a basic RM HA switch issue after the first successful start.
> > > >> > *Is anyone else facing this issue?*
> > > >> >
> > > >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> > > >> > transitions to active successfully. The exception trace I see in the log is:
> > > >> >
> > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > >> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > >> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > > >> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Jonathan Hung
Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Setup RM HA
2) Verified leveldb/zookeeper scheduler configuration API works via REST/CLI
3) Verified configuration changes persist across restart
4) Verified yarn rmadmin -refreshQueues works when the scheduler configuration
API is disabled (and vice versa)
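For context, item 2) exercises the mutable scheduler configuration store, and which backing store is used is a single yarn-site.xml setting. A sketch (the property name and values are assumptions to double-check against the 2.9 CapacityScheduler docs):

```xml
<!-- Sketch only: selects the backing store for the scheduler configuration
     mutation API; "memory", "leveldb", and "zk" are the variants under test. -->
<property>
  <name>yarn.scheduler.configuration.store.class</name>
  <value>leveldb</value>
</property>
```

With a mutable store enabled, queue changes go through the REST/CLI API instead of capacity-scheduler.xml, which is why item 4) checks that `yarn rmadmin -refreshQueues` only works when the API is disabled.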


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger  wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are the same as the configs Jonathan posted?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you check whether you can still reproduce the issue using Jonathan's configs?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
>>
>> > Thanks for testing, Rohith and Sunil.
>> >
>> > Can you please confirm that it is not a config issue at your end?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce it. I've
>> > updated the YARN-7453 JIRA with details of the testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>> > rohithsharm...@apache.org
>> > > wrote:
>> >
>> > > Thanks Sunil for the confirmation. Btw, I have raised the YARN-7453
>> > > JIRA to track this issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G  wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving the 2.9 release. Great work!
>> > >>
>> > >> I installed a cluster built from source.
>> > >> - Ran a few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed the new UI, and it also seems fine.
>> > >>
>> > >> However, I am also getting the same issue Rohith reported:
>> > >> - Started an HA cluster
>> > >> - Pushed the RM to standby
>> > >> - Pushed the RM back to active, then saw an exception:
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> > >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> > >> rohithsharm...@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded the source and built from it. Deployed an RM HA non-secured
>> > >> > cluster along with the new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing a basic RM HA switch issue after the first successful start.
>> > >> > *Is anyone else facing this issue?*
>> > >> >
>> > >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
>> > >> > transitions to active successfully. The exception trace I see in the log is:
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > >> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> > >> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> > >> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> > 

Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-11-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/33/

[Nov 6, 2017 5:39:41 PM] (bibinchundatt) Add containerId to Localizer failed 
logs. Contributed by Prabhu Joseph




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:16 
   bkjournal:8 
   hadoop-mapreduce-client-jobclient:14 
   hadoop-archives:1 
   hadoop-distcp:3 
   hadoop-extras:1 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-yarn-client:4 
   hadoop-yarn-server-timeline-pluginstorage:1 
   hadoop-yarn-server-timelineservice:1 

Failed junit tests :

   hadoop.hdfs.TestFSOutputSummer 
   hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd 
   hadoop.hdfs.TestAbandonBlock 
   hadoop.mapred.TestMROpportunisticMaps 
   hadoop.mapred.TestJobCounters 
   hadoop.mapred.TestJobCleanup 
   hadoop.mapred.TestNetworkedJob 
   hadoop.mapred.TestMiniMRClientCluster 
   hadoop.mapred.TestClusterMapReduceTestCase 
   hadoop.tools.TestDistCpViewFs 
   hadoop.tools.TestIntegration 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels 
   TEST-cetest 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestDFSStartupVersions 
   org.apache.hadoop.hdfs.TestHdfsAdmin 
   org.apache.hadoop.fs.TestEnhancedByteBufferAccess 
   org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart 
   org.apache.hadoop.hdfs.server.namenode.TestQuotaByStorageType 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal 
   org.apache.hadoop.fs.viewfs.TestViewFileSystemWithTruncate 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderFactory 
   org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocalLegacy 
   org.apache.hadoop.hdfs.TestFileConcurrentReader 
   org.apache.hadoop.hdfs.server.namenode.TestAddBlock 
   org.apache.hadoop.hdfs.server.namenode.TestEditLogAutoroll 
   org.apache.hadoop.fs.permission.TestStickyBit 
   org.apache.hadoop.hdfs.TestGetFileChecksum 
   org.apache.hadoop.cli.TestHDFSCLI 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperEditLogStreams 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.contrib.bkjournal.TestCurrentInprogress 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperConfiguration 
   org.apache.hadoop.mapred.TestClusterMRNotification 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   org.apache.hadoop.mapred.TestMiniMRClasspath 
   org.apache.hadoop.mapred.TestMRCJCFileInputFormat 
   org.apache.hadoop.mapred.TestMRIntermediateDataEncryption 
   org.apache.hadoop.mapred.TestJobSysDirWithDFS 
   org.apache.hadoop.mapred.TestMRTimelineEventHandling 
   org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath 
   org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers 
   org.apache.hadoop.mapred.TestReduceFetchFromPartialMem 
   org.apache.hadoop.mapred.TestReduceFetch 
   org.apache.hadoop.mapred.TestMerge 
   org.apache.hadoop.tools.TestHadoopArchives 
   org.apache.hadoop.tools.TestDistCpSync 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromTarget 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromSource 
   org.apache.hadoop.tools.TestCopyFiles 
   
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell 
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClient 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
   org.apache.hadoop.yarn.server.timeline.TestLogInfo 
   
org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices
 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/33/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Eric Badger
+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs
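As a hedged illustration of the hash/checksum verification step (the file names below are placeholders, not the actual RC artifacts):

```shell
# Illustrative only: create a stand-in artifact and verify it the same way a
# release tarball's checksum would be checked.
# (On macOS, `shasum -a 256` plays the role of `sha256sum`.)
echo "stand-in for hadoop-2.9.0-src.tar.gz" > artifact.tar.gz
sha256sum artifact.tar.gz > artifact.tar.gz.sha256

# Prints "artifact.tar.gz: OK" on success; exits non-zero on a mismatch
sha256sum -c artifact.tar.gz.sha256
```

For a real RC, the signature would also be checked with `gpg --verify` after importing the release manager's public key.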

Thanks,

Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:

> Sunil / Rohith,
>
> Could you check if your configs are the same as the configs Jonathan posted?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>
> And could you check whether you can still reproduce the issue using Jonathan's configs?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:
>
> > Thanks for testing, Rohith and Sunil.
> >
> > Can you please confirm that it is not a config issue at your end?
> > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > (both automatic and manual) and we are not able to reproduce it. I've
> > updated the YARN-7453 JIRA with details of the testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > rohithsharm...@apache.org
> > > wrote:
> >
> > > Thanks Sunil for the confirmation. Btw, I have raised the YARN-7453
> > > JIRA to track this issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > >
> > >> Hi Subru and Arun.
> > >>
> > >> Thanks for driving the 2.9 release. Great work!
> > >>
> > >> I installed a cluster built from source.
> > >> - Ran a few MR jobs with application priority enabled. Runs fine.
> > >> - Accessed the new UI, and it also seems fine.
> > >>
> > >> However, I am also getting the same issue Rohith reported:
> > >> - Started an HA cluster
> > >> - Pushed the RM to standby
> > >> - Pushed the RM back to active, then saw an exception:
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > >> rohithsharm...@apache.org>
> > >> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded the source and built from it. Deployed an RM HA non-secured
> > >> > cluster along with the new YARN UI and ATSv2.
> > >> >
> > >> > I am facing a basic RM HA switch issue after the first successful start.
> > >> > *Is anyone else facing this issue?*
> > >> >
> > >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> > >> > transitions to active successfully. The exception trace I see in the log is:
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > >> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > >> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > >> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > >> > ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Wangda Tan
Sunil / Rohith,

Could you check if your configs are the same as the configs Jonathan posted?
https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693

And could you check whether you can still reproduce the issue using Jonathan's configs?
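Separately from comparing configs: a NoAuth from ZooKeeper usually means the credentials/ACLs the RM presents don't match the ACLs already set on the state-store znodes by the previous activation. The knobs involved look roughly like the sketch below; the property names and values are my assumption of the 2.9 defaults, not something stated in this thread:

```xml
<!-- Sketch only: ACL applied to znodes the RM creates in its ZK state store -->
<property>
  <name>yarn.resourcemanager.zk-acl</name>
  <value>world:anyone:rwcda</value>
</property>
<!-- In HA, the store's root node is typically given a tighter ACL so only the
     active RM can delete it (fencing); a mismatch here can surface as NoAuth -->
<property>
  <name>yarn.resourcemanager.zk-state-store.root-node.acl</name>
  <value>world:anyone:rwa</value>
</property>
```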

Thanks,
Wangda


On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh  wrote:

> Thanks for testing, Rohith and Sunil.
>
> Can you please confirm that it is not a config issue at your end?
> We (both Jonathan and myself) just tried testing this on a fresh cluster
> (both automatic and manual) and we are not able to reproduce it. I've
> updated the YARN-7453 JIRA with details of the testing.
>
> Cheers
> -Arun/Subru
>
> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> rohithsharm...@apache.org
> > wrote:
>
> > Thanks Sunil for the confirmation. Btw, I have raised the YARN-7453
> > JIRA to track this issue.
> >
> > - Rohith Sharma K S
> >
> > On 7 November 2017 at 16:44, Sunil G  wrote:
> >
> >> Hi Subru and Arun.
> >>
> >> Thanks for driving the 2.9 release. Great work!
> >>
> >> I installed a cluster built from source.
> >> - Ran a few MR jobs with application priority enabled. Runs fine.
> >> - Accessed the new UI, and it also seems fine.
> >>
> >> However, I am also getting the same issue Rohith reported:
> >> - Started an HA cluster
> >> - Pushed the RM to standby
> >> - Pushed the RM back to active, then saw an exception:
> >>
> >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>
> >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>
> >> Will check and post more details,
> >>
> >> - Sunil
> >>
> >>
> >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> >> rohithsharm...@apache.org>
> >> wrote:
> >>
> >> > Thanks Subru/Arun for the great work!
> >> >
> >> > Downloaded the source and built from it. Deployed an RM HA non-secured
> >> > cluster along with the new YARN UI and ATSv2.
> >> >
> >> > I am facing a basic RM HA switch issue after the first successful start.
> >> > *Is anyone else facing this issue?*
> >> >
> >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> >> > transitions to active successfully. The exception trace I see in the log is:
> >> >
> >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >> > ... 4 more
> >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> > at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >> > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >> 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Arun Suresh
Thanks for testing, Rohith and Sunil.

Can you please confirm that it is not a config issue at your end?
We (both Jonathan and myself) just tried testing this on a fresh cluster
(both automatic and manual) and we are not able to reproduce it. I've
updated the YARN-7453 JIRA with details of the testing.

Cheers
-Arun/Subru

On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S  wrote:

> Thanks Sunil for the confirmation. Btw, I have raised the YARN-7453
> JIRA to track this issue.
>
> - Rohith Sharma K S
>
> On 7 November 2017 at 16:44, Sunil G  wrote:
>
>> Hi Subru and Arun.
>>
>> Thanks for driving the 2.9 release. Great work!
>>
>> I installed a cluster built from source.
>> - Ran a few MR jobs with application priority enabled. Runs fine.
>> - Accessed the new UI, and it also seems fine.
>>
>> However, I am also getting the same issue Rohith reported:
>> - Started an HA cluster
>> - Pushed the RM to standby
>> - Pushed the RM back to active, then saw an exception:
>>
>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>
>> Will check and post more details,
>>
>> - Sunil
>>
>>
>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> rohithsharm...@apache.org>
>> wrote:
>>
>> > Thanks Subru/Arun for the great work!
>> >
>> > Downloaded the source and built from it. Deployed an RM HA non-secured
>> > cluster along with the new YARN UI and ATSv2.
>> >
>> > I am facing a basic RM HA switch issue after the first successful start.
>> > *Is anyone else facing this issue?*
>> >
>> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
>> > transitions to active successfully. The exception trace I see in the log is:
>> >
>> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> > ... 4 more
>> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> > at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:422)
>> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> > ... 5 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth

[jira] [Created] (HDFS-12789) [READ] Image generation tool does not close an opened stream

2017-11-07 Thread Virajith Jalaparti (JIRA)
Virajith Jalaparti created HDFS-12789:
-

 Summary: [READ] Image generation tool does not close an opened 
stream
 Key: HDFS-12789
 URL: https://issues.apache.org/jira/browse/HDFS-12789
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Virajith Jalaparti






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12788) Reset the upload button when file upload fails

2017-11-07 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HDFS-12788:
---

 Summary: Reset the upload button when file upload fails
 Key: HDFS-12788
 URL: https://issues.apache.org/jira/browse/HDFS-12788
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ui, webhdfs
Affects Versions: 2.9.0
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


When any failure happens while uploading a file, the upload dialog box does not
disappear.






Re: [DISCUSS] A final minor release off branch-2?

2017-11-07 Thread Sean Mackrory
>> You mentioned rolling-upgrades: It will be good to exactly outline the
type of testing. For e.g., the rolling-upgrades orchestration order has
direct implication on the testing done.

Complete details are available in HDFS-11096, where I'm trying to get the
scripts that automate these tests committed so we can run them on Jenkins.
For HDFS, I follow the same order as the documentation. I did not see any
documentation indicating when to upgrade the ZKFC daemons, so that is done at
the end. I also did not see any documentation about a rolling upgrade for YARN,
so I'm upgrading the ResourceManagers first and then the NodeManagers,
basically following the pattern used in HDFS.

I can't speak much about app compatibility in YARN, etc., but the rolling
upgrade runs Terasuite from Hadoop 2 continually while doing the upgrade
and for some time afterward. One incompatibility was found and fixed in trunk
quite a while ago; that part of the test has been working well for quite a
while now.

>> Copying data between 2.x clusters and 3.x clusters: Does this work
already? Is it broken anywhere that we cannot fix? Do we need bridging
features for this work?

HDFS-11096 also includes tests that data can be distcp'd over webhdfs://
to and from old and new clusters, regardless of where the distcp job is
launched from. I'll try a test run that uses hdfs:// this week, too.

As part of that JIRA I also looked through all the protobufs for any
discrepancies/incompatibilities. One was found and fixed, but the rest
looked good to me.



On Mon, Nov 6, 2017 at 6:42 PM, Vinod Kumar Vavilapalli 
wrote:

> The main goal of the bridging release is to ease transition on stuff that
> is guaranteed to be broken.
>
> Off the top of my head, one of the biggest areas is application
> compatibility. When folks move from 2.x to 3.x, are their apps binary
> compatible? Source compatible? Or do they need changes?
>
> In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps
> source compatible. This means relooking at the API compatibility in 3.x and
> its impact on migrating applications. We will have to revisit and
> un-deprecate old APIs, un-delete old APIs, and write documentation on how
> apps can be migrated.
>
> Most of this work will be in the 3.x line. The bridging release, on the
> other hand, will have deprecations for APIs that cannot be undeleted. This
> may already have been done in many places, but we need to make sure and fill
> any gaps.
>
> Other areas that I can recall from the old days
>  - Config migration: Many configs are deprecated or deleted. We need
> documentation to help folks to move. We also need deprecations in the
> bridging release for configs that cannot be undeleted.
>  - You mentioned rolling-upgrades: It will be good to exactly outline the
> type of testing. For e.g., the rolling-upgrades orchestration order has
> direct implication on the testing done.
>  - Story for downgrades?
>  - Copying data between 2.x clusters and 3.x clusters: Does this work
> already? Is it broken anywhere that we cannot fix? Do we need bridging
> features for this work?
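The config deprecation mentioned above is handled in Hadoop via
`Configuration.addDeprecation`. As a minimal, self-contained sketch of the
idea (this class is illustrative, not Hadoop's actual implementation; the one
entry in the table, `dfs.name.dir` -> `dfs.namenode.name.dir`, is a real
renamed key):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of an old-key -> new-key deprecation table, the kind of
 * mapping a bridging release would ship.  Real Hadoop does this inside
 * org.apache.hadoop.conf.Configuration#addDeprecation; this stand-alone
 * class only illustrates the lookup.
 */
public class DeprecatedKeys {
    private static final Map<String, String> DEPRECATIONS = new HashMap<>();
    static {
        // dfs.name.dir was renamed to dfs.namenode.name.dir in Hadoop 2.x.
        DEPRECATIONS.put("dfs.name.dir", "dfs.namenode.name.dir");
    }

    /** Resolve a possibly-deprecated key to its current name. */
    public static String resolve(String key) {
        return DEPRECATIONS.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        System.out.println(resolve("dfs.name.dir")); // dfs.namenode.name.dir
    }
}
```

A bridging release would extend such a table (and log a deprecation warning
on each hit) so that 2.x configs keep working unchanged on the new line.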
>
> +Vinod
>
> > On Nov 6, 2017, at 12:49 PM, Andrew Wang 
> wrote:
> >
> > What are the known gaps that need bridging between 2.x and 3.x?
> >
> > From an HDFS perspective, we've tested wire compat, rolling upgrade, and
> > rollback.
> >
> > From a YARN perspective, we've tested wire compat and rolling upgrade.
> Arun
> > just mentioned an NM rollback issue that I'm not familiar with.
> >
> > Anything else? External to this discussion, these should be documented as
> > known issues for 3.0.
> >
> > Best.
> > Andrew
> >
> > On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh  wrote:
> >
> >> Thanks for starting this discussion VInod.
> >>
> >> I agree (C) is a bad idea.
> >> I would prefer (A) given that, at the moment, branch-2 is still very close
> >> to branch-2.9, and it is a good time to make a collective decision to lock
> >> down commits to branch-2.
> >>
> >> I think we should also clearly define what the 'bridging' release should
> >> be.
> >> I assume it means the following:
> >> * Any 2.x user wanting to move to 3.x must first upgrade to the bridging
> >> release and then upgrade to the 3.x release.
> >> * With regard to state store upgrades (at least NM state stores) the
> >> bridging state stores should be aware of all new 3.x keys so the
> implicit
> >> assumption would be that a user can only rollback from the 3.x release
> to
> >> the bridging release and not to the old 2.x release.
> >> * Use the opportunity to clean up deprecated APIs?
> >> * Do we even want to consider a separate bridging release for the 2.7, 2.8
> >> and 2.9 lines?
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <
> >> vino...@apache.org>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC
> out
> >>> (tx Arun / Subru!) and 2.8.2 (tx 

Re: [DISCUSS] A final minor release off branch-2?

2017-11-07 Thread Vinod Kumar Vavilapalli
Thanks for your comments, Zheng. Replies inline.


> On the other hand, I've discussed this with quite a few potential 3.0 users;
> it looks like most of them are interested in the erasure coding feature, and
> a major scenario for it is backing up their large volumes of data to save
> storage cost. They might run analytics workloads using Hive, Spark, Impala
> and Kylin on a new cluster based on that version, but it's not a must at
> first. They understand there might be some gaps, so they'd migrate their
> workloads incrementally. For the major analytics workloads, we've performed
> lots of benchmark and integration tests (as have others, I believe); we did
> find some issues, but they should be fixed in the downstream projects. I
> think the GA release will accelerate progress and expose any remaining
> issues. We can't wait for it to be fully mature; there is no perfect release.


3.0 is a GA release from the Apache Hadoop community. So, we cannot assume that 
all usages in the short term are *only* going to be for storage optimization 
features and only on dedicated clusters. We have to make sure that the 
workloads can be migrated right now and/or that existing clusters can be 
upgraded in-place. If not, we shouldn't be calling it GA.


> This sounds like a good consideration. Thinking as a Hadoop user on, for
> example, 2.7.4 or 2.8.2 or whatever 2.x version: would I first upgrade to
> this bridging release and then use the bridge support to upgrade to a 3.x
> version? I'm not sure. On the other hand, I might rather look for guides or
> support in the 3.x docs about how to upgrade from 2.7 to 3.x.



Arun Suresh also asked this same question earlier. I think this will really 
depend on what we discover as part of the migration and user-acceptance 
testing. If we don't find major issues, you are right, folks can jump directly 
from one of 2.7, 2.8 or 2.9 to 3.0.



> Frankly speaking, working on a bridging release that doesn't target any
> feature isn't so attractive to me as a contributor. Overall, a final minor
> release off branch-2 is good, and we should also give 3.x more time to
> evolve and mature; therefore it looks to me like we would have to work on
> two release lines in parallel for some time. I'd like option C), and suggest
> we focus on the recent releases.



Answering this question is also one of the goals of starting this thread.
Collectively, we need to decide whether we are okay with no longer putting any
new feature work on the 2.x line after the 2.9.0 release and moving our focus
to 3.0.


Thanks
+Vinod

> -Original Message-
> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org] 
> Sent: Tuesday, November 07, 2017 9:43 AM
> To: Andrew Wang 
> Cc: Arun Suresh ; common-...@hadoop.apache.org; 
> yarn-...@hadoop.apache.org; Hdfs-dev ; 
> mapreduce-...@hadoop.apache.org
> Subject: Re: [DISCUSS] A final minor release off branch-2?
> 
> The main goal of the bridging release is to ease transition on stuff that is 
> guaranteed to be broken.
> 
> Off the top of my head, one of the biggest areas is application compatibility. 
> When folks move from 2.x to 3.x, are their apps binary compatible? Source 
> compatible? Or do they need changes?
> 
> In the 1.x -> 2.x upgrade, we did a bunch of work to at least make old apps 
> source compatible. This means relooking at the API compatibility in 3.x and 
> its impact on migrating applications. We will have to revisit and 
> un-deprecate old APIs, un-delete old APIs, and write documentation on how apps 
> can be migrated.
> 
> Most of this work will be in the 3.x line. The bridging release, on the other 
> hand, will have deprecations for APIs that cannot be undeleted. This may 
> already have been done in many places, but we need to make sure and fill any gaps.
> 
> Other areas that I can recall from the old days
> - Config migration: Many configs are deprecated or deleted. We need 
> documentation to help folks to move. We also need deprecations in the 
> bridging release for configs that cannot be undeleted.
> - You mentioned rolling-upgrades: It will be good to exactly outline the type 
> of testing. For e.g., the rolling-upgrades orchestration order has direct 
> implication on the testing done.
> - Story for downgrades?
> - Copying data between 2.x clusters and 3.x clusters: Does this work already? 
> Is it broken anywhere that we cannot fix? Do we need bridging features for 
> this work?
> 
> +Vinod
> 
>> On Nov 6, 2017, at 12:49 PM, Andrew Wang  wrote:
>> 
>> What are the known gaps that need bridging between 2.x and 3.x?
>> 
>> From an HDFS perspective, we've tested wire compat, rolling upgrade, 
>> and rollback.
>> 
>> From a YARN perspective, we've tested wire compat and rolling upgrade. 
>> Arun just mentioned an NM rollback issue that I'm not familiar with.
>> 
>> Anything else? External to 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-11-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/583/

[Nov 6, 2017 5:09:10 PM] (bibinchundatt) Add containerId to Localizer failed 
logs. Contributed by Prabhu Joseph
[Nov 6, 2017 9:28:31 PM] (jianhe) YARN-5461. Initial code ported from 
slider-core module. (jianhe)
[Nov 6, 2017 9:28:32 PM] (jianhe) Rename org.apache.slider.core.build to 
org.apache.slider.core.buildutils
[Nov 6, 2017 9:28:32 PM] (jianhe) Modify pom file for slider
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5513. Move Java only tests from slider 
develop to
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5538. Apply SLIDER-875 to 
yarn-native-services. Contributed by
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5505. Create an agent-less docker 
provider in the native-services
[Nov 6, 2017 9:28:32 PM] (jianhe) YARN-5623. Apply SLIDER-1166 to 
yarn-native-services branch. Contributed
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5610. Initial code for native services 
REST API. Contributed by
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5675. Swagger definition for YARN 
service API. Contributed by Gour
[Nov 6, 2017 9:28:33 PM] (jianhe) Addendum patch for YARN-5610. Contributed by 
Gour Saha
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5689. Update native services REST API to 
use agentless docker
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5729. Bug fixes for the service Rest 
API. Contributed by Gour Saha
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5735. Make the service REST API use the 
app timeout feature
[Nov 6, 2017 9:28:33 PM] (jianhe) YARN-5701. Fix issues in yarn native services 
apps-of-apps. Contributed
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5778. Add .keep file for yarn native 
services AM web app.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5775. Convert enums in swagger 
definition to uppercase. Contributed
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5680. Add 2 new fields in Slider status 
output - image-name and
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5770. Performance improvement of 
native-services REST API service.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5690. Integrate native services modules 
into maven build.
[Nov 6, 2017 9:28:34 PM] (jianhe) YARN-5796. Convert enums values in service 
code to upper case and
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5813. Slider should not try to set a 
negative lifetime timeout
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5812. Exception during GET call - 
"Failed to retrieve application:
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5828. Native services client errors out 
when config formats are
[Nov 6, 2017 9:28:35 PM] (jianhe) YARN-5808. Add gc log options to the yarn 
daemon script when starting
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5909. Remove agent related code in 
slider AM. Contributed by Jian
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5883 Avoid or eliminate expensive YARN 
get all applications call.
[Nov 6, 2017 9:28:36 PM] (jianhe) YARN-5943. Write native services container 
stderr file to log directory.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5941. Slider handles "per.component" for 
multiple components
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5769. Integrate update app lifetime 
using feature implemented in
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5944. Native services AM should remain 
up if RM is down.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5958. Fix ASF license warnings for 
slider core module. Contributed
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5961. Generate native services protobuf 
classes during build.
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5975. Remove the agent - slider AM ssl 
related code. Contributed by
[Nov 6, 2017 9:28:37 PM] (jianhe) YARN-5740. Add a new field in Slider status 
output - lifetime
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5996. Native services AM kills app on 
AMRMClientAsync onError call.
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5967. Fix slider core module findbugs 
warnings. Contributed by Jian
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-5968. Fix slider core module javadocs. 
Contributed by Billie
[Nov 6, 2017 9:28:38 PM] (jianhe) YARN-6010. Fix findbugs, site warnings in 
yarn-services-api module.
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-6014. Followup fix for slider core 
module findbugs. Contributed by
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-5218. Initial core change for DNS for 
YARN. Contributed by Jonathan
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-4757. Add the ability to split reverse 
zone subnets. Contributed by
[Nov 6, 2017 9:28:39 PM] (jianhe) YARN-5993. Allow native services quicklinks 
to be exported for each
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6115. Few additional paths in Slider 
client still uses get all
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6132. SliderClient bondToCluster should 
call findInstance with live
[Nov 6, 2017 9:28:40 PM] (jianhe) Updated pom to point to 3.0.0-alpha3-SNAPSHOT
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6173. Add artifact info and privileged 
container details to the
[Nov 6, 2017 9:28:40 PM] (jianhe) YARN-6186 Handle 
InvalidResourceRequestException in 

Amazon gift card awarded for interviewing for open source development research

2017-11-07 Thread Yatish Hegde
FYI.



Regards,

Yatish



Dear Developers of Apache Hadoop,


My name is Sangseok You. I am a postdoctoral research fellow who studies how 
open source developers collaborate on their projects as a team. Our research 
team is interested in what tools are used and how coordination is managed in 
open source development teams.


We would like to hear insights and experiences from you regarding working in 
the open source community. Your participation will involve a verbal interview 
conversation by phone or Skype (approx. 30-45 minutes). We provide a $20 
Amazon.com gift card for the interview upon completion.


Your information and interview data will be kept securely and anonymized.


- If you are interested in the study, please click the link below to sign up.

https://goo.gl/forms/G0Tlq2dipaIwuhza2


- If you have any questions about the study, please contact Sangseok You 
(syo...@syr.edu)


Thank you very much!!


Best,

Sangseok


Sangseok You, Ph.D.

Postdoctoral Research Fellow

School of Information Studies

syo...@syr.edu

105 Hinds Hall, Syracuse, NY 13244

Syracuse University


Yatish



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Rohith Sharma K S
Thanks Sunil for confirmation. Btw, I have raised YARN-7453
 JIRA to track this issue.

- Rohith Sharma K S

On 7 November 2017 at 16:44, Sunil G  wrote:

> Hi Subru and Arun.
>
> Thanks for driving 2.9 release. Great work!
>
> I installed a cluster built from source.
> - Ran a few MR jobs with application priority enabled. Runs fine.
> - Accessed the new UI and it also seems fine.
>
> However, I am also hitting the same issue Rohith reported.
> - Started an HA cluster
> - Pushed the RM to standby
> - Pushed the RM back to active, then saw an exception.
>
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
> Active
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> KeeperErrorCode = NoAuth
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>
> Will check and post more details,
>
> - Sunil
>
>
> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> rohithsharm...@apache.org>
> wrote:
>
> > Thanks Subru/Arun for the great work!
> >
> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > along with new YARN UI and ATSv2.
> >
> > I am facing a basic RM HA switch issue after the first successful start.
> > *Is anyone else facing this issue?*
> >
> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never switches
> > back to active successfully. The exception trace I see in the log is:
> >
> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> > Exception handling the winning of election
> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
> > Active
> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
> > transitioning to Active mode
> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > ... 4 more
> > Caused by: org.apache.hadoop.service.ServiceStateException:
> > org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
> > NoAuth
> > at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> > at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> > at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> > ... 5 more
> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > KeeperErrorCode = NoAuth
> > at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> > at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> > at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > at
> >
> > 

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-07 Thread Sunil G
Hi Subru and Arun.

Thanks for driving 2.9 release. Great work!

I installed a cluster built from source.
- Ran a few MR jobs with application priority enabled. Runs fine.
- Accessed the new UI and it also seems fine.

However, I am also hitting the same issue Rohith reported.
- Started an HA cluster
- Pushed the RM to standby
- Pushed the RM back to active, then saw an exception.

org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
Active
at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)

Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
KeeperErrorCode = NoAuth
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)

Will check and post more details,

- Sunil
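The NoAuth error above generally means the ACLs on the znodes of the RM's
ZooKeeper state store do not match the credentials the re-elected RM presents.
The relevant knob is the `yarn.resourcemanager.zk-acl` property in
yarn-site.xml; the value shown below is the permissive default and is
illustrative only (not taken from the reporters' configs, and not a
recommendation for a fenced HA setup):

```xml
<!-- yarn-site.xml: ACL applied to the znodes the RM creates for its
     ZooKeeper state store.  world:anyone:rwcda is the permissive default;
     a secured HA setup replaces it with digest-based ACLs so only the
     active RM can write.  Illustrative fragment only. -->
<property>
  <name>yarn.resourcemanager.zk-acl</name>
  <value>world:anyone:rwcda</value>
</property>
```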


On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S 
wrote:

> Thanks Subru/Arun for the great work!
>
> Downloaded source and built from it. Deployed RM HA non-secured cluster
> along with new YARN UI and ATSv2.
>
> I am facing a basic RM HA switch issue after the first successful start.
> *Is anyone else facing this issue?*
>
> When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never switches
> back to active successfully. The exception trace I see in the log is:
>
> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
> Active
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
> transitioning to Active mode
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException:
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
> NoAuth
> at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> KeeperErrorCode = NoAuth
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> at
>
> 

[jira] [Created] (HDFS-12787) Ozone: SCM: Aggregate the metrics from all the container reports

2017-11-07 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-12787:


 Summary: Ozone: SCM: Aggregate the metrics from all the container 
reports
 Key: HDFS-12787
 URL: https://issues.apache.org/jira/browse/HDFS-12787
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: metrics, ozone
Affects Versions: HDFS-7240
Reporter: Yiqun Lin
Assignee: Yiqun Lin


We should aggregate the metrics from all the reports of different datanodes in 
addition to the last report. This way, we can get a global view of the 
container I/Os over the ozone cluster. This is a follow up work of HDFS-11468.






[jira] [Created] (HDFS-12786) Ozone: add port/service names to the ksm/scm web ui

2017-11-07 Thread Elek, Marton (JIRA)
Elek, Marton created HDFS-12786:
---

 Summary: Ozone: add port/service names to the ksm/scm web ui
 Key: HDFS-12786
 URL: https://issues.apache.org/jira/browse/HDFS-12786
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Elek, Marton


Since HDFS-12655, an additional serviceNames field is available for every RPC 
service via the metrics interface.

This very small patch modifies the SCM/KSM web UI to display this name.

Instead of
:9863

We will display:
ScmBlockLocationProtocolService (:9863)

TESTING:

Start a dozone cluster and check the header of the RPC metrics section on the 
web UI: http://localhost:9876/


