[jira] [Created] (HDFS-12793) Ozone : TestSCMCli is failing consistently

2017-11-08 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12793:
-

 Summary: Ozone : TestSCMCli is failing consistently
 Key: HDFS-12793
 URL: https://issues.apache.org/jira/browse/HDFS-12793
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Chen Liang
Assignee: Chen Liang


In the Jenkins builds of HDFS-12787 and HDFS-12758, the same three tests in 
{{TestSCMCli}} failed: {{testCloseContainer}}, {{testDeleteContainer}} 
and {{testInfoContainer}}. I also ran them locally; these three tests have 
been failing consistently.






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-11-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/34/

[Nov 7, 2017 8:15:53 PM] (brahma) HDFS-12783. [branch-2] dfsrouter should use 
hdfsScript. Contributed by




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:16 
   hadoop-hdfs-httpfs:1 
   hadoop-hdfs-nfs:1 
   bkjournal:7 
   hadoop-mapreduce-client-jobclient:18 
   hadoop-archives:1 
   hadoop-distcp:4 
   hadoop-extras:1 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-yarn-client:8 
   hadoop-yarn-common:1 
   hadoop-yarn-server-applicationhistoryservice:1 
   hadoop-yarn-server-nodemanager:2 
   hadoop-yarn-server-timelineservice:1 

Failed junit tests :

   hadoop.net.TestDNS 
   hadoop.hdfs.TestDFSRemove 
   hadoop.hdfs.TestDFSClientRetries 
   hadoop.hdfs.server.datanode.TestDataNodeInitStorage 
   hadoop.hdfs.server.namenode.TestProtectedDirectories 
   hadoop.hdfs.TestSetTimes 
   hadoop.hdfs.TestRenameWhileOpen 
   hadoop.hdfs.server.namenode.ha.TestHAMetrics 
   hadoop.hdfs.TestDecommission 
   hadoop.hdfs.TestExtendedAcls 
   hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd 
   hadoop.hdfs.server.namenode.TestFSNamesystemMBean 
   hadoop.hdfs.TestExternalBlockReader 
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
   hadoop.hdfs.server.federation.router.TestNamenodeHeartbeat 
   hadoop.hdfs.TestParallelShortCircuitReadUnCached 
   hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForAcl 
   hadoop.hdfs.TestPersistBlocks 
   hadoop.mapred.TestMerge 
   hadoop.mapred.TestLocalJobSubmission 
   hadoop.mapred.TestMapProgress 
   hadoop.mapreduce.security.TestMRCredentials 
   hadoop.mapreduce.TestMapReduce 
   hadoop.io.TestSequenceFileMergeProgress 
   hadoop.mapred.TestMRTimelineEventHandling 
   hadoop.mapreduce.TestMROutputFormat 
   hadoop.mapred.lib.TestChainMapReduce 
   hadoop.mapred.TestFileOutputFormat 
   hadoop.mapred.TestMRCJCFileOutputCommitter 
   hadoop.mapred.TestClusterMRNotification 
   hadoop.mapred.TestTaskCommit 
   hadoop.mapreduce.TestLocalRunner 
   hadoop.mapred.lib.aggregate.TestAggregates 
   hadoop.mapred.lib.TestDelegatingInputFormat 
   hadoop.mapred.jobcontrol.TestJobControl 
   hadoop.mapred.TestFieldSelection 
   hadoop.mapred.TestJobName 
   hadoop.mapred.lib.TestMultipleOutputs 
   hadoop.mapred.TestJobCleanup 
   hadoop.mapred.TestKeyValueTextInputFormat 
   hadoop.mapred.lib.TestLineInputFormat 
   hadoop.mapred.TestNetworkedJob 
   hadoop.mapreduce.security.TestBinaryTokenFile 
   hadoop.mapred.TestMiniMRClientCluster 
   hadoop.ipc.TestMRCJCSocketFactory 
   hadoop.conf.TestNoDefaultsJobConf 
   hadoop.mapred.TestComparators 
   hadoop.mapred.lib.TestMultithreadedMapRunner 
   hadoop.mapred.lib.TestKeyFieldBasedComparator 
   hadoop.mapred.TestSequenceFileAsTextInputFormat 
   hadoop.mapred.TestMiniMRChildTask 
   hadoop.mapred.TestSequenceFileInputFormat 
   hadoop.mapred.TestMROpportunisticMaps 
   hadoop.mapred.jobcontrol.TestLocalJobControl 
   hadoop.mapred.TestMapRed 
   hadoop.hdfs.TestNNBench 
   hadoop.tools.TestIntegration 
   hadoop.tools.TestDistCpSystem 
   hadoop.tools.TestDistCpViewFs 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels 
   hadoop.yarn.server.timeline.security.TestTimelineAuthenticationFilterForV1 
   hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL 
   hadoop.yarn.server.nodemanager.TestNodeManagerReboot 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels 
   hadoop.yarn.server.TestContainerManagerSecurity 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestEncryptionZones 
   org.apache.hadoop.fs.TestEnhancedByteBufferAccess 
   org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure 
   org.apache.hadoop.hdfs.TestDataTransferKeepalive 
   org.apache.hadoop.hdfs.TestDatanodeDeath 
   org.apache.hadoop.hdfs.TestDFSFinalize 
   org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream 
   org.apache.hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs 
   

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-08 Thread Vinod Kumar Vavilapalli
A related point - I thought I mentioned this in one of the release preparation 
threads, but in any case.

Starting with 2.7.0, for every .0 release we've been adding a disclaimer (to the 
voting thread as well as the final release) that the first release may go 
through additional fixes for incompatible changes (besides stabilization 
fixes). We should do this with 2.9.0 too.

This has some history. Long before this, we tried two different approaches: (a) 
downstream projects consume an RC, and (b) downstream projects consume a release. 
Option (a) was tried many times, but it became increasingly hard to manage 
across all the projects that depend on Hadoop. When we tried option (b), 
we used to make .0 a GA release, but downstream projects like Tez, Hive and 
Spark would come back and find an incompatible change, which forced us 
into a conundrum: is fixing that incompatible change itself an 
incompatibility? To avoid this problem, we've started marking the first few 
releases as alpha, eventually making a stable point release. Clearly, specific 
users can still use these in production, as long as we, the Hadoop community, 
reserve the right to fix incompatibilities.

Long story short, I'd just add to your voting thread and release notes that 
2.9.0 still needs to be tested downstream, so users may want to wait for 
subsequent point releases.

Thanks
+Vinod

> On Nov 8, 2017, at 12:43 AM, Subru Krishnan  wrote:
> 
> We are canceling the RC due to the issue that Rohith/Sunil identified. The
> issue was difficult to track down as it only happens when you use IP for ZK
> (works fine with host names) and moreover if ZK and RM are co-located on
> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
> 
> Thanks to everyone for the extensive testing/validation. Hopefully the cost
> to replicate with RC1 will be much lower.
> 
> -Subru/Arun.

[jira] [Created] (HDFS-12792) RBF: Test Router-based federation using HDFSContract

2017-11-08 Thread JIRA
Íñigo Goiri created HDFS-12792:
--

 Summary: RBF: Test Router-based federation using HDFSContract
 Key: HDFS-12792
 URL: https://issues.apache.org/jira/browse/HDFS-12792
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Íñigo Goiri









Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-11-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/

[Nov 7, 2017 10:53:48 PM] (templedf) YARN-7401. Reduce lock contention in
[Nov 8, 2017 12:39:04 AM] (wang) HADOOP-15018. Update JAVA_HOME in 
create-release for Xenial Dockerfile.
[Nov 8, 2017 2:22:13 AM] (wwei) HDFS-7060. Avoid taking locks when sending 
heartbeats from the DataNode.




-1 overall


The following subsystems voted -1:
asflicense findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 215] 
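
For context, a minimal sketch of the usual fix for this class of FindBugs
warning (EI_EXPOSE_REP): return a defensive copy of the internal array instead
of the field itself. The names below are illustrative, not the actual
Resource.java code.

{code}
// Illustrative sketch of the defensive-copy pattern FindBugs asks for;
// not the actual org.apache.hadoop.yarn.api.records.Resource code.
import java.util.Arrays;

class ResourceSketch {
  private long[] resources = new long[0];

  // Exposes internal representation: callers can mutate the field.
  long[] getResourcesUnsafe() {
    return resources;
  }

  // Defensive copy: the EI_EXPOSE_REP finding goes away.
  long[] getResources() {
    return Arrays.copyOf(resources, resources.length);
  }
}
{code}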

Unreaped Processes :

   hadoop-mapreduce-client-jobclient:1 

Failed junit tests :

   hadoop.net.TestDNS 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.hdfs.qjournal.server.TestJournalNodeSync 
   hadoop.mapreduce.lib.join.TestJoinDatamerge 
   hadoop.mapred.lib.TestDelegatingInputFormat 
   hadoop.mapreduce.lib.input.TestDelegatingInputFormat 
   hadoop.mapred.join.TestDatamerge 
   hadoop.mapreduce.lib.join.TestJoinProperties 
   hadoop.streaming.TestMultipleArchiveFiles 
   hadoop.streaming.TestMultipleCachefiles 
   hadoop.streaming.TestSymLink 
   hadoop.contrib.utils.join.TestDataJoin 

Timed out junit tests :

   org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
   org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA 
   org.apache.hadoop.mapred.pipes.TestPipeApplication 

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-compile-javac-root.txt
  [280K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/whitespace-eol.txt
  [8.8M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/whitespace-tabs.txt
  [288K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   UnreapedProcessesLog:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient-reaper.txt
  [4.0K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [148K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [444K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [64K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [100K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-tools_hadoop-streaming.txt
  [32K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-unit-hadoop-tools_hadoop-datajoin.txt
  [8.0K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/584/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org


[jira] [Created] (HDFS-12791) NameNode Fsck http Connection can timeout for directories with multiple levels

2017-11-08 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDFS-12791:


 Summary: NameNode Fsck http Connection can timeout for directories 
with multiple levels
 Key: HDFS-12791
 URL: https://issues.apache.org/jira/browse/HDFS-12791
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


Currently the HTTP connection is flushed for every 100 files; however, if the 
namespace contains multiple levels of directories, the flush is postponed until 
many directory levels have been traversed, and the connection can time out. The 
timeout can be avoided if both files and directories are counted toward the 
flush check.

{code}
if (showprogress && (replRes.totalFiles + ecRes.totalFiles) % 100 == 0) {
  out.println();
  out.flush();
}
{code}
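
A minimal sketch of the idea, assuming the {{Result}} objects expose a 
{{totalDirs}} counter alongside {{totalFiles}} (the field name is illustrative):

{code}
// Sketch only: count directories as well as files toward the periodic
// flush, so deep directory trees keep the connection alive.
// Assumes a totalDirs counter on the Result objects (illustrative name).
long processed = replRes.totalFiles + ecRes.totalFiles
    + replRes.totalDirs + ecRes.totalDirs;
if (showprogress && processed % 100 == 0) {
  out.println();
  out.flush();
}
{code}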






[jira] [Created] (HDFS-12790) [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, HDFS-12599 and HDFS-11968 commits

2017-11-08 Thread Rakesh R (JIRA)
Rakesh R created HDFS-12790:
---

 Summary: [SPS]: Rebasing HDFS-10285 branch after HDFS-10467, 
HDFS-12599 and HDFS-11968 commits
 Key: HDFS-12790
 URL: https://issues.apache.org/jira/browse/HDFS-12790
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R


This task continues the periodic rebasing of the HDFS-10285 branch onto trunk. 
To make the branch code compile against trunk, it needs to be refactored for 
the latest trunk changes: HDFS-10467, HDFS-12599 and HDFS-11968.






Re: [DISCUSS] A final minor release off branch-2?

2017-11-08 Thread Steve Loughran

> On 7 Nov 2017, at 19:08, Vinod Kumar Vavilapalli  wrote:
> 
> 
> 
> 
>> Frankly speaking, working on a bridging release not targeting any feature 
>> isn't so attractive to me as a contributor. Overall, a final minor release 
>> off branch-2 is good, but we should also give 3.x more time to evolve and 
>> mature, so it looks to me like we would have to work on two release lines 
>> in parallel for some time. I'd like option C, and suggest we focus on the 
>> recent releases.
> 
> 
> 
> Answering this question is also one of the goals of my starting this thread. 
> Collectively we need to conclude whether we are okay with no longer putting 
> any new feature work, in general, on the 2.x line after the 2.9.0 release 
> and moving our focus over to 3.0.
> 
> 
> Thanks
> +Vinod
> 


As a developer of new features (e.g. the Hadoop S3A committers), I'm mostly 
already committed to targeting 3.1; the code in there to deal with failures and 
retries has unashamedly embraced Java 8 lambda expressions in production code. 
Backporting that is going to be traumatic in terms of IDE-assisted code changes 
and the resultant diff in source between branch-2 and trunk. What's worse, it's 
going to be traumatic to test, as all my JVMs start with an 8 at the moment, and 
I'm starting to worry about whether I should bump a Windows VM up to Java 9 to 
keep an eye on Akira's work there. Currently the only testing I'm really doing 
on Java 7 is Yetus branch-2 and internal test runs.
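
To illustrate the kind of code involved (a sketch only, not the actual S3A 
committer code): a retry helper in this style only compiles on Java 8, and a 
branch-2 backport would mean rewriting every lambda call site as an anonymous 
inner class.

{code}
// Sketch only, not the real S3A code: a Java 8-style retry helper.
// The lambda at the call site is what a Java 7 backport would have
// to rewrite as an anonymous inner class.
import java.io.IOException;
import java.util.concurrent.Callable;

public final class RetrySketch {

  static <T> T withRetries(int attempts, Callable<T> action)
      throws Exception {
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return action.call();
      } catch (IOException e) {
        last = e;  // transient failure: remember it and retry
      }
    }
    throw last;  // attempts is assumed > 0
  }

  public static void main(String[] args) throws Exception {
    System.out.println(withRetries(3, () -> "commit succeeded"));
  }
}
{code}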


3.0 will be out the door, and we can assume that CDH will ship with it soon (*), 
which will allow for a rapid round-trip time on the inevitable bugs: 3.1 can be 
the release with compatibility tuned and those reported issues addressed. It's 
certainly where I'd like to focus.


At the same time, 2.7.2-2.8.x are the broadly used versions; we can't just say 
"move to 3.0" and expect everyone to do it, not given we have explicitly got 
backwards-incompatible changes in. I don't see people rushing to do it until 
the layers above are all qualified (HBase, Hive, Spark, ...), which means big 
users of 2.7/2.8 won't be in a rush to move, and we are going to have to 
maintain 2.x for a while, including security patches for old versions. One 
issue there: what if a patch (such as bumping up a JAR version) is incompatible?

For me then:

* 3.1+ for new features
* fixes to 3.0.x and, where appropriate, 2.9, esp. feature stabilisation
* whoever puts their hand up to do 2.x releases deserves support in testing 
* If someone makes a really strong case to backport a feature from 3.x to 
branch-2 and it's backwards compatible, I'm not going to stop them. It's just 
that once 3.0 is out and a 3.1 is on the way, it's less compelling

-Steve

Note: I'm implicitly assuming a timely 3.1 out the door with my work included, 
and all issues arriving from 3.0 fixed. We can worry when 3.1 ships whether 
there's any benefit in maintaining a 3.0.x, or whether it's best to say "move 
to 3.1".



(*) just a guess based on the effort & test reports of Andrew & others





Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

2017-11-08 Thread Subru Krishnan
We are canceling the RC due to the issue that Rohith/Sunil identified. The
issue was difficult to track down as it only happens when you use IP for ZK
(works fine with host names) and moreover if ZK and RM are co-located on
same machine. We are hopeful to get the fix in tomorrow and roll out RC1.

Thanks to everyone for the extensive testing/validation. Hopefully the cost to
replicate with RC1 will be much lower.

-Subru/Arun.

On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos  wrote:

> +1 from me too.
>
> Did the following:
> 1) set up a 9-node cluster;
> 2) ran some Gridmix jobs;
> 3) ran (2) after enabling opportunistic containers (used a mix of
> guaranteed and opportunistic containers for each job);
> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> containers.
>
> All the above worked with no issues.
>
> Thanks for all the effort guys!
>
> Konstantinos
>
> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger 
> wrote:
>
> > +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >
> > - Verified all hashes and checksums
> > - Built from source on macOS 10.12.6, Java 1.8.0u65
> > - Deployed a pseudo cluster
> > - Ran some example jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan  wrote:
> >
> > > Sunil / Rohith,
> > >
> > > Could you check if your configs are the same as the ones Jonathan posted?
> > > https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> > >
> > > And could you try whether you can still reproduce the issue using
> > > Jonathan's configs?
> > >
> > > Thanks,
> > > Wangda
> > >
> > >
> > > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh 
> wrote:
> > >
> > > > Thanks for testing Rohith and Sunil
> > > >
> > > > Can you please confirm it is not a config issue at your end?
> > > > We (both Jonathan and myself) just tried testing this on a fresh
> > > > cluster (both automatic and manual) and we are not able to reproduce
> > > > this. I've updated the YARN-7453
> > > > (https://issues.apache.org/jira/browse/YARN-7453) JIRA with details
> > > > of testing.
> > > >
> > > > Cheers
> > > > -Arun/Subru
> > > >
> > > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > > rohithsharm...@apache.org
> > > > > wrote:
> > > >
> > > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > > > (https://issues.apache.org/jira/browse/YARN-7453) JIRA to track this
> > > > > issue.
> > > > >
> > > > > - Rohith Sharma K S
> > > > >
> > > > > On 7 November 2017 at 16:44, Sunil G  wrote:
> > > > >
> > > > >> Hi Subru and Arun.
> > > > >>
> > > > >> Thanks for driving the 2.9 release. Great work!
> > > > >>
> > > > >> I installed a cluster built from source.
> > > > >> - Ran a few MR jobs with application priority enabled. Runs fine.
> > > > >> - Accessed the new UI and it also seems fine.
> > > > >>
> > > > >> However, I am also getting the same issue as Rohith reported:
> > > > >> - Started an HA cluster
> > > > >> - Pushed RM to standby
> > > > >> - Pushed RM back to active, then saw an exception.
> > > > >>
> > > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
> > > > >> transition to Active
> > > > >> at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > >> at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > >>
> > > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > > > >> KeeperErrorCode = NoAuth
> > > > >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > >> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > > >>
> > > > >> Will check and post more details,
> > > > >>
> > > > >> - Sunil
> > > > >>
> > > > >>
> > > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > > >> rohithsharm...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Thanks Subru/Arun for the great work!
> > > > >> >
> > > > >> > Downloaded the source and built from it. Deployed an RM HA
> > > > >> > non-secured cluster along with the new YARN UI and ATSv2.
> > > > >> >
> > > > >> > I am facing a basic RM HA switch issue after the first successful
> > > > >> > start. *Is anyone else facing this issue?*
> > > > >> >
> > > > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> > > > >> > switches to active successfully. The exception trace I see in the
> > > > >> > log is
> > > > >> >
> > > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > > >