[VOTE] Merging branch HDFS-7240 to trunk
Dear folks,

We would like to start a vote to merge the HDFS-7240 branch into trunk. The context can be reviewed in the DISCUSSION thread and in the jiras (see references below).

HDFS-7240 introduces the Hadoop Distributed Storage Layer (HDSL), which is a distributed, replicated block layer. The old HDFS namespace and NN can be connected to this new block layer as described in HDFS-10419. We also introduce a key-value namespace called Ozone built on HDSL.

The code is in a separate module and is turned off by default. In a secure setup, the HDSL and Ozone daemons cannot be started.

The detailed documentation is available at
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Distributed+Storage+Layer+and+Applications

I will start with my vote. +1 (binding)

Discussion threads:
https://s.apache.org/7240-merge
https://s.apache.org/4sfU

Jiras:
https://issues.apache.org/jira/browse/HDFS-7240
https://issues.apache.org/jira/browse/HDFS-10419
https://issues.apache.org/jira/browse/HDFS-13074
https://issues.apache.org/jira/browse/HDFS-13180

Thanks
jitendra

DISCUSSION THREAD SUMMARY:

On 2/13/18, 6:28 PM, "sanjay Radia" wrote:

Sorry, the formatting got messed up by my email client. Here it is again.

Dear Hadoop Community Members,

We had multiple community discussions, a few meetings in smaller groups, and also jira discussions with respect to this thread. We express our gratitude for the participation and valuable comments. The key questions raised were the following:

1) How do the new block storage layer and OzoneFS benefit HDFS? We were asked to chalk out a roadmap towards the goal of a scalable namenode working with the new storage layer.
2) We were asked to provide a security design.
3) There were questions around stability, given that Ozone brings in a large body of code.
4) Why can't they be separate projects forever, or be merged in when production ready?

We have responded to all the above questions with detailed explanations and answers on the jira as well as in the discussions.
We believe that should sufficiently address the community's concerns. Please see the summary below:

1) The new code base benefits HDFS scaling, and a roadmap has been provided.

Summary:
- The new block storage layer addresses the scalability of the block layer. We have shown how the existing NN can be connected to the new block layer, and its benefits. We have shown 2 milestones; the 1st milestone is much simpler than the 2nd while giving almost the same scaling benefits. Originally we had proposed only milestone 2, and the community felt that removing the FSN/BM lock was a fair amount of work and that a simpler solution would be useful.
- We provide a new K-V namespace called Ozone FS with FileSystem/FileContext plugins to allow users to use the new system. BTW, Hive and Spark work very well on K-V namespaces in the cloud. This will facilitate stabilizing the new block layer.
- The new block layer has a new netty-based protocol engine in the Datanode which, when stabilized, can be used by the old HDFS block layer. See details below on sharing of code.

2) Stability impact on the existing HDFS code base, and code separation.

The new block layer and OzoneFS are in modules that are separate from the old HDFS code - currently there are no calls from HDFS into Ozone except for the DN starting the new block layer module if configured to do so. It does not add instability (the instability argument has been raised many times). Over time, as we share code, we will ensure that the old HDFS continues to remain stable. (For example, we plan to stabilize the new netty-based protocol engine in the new block layer before sharing it with HDFS's old block layer.)

3) In the short and medium term, the new system and HDFS will be used side by side by users: side by side in the short term for testing, and side by side in the medium term for actual production use until the new system has feature parity with old HDFS.
During this time, sharing the DN daemon and admin functions between the two systems is operationally important:
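The Ozone FS FileSystem/FileContext plugin mentioned in the summary would be wired the way any Hadoop filesystem plugin is, via an fs.<scheme>.impl binding. As a hedged illustration only - the o3 scheme, class name, and URI layout below are assumptions for the sketch, not taken from the branch - a client's core-site.xml could look like:

```xml
<!-- Hypothetical wiring; scheme, class, and URI form are assumptions. -->
<configuration>
  <property>
    <name>fs.o3.impl</name>
    <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
  </property>
  <property>
    <!-- Optional: make the new K-V namespace this client's default FS. -->
    <name>fs.defaultFS</name>
    <value>o3://bucket.volume/</value>
  </property>
</configuration>
```

With such a binding in place, Hive and Spark jobs address the new namespace through ordinary Path URIs, which is what makes the side-by-side testing described above practical.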
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/147/

No changes

-1 overall

The following subsystems voted -1:
    asflicense findbugs mvnsite unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
       Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

    Unreaped Processes :

       hadoop-hdfs:20
       bkjournal:5
       hadoop-yarn-server-resourcemanager:1
       hadoop-yarn-client:4
       hadoop-yarn-applications-distributedshell:1
       hadoop-mapreduce-client-jobclient:9
       hadoop-distcp:4
       hadoop-extras:1

    Failed junit tests :

       hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
       hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
       hadoop.hdfs.web.TestFSMainOperationsWebHdfs
       hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime
       hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing
       hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher
       hadoop.yarn.server.TestDiskFailures
       hadoop.conf.TestNoDefaultsJobConf
       hadoop.mapred.TestJobSysDirWithDFS
       hadoop.tools.TestIntegration
       hadoop.tools.TestDistCpViewFs
       hadoop.resourceestimator.solver.impl.TestLpSolver
       hadoop.resourceestimator.service.TestResourceEstimatorService

    Timed out junit tests :

       org.apache.hadoop.hdfs.TestLeaseRecovery2
       org.apache.hadoop.hdfs.TestRead
       org.apache.hadoop.security.TestPermission
       org.apache.hadoop.hdfs.web.TestWebHdfsTokens
       org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
       org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade
       org.apache.hadoop.hdfs.TestFileAppendRestart
       org.apache.hadoop.hdfs.TestReadWhileWriting
       org.apache.hadoop.hdfs.security.TestDelegationToken
       org.apache.hadoop.hdfs.TestDFSMkdirs
       org.apache.hadoop.hdfs.TestDFSOutputStream
       org.apache.hadoop.hdfs.web.TestWebHDFS
       org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs
       org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
       org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
       org.apache.hadoop.hdfs.TestDistributedFileSystem
       org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication
       org.apache.hadoop.hdfs.TestDFSShell
       org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
       org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager
       org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
       org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir
       org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead
       org.apache.hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher
       org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
       org.apache.hadoop.yarn.client.TestRMFailover
       org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
       org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation
       org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
       org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
       org.apache.hadoop.fs.TestFileSystem
       org.apache.hadoop.mapred.TestMiniMRClasspath
       org.apache.hadoop.mapred.TestClusterMapReduceTestCase
       org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
       org.apache.hadoop.mapred.TestMRTimelineEventHandling
       org.apache.hadoop.mapred.join.TestDatamerge
       org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
       org.apache.hadoop.mapred.TestLazyOutput
       org.apache.hadoop.mapred.TestReduceFetch
       org.apache.hadoop.tools.TestDistCpWithAcls
       org.apache.hadoop.tools.TestDistCpSync
       org.apache.hadoop.tools.TestDistCpSyncReverseFromTarget
       org.apache.hadoop.tools.TestDistCpSyncReverseFromSource
       org.apache.hadoop.tools.TestCopyFiles

   cc:
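The FindBugs "boxed value is unboxed and then immediately reboxed" warning in the report above refers to a common pattern. The snippet below is a self-contained illustration of that pattern and its fix; it is not the actual ColumnRWHelper code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReboxDemo {
    // The flagged pattern: the Long is unboxed by longValue() and then
    // immediately reboxed by Long.valueOf(), a pointless round trip.
    static Long flagged(Map<String, Long> results, String key) {
        return Long.valueOf(results.get(key).longValue());
    }

    // The fix: return the boxed value as-is.
    static Long fixed(Map<String, Long> results, String key) {
        return results.get(key);
    }

    public static void main(String[] args) {
        Map<String, Long> m = new HashMap<>();
        m.put("ts", 335L);
        if (!flagged(m, "ts").equals(fixed(m, "ts"))) {
            throw new AssertionError("values should match");
        }
        System.out.println("ok");
    }
}
```

Both methods return the same value; the warning is about wasted allocation and churn on the boxing cache, not correctness.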
[jira] [Created] (MAPREDUCE-7060) Cherry Pick PathOutputCommitter class/factory to branch-3.0 & 2.10
Steve Loughran created MAPREDUCE-7060:
-----------------------------------------

             Summary: Cherry Pick PathOutputCommitter class/factory to branch-3.0 & 2.10
                 Key: MAPREDUCE-7060
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7060
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 3.0.0, 2.10.0
            Reporter: Steve Loughran
            Assignee: Steve Loughran

It's easier for downstream apps like Spark to pick up the new PathOutputCommitter superclass if it is there on 2.10+, even if the S3A committer isn't there. Adding the interface & binding stuff of HADOOP-6956 allows third-party committers to be deployed.

I'm not proposing a backport of the HADOOP-13786 committer: that's Java 8, S3Guard, etc. Too traumatic. All I want here is to allow downstream code to pick up the new interface and so be able to support it and other store committers when available.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
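The "interface & binding stuff" referred to above is a factory lookup keyed by filesystem scheme, so a job writing to a given store picks up that store's committer without code changes. As a hedged sketch only - the property name follows the per-scheme pattern this factory feature uses as I recall it, and the class is a made-up third-party example, so both should be checked against the actual backport - a deployment could bind a committer like this:

```xml
<!-- Hypothetical binding; property name and class are assumptions. -->
<configuration>
  <property>
    <!-- Per-scheme factory: jobs writing to s3a:// destinations get the
         committer produced by this factory instead of FileOutputCommitter. -->
    <name>mapreduce.outputcommitter.factory.scheme.s3a</name>
    <value>com.example.committers.VendorCommitterFactory</value>
  </property>
</configuration>
```

This is why backporting only the superclass and factory is useful on its own: the binding point exists on 2.10+, and stores supply their committers separately.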
[jira] [Resolved] (MAPREDUCE-6961) Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved MAPREDUCE-6961.
---------------------------------------
       Resolution: Duplicate
    Fix Version/s: 3.1.0

This went in with HADOOP-13786

> Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6961
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6961
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 3.1.0
>
> SPARK-21549 has shown that downstream code is relying on the internal property.
> If we pulled {{FileOutputCommitter.getOutputPath}} up to the {{PathOutputCommitter}} of MAPREDUCE-6956, there'd be a public/stable way to get this. Admittedly, it does imply that the committer will always have *some* output path, but FileOutputFormat depends on that anyway.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
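The refactor this duplicate tracked - pulling a method up to an abstract superclass so downstream code never needs an instanceof check or cast - can be modeled in isolation. The classes below are simplified stand-ins that only mirror the names; they are not Hadoop's actual committer classes.

```java
public class PullUpDemo {
    // Stand-in for the abstract superclass. After the pull-up, every
    // committer exposes its output path through this public surface.
    abstract static class PathOutputCommitter {
        abstract String getOutputPath();
    }

    // Stand-in for the concrete committer that previously held the method.
    static class FileOutputCommitter extends PathOutputCommitter {
        private final String outputPath;

        FileOutputCommitter(String outputPath) {
            this.outputPath = outputPath;
        }

        @Override
        String getOutputPath() {
            return outputPath;
        }
    }

    public static void main(String[] args) {
        // Downstream code programs against the superclass only - no cast
        // to the concrete FileOutputCommitter is needed.
        PathOutputCommitter committer = new FileOutputCommitter("hdfs://nn/out");
        System.out.println(committer.getOutputPath()); // prints "hdfs://nn/out"
    }
}
```

This is exactly the stability win cited in the issue: callers like Spark depend on a public abstract method rather than an internal property of one implementation.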
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/

No changes

-1 overall

The following subsystems voted -1:
    findbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
       org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 234]

    Failed junit tests :

       hadoop.crypto.key.kms.server.TestKMS
       hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
       hadoop.hdfs.web.TestWebHdfsTimeouts
       hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
       hadoop.fs.http.server.TestHttpFSServerWebServer
       hadoop.yarn.client.api.impl.TestTimelineClientV2Impl
       hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage

   cc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-compile-cc-root.txt  [4.0K]

   javac:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-compile-javac-root.txt  [280K]

   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-checkstyle-root.txt  [17M]

   pylint:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-patch-pylint.txt  [24K]

   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-patch-shellcheck.txt  [20K]

   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-patch-shelldocs.txt  [12K]

   whitespace:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/whitespace-eol.txt  [9.2M]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/whitespace-tabs.txt  [288K]

   xml:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/xml.txt  [4.0K]

   findbugs:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html  [8.0K]

   javadoc:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/diff-javadoc-javadoc-root.txt  [760K]

   unit:
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-common-project_hadoop-kms.txt  [12K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt  [320K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt  [20K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt  [40K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt  [48K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt  [84K]
       https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/704/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt  [8.0K]

Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org
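The FindBugs "may expose internal representation" warning on Resource.getResources() in the report above is about returning an internal array so that callers can mutate the object's state. The snippet below is a self-contained illustration of the pattern and the usual defensive-copy fix; it is not the actual Hadoop Resource code.

```java
import java.util.Arrays;

public class ExposeRepDemo {
    // Internal state that should not be mutable from outside.
    private final long[] resources = {1024L, 4L};

    // The flagged pattern: hands out the internal array itself, so a
    // caller writing to the result corrupts this object's state.
    public long[] getResourcesFlagged() {
        return resources;
    }

    // The common fix: return a defensive copy.
    public long[] getResourcesCopy() {
        return Arrays.copyOf(resources, resources.length);
    }

    public static void main(String[] args) {
        ExposeRepDemo d = new ExposeRepDemo();
        d.getResourcesCopy()[0] = 0L; // mutates only the copy
        if (d.getResourcesFlagged()[0] != 1024L) {
            throw new AssertionError("internal state was mutated");
        }
        System.out.println("ok");
    }
}
```

In hot paths a defensive copy may be deliberately skipped for performance, which is one reason such warnings are sometimes suppressed rather than fixed.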