Re: [VOTE] Merging branch HDFS-7240 to trunk
Hi Sanjay, thanks for the response, replying inline:

> - NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen acknowledge the benefit of the new block layer). We have two choices here:
> ** a) Evolve the NN so that it can interact with both the old and the new block layer,
> ** b) Fork and create a new NN that works only with the new block layer; the old NN will continue to work with the old block layer.
> There are trade-offs, but clearly the 2nd option has the least impact on the old HDFS code.

Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?

> - Share HDSL's netty protocol engine with the HDFS block layer. After HDSL and Ozone have stabilized the engine, put the new netty engine in either HDFS or in Hadoop common - HDSL will use it from there. The HDFS community has been talking about moving to a better thread model for HDFS DNs since release 0.16!!

The Netty-based protocol engine seems like it could be contributed separately from HDSL. I'd be interested to learn more about the performance and other improvements from this new engine.

> - Shallow copy. Here HDSL needs a way to get the actual linux file system links - the HDFS block layer needs to provide a private secure API to get the file names of blocks so that HDSL can do a hard link (hence shallow copy).

Why isn't this possible with two processes? SCR, for instance, securely passes file descriptors between the DN and client over a unix domain socket. I'm sure we can construct a protocol that securely and efficiently creates hardlinks. It also sounds like this shallow copy won't work with features like HDFS encryption or erasure coding, which diminishes its utility. We also don't even have HDFS-to-HDFS shallow copy yet, so HDFS-to-Ozone shallow copy is even further out.

Best,
Andrew
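For readers unfamiliar with the SCR mechanism referenced above: short-circuit reads work because an open file descriptor can be handed from one process to another over a unix domain socket via an SCM_RIGHTS control message; the kernel duplicates the descriptor into the receiving process. A minimal sketch of that primitive (the helper names here are illustrative, not the actual DataNode code):

```c
/* Sketch of SCM_RIGHTS file-descriptor passing over a unix domain
 * socket -- the primitive HDFS short-circuit reads are built on.
 * Helper names are illustrative, not actual Hadoop code. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send the open descriptor fd over the connected unix socket sock. */
int send_fd(int sock, int fd) {
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    char data = 'F';                          /* must carry >= 1 data byte */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof(ctrl));

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;             /* kernel dups the fd across */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor from sock; returns the new fd, or -1 on error. */
int recv_fd(int sock) {
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof(ctrl));

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Because the serving process keeps control of which files it opens and with what permissions, the same request/verify/act shape could plausibly carry a "create a hardlink on my behalf" operation between two daemons, which is the point being made above.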
[EVENT] HDFS Bug Bash: March 12
[Cross-posting, as this affects the rest of the project]

Hey folks-

As discussed last month [1], the HDFS build hasn't been healthy recently. We're dedicating a bug bash to stabilizing the build and addressing some longstanding issues with our unit tests. We rely on our CI infrastructure to keep the project releasable, and in its current state, it's not protecting us from regressions. While we probably won't achieve all our goals in this session, we can create the conditions for reestablishing a firm foundation.

If you're new to the project, please consider attending and contributing. Committers often prioritize large or complicated patches, and the issues that make the project livable don't get enough attention. A bug bash is a great opportunity to get reviewers' attention and fix the annoyances that slow us all down.

If you're a committer, please join us! While some of the proposed repairs are rote, many unit tests rely on implementation details and non-obvious invariants. We need domain experts to help untangle complex dependencies and to prevent breakage of deliberate, but counter-intuitive code.

We're collecting tasks in the wiki [2] and will include a dial-in option for folks who aren't local. Meetup has started charging for creating new events, so we'll have to find another way to get an approximate headcount and publish the address. Please ping me if you have a preferred alternative.

-C

[1]: https://s.apache.org/nEoQ
[2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75965105

- To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
Re: [VOTE] Merging branch HDFS-7240 to trunk
Hi Owen, Wangda,

Thanks for clearly laying out the subproject options; that helps the discussion. I'm all onboard with the idea of regular releases, and it's something I tried to do with the 3.0 alphas and betas. The problem, though, isn't a lack of commitment from feature developers like Sanjay or Jitendra; far from it! I think every feature developer makes a reasonable effort to test their code before it's merged. Yet, my experience as an RM is that more code comes with more risk. I don't believe that Ozone is special or different in this regard. It comes with a maintenance cost, not a maintenance benefit.

I'm advocating for #3: separate source, separate release. Since HDSL stability and FSN/BM refactoring are still a ways out, I don't want to incur a maintenance cost now. I sympathize with the sentiment that working cross-repo is harder than working within the same repo, but the right tooling can make this a lot easier (e.g. git submodule, Google's repo tool). We have experience doing this internally here at Cloudera, and I'm happy to share knowledge and possibly code.

Best,
Andrew

On Fri, Mar 2, 2018 at 4:41 PM, Wangda Tan wrote:

> I like the idea of same source / same release and putting Ozone's source under a different directory.
>
> Like Owen mentioned, it's going to be important for all parties to keep a regular and shorter release cycle for Hadoop, e.g. 3-4 months between minor releases. Users can try features and give feedback to stabilize features earlier; developers can be happier since their efforts will reach users soon after features get merged. In addition to this, if features are merged to trunk after reasonable tests/review, Andrew's concern may not be a problem anymore:
>
> bq. Finally, I earnestly believe that Ozone/HDSL itself would benefit from being a separate project. Ozone could release faster and iterate more quickly if it wasn't hampered by Hadoop's release schedule and security and compatibility requirements.
>
> Thanks,
> Wangda
>
> On Fri, Mar 2, 2018 at 4:24 PM, Owen O'Malley wrote:
>
>>> On Thu, Mar 1, 2018 at 11:03 PM, Andrew Wang wrote:
>>>
>>> Owen mentioned making a Hadoop subproject; we'd have to hash out what exactly this means (I assume a separate repo still managed by the Hadoop project), but I think we could make this work if it's more attractive than incubation or a new TLP.
>>
>> Ok, there are multiple levels of sub-projects that all make sense:
>>
>> - Same source tree, same releases - examples like HDFS & YARN
>> - Same master branch, separate releases and release branches - Hive's Storage API vs Hive. It is in the source tree for the master branch, but has distinct releases and release branches.
>> - Separate source, separate release - Apache Commons.
>>
>> There are advantages and disadvantages to each. I'd propose that we use the same source, same release pattern for Ozone. Note that we tried and later reverted doing Common, HDFS, and YARN as separate source, separate release because it was too much trouble. I like Daryn's idea of putting it as a top-level directory in Hadoop and making sure that nothing in Common, HDFS, or YARN depends on it. That way, if a Release Manager doesn't think it is ready for release, it can be trivially removed before the release.
>>
>> One thing about using the same releases: Sanjay and Jitendra are signing up to make much more regular bugfix and minor releases in the near future. For example, they'll need to make 3.2 relatively soon to get it released, and then 3.3 somewhere in the next 3 to 6 months. That would be good for the project. Hadoop needs more regular releases and fewer big bang releases.
>>
>> .. Owen
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/

[Mar 4, 2018 3:12:52 PM] (aajisaka) HADOOP-15286. Remove unused imports from TestKMSWithZK.java
[Mar 4, 2018 3:33:47 PM] (aajisaka) HADOOP-15282. HADOOP-15235 broke TestHttpFSServerWebServer

-1 overall

The following subsystems voted -1: findbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s): unit

Specific tests:

FindBugs: module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
  org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:[line 234]

Failed junit tests:
  hadoop.crypto.key.kms.server.TestKMS
  hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
  hadoop.hdfs.web.TestWebHdfsTimeouts
  hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
  hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
  hadoop.yarn.server.TestDiskFailures
  hadoop.yarn.applications.distributedshell.TestDistributedShell

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-compile-cc-root.txt [4.0K]

javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-compile-javac-root.txt [296K]

checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-checkstyle-root.txt [17M]

pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-pylint.txt [24K]

shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-shellcheck.txt [20K]

shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-shelldocs.txt [12K]

whitespace:
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/whitespace-eol.txt [9.2M]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/whitespace-tabs.txt [288K]

xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/xml.txt [4.0K]

findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html [8.0K]

javadoc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-javadoc-javadoc-root.txt [760K]

unit:
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-common-project_hadoop-kms.txt [12K]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [324K]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [48K]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt [12K]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt [12K]
  https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [84K]

Powered by Apache Yetus 0.8.0-SNAPSHOT
http://yetus.apache.org
Re: [VOTE] Merging branch HDFS-7240 to trunk
Andrew,

Thanks for your response. In this email let me focus on maintenance and unnecessary impact on HDFS. Daryn also touched on this topic and looked at the code base from the developer-impact point of view. He appreciated that the code is separate, and I agree with his suggestion to move it further up the src tree (e.g. hadoop-hdsl-project or hadoop-hdfs-project/hadoop-hdsl). He also gave a good store analogy: do not break things as you change and evolve the store.

Let's look at the areas of future interaction as examples.

- NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen acknowledge the benefit of the new block layer). We have two choices here:
  ** a) Evolve the NN so that it can interact with both the old and the new block layer,
  ** b) Fork and create a new NN that works only with the new block layer; the old NN will continue to work with the old block layer.
  There are trade-offs, but clearly the 2nd option has the least impact on the old HDFS code.

- Share HDSL's netty protocol engine with the HDFS block layer. After HDSL and Ozone have stabilized the engine, put the new netty engine in either HDFS or in Hadoop common - HDSL will use it from there. The HDFS community has been talking about moving to a better thread model for HDFS DNs since release 0.16!!

- Shallow copy. Here HDSL needs a way to get the actual linux file system links - the HDFS block layer needs to provide a private secure API to get the file names of blocks so that HDSL can do a hard link (hence shallow copy).

The first 2 examples are beneficial to existing HDFS, and the maintenance burden can be minimized and is worth the benefits (2x NN scalability!! and a more efficient protocol engine). The 3rd is only beneficial to HDFS users who want the scalability of the new HDSL/Ozone code in a side-by-side system; here the cost is providing a private API to access the block file name.
sanjay

> On Mar 1, 2018, at 11:03 PM, Andrew Wang wrote:
>
> Hi Sanjay,
>
> I have different opinions about what's important and how to eventually integrate this code, and that's not because I'm "conveniently ignoring" your responses. I'm also not making some of the arguments you claim I am making. Attacking arguments I'm not making is not going to change my mind, so let's bring it back to the arguments I am making.
>
> Here's what it comes down to: HDFS-on-HDSL is not going to be ready in the near term, and it comes with a maintenance cost.
>
> I did read the proposal on HDFS-10419, and I understood that HDFS-on-HDSL integration does not necessarily require a lock split. However, there still needs to be refactoring to clearly define the FSN and BM interfaces and make the BM pluggable so HDSL can be swapped in. This is a major undertaking, and risky. We did a similar refactoring in 2.x which made backports hard and introduced bugs. I don't think we should have done this in a minor release.
>
> Furthermore, I don't know what your expectation is on how long it will take to stabilize HDSL, but this horizon for other storage systems is typically measured in years rather than months.
>
> Both of these feel like Hadoop 4 items: a ways out yet.
>
> Moving on, there is a non-trivial maintenance cost to having this new code in the code base. Ozone bugs become our bugs. Ozone dependencies become our dependencies. Ozone's security flaws are our security flaws. All of this negatively affects our already lumbering release schedule, and thus our ability to deliver and iterate on the features we're already trying to ship. Even if Ozone is separate and off by default, this is still a large amount of code that comes with a large maintenance cost. I don't want to incur this cost when the benefit is still a ways out.
>
> We disagree on the necessity of sharing a repo and sharing operational behaviors.
> Libraries exist as a method for sharing code. HDFS also hardly has a monopoly on intermediating storage today. Disks are shared with MR shuffle, Spark/Impala spill, log output, Kudu, Kafka, etc. Operationally we've made this work. Having Ozone/HDSL in a separate process can even be seen as an operational advantage, since it's isolated. I firmly believe that we can solve any implementation issues even with separate processes.
>
> This is why I asked about making this a separate project. Given that these two efforts (HDSL stabilization and NN refactoring) are a ways out, the best way to get Ozone/HDSL in the hands of users today is to release it as its own project. Owen mentioned making a Hadoop subproject; we'd have to hash out what exactly this means (I assume a separate repo still managed by the Hadoop project), but I think we could make this work if it's more attractive than incubation or a new TLP.
>
> I'm excited about the possibilities of both HDSL and the NN refactoring in ensuring a future for HDFS for years to come. A pluggable block manager >