Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-05 Thread Andrew Wang
Hi Sanjay, thanks for the response, replying inline:

> - NN on top of HDSL, where the NN uses the new block layer (both Daryn and
> Owen acknowledge the benefit of the new block layer). We have two choices here:
>  ** a) Evolve the NN so that it can interact with both the old and new block layer,
>  ** b) Fork and create a new NN that works only with the new block layer; the
> old NN will continue to work with the old block layer.
> There are trade-offs, but clearly the 2nd option has the least impact on the
> old HDFS code.
>

Are you proposing that we pursue the 2nd option to integrate HDSL with HDFS?


> - Share HDSL’s netty protocol engine with the HDFS block layer. After
> HDSL and Ozone have stabilized the engine, put the new netty engine in
> either HDFS or in Hadoop common - HDSL will use it from there. The HDFS
> community has been talking about moving to a better thread model for HDFS
> DNs since release 0.16!!

The Netty-based protocol engine seems like it could be contributed
separately from HDSL. I'd be interested to learn more about the performance
and other improvements from this new engine.
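
For a concrete picture, here is a minimal Netty 4 server sketch (my own
illustration, not the actual HDSL code; the port and the empty handler are
placeholders). The point is the thread model: a small, fixed pool of
event-loop threads multiplexes all connections, instead of the DN's
thread-per-transceiver model:

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class BlockProtocolServer {
      public static void main(String[] args) throws InterruptedException {
        EventLoopGroup acceptor = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup(); // default: 2 * cores
        try {
          ServerBootstrap b = new ServerBootstrap()
              .group(acceptor, workers)
              .channel(NioServerSocketChannel.class)
              .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) {
                  // A real engine would install its block-protocol codec and
                  // request handlers here; this no-op is a placeholder.
                  ch.pipeline().addLast(new ChannelInboundHandlerAdapter());
                }
              });
          // 50010 is the classic DN transfer port, used here only for flavor.
          b.bind(50010).sync().channel().closeFuture().sync();
        } finally {
          workers.shutdownGracefully();
          acceptor.shutdownGracefully();
        }
      }
    }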


> - Shallow copy. Here HDSL needs a way to get the actual Linux file system
> links - the HDFS block layer needs to provide a private, secure API to get
> the file names of blocks so that HDSL can create a hard link (hence shallow
> copy).
>

Why isn't this possible with two processes? Short-circuit reads (SCR), for
instance, securely pass file descriptors between the DN and client over a
unix domain socket. I'm sure we can construct a protocol that securely and
efficiently creates hardlinks.
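
To illustrate, once such a protocol has resolved a block to its local file
and authorized the caller, the link itself is a single NIO call. This is
only a sketch of the cheap part (the paths are placeholders); the security
protocol around it is the real work, as with SCR's fd-passing:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ShallowCopy {
      /** Creates link as a hard link to the existing block file. */
      static void shallowCopyBlock(Path existingBlockFile, Path link)
          throws IOException {
        // Hard links require both paths to be on the same file system,
        // which also means shallow copy only works node-locally.
        Files.createLink(link, existingBlockFile);
      }

      public static void main(String[] args) throws IOException {
        // Placeholder paths: a DN block file and a target HDSL chunk path.
        shallowCopyBlock(Paths.get(args[0]), Paths.get(args[1]));
      }
    }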

It also sounds like this shallow copy won't work with features like HDFS
encryption or erasure coding, which diminishes its utility. We also don't
even have HDFS-to-HDFS shallow copy yet, so HDFS-to-Ozone shallow copy is
even further out.

Best,
Andrew


[EVENT] HDFS Bug Bash: March 12

2018-03-05 Thread Chris Douglas
[Cross-posting, as this affects the rest of the project]

Hey folks-

As discussed last month [1], the HDFS build hasn't been healthy
recently. We're dedicating a bug bash to stabilize the build and
address some longstanding issues with our unit tests. We rely on our
CI infrastructure to keep the project releasable, and in its current
state, it's not protecting us from regressions. While we probably
won't achieve all our goals in this session, we can develop the
conditions for reestablishing a firm foundation.

If you're new to the project, please consider attending and
contributing. Committers often prioritize large or complicated
patches, and the issues that make the project livable don't get enough
attention. A bug bash is a great opportunity to pull reviewers'
collars, and fix the annoyances that slow us all down.

If you're a committer, please join us! While some of the proposed
repairs are rote, many unit tests rely on implementation details and
non-obvious invariants. We need domain experts to help untangle
complex dependencies and to prevent breakage of deliberate, but
counter-intuitive code.

We're collecting tasks on the wiki [2] and will include a dial-in option
for folks who aren't local.

Meetup has started charging for creating new events, so we'll have to
find another way to get an approximate headcount and publish the
address. Please ping me if you have a preferred alternative. -C

[1]: https://s.apache.org/nEoQ
[2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75965105




Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-05 Thread Andrew Wang
Hi Owen, Wangda,

Thanks for clearly laying out the subproject options, that helps the
discussion.

I'm all onboard with the idea of regular releases, and it's something I
tried to do with the 3.0 alphas and betas. The problem though isn't a lack
of commitment from feature developers like Sanjay or Jitendra; far from it!
I think every feature developer makes a reasonable effort to test their
code before it's merged. Yet, my experience as an RM is that more code
comes with more risk. I don't believe that Ozone is special or different in
this regard. It comes with a maintenance cost, not a maintenance benefit.

I'm advocating for #3: separate source, separate release. Since HDSL
stability and FSN/BM refactoring are still a ways out, I don't want to
incur a maintenance cost now. I sympathize with the sentiment that working
cross-repo is harder than working within the same repo, but the right
tooling can make this a lot easier (e.g. git submodules, Google's repo
tool). We have experience doing this internally here at Cloudera, and I'm
happy to share knowledge and possibly code.
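
For the git submodule route, the workflow could look like this (a sketch
only; the repo URL and directory name are hypothetical):

    # Pin an external HDSL repo at a known commit inside the Hadoop tree.
    git submodule add https://example.org/repos/hadoop-hdsl.git hadoop-hdsl
    git commit -m "Add hadoop-hdsl as a submodule"

    # What fresh checkouts run to fetch the pinned submodule commit.
    git submodule update --init --recursive

The top-level repo records only a commit pointer, so HDSL could release on
its own cadence while Hadoop builds against a known-good revision.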

Best,
Andrew

On Fri, Mar 2, 2018 at 4:41 PM, Wangda Tan  wrote:

> I like the idea of same source / same release and putting Ozone's source
> under a different directory.
>
> Like Owen mentioned, it's going to be important for all parties to keep a
> regular and shorter release cycle for Hadoop, e.g. 3-4 months between minor
> releases. Users can try features and give feedback to stabilize features
> earlier; developers can be happier since their efforts will reach users
> soon after features get merged. In addition to this, if features are merged
> to trunk after reasonable tests/review, Andrew's concern may not be a
> problem anymore:
>
> bq. Finally, I earnestly believe that Ozone/HDSL itself would benefit from
> being a separate project. Ozone could release faster and iterate more
> quickly if it wasn't hampered by Hadoop's release schedule and security and
> compatibility requirements.
>
> Thanks,
> Wangda
>
>
> On Fri, Mar 2, 2018 at 4:24 PM, Owen O'Malley 
> wrote:
>
>> On Thu, Mar 1, 2018 at 11:03 PM, Andrew Wang 
>> wrote:
>>
>> > Owen mentioned making a Hadoop subproject; we'd have to
>> > hash out what exactly this means (I assume a separate repo still
>> > managed by the Hadoop project), but I think we could make this work
>> > if it's more attractive than incubation or a new TLP.
>>
>>
>> Ok, there are multiple levels of sub-projects that all make sense:
>>
>>- Same source tree, same releases - examples like HDFS & YARN
>>- Same master branch, separate releases and release branches - Hive's
>>Storage API vs Hive. It is in the source tree for the master branch,
>> but
>>has distinct releases and release branches.
>>- Separate source, separate release - Apache Commons.
>>
>> There are advantages and disadvantages to each. I'd propose that we use
>> the same source, same release pattern for Ozone. Note that we tried and
>> later reverted doing Common, HDFS, and YARN as separate source, separate
>> release because it was too much trouble. I like Daryn's idea of putting it
>> as a top-level directory in Hadoop and making sure that nothing in Common,
>> HDFS, or YARN depends on it. That way, if a Release Manager doesn't think
>> it is ready for release, it can be trivially removed before the release.
>>
>> One thing about using the same releases: Sanjay and Jitendra are signing
>> up to make much more regular bugfix and minor releases in the near future.
>> For example, they'll need to make 3.2 relatively soon to get it released
>> and then 3.3 somewhere in the next 3 to 6 months. That would be good for
>> the project. Hadoop needs more regular releases and fewer big-bang releases.
>>
>> .. Owen
>>
>
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-03-05 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/

[Mar 4, 2018 3:12:52 PM] (aajisaka) HADOOP-15286. Remove unused imports from 
TestKMSWithZK.java
[Mar 4, 2018 3:33:47 PM] (aajisaka) HADOOP-15282. HADOOP-15235 broke 
TestHttpFSServerWebServer




-1 overall


The following subsystems voted -1:
findbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose
internal representation by returning Resource.resources At Resource.java:by
returning Resource.resources At Resource.java:[line 234]

Failed junit tests :

   hadoop.crypto.key.kms.server.TestKMS
   hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
   hadoop.hdfs.web.TestWebHdfsTimeouts
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
   hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
   hadoop.yarn.server.TestDiskFailures
   hadoop.yarn.applications.distributedshell.TestDistributedShell

   cc:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-compile-cc-root.txt
   [4.0K]

   javac:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-compile-javac-root.txt
   [296K]

   checkstyle:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-checkstyle-root.txt
   [17M]

   pylint:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-pylint.txt
   [24K]

   shellcheck:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-shellcheck.txt
   [20K]

   shelldocs:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-patch-shelldocs.txt
   [12K]

   whitespace:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/whitespace-eol.txt
   [9.2M]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/whitespace-tabs.txt
   [288K]

   xml:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/xml.txt
   [4.0K]

   findbugs:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html
   [8.0K]

   javadoc:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/diff-javadoc-javadoc-root.txt
   [760K]

   unit:

   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-common-project_hadoop-kms.txt
   [12K]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
   [324K]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
   [48K]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
   [12K]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
   [12K]
   https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/712/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
   [84K]

Powered by Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org


Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-05 Thread sanjay Radia
Andrew
  Thanks for your response. 

In this email, let me focus on maintenance and unnecessary impact on HDFS.
Daryn also touched on this topic and looked at the code base from the
developer-impact point of view. He appreciated that the code is separate, and
I agree with his suggestion to move it further up the src tree (e.g.
hadoop-hdsl-project or hadoop-hdfs-project/hadoop-hdsl). He also gave a good
analogy to a store: do not break things as you change and evolve the store.
Let’s look at the areas of future interaction as examples.

- NN on top of HDSL, where the NN uses the new block layer (both Daryn and Owen
acknowledge the benefit of the new block layer). We have two choices here:
 ** a) Evolve the NN so that it can interact with both the old and new block layer,
 ** b) Fork and create a new NN that works only with the new block layer; the old
NN will continue to work with the old block layer.
There are trade-offs, but clearly the 2nd option has the least impact on the old
HDFS code.

- Share HDSL’s netty protocol engine with the HDFS block layer. After HDSL and
Ozone have stabilized the engine, put the new netty engine in either HDFS or in
Hadoop common - HDSL will use it from there. The HDFS community has been
talking about moving to a better thread model for HDFS DNs since release 0.16!!

- Shallow copy. Here HDSL needs a way to get the actual Linux file system links
- the HDFS block layer needs to provide a private, secure API to get the file
names of blocks so that HDSL can create a hard link (hence shallow copy).

The first two examples are beneficial to existing HDFS; the maintenance burden
can be minimized and is worth the benefits (2x NN scalability!! And a more
efficient protocol engine). The 3rd is only beneficial to HDFS users who want
the scalability of the new HDSL/Ozone code in a side-by-side system; here the
cost is providing a private API to access the block file name.
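
To make the 3rd item concrete, the private API could be as small as a single
call. This is a hypothetical sketch, not a committed interface; the name,
signature, and audience annotation are illustrative:

    import java.io.IOException;
    import java.nio.file.Path;

    // Would be tagged @InterfaceAudience.LimitedPrivate("HDSL") in Hadoop
    // terms: private to HDFS, exposed only to the co-located HDSL service.
    public interface BlockPathResolver {
      /**
       * Resolves a block (by pool ID and block ID) to its local file so a
       * trusted, authenticated HDSL service on the same node can hard-link
       * it. The path never leaves the node.
       */
      Path getLocalBlockPath(String blockPoolId, long blockId)
          throws IOException;
    }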


sanjay

> On Mar 1, 2018, at 11:03 PM, Andrew Wang  wrote:
> 
> Hi Sanjay,
> 
> I have different opinions about what's important and how to eventually
> integrate this code, and that's not because I'm "conveniently ignoring"
> your responses. I'm also not making some of the arguments you claim I am
> making. Attacking arguments I'm not making is not going to change my mind,
> so let's bring it back to the arguments I am making.
> 
> Here's what it comes down to: HDFS-on-HDSL is not going to be ready in the
> near-term, and it comes with a maintenance cost.
> 
> I did read the proposal on HDFS-10419 and I understood that HDFS-on-HDSL
> integration does not necessarily require a lock split. However, there still
> needs to be refactoring to clearly define the FSN and BM interfaces and
> make the BM pluggable so HDSL can be swapped in. This is a major
> undertaking and risky. We did a similar refactoring in 2.x which made
> backports hard and introduced bugs. I don't think we should have done this
> in a minor release.
> 
> Furthermore, I don't know what your expectation is on how long it will take
> to stabilize HDSL, but this horizon for other storage systems is typically
> measured in years rather than months.
> 
> Both of these feel like Hadoop 4 items: a ways out yet.
> 
> Moving on, there is a non-trivial maintenance cost to having this new code
> in the code base. Ozone bugs become our bugs. Ozone dependencies become our
> dependencies. Ozone's security flaws are our security flaws. All of this
> negatively affects our already lumbering release schedule, and thus our
> ability to deliver and iterate on the features we're already trying to
> ship. Even if Ozone is separate and off by default, this is still a large
> amount of code that comes with a large maintenance cost. I don't want to
> incur this cost when the benefit is still a ways out.
> 
> We disagree on the necessity of sharing a repo and sharing operational
> behaviors. Libraries exist as a method for sharing code. HDFS also hardly
> has a monopoly on intermediating storage today. Disks are shared with MR
> shuffle, Spark/Impala spill, log output, Kudu, Kafka, etc. Operationally
> we've made this work. Having Ozone/HDSL in a separate process can even be
> seen as an operational advantage since it's isolated. I firmly believe that
> we can solve any implementation issues even with separate processes.
> 
> This is why I asked about making this a separate project. Given that these
> two efforts (HDSL stabilization and NN refactoring) are a ways out, the
> best way to get Ozone/HDSL in the hands of users today is to release it as
> its own project. Owen mentioned making a Hadoop subproject; we'd have to
> hash out what exactly this means (I assume a separate repo still managed by
> the Hadoop project), but I think we could make this work if it's more
> attractive than incubation or a new TLP.
> 
> I'm excited about the possibilities of both HDSL and the NN refactoring in
> ensuring a future for HDFS for years to come. A pluggable block manager
>