Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-05 Thread Andrew Wang
Hi Sanjay, thanks for the response, replying inline:

> - NN on top of HDSL where the NN uses the new block layer (both Daryn and Owen
> acknowledge the benefit of the new block layer).  We have two choices here
>  ** a) Evolve NN so that it can interact with both old and new block layer,
>  **  b) Fork and create new NN that works only with new block layer, the
> old NN will continue to work with old block layer.
> There are trade-offs but clearly the 2nd option has least impact on the
> old HDFS code.
>
> Are you proposing that we pursue the 2nd option to integrate HDSL with
HDFS?


> - Share HDSL's Netty protocol engine with the HDFS block layer.  After
> HDSL and Ozone have stabilized the engine, put the new Netty engine in
> either HDFS or in Hadoop common - HDSL will use it from there. The HDFS
> community has been talking about moving to a better thread model for HDFS
> DNs since release 0.16!!
>
> The Netty-based protocol engine seems like it could be contributed
separately from HDSL. I'd be interested to learn more about the performance
and other improvements from this new engine.
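
To make the thread-model point concrete, here is a minimal, generic Netty 4
sketch of an event-loop server: one boss thread accepts connections and a small
pool of worker threads handles all I/O, in contrast to the DN's historical
one-thread-per-DataXceiver model. This is not taken from the HDSL code; the
class name, port, and echo handler below are illustrative assumptions only.

    // Illustrative sketch only; not the HDSL protocol engine.
    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.buffer.ByteBuf;
    import io.netty.channel.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class EchoEngineSketch {
      public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);    // accepts connections
        EventLoopGroup workers = new NioEventLoopGroup(4); // 4 threads serve all clients
        try {
          ServerBootstrap b = new ServerBootstrap()
              .group(boss, workers)
              .channel(NioServerSocketChannel.class)
              .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) {
                  ch.pipeline().addLast(new SimpleChannelInboundHandler<ByteBuf>() {
                    @Override
                    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
                      ctx.writeAndFlush(msg.retain()); // echo back without blocking the loop
                    }
                  });
                }
              });
          b.bind(12345).sync().channel().closeFuture().sync();
        } finally {
          boss.shutdownGracefully();
          workers.shutdownGracefully();
        }
      }
    }

The appeal for the DN is that the thread count stays fixed as the number of
concurrent clients grows, instead of one blocking xceiver thread per connection.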


> - Shallow copy. Here HDSL needs a way to get the actual Linux file system
> links - the HDFS block layer needs to provide a private secure API to get file
> names of blocks so that HDSL can do a hard link (hence shallow copy).
>

Why isn't this possible with two processes? Short-circuit reads (SCR), for
instance, securely pass file descriptors between the DN and the client over a
unix domain socket. I'm sure we can construct a protocol that securely and
efficiently creates hardlinks.
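
As a purely illustrative sketch of the primitive such a protocol would
ultimately invoke on the block-layer side (the class and method names here are
hypothetical, not an existing HDFS or HDSL API), a hard link shares the on-disk
block data without copying any bytes:

    // Hypothetical sketch: after an out-of-band exchange authorizes the request,
    // the process owning the block files creates a hard link so the "copy"
    // shares the same on-disk data as the original (a shallow copy).
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ShallowCopySketch {
      /** Link targetBlockFile to sourceBlockFile without copying any data. */
      static void shallowCopyBlock(Path sourceBlockFile, Path targetBlockFile) throws IOException {
        // Both paths must be on the same local filesystem for a hard link to work.
        Files.createLink(targetBlockFile, sourceBlockFile);
      }

      public static void main(String[] args) throws IOException {
        shallowCopyBlock(Paths.get(args[0]), Paths.get(args[1]));
      }
    }

The hard part is the authorization and naming handshake between the two
processes, not the link itself, which is why an SCR-style protocol seems
plausible.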

It also sounds like this shallow copy won't work with features like HDFS
encryption or erasure coding, which diminishes its utility. We also don't
even have HDFS-to-HDFS shallow copy yet, so HDFS-to-Ozone shallow copy is
even further out.

Best,
Andrew


Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-05 Thread Andrew Wang
Hi Owen, Wangda,

Thanks for clearly laying out the subproject options, that helps the
discussion.

I'm all onboard with the idea of regular releases, and it's something I
tried to do with the 3.0 alphas and betas. The problem though isn't a lack
of commitment from feature developers like Sanjay or Jitendra; far from it!
I think every feature developer makes a reasonable effort to test their
code before it's merged. Yet, my experience as an RM is that more code
comes with more risk. I don't believe that Ozone is special or different in
this regard. It comes with a maintenance cost, not a maintenance benefit.

I'm advocating for #3: separate source, separate release. Since HDSL
stability and FSN/BM refactoring are still a ways out, I don't want to
incur a maintenance cost now. I sympathize with the sentiment that working
cross-repo is harder than within the same repo, but the right tooling can make
this a lot easier (e.g. git submodule, Google's repo tool). We have
experience doing this internally here at Cloudera, and I'm happy to share
knowledge and possibly code.

Best,
Andrew

On Fri, Mar 2, 2018 at 4:41 PM, Wangda Tan <wheele...@gmail.com> wrote:

> I like the idea of same source / same release and put Ozone's source under
> a different directory.
>
> Like Owen mentioned, it's going to be important for all parties to keep a
> regular and shorter release cycle for Hadoop, e.g. 3-4 months between minor
> releases. Users can try features and give feedback to stabilize features
> earlier; developers can be happier since their efforts will be consumed by users
> soon after features get merged. In addition to this, if features are merged to
> trunk after reasonable tests/review, Andrew's concern may not be a problem
> anymore:
>
> bq. Finally, I earnestly believe that Ozone/HDSL itself would benefit from
> being a separate project. Ozone could release faster and iterate more
> quickly if it wasn't hampered by Hadoop's release schedule and security and
> compatibility requirements.
>
> Thanks,
> Wangda
>
>
> On Fri, Mar 2, 2018 at 4:24 PM, Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> On Thu, Mar 1, 2018 at 11:03 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>
>> > Owen mentioned making a Hadoop subproject; we'd have to
>> > hash out what exactly this means (I assume a separate repo still managed by
>> > the Hadoop project), but I think we could make this work if it's more
>> > attractive than incubation or a new TLP.
>>
>>
>> Ok, there are multiple levels of sub-projects that all make sense:
>>
>>- Same source tree, same releases - examples like HDFS & YARN
>>- Same master branch, separate releases and release branches - Hive's
>>Storage API vs Hive. It is in the source tree for the master branch, but
>>has distinct releases and release branches.
>>- Separate source, separate release - Apache Commons.
>>
>> There are advantages and disadvantages to each. I'd propose that we use the
>> same source, same release pattern for Ozone. Note that we tried and later
>> reverted doing Common, HDFS, and YARN as separate source, separate release
>> because it was too much trouble. I like Daryn's idea of putting it as a
>> top-level directory in Hadoop and making sure that nothing in Common, HDFS, or
>> YARN depends on it. That way if a Release Manager doesn't think it is ready
>> for release, it can be trivially removed before the release.
>>
>> One thing about using the same releases: Sanjay and Jitendra are signing up
>> to make much more regular bugfix and minor releases in the near future. For
>> example, they'll need to make 3.2 relatively soon to get it released and
>> then 3.3 somewhere in the next 3 to 6 months. That would be good for the
>> project. Hadoop needs more regular releases and fewer big bang releases.
>>
>> .. Owen
>>
>
>


Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-01 Thread Andrew Wang
Hi Sanjay,

I have different opinions about what's important and how to eventually
integrate this code, and that's not because I'm "conveniently ignoring"
your responses. I'm also not making some of the arguments you claim I am
making. Attacking arguments I'm not making is not going to change my mind,
so let's bring it back to the arguments I am making.

Here's what it comes down to: HDFS-on-HDSL is not going to be ready in the
near-term, and it comes with a maintenance cost.

I did read the proposal on HDFS-10419 and I understood that HDFS-on-HDSL
integration does not necessarily require a lock split. However, there still
needs to be refactoring to clearly define the FSN and BM interfaces and
make the BM pluggable so HDSL can be swapped in. This is a major
undertaking and risky. We did a similar refactoring in 2.x which made
backports hard and introduced bugs. I don't think we should have done this
in a minor release.

Furthermore, I don't know what your expectation is on how long it will take
to stabilize HDSL, but this horizon for other storage systems is typically
measured in years rather than months.

Both of these feel like Hadoop 4 items: a ways out yet.

Moving on, there is a non-trivial maintenance cost to having this new code
in the code base. Ozone bugs become our bugs. Ozone dependencies become our
dependencies. Ozone's security flaws are our security flaws. All of this
negatively affects our already lumbering release schedule, and thus our
ability to deliver and iterate on the features we're already trying to
ship. Even if Ozone is separate and off by default, this is still a large
amount of code that comes with a large maintenance cost. I don't want to
incur this cost when the benefit is still a ways out.

We disagree on the necessity of sharing a repo and sharing operational
behaviors. Libraries exist as a method for sharing code. HDFS also hardly
has a monopoly on intermediating storage today. Disks are shared with MR
shuffle, Spark/Impala spill, log output, Kudu, Kafka, etc. Operationally
we've made this work. Having Ozone/HDSL in a separate process can even be
seen as an operational advantage since it's isolated. I firmly believe that
we can solve any implementation issues even with separate processes.

This is why I asked about making this a separate project. Given that these
two efforts (HDSL stabilization and NN refactoring) are a ways out, the
best way to get Ozone/HDSL in the hands of users today is to release it as
its own project. Owen mentioned making a Hadoop subproject; we'd have to
hash out what exactly this means (I assume a separate repo still managed by
the Hadoop project), but I think we could make this work if it's more
attractive than incubation or a new TLP.

I'm excited about the possibilities of both HDSL and the NN refactoring in
ensuring a future for HDFS for years to come. A pluggable block manager
would also let us experiment with things like HDFS-on-S3, increasingly
important in a cloud-centric world. CBlock would bring HDFS to new use cases
around generic container workloads. However, given the timeline for
completing these efforts, now is not the time to merge.

Best,
Andrew

On Thu, Mar 1, 2018 at 5:33 PM, Daryn Sharp  wrote:

> I’m generally neutral and looked foremost at developer impact.  I.e., will
> it be so intertwined with hdfs that each project risks destabilizing the
> other?  Will developers with no expertise in ozone be impeded?  I
> think the answer is currently no.  These are the intersections and some
> concerns based on the assumption ozone is accepted into the project:
>
>
> Common
>
> There appear to be a number of superfluous changes.  The conf servlet must not be
> polluted with specific references and logic for ozone.  We don’t create
> dependencies from common to hdfs, mapred, yarn, hive, etc.  Common must be
> “ozone free”.
>
>
> Datanode
>
> I expected ozone changes to be intricately linked with the existing blocks
> map, dataset, volume, etc.  Thankfully it’s not.  As an independent
> service, the DN should not be polluted with specific references to ozone.
> If ozone is in the project, the DN should have a generic plugin interface
> conceptually similar to the NM aux services.
>
>
> Namenode
>
> No impact, currently, but certainly will be…
>
>
> Code Location
>
> I don’t feel hadoop-hdfs-project/hadoop-hdfs is an acceptable location.
> I’d rather see hadoop-hdfs-project/hadoop-hdsl, or even better
> hadoop-hdsl-project.  This clean separation will make it easier to later
> spin off or pull in depending on which way we vote.
>
>
> Dependencies
>
> Owen hit upon this before I could send.  Hadoop is already bursting with
> dependencies, I hope this doesn’t pull in a lot more.
>
>
> ––
>
>
> Do I think ozone should be a separate project?  If we view it only as a
> competing filesystem, then clearly yes.  If it’s a low risk evolutionary
> step with near-term benefits, no, we want to keep it close and help it
> evolve. 

Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-02-28 Thread Andrew Wang
Resending since the formatting was messed up, let's try plain text this
time:

Hi Jitendra and all,

Thanks for putting this together. I caught up on the discussion on JIRA and
document at HDFS-10419, and still have the same concerns raised earlier
about merging the Ozone branch to trunk.

To recap these questions/concerns at a very high level:

* Wouldn't Ozone benefit from being a separate project?
* Why should it be merged now?

I still believe that both Ozone and Hadoop would benefit from Ozone being a
separate project, and that there is no pressing reason to merge Ozone/HDSL
now.

The primary reason I've heard for merging is that Ozone is at a stage where
it's ready for user feedback. Second, that it needs to be
merged to start on the NN refactoring for HDFS-on-HDSL.

First, without HDFS-on-HDSL support, users are testing against the Ozone
object storage interface. Ozone and HDSL themselves are implemented as
separate masters and new functionality bolted onto the datanode. It also
doesn't look like HDFS in terms of API or featureset; yes, it speaks
FileSystem, but so do many out-of-tree storage systems like S3, Ceph,
Swift, ADLS etc. Ozone/HDSL does not support popular HDFS features like
erasure coding, encryption, high-availability, snapshots, hflush/hsync (and
thus HBase), or APIs like WebHDFS or NFS. This means that Ozone feels like
a new, different system that could reasonably be deployed and tested
separately from HDFS. It's unlikely to replace many of today's HDFS
deployments, and from what I understand, Ozone was not designed to do this.

Second, the NameNode refactoring for HDFS-on-HDSL by itself is a major
undertaking. The discussion on HDFS-10419 is still ongoing so it’s not
clear what the ultimate refactoring will be, but I do know that the earlier
FSN/BM refactoring during 2.x was very painful (introducing new bugs and
making backports difficult) and probably should have been deferred to a new
major release instead. I think this refactoring is important for the
long-term maintainability of the NN and worth pursuing, but as a Hadoop 4.0
item. Merging HDSL is also not a prerequisite for starting this
refactoring. Really, I see the refactoring as the prerequisite for
HDFS-on-HDSL to be possible.

Finally, I earnestly believe that Ozone/HDSL itself would benefit from
being a separate project. Ozone could release faster and iterate more
quickly if it wasn't hampered by Hadoop's release schedule and security and
compatibility requirements. There are also publicity and community
benefits; it's an opportunity to build a community focused on the novel
capabilities and architectural choices of Ozone/HDSL. There are examples of
other projects that were "incubated" on a branch in the Hadoop repo before
being spun off to great success.

In conclusion, I'd like to see Ozone succeeding and thriving as a separate
project. Meanwhile, we can work on the HDFS refactoring required to
separate the FSN and BM and make it pluggable. At that point (likely in the
Hadoop 4 timeframe), we'll be ready to pursue HDFS-on-HDSL integration.

Best,
Andrew


Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-02-27 Thread Andrew Wang

On Mon, Feb 26, 2018 at 1:18 PM, Jitendra Pandey 
wrote:

> Dear folks,
>We would like to start a vote to merge HDFS-7240 branch into
> trunk. The context can be reviewed in the DISCUSSION thread, and in the
> jiras (See references below).
>
> HDFS-7240 introduces Hadoop Distributed Storage Layer (HDSL), which is
> a distributed, replicated block layer.
> The old HDFS namespace and NN can be connected to this new block layer
> as we have described in HDFS-10419.
> We also introduce a key-value namespace called Ozone built on HDSL.
>
> The code is in a separate module and is turned off by default. In a
> secure setup, HDSL and Ozone daemons cannot be started.
>
> The detailed documentation is available at
>  https://cwiki.apache.org/confluence/display/HADOOP/
> Hadoop+Distributed+Storage+Layer+and+Applications
>
>
> I will start with my vote.
> +1 (binding)
>
>
> Discussion Thread:
>   https://s.apache.org/7240-merge
>   https://s.apache.org/4sfU
>
> Jiras:
>https://issues.apache.org/jira/browse/HDFS-7240
>https://issues.apache.org/jira/browse/HDFS-10419
>https://issues.apache.org/jira/browse/HDFS-13074
>https://issues.apache.org/jira/browse/HDFS-13180
>
>
> Thanks
> jitendra
>
>
>
>
>
> DISCUSSION THREAD SUMMARY :
>
> On 2/13/18, 6:28 PM, "sanjay Radia" 
> wrote:
>
> Sorry the formatting got messed up by my email client.  Here
> it is again
>
>
> Dear Hadoop Community Members,
>
>We had multiple community discussions, a few meetings
> in smaller groups and also jira discussions with respect to 

Re: [DISCUSS] 2.9+ stabilization branch

2018-02-27 Thread Andrew Wang
Hi Konst and all,

Is there a list of 3.0 specific upgrade concerns that you could share? I
understand that a new major release comes with risk simply due to the
amount of code change, but we've done our best as a community to alleviate
these concerns through much improved integration testing and compatibility
efforts like the shaded client and revamped compat guide. I'd love to hear
about what else we can do here to improve our 3.x upgrade story.

I understand the need for a bridge release as an upgrade path to 3.x, but I
want to make sure we don't end up needing a 2.11 or 2.12 also. The scope
mentioned here isn't really bridging improvements, which in my mind are
compatibility improvements that help with running 2.x and 3.x clients
concurrently to enable a later upgrade to just 3.x. Including new features
makes this harder (or at least not easier), and means more ongoing
maintenance work on 2.x.

So, a hearty +1 to your closing statement: if we're going to do a bridge
release, let's do it right and do it once.

Best,
Andrew

On Tue, Feb 27, 2018 at 6:21 PM, Konstantin Shvachko <shv.had...@gmail.com>
wrote:

> Thanks Subru for initiating the thread about GPU support.
> I think the path of taking 2.9 as a base for 2.10 and adding new resource
> types into it is quite reasonable.
> That way we can combine stabilization effort on 2.9 with GPUs.
>
> Arun, upgrading Java is probably a separate topic.
> We should discuss it on a separate followup thread if we agree to add GPU
> support into 2.10.
>
> Andrew, we actually ran a small 3.0 cluster to experiment with TensorFlow
> on YARN with GPU resources. It worked well! Therefore the interest.
> Although given the breadth (and the quantity) of our use cases it is
> infeasible to jump directly to 3.0, as Jonathan explained.
> A transitional stage such as 2.10 will be required. Probably the same for
> many other big-cluster folks.
> It would be great if people who run different hadoop versions <= 2.8 can
> converge at 2.10 bridge, to help cross over to 3.
> GPU support would be a serious catalyst for us to move forward, which I
> also heard from other organizations interested in ML.
>
> Thanks,
> --Konstantin
>
> On Tue, Feb 27, 2018 at 1:28 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Hi Arun/Subru,
>>
>> Bumping the minimum Java version is a major change, and incompatible for
>> users who are unable to upgrade their JVM version. We're beyond the EOL
>> for
>> Java 7, but as we know from our experience with Java 6, there are plenty
>> of
>> users who stick on old Java versions. Bumping the Java version also makes
>> backports more difficult, and we're still maintaining a number of older
>> 2.x
>> releases. I think this is too big for a minor release, particularly when
>> we
>> have 3.x as an option that fully supports Java 8.
>>
>> What's the rationale for bumping it here?
>>
>> I'm also curious if there are known issues with 3.x that we can fix to
>> make
>> 3.x upgrades smoother. I would prefer improving the upgrade experience to
>> backporting major features to 2.x since 3.x is meant to be the delivery
>> vehicle for new features beyond the ones named here.
>>
>> Best,
>> Andrew
>>
>> On Tue, Feb 27, 2018 at 11:01 AM, Arun Suresh <asur...@apache.org> wrote:
>>
>> > Hello folks
>> >
>> > We also think this bridging release opens up an opportunity to bump the
>> > java version in branch-2 to java 8.
>> > Would really love to hear thoughts on that.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> >
>> > On Mon, Feb 26, 2018 at 5:18 PM, Jonathan Hung <jyhung2...@gmail.com>
>> > wrote:
>> >
>> > > Hi Subru,
>> > >
>> > > Thanks for starting the discussion.
>> > >
>> > > We (LinkedIn) have an immediate need for resource types and native GPU
>> > > support. Given we are running 2.7 on our main clusters, we decided to
>> > avoid
>> > > deploying hadoop 3.x on our machine learning clusters (and having to
>> > > support two very different hadoop versions). Since for us there is
>> > > considerable risk and work involved in upgrading to hadoop 3, I think
>> > > having a branch-2.10 bridge release for porting important hadoop 3
>> > features
>> > > to branch-2 is a good idea.
>> > >
>> > > Thanks,
>> > >
>> > >
>> > > Jonathan Hung
>> > >
>> > > On Mon, Feb 26, 2018 at 2:37 PM, Subru Krishnan <su...@apache.org>
>> > wrote:

Re: [DISCUSS] 2.9+ stabilization branch

2018-02-27 Thread Andrew Wang
Hi Arun/Subru,

Bumping the minimum Java version is a major change, and incompatible for
users who are unable to upgrade their JVM version. We're beyond the EOL for
Java 7, but as we know from our experience with Java 6, there are plenty of
users who stick on old Java versions. Bumping the Java version also makes
backports more difficult, and we're still maintaining a number of older 2.x
releases. I think this is too big for a minor release, particularly when we
have 3.x as an option that fully supports Java 8.

What's the rationale for bumping it here?

I'm also curious if there are known issues with 3.x that we can fix to make
3.x upgrades smoother. I would prefer improving the upgrade experience to
backporting major features to 2.x since 3.x is meant to be the delivery
vehicle for new features beyond the ones named here.

Best,
Andrew

On Tue, Feb 27, 2018 at 11:01 AM, Arun Suresh  wrote:

> Hello folks
>
> We also think this bridging release opens up an opportunity to bump the
> java version in branch-2 to java 8.
> Would really love to hear thoughts on that.
>
> Cheers
> -Arun/Subru
>
>
> On Mon, Feb 26, 2018 at 5:18 PM, Jonathan Hung 
> wrote:
>
> > Hi Subru,
> >
> > Thanks for starting the discussion.
> >
> > We (LinkedIn) have an immediate need for resource types and native GPU
> > support. Given we are running 2.7 on our main clusters, we decided to
> avoid
> > deploying hadoop 3.x on our machine learning clusters (and having to
> > support two very different hadoop versions). Since for us there is
> > considerable risk and work involved in upgrading to hadoop 3, I think
> > having a branch-2.10 bridge release for porting important hadoop 3
> features
> > to branch-2 is a good idea.
> >
> > Thanks,
> >
> >
> > Jonathan Hung
> >
> > On Mon, Feb 26, 2018 at 2:37 PM, Subru Krishnan 
> wrote:
> >
> > > Folks,
> > >
> > > We (i.e. Microsoft) have started stabilization of 2.9 for our
> production
> > > deployment. During planning, we realized that we need to backport 3.x
> > > features to support GPUs (and more resource types like network IO)
> > natively
> > > as part of the upgrade. We'd like to share that work with the
> community.
> > >
> > > Instead of stabilizing the base release and cherry-picking fixes back
> to
> > > Apache, we want to work publicly and push fixes directly into
> > > trunk/.../branch-2 for a stable 2.10.0 release. Our goal is to create a
> > > bridge release for our production clusters to the 3.x series and to
> > address
> > > scalability problems in large clusters (N*10k nodes). As we find issues, we
> > > will file JIRAs and track resolution of significant regressions/faults in the
> > > wiki. Moreover, LinkedIn also has committed plans for a production
> > > deployment of the same branch. We welcome broad participation,
> > particularly
> > > since we'll be stabilizing relatively new features.
> > >
> > > The exact list of features we would like to backport in YARN are:
> > >
> > >- Support for Resource types [1][2]
> > >- Native support for GPUs[3]
> > >- Absolute Resource configuration in CapacityScheduler [4]
> > >
> > >
> > > With regards to HDFS, we are currently looking mainly at fixes to
> > > Router-based Federation and Windows-specific fixes, which should flow
> > > normally anyway.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Subru/Arun
> > >
> > > [1] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/
> > msg27786.html
> > > [2] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/
> > msg28281.html
> > > [3] https://issues.apache.org/jira/browse/YARN-6223
> > > [4] https://www.mail-archive.com/yarn-dev@hadoop.apache.org/
> > msg28772.html
> > >
> >
>


Re: Apache Hadoop 3.0.1 Release plan

2018-01-09 Thread Andrew Wang
Hi Eddy, thanks for taking this on,

Historically we've waited for the first RC to cut the release branch since
it keeps things simpler for committers.

Also, could you check the permissions on your JIRA filter? It shows as
private for me.

Best,
Andrew

On Tue, Jan 9, 2018 at 11:17 AM, Lei Xu  wrote:

> Hi, All
>
> We have released Apache Hadoop 3.0.0 in December [1]. To further
> improve the quality of the release, we plan to cut branch-3.0.1
> tomorrow in preparation for the Apache Hadoop 3.0.1 release. The focus
> of 3.0.1 will be fixing blockers (3), critical bugs (1) and bug fixes
> [2].  No new features or improvements should be included.
>
> We plan to cut branch-3.0.1 tomorrow (Jan 10th) and vote for RC on Feb
> 1st, targeting for Feb 9th release.
>
> Please feel free to share your insights.
>
> [1] https://www.mail-archive.com/general@hadoop.apache.org/msg07757.html
> [2] https://issues.apache.org/jira/issues/?filter=12342842
>
> Best,
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-18 Thread Andrew Wang
Thanks for the spot; I just pushed a correct tag. I can't delete the bad
tag myself, so I'll ask ASF infra for help.

On Mon, Dec 18, 2017 at 4:46 PM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:

> Congrats on the huge release!
>
> I just noticed, though, that the Github repo does not appear to have the
> correct tag for 3.0.0. I see a new tag called "rel/release-" that points to
> the same commit as "release-3.0.0-RC1" 
> (c25427ceca461ee979d30edd7a4b0f50718e6533).
> I assume that should have actually been called "rel/release-3.0.0" to match
> the pattern for prior releases.
>
> Thanks,
> Jonathan Kelly
>
> On Thu, Dec 14, 2017 at 10:45 AM Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Hi all,
>>
>> I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
>> (GA).
>>
>> 3.0.0 GA consists of 302 bug fixes, improvements, and other enhancements
>> since 3.0.0-beta1. This release marks a point of quality and stability for
>> the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta
>> releases
>> are encouraged to upgrade.
>>
>> Looking back, 3.0.0 GA is the culmination of over a year of work on the
>> 3.0.0 line, starting with 3.0.0-alpha1 which was released in September
>> 2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.
>>
>> Users are encouraged to read the overview of major changes
>> <http://hadoop.apache.org/docs/r3.0.0/index.html> in 3.0.0. The GA
>> release
>> notes
>> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
>> dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html>
>>  and changelog
>> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
>> dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html>
>> detail
>> the changes since 3.0.0-beta1.
>>
>> The ASF press release provides additional color and highlights some of the
>> major features:
>>
>> https://globenewswire.com/news-release/2017/12/14/
>> 1261879/0/en/The-Apache-Software-Foundation-Announces-
>> Apache-Hadoop-v3-0-0-General-Availability.html
>>
>> Let me end by thanking the many, many contributors who helped with this
>> release line. We've only had three major releases in Hadoop's 10 year
>> history, and this is our biggest major release ever. It's an incredible
>> accomplishment for our community, and I'm proud to have worked with all of
>> you.
>>
>> Best,
>> Andrew
>>
>


Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-18 Thread Andrew Wang
Moving general@ to BCC,

The main page and release posts on hadoop.apache.org are pretty clear
about this being a diff from beta1, am I missing something? Pasted below:

After four alpha releases and one beta release, 3.0.0 is generally
available. 3.0.0 consists of 302 bug fixes, improvements, and other
enhancements since 3.0.0-beta1. All together, 6242 issues were fixed as
part of the 3.0.0 release series since 2.7.0.

Users are encouraged to read the overview of major changes
<http://hadoop.apache.org/docs/r3.0.0/index.html> in 3.0.0. The GA release
notes
<http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html>
 and changelog
<http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html>
detail
the changes since 3.0.0-beta1.



On Mon, Dec 18, 2017 at 10:32 AM, Arpit Agarwal <aagar...@hortonworks.com>
wrote:

> That makes sense for Beta users but most of our users will be upgrading
> from a previous GA release and the changelog will mislead them. The webpage
> does not mention this is a delta from the beta release.
>
>
>
>
>
> *From: *Andrew Wang <andrew.w...@cloudera.com>
> *Date: *Friday, December 15, 2017 at 10:36 AM
> *To: *Arpit Agarwal <aagar...@hortonworks.com>
> *Cc: *general <gene...@hadoop.apache.org>, "common-...@hadoop.apache.org"
> <common-...@hadoop.apache.org>, "yarn-...@hadoop.apache.org" <
> yarn-...@hadoop.apache.org>, "mapreduce-dev@hadoop.apache.org" <
> mapreduce-dev@hadoop.apache.org>, "hdfs-...@hadoop.apache.org" <
> hdfs-...@hadoop.apache.org>
> *Subject: *Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released
>
>
>
> Hi Arpit,
>
>
>
> If you look at the release announcements, it's made clear that the
> changelog for 3.0.0 is diffed based on beta1. This is important since users
> need to know what's different from the previous 3.0.0-* releases if they're
> upgrading.
>
>
>
> I agree there's additional value to making combined release notes, but
> it'd be something additive rather than replacing what's there.
>
>
>
> Best,
>
> Andrew
>
>
>
> On Fri, Dec 15, 2017 at 8:27 AM, Arpit Agarwal <aagar...@hortonworks.com>
> wrote:
>
>
> Hi Andrew,
>
> Thank you for all the hard work on this release. I was out the last few
> days and didn’t get a chance to evaluate RC1 earlier.
>
> The changelog looks incorrect. E.g. This gives an impression that there
> are just 5 incompatible changes in 3.0.0.
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/
> hadoop-common/release/3.0.0/CHANGES.3.0.0.html
>
> I assume you only counted 3.0.0 changes in this log excluding
> alphas/betas. However, users shouldn’t have to manually compile
> incompatibilities by summing up a/b release notes. Can we fix the changelog
> after the fact?
>
>
>
>
> On 12/14/17, 10:45 AM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:
>
> Hi all,
>
> I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
> (GA).
>
> 3.0.0 GA consists of 302 bug fixes, improvements, and other
> enhancements
> since 3.0.0-beta1. This release marks a point of quality and stability
> for
> the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta
> releases
> are encouraged to upgrade.
>
> Looking back, 3.0.0 GA is the culmination of over a year of work on the
> 3.0.0 line, starting with 3.0.0-alpha1 which was released in September
> 2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.
>
> Users are encouraged to read the overview of major changes
> <http://hadoop.apache.org/docs/r3.0.0/index.html> in 3.0.0. The GA
> release
> notes
> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
> dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html>
>  and changelog
> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
> dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html>
>
> detail
> the changes since 3.0.0-beta1.
>
> The ASF press release provides additional color and highlights some of
> the
> major features:
>
> https://globenewswire.com/news-release/2017/12/14/
> 1261879/0/en/The-Apache-Software-Foundation-Announces-
> Apache-Hadoop-v3-0-0-General-Availability.html
>
> Let me end by thanking the many, many contributors who helped with this
> release line. We've only had three major releases in Hadoop's 10 year
> history, and this is our biggest major release ever. It's an incredible
> accomplishment for our community, and I'm proud to have worked with
> all of
> you.
>
> Best,
> Andrew
>
>
>
>
>
>
>
>


Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-15 Thread Andrew Wang
Hi Arpit,

If you look at the release announcements, it's made clear that the
changelog for 3.0.0 is diffed based on beta1. This is important since users
need to know what's different from the previous 3.0.0-* releases if they're
upgrading.

I agree there's additional value to making combined release notes, but it'd
be something additive rather than replacing what's there.

Best,
Andrew

On Fri, Dec 15, 2017 at 8:27 AM, Arpit Agarwal <aagar...@hortonworks.com>
wrote:

>
> Hi Andrew,
>
> Thank you for all the hard work on this release. I was out the last few
> days and didn’t get a chance to evaluate RC1 earlier.
>
> The changelog looks incorrect. E.g. This gives an impression that there
> are just 5 incompatible changes in 3.0.0.
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/
> hadoop-common/release/3.0.0/CHANGES.3.0.0.html
>
> I assume you only counted 3.0.0 changes in this log excluding
> alphas/betas. However, users shouldn’t have to manually compile
> incompatibilities by summing up a/b release notes. Can we fix the changelog
> after the fact?
>
>
>
>
> On 12/14/17, 10:45 AM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:
>
> Hi all,
>
> I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
> (GA).
>
> 3.0.0 GA consists of 302 bug fixes, improvements, and other
> enhancements
> since 3.0.0-beta1. This release marks a point of quality and stability
> for
> the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta
> releases
> are encouraged to upgrade.
>
> Looking back, 3.0.0 GA is the culmination of over a year of work on the
> 3.0.0 line, starting with 3.0.0-alpha1 which was released in September
> 2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.
>
> Users are encouraged to read the overview of major changes
> <http://hadoop.apache.org/docs/r3.0.0/index.html> in 3.0.0. The GA
> release
> notes
> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
> dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html>
>  and changelog
> <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-
> dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html>
> detail
> the changes since 3.0.0-beta1.
>
> The ASF press release provides additional color and highlights some of
> the
> major features:
>
> https://globenewswire.com/news-release/2017/12/14/
> 1261879/0/en/The-Apache-Software-Foundation-Announces-
> Apache-Hadoop-v3-0-0-General-Availability.html
>
> Let me end by thanking the many, many contributors who helped with this
> release line. We've only had three major releases in Hadoop's 10 year
> history, and this is our biggest major release ever. It's an incredible
> accomplishment for our community, and I'm proud to have worked with
> all of
> you.
>
> Best,
> Andrew
>
>
>
>
>
>
>
>


[ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-14 Thread Andrew Wang
Hi all,

I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
(GA).

3.0.0 GA consists of 302 bug fixes, improvements, and other enhancements
since 3.0.0-beta1. This release marks a point of quality and stability for
the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta releases
are encouraged to upgrade.

Looking back, 3.0.0 GA is the culmination of over a year of work on the
3.0.0 line, starting with 3.0.0-alpha1 which was released in September
2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.

Users are encouraged to read the overview of major changes in 3.0.0. The GA
release notes and changelog detail the changes since 3.0.0-beta1.

The ASF press release provides additional color and highlights some of the
major features:

https://globenewswire.com/news-release/2017/12/14/1261879/0/en/The-Apache-Software-Foundation-Announces-Apache-Hadoop-v3-0-0-General-Availability.html

Let me end by thanking the many, many contributors who helped with this
release line. We've only had three major releases in Hadoop's 10 year
history, and this is our biggest major release ever. It's an incredible
accomplishment for our community, and I'm proud to have worked with all of
you.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-13 Thread Andrew Wang
Hi folks,

To close this out, the vote passes successfully with 13 binding +1s, 5
non-binding +1s, and no -1s. Thanks everyone for voting! I'll work on
staging.

I'm hoping we can address YARN-7588 and any remaining rolling upgrade
issues in 3.0.x maintenance releases. Beyond a wiki page, it would be
really great to get JIRAs filed and targeted for tracking as soon as
possible.

Vinod, what do you think we need to do regarding caveating rolling upgrade
support? We haven't advertised rolling upgrade support between major
releases outside of dev lists and JIRA. As a new major release, our compat
guidelines allow us to break compatibility, so I don't think it's expected
by users.

Best,
Andrew

On Wed, Dec 13, 2017 at 12:37 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> I was waiting for Daniel to post the minutes from the YARN meetup to talk
> about this. Anyway, in that discussion, we identified a bunch of key
> upgrade-related scenarios that no one seems to have validated - at least
> from the representation in the YARN meetup. I'm going to create a wiki page
> listing all these scenarios.
>
> But back to the bug that Junping raised. At this point, we don't have a
> clear path towards running 2.x applications on 3.0.0 clusters. So, our
> claim of rolling-upgrades already working is not accurate.
>
> One of the two options that Junping proposed should be pursued before we
> close the release. I'm in favor of calling out that rolling-upgrade support is
> withdrawn or caveated, and pushing for progress instead of blocking the
> release.
>
> Thanks
> +Vinod
>
> > On Dec 12, 2017, at 5:44 PM, Junping Du <j...@hortonworks.com> wrote:
> >
> > Thanks Andrew for pushing the new RC for 3.0.0. I was out last week, and just
> > got a chance to validate the new RC now.
> >
> > Basically, I found two critical issues with the same rolling upgrade
> > scenario where HADOOP-15059 was found previously:
> > HDFS-12920: we changed the value format for some hdfs configurations that
> > an old version MR client doesn't understand when fetching these
> > configurations. A quick workaround is to add the old value (without a time
> > unit) in hdfs-site.xml to override the new default values, but it will generate
> > many annoying warnings. I provided my fix suggestions on the JIRA already
> > for more discussion.
> > The other one is YARN-7646. After we work around HDFS-12920, we hit the
> > issue that an old version MR AppMaster cannot communicate with a new version of
> > YARN RM - it could be related to resource profile changes on the YARN side, but
> > the root cause is still under investigation.
> >
> > The first issue may not be a blocker given we can work around it
> > without a code change. I am not sure yet if we can work around the 2nd issue.
> > If not, we may have to fix it or compromise by withdrawing support for
> > rolling upgrade or calling it a stable release.
> >
> >
> > Thanks,
> >
> > Junping
> >
> > 
> > From: Robert Kanter <rkan...@cloudera.com>
> > Sent: Tuesday, December 12, 2017 3:10 PM
> > To: Arun Suresh
> > Cc: Andrew Wang; Lei Xu; Wei-Chiu Chuang; Ajay Kumar; Xiao Chen; Aaron
> T. Myers; common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 3.0.0 RC1
> >
> > +1 (binding)
> >
> > + Downloaded the binary release
> > + Deployed on a 3 node cluster on CentOS 7.3
> > + Ran some MR jobs, clicked around the UI, etc
> > + Ran some CLI commands (yarn logs, etc)
> >
> > Good job everyone on Hadoop 3!
> >
> >
> > - Robert
> >
> > On Tue, Dec 12, 2017 at 1:56 PM, Arun Suresh <asur...@apache.org> wrote:
> >
> >> +1 (binding)
> >>
> >> - Verified signatures of the source tarball.
> >> - built from source - using the docker build environment.
> >> - set up a pseudo-distributed test cluster.
> >> - ran basic HDFS commands
> >> - ran some basic MR jobs
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Tue, Dec 12, 2017 at 1:52 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> As a reminder, this vote closes tomorrow at 12:31pm, so please give it
> a
> >>> whack if you have time. There are already enough binding +1s to pass
> this
> >>> vote, but it'd be great to get additional validation.
> >>>
> >>> Thanks to everyone who's voted thus far!
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>>
>

Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-12 Thread Andrew Wang
Hi everyone,

As a reminder, this vote closes tomorrow at 12:31pm, so please give it a
whack if you have time. There are already enough binding +1s to pass this
vote, but it'd be great to get additional validation.

Thanks to everyone who's voted thus far!

Best,
Andrew



On Tue, Dec 12, 2017 at 11:08 AM, Lei Xu <l...@cloudera.com> wrote:

> +1 (binding)
>
> * Verified src tarball and bin tarball, verified md5 of each.
> * Build source with -Pdist,native
> * Started a pseudo cluster
> * Run ec -listPolicies / -getPolicy / -setPolicy on /  , and run hdfs
> dfs put/get/cat on "/" with XOR-2-1 policy.
>
> Thanks Andrew for this great effort!
>
> Best,
>
>
> On Tue, Dec 12, 2017 at 9:55 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > Hi Wei-Chiu,
> >
> > The patchprocess directory is left over from the create-release process,
> > and it looks empty to me. We should still file a create-release JIRA to
> fix
> > this, but I think this is not a blocker. Would you agree?
> >
> > Best,
> > Andrew
> >
> > On Tue, Dec 12, 2017 at 9:44 AM, Wei-Chiu Chuang <weic...@cloudera.com>
> > wrote:
> >
> >> Hi Andrew, thanks for the tremendous effort.
> >> I found an empty "patchprocess" directory in the source tarball, that is
> >> not there if you clone from github. Any chance you might have some
> leftover
> >> trash when you made the tarball?
> >> Not wanting to nitpick, but you might want to double check so we
> don't
> >> ship anything private to you in public :)
> >>
> >>
> >>
> >> On Tue, Dec 12, 2017 at 7:48 AM, Ajay Kumar <ajay.ku...@hortonworks.com
> >
> >> wrote:
> >>
> >>> +1 (non-binding)
> >>> Thanks for driving this, Andrew Wang!!
> >>>
> >>> - downloaded the src tarball and verified md5 checksum
> >>> - built from source with jdk 1.8.0_111-b14
> >>> - brought up a pseudo distributed cluster
> >>> - did basic file system operations (mkdir, list, put, cat) and
> >>> confirmed that everything was working
> >>> - Run word count, pi and DFSIOTest
> >>> - run hdfs and yarn, confirmed that the NN, RM web UI worked
> >>>
> >>> Cheers,
> >>> Ajay
> >>>
> >>> On 12/11/17, 9:35 PM, "Xiao Chen" <x...@cloudera.com> wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> - downloaded src tarball, verified md5
> >>> - built from source with jdk1.8.0_112
> >>> - started a pseudo cluster with hdfs and kms
> >>> - sanity checked encryption related operations working
> >>> - sanity checked webui and logs.
> >>>
> >>> -Xiao
> >>>
> >>> On Mon, Dec 11, 2017 at 6:10 PM, Aaron T. Myers <a...@apache.org>
> >>> wrote:
> >>>
> >>> > +1 (binding)
> >>> >
> >>> > - downloaded the src tarball and built the source (-Pdist
> -Pnative)
> >>> > - verified the checksum
> >>> > - brought up a secure pseudo distributed cluster
> >>> > - did some basic file system operations (mkdir, list, put, cat)
> and
> >>> > confirmed that everything was working
> >>> > - confirmed that the web UI worked
> >>> >
> >>> > Best,
> >>> > Aaron
> >>> >
> >>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <
> >>> andrew.w...@cloudera.com>
> >>> > wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > Let me start, as always, by thanking the efforts of all the
> >>> contributors
> >>> > > who contributed to this release, especially those who jumped on
> >>> the
> >>> > issues
> >>> > > found in RC0.
> >>> > >
> >>> > > I've prepared RC1 for Apache Hadoop 3.0.0. This release
> >>> incorporates 302
> >>> > > fixed JIRAs since the previous 3.0.0-beta1 release.
> >>> > >
> >>> > > You can find the artifacts here:
> >>> > >
> >>> > > http://home.apache.org/~wang/3.0.0-RC1/
> >>> > >
> >>> > > I've done the traditional testing of building from the source
> >>> tarball and
> >>> > > running a Pi job on a single node cluster. I also verified that
> >>> the
> >>> > shaded
> >>> > > jars are not empty.
> >>> > >
> >>> > > Found one issue that create-release (probably due to the mvn
> >>> deploy
> >>> > change)
> >>> > > didn't sign the artifacts, but I fixed that by calling mvn one
> >>> more time.
> >>> > > Available here:
> >>> > >
> >>> > > https://repository.apache.org/content/repositories/orgapache
> >>> hadoop-1075/
> >>> > >
> >>> > > This release will run the standard 5 days, closing on Dec 13th
> at
> >>> 12:31pm
> >>> > > Pacific. My +1 to start.
> >>> > >
> >>> > > Best,
> >>> > > Andrew
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> -
> >>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>>
> >>
> >>
> >>
> >>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-12 Thread Andrew Wang
Hi Wei-Chiu,

The patchprocess directory is left over from the create-release process,
and it looks empty to me. We should still file a create-release JIRA to fix
this, but I think this is not a blocker. Would you agree?

Best,
Andrew

On Tue, Dec 12, 2017 at 9:44 AM, Wei-Chiu Chuang <weic...@cloudera.com>
wrote:

> Hi Andrew, thanks for the tremendous effort.
> I found an empty "patchprocess" directory in the source tarball, that is
> not there if you clone from github. Any chance you might have some leftover
> trash when you made the tarball?
> Not wanting to nitpick, but you might want to double check so we don't
> ship anything private to you in public :)
>
>
>
> On Tue, Dec 12, 2017 at 7:48 AM, Ajay Kumar <ajay.ku...@hortonworks.com>
> wrote:
>
>> +1 (non-binding)
>> Thanks for driving this, Andrew Wang!!
>>
>> - downloaded the src tarball and verified md5 checksum
>> - built from source with jdk 1.8.0_111-b14
>> - brought up a pseudo distributed cluster
>> - did basic file system operations (mkdir, list, put, cat) and
>> confirmed that everything was working
>> - Run word count, pi and DFSIOTest
>> - run hdfs and yarn, confirmed that the NN, RM web UI worked
>>
>> Cheers,
>> Ajay
>>
>> On 12/11/17, 9:35 PM, "Xiao Chen" <x...@cloudera.com> wrote:
>>
>> +1 (binding)
>>
>> - downloaded src tarball, verified md5
>> - built from source with jdk1.8.0_112
>> - started a pseudo cluster with hdfs and kms
>> - sanity checked encryption related operations working
>> - sanity checked webui and logs.
>>
>> -Xiao
>>
>> On Mon, Dec 11, 2017 at 6:10 PM, Aaron T. Myers <a...@apache.org>
>> wrote:
>>
>> > +1 (binding)
>> >
>> > - downloaded the src tarball and built the source (-Pdist -Pnative)
>> > - verified the checksum
>> > - brought up a secure pseudo distributed cluster
>> > - did some basic file system operations (mkdir, list, put, cat) and
>> > confirmed that everything was working
>> > - confirmed that the web UI worked
>> >
>> > Best,
>> > Aaron
>> >
>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <
>> andrew.w...@cloudera.com>
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > Let me start, as always, by thanking the efforts of all the
>> contributors
>> > > who contributed to this release, especially those who jumped on
>> the
>> > issues
>> > > found in RC0.
>> > >
>> > > I've prepared RC1 for Apache Hadoop 3.0.0. This release
>> incorporates 302
>> > > fixed JIRAs since the previous 3.0.0-beta1 release.
>> > >
>> > > You can find the artifacts here:
>> > >
>> > > http://home.apache.org/~wang/3.0.0-RC1/
>> > >
>> > > I've done the traditional testing of building from the source
>> tarball and
>> > > running a Pi job on a single node cluster. I also verified that
>> the
>> > shaded
>> > > jars are not empty.
>> > >
>> > > Found one issue that create-release (probably due to the mvn
>> deploy
>> > change)
>> > > didn't sign the artifacts, but I fixed that by calling mvn one
>> more time.
>> > > Available here:
>> > >
>> > > https://repository.apache.org/content/repositories/orgapache
>> hadoop-1075/
>> > >
>> > > This release will run the standard 5 days, closing on Dec 13th at
>> 12:31pm
>> > > Pacific. My +1 to start.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> >
>>
>>
>>
>>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Andrew Wang
Sorry, forgot to push the tag. It's up there now.

On Sun, Dec 10, 2017 at 8:31 PM, Vinod Kumar Vavilapalli <vino...@apache.org
> wrote:

> I couldn't find the release tag for RC1 either - is it just me or has the
> release-process changed?
>
> +Vinod
>
> > On Dec 10, 2017, at 4:31 PM, Sangjin Lee <sj...@apache.org> wrote:
> >
> > Hi Andrew,
> >
> > Thanks much for your effort! Just to be clear, could you please state the
> > git commit id of the RC1 we're voting for?
> >
> > Sangjin
> >
> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> Let me start, as always, by thanking the efforts of all the contributors
> >> who contributed to this release, especially those who jumped on the
> issues
> >> found in RC0.
> >>
> >> I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
> >> fixed JIRAs since the previous 3.0.0-beta1 release.
> >>
> >> You can find the artifacts here:
> >>
> >> http://home.apache.org/~wang/3.0.0-RC1/
> >>
> >> I've done the traditional testing of building from the source tarball
> and
> >> running a Pi job on a single node cluster. I also verified that the
> shaded
> >> jars are not empty.
> >>
> >> Found one issue that create-release (probably due to the mvn deploy
> change)
> >> didn't sign the artifacts, but I fixed that by calling mvn one more
> time.
> >> Available here:
> >>
> >> https://repository.apache.org/content/repositories/
> orgapachehadoop-1075/
> >>
> >> This release will run the standard 5 days, closing on Dec 13th at
> 12:31pm
> >> Pacific. My +1 to start.
> >>
> >> Best,
> >> Andrew
> >>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Andrew Wang
Good point on the mutability. Release tags are immutable, RCs are not.

On Mon, Dec 11, 2017 at 1:39 PM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks Andrew. For the record, the commit id would be
> c25427ceca461ee979d30edd7a4b0f50718e6533. I mention that for completeness
> because of the mutability of tags.
>
> On Mon, Dec 11, 2017 at 10:31 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Sorry, forgot to push the tag. It's up there now.
>>
>> On Sun, Dec 10, 2017 at 8:31 PM, Vinod Kumar Vavilapalli <
>> vino...@apache.org> wrote:
>>
>>> I couldn't find the release tag for RC1 either - is it just me or has
>>> the release-process changed?
>>>
>>> +Vinod
>>>
>>> > On Dec 10, 2017, at 4:31 PM, Sangjin Lee <sj...@apache.org> wrote:
>>> >
>>> > Hi Andrew,
>>> >
>>> > Thanks much for your effort! Just to be clear, could you please state
>>> the
>>> > git commit id of the RC1 we're voting for?
>>> >
>>> > Sangjin
>>> >
>>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <andrew.w...@cloudera.com
>>> >
>>> > wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> Let me start, as always, by thanking the efforts of all the
>>> contributors
>>> >> who contributed to this release, especially those who jumped on the
>>> issues
>>> >> found in RC0.
>>> >>
>>> >> I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates
>>> 302
>>> >> fixed JIRAs since the previous 3.0.0-beta1 release.
>>> >>
>>> >> You can find the artifacts here:
>>> >>
>>> >> http://home.apache.org/~wang/3.0.0-RC1/
>>> >>
>>> >> I've done the traditional testing of building from the source tarball
>>> and
>>> >> running a Pi job on a single node cluster. I also verified that the
>>> shaded
>>> >> jars are not empty.
>>> >>
>>> >> Found one issue that create-release (probably due to the mvn deploy
>>> change)
>>> >> didn't sign the artifacts, but I fixed that by calling mvn one more
>>> time.
>>> >> Available here:
>>> >>
>>> >> https://repository.apache.org/content/repositories/orgapache
>>> hadoop-1075/
>>> >>
>>> >> This release will run the standard 5 days, closing on Dec 13th at
>>> 12:31pm
>>> >> Pacific. My +1 to start.
>>> >>
>>> >> Best,
>>> >> Andrew
>>> >>
>>>
>>>
>>
>


[VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-08 Thread Andrew Wang
Hi all,

Let me start, as always, by thanking the efforts of all the contributors
who contributed to this release, especially those who jumped on the issues
found in RC0.

I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
fixed JIRAs since the previous 3.0.0-beta1 release.

You can find the artifacts here:

http://home.apache.org/~wang/3.0.0-RC1/

I've done the traditional testing of building from the source tarball and
running a Pi job on a single node cluster. I also verified that the shaded
jars are not empty.
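For those who want to reproduce the smoke test, it is roughly the following,
assuming the binary tarball has been extracted and a single-node cluster is
running (paths follow the standard tarball layout, adjust as needed):

$ cd hadoop-3.0.0
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 10 1000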

Found one issue that create-release (probably due to the mvn deploy change)
didn't sign the artifacts, but I fixed that by calling mvn one more time.
Available here:

https://repository.apache.org/content/repositories/orgapachehadoop-1075/

This release will run the standard 5 days, closing on Dec 13th at 12:31pm
Pacific. My +1 to start.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-12-08 Thread Andrew Wang
FYI that we got our last blocker in today, so I'm currently rolling RC1.
Stay tuned!

On Thu, Nov 30, 2017 at 8:32 AM, Allen Wittenauer 
wrote:

>
> > On Nov 30, 2017, at 1:07 AM, Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
> >
> >
> > >. If ATSv1 isn’t replaced by ATSv2, then why is it marked deprecated?
> > Ideally it should not be. Can you point out where it is marked as
> deprecated? If it is in historyserver daemon start, that change made very
> long back when timeline server added.
>
>
> Ahh, I see where all the problems lie.  No one is paying attention to the
> deprecation message because it’s kind of oddly worded:
>
> * It really means “don’t use ‘yarn historyserver’ use ‘yarn
> timelineserver’ ”
> * ‘yarn historyserver’ was removed from the documentation in 2.7.0
> * ‘yarn historyserver’ doesn’t appear in the yarn usage output
> * ‘yarn timelineserver’ runs the exact same class
>
> There’s no reason for ‘yarn historyserver’ to exist in 3.x.  Just run
> ‘yarn timelineserver’ instead.
>
>


2017-12-01 Hadoop 3 release status update

2017-12-01 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-12-01

Haven't written one of these in a month. I had high hopes for RC0, but it
failed due to HADOOP-15058 (create-release site build outputs dummy shaded
jars due to skipShade, PATCH AVAILABLE), which Sangjin found, and then a
number of other blockers were found shortly after that.

We're back to blocker burndown. My new (realistic) goal is to get 3.0.0 out
before Christmas. We could always use more help with reviews; most things
are patch available.



Highlights:

Red flags:

Previously tracked blockers that have been resolved or dropped:

GA blockers:

   - HDFS-12840 - Creating a replicated file in a EC zone does not correctly
   serialized in EditLogs (PATCH AVAILABLE): Has gone through several rounds
   of review, looks close.
   - HADOOP-15080 - Cat-X transitive dependency on org.json library via
   json-lib (OPEN): New issue, waiting on LEGAL but we might need to pull this
   entire feature.
   - HADOOP-15059 - 3.0 deployment cannot work with old version MR tar ball
   which break rolling upgrade (PATCH AVAILABLE): Has gone through some review
   and has a +1 from Daryn, could use confirmation from Vinod and others
   - HADOOP-15058 - create-release site build outputs dummy shaded jars due to
   skipShade (PATCH AVAILABLE): Needs review, asked Allen but might need
   someone else to help.

GA criticals:

   - HDFS-12872 - EC Checksum broken when BlockAccessToken is enabled (PATCH
   AVAILABLE): Patch needs review
   - YARN-7381 - Enable the configuration:
   yarn.nodemanager.log-container-debug-info.enabled (PATCH AVAILABLE): Has
   gone through some review and Wangda +1'd, could use confirmation from Ray
   and others

Features merged for GA:

   - Erasure coding
      - Testing is still ongoing at Cloudera, which resulted in HDFS-12840
      (Creating a replicated file in a EC zone does not correctly serialized
      in EditLogs, PATCH AVAILABLE) and HDFS-12872 (EC Checksum broken when
      BlockAccessToken is enabled, PATCH AVAILABLE).
   - Classpath isolation (HADOOP-11656)
      - No change.
   - Compat guide (HADOOP-13714)
      - We slid a couple more changes into 3.0.0 after RC0 was cancelled,
      making this work more complete.
   - TSv2 alpha 2
      - No change.
   - API-based scheduler configuration (YARN-5734 - OrgQueue for easy
   CapacityScheduler queue configuration management, RESOLVED)
      - No change.
   - HDFS router-based configuration (HDFS-10467 - Router-based HDFS
   federation, RESOLVED)
      - No change.
   - Resource types (YARN-3926 - Extend the YARN resource model for easier
   resource-type management and profiles, RESOLVED)
      - Had some post-merge issues that were resolved, nothing outstanding.


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-21 Thread Andrew Wang
Hi folks,

Thanks again for the testing help with the RC. Here's our dashboard for the
3.0.0 release:

https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329849

Right now we're tracking three blockers:

* HADOOP-15058 is the create-release fix, I just put up a patch which needs
reviews. It's the worst timing, but I'm hoping Allen could give it a quick
sanity check.
* HADOOP-15059 is the MR rolling upgrade issue that Junping found, needs
triage and an assignee. I asked Ray to look at what we've done with our
existing rolling upgrade testing, since it does run an MR job.
* HDFS-12480 is an EC issue that Eddy would like to get in if we're rolling
another RC, looks close.

Is there anything else from this thread that needs to be addressed? I rely
on the dashboard to track blockers, so please file a JIRA and prioritize if
so.

Best,
Andrew



On Tue, Nov 21, 2017 at 2:08 PM, Vrushali C  wrote:

> Hi Vinod,
>
> bq. (b) We need to figure out if this V1 TimelineService should even be
> support given ATSv2.
>
> Yes, I am following this discussion. Let me chat with Rohith and Varun
> about this and we will respond on this thread. As such, my preliminary
> thoughts are that we should continue to support Timeline Service V1 till we
> have the detailed entity level ACLs in V2 and perhaps also a proposal
> around upgrade/migration paths from TSv1 to TSv2.
>
> But in any case, we do need to work towards phasing out Timeline Service
> V1.
>
> thanks
> Vrushali
>
>
> On Tue, Nov 21, 2017 at 1:16 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org
> > wrote:
>
> > >> - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't
> > even work. Not just deprecated in favor of timelineserver as was
> advertised.
> > >
> > >   This works for me in trunk and the bash code doesn’t appear to
> > have changed in a very long time.  Probably something local to your
> > install.  (I do notice that the deprecation message says “starting” which
> > is awkward when the stop command is given though.)  Also: is the
> > deprecation message even true at this point?
> >
> >
> > Sorry, I mischaracterized the problem.
> >
> > The real issue is that I cannot use this command line when the MapReduce
> > JobHistoryServer is already started on the same machine.
> >
> > ~/tmp/yarn$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver
> > WARNING: Use of this script to start YARN daemons is deprecated.
> > WARNING: Attempting to execute replacement "yarn --daemon start" instead.
> > DEPRECATED: Use of this command to start the timeline server is
> deprecated.
> > Instead use the timelineserver command for it.
> > Starting the History Server anyway...
> > historyserver is running as process 86156.  Stop it first.
> >
> > So, it looks like in shell-scripts, there can ever be only one daemon of
> a
> > given name, irrespective of which daemon scripts are invoked.
> >
> > We need to figure out two things here
> >  (a) The behavior of this command. Clearly, it will conflict with the
> > MapReduce JHS - only one of them can be started on the same node.
> >  (b) We need to figure out if this V1 TimelineService should even be
> > support given ATSv2.
> >
> > @Vrushani / @Rohith / @Varun Saxena et.al, if you are watching, please
> > comment on (b).
> >
> > Thanks
> > +Vinod
>


Re: Apache Hadoop 2.8.3 Release Plan

2017-11-21 Thread Andrew Wang
The Aliyun OSS code isn't a small improvement. If you look at Sammi's
comment
<https://issues.apache.org/jira/browse/HADOOP-14964?focusedCommentId=16247085=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16247085>,
it's
a 17 patch series that is being backported in one shot. What we're talking
about is equivalent to merging a feature branch in a maintenance release. I
see that Kai and Chris are having a discussion about the dependency
changes, which indicates this is not a zero-risk change either. We really
should not be changing dependency versions in a maintenance release unless
it's because of a bug.

It's unfortunate from a timing perspective that this missed 2.9.0, but I
still think it should wait for the next minor. Merging a feature into a
maintenance release sets the wrong precedent.

Best,
Andrew

On Tue, Nov 21, 2017 at 1:08 AM, Junping Du <j...@hortonworks.com> wrote:

> Thanks Kai for calling out this feature/improvement for attention and
> Andrew for comments.
>
>
> While I agree that maintenance release should focus on important bug fix
> only, I doubt we have strict rules to disallow any features/improvements to
> land on maint release especially when those are small footprint or low
> impact on existing code/features. In practice, we indeed have 77 new
> features/improvements in the latest 2.7.3 and 2.7.4 releases.
>
>
> Back to HADOOP-14964, I did a quick check and it looks like case here
> belongs to self-contained improvement that has very low impact on existing
> code base, so I am OK with the improvement get landed on branch-2.8 in case
> it is well reviewed and tested.
>
>
> However, as RM of branch-2.8, I have two concerns to accept it in our
> 2.8.3 release:
>
> 1. Timing - as I mentioned in beginning, the main purpose of 2.8.3 are for
> several critical bug fixes and we should target to release it very soon -
> my current plan is to cut RC out within this week inline with waiting
> for 3.0.0 vote closing. Can this improvement be well tested against
> branch-2.8.3 within this strict timeline? It seems a bit rushed unless we
> have strong commitment on test plan and activities in such a tight time.
>
>
> 2. Upgrading - I haven't heard we settle down the plan of releasing this
> feature in 2.9.1 release - though I saw some discussions are going on
> at HADOOP-14964. Assume 2.8.3 is released ahead of 2.9.1 and it includes
> this improvement, then users consuming this feature/improvement have no 2.9
> release to upgrade or forcefully upgrade with regression. We may need a
> better upgrade story here.
>
>
> Pls let me know what you think. Thanks!
>
>
>
> Thanks,
>
>
> Junping
>
>
> --
> *From:* Andrew Wang <andrew.w...@cloudera.com>
> *Sent:* Monday, November 20, 2017 10:22 PM
> *To:* Zheng, Kai
> *Cc:* Junping Du; common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> *Subject:* Re: Apache Hadoop 2.8.3 Release Plan
>
> I'm against including new features in maintenance releases, since they're
> meant to be bug-fix only.
>
> If we're struggling with being able to deliver new features in a safe and
> timely fashion, let's try to address that, not overload the meaning of
> "maintenance release".
>
> Best,
> Andrew
>
> On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>
>> Hi Junping,
>>
>> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>>
>> I have an ask, is it convenient to include the back port work for OSS
>> connector module? We have some Hadoop users that wish to have it by default
>> for convenience, though in the past they used it by back porting
>> themselves. I have raised this and got thoughts from Chris and Steve. Looks
>> like this is more wanted for 2.9 but I wanted to ask again here for broad
>> feedback and thoughts by this chance. The back port patch is available for
>> 2.8 and the one for branch-2 was already in. IMO, 2.8.x is promising as we
>> can see some shift from 2.7.x, hence it's worth more important features and
>> efforts. How would you think? Thanks!
>>
>> https://issues.apache.org/jira/browse/HADOOP-14964
>>
>> Regards,
>> Kai
>>
>> -Original Message-
>> From: Junping Du [mailto:j...@hortonworks.com]
>> Sent: Tuesday, November 14, 2017 9:02 AM
>> To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
>> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Apache Hadoop 2.8.3 Release Plan
>>
>> Hi,
>> We have several important fixes get landed on branch-2.8 and I would
>> li

Re: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Andrew Wang
>
>
> >> If we're struggling with being able to deliver new features in a safe
> and timely fashion, let's try to address that...
>
> This is interesting. Are you aware of any means to do that? Thanks!
>
I've mentioned this a few times on the lists before, but our biggest gap
in keeping branches releasable is automated integration testing.

I think we try to put too much into our minor releases, and features arrive
before they're baked. Having automated integration testing helps with this.
When we were finally able to turn on CI for the 3.0.0 release branch, we
started finding bugs much sooner after they were introduced, which made it
easier to revert before too much other code was built on top. The early
alphas felt Sisyphean at times, with bugs being introduced faster than we
could uncover and fix them.

A smaller example would be release validation. I've long wanted a nightly
Jenkins job that makes an RC and runs some basic checks on it. We end up
rolling extra RCs for small stuff that could have been caught earlier.
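As a sketch, such a job could be little more than a handful of shell steps;
the staging URL, artifact names, and key handling below are placeholders
rather than a description of an existing job:

$ RC_URL=http://home.apache.org/~wang/3.0.0-RC1      # placeholder staging area
$ wget -q $RC_URL/hadoop-3.0.0.tar.gz $RC_URL/hadoop-3.0.0.tar.gz.asc
$ gpg --verify hadoop-3.0.0.tar.gz.asc hadoop-3.0.0.tar.gz
$ tar xzf hadoop-3.0.0.tar.gz
$ # flag shaded client jars that are empty shells (no class files inside)
$ for j in hadoop-3.0.0/share/hadoop/client/*.jar; do
>   jar tf "$j" | grep -q '\.class$' || echo "EMPTY: $j"
> done

Even that much would catch an unsigned artifact or empty shaded client jars
before a vote starts.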

Best,
Andrew


Re: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Andrew Wang
I'm against including new features in maintenance releases, since they're
meant to be bug-fix only.

If we're struggling with being able to deliver new features in a safe and
timely fashion, let's try to address that, not overload the meaning of
"maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai  wrote:

> Hi Junping,
>
> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>
> I have an ask, is it convenient to include the back port work for OSS
> connector module? We have some Hadoop users that wish to have it by default
> for convenience, though in the past they used it by back porting
> themselves. I have raised this and got thoughts from Chris and Steve. Looks
> like this is more wanted for 2.9 but I wanted to ask again here for broad
> feedback and thoughts by this chance. The back port patch is available for
> 2.8 and the one for branch-2 was already in. IMO, 2.8.x is promising as we
> can see some shift from 2.7.x, hence it's worth more important features and
> efforts. How would you think? Thanks!
>
> https://issues.apache.org/jira/browse/HADOOP-14964
>
> Regards,
> Kai
>
> -Original Message-
> From: Junping Du [mailto:j...@hortonworks.com]
> Sent: Tuesday, November 14, 2017 9:02 AM
> To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Apache Hadoop 2.8.3 Release Plan
>
> Hi,
> We have several important fixes landed on branch-2.8 and I would
> like to cut branch-2.8.3 now to start 2.8.3 release work.
> So far, I don't see any pending blockers on 2.8.3, so my current plan
> is to cut off 1st RC of 2.8.3 in next several days:
>  -  For all coming commits to land on branch-2.8, please mark the
> fix version as 2.8.4.
>  -  If there is a really important fix for 2.8.3 and getting
> closed, please notify me ahead before landing it on branch-2.8.3.
> Please let me know if you have any thoughts or comments on the plan.
>
> Thanks,
>
> Junping
> 
> From: dujunp...@gmail.com  on behalf of 俊平堵 <
> junping...@apache.org>
> Sent: Friday, October 27, 2017 3:33 PM
> To: gene...@hadoop.apache.org
> Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.
>
> Hi all,
>
> It gives me great pleasure to announce that the Apache Hadoop
> community has voted to release Apache Hadoop 2.8.2, which is now available
> for download from Apache mirrors[1]. For download instructions please refer
> to the Apache Hadoop Release page [2].
>
> Apache Hadoop 2.8.2 is the first GA release of Apache Hadoop 2.8 line and
> our newest stable release for entire Apache Hadoop project. For major
> changes included in the Hadoop 2.8 line, please refer to the Hadoop 2.8.2 main page[3].
>
> This release has 315 resolved issues since previous 2.8.1 release with
> following
> breakdown:
>- 91 in Hadoop Common
>- 99 in HDFS
>- 105 in YARN
>- 20 in MapReduce
> Please read the log of CHANGES[4] and RELEASENOTES[5] for more details.
>
> The release news is posted on the Hadoop website too, you can go to the
> downloads section directly [6].
>
> Thank you all for contributing to the Apache Hadoop release!
>
>
> Cheers,
>
> Junping
>
>
> [1] http://www.apache.org/dyn/closer.cgi/hadoop/common
>
> [2] http://hadoop.apache.org/releases.html
>
> [3] http://hadoop.apache.org/docs/r2.8.2/index.html
>
> [4]
> http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/
> hadoop-common/release/2.8.2/CHANGES.2.8.2.html
>
> [5]
> http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/
> hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
>
> [6] http://hadoop.apache.org/releases.html#Download
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
On Mon, Nov 20, 2017 at 9:59 PM, Sangjin Lee <sj...@apache.org> wrote:

>
> On Mon, Nov 20, 2017 at 9:46 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Thanks for the spot Sangjin. I think this bug was introduced in
>> create-release by HADOOP-14835. The multi-pass maven build generates these
>> dummy client jars during the site build since skipShade is specified.
>>
>> This might be enough to cancel the RC. Thoughts?
>>
>
> IMO yes. This was one of the key features mentioned in the 3.0 release
> notes. I appreciate your effort for the release Andrew!
>
>
Yea, I was leaning that way too. Let's cancel this RC. I hope to have a new
RC up tomorrow. With the upcoming holidays, we'll probably have to extend
the vote until mid-next week.

I'm also worried about the "mvn deploy" step since I thought it was safe to
specify skipShade there too. I'll check that as well.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
Thanks for the thorough review Vinod, some inline responses:

*Issues found during testing*
>
> Major
>  - The previously supported way of being able to use different tar-balls
> for different sub-modules is completely broken - common and HDFS tar.gz are
> completely empty.
>

Is this something people use? I figured that the sub-tarballs were a relic
from the project split, and nowadays Hadoop is one project with one release
tarball. I actually thought about getting rid of these extra tarballs since
they add extra overhead to a full build.


>  - Cannot enable new UI in YARN because it is under a non-default
> compilation flag. It should be on by default.
>

The yarn-ui profile has always been off by default, AFAIK. It's documented
to turn it on in BUILDING.txt for release builds, and we do it in
create-release.

IMO not a blocker. I think it's also more of a dev question (do we want to
do this on every YARN build?) than a release one.


>  - One decommissioned node in YARN ResourceManager UI always appears to
> start with, even when there are no NodeManagers that are started yet:
> Info :-1, DECOMMISSIONED, null rack. It shows up only in the UI though,
> not in the CLI node -list
>

Is this a blocker? Could we get a JIRA?

Thanks,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
Thanks for the spot Sangjin. I think this bug was introduced in create-release
by HADOOP-14835. The multi-pass maven build generates these dummy client
jars during the site build since skipShade is specified.

This might be enough to cancel the RC. Thoughts?

Best,
Andrew

On Mon, Nov 20, 2017 at 7:51 PM, Sangjin Lee <sj...@apache.org> wrote:

> I checked the client jars that are supposed to contain shaded
> dependencies, and they don't look quite right:
>
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-api-3.0.0.jar
> -rw-r--r--  0 andrew andrew  44531 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-api-3.0.0.jar
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-runtime-3.0.0.jar
> -rw-r--r--  0 andrew andrew  45533 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-runtime-3.0.0.jar
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-minicluster-3.0.0.jar
> -rw-r--r--  0 andrew andrew  47015 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-minicluster-3.0.0.jar
>
> When I look at what's inside those jar, they only seem to include
> pom-related files with no class files. Am I missing something?
>
> When I build from the source with -Pdist, I do get much bigger jars:
> total 113760
> -rw-r--r--  1 sangjinlee  120039211  17055399 Nov 20 17:17
> hadoop-client-api-3.0.0.jar
> -rw-r--r--  1 sangjinlee  120039211  20451447 Nov 20 17:19
> hadoop-client-minicluster-3.0.0.jar
> -rw-r--r--  1 sangjinlee  120039211  20730866 Nov 20 17:18
> hadoop-client-runtime-3.0.0.jar
>
> Sangjin
>
> On Mon, Nov 20, 2017 at 5:52 PM, Sangjin Lee <sj...@apache.org> wrote:
>
>>
>>
>> On Mon, Nov 20, 2017 at 5:26 PM, Vinod Kumar Vavilapalli <
>> vino...@apache.org> wrote:
>>
>>> Thanks for all the push, Andrew!
>>>
>>> Looking at the RC. Went through my usual check-list. Here's my summary.
>>> Will cast my final vote after comparing and validating my findings with
>>> others.
>>>
>>> Verification
>>>
>>>  - [Check] Successful recompilation from source tar-ball
>>>  - [Check] Signature verification
>>>  - [Check] Generating dist tarballs from source tar-ball
>>>  - [Check] Testing
>>> -- Start NN, DN, RM, NM, JHS, Timeline Service
>>> -- Ran dist-shell example, MR sleep, wordcount, randomwriter, sort,
>>> grep, pi
>>> -- Tested CLIs to print nodes, apps etc and also navigated UIs
>>>
>>> Issues found during testing
>>>
>>> Major
>>>  - The previously supported way of being able to use different tar-balls
>>> for different sub-modules is completely broken - common and HDFS tar.gz are
>>> completely empty.
>>>  - Cannot enable new UI in YARN because it is under a non-default
>>> compilation flag. It should be on by default.
>>>  - One decommissioned node in YARN ResourceManager UI always appears to
>>> start with, even when there are no NodeManagers that are started yet:  Info
>>> :-1, DECOMMISSIONED, null rack. It shows up only in the UI though, not
>>> in the CLI node -list
>>>
>>> Minor
>>>  - resourcemanager-metrics.out is going into current directory instead
>>> of log directory
>>>  - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't
>>> even work. Not just deprecated in favor of timelineserver as was advertised.
>>>  - Spurious warnings on CLI
>>> 17/11/20 17:07:34 INFO conf.Configuration:
>>> resource-types.xml not found
>>> 17/11/20 17:07:34 INFO resource.ResourceUtils: Unable to
>>> find 'resource-types.xml'.
>>>
>>> Side notes
>>>
>>>  - When did we stop putting CHANGES files into the source artifacts?
>>>  - Even after "mvn install"ing once, shading is repeated again and again
>>> for every new 'mvn install' even though there are no source changes - we
>>> should see how this can be avoided.
>>>  - Compatibility notes
>>> -- NM's env list is curtailed unlike in 2.x (For e.g,
>>> HADOOP_MAPRED_HOME is not automatically inherited. Correct behavior)
>>> -- Sleep is moved from hadoop-mapreduce-client-jobclient-3.0.0.jar
>>> into hadoop-mapreduce-client-jobclient-3.0.0-tests.jar
>>>
>>
>> Sleep has always been in the jobclient test jar as long as I can
>> remember, so it's not new for 3.0.
>>
>>
>>>
>>> Thanks
>>> +Vinod
>>>
>>> > On Nov 14, 2017, at 1:34 PM, Andrew Wang <andrew.w...@cloudera.com>
>>> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > Thanks as always to the many, many contributors who helped with this
>>> > release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
>>> > available here:
>>> >
>>> > http://people.apache.org/~wang/3.0.0-RC0/
>>> >
>>> > This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>>> >
>>> > 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
>>> > additions include the merge of YARN resource types, API-based
>>> configuration
>>> > of the CapacityScheduler, and HDFS router-based federation.
>>> >
>>> > I've done my traditional testing with a pseudo cluster and a Pi job.
>>> My +1
>>> > to start.
>>> >
>>> > Best,
>>> > Andrew
>>>
>>>
>>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-17 Thread Andrew Wang
Hi Arpit,

I agree the timing is not great here, but extending it to meaningfully
avoid the holidays would mean extending it an extra week (e.g. to the
29th). We've been coordinating with ASF PR for that Tuesday, so I'd really,
really like to get the RC out before then.

In terms of downstream testing, we've done extensive integration testing
with downstreams via the alphas and betas, and we have continuous
integration running at Cloudera against branch-3.0. Because of this, I have
more confidence in our integration for 3.0.0 than most Hadoop releases.

Is it meaningful to extend to say, the 21st, which provides for a full week
of voting?

Best,
Andrew

On Fri, Nov 17, 2017 at 1:27 PM, Arpit Agarwal <aagar...@hortonworks.com>
wrote:

> Hi Andrew,
>
> Thank you for your hard work in getting us to this step. This is our first
> major GA release in many years.
>
> I feel a 5-day vote window ending over the weekend before thanksgiving may
> not provide sufficient time to evaluate this RC especially for downstream
> components.
>
> Would you please consider extending the voting deadline until a few days
> after the thanksgiving holiday? It would be a courtesy to our broader
> community and I see no harm in giving everyone a few days to evaluate it
> more thoroughly.
>
> On a lighter note, your deadline is also 4 minutes short of the required 5
> days. :)
>
> Regards,
> Arpit
>
>
>
> On 11/14/17, 1:34 PM, "Andrew Wang" <andrew.w...@cloudera.com> wrote:
>
> Hi folks,
>
> Thanks as always to the many, many contributors who helped with this
> release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
> available here:
>
> http://people.apache.org/~wang/3.0.0-RC0/
>
> This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>
> 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
> additions include the merge of YARN resource types, API-based
> configuration
> of the CapacityScheduler, and HDFS router-based federation.
>
> I've done my traditional testing with a pseudo cluster and a Pi job.
> My +1
> to start.
>
> Best,
> Andrew
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-17 Thread Andrew Wang
Thanks for the spot, normally create-release spits those out. I uploaded
asc and mds for the release artifacts.
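For reference, checking them is roughly the following; the KEYS location is
the usual ASF convention rather than something stated in this thread, and the
.mds file bundles several digests that can be compared by hand:

$ curl -sO https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
$ gpg --import KEYS
$ gpg --verify hadoop-3.0.0.tar.gz.asc hadoop-3.0.0.tar.gz
$ sha256sum hadoop-3.0.0.tar.gz
$ grep -A1 -i sha256 hadoop-3.0.0.tar.gz.mds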

Best,
Andrew

On Thu, Nov 16, 2017 at 11:33 PM, Akira Ajisaka <aajis...@apache.org> wrote:

> Hi Andrew,
>
> Signatures are missing. Would you upload them?
>
> Thanks,
> Akira
>
>
> On 2017/11/15 6:34, Andrew Wang wrote:
>
>> Hi folks,
>>
>> Thanks as always to the many, many contributors who helped with this
>> release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
>> available here:
>>
>> http://people.apache.org/~wang/3.0.0-RC0/
>>
>> This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>>
>> 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
>> additions include the merge of YARN resource types, API-based
>> configuration
>> of the CapacityScheduler, and HDFS router-based federation.
>>
>> I've done my traditional testing with a pseudo cluster and a Pi job. My +1
>> to start.
>>
>> Best,
>> Andrew
>>
>>


Re: [DISCUSS] A final minor release off branch-2?

2017-11-15 Thread Andrew Wang
Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du  wrote:

> Thanks Vinod to bring up this discussion, which is just in time.
>
> I agree with most responses that option C is not a good choice as our
> community bandwidth is precious and we should focus on very limited
> mainstream branches to develop, test and deployment. Of course, we should
> still follow Apache way to allow any interested committer for rolling up
> his/her own release given specific requirement over the mainstream releases.
>
> I am not biased on option A or B (I will discuss this later), but I think
> a bridge release for upgrading to and back from 3.x is very necessary.
> The reasons are obviously:
> 1. Given lesson learned from previous experience of migration from 1.x to
> 2.x, no matter how careful we tend to be, there is still chance that some
> level of compatibility (source, binary, configuration, etc.) get broken for
> the migration to new major release. Some of these incompatibilities can
> only be identified in runtime after GA release with widely deployed in
> production cluster - we have tons of downstream projects and numerous
> configurations and we cannot cover them all from in-house deployment and
> test.
>

Source and binary compatibility are not required for 3.0.0. It's a new
major release, and there are known, documented incompatibilities in this
regard.

That said, we've done far, far more in this regard compared to previous
major or minor releases. We've compiled all of CDH against Hadoop 3 and run
our suite of system tests for the platform. We've been testing in this way
since 3.0.0-alpha1 and found and fixed plenty of source and binary
compatibility issues during the alpha and beta process. Many of these fixes
trickled down into 2.8 and 2.9.

>
> 2. From recent classpath isolation work, I was surprised to find out that
> many of our downstream projects (HBase, Tez, etc.) are still consuming many
> non-public, server side APIs of Hadoop, not saying the projects/products
> outside of hadoop ecosystem. Our API compatibility test does not (and
> should not) cover these cases and situations. We can claim that new major
> release shouldn't be responsible for these private API changes. But given
> the possibility of breaking existing applications in some way, users could
> be very hesitant to migrate to a 3.x release if there is no safe solution to
> roll back.
>

This is true for 2.x releases as well. Similar to the previous answer,
we've compiled all of CDH against Hadoop 3, providing a much higher level
of assurance even compared to 2.x releases.

>
> 3. Beside incompatibilities, there is also possible to have performance
> regressions (lower throughput, higher latency, slower job running, bigger
> memory footprint or even memory leaking, etc.) for new hadoop releases.
> While the performance impact of migration (if any) could be neglectable to
> some users, other users could be very sensitive and wish to roll back if it
> happens on their production cluster.
>
> Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases
can potentially introduce new bugs.

However, I don't think rollback is the solution. In my experience, users
rarely rollback since it's so disruptive and causes data loss. It's much
more common that they patch and upgrade. With that in mind, I'd rather we
spend our effort on making 3.0.x high-quality vs. making it easier to
rollback.

The root of my concern in announcing a "bridge release" is that it
discourages users from upgrading to 3.0.0 until a bridge release is out. I
strongly believe the level of quality provided by 3.0.0 is at least equal
to new 2.x minor releases, given our extended testing and integration
process, and we don't have bridge releases for 2.x.

This is why I asked for a list of known issues with 2.x -> 3.0 upgrades,
that would necessitate a bridge release. Arun raised a concern about NM
rollback. Are there any other *known* issues?

Best,
Andrew


Re: [DISCUSS] A final minor release off branch-2?

2017-11-14 Thread Andrew Wang
To follow up on my earlier email, I don't think there's need for a bridge
release given that we've successfully tested rolling upgrade from 2.x to
3.0.0. I expect we'll keep making improvements to smooth over any
additional incompatibilities found, but there isn't a requirement that a
user upgrade to a bridge release before upgrading to 3.0.

Otherwise, I don't have a strong opinion about when to discontinue branch-2
releases. Historically, a release line is maintained until interest in it
wanes. If the maintainers are taking care of the backports, it's not much
work for the rest of us to vote on the RCs.

Best,
Andrew

On Mon, Nov 13, 2017 at 4:19 PM, Wangda Tan  wrote:

> Thanks Vinod for staring this,
>
> I'm also leaning towards the plan (A):
>
>
>
>
> * (A)
>   -- Make 2.9.x the last minor release off branch-2
>   -- Have a maintenance release that bridges 2.9 to 3.x
>   -- Continue to make more maintenance releases on 2.8 and 2.9 as necessary
>
> The only part I'm not sure is having a separate bridge release other than
> 3.x.
>
> For the bridge release, Steve's suggestion sounds more doable:
>
> * 3.1+ for new features
> * fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation
> * whoever puts their hand up to do 2.x releases deserves support in testing
> * If someone makes a really strong case to backport a feature from 3.x to
> branch-2 and its backwards compatible, I'm not going to stop them. It's
> just once 3.0 is out and a 3.1 on the way, it's less compelling
>
> This makes community can focus on 3.x releases and fill whatever gaps of
> migrating from 2.x to 3.x.
>
> Best,
> Wangda
>
>
> On Wed, Nov 8, 2017 at 3:57 AM, Steve Loughran 
> wrote:
>
>>
>> > On 7 Nov 2017, at 19:08, Vinod Kumar Vavilapalli 
>> wrote:
>> >
>> >
>> >
>> >
>> >> Frankly speaking, working on some bridging release not targeting any
>> feature isn't so attractive to me as a contributor. Overall, the final
>> minor release off branch-2 is good, we should also give 3.x more time to
>> evolve and mature, therefore it looks to me we would have to work on two
>> release lines meanwhile for some time. I'd like option C), and suggest we
>> focus on the recent releases.
>> >
>> >
>> >
>> > Answering this question is also one of the goals of my starting this
>> thread. Collectively we need to conclude if we are okay or not okay with no
>> longer putting any new feature work in general on the 2.x line after 2.9.0
>> release and move over our focus into 3.0.
>> >
>> >
>> > Thanks
>> > +Vinod
>> >
>>
>>
>> As a developer of new features (e.g the Hadoop S3A committers), I'm
>> mostly already committed to targeting 3.1; the code in there to deal with
>> failures and retries has unashamedly embraced java 8 lambda-expressions in
>> production code: backporting that is going to be traumatic in terms of
>> IDE-assisted code changes and the resultant diff in source between branch-2
>> & trunk. What's worse, its going to be traumatic to test as all my JVMs
>> start with an 8 at the moment, and I'm starting to worry about whether I
>> should bump a windows VM up to Java 9 to keep an eye on Akira's work there.
>> Currently the only testing I'm really doing on java 7 is yetus branch-2 &
>> internal test runs.
>>
>>
>> 3.0 will be out the door, and we can assume that CDH will ship with it
>> soon (*)  which will allow for a rapid round trip time on inevitable bugs:
>> 3.1 can be the release with compatibility tuned, those reported issues
>> addressed. It's certainly where I'd like to focus.
>>
>>
>> At the same time: 2.7.2-2.8.x are the broadly used versions, we can't
>> just say "move to 3.0" & expect everyone to do it, not given we have
>> explicitly got backwards-incompatible changes in. I don't seen people
>> rushing to do it until the layers above are all qualified (HBase, Hive,
>> Spark, ...). Which means big users of 2.7/2,8 won't be in a rush to move
>> and we are going to have to maintain 2.x for a while, including security
>> patches for old versions. One issue there: what if a patch (such as bumping
>> up a JAR version) is incompatible?
>>
>> For me then
>>
>> * 3.1+ for new features
>> * fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation
>> * whoever puts their hand up to do 2.x releases deserves support in
> >> testing
>> * If someone makes a really strong case to backport a feature from 3.x to
>> branch-2 and its backwards compatible, I'm not going to stop them. It's
>> just once 3.0 is out and a 3.1 on the way, it's less compelling
>>
>> -Steve
>>
>> Note: I'm implicitly assuming a timely 3.1 out the door with my work
>> included, all all issues arriving from 3,0 fixed. We can worry when 3.1
>> ships whether there's any benefit in maintaining a 3.0.x, or whether it's
>> best to say "move to 3.1"
>>
>>
>>
>> (*) just a guess based the effort & test reports of Andrew & others
>>
>>
>> 

[VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-14 Thread Andrew Wang
Hi folks,

Thanks as always to the many, many contributors who helped with this
release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
available here:

http://people.apache.org/~wang/3.0.0-RC0/

This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.

3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
additions include the merge of YARN resource types, API-based configuration
of the CapacityScheduler, and HDFS router-based federation.

I've done my traditional testing with a pseudo cluster and a Pi job. My +1
to start.

Best,
Andrew


Re: Heads up: branching branch-3.0.0 for GA

2017-11-14 Thread Andrew Wang
Branching is complete. Please use the 3.0.1 fix version for further commits
to branch-3.0. Ping me if you want something in branch-3.0.0 since I'm
rolling RC0 now.

On Tue, Nov 14, 2017 at 11:08 AM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Hi folks,
>
> We've resolved all the blockers for 3.0.0 and the release notes and
> changelog look good, so I'm going to cut the branch and get started on the
> RC.
>
> * branch-3.0 will advance to 3.0.1-SNAPSHOT
> * branch-3.0.0 will go to 3.0.0
>
> Please keep this in mind when committing.
>
> Cheers,
> Andrew
>


Heads up: branching branch-3.0.0 for GA

2017-11-14 Thread Andrew Wang
Hi folks,

We've resolved all the blockers for 3.0.0 and the release notes and
changelog look good, so I'm going to cut the branch and get started on the
RC.

* branch-3.0 will advance to 3.0.1-SNAPSHOT
* branch-3.0.0 will go to 3.0.0

Please keep this in mind when committing.

Cheers,
Andrew


Re: [DISCUSS] A final minor release off branch-2?

2017-11-06 Thread Andrew Wang
What are the known gaps that need bridging between 2.x and 3.x?

>From an HDFS perspective, we've tested wire compat, rolling upgrade, and
rollback.

>From a YARN perspective, we've tested wire compat and rolling upgrade. Arun
just mentioned an NM rollback issue that I'm not familiar with.

Anything else? External to this discussion, these should be documented as
known issues for 3.0.

Best.
Andrew

On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh  wrote:

> Thanks for starting this discussion VInod.
>
> I agree (C) is a bad idea.
> I would prefer (A) given that ATM, branch-2 is still very close to
> branch-2.9 - and it is a good time to make a collective decision to lock
> down commits to branch-2.
>
> I think we should also clearly define what the 'bridging' release should
> be.
> I assume it means the following:
> * Any 2.x user wanting to move to 3.x must first upgrade to the bridging
> release first and then upgrade to the 3.x release.
> * With regard to state store upgrades (at least NM state stores) the
> bridging state stores should be aware of all new 3.x keys so the implicit
> assumption would be that a user can only rollback from the 3.x release to
> the bridging release and not to the old 2.x release.
> * Use the opportunity to clean up deprecated API ?
> * Do we even want to consider a separate bridging release for 2.7, 2.8 an
> 2.9 lines ?
>
> Cheers
> -Arun
>
> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org>
> wrote:
>
> > Hi all,
> >
> > With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC out
> > (tx Arun / Subru!) and 2.8.2 (tx Junping!), I think it's high time we
> have
> > a discussion on how we manage our developmental bandwidth between 2.x
> line
> > and 3.x lines.
> >
> > Once 3.0 GA goes out, we will have two parallel and major release lines.
> > The last time we were in this situation was back when we did 1.x -> 2.x
> > jump.
> >
> > The parallel releases implies overhead of decisions, branch-merges and
> > back-ports. Right now we already do backports for 2.7.5, 2.8.2, 2.9.1,
> > 3.0.1 and potentially a 3.1.0 in a few months after 3.0.0 GA. And many of
> > these lines - for e.g 2.8, 2.9 - are going to be used for a while at a
> > bunch of large sites! At the same time, our users won't migrate to 3.0 GA
> > overnight - so we do have to support two parallel lines.
> >
> > I propose we start thinking of the fate of branch-2. The idea is to have
> > one final release that helps our users migrate from 2.x to 3.x. This
> > includes any changes on the older line to bridge compatibility issues,
> > upgrade issues, layout changes, tooling etc.
> >
> > We have a few options I think
> >  (A)
> > -- Make 2.9.x the last minor release off branch-2
> > -- Have a maintenance release that bridges 2.9 to 3.x
> > -- Continue to make more maintenance releases on 2.8 and 2.9 as
> > necessary
> > -- All new features obviously only go into the 3.x line as no
> features
> > can go into the maint line.
> >
> >  (B)
> > -- Create a new 2.10 release which doesn't have any new features, but
> > as a bridging release
> > -- Continue to make more maintenance releases on 2.8, 2.9 and 2.10 as
> > necessary
> > -- All new features, other than the bridging changes, go into the 3.x
> > line
> >
> >  (C)
> > -- Continue making branch-2 releases and postpone this discussion for
> > later
> >
> > I'm leaning towards (A) or to a lesser extent (B). Willing to hear
> > otherwise.
> >
> > Now, this obviously doesn't mean blocking of any more minor releases on
> > branch-2. Obviously, any interested committer / PMC can roll up his/her
> > sleeves, create a release plan and release, but we all need to
> acknowledge
> > that versions are not cheap and figure out how the community bandwidth is
> > split overall.
> >
> > Thanks
> > +Vinod
> > PS: The proposal is obviously not to force everyone to go in one
> direction
> > but more of a nudging the community to figure out if we can focus a major
> > part of of our bandwidth on one line. I had a similar concern when we
> were
> > doing 2.8 and 3.0 in parallel, but the impending possibility of spreading
> > too thin is much worse IMO.
> > PPS: (C) is a bad choice. With 2.8 and 2.9 we are already seeing user
> > adoption splintering between two lines. With 2.10, 2.11 etc coexisting
> with
> > 3.0, 3.1 etc, we will revisit the mad phase years ago when we had 0.20.x,
> > 0.20-security coexisting with 0.21, 0.22 etc.
>


2017-10-31 Hadoop 3 release status update

2017-10-31 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-31

Lots of progress towards GA, we look on track for cutting RC0 this week. I
ran the versions script to check the branch matches up with JIRA and fixed
things up, and also checked that the changelog and release notes look
reasonable.

Highlights:

   - Resource types vote has passed and will be merged with branch-3.0
   shortly.
   - Down to three blockers on the dashboard, all being actively revved.

Red flags:

   - Still need to validate that resource types is ready to go once it's
   merged.

Previously tracked GA blockers that have been resolved or dropped:

   - Change of ExecutionType
      - YARN-7178 - Add documentation for Container Update API (RESOLVED):
      Arun got the patch in with reviews from Wangda and Haibo.
   - ReservationSystem
      - YARN-4827 - Document configuration of ReservationSystem for
      FairScheduler (RESOLVED): Yufei and Subru got this in.
   - Rolling upgrade
      - YARN-6142 - Support rolling upgrade between 2.x and 3.x (RESOLVED):
      Ray resolved this since we think it's sufficiently complete.
   - Erasure coding
      - HDFS-12686 - Erasure coding system policy state is not correctly saved
      and loaded during real cluster restart (RESOLVED): Resolved this one to
      incorporate it in HDFS-12682

GA blockers:

   - Rolling upgrade
      - HDFS-11096 - Support rolling upgrade between 2.x and 3.x (PATCH
      AVAILABLE): I asked Sean if we can downgrade this from blocker
   - Erasure coding
      - HDFS-12682 - ECAdmin -listPolicies will always show
      SystemErasureCodingPolicies state as DISABLED (PATCH AVAILABLE):
      Actively being worked on and reviewed, should be in soon
      - HDFS-11467 - Support ErasureCoding section in OIV XML/ReverseXML
      (PATCH AVAILABLE): Waiting on HDFS-12682, I asked if we can work
      concurrently

Features merged for GA:

   - Erasure coding
      - Testing is still ongoing at Cloudera, no new bugs found recently
      - Closing on remaining blockers for GA
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13916 - Document how downstream clients should make use of the
      new shaded client artifacts (OPEN): Seems unlikely to make it
   - Compat guide (HADOOP-13714)
      - HADOOP-14876 - Create downstream developer docs from the compatibility
      guidelines (PATCH AVAILABLE): Patch is being actively revved and
      reviewed, Robert +1'd, Anu posted a big review
      - HADOOP-14875 - Create end user documentation from the compatibility
      guidelines (PATCH AVAILABLE): No patch yet
   - TSv2 alpha 2
      - This was merged, no problems thus far
   - API-based scheduler configuration (YARN-5734 - OrgQueue for easy
   CapacityScheduler queue configuration management, RESOLVED)
      - Merged, no problems thus far
   - HDFS router-based configuration (HDFS-10467 - Router-based HDFS
   federation, RESOLVED)
      - Merged, no problems thus far
   - Resource types (YARN-3926 - Extend the YARN resource model for easier
   resource-type management and profiles, RESOLVED)
      - Vote has passed, Daniel is currently doing the mechanics of merging
      - Need to also perform final validation post-merge

Dropping the "unmerged features" section since we're not letting in
anything else at this point.


[jira] [Resolved] (MAPREDUCE-6987) JHS Log Scanner and Cleaner blocked

2017-10-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved MAPREDUCE-6987.

   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)
   (was: 3.0.0)
   (was: 2.9.0)

> JHS Log Scanner and Cleaner blocked
> ---
>
> Key: MAPREDUCE-6987
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6987
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> {code}
> "Log Scanner/Cleaner #1" #81 prio=5 os_prio=0 tid=0x7fd6c010f000 
> nid=0x11db waiting on condition [0x7fd6aa859000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6c88a80> (a 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>   at 
> org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> "Log Scanner/Cleaner #0" #80 prio=5 os_prio=0 tid=0x7fd6c010c800 
> nid=0x11da waiting on condition [0x7fd6aa95a000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6c8> (a 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>   at 
> org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Both threads waiting on {{FutureTask.get()}} for infinite time after first 
> execution






[jira] [Reopened] (MAPREDUCE-6987) JHS Log Scanner and Cleaner blocked

2017-10-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang reopened MAPREDUCE-6987:


> JHS Log Scanner and Cleaner blocked
> ---
>
> Key: MAPREDUCE-6987
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6987
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>
> {code}
> "Log Scanner/Cleaner #1" #81 prio=5 os_prio=0 tid=0x7fd6c010f000 
> nid=0x11db waiting on condition [0x7fd6aa859000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6c88a80> (a 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>   at 
> org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> "Log Scanner/Cleaner #0" #80 prio=5 os_prio=0 tid=0x7fd6c010c800 
> nid=0x11da waiting on condition [0x7fd6aa95a000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xd6c8> (a 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:47)
>   at 
> org.apache.hadoop.util.concurrent.HadoopScheduledThreadPoolExecutor.afterExecute(HadoopScheduledThreadPoolExecutor.java:69)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Both threads waiting on {{FutureTask.get()}} for infinite time after first 
> execution






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Andrew Wang
FWIW we've been running branch-3.0 unit tests successfully internally,
though we have separate jobs for Common, HDFS, YARN, and MR. The failures
here are probably a property of running everything in the same JVM, which
I've found problematic in the past due to OOMs.
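For anyone who wants to split a local run the same way, a rough sketch with
stock Maven (the module list is illustrative; the first step populates the
local repo so the per-module runs don't rebuild everything):

$ mvn -q install -DskipTests            # populate the local repo once
$ for m in hadoop-common-project/hadoop-common \
>          hadoop-hdfs-project/hadoop-hdfs \
>          hadoop-yarn-project \
>          hadoop-mapreduce-project; do
>   mvn -q test -pl "$m" -Dmaven.test.failure.ignore=true
> done

Each module then runs in its own Maven invocation rather than one long shared
run.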

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer 
wrote:

>
> My plan is currently to:
>
> *  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's
> > causing memory to balloon and then we can work out the appropriate
> > remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point,
> > would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> >> In general, the "solid evidence" of memory leak comes from analysis of
> heapdump, jastack, gc log, etc. In many cases, we can locate/conclude which
> piece of code are leaking memory from the analysis.
> >>
> >> Unfortunately, I cannot find any conclusion from previous comments and
> it even cannot tell which daemons/components of HDFS consumes unexpected
> high memory. Don't sounds like a solid bug report to me.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Junping
> >>
> >>
> >> 
> >> From: Sean Busbey 
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather than
> library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du  > wrote:
> >> Allen,
> >> Do we have any solid evidence to show the HDFS unit tests going
> through the roof are due to serious memory leak by HDFS? Normally, I don't
> expect memory leak are identified in our UTs - mostly, it (test jvm gone)
> is just because of test or deployment issues.
> >> Unless there is concrete evidence, my concern on seriously memory
> leak for HDFS on 2.8 is relatively low given some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 on large production environment for
> months. Non-serious memory leak (like forgetting to close stream in
> non-critical path, etc.) and other non-critical bugs always happens here
> and there that we have to live with.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 
> >> From: Allen Wittenauer >
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <
> a...@effectivemachines.com> wrote:
> >>>
> >>>
> >>>
> >>> With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that's what this "feels" like.
> >>>
> >>> Someone should verify if 2.8.2 has the same issues before a release
> goes out ...
> >>
> >>
> >>FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >>Also: the node didn't die!  Looking through the workspace (so
> the next run will destroy them), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-
> linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >>and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-
> linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >> It looks like my hunch is correct: RAM usage in the HDFS unit tests
> is going through the roof.  It's also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
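
One quick way to check that theory on the build host, sketched here assuming
a Linux node where dmesg is readable, is to look for kernel OOM-killer
activity, which surefire would only observe as a fork that silently vanished:

$ dmesg -T | grep -iE 'out of memory|killed process'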
> >>
> >> Anyway, at this point, branch-2.8 and higher are probably
> fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can 

2017-10-20 Hadoop 3 release status update

2017-10-20 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-20

Apologies for skipping the update last week. Here's how we're tracking for
GA.

Highlights:

   - Merge of HDFS router-based federation and API-based scheduler
   configuration with no reported problems. Kudos to the contributors involved!

Red flags:

   - We're making a last-minute push to get resource types (but not
   resource profiles) in. Coming this late, it's a risk, but we decided it's
   worthwhile for this feature. See Daniel's yarn-dev email for the full
   rationale.
   - Still uncovering EC bugs from testing

Previously tracked GA blockers that have been resolved or dropped:

   - YARN-6623 - Add support to turn off launching privileged containers in
   the container-executor RESOLVED: Committed and resolved
   - Change of ExecutionType
      - YARN-7275 - NM Statestore cleanup for Container updates RESOLVED:
      Patch committed, resolved.
   - ReservationSystem
      - YARN-4859 - [Bug] Unable to submit a job to a reservation when using
      FairScheduler RESOLVED: Yufei tested this and found things mostly
      worked, filed two non-blocker follow-ons: YARN-7347 - Fix the bug in
      Fair scheduler to handle a queue named "root.root" OPEN, and YARN-7348
      - Ignore the vcore in reservation request for fair policy queue OPEN

GA blockers:

   - Change of ExecutionType
      - YARN-7178 - Add documentation for Container Update API OPEN: Still
      no update from Arun, I pinged it.
   - ReservationSystem
      - YARN-4827 - Document configuration of ReservationSystem for
      FairScheduler OPEN: Yufei said he'd work on it as of 2 days ago
   - Rolling upgrade
      - YARN-6142 - Support rolling upgrade between 2.x and 3.x OPEN: I
      pinged this and asked for a status update
      - HDFS-11096 - Support rolling upgrade between 2.x and 3.x PATCH
      AVAILABLE: I pinged this and asked for a status update
   - Erasure coding
      - HDFS-12682 - ECAdmin -listPolicies will always show policy state as
      DISABLED OPEN: New blocker filed this week, Xiao is working on it
      - HDFS-12686 - Erasure coding system policy state is not correctly
      saved and loaded during real cluster restart OPEN: New blocker filed
      this week, Sammi is on it
      - HDFS-12686 - Erasure coding system policy state is not correctly
      saved and loaded during real cluster restart OPEN: Old blocker,
      Huafeng is on it, waiting on review from Wei-Chiu or Sammi

Features merged for GA:

   - Erasure coding
      - Continued bug reporting and fixing based on testing at Cloudera.
      - Two new blockers filed this week, mentioned above.
      - Huafeng completed a patch to reenable disabled EC tests
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13916 - Document how downstream clients should make use of
      the new shaded client artifacts IN PROGRESS: I pinged it
   - Compat guide (HADOOP-13714)
      - HADOOP-14876 - Create downstream developer docs from the
      compatibility guidelines PATCH AVAILABLE: Daniel has a patch up,
      revved based on Steve's review feedback, waiting on Steve's reply
      - HADOOP-14875 - Create end user documentation from the compatibility
      guidelines OPEN: No patch yet
   - TSv2 alpha 2
      - This was merged, no problems thus far [image: (smile)]
   - API-based scheduler configuration YARN-5734 - OrgQueue for easy
   CapacityScheduler queue configuration management RESOLVED
      - Merged, no problems thus far [image: (smile)]
   - HDFS router-based federation HDFS-10467

Re: 2017-10-06 Hadoop 3 release status update

2017-10-06 Thread Andrew Wang
Thanks for the update Allen, appreciate your continued help reviewing this
feature.

Looking at the calendar, we have three weeks from when we want to have GA
RC0 out for vote. We're already dipping into code freeze time landing HDFS
router-based federation and API-based scheduler configuration next week. If
we want to get any more features in, it means slipping the GA date.

So, my current thinking is that we should draw a line after these pending
branches merge. Like before, I'm willing to bend on this if there are
strong arguments, but the quality bar is even higher than it was for beta1,
and we've still got plenty of other blockers/criticals to work on for GA.

If you feel differently, please reach out, I can make myself very available
next week for a call.

Best,
Andrew

On Fri, Oct 6, 2017 at 3:12 PM, Allen Wittenauer <a...@effectivemachines.com>
wrote:

>
> > On Oct 6, 2017, at 1:31 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> >
> >   - Still waiting on Allen to review YARN native services feature.
>
> Fake news.
>
> I’m still -1 on it, at least prior to a patch that posted late
> yesterday. I’ll probably have a chance to play with it early next week.
>
>
> Key problems:
>
> * still haven’t been able to bring up dns daemon due to lacking
> documentation
>
> * it really needs better naming and command structures.  When put
> into the larger YARN context, it’s very problematic:
>
> $ yarn --daemon start resourcemanager
>
> vs.
>
> $ yarn --daemon start apiserver
>
> if you awoke from a deep sleep from inside a cave, which
> one would you expect to “start YARN”? Made worse that the feature is
> called “YARN services” all over the place.
>
> $ yarn service foo
>
> … what does this even mean?
>
> It would be great if other outsiders really looked hard at this
> branch to give the team feedback.   Once it gets released, it’s gonna be
> too late to change it….
>
> As a sidenote:
>
> It’d be great if the folks working on YARN spent some time
> consolidating daemons.  With this branch, it now feels like we’re
> approaching the double digit area of daemons to turn on all the features.
> It’s well past ridiculous, especially considering we still haven’t replaced
> the MRJHS’s feature set to the point we can turn it off.
>
>


2017-10-06 Hadoop 3 release status update

2017-10-06 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-06

The beta1 RC0 vote passed, and beta1 is out! Now tracking GA features.

Highlights:

   - 3.0.0-beta1 has been released!
   - Router-based federation merge vote should be about to pass
   - API-based scheduler configuration merge vote is out, has the votes so
   far

Red flags:

   - Still need to nail down whether we're going to try and merge resource
   profiles. I've been emailing with Wangda and Daniel about this, we need to
   reach a decision ASAP (might already be too late).
   - Still waiting on Allen to review YARN native services feature.

Previously tracked GA blockers that have been resolved or dropped:

   - YARN-7134 - AppSchedulingInfo has a dependency on capacity scheduler
   OPEN: Wangda downgraded this to "Major", dropping from list.

GA blockers:

   - YARN-6623 - Add support to turn off launching privileged containers in
   the container-executor PATCH AVAILABLE: Actively being reviewed
   - Change of ExecutionType
      - YARN-7275 - NM Statestore cleanup for Container updates PATCH
      AVAILABLE: Kartheek has posted a patch, waiting for review
      - YARN-7178 - Add documentation for Container Update API OPEN: No
      update from Arun, though it's just a docs patch
   - ReservationSystem
      - YARN-4859 - [Bug] Unable to submit a job to a reservation when using
      FairScheduler OPEN: Yufei has picked this up
      - YARN-4827 - Document configuration of ReservationSystem for
      FairScheduler OPEN: Yufei has picked this up, just a docs patch
   - Rolling upgrade
      - YARN-6142 - Support rolling upgrade between 2.x and 3.x OPEN: Ray is
      still going through JACC and proto output
      - HDFS-11096 - Support rolling upgrade between 2.x and 3.x PATCH
      AVAILABLE: Sean has revved the patch and is waiting on reviews from
      Ray, Allen

Features merged for GA:

   - Erasure coding
  - Continued bug reporting and fixing based on testing at Cloudera.
  - Still need to finish the 3.0 must-do's
   - Classpath isolation (HADOOP-11656)
   - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714)
      - Synced with Daniel, he plans to wrap up the remaining stuff next
      week
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

Unmerged features:

   - Resource types / profiles (YARN-3926 and YARN-7069) (Wangda Tan)
  - This has been merged for 3.1.0, YARN-7069 tracks follow on work
  - Wangda said that he's okay waiting for 3.1.0 for this, we're
  waiting on Daniel. I synced with Daniel earlier this week, and
he wants to
  try and get some of it into 3.0.0. Waiting on an update.
  - I still need a JIRA query for tracking the state of this.
   - HDFS router-based federation (HDFS-10467) (Inigo Goiri and Chris Douglas)
   - Merge vote should close any minute now
   - API-based scheduler configuration (Jonathan Hung)
  - Merge vote is out, will close next week
   - YARN native services (YARN-5079) (Jian He)
  - Subtasks were filed to address Allen's review comments from the
  previous merge vote, only one pending
  - We need to confirm with Allen that this is ready to go, he hasn't
  been reviewing


Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-04 Thread Andrew Wang
Thanks for the additional review Rohith, much appreciated!

On Wed, Oct 4, 2017 at 12:14 AM, Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> +1 (binding)
>
> Built from source and deployed YARN HA cluster with ATSv2 enabled in
> non-secured cluster.
> - tested for RM HA/work-preservring-restart/ NM-work-preserving restart
> for ATSv2 entities.
> - verified all ATSv2 REST end points to retrieve the entities
> - ran sample MR jobs and distributed jobs
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 4 October 2017 at 05:31, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
>> Thanks everyone for voting! With 4 binding +1s and 7 non-binding +1s, the
>> vote passes.
>>
>> I'll get started on pushing out the release.
>>
>> Best,
>> Andrew
>>
>> On Tue, Oct 3, 2017 at 3:45 PM, Aaron Fabbri <fab...@cloudera.com> wrote:
>>
>> > +1
>> >
>> > Built from source.  Ran S3A integration tests in us-west-2 with S3Guard
>> > (both Local and Dynamo metadatastore).
>> >
>> > Everything worked fine except I hit one integration test failure.  It
>> is a
>> > minor test issue IMO and I've filed HADOOP-14927
>> >
>> > Failed tests:
>> >   ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDe
>> stroyNoBucket:228
>> > Expected an exception, got 0
>> >   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestr
>> oyNoBucket:228
>> > Expected an exception, got 0
>> >
>> >
>> >
>> > On Tue, Oct 3, 2017 at 2:45 PM, Ajay Kumar <ajay.ku...@hortonworks.com>
>> > wrote:
>> >
>> >> +1 (non-binding)
>> >>
>> >> - built from source
>> >> - deployed on single node cluster
>> >> - Basic hdfs operations
>> >> - Run wordcount on a text file
>> >> Thanks,
>> >> Ajay
>> >>
>> >>
>> >> On 10/3/17, 1:04 PM, "Eric Badger" <ebad...@oath.com.INVALID> wrote:
>> >>
>> >> +1 (non-binding)
>> >>
>> >> - Verified all checksums and signatures
>> >> - Built native from source on macOS 10.12.6 and RHEL 7.1
>> >> - Deployed a single node pseudo cluster
>> >> - Ran pi and sleep jobs
>> >> - Verified Docker was marked as experimental
>> >>
>> >> Thanks,
>> >>
>> >> Eric
>> >>
>> >> On Tue, Oct 3, 2017 at 1:41 PM, John Zhuge <john.zh...@gmail.com>
>> >> wrote:
>> >>
>> >> > +1 (binding)
>> >> >
>> >> >- Verified checksums and signatures of all tarballs
>> >> >- Built source with native, Java 1.8.0_131-b11 on Mac OS X
>> >> 10.12.6
>> >> >- Verified cloud connectors:
>> >> >   - All S3A integration tests
>> >> >   - All ADL live unit tests
>> >> >- Deployed both binary and built source to a pseudo cluster,
>> >> passed the
>> >> >following sanity tests in insecure, SSL, and SSL+Kerberos
>> mode:
>> >> >   - HDFS basic and ACL
>> >> >   - DistCp basic
>> >> >   - MapReduce wordcount (only failed in SSL+Kerberos mode for
>> >> binary
>> >> >   tarball, probably unrelated)
>> >> >   - KMS and HttpFS basic
>> >> >   - Balancer start/stop
>> >> >
>> >> > Hit the following errors but they don't seem to be blocking:
>> >> >
>> >> > == Missing dependencies during build ==
>> >> >
>> >> > > ERROR: hadoop-aliyun has missing dependencies:
>> json-lib-jdk15.jar
>> >> > > ERROR: hadoop-azure has missing dependencies:
>> >> jetty-util-ajax-9.3.19.
>> >> > > v20170502.jar
>> >> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> >> okhttp-2.4.0.jar
>> >> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> >> okio-1.4.0.jar
>> >> >
>> >> >
>> >> > Filed HADOOP-14923, HADOOP-14924, and HADOOP-14925.
>> >> >
>> >> > == Unit tests failed in Kerberos+SSL mode for KMS and HttpFs
>> >> default HTTP
>> >> > servlet /c

Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-03 Thread Andrew Wang
Thanks everyone for voting! With 4 binding +1s and 7 non-binding +1s, the
vote passes.

I'll get started on pushing out the release.

Best,
Andrew

On Tue, Oct 3, 2017 at 3:45 PM, Aaron Fabbri <fab...@cloudera.com> wrote:

> +1
>
> Built from source.  Ran S3A integration tests in us-west-2 with S3Guard
> (both Local and Dynamo metadatastore).
>
> Everything worked fine except I hit one integration test failure.  It is a
> minor test issue IMO and I've filed HADOOP-14927
>
> Failed tests:
>   ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
> Expected an exception, got 0
>   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
> Expected an exception, got 0
>
>
>
> On Tue, Oct 3, 2017 at 2:45 PM, Ajay Kumar <ajay.ku...@hortonworks.com>
> wrote:
>
>> +1 (non-binding)
>>
>> - built from source
>> - deployed on single node cluster
>> - Basic hdfs operations
>> - Run wordcount on a text file
>> Thanks,
>> Ajay
>>
>>
>> On 10/3/17, 1:04 PM, "Eric Badger" <ebad...@oath.com.INVALID> wrote:
>>
>> +1 (non-binding)
>>
>> - Verified all checksums and signatures
>> - Built native from source on macOS 10.12.6 and RHEL 7.1
>> - Deployed a single node pseudo cluster
>> - Ran pi and sleep jobs
>> - Verified Docker was marked as experimental
>>
>> Thanks,
>>
>> Eric
>>
>> On Tue, Oct 3, 2017 at 1:41 PM, John Zhuge <john.zh...@gmail.com>
>> wrote:
>>
>> > +1 (binding)
>> >
>> >- Verified checksums and signatures of all tarballs
>> >- Built source with native, Java 1.8.0_131-b11 on Mac OS X
>> 10.12.6
>> >- Verified cloud connectors:
>> >   - All S3A integration tests
>> >   - All ADL live unit tests
>> >- Deployed both binary and built source to a pseudo cluster,
>> passed the
>> >following sanity tests in insecure, SSL, and SSL+Kerberos mode:
>> >   - HDFS basic and ACL
>> >   - DistCp basic
>> >   - MapReduce wordcount (only failed in SSL+Kerberos mode for
>> binary
>> >   tarball, probably unrelated)
>> >   - KMS and HttpFS basic
>> >   - Balancer start/stop
>> >
>> > Hit the following errors but they don't seem to be blocking:
>> >
>> > == Missing dependencies during build ==
>> >
>> > > ERROR: hadoop-aliyun has missing dependencies: json-lib-jdk15.jar
>> > > ERROR: hadoop-azure has missing dependencies:
>> jetty-util-ajax-9.3.19.
>> > > v20170502.jar
>> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> okhttp-2.4.0.jar
>> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> okio-1.4.0.jar
>> >
>> >
>> > Filed HADOOP-14923, HADOOP-14924, and HADOOP-14925.
>> >
>> > == Unit tests failed in Kerberos+SSL mode for KMS and HttpFs
>> default HTTP
>> > servlet /conf, /stacks, and /logLevel ==
>> >
>> > One example below:
>> >
>> > >Connecting to
>> > > https://localhost:14000/logLevel?log=org.apache.hadoop.fs.
>> http.server.
>> > HttpFSServer
>> > >Exception in thread "main"
>> > > org.apache.hadoop.security.authentication.client.
>> > AuthenticationException:
>> > > Authentication failed, URL:
>> > > https://localhost:14000/logLevel?log=org.apache.hadoop.fs.
>> http.server.
>> > HttpFSServer=jzhuge,
>> > > status: 403, message: GSSException: Failure unspecified at
>> GSS-API level
>> > > (Mechanism level: Request is a replay (34))
>> >
>> >
>> > The /logLevel failure will affect command "hadoop daemonlog".
>> >
>> >
>> > On Tue, Oct 3, 2017 at 10:56 AM, Andrew Wang <
>> andrew.w...@cloudera.com>
>> > wrote:
>> >
>> > > Thanks for all the votes thus far! We've gotten the binding +1's
>> to close
>> > > the release, though are there contributors who could kick the
>> tires on
>> > > S3Guard and YARN TSv2 alpha2? These are the two new features
>> merged since
>> > > alpha4, so it'd be good to get some coverage.
>>  

Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-03 Thread Andrew Wang
Thanks for all the votes thus far! We've gotten the binding +1's to close
the release, though are there contributors who could kick the tires on
S3Guard and YARN TSv2 alpha2? These are the two new features merged since
alpha4, so it'd be good to get some coverage.
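
For anyone picking up the S3Guard ask, a rough smoke-test sketch (the bucket,
table, and region names below are placeholders) is to create a DynamoDB
metadata store and then confirm that the bucket reports as guarded:

$ hadoop s3guard init -meta dynamodb://example-table -region us-west-2 s3a://example-bucket/
$ hadoop s3guard bucket-info s3a://example-bucket/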



On Tue, Oct 3, 2017 at 9:45 AM, Brahma Reddy Battula <bra...@apache.org>
wrote:

>
> Thanks Andrew.
>
> +1 (non binding)
>
> --Built from source
> --installed 3 node HA cluster
> --Verified shell commands and UI
> --Ran wordcount/pic jobs
>
>
>
>
> On Fri, 29 Sep 2017 at 5:34 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Hi all,
>>
>> Let me start, as always, by thanking the many, many contributors who
>> helped
>> with this release! I've prepared an RC0 for 3.0.0-beta1:
>>
>> http://home.apache.org/~wang/3.0.0-beta1-RC0/
>>
>> This vote will run five days, ending on Nov 3rd at 5PM Pacific.
>>
>> beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
>> improvements, and feature enhancements. Notable additions include the
>> addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
>> shaded client, and HDFS erasure coding pluggable policy support.
>>
>> I've done the traditional testing of running a Pi job on a pseudo cluster.
>> My +1 to start.
>>
>> We're working internally on getting this run through our integration test
>> rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.
>>
>> Best,
>> Andrew
>>
> --
>
>
>
> --Brahma Reddy Battula
>


2017-09-20 Hadoop 3 release status update

2017-09-29 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-29

After about a month of slip, RC0 has been sent out for a VOTE. Focus now
turns to GA, where we will attempt to keep the original beta1 target date
(early November).

Highlights:

   - RC0 vote was sent out on Thursday, two binding +1's so far.

Red flags:

   - Resource profiles still has a number of pending subtasks, which is
   concerning from a schedule perspective. I emailed Wangda about this, and we
   need to discuss with other key contributors.
   - Native services has one pending subtask but we haven't gotten
   follow-on reviews from Allen (who -1'd the earlier merge vote). Need to
   confirm that we've satisfied his feedback.

Previously tracked beta1 blockers that have been resolved or dropped:

   - YARN-6623 was pushed out of beta1 to GA, has been committed so we can
   drop it from tracking.
   - HADOOP-14897  (Loosen
   compatibility guidelines for native dependencies): Patch committed!

beta1 blockers:

   - None, RC0 is out

GA blockers:

   - YARN-7134 - AppSchedulingInfo has a dependency on capacity scheduler
   OPEN: this one popped out of nowhere, I don't have an update yet.
   - YARN-7178 - Add documentation for Container Update API OPEN: this also
   popped out of nowhere, no update yet.
   - YARN-7275 - NM Statestore cleanup for Container updates OPEN: Ditto
   - YARN-4859 - [Bug] Unable to submit a job to a reservation when using
   FairScheduler OPEN: Ditto
   - YARN-4827 - Document configuration of ReservationSystem for
   FairScheduler OPEN: Ditto

Features merged for GA:

   - Erasure coding
  - People are looking more at the flaky tests and nice-to-haves
  - Some bugs reported and being fixed based on testing at Cloudera
  - Need to finish the 3.0 must-do's.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Sean has posted a new rev of the rolling upgrade script
  - Some YARN PB backward compat issues that we decided weren't
  blockers and are scheduled for GA
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13917 (Ensure nightly builds run the integration tests for
      the shaded client): Resolved, Sean retriggered and determined that
      this works.
      - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714)
  - A few subtasks are targeted at GA
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

Unmerged features:

   - Resource profiles (YARN-3926 and YARN-7069) (Wangda Tan)
  - This has been merged for 3.1.0, YARN-7069 tracks follow on work
  - ~7 patch available subtasks, I asked Wangda to set up a JIRA query
  for tracking this
   - HDFS router-based federation (HDFS-10467) (Inigo Goiri and Chris Douglas)
   - Inigo sent out the merge vote
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan sent out a discuss thread for merge, thinking is early
  next week. Larry did a security-oriented review.
   - YARN native services (YARN-5079) (Jian He)
  - Subtasks were filed to address Allen's review comments from the
  previous merge vote, only one pending
  - We need to confirm with Allen that this is ready to go, he hasn't
  been reviewing


Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread Andrew Wang
Hi Jonathan,

I'm okay with putting this into branch-3.0 for GA if it can be merged
within the next two weeks. Even though beta1 has slipped by a month, I want
to stick to the targeted GA data of Nov 1st as much as possible. Of course,
let's not sacrifice quality or stability for speed; if something's not
ready, let's defer it to 3.1.0.

Subru, have you been able to review this feature from the 2.9.0
perspective? It'd add confidence if you think it's immediately ready for
merging to branch-2 for 2.9.0.

Thanks,
Andrew

On Thu, Sep 28, 2017 at 11:32 AM, Jonathan Hung 
wrote:

> Hi everyone,
>
> Starting this thread to discuss merging API-based scheduler configuration
> to trunk/branch-2. The feature adds the framework for allowing users to
> modify scheduler configuration via REST or CLI using a configurable backend
> (leveldb/zk are currently supported), and adds capacity scheduler support
> for this. The umbrella JIRA is YARN-5734. All the required work for this
> feature is done and committed to branch YARN-5734, and a full diff has been
> generated at YARN-7241.
>
> Regarding compatibility, this feature is configurable and turned off by
> default.
>
> The feature has been tested locally on a couple RMs (since it is an RM
> only change), with queue addition/removal/updates tested on single RM
> (leveldb) and two RMs (zk). Also we verified the original configuration
> update mechanism (via refreshQueues) is unaffected when the feature is
> off/not configured.
>
> Our original plan was to merge this to trunk (which is what the YARN-7241
> diff is based on), and port to branch-2 before the 2.9 release. @Andrew,
> what are your thoughts on also merging this to branch-3.0?
>
> Thanks!
>
> Jonathan Hung
>


[VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-09-28 Thread Andrew Wang
Hi all,

Let me start, as always, by thanking the many, many contributors who helped
with this release! I've prepared an RC0 for 3.0.0-beta1:

http://home.apache.org/~wang/3.0.0-beta1-RC0/

This vote will run five days, ending on Nov 3rd at 5PM Pacific.

beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
improvements, and feature enhancements. Notable additions include the
addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
shaded client, and HDFS erasure coding pluggable policy support.

I've done the traditional testing of running a Pi job on a pseudo cluster.
My +1 to start.
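
For reference, that smoke test is just the stock examples job against a
pseudo-distributed cluster; a sketch, with the examples jar path and version
depending on the build:

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 16 1000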

We're working internally on getting this run through our integration test
rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.

Best,
Andrew


Re: Heads up: branching branch-3.0.0-beta1 off of branch-3.0

2017-09-28 Thread Andrew Wang
Branch has been cut, branch-3.0 is now open for commits for 3.0.0 GA.

HEAD of branch-3.0.0-beta1 is 2223393ad1d5ffdd62da79e1546de79c6259dc12.

On Thu, Sep 28, 2017 at 10:52 AM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Hi folks,
>
> We've driven the blocker count down to 0, and I went through and made sure
> the fix versions and release notes and so on are all lined up.
>
> I'm going to cut branch-3.0.0-beta1 off branch-3.0 and try and get RC0 out
> today.
>
> Cheers,
> Andrew
>


Heads up: branching branch-3.0.0-beta1 off of branch-3.0

2017-09-28 Thread Andrew Wang
Hi folks,

We've driven the blocker count down to 0, and I went through and made sure
the fix versions and release notes and so on are all lined up.

I'm going to cut branch-3.0.0-beta1 off branch-3.0 and try and get RC0 out
today.

Cheers,
Andrew


2017-09-22 Hadoop 3 release status update

2017-09-22 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-22

We've had some late breaking blockers related to Docker support that are
delaying the release. We're on a day-by-day slip at this point.



Highlights:

   - I did a successful test create-release earlier this week.

Red flags:

   - Docker work resulted in some last minute blockers

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14771 
(hadoop-client
   does not include hadoop-yarn-client): Dropped this from the blocker list as
   it's mainly for documentation purposes
   - HDFS-12247 (Rename AddECPolicyResponse to
   AddErasureCodingPolicyResponse) was committed.

beta1 blockers:

   - YARN-6623  (Add
   support to turn off launching privileged containers in the
   container-executor): This is a newly escalated blocker related to the
   Docker work in YARN. Patch is up but we're still waiting on a commit.
   - HADOOP-14897  (Loosen
   compatibility guidelines for native dependencies): Raised by Chris Douglas,
   Daniel will post a patch soon.

beta1 features:

   - Erasure coding
  - Resolved last must-do for beta1!
  - People are looking more at the flaky tests and nice-to-haves
  - Eddy continues to make improvements to block reconstruction
  codepaths
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray has gone through almost all the YARN protos and thinks we're okay
  to move forwards.
  - I think we'll move forward without this committed, given that Sean
  has run it successfully.
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13917 (Ensure nightly builds run the integration tests for
      the shaded client): Sean wants to get this in before beta1 if there's
      time, it's already catching issues. Relies on YETUS-543 which I
      reviewed, waiting on Allen.
  - HADOOP-14771 might be squeezed in if there's time.
   - Compat guide (HADOOP-13714)
  - HADOOP-14897 Above mentioned blocker filed by Chris Douglas.
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.
   - YARN native services
  - Still not 100% clear when this will land.


Re: [DISCUSS] moving to Apache Yetus Audience Annotations

2017-09-22 Thread Andrew Wang
Yea, unfortunately I'd say backburner it. This would have been perfect
during alpha.

On Fri, Sep 22, 2017 at 11:14 AM, Sean Busbey <bus...@cloudera.com> wrote:

> I'd refer to it as an incompatible change; we expressly label the
> annotations as IA.Public.
>
> If you think it's too late to get in for 3.0, I can make a jira and put it
> on the back burner for when trunk goes to 4.0?
>
> On Fri, Sep 22, 2017 at 12:49 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Is this itself an incompatible change? I imagine the bytecode will be
>> different.
>>
>> I think we're too late to do this for beta1 given that I want to cut an
>> RC0 today.
>>
>> On Fri, Sep 22, 2017 at 7:03 AM, Sean Busbey <bus...@cloudera.com> wrote:
>>
>>> When Apache Yetus formed, it started with several key pieces of Hadoop
>>> that
>>> looked reusable. In addition to our contribution testing infra, the
>>> project
>>> also stood up a version of our audience annotations for delineating the
>>> public facing API[1].
>>>
>>> I recently got the Apache HBase community onto the Yetus version of those
>>> annotations rather than their internal fork of the Hadoop ones[2]. It
>>> wasn't pretty, mostly a lot of blind sed followed by spot checking and
>>> reliance on automated tests.
>>>
>>> What do folks think about making the jump ourselves? I'd be happy to work
>>> through things, either as one unreviewable monster or per-module
>>> transitions (though a piece-meal approach might complicate our javadoc
>>> situation).
>>>
>>>
>>> [1]: http://yetus.apache.org/documentation/0.5.0/interface-classi
>>> fication/
>>> [2]: https://issues.apache.org/jira/browse/HBASE-17823
>>>
>>> --
>>> busbey
>>>
>>
>>
>
>
> --
> busbey
>


Re: [DISCUSS] moving to Apache Yetus Audience Annotations

2017-09-22 Thread Andrew Wang
Is this itself an incompatible change? I imagine the bytecode will be
different.

I think we're too late to do this for beta1 given that I want to cut an RC0
today.

On Fri, Sep 22, 2017 at 7:03 AM, Sean Busbey  wrote:

> When Apache Yetus formed, it started with several key pieces of Hadoop that
> looked reusable. In addition to our contribution testing infra, the project
> also stood up a version of our audience annotations for delineating the
> public facing API[1].
>
> I recently got the Apache HBase community onto the Yetus version of those
> annotations rather than their internal fork of the Hadoop ones[2]. It
> wasn't pretty, mostly a lot of blind sed followed by spot checking and
> reliance on automated tests.
>
> What do folks think about making the jump ourselves? I'd be happy to work
> through things, either as one unreviewable monster or per-module
> transitions (though a piece-meal approach might complicate our javadoc
> situation).
>
>
> [1]: http://yetus.apache.org/documentation/0.5.0/interface-classification/
> [2]: https://issues.apache.org/jira/browse/HBASE-17823
>
> --
> busbey
>


2017-09-19 Hadoop 3 release status update

2017-09-19 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-19

Sorry for the late update. We're down to one blocker and one EC must do!
Made great progress over the last week and a bit.

We will likely cut RC0 this week.

Highlights:

   - Down to just two blocker issues!

Red flags:

   - HDFS unit tests are quite flaky. Some blockers were filed and then
   resolved or downgraded. More work to do here.

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14738  (Remove
   S3N and obsolete bits of S3A; rework docs): Committed!
   - HADOOP-14284  (Shade
   Guava everywhere): We resolved this since we decided it was unnecessary for
   beta1.
   - YARN-7162  (Remove
   XML excludes file format): Robert committed after review from Junping.
   - HADOOP-14847  (Remove
   Guava Supplier and change to java Supplier in AMRMClient and
   AMRMClientAysnc): Committed!
   - HADOOP-14238 
(Rechecking
   Guava's object is not exposed to user-facing API): We dropped this off the
   blocker list in the absence of other known issues
   - HADOOP-14835  (mvn
   site build throws SAX errors): I committed after further discussion and
   review with Sean Mackrory and Allen. Planning to switch to japicmp for
   later releases.
   - HDFS-12218  (Rename
   split EC / replicated block metrics in BlockManager): Committed.


beta1 blockers:

   - HADOOP-14771 
(hadoop-client
   does not include hadoop-yarn-client): This was committed but then reverted
   since it broke the build. Haibo and Sean are actively pressing towards a
   correct fix.


beta1 features:

   - Erasure coding
  - Resolved a number of must-dos
 - HDFS-7859 (fsimage changes) was committed!
 - HDFS-12395 (edit log changes) was also committed!
 - HDFS-12218 is discussed above.
  - Remaining blockers:
 - HDFS-12447 is to refactor some of the fsimage code, Andrew needs
 to review
  - Also been progress cleaning up the flaky unit tests, still more to
  do
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray has gone through almost all the YARN protos and thinks we're okay
  to move forwards.
  - I think we'll move forward without this committed, given that Sean
  has run it successfully.
   - Classpath isolation (HADOOP-11656)
  - We have just HADOOP-14771 left.
   - Compat guide (HADOOP-13714)
  - This was committed! Some follow-on work filed for GA.
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.
   - YARN native services
  - Still not 100% clear when this will land.


Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-11 Thread Andrew Wang
Thanks for your consideration Jian, let's track this for GA then.

Best,
Andrew

On Fri, Sep 8, 2017 at 3:02 PM, Jian He <j...@hortonworks.com> wrote:

> Hi Andrew,
>
> At this point, there are no more release blockers including documentations
> from our side - all work done.
> But I agree it is too close to the release, after talking with other team
> members, we are fine to drop  this from beta,
>
> And we want to target this for GA.
> I’m withdrawing this vote and will start afresh vote later for GA.
> Thanks all who voted this effort !
>
> Thanks,
> Jian
>
>
> > On Sep 7, 2017, at 3:59 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> >
> > Hi folks,
> >
> > This vote closes today. I see a -1 from Allen on inclusion in beta1. I
> see
> > there's active fixing going on, but given that we're one week out from
> RC0,
> > I think we should drop this from beta1.
> >
> > Allen, Jian, others, is this reasonable? What release should we retarget
> > this for? I don't have a sense for how much work there is left to do, but
> > as a reminder, we're planning GA for Nov 1st, and 3.1.0 for January.
> >
> > Best,
> > Andrew
> >
> > On Wed, Sep 6, 2017 at 10:19 AM, Jian He <j...@hortonworks.com> wrote:
> >
> >>>  Please correct me if I’m wrong, but the current summary of the
> >> branch, post these changes, looks like:
> >> Sorry for confusion, I was actively writing the formal documentation for
> >> how to use/how it works etc. and will post soon in a few hours.
> >>
> >>
> >>> On Sep 6, 2017, at 10:15 AM, Allen Wittenauer <
> a...@effectivemachines.com>
> >> wrote:
> >>>
> >>>
> >>>> On Sep 5, 2017, at 6:23 PM, Jian He <j...@hortonworks.com> wrote:
> >>>>
> >>>>>If it doesn’t have all the bells and whistles, then it shouldn’t
> >> be on port 53 by default.
> >>>> Sure, I’ll change the default port to not use 53 and document it.
> >>>>>*how* is it getting launched on a privileged port? It sounds like
> >> the expectation is to run “command” as root.   *ALL* of the previous
> >> daemons in Hadoop that needed a privileged port used jsvc.  Why isn’t
> this
> >> one? These questions matter from a security standpoint.
> >>>> Yes, it is running as “root” to be able to use the privileged port.
> The
> >> DNS server is not yet integrated with the hadoop script.
> >>>>
> >>>>> Check the output.  It’s pretty obviously borked:
> >>>> Thanks for pointing out. Missed this when rebasing onto trunk.
> >>>
> >>>
> >>>  Please correct me if I’m wrong, but the current summary of the
> >> branch, post these changes, looks like:
> >>>
> >>>  * A bunch of mostly new Java code that may or may not have
> >> javadocs (post-revert YARN-6877, still working out HADOOP-14835)
> >>>  * ~1/3 of the docs are roadmap/TBD
> >>>  * ~1/3 of the docs are for an optional DNS daemon that has
> >> no end user hook to start it
> >>>  * ~1/3 of the docs are for a REST API that comes from some
> >> undefined daemon (apiserver?)
> >>>  * Two new, but undocumented, subcommands to yarn
> >>>  * There are no docs for admins or users on how to actually
> >> start or use this completely new/separate/optional feature
> >>>
> >>>  How are outside people (e.g., non-branch committers) supposed to
> >> test this new feature under these conditions?
> >>>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>
> >>
>
>


2017-09-07 Hadoop 3 release status update

2017-09-07 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-07

Slightly early update since I'll be out tomorrow. We're one week out, and
focus is on blocker burndown.

Highlights:

   - 3.1.0 release planning is underway, led by Wangda. Target release date
   is in January.

Red flags:

   - YARN native services merge vote got a -1 for beta1, I recommended we
   drop it from beta1 and retarget for a later release.
   - 11 blockers on the dashboard, one more than last week [image: (sad)]

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14826 was duped to HADOOP-14738.
   - YARN-5536  (Multiple
   format support (JSON, etc.) for exclude node file in NM graceful
   decommission with timeout): Downgraded in priority in favor of YARN-7162
   which Robert has posted a patch for.
   - MAPREDUCE-6941 (The default setting doesn't work for MapReduce job): I
   resolved this and Junping confirmed this is fine.


beta1 blockers:

   - HADOOP-14738  (Remove
   S3N and obsolete bits of S3A; rework docs): Steve has been actively revving
   this with our new committer Aaron Fabbri ready to review. The scope has
   expanded from HADOOP-14826, so it's not just a doc update.
   - HADOOP-14284  (Shade
   Guava everywhere): No change since last week. This is an umbrella JIRA.
   - HADOOP-14771 
(hadoop-client
   does not include hadoop-yarn-client): Patch up, needs review, still waiting
   on Busbey. Bharat gave it a review.
   - YARN-7162  (Remove
   XML excludes file format): Robert has posted a patch and is waiting for a
   review.
   - HADOOP-14238 
(Rechecking
   Guava's object is not exposed to user-facing API): Bharat took this up and
   turned it into an umbrella.
  - HADOOP-14847
 (Remove
  Guava Supplier and change to java Supplier in AMRMClient and
  AMRMClientAysnc) Bharat posted a patch on a subtask to fix the
known Guava
  Supplier issue in AMRMClient. Needs a review.
   - HADOOP-14835  (mvn
   site build throws SAX errors): I'm working on this. Debugged it and have a
   proposed patch up, discussing with Allen.
   - HDFS-12218  (Rename
   split EC / replicated block metrics in BlockManager): I'm working on this,
   just need to commit it, already have a +1 from Eddy.


beta1 features:

   - Erasure coding
  - There are three must-dos, all being actively worked on.
  - HDFS-7859 is being actively reviewed and revved by Sammi and Kai
  and Eddy.
  - HDFS-12395 was split out of HDFS-7859 to do the edit log changes.
  - HDFS-12218 is discussed above.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray and Allen reviewed Sean's HDFS rolling upgrade scripts.
  - Sean did a run through of the HDFS JACC report and it looked fine.
   - Classpath isolation (HADOOP-11656)
  - Sean has retriaged the subtasks and has been posting patches.
   - Compat guide (HADOOP-13714)
  - Daniel has been collecting feedback on dev lists, but still needs a
  detailed review of the patch.
   - YARN native services
  - Jian sent out the merge vote, but it's been -1'd for beta1 by
  Allen. I propose we drop this from beta1 scope and retarget.
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.


Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-07 Thread Andrew Wang
Hi folks,

This vote closes today. I see a -1 from Allen on inclusion in beta1. I see
there's active fixing going on, but given that we're one week out from RC0,
I think we should drop this from beta1.

Allen, Jian, others, is this reasonable? What release should we retarget
this for? I don't have a sense for how much work there is left to do, but
as a reminder, we're planning GA for Nov 1st, and 3.1.0 for January.

Best,
Andrew

On Wed, Sep 6, 2017 at 10:19 AM, Jian He  wrote:

> >   Please correct me if I’m wrong, but the current summary of the
> branch, post these changes, looks like:
> Sorry for confusion, I was actively writing the formal documentation for
> how to use/how it works etc. and will post soon in a few hours.
>
>
> > On Sep 6, 2017, at 10:15 AM, Allen Wittenauer 
> wrote:
> >
> >
> >> On Sep 5, 2017, at 6:23 PM, Jian He  wrote:
> >>
> >>> If it doesn’t have all the bells and whistles, then it shouldn’t
> be on port 53 by default.
> >> Sure, I’ll change the default port to not use 53 and document it.
> >>> *how* is it getting launched on a privileged port? It sounds like
> the expectation is to run “command” as root.   *ALL* of the previous
> daemons in Hadoop that needed a privileged port used jsvc.  Why isn’t this
> one? These questions matter from a security standpoint.
> >> Yes, it is running as “root” to be able to use the privileged port. The
> DNS server is not yet integrated with the hadoop script.
> >>
> >>> Check the output.  It’s pretty obviously borked:
> >> Thanks for pointing out. Missed this when rebasing onto trunk.
> >
> >
> >   Please correct me if I’m wrong, but the current summary of the
> branch, post these changes, looks like:
> >
> >   * A bunch of mostly new Java code that may or may not have
> javadocs (post-revert YARN-6877, still working out HADOOP-14835)
> >   * ~1/3 of the docs are roadmap/TBD
> >   * ~1/3 of the docs are for an optional DNS daemon that has
> no end user hook to start it
> >   * ~1/3 of the docs are for a REST API that comes from some
> undefined daemon (apiserver?)
> >   * Two new, but undocumented, subcommands to yarn
> >   * There are no docs for admins or users on how to actually
> start or use this completely new/separate/optional feature
> >
> >   How are outside people (e.g., non-branch committers) supposed to
> test this new feature under these conditions?
> >
>
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Resolved] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job

2017-09-05 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved MAPREDUCE-6941.

Resolution: Not A Problem

I'm going to close this based on Ray's analysis. Junping, if you disagree, 
please re-open the JIRA.

> The default setting doesn't work for MapReduce job
> --
>
> Key: MAPREDUCE-6941
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: Junping Du
>Priority: Blocker
>
> On a deployment of a hadoop 3 cluster (based on the current trunk branch) with 
> default settings, MR jobs will fail with the following exceptions:
> {noformat}
> 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 
> running in uber mode : false
> 2017-08-16 13:00:03,847 INFO mapreduce.Job:  map 0% reduce 0%
> 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed 
> with state FAILED due to: Application application_1502913552390_0001 failed 2 
> times due to AM Container for appattempt_1502913552390_0001_02 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from 
> container-launch.
> Container id: container_1502913552390_0001_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:994)
>   at org.apache.hadoop.util.Shell.run(Shell.java:887)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is because the mapreduce-related jars are not added into the yarn setup by 
> default. To make MR jobs run successfully, we need to add the following 
> configuration to yarn-site.xml now:
> {noformat}
> <property>
>   <name>yarn.application.classpath</name>
>   <value>
> ...
> /share/hadoop/mapreduce/*,
> /share/hadoop/mapreduce/lib/*
> ...
>   </value>
> </property>
> {noformat}
> But this config is not necessary for previous versions of Hadoop. We should 
> fix this issue before the beta release, otherwise it will be a regression in 
> configuration behavior.
> This could be more of a YARN issue (if so, we should move it), depending on how 
> we fix it finally.
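
As an aside, a quick way to check whether the MapReduce jars are on the
effective YARN classpath of a given node is the standard classpath command;
an empty result on a default install is consistent with the failure above:

$ yarn classpath | tr ':' '\n' | grep mapreduce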






2017-09-01 Hadoop 3 release status update

2017-09-01 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-01

We're two weeks out from beta1, focus is on blocker burndown.

Highlights:

   - S3Guard merged!
   - TSv2 alpha2 merged!
   - branch-3.0 has been cut after discussion on dev lists.

Red flags:

   - 10 blockers on the dashboard, closed and bumped some but new ones
   appeared.
   - Still need to land YARN native services and fix some S3Guard doc
   issues for beta1.
   - Rolling upgrade JIRAs for YARN and HDFS are not making any visible
   progress

Previously tracked beta1 blockers that have been resolved:

   - HADOOP-13363  (Upgrade
   to protobuf 3): I dropped this from beta1 since it's simply not going to
   happen in time.
   - YARN-7076 : This was
   quickly resolved! Thanks Jian, Junping, Jason for the action.
   - YARN-7094  (Document
   that server-side graceful decom is currently not recommended): Patch
   committed!

beta1 blockers:

   - HADOOP-14826  (review
   S3 docs prior to 3.0.0-beta1): New blocker with S3Guard merged. Should just
   be a quick doc update.
   - HADOOP-14284  (Shade
   Guava everywhere): Agreement to shade yarn-client at at HADOOP-14771.
   Shading hadoop-hdfs is still being discussed?
   - HADOOP-14771 
(hadoop-client
   does not include hadoop-yarn-client): Patch up, needs review, waiting on
   Busbey
   - YARN-5536  (Multiple
   format support (JSON, etc.) for exclude node file in NM graceful
   decommission with timeout): We're waiting on input from Junping.
   - MAPREDUCE-6941 (The default setting doesn't work for MapReduce job):
   Ray thinks this is a Won't Fix, waiting on Junping to confirm.
   - HADOOP-14238 (Rechecking Guava's object is not exposed to user-facing
   API): This relates to HADOOP-14771, I left a JIRA comment.

beta1 features:

   - Erasure coding
  - There are three must-dos. Two have patches, one might not be a
  must-do.
  - HDFS-11882 has been revved and reviewed, seems close
  - HDFS-11467 and HDFS-7859 are related, Sammi/Eddy/Kai are
  discussing, Sammi thinks we can still make beta1.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
  - Sean has HDFS rolling upgrade scripts up, waiting on Ray to add
  some YARN/MR coverage too.
  - Need to do a final runthrough of the JACC reports for YARN and HDFS.
   - Classpath isolation (HADOOP-11656)
  - Sean has retriaged the subtasks and has been posting patches.
   - Compat guide (HADOOP-13714)
  - New patch is up, but needs review. Daniel asked Chris Douglas and
  Steve Loughran.
   - YARN native services
  - Jian sent out the merge vote
   - TSv2 alpha 2
   - This was merged, no problems thus far [image: (smile)]

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.


Heads up: branch-3.0 has been cut, commit here for 3.0.0-beta1

2017-09-01 Thread Andrew Wang
Hi folks,

I've proceeded with the plan from our earlier thread and cut branch-3.0.
The branches and maven versions are now set as follows:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-beta1-SNAPSHOT

branch-2's are still the same.

This means if you want to commit something for beta1, commit it to
branch-3.0 too. Excepting features already committed for beta1 (e.g. EC,
native services, S3Guard, TSv2, YARN federation), please treat branch-3.0
the same as a maintenance release branch.
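
For committers, a sketch of the usual workflow (the commit hash below is a
placeholder): land the change on trunk first, then cherry-pick it back with
-x so the provenance is recorded:

$ git checkout trunk
# commit or push the change to trunk first, then:
$ git checkout branch-3.0
$ git cherry-pick -x <commit-hash>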

I'm planning to cut the release branch branch-3.0.0-beta1 just before RC.
If you have anything we pushed out of 3.0.0-beta1 and is waiting for 3.0.0
GA, please hold it in trunk until after we release 3.0.0-beta1 (which
should be relatively soon).

Best,
Andrew


Re: Branch merges and 3.0.0-beta1 scope

2017-08-29 Thread Andrew Wang
Hi Vinod,

On Fri, Aug 25, 2017 at 2:42 PM, Vinod Kumar Vavilapalli  wrote:

> > From a release management perspective, it's *extremely* reasonable to
> block the inclusion of new features a month from the planned release date.
> A typical software development lifecycle includes weeks of feature freeze
> and weeks of code freeze. It is no knock on any developer or any feature to
> say that we should not include something in 3.0.0.
>
>
> We have never followed the ‘typical' lifecycle that I am guessing you are
> referring to. If we are, you'll need to publish some of the following: a
> feature freeze date, blockers-criticals-only-from-now date,
> testing-finish date, documentation-finish date, final release date and so
> on.
>

We discussed this as part of the 3.0 alpha/beta/GA plan. The point of the
extended alpha/beta process was to release on a schedule. Things that
weren't ready could be merged for the next alpha. I also advertised alpha4
as feature complete and beta1 as code complete so we could quickly move on
to GA.


> What we do with Apache releases typically is instead we say ‘this' is
> roughly when we want to release, and roughly what features must land and
> let the rest figure out itself.
>
> We did this too. We defined the original scope for 3.0.0 GA way back when
we started the 3.0.0 release process. I've been writing status updates on
the wiki and tracking targeted features and release blockers throughout.

The target versions of this recent batch of features were not discussed
with me, the release manager, until just recently. After some discussion, I
think we've arrived at a release plan that everyone's happy with. But, I
want to be clear that late-breaking inclusion of additional scope should be
considered the exception rather than the norm. Merging code so close to
release means less time for testing and validation, which means lower
quality releases.

I don't think it's a lot to ask that feature leads shoot an email to the
release manager of their target version. DISCUSS emails right before a
proposed merge VOTE are way too late, it ends up being a fire drill where
we need to scramble on many fronts.


> Neither is right or wrong. If we want to change the process, we should
> communicate as such.
>
> Proposing a feature freeze date on the fly is only going to confuse
> people.
>

> > I've been very open and clear about the goals, schedule, and scope of
> 3.0.0 over the last year plus. The point of the extended alpha process was
> to get all our features in during alpha, and the alpha merge window has
> been open for a year. I'm unmoved by arguments about how long a feature has
> been worked on. None of these were not part of the original 3.0.0 scope,
> and our users have been waiting even longer for big-ticket 3.0 items like
> JDK8 and HDFS EC that were part of the discussed scope.
>
>
> Except our schedule is so fluid (not due to the release management process
> to be fair) that it is hard for folks to plan their features. IIRC, our
> schedule was a GA release beginning of this year. Again, this is not a
> critique of 3.0 release process - I have myself done enough releases to
> know that sticking to a date and herding the crowd has been an extremely
> hard job.
>
>
Schedules have been fluid because we don't know when features are getting
in, and there's an unwillingness to bump features to the next release. The
goal of the 3.x alphas and betas was to break out of this release
anti-pattern, and release on a schedule.

There have been schedule delays during the 3.x alphas, but I'm still proud
that we released 4 alphas in 10 months. I'm doing my best to stick to our
published schedule, and add a beta and GA to that list by EOY.

Best,
Andrew


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Hi Subru,

Basically we're amending the proposal from the original email in the chain
to also immediately create the branch-3.0.0-beta1 release branch. As
described in my 2017-08-25 wiki update, we're gating the merge of these two
features to branch-3.0 on additional testing, but this keeps 3.0.0 open
for development.

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

For completeness, here's what our branches and versions would look like:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-SNAPSHOT
branch-3.0.0-beta1: 3.0.0-beta1-SNAPSHOT
branch-2 and etc: remain as is
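
For anyone curious how that layout gets cut, here's a rough sketch of the
branch and Maven version steps (illustrative only; the authoritative steps
are on the release wiki, and the versions-maven-plugin invocation is just one
way to do the version bump):

  # cut branch-3.0 from trunk, then branch-3.0.0-beta1 from branch-3.0
  git checkout trunk
  git checkout -b branch-3.0
  git checkout -b branch-3.0.0-beta1 branch-3.0

  # set the SNAPSHOT version on each branch and commit the pom changes
  git checkout trunk
  mvn versions:set -DnewVersion=3.1.0-SNAPSHOT -DgenerateBackupPoms=false
  git commit -am "Preparing for 3.1.0 development"

  git checkout branch-3.0
  mvn versions:set -DnewVersion=3.0.0-SNAPSHOT -DgenerateBackupPoms=false
  git commit -am "Preparing for 3.0.0 development"

  git checkout branch-3.0.0-beta1
  mvn versions:set -DnewVersion=3.0.0-beta1-SNAPSHOT -DgenerateBackupPoms=false
  git commit -am "Preparing for 3.0.0-beta1 release"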

Best,
Andrew

On Tue, Aug 29, 2017 at 12:21 PM, Subramaniam V K <subru...@gmail.com>
wrote:

> Andrew,
>
> First up thanks for tirelessly pushing on 3.0 release.
>
> I am confused about your comment on creating 2 branches as my
> understanding of Jason's (and Vinod's) comments are that we defer creating
> branch-3?
>
> IMHO, we should consider creating branch-3 (necessary but not sufficient)
> only when we have:
>
>1. a significant incompatible change.
>2. a new feature that cannot be turned off without affecting core
>components.
>
> In summary, I feel we should follow a lazy rather than eager approach
> towards creating mainline branches.
>
> Thanks,
> Subru
>
>
>
> On Tue, Aug 29, 2017 at 11:45 AM, Wangda Tan <wheele...@gmail.com> wrote:
>
>> Gotcha, makes sense, so I will hold off committing until you cut the two
>> branches and TSv2 gets committed.
>>
>> Thanks,
>> Wangda
>>
>> On Tue, Aug 29, 2017 at 11:25 AM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>
>> > Hi Wangda,
>> >
>> > I'll cut two branches: branch-3.0 (3.0.0-SNAPSHOT) and
>> branch-3.0.0-beta1
>> > (3.0.0-beta1-SNAPSHOT). This way we can merge GA features to branch-3.0
>> but
>> > not branch-3.0.0-beta1.
>> >
>> > Best,
>> > Andrew
>> >
>> > On Tue, Aug 29, 2017 at 11:18 AM, Wangda Tan <wheele...@gmail.com>
>> wrote:
>> >
>> >> Vrushali,
>> >>
>> >> Sure we can wait TSv2 merged before merge resource profile branch.
>> >>
>> >> Andrew,
>> >>
>> >> My understanding is you're going to cut branch-3.0 for 3.0-beta1, and
>> the
>> >> same branch (branch-3.0) will be used for 3.0-GA as well. So my
>> question
>> >> is, there're several features (TSv2, resource profile, YARN-5734) are
>> >> targeted to merge to 3.0-GA but not 3.0-beta1, which branch we should
>> >> commit to, and when we can commit? Also, similar to 3.0.0-alpha1 to 4,
>> you
>> >> will cut branch-3.0.0-beta1, correct?
>> >>
>> >> Thanks,
>> >> Wangda
>> >>
>> >>
>> >> On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang <
>> andrew.w...@cloudera.com>
>> >> wrote:
>> >>
>> >>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>> >>>
>> >>> We're still waiting on the native services and S3Guard merges, but I
>> >>> don't want to hold branching to the last minute.
>> >>>
>> >>> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C <vrushalic2...@gmail.com
>> >
>> >>> wrote:
>> >>>
>> >>>> Hi Andrew,
>> >>>> As Rohith mentioned, if you are good with it, from the TSv2 side, we
>> >>>> are ready to go for merge tonight itself (Pacific time)  right after
>> the
>> >>>> voting period ends. Varun Saxena has been diligently rebasing up
>> until now
>> >>>> so most likely our merge should be reasonably straightforward.
>> >>>>
>> >>>> @Wangda: your resource profile vote ends tomorrow, could we please
>> >>>> coordinate our merges?
>> >>>>
>> >>>> thanks
>> >>>> Vrushali
>> >>>>
>> >>>>
>> >>>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>> >>>> rohithsharm...@apache.org> wrote:
>> >>>>
>> >>>>> On 29 August 2017 at 06:24, Andrew Wang <andrew.w...@cloudera.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> > So far I've seen no -1's to the branching proposal, so I plan to
>> >>>>> execute
>> >>>>> > this tomorrow unless there's further feedback.
>> >>>>> >
>> >&

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Hi Wangda,

I'll cut two branches: branch-3.0 (3.0.0-SNAPSHOT) and branch-3.0.0-beta1
(3.0.0-beta1-SNAPSHOT). This way we can merge GA features to branch-3.0 but
not branch-3.0.0-beta1.

Best,
Andrew

On Tue, Aug 29, 2017 at 11:18 AM, Wangda Tan <wheele...@gmail.com> wrote:

> Vrushali,
>
> Sure, we can wait for TSv2 to be merged before merging the resource profile branch.
>
> Andrew,
>
> My understanding is you're going to cut branch-3.0 for 3.0-beta1, and the
> same branch (branch-3.0) will be used for 3.0-GA as well. So my question
> is, there're several features (TSv2, resource profile, YARN-5734) are
> targeted to merge to 3.0-GA but not 3.0-beta1, which branch we should
> commit to, and when we can commit? Also, similar to 3.0.0-alpha1 to 4, you
> will cut branch-3.0.0-beta1, correct?
>
> Thanks,
> Wangda
>
>
> On Tue, Aug 29, 2017 at 11:05 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Sure. Ping me when the TSv2 goes in, and I can take care of branching.
>>
>> We're still waiting on the native services and S3Guard merges, but I
>> don't want to hold branching to the last minute.
>>
>> On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C <vrushalic2...@gmail.com>
>> wrote:
>>
>>> Hi Andrew,
>>> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
>>> ready to go for merge tonight itself (Pacific time)  right after the voting
>>> period ends. Varun Saxena has been diligently rebasing up until now so most
>>> likely our merge should be reasonably straightforward.
>>>
>>> @Wangda: your resource profile vote ends tomorrow, could we please
>>> coordinate our merges?
>>>
>>> thanks
>>> Vrushali
>>>
>>>
>>> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
>>> rohithsharm...@apache.org> wrote:
>>>
>>>> On 29 August 2017 at 06:24, Andrew Wang <andrew.w...@cloudera.com>
>>>> wrote:
>>>>
>>>> > So far I've seen no -1's to the branching proposal, so I plan to
>>>> execute
>>>> > this tomorrow unless there's further feedback.
>>>> >
>>>> For on going branch merge threads i.e TSv2, voting will be closing
>>>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>>>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>>>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>>>> would be done directly to trunk?
>>>>
>>>>
>>>>
>>>> >
>>>> > Regarding the above discussion, I think Jason and I have essentially
>>>> the
>>>> > same opinion.
>>>> >
>>>> > I hope that keeping trunk a release branch means a higher bar for
>>>> merges
>>>> > and code review in general. In the past, I've seen some patches
>>>> committed
>>>> > to trunk-only as a way of passing responsibility to a future user or
>>>> > reviewer. That doesn't help anyone; patches should be committed with
>>>> the
>>>> > intent of running them in production.
>>>> >
>>>> > I'd also like to repeat the above thanks to the many, many
>>>> contributors
>>>> > who've helped with release improvements. Allen's work on
>>>> create-release and
>>>> > automated changes and release notes were essential, as was Xiao's
>>>> work on
>>>> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
>>>> > improvements, which addresses one of the remaining sore spots in the
>>>> > release process.
>>>> >
>>>> > Things have gotten smoother with each alpha we've done over the last
>>>> year,
>>>> > and it's a testament to everyone's work that we have a good
>>>> probability of
>>>> > shipping beta and GA later this year.
>>>> >
>>>> > Cheers,
>>>> > Andrew
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-29 Thread Andrew Wang
Sure. Ping me when the TSv2 goes in, and I can take care of branching.

We're still waiting on the native services and S3Guard merges, but I don't
want to hold branching to the last minute.

On Tue, Aug 29, 2017 at 10:51 AM, Vrushali C <vrushalic2...@gmail.com>
wrote:

> Hi Andrew,
> As Rohith mentioned, if you are good with it, from the TSv2 side, we are
> ready to go for merge tonight itself (Pacific time)  right after the voting
> period ends. Varun Saxena has been diligently rebasing up until now so most
> likely our merge should be reasonably straightforward.
>
> @Wangda: your resource profile vote ends tomorrow, could we please
> coordinate our merges?
>
> thanks
> Vrushali
>
>
> On Mon, Aug 28, 2017 at 10:45 PM, Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> On 29 August 2017 at 06:24, Andrew Wang <andrew.w...@cloudera.com> wrote:
>>
>> > So far I've seen no -1's to the branching proposal, so I plan to execute
>> > this tomorrow unless there's further feedback.
>> >
>> For on going branch merge threads i.e TSv2, voting will be closing
>> tomorrow. Does it end up in merging into trunk(3.1.0-SNAPSHOT) and
>> branch-3.0(3.0.0-beta1-SNAPSHOT) ? If so, would you be able to wait for
>> couple of more days before creating branch-3.0 so that TSv2 branch merge
>> would be done directly to trunk?
>>
>>
>>
>> >
>> > Regarding the above discussion, I think Jason and I have essentially the
>> > same opinion.
>> >
>> > I hope that keeping trunk a release branch means a higher bar for merges
>> > and code review in general. In the past, I've seen some patches
>> committed
>> > to trunk-only as a way of passing responsibility to a future user or
>> > reviewer. That doesn't help anyone; patches should be committed with the
>> > intent of running them in production.
>> >
>> > I'd also like to repeat the above thanks to the many, many contributors
>> > who've helped with release improvements. Allen's work on create-release
>> and
>> > automated changes and release notes were essential, as was Xiao's work
>> on
>> > LICENSE and NOTICE files. I'm also looking forward to Marton's site
>> > improvements, which addresses one of the remaining sore spots in the
>> > release process.
>> >
>> > Things have gotten smoother with each alpha we've done over the last
>> year,
>> > and it's a testament to everyone's work that we have a good probability
>> of
>> > shipping beta and GA later this year.
>> >
>> > Cheers,
>> > Andrew
>> >
>> >
>>
>
>


Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Andrew Wang
So far I've seen no -1's to the branching proposal, so I plan to execute
this tomorrow unless there's further feedback.

Regarding the above discussion, I think Jason and I have essentially the
same opinion.

I hope that keeping trunk a release branch means a higher bar for merges
and code review in general. In the past, I've seen some patches committed
to trunk-only as a way of passing responsibility to a future user or
reviewer. That doesn't help anyone; patches should be committed with the
intent of running them in production.

I'd also like to repeat the above thanks to the many, many contributors
who've helped with release improvements. Allen's work on create-release and
automated changes and release notes was essential, as was Xiao's work on
LICENSE and NOTICE files. I'm also looking forward to Marton's site
improvements, which address one of the remaining sore spots in the
release process.

Things have gotten smoother with each alpha we've done over the last year,
and it's a testament to everyone's work that we have a good probability of
shipping beta and GA later this year.

Cheers,
Andrew

On Mon, Aug 28, 2017 at 3:48 PM, Colin McCabe  wrote:

> On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> >
> > > On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> > >
> > > I think this gets back to the "if it's worth committing" part.
> >
> >   This brings us back to my original question:
> >
> >   "Doesn't this place an undue burden on the contributor with the
> first incompatible patch to prove worthiness?  What happens if it is
> decided that it's not good enough?"
>
> I feel like this line of argument is flawed by definition.  "What
> happens if the patch isn't worth breaking compatibility over"?  Then we
> shouldn't break compatibility over it.  We all know that most
> compatibility breaks are avoidable with enough effort.  And it's an
> effort we should make, for the good of our users.
>
> Most useful features can be implemented without compatibility breaks.
> And for the few that truly can't, the community should surely agree that
> it's worth breaking compatibility before we do it.  If it's a really
> cool feature, that approval will surely not be hard to get (I'm tempted
> to quote your earlier email about how much we love features...)
>
> >
> >   The answer, if I understand your position, is then at least a
> maybe leaning towards yes: a patch that prior to this branching policy
> change that  would have gone in without any notice now has a higher burden
> (i.e., major feature) to prove worthiness ... and in the process eliminates
> a whole class of contributors and empowers others. Thus my concern ...
> >
> > > As you mentioned, people are already breaking compatibility left and
> right as it is, which is why I wondered if it was really any better in
> practice.  Personally I'd rather find out about a major breakage sooner
> than later, since if trunk remains an active area of development at all
> times it's more likely the community will sit up and take notice when
> something crazy goes in.  In the past, trunk was not really an actively
> deployed area for over 5 years, and all sorts of stuff went in without
> people really being aware of it.
> >
> >   Given the general acknowledgement that the compatibility
> guidelines are mostly useless in reality, maybe the answer is really that
> we're doing releases all wrong.  Would it necessarily be a bad thing if we
> moved to a model where incompatible changes are released gradually instead of
> one big one every seven?
>
> I haven't seen anyone "acknowledge that... compatibility guidelines are
> mostly useless"... even you.  Reading your posts from the past, I don't
> get that impression.  On the contrary, you are often upset about
> compatibility breakages.
>
> What would be positive about allowing compatibility breaks in minor
> releases?  Can you give a specific example of what would be improved?
>
> >
> >   Yes, I lived through the "walking on glass" days at Yahoo! and
> realize what I'm saying.  But I also think the rate of incompatible changes
> has slowed tremendously.  Entire groups of APIs aren't getting tossed out
> every week anymore.
> >
> > > It sounds like we agree on that part but disagree on the specifics of
> how to help trunk remain active.
> >
> >   Yup, and there is nothing wrong with that. ;)
> >
> > >  Given that historically trunk has languished for years I was hoping
> this proposal would help reduce the likelihood of it happening again.  If
> we eventually decide that cutting branch-3 now makes more sense then I'll
> do what I can to make that work well, but it would be good to see concrete
> proposals on how to avoid the problems we had with it over the last 6 years.
> >
> >
> >   Yup, agree. But proposals rarely seem to get much actual traction.
> (It's kind of fun reading the Hadoop bylaws and compatibility guidelines
> and old [VOTE] threads to realize 

2017-08-25 Hadoop 3 release status update

2017-08-25 Thread Andrew Wang
Hi all,

I've written up a status report for the current state of Hadoop 3 on the
wiki. I've also pasted it below for your convenience.

Cheers,
Andrew

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-08-25

Another month flew by without an update. This is a big one.

Red flags:

   - 11 blockers still on the dashboard, with some filed recently. Need to
   burn these down.
   - There are many branch merge proposals flying around for features that
   were not originally being tracked for beta1 and GA. Introducing new code
   always comes with risk, so I'm working with the different contributors
   involved to discuss target versions, confirm readiness, and define quality
   bars for merge.

Miscellaneous blockers:

   - HADOOP-14284 (Shade Guava everywhere): We have agreement to shade the
   yarn client JAR. Shading hadoop-hdfs is still being discussed.
   - HADOOP-13363 (Upgrade to protobuf 3): Waiting on the Guava shading first.
   - YARN-7076: New blocker, we need an assignee.
   - YARN-7094 (Document that server-side graceful decom is currently not
   recommended): Robert has a patch up, needs review. This is a stopgap for
   the old blocker YARN-5464.
   - YARN-5536 (Multiple format support (JSON, etc.) for exclude node file in
   NM graceful decommission with timeout): Robert has a proposal that needs to
   be pushed on.

beta1 features:

   - Erasure coding
  - There are three must-dos. Two have patches, one might not be a
  must-do.
  - I pinged the pluggable policy JIRA to see if metadata and API
  compatibility is complete.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
  - Sean has HDFS rolling upgrade scripts up, waiting on Ray to add
  some YARN/MR coverage too.
  - Need to do a final runthrough of the JACC reports for YARN and HDFS.
   - Classpath isolation (HADOOP-11656)
  - We're down to the wire on this, I pinged Sean for an update.
   - Compat guide (HADOOP-13714)
  - I pinged the JIRA on this too, no updated patch since May

Features under discussion:

I discussed with a number of lead contributors on these features that were
previously not on my radar.

3.0.0-beta1:

   - YARN native services (Jian He)
   - I was convinced that this is very separate from the core. I'll get
   someone from Cloudera to run it through our integration tests to verify it
   doesn't break anything downstream, then happy to merge.
   - TSv2 alpha 2 (Vrushali C)
   - Despite being called "alpha 2", this is more like "beta" in terms of
  readiness. Twitter is planning to roll it out to production. Seems quite
  done.
  - I double checked with Haibo, and he successfully ran it through our
  internal integration testing.

3.0.0 GA:

   - Resource profiles (Wangda Tan)
   - Alpha feature, APIs are not stable yet. Has some compatible PB
   changes, will verify rolling upgrade from branch-2. Touches some core parts
   of YARN.
  - Decided that it's too close to beta1 for this, we're going to test
  it a lot and make sure it's ready for 3.0.0 GA.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.

3.1.0:

   - Storage Policy Satisfier (Uma Gangumalla)
  - We're resolving some design discussions on JIRA. Plan is to do some
  MVP work on the API to get this into 3.1, and if we're happy with the
  second phase, consider for 3.0 GA.
   - HDFS tiered storage (Chris Douglas):
    - This touches some core stuff, and the write path is still being worked
   on. Still somewhat useful with just the read path. Targeting at 3.1.0 gives
   enough time to wrap this up.


Re: Branch merges and 3.0.0-beta1 scope

2017-08-25 Thread Andrew Wang
Jonathan, thanks for the heads up. I don't have much familiarity with YARN,
but gave the PBs and pom changes a look, and left a few small comments on
the umbrella JIRA.

This seems like a smaller change than some of the other branch merges we're
discussing, but I'm again reluctant to add scope if we can avoid it.

In your mind, is this truly a "must-have" for 3.0? It looks compatible, and
thus something we could add in a minor release like 2.9 or 3.1.

Best,
Andrew

On Fri, Aug 25, 2017 at 12:31 PM, Jonathan Hung <jyhung2...@gmail.com>
wrote:

> Hi Andrew,
>
> Thanks for starting the discussion - we have a feature YARN-5734 for API
> based scheduler configuration that I feel is pretty close to merge (also "a
> few weeks"). It's almost completely code and API additions and we were
> careful to design it so that it's compatible (feature is also turned off by
> default). Hoping to get this in before 3.0.0-GA. Just wanted to send this
> note so that we are not caught off guard by this feature.
>
> Thanks!
>
>
> Jonathan Hung
>
> On Fri, Aug 25, 2017 at 11:06 AM, Wangda Tan <wheele...@gmail.com> wrote:
>
>> Resource profile is similar to TSv2, the feature is:
>> - Alpha feature, we will not freeze new added APIs. And all added APIs are
>> explicitly marked to @Unstable.
>> - Allow rolling upgrade from branch-2.
>> - Touched existing code, but we have, and will continue tests to make sure
>> changes are safe.
>>
>> Discussed with Andrew offline, we decided to not put this to beta1 since
>> beta1 is not far away. But we want to put it before GA if sufficient tests
>> are done.
>>
>> Thanks,
>> Wangda
>>
>>
>>
>> On Fri, Aug 25, 2017 at 10:54 AM, Rohith Sharma K S <
>> rohithsharm...@apache.org> wrote:
>>
>> > On 25 August 2017 at 22:39, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>> >
>> > > Hi Rohith,
>> > >
>> > > Given that we're advertising TSv2 as an alpha feature, I think we're
>> > > allowed to break compatibility. Let's make sure this is clear in the
>> > > release notes and documentation.
>> > >
>> >
>> > > That said, with TSv2 phase 2, is the API going to be frozen? The
>> umbrella
>> > > JIRA refers to "TSv2 alpha2" which indicated to me it was still
>> > alpha-level
>> > > quality and stability.
>> > >
>> > YES, We have decided to freeze API's. I do not think we make any
>> > compatibility break in future.
>> >
>> >
>> >
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> >
>>
>
>


Re: Branch merges and 3.0.0-beta1 scope

2017-08-25 Thread Andrew Wang
Here's a summary of some 1-on-1 conversations I had with contributors of
the different features I'm tracking.

Storage Policy Satisfier (Uma Gangumalla)
* Target version: 3.1.0, maybe 3.0.0 GA
* We're resolving some design discussions on JIRA. Plan is to do some MVP
work on the API to get this into 3.1, and if we're happy with the second
phase, consider for 3.0 GA.

YARN native services (Jian He)
* Target version: 3.0.0-beta1 as an alpha feature
* I was convinced that this is very separate from the core. I'll get
someone from Cloudera to run it through our integration tests to verify it
doesn't break anything downstream, then happy to merge.

Resource profiles (Wangda Tan)
* Target version: 3.0.0 GA
* Already provided update above, we're going to test it a lot and target
for GA.

HDFS router-based federation (Chris Douglas)
* Target version: 3.0.0 GA
* This is like YARN federation, very separate and doesn't add new APIs, run
in production.
* If it passes our internal integration testing, I'm fine putting this in
late.

HDFS tiered storage (Chris Douglas):
* Target version: 3.1.0
* This touches some core stuff, and the write path is still being worked
on. Still somewhat useful with just the read path. Targeting at 3.1.0 gives
enough time to wrap this up.

TSv2 phase 2 (Vrushali C)
* Target version: 3.0.0-beta1
* This is more like "beta" in terms of readiness, Twitter is planning to
roll it out to production.
* I double checked with Haibo, and he successfully ran it through our
internal integration testing.

Thanks to everyone for meeting with me on short notice, and being very
reasonable about target versions and quality bars. If I mischaracterized
any of our discussions, please reach out or comment.

The branching and versioning discussion is still proceeding. I'd ask that
those with pending merge VOTEs watch this carefully; I'm hoping to resolve the
discussion and branch before the VOTEs close, but let's make sure the
branches and versions are ready before doing the actual merges.
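
For reference, once the branches are in place the merge itself is typically
just a non-fast-forward merge of the voted feature branch into each target
branch; something like the following sketch (branch names are illustrative,
and a separate merge or cherry-pick may be needed if the branches have
diverged):

  git checkout trunk
  git merge --no-ff YARN-5355          # keep the feature branch history visible
  git push origin trunk

  # if the feature is also targeted for 3.0.0:
  git checkout branch-3.0
  git merge --no-ff YARN-5355
  git push origin branch-3.0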

Thanks,
Andrew

On Fri, Aug 25, 2017 at 11:06 AM, Wangda Tan <wheele...@gmail.com> wrote:

> Resource profile is similar to TSv2, the feature is:
> - Alpha feature, we will not freeze new added APIs. And all added APIs are
> explicitly marked to @Unstable.
> - Allow rolling upgrade from branch-2.
> - Touched existing code, but we have, and will continue tests to make sure
> changes are safe.
>
> Discussed with Andrew offline, we decided to not put this to beta1 since
> beta1 is not far away. But we want to put it before GA if sufficient tests
> are done.
>
> Thanks,
> Wangda
>
>
>
> On Fri, Aug 25, 2017 at 10:54 AM, Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
>
>> On 25 August 2017 at 22:39, Andrew Wang <andrew.w...@cloudera.com> wrote:
>>
>> > Hi Rohith,
>> >
>> > Given that we're advertising TSv2 as an alpha feature, I think we're
>> > allowed to break compatibility. Let's make sure this is clear in the
>> > release notes and documentation.
>> >
>>
>> > That said, with TSv2 phase 2, is the API going to be frozen? The
>> umbrella
>> > JIRA refers to "TSv2 alpha2" which indicated to me it was still
>> alpha-level
>> > quality and stability.
>> >
>> YES, We have decided to freeze API's. I do not think we make any
>> compatibility break in future.
>>
>>
>>
>> >
>> > Best,
>> > Andrew
>> >
>>
>
>


[DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Andrew Wang
Hi folks,

With 3.0.0-beta1 fast approaching, I wanted to go over the proposed
branching strategy.

In the early 2.x days, moving trunk immediately to 3.0.0 was a mistake.
branch-2 and trunk were virtually identical, which only increased backport
complexity. Until we need to make incompatible changes, there's no need for
a Hadoop 4.0 version.

Thus, here's a proposal of branches and versions:

trunk: 3.1.0-SNAPSHOT
branch-3.0: 3.0.0-beta1-SNAPSHOT
branch-2 and etc: remain as is

LMK questions/comments/etc. Appreciate your attentiveness; I'm hoping to
build consensus quickly since we have a number of open VOTEs for branch
merges.

Thanks,
Andrew


Re: Branch merges and 3.0.0-beta1 scope

2017-08-25 Thread Andrew Wang
Hi Jason,

I agree with this proposal. I'll start another email thread spelling this
out, and gather additional feedback.

Best,
Andrew

On Fri, Aug 25, 2017 at 6:27 AM, Jason Lowe <jl...@oath.com> wrote:

> Andrew Wang wrote:
>
>
>> This means I'll cut branch-3 and
>> branch-3.0, and move trunk to 4.0.0 before these VOTEs end. This will open
>> up development for Hadoop 3.1.0 and 4.0.0.
>
>
> I can see a need for branch-3.0, but please do not create branch-3.  Doing
> so will relegate trunk back to the "patch purgatory" branch, a place where
> patches won't see a release for years.  Unless something is imminently
> going in that will break backwards compatibility and warrant a new 4.x
> release, I don't see the need to distinguish trunk from the 3.x line.
> Leaving trunk as the 3.x line means less branches to commit patches through
> and more testing of every patch since trunk would remain an active area for
> testing and releasing.  If we separate trunk and branch-3 then it's almost
> certain only-trunk patches will start to accumulate and never get any
> "real" testing until someone eventually decides it's time to go to Hadoop
> 4.x.  Looking back at trunk-as-3.x for an example, patches committed there
> in the early days after branch-2 was cut didn't see a release for almost 6
> years.
>
> My apologies if I've missed a feature that is just going to miss the 3.0
> release and will break compatibility when it goes in.  If so then we need
> to cut branch-3, but if not then here's my plea to hold off until we do
> need it.
>
> Jason
>
>
> On Thu, Aug 24, 2017 at 3:33 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Glad to see the discussion continued in my absence :)
>>
>> From a release management perspective, it's *extremely* reasonable to
>> block
>> the inclusion of new features a month from the planned release date. A
>> typical software development lifecycle includes weeks of feature freeze
>> and
>> weeks of code freeze. It is no knock on any developer or any feature to
>> say
>> that we should not include something in 3.0.0.
>>
>> I've been very open and clear about the goals, schedule, and scope of
>> 3.0.0
>> over the last year plus. The point of the extended alpha process was to
>> get
>> all our features in during alpha, and the alpha merge window has been open
>> for a year. I'm unmoved by arguments about how long a feature has been
>> worked on. None of these were part of the original 3.0.0 scope, and
>> our
>> users have been waiting even longer for big-ticket 3.0 items like JDK8 and
>> HDFS EC that were part of the discussed scope.
>>
>> I see that two VOTEs have gone out since I was out. I still plan to follow
>> the proposal in my original email. This means I'll cut branch-3 and
>> branch-3.0, and move trunk to 4.0.0 before these VOTEs end. This will open
>> up development for Hadoop 3.1.0 and 4.0.0.
>>
>> I'm reaching out to the lead contributor of each of these features
>> individually to discuss. We need to close on this quickly, and email is
>> too
>> low bandwidth at this stage.
>>
>> Best,
>> Andrew
>>
>
>


Re: Branch merges and 3.0.0-beta1 scope

2017-08-25 Thread Andrew Wang
Hi Rohith,

Given that we're advertising TSv2 as an alpha feature, I think we're
allowed to break compatibility. Let's make sure this is clear in the
release notes and documentation.

That said, with TSv2 phase 2, is the API going to be frozen? The umbrella
JIRA refers to "TSv2 alpha2" which indicated to me it was still alpha-level
quality and stability.

Best,
Andrew

On Thu, Aug 24, 2017 at 11:47 PM, Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> Hi Andrew
>
> Thanks for update on release plan!
>
> I would like to discuss specifically regarding compatibility of releases.
> What is the compatibility to be maintained for GA if we don't merge to
> beta1 release? IIUC, till now all the releases were alpha where
> compatibility was not that important. All the public interfaces are
> subject to modification. Once we release beta, compatibility would
> matter.
> During this gap, i.e. between the beta and GA releases, should we maintain
> compatibility?
> If my understanding is right then TSv2 have to be merged with beta1
> release. In TSv2 phase-2, we have compatibility changes from phase-1.
>
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 25 August 2017 at 02:03, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> > Glad to see the discussion continued in my absence :)
> >
> > From a release management perspective, it's *extremely* reasonable to
> block
> > the inclusion of new features a month from the planned release date. A
> > typical software development lifecycle includes weeks of feature freeze
> and
> > weeks of code freeze. It is no knock on any developer or any feature to
> say
> > that we should not include something in 3.0.0.
> >
> > I've been very open and clear about the goals, schedule, and scope of
> 3.0.0
> > over the last year plus. The point of the extended alpha process was to
> get
> > all our features in during alpha, and the alpha merge window has been
> open
> > for a year. I'm unmoved by arguments about how long a feature has been
> > worked on. None of these were not part of the original 3.0.0 scope, and
> our
> > users have been waiting even longer for big-ticket 3.0 items like JDK8
> and
> > HDFS EC that were part of the discussed scope.
> >
> > I see that two VOTEs have gone out since I was out. I still plan to
> follow
> > the proposal in my original email. This means I'll cut branch-3 and
> > branch-3.0, and move trunk to 4.0.0 before these VOTEs end. This will
> open
> > up development for Hadoop 3.1.0 and 4.0.0.
> >
> > I'm reaching out to the lead contributor of each of these features
> > individually to discuss. We need to close on this quickly, and email is
> too
> > low bandwidth at this stage.
> >
> > Best,
> > Andrew
> >
>


Re: Branch merges and 3.0.0-beta1 scope

2017-08-24 Thread Andrew Wang
Glad to see the discussion continued in my absence :)

From a release management perspective, it's *extremely* reasonable to block
the inclusion of new features a month from the planned release date. A
typical software development lifecycle includes weeks of feature freeze and
weeks of code freeze. It is no knock on any developer or any feature to say
that we should not include something in 3.0.0.

I've been very open and clear about the goals, schedule, and scope of 3.0.0
over the last year plus. The point of the extended alpha process was to get
all our features in during alpha, and the alpha merge window has been open
for a year. I'm unmoved by arguments about how long a feature has been
worked on. None of these were part of the original 3.0.0 scope, and our
users have been waiting even longer for big-ticket 3.0 items like JDK8 and
HDFS EC that were part of the discussed scope.

I see that two VOTEs have gone out since I was out. I still plan to follow
the proposal in my original email. This means I'll cut branch-3 and
branch-3.0, and move trunk to 4.0.0 before these VOTEs end. This will open
up development for Hadoop 3.1.0 and 4.0.0.

I'm reaching out to the lead contributor of each of these features
individually to discuss. We need to close on this quickly, and email is too
low bandwidth at this stage.

Best,
Andrew


Branch merges and 3.0.0-beta1 scope

2017-08-18 Thread Andrew Wang
Hi folks,

As you might have seen, we've had a number of branch merges floated this
past week targeted for 3.0.0-beta1, which is planned for about a month from
now.

In total, I'm currently tracking these branches:

YARN-2915: YARN federation (recently merged)
HADOOP-13345: S3Guard (currently being voted on)
YARN-5355: TSv2 alpha2 ("few weeks")
YARN-5079: Native services ("few weeks")
YARN-3926: Resource profiles ("few weeks")

We should effectively be in code freeze (only blockers/criticals), so the
volume of merge proposals at this point came as a surprise. Despite our
best efforts as software engineers, big code movement always comes with
risk.

Since we started the 3.0 release series close to a year ago, I'm also loath
to increase the scope. The main motivation for 3.0 was to deliver JDK8 and
HDFS EC, and our users deserve a production-quality release with these
features. We've also been good about the release cadence thus far in 3.x,
so a 3.1 isn't that far out.

Here's my proposal:

* 3.0.0-beta1 includes S3Guard and YARN federation. Target date remains
mid-Sept.
* 3.0.0-GA includes TSv2 alpha2. Target date remains early November.
* Everything else waits for 3.1, approximately March 2018.

My rationale for inclusion and exclusion is as follows:

Inclusion:

* YARN federation has been run in production, does not touch existing code,
adds no new APIs, and is off by default.
* S3Guard has been run in production and is off by default.
* The first iteration of TSv2 was shipped in 3.0.0-alpha1, so we're
committed to this for 3.0.0 GA. It's off by default and adds no impact.

Exclusion:

* The primary reason for exclusion is to maintain the planned release
schedule. Native services and resource profiles are still a few weeks from
being ready for merge.
* A reminder that 3.1 is only another 3 months after 3.0 given our release
cadence thus far. If there's demand, we could even do a 3.1 immediately
following 3.0.

I'm happy to talk with the contributors of each of these features to
understand their timelines and requirements, with the caveat that I'll be
out through Wednesday next week. Please reach out.

Best,
Andrew


Re: [DISCUSS] Merge yarn-native-services branch into trunk

2017-08-18 Thread Andrew Wang
Hi Jian, thanks for the reply,

On Thu, Aug 17, 2017 at 1:03 PM, Jian He  wrote:

> Thanks Andrew for the comments. Answers below:
>
> - There are no new APIs added in YARN/Hadoop core. In fact, all the new
> code are running outside of existing system and they are optional and
> require users to explicitly opt in. The new system’s own rest API is not
> stable and will be evolving.
>

Great! That adds a lot more confidence that this is safe to merge.

Are these new APIs listed in user documentation, and described as unstable?


> - We have been running/testing a version of the entire system internally
> for quite a while.
>

Do you mind elaborating on the level of testing? Number of nodes, types of
applications, production or test workload, etc. It'd help us build
confidence.


> - I’d like to see this in hadoop3-beta1. Of course, we’ll take
> responsibility of moving fast and not block the potential timeline.
>

Few more questions:

How should we advertise this feature in the release? Since the APIs are
unstable, I'd propose calling it "alpha" in the release notes, like we do
for TSv2.

Could you move out subtasks from YARN-5079 that are not blocking the merge?
This would make it easier to understand what's remaining.

Thanks,
Andrew


Re: [DISCUSS] Merge YARN resource profile (YARN-3926) branch into trunk

2017-08-18 Thread Andrew Wang
Hi Wangda,

Can this feature be disabled? Is it on or off by default? We're 1 month
from the target release for beta1, so I don't want to introduce risk to
existing code paths. TSv2 and S3Guard and YARN Federation are all okay in
that regard.

I'm also not clear on what work is remaining; there are a lot of unresolved
subtasks still under YARN-3926.

In terms of compatibility, the design doc talks about some PB changes. Are
these stable? Is there documentation somewhere that explains what APIs are
stable or unstable for users?

I'm going to start a separate discussion about beta1 scope. I think this is
the fourth merge proposal this week, and this amount of code movement makes
me very nervous considering that beta1 is only a month out.

Best,
Andrew

On Thu, Aug 17, 2017 at 8:27 PM, Wangda Tan  wrote:

> +hdfs/common/mr
>
> On Thu, Aug 17, 2017 at 1:28 PM, Wangda Tan  wrote:
>
> > Hi all,
> >
> > I want to hear your thoughts of merging YARN resource profile branch into
> > trunk in the next few weeks. The goal is to get it in for Hadoop 3.0
> beta1.
> >
> > *Regarding to testing:*
> > We did extensive tests for the feature in the last several months.
> > Comparing to latest trunk.
> > - For SLS benchmark: We didn't see observable performance gap from
> > simulated test based on 8K nodes SLS traces (1 PB memory). We got 3k+
> > containers allocated per second.
> > - For microbenchmark: We use performance test cases added by YARN-6775,
> it
> > shows around 5% performance regression comparing to trunk.
> >
> > *Regarding to API stability: *
> > Most new added @Public APIs are @Unstable (We're going to convert some
> new
> > added @Public/@Evolving to @Unstable in the cleanup JIRA as well), we
> want
> > to get this included by beta1 so we get some feedbacks before declaring
> > stable API.
> >
> > There're few pending cleanups under YARN-3926 umbrella JIRA. Besides
> these
> > cleanups, this feature works from end-to-end, we will do another
> iteration
> > of end-to-end tests after cleanup patches got committed.
> >
> > We would love to get your thoughts before opening a voting thread.
> >
> > Special thanks to a team of folks who worked hard and contributed towards
> > this efforts including design discussion / patch / reviews, etc.: Varun
> > Vasudev, Sunil Govind, Daniel Templeton, Vinod Vavilapalli, Yufei Gu,
> > Karthik Kambatla, Jason Lowe, Arun Suresh.
> >
> > Thanks,
> > Wangda Tan
> >
>


Re: [DISCUSS] Merging YARN-5355 (Timeline Service v.2) to trunk

2017-08-16 Thread Andrew Wang
Great, thanks Vrushali! Sounds good to me.

I have a few procedural release notes comments I'll put on YARN-5355, to
make sure we advertise this to our users appropriately.

On Wed, Aug 16, 2017 at 11:32 AM, Vrushali Channapattan <
vrushal...@gmail.com> wrote:

> Hi Andrew,
>
> Thanks for your response!
>
> There have been no changes to existing APIs since alpha1.
>
> We at Twitter have tested the feature to demonstrate it works at what we
> consider moderate scale but this did not include the security related
> testing. The security testing is in progress at present by Timeline Service
> V2 team in the community and we think we will have more details on this
> very soon.
>
> About the jiras under YARN-5355: Only 3 of those sub-tasks are what we
> think of as "merge-blockers". The issues being targeted for merge are in
> [link1] below. There are about 59 jiras of which 56 are completed.
>
> We plan to make a new umbrella jira after the merge to trunk. We will then
> create a new branch with the new jira name and move these open jiras under
> YARN-5355 as subtasks of that new umbrella jira.
>
> thanks
> Vrushali
> [link1] https://issues.apache.org/jira/projects/YARN/versions/12337991
>
>
> On Wed, Aug 16, 2017 at 10:47 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Hi Vrushali,
>>
>> Glad to hear this major dev milestone is nearing completion!
>>
>> Repeating my request on other merge [DISCUSS] threads, could you comment
>> on testing and API stability of this merge? Our timeline for beta1 is about
>> a month out, so there's not much time to fix things beforehand.
>>
>> Looking at YARN-5355 there are also many unresolved subtasks. Should most
>> of these be moved out to a new umbrella? I'm wondering what needs to be
>> completed before sending the merge vote.
>>
>> Given that TSv2 is committed for 3.0.0 GA, I'm more willing to flex the
>> beta1 release date for this feature than others. Hopefully that won't be
>> necessary though :)
>>
>> Best,
>> Andrew
>>
>> On Wed, Aug 16, 2017 at 10:26 AM, Vrushali Channapattan <
>> vrushalic2...@gmail.com> wrote:
>>
>>> Looks like some of the hyperlinks appear messed up, my apologies,
>>> resending
>>> the same email with hopefully better looking content:
>>>
>>> Hi All,
>>>
>>> I'd like to open a discussion for merging Timeline Service v2 (YARN-5355)
>>> to trunk in a few weeks.
>>>
>>> We have previously completed one merge onto trunk [1] and Timeline
>>> Service
>>> v2 has been part of Hadoop release 3.0.0-alpha1.
>>>
>>> Since then, we have been working on extending the capabilities of
>>> Timeline
>>> Service v2 in a feature branch [2].  There are a few related issues
>>> pending
>>> that are being actively worked upon and tested. As soon as they are
>>> resolved, we plan on starting a merge vote within the next two weeks. The
>>> goal is to get this into hadoop3 beta.
>>>
>>> We have paid close attention to ensure that  once disabled Timeline
>>> Service
>>> v2 does not impact existing functionality when disabled (by default).
>>>
>>> At a high level, following are the key features that have been
>>> implemented
>>> since 3.0.0-alpha1:
>>> - Security (via Kerberos Authentication & delegation tokens) [YARN-3053]
>>> - Timeline server usability improvements [timeline-server]
>>> - HBase specific improvements [atsv2-hbase]
>>> - REST API additions and improvements [timeline-reader]
>>> - Reader side simple authorization via whitelist [YARN-6820]

Re: [DISCUSS] Merging YARN-5355 (Timeline Service v.2) to trunk

2017-08-16 Thread Andrew Wang
Hi Vrushali,

Glad to hear this major dev milestone is nearing completion!

Repeating my request on other merge [DISCUSS] threads, could you comment on
testing and API stability of this merge? Our timeline for beta1 is about a
month out, so there's not much time to fix things beforehand.

Looking at YARN-5355 there are also many unresolved subtasks. Should most
of these be moved out to a new umbrella? I'm wondering what needs to be
completed before sending the merge vote.

Given that TSv2 is committed for 3.0.0 GA, I'm more willing to flex the
beta1 release date for this feature than others. Hopefully that won't be
necessary though :)

Best,
Andrew

On Wed, Aug 16, 2017 at 10:26 AM, Vrushali Channapattan <
vrushalic2...@gmail.com> wrote:

> Looks like some of the hyperlinks appear messed up, my apologies, resending
> the same email with hopefully better looking content:
>
> Hi All,
>
> I'd like to open a discussion for merging Timeline Service v2 (YARN-5355)
> to trunk in a few weeks.
>
> We have previously completed one merge onto trunk [1] and Timeline Service
> v2 has been part of Hadoop release 3.0.0-alpha1.
>
> Since then, we have been working on extending the capabilities of Timeline
> Service v2 in a feature branch [2].  There are a few related issues pending
> that are being actively worked upon and tested. As soon as they are
> resolved, we plan on starting a merge vote within the next two weeks. The
> goal is to get this into hadoop3 beta.
>
> We have paid close attention to ensure that Timeline Service v2 does not
> impact existing functionality when disabled (by default).
>
> At a high level, following are the key features that have been implemented
> since 3.0.0-alpha1:
> - Security (via Kerberos Authentication & delegation tokens) [YARN-3053]
> - Timeline server usability improvements [timeline-server]
> - HBase specific improvements [atsv2-hbase]
> - REST API additions and improvements [timeline-reader]
> - Reader side simple authorization via whitelist [YARN-6820]
>
> We would love to get your thoughts on this before we open a real voting
> thread.
>
> Special thanks to a team of folks who worked hard and contributed towards
> this effort via patches, reviews and guidance: Rohith Sharma K S, Varun
> Saxena, Haibo Chen, Sangjin Lee, Li Lu, Vinod Kumar Vavilapalli, Joep
> Rottinghuis, Jason Lowe, Jian He, Robert Kanter, Micheal Stack.
>
> Thanks
> Vrushali
> [1] Merge to trunk: http://www.mail-archive.com/yarn-dev@hadoop.apache.
> org/msg23897.html
> [2] feature branch YARN-5355 commits: https://github.com/ap
> ache/hadoop/commits/YARN-5355
> 
>
>
>
>
>
>
> On Wed, Aug 16, 2017 at 10:02 AM, Vrushali Channapattan <
> vrushal...@gmail.com> wrote:
>
> > Hi All
> >
> > I’d like to open a discussion for merging Timeline Service v.2
> (YARN-5355)
> > to trunk in a few weeks.
> >
> > We have previously completed one merge onto trunk [1] and Timeline
> Service
> > v2 has been part of Hadoop release 3.0.0-alpha1.
> >
> > Since then, we have been working on extending the capabilities of
> Timeline
> > Service v2 in a feature branch [2]. There are a few related issues
> pending
> > that are being actively worked upon & tested. As soon as they are
> resolved,
> > we plan on starting a merge vote within the next 2 weeks. The goal is to
> > get this in for Hadoop3 beta.
> >
> > We have paid close attention to ensure that once disabled Timeline
> Service
> > v.2 does not impact existing functionality when disabled (by default).
> >
> > At a high level, following are the key features that have been
> implemented
> > since 3.0.0-alpha1:
> > - Security (via Kerberos Authentication & delegation tokens) at the
> writer
> > [YARN-3053]
> > - Timeline server usability improvements [timeline-server]
> > - HBase specific improvements [atsv2-hbase

Re: [DISCUSS] Merge yarn-native-services branch into trunk

2017-08-16 Thread Andrew Wang
Hi Jian,

Hadoop 3.0.0-beta1 is planned for mid-September. If the plan is to merge in
hopefully the next two weeks, that's very, very close to the goal release
date. We've already got a pile of blockers and criticals to resolve before
then.

Could you comment on testing and API stability for this branch? YARN
Federation was run at high scale and did not add new APIs, which provided a
lot of confidence in the merge.

I'll also raise the option of cutting branch-3 or branch-3.0 for the 3.0.0
efforts, and targeting this for 3.1.0.

Best,
Andrew

On Tue, Aug 15, 2017 at 1:56 PM, Jian He  wrote:

> Hi All,
> I would like to bring up the discussion of merging yarn-native-services
> branch into trunk in a few weeks. There are a few issues left under
> YARN-5079 that are being actively worked upon. As soon as they are
> resolved, we plan on starting a vote hopefully in the next 2 weeks. The goal is to
> get this in for hadoop3 beta.
>
> The major work in this branch include below umbrella jiras:
>  - YARN-5079. A native YARN framework (ApplicationMaster) to migrate and
> orchestrate existing services to YARN either docker or non-docker based.
>  - YARN-4793. A Rest API server for user to deploy a service via a simple
> JSON spec
>  - YARN-4757. Extending today's service registry with a simple DNS service
> to enable users to discover services deployed on YARN
>  - YARN-6419. UI support for native-services on the new YARN UI
> All these new services are optional and have to be explicitly enabled.
>
> Special thanks to a team of folks who worked hard towards this: Billie
> Rinaldi, Gour Saha, Vinod Kumar Vavilapalli, Jonathan Maron, Rohith Sharma
> K S, Sunil G, Akhil PB. This effort could not be possible without their
> ideas and hard work.
>
> Please share your thoughts. Thanks.
>
> Jian
>
>
>
>


Re: Are binary artifacts part of a release?

2017-08-15 Thread Andrew Wang
To close the thread on this, I'll try to summarize the LEGAL JIRA. I wasn't
able to convince anyone to make changes to the apache.org docs.

Convenience binary artifacts are not official release artifacts and thus
are not voted on. However, since they are distributed by Apache, they are
still subject to the same distribution requirements as official release
artifacts. This means they need to have a LICENSE and NOTICE file, follow
ASF licensing rules, etc. The PMC needs to ensure that binary artifacts
meet these requirements.

However, being a "convenience" artifact doesn't mean it isn't important.
The appropriate level of quality for binary artifacts is left up to the
project. An OpenOffice person mentioned the quality of their binary
artifacts is super important since very few of their users will compile
their own office suite.

I don't know if we've discussed the topic of binary artifact quality in
Hadoop. My stance is that if we're going to publish something, it should be
good, or we shouldn't publish it at all. I think we do want to publish
binary tarballs (it's the easiest way for new users to get started with
Hadoop), so it's fair to consider them when evaluating a release.
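
Concretely, these are the kinds of spot checks I'd expect on a binary tarball
before a +1 (a sketch; "hadoop-X.Y.Z" stands in for whatever the RC actually
ships):

  # distribution requirements: LICENSE and NOTICE must be present
  tar tzf hadoop-X.Y.Z.tar.gz | grep -E 'LICENSE|NOTICE'

  # native libraries should be bundled under lib/native
  tar tzf hadoop-X.Y.Z.tar.gz | grep 'lib/native'

  # after unpacking, verify the native code actually loads
  tar xzf hadoop-X.Y.Z.tar.gz
  hadoop-X.Y.Z/bin/hadoop checknative -a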

Best,
Andrew

On Mon, Jul 31, 2017 at 8:43 PM, Konstantin Shvachko <shv.had...@gmail.com>
wrote:

> It does not. Just adding historical references, as Andrew raised the
> question.
>
> On Mon, Jul 31, 2017 at 7:38 PM, Allen Wittenauer <
> a...@effectivemachines.com> wrote:
>
>>
>> ... that doesn't contradict anything I said.
>>
>> > On Jul 31, 2017, at 7:23 PM, Konstantin Shvachko <shv.had...@gmail.com>
>> wrote:
>> >
>> > The issue was discussed on several occasions in the past.
>> > Took me a while to dig this out as an example:
>> > http://mail-archives.apache.org/mod_mbox/hadoop-general/2011
>> 11.mbox/%3C4EB0827C.6040204%40apache.org%3E
>> >
>> > Doug Cutting:
>> > "Folks should not primarily evaluate binaries when voting. The ASF
>> primarily produces and publishes source-code
>> > so voting artifacts should be optimized for evaluation of that."
>> >
>> > Thanks,
>> > --Konst
>> >
>> > On Mon, Jul 31, 2017 at 4:51 PM, Allen Wittenauer <
>> a...@effectivemachines.com> wrote:
>> >
>> > > On Jul 31, 2017, at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>> > >
>> > > Forking this off to not distract from release activities.
>> > >
>> > > I filed https://issues.apache.org/jira/browse/LEGAL-323 to get
>> clarity on the matter. I read the entire webpage, and it could be improved
>> one way or the other.
>> >
>> >
>> > IANAL, my read has always lead me to believe:
>> >
>> > * An artifact is anything that is uploaded to dist.a.o
>> and repository.a.o
>> > * A release consists of one or more artifacts
>> ("Releases are, by definition, anything that is published beyond the group
>> that owns it. In our case, that means any publication outside the group of
>> people on the product dev list.")
>> > * One of those artifacts MUST be source
>> > * (insert voting rules here)
>> > * They must be built on a machine in control of the RM
>> > * There are no exceptions for alpha, nightly, etc
>> > * (various other requirements)
>> >
>> > i.e., release != artifact; it's more like release =
>> > artifact * n.
>> >
>> > Do you have to have binaries?  No (e.g., Apache SpamAssassin
>> has no binaries to create).  But if you place binaries in dist.a.o or
>> repository.a.o, they are effectively part of your release and must follow
>> the same rules.  (Votes, etc.)
>> >
>> >
>>
>>
>


Are binary artifacts part of a release?

2017-07-31 Thread Andrew Wang
Forking this off to not distract from release activities.

I filed https://issues.apache.org/jira/browse/LEGAL-323 to get clarity on
the matter. I read the entire webpage, and it could be improved one way or
the other.

Best,
Andrew

On Mon, Jul 31, 2017 at 3:56 PM, Chris Douglas <cdoug...@apache.org> wrote:

> On Mon, Jul 31, 2017 at 3:02 PM, Konstantin Shvachko
> <shv.had...@gmail.com> wrote:
> > For the packaging, here is the exact phrasing from the sited
> release-policy
> > document relevant to binaries:
> > "As a convenience to users that might not have the appropriate tools to
> > build a compiled version of the source, binary/bytecode packages MAY be
> > distributed alongside official Apache releases. In all such cases, the
> > binary/bytecode package MUST have the same version number as the source
> > release and MUST only add binary/bytecode files that are the result of
> > compiling that version of the source code release and its dependencies."
> > I don't think my binary package violates any of these.
>
> +1 The PMC VOTE applies to source code, only. If someone wants to
> rebuild the binary tarball with native libs and replace this one,
> that's fine.
>
> My reading of the above is that source code must be distributed with
> binaries, not that we omit the source code from binary releases... -C
>
> > But I'll upload an additional tar.gz with native bits and no src, as you
> > guys requested.
> > Will keep it as RC0 as there is no source code change and it comes from
> the
> > same build.
> > Hope this is satisfactory.
> >
> > Thanks,
> > --Konstantin
> >
> > On Mon, Jul 31, 2017 at 1:53 PM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> >> I agree with Brahma on the two issues flagged (having src in the binary
> >> tarball, missing native libs). These are regressions from prior
> releases.
> >>
> >> As an aside, "we release binaries as a convenience" doesn't relax the
> >> quality bar. The binaries are linked on our website and distributed
> through
> >> official Apache channels. They have to adhere to Apache release
> >> requirements. And, most users consume our work via Maven dependencies,
> >> which are binary artifacts.
> >>
> >> http://www.apache.org/legal/release-policy.html goes into this in more
> >> detail. A release must minimally include source packages, and can also
> >> include binary artifacts.
> >>
> >> Best,
> >> Andrew
> >>
> >> On Mon, Jul 31, 2017 at 12:30 PM, Konstantin Shvachko <
> >> shv.had...@gmail.com> wrote:
> >>
> >>> To avoid any confusion in this regard. I built RC0 manually in
> compliance
> >>> with Apache release policy
> >>> http://www.apache.org/legal/release-policy.html
> >>> I edited the HowToReleasePreDSBCR page to make sure people don't use
> >>> Jenkins option for building.
> >>>
> >>> A side note. This particular build is broken anyways, so no worries
> there.
> >>> I think though it would be useful to have it working for testing and
> as a
> >>> packaging standard.
> >>>
> >>> Thanks,
> >>> --Konstantin
> >>>
> >>> On Mon, Jul 31, 2017 at 11:40 AM, Allen Wittenauer <
> >>> a...@effectivemachines.com
> >>> > wrote:
> >>>
> >>> >
> >>> > > On Jul 31, 2017, at 11:20 AM, Konstantin Shvachko <
> >>> shv.had...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > > https://wiki.apache.org/hadoop/HowToReleasePreDSBCR
> >>> >
> >>> > FYI:
> >>> >
> >>> > If you are using ASF Jenkins to create an ASF release
> >>> > artifact, it's pretty much an automatic vote failure as any such
> >>> release is
> >>> > in violation of ASF policy.
> >>> >
> >>> >
> >>>
> >>
> >>
>


[jira] [Created] (MAPREDUCE-6924) Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875

2017-07-31 Thread Andrew Wang (JIRA)
Andrew Wang created MAPREDUCE-6924:
--

 Summary: Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875
 Key: MAPREDUCE-6924
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6924
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0-alpha1
Reporter: Andrew Wang
Assignee: Junping Du


Filing this JIRA so the reverts show up in the changelog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6924) Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875

2017-07-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved MAPREDUCE-6924.

   Resolution: Fixed
Fix Version/s: 3.0.0-beta1

Resolving this changelog tracking JIRA. Thanks to [~djp] for doing the reverts!

> Revert MAPREDUCE-6199 MAPREDUCE-6286 and MAPREDUCE-5875
> ---
>
> Key: MAPREDUCE-6924
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6924
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>    Reporter: Andrew Wang
>Assignee: Junping Du
> Fix For: 3.0.0-beta1
>
>
> Filing this JIRA so the reverts show up in the changelog.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.4 (RC0)

2017-07-31 Thread Andrew Wang
I agree with Brahma on the two issues flagged (having src in the binary
tarball, missing native libs). These are regressions from prior releases.

As an aside, "we release binaries as a convenience" doesn't relax the
quality bar. The binaries are linked on our website and distributed through
official Apache channels. They have to adhere to Apache release
requirements. And, most users consume our work via Maven dependencies,
which are binary artifacts.

http://www.apache.org/legal/release-policy.html goes into this in more
detail. A release must minimally include source packages, and can also
include binary artifacts.

Best,
Andrew

On Mon, Jul 31, 2017 at 12:30 PM, Konstantin Shvachko 
wrote:

> To avoid any confusion in this regard. I built RC0 manually in compliance
> with Apache release policy
> http://www.apache.org/legal/release-policy.html
> I edited the HowToReleasePreDSBCR page to make sure people don't use
> Jenkins option for building.
>
> A side note. This particular build is broken anyways, so no worries there.
> I think though it would be useful to have it working for testing and as a
> packaging standard.
>
> Thanks,
> --Konstantin
>
> On Mon, Jul 31, 2017 at 11:40 AM, Allen Wittenauer <
> a...@effectivemachines.com
> > wrote:
>
> >
> > > On Jul 31, 2017, at 11:20 AM, Konstantin Shvachko <
> shv.had...@gmail.com>
> > wrote:
> > >
> > > https://wiki.apache.org/hadoop/HowToReleasePreDSBCR
> >
> > FYI:
> >
> > If you are using ASF Jenkins to create an ASF release
> > artifact, it's pretty much an automatic vote failure as any such release
> is
> > in violation of ASF policy.
> >
> >
>


3.0.0-beta1 release plan and branch cuts

2017-07-28 Thread Andrew Wang
Hi all,

Here's a long overdue update on Hadoop 3. We've made great progress through
the alpha releases, thanks to the hard work of numerous contributors. With
alpha4 out the door, it's time to look toward beta1!

I updated the wiki page with a proposed release date for beta1: September
15th.

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release

I went through the JIRA dashboard and pinged the blockers that looked
stalled. I also wrote a status update, and will be resuming more regular
updates as we close on beta1.

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

It also sounds like there are some pending branch merges. I'm keeping my
eye on them, with an eye toward not breaking existing functionality and
compatibility. Please don't be offended if I ask that we delay your feature
until after 3.0 GA. We've been baking the current 3.0 feature set with the
alphas, and I don't want to possibly destabilize things when we're so close
to beta/GA.

Finally, a note on branching. I've been cutting release branches directly
off of trunk, to reduce the number of cherry-picks for backporting.

Since beta1 represents a point of stability, I plan to cut branch-3 shortly
before the release, and then cut the 3.0.0-beta1 release branch from
branch-3. trunk would move to 4.0.0-SNAPSHOT.

For GA, I plan to cut branch-3.0 off of branch-3, with branch-3 becoming
3.1.0-SNAPSHOT. At this point, the scheme will be the same as the branch-2s.

If needed, we can do these branch cuts sooner to unblock branch merges.
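
To make the above concrete, here's a rough sketch of the cuts in git terms
(illustrative only; the exact branch/tag names and version bumps would follow
the HowToRelease steps):

# shortly before beta1:
git checkout -b branch-3 trunk                 # trunk then moves to 4.0.0-SNAPSHOT
git checkout -b branch-3.0.0-beta1 branch-3    # release branch for 3.0.0-beta1

# later, for GA:
git checkout -b branch-3.0 branch-3            # branch-3 then moves to 3.1.0-SNAPSHOT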

As always, please reach out with any comments/questions.

Cheers,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-07-05 Thread Andrew Wang
Thanks for testing Shane! Given that the Docker work is experimental
(YARN-6622 will fully document this), I think we're okay shipping with some
known issues.

On Wed, Jul 5, 2017 at 8:50 AM, Shane Kumpf <shane.kumpf.apa...@gmail.com>
wrote:

> Thanks for putting this together. I was able to build the RC, setup a
> pseudo distributed cluster, and run pi and dshell tests.
>
> There is an outstanding issue in YARN that impacts obtaining the network
> details and signal handling when running with the Docker container runtime.
> These will be addressed in YARN-5670. I'm not sure this is a blocker, as
> docker containers can still be run, but with degraded functionality. I'm +1
> if others feel this isn't a blocker.
>
> Thanks,
> -Shane
>
> On Fri, Jun 30, 2017 at 10:29 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> Thanks for taking a look Steve,
>>
>> On Fri, Jun 30, 2017 at 4:59 AM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>
>> >
>> > On 30 Jun 2017, at 03:40, Andrew Wang <andrew.w...@cloudera.com> wrote:
>> >
>> > Hi all,
>> >
>> > As always, thanks to the many, many contributors who helped with this
>> > release! I've prepared an RC0 for 3.0.0-alpha4:
>> >
>> > http://home.apache.org/~wang/3.0.0-alpha4-RC0/
>> >
>> >
>> > whats the git commit ID? or signed tag?
>> >
>>
>> I'd already pushed a signed tag per the HowToRelease instructions, LMK if
>> it's not right.
>>
>> >
>> >
>> > The standard 5-day vote would run until midnight on Tuesday, July 4th.
>> > Given that July 4th is a holiday in the US, I expect this vote might
>> have
>> > to be extended, but I'd like to close the vote relatively soon after.
>> >
>> > I've done my traditional testing of a pseudo-distributed cluster with a
>> > single task pi job, which was successful.
>> >
>> > Normally my testing would end there, but I'm slightly more confident
>> this
>> > time. At Cloudera, we've successfully packaged and deployed a snapshot
>> from
>> > a few days ago, and run basic smoke tests. Some bugs found from this
>> > include HDFS-11956, which fixes backwards compat with Hadoop 2 clients,
>> and
>> > the revert of HDFS-11696, which broke NN QJM HA setup.
>> >
>> > Vijay is working on a test run with a fuller test suite (the results of
>> > which we can hopefully post soon).
>> >
>> > My +1 to start,
>> >
>> > Best,
>> > Andrew
>> >
>> >
>> >
>> > I haven't tested with this alpha, but spark 2.2+ can't read files on
>> > Azure: https://issues.apache.org/jira/browse/HADOOP-14598
>> >
>> > Cause: https://issues.apache.org/jira/browse/HADOOP-14383
>> >
>> > I can see how to do a test we can put into hadoop-common (for
>> trunk/2.9);
>> > if the bug exists in this RC then I think we'll need to revert the
>> > HADOOP-14598 patch for now & then work on a fix
>> >
>> > Sorry —only discovered it this week...an unexpected combination of
>> > features which only become visible when you take the latest bits of two
>> > projects and then run some blobstore tests downstream (
>> https://github.com/
>> > hortonworks-spark/cloud-integration/tree/master/cloud-examples/)
>> >
>> > Ick, I somehow didn't see this blocker on the release dashboard for
>> alpha4, but seems like it's targeted now.
>>
>> If it only affects WASB, then I think we can get by without it for the
>> alpha, fix it in beta1. WDYT?
>>
>> Best,
>> Andrew
>>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-06-30 Thread Andrew Wang
Thanks for taking a look Steve,

On Fri, Jun 30, 2017 at 4:59 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 30 Jun 2017, at 03:40, Andrew Wang <andrew.w...@cloudera.com> wrote:
>
> Hi all,
>
> As always, thanks to the many, many contributors who helped with this
> release! I've prepared an RC0 for 3.0.0-alpha4:
>
> http://home.apache.org/~wang/3.0.0-alpha4-RC0/
>
>
> whats the git commit ID? or signed tag?
>

I'd already pushed a signed tag per the HowToRelease instructions, LMK if
it's not right.
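
For reference, one way to check it on your end (a sketch; I'm assuming the tag
follows the usual release-X-RCn naming we've used for other RCs):

git fetch origin --tags
git tag -v release-3.0.0-alpha4-RC0   # verifies the GPG signature and prints the tagged commit id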

>
>
> The standard 5-day vote would run until midnight on Tuesday, July 4th.
> Given that July 4th is a holiday in the US, I expect this vote might have
> to be extended, but I'd like to close the vote relatively soon after.
>
> I've done my traditional testing of a pseudo-distributed cluster with a
> single task pi job, which was successful.
>
> Normally my testing would end there, but I'm slightly more confident this
> time. At Cloudera, we've successfully packaged and deployed a snapshot from
> a few days ago, and run basic smoke tests. Some bugs found from this
> include HDFS-11956, which fixes backwards compat with Hadoop 2 clients, and
> the revert of HDFS-11696, which broke NN QJM HA setup.
>
> Vijay is working on a test run with a fuller test suite (the results of
> which we can hopefully post soon).
>
> My +1 to start,
>
> Best,
> Andrew
>
>
>
> I haven't tested with this alpha, but spark 2.2+ can't read files on
> Azure: https://issues.apache.org/jira/browse/HADOOP-14598
>
> Cause: https://issues.apache.org/jira/browse/HADOOP-14383
>
> I can see how to do a test we can put into hadoop-common (for trunk/2.9);
> if the bug exists in this RC then I think we'll need to revert the
> HADOOP-14598 patch for now & then work on a fix
>
> Sorry —only discovered it this week...an unexpected combination of
> features which only become visible when you take the latest bits of two
> projects and then run some blobstore tests downstream (https://github.com/
> hortonworks-spark/cloud-integration/tree/master/cloud-examples/)
>
> Ick, I somehow didn't see this blocker on the release dashboard for
alpha4, but seems like it's targeted now.

If it only affects WASB, then I think we can get by without it for the
alpha, fix it in beta1. WDYT?

Best,
Andrew


[VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-06-29 Thread Andrew Wang
Hi all,

As always, thanks to the many, many contributors who helped with this
release! I've prepared an RC0 for 3.0.0-alpha4:

http://home.apache.org/~wang/3.0.0-alpha4-RC0/

The standard 5-day vote would run until midnight on Tuesday, July 4th.
Given that July 4th is a holiday in the US, I expect this vote might have
to be extended, but I'd like to close the vote relatively soon after.

I've done my traditional testing of a pseudo-distributed cluster with a
single task pi job, which was successful.
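
For the curious, that test is roughly the following (the jar path and version
string are illustrative for this RC):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar pi 1 10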

Normally my testing would end there, but I'm slightly more confident this
time. At Cloudera, we've successfully packaged and deployed a snapshot from
a few days ago, and run basic smoke tests. Some bugs found from this
include HDFS-11956, which fixes backwards compat with Hadoop 2 clients, and
the revert of HDFS-11696, which broke NN QJM HA setup.

Vijay is working on a test run with a fuller test suite (the results of
which we can hopefully post soon).

My +1 to start,

Best,
Andrew


Heads up: branching 3.0.0-alpha4, use -beta1 for new commits to trunk

2017-06-29 Thread Andrew Wang
Hi folks,

I'm in the process of moving out all the JIRA versions from 3.0.0-alpha4 to
3.0.0-beta1 in preparation for a 3.0.0-alpha4 release. I'm hoping to get an
RC up tomorrow for a vote, with possibly extending it given the holiday
week in the US for July 4th.

Please use the beta1 target/fix version from now on unless committing to
branch-3.0.0-alpha4. We're not planning on an alpha5.

Best,
Andrew


3.0.0-alpha3 JIRA version has been renamed to 3.0.0-alpha4

2017-05-26 Thread Andrew Wang
Hi all,

The Hadoop PMC is reserving the 3.0.0-alpha3 version for an upcoming
release. I've renamed the versions already in JIRA. More information on
this to come next week.

Please use the renamed "3.0.0-alpha4" version for commits to trunk from
here on out.

Thanks,
Andrew


Reminder to always set x.0.0 and x.y.0 fix versions when backporting

2017-05-07 Thread Andrew Wang
Hi folks,

I've noticed with the backporting efforts for 2.8.1, we're losing some
x.y.0 fix versions (e.g. 2.9.0). Our fix version scheme is described here
(also quoted)

https://hadoop.apache.org/versioning.html

   1. For each *minor* release line, set the *lowest unreleased a.b.c
   version, where c ≥ 0*.
   2. For each *major* release line, set the *lowest unreleased a.b.0
   version*.

This JIRA query for instance turns up 44 JIRAs with fix versions 2.8.1 or
2.8.2 and not 2.9.0:

https://issues.apache.org/jira/issues/?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20and%20fixVersion%20in%20(%222.8.1%22%2C%20%222.8.2%22)%20and%20fixVersion%20!%3D%20%222.9.0%22
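
Decoded for readability, and as a hedged sketch of running the same query from
the command line via JIRA's standard REST search endpoint:

curl -G 'https://issues.apache.org/jira/rest/api/2/search' \
  --data-urlencode 'jql=project in (HADOOP, HDFS, YARN, MAPREDUCE) and fixVersion in ("2.8.1", "2.8.2") and fixVersion != "2.9.0"' \
  --data-urlencode 'fields=key'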

Best,
Andrew


Re: About 2.7.4 Release

2017-05-02 Thread Andrew Wang
Can we wait for 2.7.4 first? There are still backports happening to
branch-2.7. After that, there shouldn't be many backports since both 2.8.x
and 2.7.x will be up-to-date with what's in 3.0.0-alpha1 and 3.0.0-alpha2.


On Tue, May 2, 2017 at 4:07 PM, Allen Wittenauer 
wrote:

>
> Is there any reason to not Close -alpha1+resolved state JIRAs?  It's been
> quite a while and those definitely should not be getting re-opened anymore.
> What about -alpha2's that are also resolved?


Re: About 2.7.4 Release

2017-05-01 Thread Andrew Wang
On Mon, May 1, 2017 at 3:44 PM, Allen Wittenauer <a...@effectivemachines.com>
wrote:

>
> > On May 1, 2017, at 2:27 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > I believe I asked about this on dev-yetus a while back. I'd prefer that
> the presence of the fix version be sufficient to indicate whether a JIRA is
> included in a release branch. Yetus requires that the JIRA be resolved as
> "Fixed" to show up, which is why we are in our current situation.
>
> We can't do this because Hadoop is the only one that I've seen
> that sets Fix version at close time.  Everyone else is setting fix version
> in place of target (which is a custom field, iirc).
>
Let's see if I can revive the discussion over on a yetus list/jira. I
think it's easier to add a new flag to Yetus than changing the Hadoop JIRA
workflow, and it seems like this issue is becoming more acute.


Re: About 2.7.4 Release

2017-05-01 Thread Andrew Wang
I didn't close JIRAs after the 3.0.0-alpha1 or alpha2 releases since
closing makes the JIRAs read-only. This makes it more annoying to backport
to older releases and for concurrent releases in general.

I believe I asked about this on dev-yetus a while back. I'd prefer that the
presence of the fix version be sufficient to indicate whether a JIRA is
included in a release branch. Yetus requires that the JIRA be resolved as
"Fixed" to show up, which is why we are in our current situation.

On Thu, Apr 27, 2017 at 12:47 AM, Akira Ajisaka  wrote:

> Thanks Allen for the additional information.
>
> > At one point, JIRA was configured to refuse re-opening after a release
> is cut.
>
> In the past, release manager closed the tickets and the process is written
> in the wiki: https://wiki.apache.org/hadoop/HowToRelease
>
> > 10. In JIRA, close issues resolved in the release. Disable mail
> notifications for this bulk change.
>
> Therefore, let's close them.
>
> On 2017/04/27 3:01, Allen Wittenauer wrote:
>
>>
>> On Apr 25, 2017, at 12:35 AM, Akira Ajisaka  wrote:
>>>
>>>> Maybe we should create a jira to track this?
>>>>
>>>
>>> I think now either way (reopen or create) is fine.
>>>
>>> Release doc maker creates change logs by fetching information from JIRA,
>>> so reopening the tickets should be avoided when a release process is in
>>> progress.
>>>
>>>
>> Keep in mind that the release documentation is part of the build
>> process.  Users who are doing their own builds will have incomplete
>> documentation if we keep re-opening JIRAs after a release.  At one point,
>> JIRA was configured to refuse re-opening after a release is cut.  I'm not
>> sure why it stopped doing that, but it might be time to see if we can
>> re-enable that functionality.
>>
>>
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: Automated documentation build for Apache Hadoop

2017-04-03 Thread Andrew Wang
Nice work Akira! Appreciate the help with trunk development.

On Mon, Apr 3, 2017 at 1:56 AM, Akira Ajisaka  wrote:

> Hi folks,
>
> I've created a repository to build and push Apache Hadoop document (trunk)
> via Travis CI.
> https://github.com/aajisaka/hadoop-document
>
> The document is updated daily by Travis CI cron job.
> https://aajisaka.github.io/hadoop-document/hadoop-project/
>
> Hope it helps!
>
> Regards,
> Akira
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Changing the default class path for clients

2017-04-03 Thread Andrew Wang
Thanks for digging that up. I agree with your analysis of our public
documentation, though we still need a transition path. Officially, our
classpath is not covered by compatibility, though we know that in reality,
classpath changes are quite impactful to users.

While we were having a related discussion on YARN container classpath
isolation, the plan was to still provide the existing set of JARs by
default, with applications having to explicitly opt-in to a clean
classpath. This feels similar.

How do you feel about providing e.g. `hadoop userclasspath` and `hadoop
daemonclasspath`, and having `hadoop classpath` continue to default to
`daemonclasspath` for now? We could then deprecate+remove `hadoop
classpath` in a future release.
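
To sketch what I mean (these subcommands don't exist today; the names and
output below are only the proposal, not an implemented CLI):

hadoop userclasspath    # config dirs, HADOOP_USER_CLASSPATH / HADOOP_OPTIONAL_TOOLS entries, hadoop-client-runtime
hadoop daemonclasspath  # the full jar list the daemons need (what `hadoop classpath` prints today)
hadoop classpath        # unchanged for now, defaulting to daemonclasspath; deprecate and remove later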

On Mon, Apr 3, 2017 at 11:08 AM, Allen Wittenauer <a...@effectivemachines.com>
wrote:

>
> 1.0.4:
>
> "Prints the class path needed to get the Hadoop jar and the
> required libraries.”
>
>  2.8.0 and 3.0.0:
>
> "Prints the class path needed to get the Hadoop jar and the
> required libraries. If called without arguments, then prints the classpath
> set up by the command scripts, which is likely to contain wildcards in the
> classpath entries.”
>
> I would take that to mean “what gives me all the public APIs?”
> Which, by definition, should all be in hadoop-client-runtime (with the
> possible exception of the DistributedFileSystem Quota APIs, since for some
> reason those are marked public.)
>
> Let me ask it a different way:
>
> Why should ‘yarn jar’, ‘mapred jar’, ‘hadoop distcp’, ‘hadoop fs’,
> etc, etc, etc, have anything but hadoop-client-runtime as the provided jar?
> Yes, some things might break, but given this is 3.0, some changes should be
> expected anyway. Given the definition above "needed to get the Hadoop jar
> and the required libraries”  switching this over seems correct.
>
>
> > On Apr 3, 2017, at 10:37 AM, Esteban Gutierrez <este...@cloudera.com>
> wrote:
> >
> >
> > I agree with Andrew too. Users have relied for years on `hadoop
> > classpath` in their scripts to launch jobs or other tools; perhaps not the
> > best idea to change the behavior without providing a proper deprecation
> > path.
> >
> > thanks!
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Mon, Apr 3, 2017 at 10:26 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > What's the current contract for `hadoop classpath`? Would it be safer to
> > introduce `hadoop userclasspath` or similar for this behavior?
> >
> > I'm betting that changing `hadoop classpath` will lead to some breakages,
> > so I'd prefer to make this new behavior opt-in.
> >
> > Best,
> > Andrew
> >
> > On Mon, Apr 3, 2017 at 9:04 AM, Allen Wittenauer <
> a...@effectivemachines.com>
> > wrote:
> >
> > >
> > > This morning I had a bit of a shower thought:
> > >
> > > With the new shaded hadoop client in 3.0, is there any reason
> the
> > > default classpath should remain the full blown jar list?  e.g.,
> shouldn’t
> > > ‘hadoop classpath’ just return configuration, user supplied bits (e.g.,
> > > HADOOP_USER_CLASSPATH, etc), HADOOP_OPTIONAL_TOOLS, and
> > > hadoop-client-runtime? We’d obviously have to add some plumbing for
> daemons
> > > and the capability for the user to get the full list, but that should
> be
> > > trivial.
> > >
> > > Thoughts?
> > > -
> > > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> > >
> > >
> >
>
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Andrew Wang
Great. If y'all are satisfied, I am too.

My only other request is that we shade PB even for the non-client JARs,
since empirically there are a lot of downstream projects pulling in our
server-side artifacts.

On Thu, Mar 30, 2017 at 9:55 AM, Stack <st...@duboce.net> wrote:

> On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas <chris.doug...@gmail.com>
> wrote:
>
>> On Wed, Mar 29, 2017 at 4:59 PM, Stack <st...@duboce.net> wrote:
>> >> The former; an intermediate handler decoding, [modifying,] and
>> >> encoding the record without losing unknown fields.
>> >>
>> >
>> > I did not try this. Did you? Otherwise I can.
>>
>> Yeah, I did. Same format. -C
>>
>>
> Grand.
> St.Ack
>
>
>
>
>> >> This looks fine. -C
>> >>
>> >> > Thanks,
>> >> > St.Ack
>> >> >
>> >> >
>> >> > # Using the protoc v3.0.2 tool
>> >> > $ protoc --version
>> >> > libprotoc 3.0.2
>> >> >
>> >> > # I have a simple proto definition with two fields in it
>> >> > $ more pb.proto
>> >> > message Test {
>> >> >   optional string one = 1;
>> >> >   optional string two = 2;
>> >> > }
>> >> >
>> >> > # This is a text-encoded instance of a 'Test' proto message:
>> >> > $ more pb.txt
>> >> > one: "one"
>> >> > two: "two"
>> >> >
>> >> > # Now I encode the above as a pb binary
>> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
>> syntax
>> >> > specified for the proto file: pb.proto. Please use 'syntax =
>> "proto2";'
>> >> > or
>> >> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to
>> proto2
>> >> > syntax.)
>> >> >
>> >> > # Here is a dump of the binary
>> >> > $ od -xc pb.bin
>> >> > 0000000    030a    6e6f    1265    7403    6f77
>> >> >           \n 003   o   n   e 022 003   t   w   o
>> >> > 0000012
>> >> >
>> >> > # Here is a proto definition file that has a Test Message minus the
>> >> > 'two'
>> >> > field.
>> >> > $ more pb_drops_two.proto
>> >> > message Test {
>> >> >   optional string one = 1;
>> >> > }
>> >> >
>> >> > # Use it to decode the bin file:
>> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
>> syntax
>> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax
>> =
>> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> >> > (Defaulted
>> >> > to proto2 syntax.)
>> >> > one: "one"
>> >> > 2: "two"
>> >> >
>> >> > Note how the second field is preserved (absent a field name). It is
>> not
>> >> > dropped.
>> >> >
>> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field
>> IS
>> >> > dropped.
>> >> >
>> >> > # Here proto file with proto3 syntax specified (had to drop the
>> >> > 'optional'
>> >> > qualifier -- not allowed in proto3):
>> >> > $ more pb_drops_two.proto
>> >> > syntax = "proto3";
>> >> > message Test {
>> >> >   string one = 1;
>> >> > }
>> >> >
>> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  >
>> pb_drops_two.txt
>> >> > $ more pb_drops_two.txt
>> >> > one: "one"
>> >> >
>> >> >
>> >> > I cannot reencode the text output using pb_drops_two.proto. It
>> >> > complains:
>> >> >
>> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
>> >> > pb_drops_two.bin
>> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
>> syntax
>> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax
>> =
>> >> > "proto2";'

Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Andrew Wang
>
> > If unknown fields are dropped, then applications proxying tokens and
> other
> >> data between servers will effectively corrupt those messages, unless we
> >> make everything opaque bytes, which- absent the convenient, prenominate
> >> semantics managing the conversion- obviate the compatibility machinery
> that
> >> is the whole point of PB. Google is removing the features that justified
> >> choosing PB over its alternatives. Since we can't require that our
> >> applications compile (or link) against our updated schema, this creates
> a
> >> problem that PB was supposed to solve.
> >
> >
> > This is scary, and it potentially affects services outside of the Hadoop
> > codebase. This makes it difficult to assess the impact.
>
> Stack mentioned a compatibility mode that uses the proto2 semantics.
> If that carries unknown fields through intermediate handlers, then
> this objection goes away. -C


Did some more googling, found this:

https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ

Feng Xiao appears to be a Google engineer, and suggests workarounds like
packing the fields into a byte type. No mention of a PB2 compatibility
mode. Also here:

https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ

Participants say that unknown fields were dropped for automatic JSON
encoding, since you can't losslessly convert to JSON without knowing the
type.

Unfortunately, it sounds like these are intrinsic differences with PB3.

Best,
Andrew


Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Andrew Wang
I've been investigating this a bit. I'm hoping Chris can ring in, since
he's identified wire compatibility issues. Replying inline to Chris' comment

on HDFS-11010:

> There's no mention of the convenient "Embedded messages are compatible with
> bytes if the bytes contain an encoded version of the message" semantics in
> proto3.


I checked the proto3 guide, and I think this is supported:
https://developers.google.com/protocol-buffers/docs/proto3#updating
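
Here's a quick local check of that claim (a sketch; the file and message names
are made up, and I'm using proto2 syntax since the wire format is the same):

cat > outer.proto <<'EOF'
syntax = "proto2";
message Inner { optional string name = 1; }
message Outer { optional Inner inner = 1; }
EOF

cat > outer_bytes.proto <<'EOF'
syntax = "proto2";
message Outer { optional bytes inner = 1; }
EOF

# Encode with the embedded-message schema, then decode with the bytes schema:
echo 'inner { name: "x" }' | protoc --encode=Outer outer.proto > outer.bin
protoc --decode=Outer outer_bytes.proto < outer.bin
# Expected output: inner: "\n\001x"  (the raw encoding of the Inner message)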

> If unknown fields are dropped, then applications proxying tokens and other
> data between servers will effectively corrupt those messages, unless we
> make everything opaque bytes, which- absent the convenient, prenominate
> semantics managing the conversion- obviate the compatibility machinery that
> is the whole point of PB. Google is removing the features that justified
> choosing PB over its alternatives. Since we can't require that our
> applications compile (or link) against our updated schema, this creates a
> problem that PB was supposed to solve.


This is scary, and it potentially affects services outside of the Hadoop
codebase. This makes it difficult to assess the impact.

Paraphrasing, the issues with PB2.5 are:

   1. poor support for non-x86, non-Linux platforms
   2. not as available, so harder to setup a dev environment
   3. missing zero-copy support, which helped performance in HBase

#1 and #2 can be addressed if we rehosted PB (with cross-OS compilation
patches) elsewhere.
#3 I don't think we benefit from, since we don't pass around big PB byte
arrays (at least in HDFS).

So the way I see it, upgrading to PB3 has risk from the behavior change wrt
unknown fields, while there are other ways of addressing the stated issues
with PB2.5.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-21 Thread Andrew Wang
I poked around a bit. The 3.0.0-alpha2 binary tarball is only 246M and has
more changes than 2.8.0.

It looks like the 2.8.0 bin tarball has an extra 1.5GB of docs when
extracted compared to 3.0.0-alpha2. I think it's from the extra src-html
content:

-> % find share/doc -name src-html | xargs du -sb | awk -e '{SUM+=$1} END
{print SUM}'
1651891481


On Tue, Mar 21, 2017 at 9:53 AM, Wei-Chiu Chuang 
wrote:

> Thanks Junping for taking on this huge effort!
>
> I found one tiny nit: the md5 files are not in the conventional format.
> That is to say,
> $ cat hadoop-2.8.0.tar.gz.md5
>
> $ /usr/bin/md5sum /build/source/target/artifacts/hadoop-2.8.0.tar.gz
> c728a090b68d009070085367695ed507  /build/source/target/
> artifacts/hadoop-2.8.0.tar.gz
>
> But a typical md5 file would have been:
> c728a090b68d009070085367695ed507  hadoop-2.8.0.tar.gz
>
> I was pretty stunned finding hadoop-2.8.0.tar.gz is a whopping 410MB
> binary, compared to hadoop-2.7.3.tar.gz which is just 205 MB.
> But later on I realized the source code hadoop-2.8.0-src.tar.gz is 33MB
> compared to hadoop-2.7.3-src.tar.gz which is 18MB. So probably it’s the
> amount of changes made in Hadoop 2.8 that makes such a difference in size.
>
> Regards,
> Wei-Chiu Chuang
>
> > On Mar 21, 2017, at 9:39 AM, Akira Ajisaka  wrote:
> >
> > Thanks Junping!
> >
> > +1 (binding)
> >
> > * Verified signatures and checksums
> > * Built Hive 2.1.0 and Tez 0.8.5 with Hadoop 2.8.0 pom
> > * Deployed a single node cluster and ran some Hive on Tez queries
> successfully
> > * The document looks good.
> >
> > I found a trivial issue in the doc. It does not block the release.
> > https://issues.apache.org/jira/browse/HADOOP-14208
> >
> > Regards,
> > Akira
> >
> >
> > On 2017/03/17 18:18, Junping Du wrote:
> >> Hi all,
> >> With fix of HDFS-11431 get in, I've created a new release candidate
> (RC3) for Apache Hadoop 2.8.0.
> >>
> >> This is the next minor release to follow up 2.7.0 which has been
> released for more than 1 year. It comprises 2,900+ fixes, improvements, and
> new features. Most of these commits are released for the first time in
> branch-2.
> >>
> >>  More information about the 2.8.0 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >>
> >>  New RC is available at: http://home.apache.org/~
> junping_du/hadoop-2.8.0-RC3
> >>
> >>  The RC tag in git is: release-2.8.0-RC3, and the latest commit id
> is: 91f2b7a13d1e97be65db92ddabc627cc29ac0009
> >>
> >>  The maven artifacts are available via repository.apache.org at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1057
> >>
> >>  Please try the release and vote; the vote will run for the usual 5
> days, ending on 03/22/2017 PDT time.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
>


Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

2017-03-15 Thread Andrew Wang
Hi Junping, inline,

> From my understanding, this issue is related to our previous
> improvements with separating client and server jars in HDFS-6200. If we use
> the new "client" jar in NN HA deployment, then we will hit the issue
> reported.
>
From my read of the poms, hadoop-client depends on hadoop-hdfs-client to
pull in HDFS-related code. It doesn't have its own dependency on
hadoop-hdfs. So I think this affects users of the hadoop-client artifact,
which has existed for a long time.
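
(You can confirm this from a branch-2 checkout with something like the
following; treat it as a sketch, with the module name per the branch-2 layout:

mvn -pl hadoop-client dependency:tree -Dincludes='org.apache.hadoop:hadoop-hdfs*'

which should show hadoop-hdfs-client but not hadoop-hdfs if my read is right.)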

Essentially all of our customer deployments run with NN HA, so this would
affect a lot of users.

> I can see two options here:
>
> - Without any change in 2.8.0: if users hit the issue when they deploy an HA
> cluster using the new client jar, they can add back the hdfs jar, just like how
> things worked previously
>
> - Make the change now in 2.8.0, either moving
> ConfiguredFailoverProxyProvider to the client jar or adding a dependency
> between the client jar and the server jar. There must be some arguments there
> on which fix is better, especially since ConfiguredFailoverProxyProvider
> still has some server-side dependencies.
>
>
> I would prefer the first option, given:
>
> - The time to fix this issue is unpredictable, as there is still discussion on
> how to fix it. Our 2.8.0 release shouldn't be an endless journey
> which has already been deferred several times for more serious issues.
>
Looks like we have a patch being actively revved and reviewed to fix this
by making hadoop-hdfs-client depend on hadoop-hdfs. Thanks to Steven and
Steve for working on this.

Steve proposed doing a proper split in a later JIRA.

> - We have a workaround for this improvement; no regression happens due to
> this issue. People can still use the hdfs jar the old way. The worst case
> is that the improvement for HDFS doesn't work in some cases - that shouldn't block
> the whole release.
>
Based on the above, I think there is a regression for users of the
hadoop-client artifact.

If it actually only affects users of hadoop-hdfs-client, then I agree we
can document it as a Known Issue and fix it later.

Best,
Andrew


  1   2   3   >