Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-19 Thread Aaron Fabbri
+1 (binding)

Thanks to the Ozone folks for their efforts at maintaining good separation
from HDFS and Common. I took a lot of heat for the unpopular opinion that
they should be separate, so I am glad the process has worked out well for
both codebases. It looks like my concerns were addressed, and I appreciate
it. It is cool to see the evolution here.

Aaron


On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran 
wrote:

> in that case,
>
> +1 from me (binding)
>
> On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton  wrote:
>
> >  > One thing to consider here: you are giving up your ability to make
> >  > changes in hadoop-* modules, including hadoop-common, and their
> >  > dependencies, in sync with your own code. That goes for filesystem
> >  > contract tests.
> >  >
> >  > are you happy with that?
> >
> >
> > Yes. I think we can live with it.
> >
> > Fortunately the Hadoop parts which are used by Ozone (security + rpc)
> > are stable enough; we haven't needed bigger changes so far (small
> > patches are already included in 3.1/3.2).
> >
> > I think it's better to use released Hadoop bits in Ozone anyway, and
> > worst (best?) case we can try to do more frequent patch releases from
> > Hadoop (if required).
> >
> >
> > m.
> >
> >
> >
>


Re: Including Original Author in git commits.

2019-02-14 Thread Aaron Fabbri
+1. I think formatted patches and PRs will be an improvement. I've used
the git --committer thing a couple of times here without issue.

Another unrelated improvement with GitHub is handling of large changes. I
really think large patches should be split up into logical subcommits, and
PRs support this naturally. (We could also use something like quilt and
change the JIRA process to understand patch sets, but I may be the only one
excited about that idea.)

On Thu, Feb 14, 2019 at 5:24 AM Vinayakumar B 
wrote:

> So, if we have started doing that already, we can encourage contributors to
> attach formatted patches or create PRs.
>
> And update the wiki with the exact steps to contribute and commit.
>
> -Vinay
>
>
> On Thu, 14 Feb 2019, 4:54 pm Steve Loughran  wrote:
>
> > I've been trying to do that recently, though as it forces me to go to the
> > command line rather than using Atlassian Sourcetree, I've been getting
> > other things wrong. To those people who have been dealing with commits
> > I've managed to mess up: apologies.
> >
> > 1. Once someone is recorded as an author, you don't need to add their
> > email address; the first time, you will need to get their email address.
> > 2. Akira, Aaron and I also use the -S option to GPG-sign the commits. We
> > should all be doing that, as it is the way to show who really committed
> > the patch. Add --show-signature to the end of any git log command to see
> > those.
> > 3. Note that if you cherry-pick a patch into a different branch, you have
> > to use -S in the git cherry-pick command to re-sign it.
> >
> > We should all have our GPG keys in the KEYS file, and co-sign the others
> > in there, so that we have that mutual trust.
> >
> > -Steve
> >
> > PS: one flaw in the GPG process: if you ever revoke the key, then all
> > existing commits are considered untrusted:
> >
> > http://steveloughran.blogspot.com/2017/10/roca-breaks-my-commit-process.html
> >
> >
> >
> >
> > On Thu, Feb 14, 2019 at 9:12 AM Akira Ajisaka wrote:
> >
> > > Hi Vinay,
> > >
> > > I'm already doing this if I can get the original author name and
> > > email address in some way.
> > > If the patch is created by the git format-patch command, the
> > > smart-apply-patch --committer option can do this automatically.
> > >
> >
> > Never knew that
> >
>
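The commit workflow discussed in this thread (crediting the original author, GPG-signing with -S, re-signing on cherry-pick, checking signatures with git log --show-signature, and applying git format-patch output) can be sketched as a small Python helper that only builds the git command lines; it does not run them. The helper names, the "HADOOP-12345" message and the "Jane Doe" author below are hypothetical placeholders; the git flags themselves (--author, -S, --show-signature, and git am, which keeps the author recorded in a format-patch file) are standard git options.

```python
import shlex
from typing import List, Optional


def commit_cmd(message: str, author: Optional[str] = None, sign: bool = True) -> List[str]:
    """Build a `git commit` argv that credits the original author and,
    optionally, GPG-signs the commit with the committer's own key."""
    cmd = ["git", "commit"]
    if author:
        # First use: "Name <email>". Afterwards git also accepts a pattern
        # that it matches against the authors of existing commits.
        cmd.append(f"--author={author}")
    if sign:
        cmd.append("-S")
    cmd += ["-m", message]
    return cmd


def cherry_pick_cmd(sha: str, sign: bool = True) -> List[str]:
    """A cherry-pick creates a new commit, so it has to be signed again."""
    cmd = ["git", "cherry-pick"]
    if sign:
        cmd.append("-S")
    cmd.append(sha)
    return cmd


def log_cmd(count: int = 1) -> List[str]:
    """Show the last `count` commits together with their signature status."""
    return ["git", "log", "--show-signature", f"-{count}"]


def apply_patch_cmd(patch_file: str, sign: bool = True) -> List[str]:
    """`git am` applies a `git format-patch` file, keeping its author."""
    cmd = ["git", "am"]
    if sign:
        cmd.append("-S")
    cmd.append(patch_file)
    return cmd


if __name__ == "__main__":
    # Hypothetical example values; shlex.join needs Python 3.8+.
    print(shlex.join(commit_cmd("HADOOP-12345. Example fix.",
                                author="Jane Doe <jane@example.org>")))
```

Building argv lists rather than shell strings keeps author names with spaces and angle brackets intact if the commands are later passed to subprocess.run.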


Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-10 Thread Aaron Fabbri
Thanks Sunil and everyone who has worked on this release.

+1 from me.

- Verified checksums for tar file.
- Built from tar.gz.
- Ran through S3A and S3Guard integration tests (in AWS us-west-2).

This includes a yarn minicluster test but is mostly focused on s3a/s3guard.

Cheers,
Aaron
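The "verified checksums for tar file" step can be sketched in Python with hashlib. This is a minimal sketch that assumes the release ships a SHA-512 checksum file in either the sha512sum layout ("<hex>  file") or the BSD layout ("SHA512 (file) = <hex>"); other layouts, such as hex split into space-separated groups, are not handled. The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# A SHA-512 digest is 128 hex characters.
HEX512 = re.compile(r"\b[0-9a-fA-F]{128}\b")


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs are not read into memory at once."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_digest(checksum_text: str) -> str:
    """Extract the first 128-hex-digit value from a checksum file (assumed layout)."""
    m = HEX512.search(checksum_text)
    if m is None:
        raise ValueError("no SHA-512 digest found")
    return m.group(0).lower()


def verify(tarball: Path, checksum_file: Path) -> bool:
    """True if the tarball's SHA-512 matches the published value."""
    return sha512_of(tarball) == expected_digest(checksum_file.read_text())
```

Checksums only detect corruption; checking the GPG signature against the KEYS file is what ties the artifact to a release manager.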


On Thu, Jan 10, 2019 at 2:32 PM Kuhu Shukla 
wrote:

> +1 (non-binding)
>
> - built from source on Mac
> - deployed on a pseudo distributed one node cluster
> - ran example jobs like sleep and wordcount.
>
> Thank you for all the work on this release.
> Regards,
> Kuhu
>
> On Thu, Jan 10, 2019 at 10:32 AM Craig.Condit 
> wrote:
>
> > +1 (non-binding)
> >
> > - built from source on CentOS 7.5
> > - deployed single node cluster
> > - ran several yarn jobs
> > - ran multiple docker jobs, including spark-on-docker
> >
> > On 1/8/19, 5:42 AM, "Sunil G"  wrote:
> >
> > Hi folks,
> >
> >
> > Thanks to all of you who helped in this release [1] and for helping
> > to vote for RC0. I have created a second release candidate (RC1) for
> > Apache Hadoop 3.2.0.
> >
> >
> > Artifacts for this RC are available here:
> >
> > http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/
> >
> >
> > RC tag in git is release-3.2.0-RC1.
> >
> >
> >
> > The maven artifacts are available via repository.apache.org at
> >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1178/
> >
> >
> > This vote will run 7 days (5 weekdays), ending on 14th Jan at 11:59 pm
> > PST.
> >
> >
> >
> > 3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. The feature
> > additions below are the highlights of this release.
> >
> > 1. Node Attributes Support in YARN
> >
> > 2. Hadoop Submarine project for running Deep Learning workloads on YARN
> >
> > 3. Support service upgrade via YARN Service API and CLI
> >
> > 4. HDFS Storage Policy Satisfier
> >
> > 5. Support Windows Azure Storage - Blob file system in Hadoop
> >
> > 6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
> >
> > 7. Improvements in Router-based HDFS federation
> >
> >
> >
> > Thanks to Wangda, Vinod, Marton for helping me in preparing the
> > release.
> >
> > I have done some testing with my pseudo cluster. My +1 to start.
> >
> >
> >
> > Regards,
> >
> > Sunil
> >
> >
> >
> > [1]
> > https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
> >
> > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
> > AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
> > ORDER BY fixVersion ASC
> >
> >
> >
>


Re: Hadoop 3.2 Release Plan proposal

2018-10-02 Thread Aaron Fabbri
> >>>>> - (Virajit) HDFS-12615: Router-based HDFS federation. Improvement
> >>>>> works.
> >>>>> - (Steve) S3Guard Phase III, S3a phase V, Support Windows Azure
> >>>>> Storage. In progress.
> >>>>>
> >>>>> 3. Tentative features:
> >>>>>
> >>>>> - (Haibo Chen) YARN-1011: Resource overcommitment. Looks challenging
> >>>>> to be done before Aug 2018.
> >>>>> - (Eric) YARN-7129: Application Catalog for YARN applications.
> >>>>> Challenging as more discussions are on-going.
> >>>>>
> >>>>> *Summary of 3.2.0 issues status:*
> >>>>>
> >>>>> 39 Blocker and Critical issues [1] are open; I am checking with owners
> >>>>> to get status on each of them to get in by the Code Freeze date.
> >>>>>
> >>>>> [1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND priority in
> >>>>> (Blocker, Critical) AND resolution = Unresolved AND "Target Version/s" =
> >>>>> 3.2.0 ORDER BY priority DESC
> >>>>>
> >>>>> Thanks,
> >>>>> Sunil
> >>>>>
> >>>>> On Fri, Jul 20, 2018 at 8:03 AM Sunil G  wrote:
> >>>>>
> >>>>>> Thanks Subru for the thoughts.
> >>>>>> One of the main reasons for a major release is to push out critical
> >>>>>> features to users with a faster cadence. If we pull more and more
> >>>>>> different types of features into a minor release, that branch will
> >>>>>> become more destabilized, and it may be tough to say that 3.1.2 is
> >>>>>> more stable than 3.1.1, for example. We always tend to improve and
> >>>>>> stabilize features in subsequent minor releases.
> >>>>>> For some companies, it makes sense to push out these new features
> >>>>>> faster to reach users. On the backporting issues, I agree that it's a
> >>>>>> pain, and we can work around that with some git scripts. If we can
> >>>>>> make such scripts available to committers, backports will be seamless
> >>>>>> across branches and we can achieve the faster release cadence too.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>> - Sunil
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Jul 20, 2018 at 3:37 AM Subru Krishnan 
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks Sunil for volunteering to lead the release effort. I am
> >>>>>>> generally supportive of a release but -1 on a 3.2 (prefer a 3.1.x),
> >>>>>>> as I feel we already have too many branches to be maintained. I
> >>>>>>> already see many commits in different branches with no apparent
> >>>>>>> rationale, e.g. 3.1 has commits which are absent in 3.0.
> >>>>>>>
> >>>>>>> Additionally, AFAIK 3.x has not been deployed in any major production
> >>>>>>> setting, so the cost of adding features should be minimal.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> -Subru
> >>>>>>>
> >>>>>>> On Thu, Jul 19, 2018 at 12:31 AM, Sunil G wrote:
> >>>>>>>
> >>>>>>> > Thanks Steve, Aaron, Wangda for sharing thoughts.
> >>>>>>> >
> >>>>>>> > Yes, important changes and features are much needed, hence we will
> >>>>>>> > be keeping the door open for them as much as possible. Also,
> >>>>>>> > considering a few more offline requests from other folks, I think
> >>>>>>> > extending the timeframe by a couple of weeks makes sense (including
> >>>>>>> > a second RC buffer), and this should ideally help us ship this by
> >>>>>>> > September itself.
> >>>
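The "git scripts" for backports mentioned above could start with an audit of which JIRA issues are committed on one branch but not another, the situation described for 3.1 versus 3.0. This is a minimal sketch, assuming commit subjects lead with the JIRA key (as in "HDFS-12615. ...") and that the subject lists come from something like `git log --format=%s <branch>`; the function names and the branch pairing are hypothetical. Matching on JIRA keys rather than commit hashes is deliberate, since a cherry-picked backport gets a new hash.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumption: subjects start with a key such as "HADOOP-15226." or "YARN-7129."
JIRA_KEY = re.compile(r"^\s*((?:HADOOP|HDFS|YARN|MAPREDUCE)-\d+)")


def jira_keys(subjects: Iterable[str]) -> Set[str]:
    """Collect the JIRA keys found at the start of commit subjects."""
    keys = set()
    for subject in subjects:
        m = JIRA_KEY.match(subject)
        if m:
            keys.add(m.group(1))
    return keys


def _sort_key(key: str) -> Tuple[str, int]:
    # Sort numerically within a project, so HADOOP-9 comes before HADOOP-10.
    project, _, number = key.partition("-")
    return project, int(number)


def missing_backports(newer: Iterable[str], older: Iterable[str]) -> List[str]:
    """JIRA keys committed on the `newer` branch but absent from `older`."""
    return sorted(jira_keys(newer) - jira_keys(older), key=_sort_key)
```

Subjects without a leading key (for example reverts) are skipped, so the output is a starting point for review rather than a definitive list.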

Re: Hadoop 3.2 Release Plan proposal

2018-07-18 Thread Aaron Fabbri
On Tue, Jul 17, 2018 at 7:21 PM Steve Loughran 
wrote:

>
>
> On 16 Jul 2018, at 23:45, Sunil G <sun...@apache.org> wrote:
>
> I would also like to take this opportunity to come up with a detailed
> plan.
>
> - Feature freeze date : all features should be merged by August 10, 2018.
>
>
>
> 

>
> Please let me know if I missed any features targeted to 3.2 per this
>
>
> Well, there are these big to-do lists for S3 & S3Guard.
>
> https://issues.apache.org/jira/browse/HADOOP-15226
> https://issues.apache.org/jira/browse/HADOOP-15220
>
>
> There's a bigger bit of work coming up for Azure Datalake Gen 2
> https://issues.apache.org/jira/browse/HADOOP-15407
>
> I don't think this is quite ready yet; I've been doing work on it, but if
> we have a 3-week deadline, I'm going to expect some timely reviews on
> https://issues.apache.org/jira/browse/HADOOP-15546
>
> I've uprated that to a blocker feature; I will review the S3 & S3Guard JIRAs
> to see which of those are blocking. Then there are some pressing "guava,
> Java 9 prep" items.
>
>
 I can help with this part if you like.



>
>
>
> timeline. I would like to volunteer myself as release manager of 3.2.0
> release.
>
>
> well volunteered!
>
>
>
Yes, thank you for stepping up.


>
> I think this raises a good question: what timetable should we have for the
> 3.2 & 3.3 releases? If we do want a faster cadence, then having an outline
> timetable from the 3.2 to the 3.3 release means that there's less concern
> about things not making the 3.2 deadline.
>
> -Steve
>
>
Good idea to mitigate the short deadline.

-AF