Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-22 Thread Karthik Kambatla
+1 (binding)

* Built from source
* Started a pseudo-distributed cluster with FairScheduler.
* Ran sample jobs (a rough command sketch follows below)
* Verified WebUI
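
A rough sketch of this kind of smoke test (the paths, version number, and ports
below are assumptions; the usual pseudo-distributed *-site.xml settings and
passphraseless ssh are taken as already in place):

# Build the distribution from source and unpack it
mvn package -Pdist -DskipTests -Dtar
tar xzf hadoop-dist/target/hadoop-2.8.0.tar.gz && cd hadoop-2.8.0

# One-time format of the local NameNode, then bring up HDFS and YARN
bin/hdfs namenode -format
sbin/start-dfs.sh && sbin/start-yarn.sh

# Run a sample job, then eyeball the web UIs
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi 2 10
# RM UI: http://localhost:8088, NameNode UI: http://localhost:50070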

On Wed, Mar 22, 2017 at 11:56 AM, varunsax...@apache.org <
varun.saxena.apa...@gmail.com> wrote:

> Thanks Junping for creating the release.
>
> +1 (non-binding)
>
> * Verified signatures.
> * Built from source.
> * Set up a pseudo-distributed cluster.
> * Successfully ran pi and wordcount jobs.
> * Navigated the YARN RM and NM UI.
>
> Regards,
> Varun Saxena.
>
> On Wed, Mar 22, 2017 at 12:13 AM, Haibo Chen 
> wrote:
>
> > Thanks Junping for working on the new release!
> >
> > +1 non-binding
> >
> > 1) Downloaded the source, verified the checksum
> > 2) Built natively from source, and deployed it to a pseudo-distributed
> > cluster
> > 3) Ran sleep and teragen job and checked both YARN and JHS web UI
> > 4) Played with yarn + mapreduce command lines
> >
> > Best,
> > Haibo Chen
> >
> > On Mon, Mar 20, 2017 at 11:18 AM, Junping Du 
> wrote:
> >
> > > Thanks for the update, John. Then we should be OK with fixing this issue
> > > in 2.8.1.
> > >
> > > Mark the target version of HADOOP-14205 to 2.8.1 instead of 2.8.0 and
> > > bump it up to blocker so we don't miss it when releasing 2.8.1. :)
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Junping
> > >
> > > 
> > > From: John Zhuge 
> > > Sent: Monday, March 20, 2017 10:31 AM
> > > To: Junping Du
> > > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> > > yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > > Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
> > >
> > > Yes, it only affects ADL. There is a workaround of adding these 2
> > > properties to core-site.xml:
> > >
> > >   <property>
> > >     <name>fs.adl.impl</name>
> > >     <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
> > >   </property>
> > >
> > >   <property>
> > >     <name>fs.AbstractFileSystem.adl.impl</name>
> > >     <value>org.apache.hadoop.fs.adl.Adl</value>
> > >   </property>
> > >
> > > I have the initial patch ready but hitting these live unit test failures:

> > > Failed tests:
> > >   TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.testListStatus:257 expected:<1> but was:<10>

> > > Tests in error:
> > >   TestAdlFileContextMainOperationsLive>FileContextMainOperationsBaseTest.testMkdirsFailsForSubdirectoryOfExistingFile:254 » AccessControl
> > >   TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.testMkdirsFailsForSubdirectoryOfExistingFile:190 » AccessControl
> > >
> > >
> > > Stay tuned...
> > >
> > > John Zhuge
> > > Software Engineer, Cloudera
> > >
> > > On Mon, Mar 20, 2017 at 10:02 AM, Junping Du <j...@hortonworks.com> wrote:
> > >
> > > Thank you for reporting the issue, John! Does this issue only affect ADL
> > > (Azure Data Lake), which is a new feature in 2.8, rather than other existing
> > > filesystems? If so, I think we can leave the fix to 2.8.1, given this is not a
> > > regression and only a new feature is broken.
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Junping
> > >
> > > 
> > > From: John Zhuge
> > > Sent: Monday, March 20, 2017 9:07 AM
> > > To: Junping Du
> > > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> > > yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > > Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
> > >
> > > Discovered https://issues.apache.org/jira/browse/HADOOP-14205 "No
> > > FileSystem for scheme: adl".
> > >
> > > The issue was caused by backporting HADOOP-13037 to branch-2 and earlier.
> > > HADOOP-12666 should not be backported, but some changes from it are needed:
> > > the fs.adl.impl property in core-default.xml and hadoop-tools-dist/pom.xml.
> > >
> > > I am working on a patch.
> > >
> > >
> > > John Zhuge
> > > Software Engineer, Cloudera
> > >
> > > On Fri, Mar 17, 2017 at 2:18 AM, Junping Du <jdu...@hortonworks.com> wrote:
> > > Hi all,
> > >  With the fix for HDFS-11431 in, I've created a new release candidate
> > > (RC3) for Apache Hadoop 2.8.0.
> > >
> > >  This is the next minor release after 2.7.0, which was released more than
> > > a year ago. It comprises 2,900+ fixes, improvements, and new features. Most
> > > of these commits are being released for the first time from branch-2.
> > >
> > >   More information about the 2.8.0 release plan can be found here:
> > > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> > >
> > >   New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.0-RC3
> > >
> > >   The RC tag in git is: release-2.8.0-RC3, and the latest 

[jira] [Resolved] (MAPREDUCE-6506) Make the reducer-preemption configs consistent in how they handle defaults

2017-02-02 Thread Karthik Kambatla (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla resolved MAPREDUCE-6506.
-
  Resolution: Not A Problem
Target Version/s:   (was: )

Closing this as "Not a Problem". 

As Haibo mentioned, setting each config to a negative value disables the
corresponding feature. It is just that disabling one means preempting instantly,
while disabling the other means never preempting.
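
For reference, both are plain job configs, so the negative values discussed
above can be set per job through the generic -D options of any
ToolRunner-based example job. A minimal sketch (the jar path, version, and
input/output paths are assumptions):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount \
  -Dmapreduce.job.reducer.preempt.delay.sec=-1 \
  -Dmapreduce.job.reducer.unconditional-preempt.delay.sec=-1 \
  /input /output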

> Make the reducer-preemption configs consistent in how they handle defaults
> --
>
> Key: MAPREDUCE-6506
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6506
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: applicationmaster
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Gergő Pásztor
>Priority: Critical
>
> {{mapreduce.job.reducer.preempt.delay.sec}} and 
> {{mapreduce.job.reducer.unconditional-preempt.delay.sec}} are two configs 
> related to reducer preemption. These configs are not consistent in how they 
> handle non-positive values. Also, the way to disable them is different. 
> It would be nice to make them consistent somehow, and change the behavior of 
> the former if need be. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-25 Thread Karthik Kambatla
I feel the same way as Chris.

+1 on the current RC. And, open to changes to the release notes and LICENSE.

On Wed, Jan 25, 2017 at 11:59 AM, Chris Douglas <chris.doug...@gmail.com>
wrote:

> On Wed, Jan 25, 2017 at 11:42 AM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
> > Chris and Karthik, could you clarify the contingency of your votes? Is
> > fixing just the release notes sufficient?
>
> My +1 was not contingent on any changes.
>
> The release is fine as-is. Fixing any subset of the release notes,
> minicluster jar, and the organization of LICENSE files within jars
> should not reset the clock on the VOTE. -C
>
> > On Wed, Jan 25, 2017 at 11:14 AM, Karthik Kambatla <ka...@cloudera.com>
> > wrote:
> >
> >> Thanks for driving the alphas, Andrew. I don't see the need to restart
> the
> >> vote and I feel it is okay to fix the minor issues before releasing.
> >>
> >> +1 (binding). Downloaded source, stood up a pseudo-distributed cluster
> >> with FairScheduler, ran example jobs, and played around with the UI.
> >>
> >> Thanks
> >> Karthik
> >>
> >>
> >> On Fri, Jan 20, 2017 at 2:36 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> With heartfelt thanks to many contributors, the RC0 for 3.0.0-alpha2 is
> >>> ready.
> >>>
> >>> 3.0.0-alpha2 is the second alpha in the planned 3.0.0 release line
> leading
> >>> up to a 3.0.0 GA. It comprises 857 fixes, improvements, and new
> features
> >>> since alpha1 was released on September 3rd, 2016.
> >>>
> >>> More information about the 3.0.0 release plan can be found here:
> >>>
> >>> https://cwiki.apache.org/confluence/display/HADOOP/
> Hadoop+3.0.0+release
> >>>
> >>> The artifacts can be found here:
> >>>
> >>> http://home.apache.org/~wang/3.0.0-alpha2-RC0/
> >>>
> >>> This vote will run 5 days, ending on 01/25/2017 at 2PM pacific.
> >>>
> >>> I ran basic validation with a local pseudo cluster and a Pi job. RAT
> >>> output
> >>> was clean.
> >>>
> >>> My +1 to start.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>
> >>
>


Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-25 Thread Karthik Kambatla
Thanks for driving the alphas, Andrew. I don't see the need to restart the
vote and I feel it is okay to fix the minor issues before releasing.

+1 (binding). Downloaded source, stood up a pseudo-distributed cluster with
FairScheduler, ran example jobs, and played around with the UI.

Thanks
Karthik
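
For reference, switching a pseudo-distributed cluster to FairScheduler as
described above only needs the scheduler class in yarn-site.xml; a minimal
sketch, assuming a throwaway single-node config where everything else is left
at the defaults:

cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>
EOF
sbin/stop-yarn.sh; sbin/start-yarn.sh   # restart YARN to pick up the scheduler change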


On Fri, Jan 20, 2017 at 2:36 PM, Andrew Wang 
wrote:

> Hi all,
>
> With heartfelt thanks to many contributors, the RC0 for 3.0.0-alpha2 is
> ready.
>
> 3.0.0-alpha2 is the second alpha in the planned 3.0.0 release line leading
> up to a 3.0.0 GA. It comprises 857 fixes, improvements, and new features
> since alpha1 was released on September 3rd, 2016.
>
> More information about the 3.0.0 release plan can be found here:
>
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0.0+release
>
> The artifacts can be found here:
>
> http://home.apache.org/~wang/3.0.0-alpha2-RC0/
>
> This vote will run 5 days, ending on 01/25/2017 at 2PM pacific.
>
> I ran basic validation with a local pseudo cluster and a Pi job. RAT output
> was clean.
>
> My +1 to start.
>
> Thanks,
> Andrew
>


Re: [VOTE] Release cadence and EOL

2017-01-21 Thread Karthik Kambatla
Given the discussions, I feel we are not ready for VOTE on this yet.
Sangjin, should we go back to the DISCUSS thread?

IMO, these are guidelines and not policies we want to enforce. Maybe the
text should say something along the lines of: "The Hadoop community is
inclined to". And, maybe, add a caveat that these inclinations will be acted
on depending on the release in question, its adoption, and the availability of
committer volunteers.

As I said on the DISCUSS thread, IMO, these guidelines are very useful and
form a good baseline to operate from as Sangjin said. Also, I don't see any
disadvantages of having this baseline.

Here are the advantages I see.

   1. Users know what to expect in terms of security fixes, critical bug
   fixes and can plan their upgrades accordingly.
   2. Contributors don't need to panic about getting a feature/improvement
   in *this* release, because they know the next one is only 6 months away.
   3. RM: some method to the madness. Junping, for instance, is trying to roll
   a release with 2300 patches. It is a huge time investment. (Thanks again,
   Junping.) Smaller releases are easier to manage. A target release cadence,
   coupled with a process that encourages volunteering, IMO would lead to more
   committers doing releases.

To conclude, the biggest value I see is us (the community) agreeing on good
practices for our releases and working towards them. Writing it down somewhere
makes it a little more formal, like the compatibility guidelines, even if it is
not enforceable.

On Sat, Jan 21, 2017 at 11:36 AM, Chris Douglas  wrote:

> On Fri, Jan 20, 2017 at 2:50 PM, Sangjin Lee  wrote:
> > The security patch for the 2.6.x line is a case in point. Without any
> > guideline, we would start with "What should we do for 2.6.x? Should we
> > continue to patch it?" With this guideline, the baseline is already "it's
> > been 2 years since 2.6.0 is released and we should consider stopping
> > releasing from 2.6.x and encourage users to upgrade to 2.7.x."
>
> Unless it falls under the "good reason" clause. To invent an example,
> if 3.6.x were within the two year/minor release window, but 3.5.x was
> more widely deployed/stable, then we'd use this escape hatch to patch
> 3.5.x and likely just violate our policy on 3.6.x (when the
> implementation cost is high). How do we "force" a fix to 3.6.x?
>
> We can't actually compel work from people. Even when we can point to a
> prior consensus, someone needs to be motivated to actually complete
> that task. That's the rub: this proposal doesn't only allow us to stop
> working on old code, it also promises that we'll work on code we want
> to abandon.
>
> Pointing to a bylaw, and telling a contributor they "must" support a
> branch/release isn't going to result in shorter discussions, either.
> In the preceding hypothetical, if someone wants a fix in the 3.6 line,
> they either need to convince others that it's important or they need
> to do the work themselves.
>
> > Actually the decision on security issues is a pretty strong indicator of
> our
> > desire for EOL. If we say we do not want to patch that line for security
> > vulnerability, then there would be even *less* rationale for fixing any
> > other issue on that line. So the decision to stop backporting security
> > patches is a sufficient indication of EOL in my mind.
>
> Agreed. If we're not backporting security patches to a branch, then we
> need to release a CVE, file a JIRA, and move on. If someone wants to
> fix it and roll an RC for that release line, it lives. Otherwise it
> dies as people move to later versions (unpatched security flaws are
> motivating). A branch is EOL when we stop releasing from it. Two years
> or two minor releases is a good heuristic based on recent history, but
> overfitting policy to experience doesn't seem to buy us anything.
>
> I'm all for spending less time discussing release criteria, but if
> it's sufficient to observe which release lines are getting updates and
> label them accordingly, that's cheaper to implement than a curated set
> of constraints. -C
>
> >> We can still embargo security flaws if someone asks (to give them time
> >> time to implement a fix and call a vote). If there's nothing else in
> >> the release, then we're effectively announcing it. In those cases, we
> >> call a vote on private@ (cc: security@). -C
> >>
> >>
> >> On Thu, Jan 19, 2017 at 1:30 PM, Andrew Wang 
> >> wrote:
> >> > I don't think the motivation here is vendor play or taking away power
> >> > from
> >> > committers. Having a regular release cadence helps our users
> understand
> >> > when a feature will ship so they can plan their upgrades. Having an
> EOL
> >> > policy and a minimum support period helps users choose a release line,
> >> > and
> >> > understand when they will need to upgrade.
> >> >
> >> > In the earlier thread, we discussed how these are not rules, but
> >> > guidelines. There's a lot of 

Re: [VOTE] Release cadence and EOL

2017-01-17 Thread Karthik Kambatla
+1

I would also like to see some process guidelines. I should have brought
this up on the discussion thread, but I thought of them only now :(

   - Is an RM responsible for all maintenance releases against a minor
   release, or finding another RM to drive a maintenance release? In the past,
   this hasn't been a major issue.
   - When do we pick/volunteer to RM a minor release? IMO, this should be
   right after the previous release goes out. For example, Junping is driving
   2.8.0 now. As soon as that is done, we need to find a volunteer to RM 2.9.0
   6 months after.
   - The release process has multiple steps, based on
   major/minor/maintenance. It would be nice to capture/track how long each
   step takes so the RM can be prepared. e.g. herding the cats for an RC takes
   x weeks, compatibility checks take y days of work.


On Tue, Jan 17, 2017 at 10:05 AM, Sangjin Lee  wrote:

> Thanks for correcting me! I left out a sentence by mistake. :)
>
> To correct the proposal we're voting for:
>
> A minor release on the latest major line should be every 6 months, and a
> maintenance release on a minor release (as there may be concurrently
> maintained minor releases) every 2 months.
>
> A minor release line is end-of-lifed 2 years after it is released or there
> are 2 newer minor releases, whichever is sooner. The community reserves the
> right to extend or shorten the life of a release line if there is a good
> reason to do so.
>
> Sorry for the snafu.
>
> Regards,
> Sangjin
>
> On Tue, Jan 17, 2017 at 9:58 AM, Daniel Templeton 
> wrote:
>
> > Thanks for driving this, Sangjin. Quick question, though: the subject
> line
> > is "Release cadence and EOL," but I don't see anything about cadence in
> the
> > proposal.  Did I miss something?
> >
> > Daniel
> >
> >
> > On 1/17/17 8:35 AM, Sangjin Lee wrote:
> >
> >> Following up on the discussion thread on this topic (
> >> https://s.apache.org/eFOf), I'd like to put the proposal for a vote for
> >> the
> >> release cadence and EOL. The proposal is as follows:
> >>
> >> "A minor release line is end-of-lifed 2 years after it is released or
> >> there
> >> are 2 newer minor releases, whichever is sooner. The community reserves
> >> the
> >> right to extend or shorten the life of a release line if there is a good
> >> reason to do so."
> >>
> >> This also entails that we the Hadoop community commit to following this
> >> practice and solving challenges to make it possible. Andrew Wang laid
> out
> >> some of those challenges and what can be done in the discussion thread
> >> mentioned above.
> >>
> >> I'll set the voting period to 7 days. I understand a majority rule would
> >> apply in this case. Your vote is greatly appreciated, and so are
> >> suggestions!
> >>
> >> Thanks,
> >> Sangjin
> >>
> >>
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> >
>


Re: [DISCUSS] Release cadence and EOL

2017-01-11 Thread Karthik Kambatla
I am certainly in favor of doing a vote for the same.

Along with that, we should likely make some progress on the logistic issues
with doing a release that Andrew raised.

On Tue, Jan 3, 2017 at 4:06 PM, Sangjin Lee <sj...@apache.org> wrote:

> Happy new year!
>
> I think this topic has aged quite a bit in the discussion thread. Should
> we take it to a vote? Do we need additional discussions?
>
> Regards,
> Sangjin
>
> On Wed, Nov 9, 2016 at 11:11 PM, Karthik Kambatla <ka...@cloudera.com>
> wrote:
>
>> Fair points, Sangjin and Andrew.
>>
>> To get the ball rolling on this, I am willing to try the proposed policy.
>>
>> On Fri, Nov 4, 2016 at 12:09 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>
>> > I'm certainly willing to try this policy. There's definitely room for
>> > improvement when it comes to streamlining the release process. The
>> > create-release script that Allen wrote helps, but there are still a lot
>> of
>> > manual steps in HowToRelease for staging and publishing a release.
>> >
>> > Another perennial problem is reconciling git log with the changes and
>> > release notes and JIRA information. I think each RM has written their
>> own
>> > scripts for this, but it could probably be automated into a Jenkins
>> report.
>> >
>> > And the final problem is that branches are often not in a releasable
>> > state. This is because we don't have any upstream integration testing.
>> For
>> > instance, testing with 3.0.0-alpha1 has found a number of latent
>> > incompatibilities in the 2.8.0 branch. If we want to meaningfully speed
>> up
>> > the minor release cycle, continuous integration testing is a must.
>> >
>> > Best,
>> > Andrew
>> >
>> > On Fri, Nov 4, 2016 at 10:33 AM, Sangjin Lee <sj...@apache.org> wrote:
>> >
>> >> Thanks for your thoughts and more data points Andrew.
>> >>
>> >> I share your concern that the proposal may be more aggressive than what
>> >> we have been able to accomplish so far. I'd like to hear from the
>> community
>> >> what is a desirable release cadence which is still within the realm of
>> the
>> >> possible.
>> >>
>> >> The EOL policy can also be a bit of a forcing function. By having a
>> >> defined EOL, hopefully it would prod the community to move faster with
>> >> releases. Of course, automating releases and testing should help.
>> >>
>> >>
>> >> On Tue, Nov 1, 2016 at 4:31 PM, Andrew Wang <andrew.w...@cloudera.com>
>> >> wrote:
>> >>
>> >>> Thanks for pushing on this Sangjin. The proposal sounds reasonable.
>> >>>
>> >>> However, for it to have teeth, we need to be *very* disciplined about
>> the
>> >>> release cadence. Looking at our release history, we've done 4
>> maintenance
>> >>> releases in 2016 and no minor releases. 2015 had 4 maintenance and 1
>> >>> minor
>> >>> release. The proposal advocates for 12 maintenance releases and 2
>> minors
>> >>> per year, or about 3.5x more releases than we've historically done. I
>> >>> think
>> >>> achieving this will require significantly streamlining our release and
>> >>> testing process.
>> >>>
>> >>> For some data points, here are a few EOL lifecycles for some major
>> >>> projects. They talk about support in terms of time (not number of
>> >>> releases), and release on a cadence.
>> >>>
>> >>> Ubuntu maintains LTS for 5 years:
>> >>> https://www.ubuntu.com/info/release-end-of-life
>> >>>
>> >>> Linux LTS kernels have EOLs ranging from 2 to 6 years, though it seems
>> >>> only
>> >>> one has actually ever been EOL'd:
>> >>> https://www.kernel.org/category/releases.html
>> >>>
>> >>> Mesos supports minor releases for 6 months, with a new minor every 2
>> >>> months:
>> >>> http://mesos.apache.org/documentation/latest/versioning/
>> >>>
>> >>> Eclipse maintains each minor for ~9 months before moving onto a new
>> >>> minor:
>> >>> http://stackoverflow.com/questions/35997352/how-to-determine
>> >>> -end-of-life-for-eclipse-versions
>> >>>
>> >>>
>> >>>
>> >>>

Re: Updated 2.8.0-SNAPSHOT artifact

2016-11-09 Thread Karthik Kambatla
If there is interest in releasing off of branch-2.8, we should definitely
do that. As Sangjin mentioned, there might be value in doing 2.9 off
branch-2 too.

How do we go about maintenance releases along those minor lines, and when
would we discontinue 2.6.x/2.7.x releases?

On Wed, Nov 9, 2016 at 12:06 PM, Ming Ma <min...@twitter.com> wrote:

> I would also prefer releasing current 2.8 branch sooner. There are several
> incomplete features in branch-2 such as YARN-914 and HDFS-7877 that are
> better served if we can complete them in the next major release. Letting
> them span across multiple releases might not be desirable as there could be
> some potential compatibility issues involved. Therefore if we recut 2.8 it
> means we have to work on those items before the new 2.8 is released which
> could cause major delay on the schedule.
>
> On Mon, Nov 7, 2016 at 10:37 AM, Sangjin Lee <sjl...@gmail.com> wrote:
>
>> +1. Resetting the 2.8 effort and the branch at this point may be
>> counter-productive. IMO we should focus on resolving the remaining
>> blockers
>> and getting it out the door. I also think that we should seriously
>> consider
>> 2.9 as well, as a fairly large number of changes have accumulated in
>> branch-2 (over branch-2.8).
>>
>>
>> Sangjin
>>
>> On Fri, Nov 4, 2016 at 3:38 PM, Jason Lowe <jl...@yahoo-inc.com.invalid>
>> wrote:
>>
>> > At this point my preference would be to do the most expeditious thing to
>> > release 2.8, whether that's sticking with the branch-2.8 we have today
>> or
>> > re-cutting it on branch-2.  Doing a quick JIRA query, there's been
>> almost
>> > 2,400 JIRAs resolved in 2.8.0 (1).  For many of them, it's well-past
>> time
>> > they saw a release vehicle.  If re-cutting the branch means we have to
>> wrap
>> > up a few extra things that are still in-progress on branch-2 or add a
>> few
>> > more blockers to the list before we release then I'd rather stay where
>> > we're at and ship it ASAP.
>> >
>> > Jason
>> > (1) https://issues.apache.org/jira/issues/?jql=project%20in%
>> > 20%28hadoop%2C%20yarn%2C%20mapreduce%2C%20hdfs%29%
>> > 20and%20resolution%20%3D%20Fixed%20and%20fixVersion%20%3D%202.8.0
>> >
>> >
>> >
>> >
>> >
>> > On Tuesday, October 25, 2016 5:31 PM, Karthik Kambatla <
>> > ka...@cloudera.com> wrote:
>> >
>> >
>> >  Is there value in releasing current branch-2.8? Aren't we better off
>> > re-cutting the branch off of branch-2?
>> >
>> > On Tue, Oct 25, 2016 at 12:20 AM, Akira Ajisaka <
>> > ajisa...@oss.nttdata.co.jp>
>> > wrote:
>> >
>> > > It's almost a year since branch-2.8 has cut.
>> > > I'm thinking we need to release 2.8.0 ASAP.
>> > >
>> > > According to the following list, there are 5 blocker and 6 critical
>> > issues.
>> > > https://issues.apache.org/jira/issues/?filter=12334985
>> > >
>> > > Regards,
>> > > Akira
>> > >
>> > >
>> > > On 10/18/16 10:47, Brahma Reddy Battula wrote:
>> > >
>> > >> Hi Vinod,
>> > >>
>> > >> Any plan on first RC for branch-2.8 ? I think, it has been long time.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --Brahma Reddy Battula
>> > >>
>> > >> -Original Message-
>> > >> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
>> > >> Sent: 20 August 2016 00:56
>> > >> To: Jonathan Eagles
>> > >> Cc: common-...@hadoop.apache.org
>> > >> Subject: Re: Updated 2.8.0-SNAPSHOT artifact
>> > >>
>> > >> Jon,
>> > >>
>> > >> That is around the time when I branched 2.8, so I guess you were
>> getting
>> > >> SNAPSHOT artifacts till then from the branch-2 nightly builds.
>> > >>
>> > >> If you need it, we can set up SNAPSHOT builds. Or just wait for the
>> > first
>> > >> RC, which is around the corner.
>> > >>
>> > >> +Vinod
>> > >>
>> > >> On Jul 28, 2016, at 4:27 PM, Jonathan Eagles <jeag...@gmail.com>
>> wrote:
>> > >>>
>> > >>> Latest snapshot is uploaded in Nov 2015, but checkins are still
>> coming
>> > >>> in quite frequently.
>> > >>> https://repository.apache.org/content/repositories/snapshots
>> /org/apach
>> > >>> e/hadoop/hadoop-yarn-api/
>> > >>>
>> > >>> Are there any plans to start producing updated SNAPSHOT artifacts
>> for
>> > >>> current hadoop development lines?
>> > >>>
>> > >>
>> > >>
>> > >> 
>> -
>> > >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> > >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> > >>
>> > >>
>> > >> 
>> -
>> > >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> > >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> > >>
>> > >>
>> > >
>> > > -
>> > > To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
>> > > For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>


Re: [DISCUSS] Release cadence and EOL

2016-11-09 Thread Karthik Kambatla
Fair points, Sangjin and Andrew.

To get the ball rolling on this, I am willing to try the proposed policy.

On Fri, Nov 4, 2016 at 12:09 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> I'm certainly willing to try this policy. There's definitely room for
> improvement when it comes to streamlining the release process. The
> create-release script that Allen wrote helps, but there are still a lot of
> manual steps in HowToRelease for staging and publishing a release.
>
> Another perennial problem is reconciling git log with the changes and
> release notes and JIRA information. I think each RM has written their own
> scripts for this, but it could probably be automated into a Jenkins report.
>
> And the final problem is that branches are often not in a releasable
> state. This is because we don't have any upstream integration testing. For
> instance, testing with 3.0.0-alpha1 has found a number of latent
> incompatibilities in the 2.8.0 branch. If we want to meaningfully speed up
> the minor release cycle, continuous integration testing is a must.
>
> Best,
> Andrew
>
> On Fri, Nov 4, 2016 at 10:33 AM, Sangjin Lee <sj...@apache.org> wrote:
>
>> Thanks for your thoughts and more data points Andrew.
>>
>> I share your concern that the proposal may be more aggressive than what
>> we have been able to accomplish so far. I'd like to hear from the community
>> what is a desirable release cadence which is still within the realm of the
>> possible.
>>
>> The EOL policy can also be a bit of a forcing function. By having a
>> defined EOL, hopefully it would prod the community to move faster with
>> releases. Of course, automating releases and testing should help.
>>
>>
>> On Tue, Nov 1, 2016 at 4:31 PM, Andrew Wang <andrew.w...@cloudera.com>
>> wrote:
>>
>>> Thanks for pushing on this Sangjin. The proposal sounds reasonable.
>>>
>>> However, for it to have teeth, we need to be *very* disciplined about the
>>> release cadence. Looking at our release history, we've done 4 maintenance
>>> releases in 2016 and no minor releases. 2015 had 4 maintenance and 1
>>> minor
>>> release. The proposal advocates for 12 maintenance releases and 2 minors
>>> per year, or about 3.5x more releases than we've historically done. I
>>> think
>>> achieving this will require significantly streamlining our release and
>>> testing process.
>>>
>>> For some data points, here are a few EOL lifecycles for some major
>>> projects. They talk about support in terms of time (not number of
>>> releases), and release on a cadence.
>>>
>>> Ubuntu maintains LTS for 5 years:
>>> https://www.ubuntu.com/info/release-end-of-life
>>>
>>> Linux LTS kernels have EOLs ranging from 2 to 6 years, though it seems
>>> only
>>> one has actually ever been EOL'd:
>>> https://www.kernel.org/category/releases.html
>>>
>>> Mesos supports minor releases for 6 months, with a new minor every 2
>>> months:
>>> http://mesos.apache.org/documentation/latest/versioning/
>>>
>>> Eclipse maintains each minor for ~9 months before moving onto a new
>>> minor:
>>> http://stackoverflow.com/questions/35997352/how-to-determine
>>> -end-of-life-for-eclipse-versions
>>>
>>>
>>>
>>> On Fri, Oct 28, 2016 at 10:55 AM, Sangjin Lee <sj...@apache.org> wrote:
>>>
>>> > Reviving an old thread. I think we had a fairly concrete proposal on
>>> the
>>> > table that we can vote for.
>>> >
>>> > The proposal is a minor release on the latest major line every 6
>>> months,
>>> > and a maintenance release on a minor release (as there may be
>>> concurrently
>>> > maintained minor releases) every 2 months.
>>> >
>>> > A minor release line is EOLed 2 years after it is first released or
>>> there
>>> > are 2 newer minor releases, whichever is sooner. The community
>>> reserves the
>>> > right to extend or shorten the life of a release line if there is a
>>> good
>>> > reason to do so.
>>> >
>>> > Comments? Objections?
>>> >
>>> > Regards,
>>> > Sangjin
>>> >
>>> >
>>> > On Tue, Aug 23, 2016 at 9:33 AM, Karthik Kambatla <ka...@cloudera.com>
>>> > wrote:
>>> >
>>> > >
>>> > >> Here is just an idea to get started. How about "a minor release
>>> lin

Re: Updated 2.8.0-SNAPSHOT artifact

2016-10-25 Thread Karthik Kambatla
Is there value in releasing current branch-2.8? Aren't we better off
re-cutting the branch off of branch-2?

On Tue, Oct 25, 2016 at 12:20 AM, Akira Ajisaka 
wrote:

> It's almost a year since branch-2.8 has cut.
> I'm thinking we need to release 2.8.0 ASAP.
>
> According to the following list, there are 5 blocker and 6 critical issues.
> https://issues.apache.org/jira/issues/?filter=12334985
>
> Regards,
> Akira
>
>
> On 10/18/16 10:47, Brahma Reddy Battula wrote:
>
>> Hi Vinod,
>>
>> Any plan on first RC for branch-2.8 ? I think, it has been long time.
>>
>>
>>
>>
>> --Brahma Reddy Battula
>>
>> -Original Message-
>> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
>> Sent: 20 August 2016 00:56
>> To: Jonathan Eagles
>> Cc: common-...@hadoop.apache.org
>> Subject: Re: Updated 2.8.0-SNAPSHOT artifact
>>
>> Jon,
>>
>> That is around the time when I branched 2.8, so I guess you were getting
>> SNAPSHOT artifacts till then from the branch-2 nightly builds.
>>
>> If you need it, we can set up SNAPSHOT builds. Or just wait for the first
>> RC, which is around the corner.
>>
>> +Vinod
>>
>> On Jul 28, 2016, at 4:27 PM, Jonathan Eagles  wrote:
>>>
>>> Latest snapshot is uploaded in Nov 2015, but checkins are still coming
>>> in quite frequently.
>>> https://repository.apache.org/content/repositories/snapshots/org/apach
>>> e/hadoop/hadoop-yarn-api/
>>>
>>> Are there any plans to start producing updated SNAPSHOT artifacts for
>>> current hadoop development lines?
>>>
>>
>>
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>>
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>>
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


Re: Chrome extension to collapse JIRA comments

2016-10-17 Thread Karthik Kambatla
Never included the link :)

https://github.com/gezapeti/jira-comment-collapser


On Mon, Oct 17, 2016 at 6:46 PM, Karthik Kambatla <ka...@cloudera.com>
wrote:

> Hi folks
>
> Sorry for the widespread email, but thought you would find this useful.
>
> My colleague, Peter, had put together this chrome extension to collapse
> comments from certain users (HadoopQA, Githubbot) that makes tracking
> conversations in JIRAs much easier.
>
> Cheers!
> Karthik
>
>
>


Chrome extension to collapse JIRA comments

2016-10-17 Thread Karthik Kambatla
Hi folks

Sorry for the widespread email, but thought you would find this useful.

My colleague, Peter, had put together this chrome extension to collapse
comments from certain users (HadoopQA, Githubbot) that makes tracking
conversations in JIRAs much easier.

Cheers!
Karthik


Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-07 Thread Karthik Kambatla
Thanks for putting the RC together, Sangjin.

+1 (binding)

Built from source, deployed a pseudo-distributed cluster, and ran some example
MR jobs.

On Fri, Oct 7, 2016 at 6:01 PM, Yongjun Zhang  wrote:

> Hi Sangjin,
>
> Thanks a lot for your work here.
>
> My +1 (binding).
>
> - Downloaded both binary and src tarballs
> - Verified md5 checksum and signature for both
> - Build from source tarball
> - Deployed 2 pseudo clusters, one with the released tarball and the other
> with what I built from source, and did the following on both:
> - Run basic HDFS operations, and distcp jobs
> - Run pi job
> - Examined HDFS webui, YARN webui.
>
> Best,
>
> --Yongjun
>
> > > > * verified basic HDFS operations and Pi job.
> > > > * Did a sanity check for RM and NM UI.
>
>
> On Fri, Oct 7, 2016 at 5:08 PM, Sangjin Lee  wrote:
>
> > I'm casting my vote: +1 (binding)
> >
> > Regards,
> > Sangjin
> >
> > On Fri, Oct 7, 2016 at 3:12 PM, Andrew Wang 
> > wrote:
> >
> > > Thanks to Chris and Sangjin for working on this release.
> > >
> > > +1 binding
> > >
> > > * Verified signatures
> > > * Built from source tarball
> > > * Started HDFS and did some basic ops
> > >
> > > Thanks,
> > > Andrew
> > >
> > > On Fri, Oct 7, 2016 at 2:50 PM, Wangda Tan 
> wrote:
> > >
> > > > Thanks Sangjin for cutting this release!
> > > >
> > > > +1 (Binding)
> > > >
> > > > - Downloaded binary tar ball and setup a single node cluster.
> > > > - Submit a few applications and which can successfully run.
> > > >
> > > > Thanks,
> > > > Wangda
> > > >
> > > >
> > > > On Fri, Oct 7, 2016 at 10:33 AM, Zhihai Xu 
> > > wrote:
> > > >
> > > > > Thanks Sangjin for creating release 2.6.5 RC1.
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > * Downloaded and built from source
> > > > > * Verified md5 checksums and signature
> > > > > * Deployed a pseudo cluster
> > > > > * verified basic HDFS operations and Pi job.
> > > > > * Did a sanity check for RM and NM UI.
> > > > >
> > > > > Thanks
> > > > > zhihai
> > > > >
> > > > > On Fri, Oct 7, 2016 at 8:16 AM, Sangjin Lee 
> > wrote:
> > > > >
> > > > > > Thanks Masatake!
> > > > > >
> > > > > > Today's the last day for this vote, and I'd like to ask you to
> try
> > > out
> > > > > the
> > > > > > RC and vote on this today. So far there has been no binding vote.
> > > > Thanks
> > > > > > again.
> > > > > >
> > > > > > Regards,
> > > > > > Sangjin
> > > > > >
> > > > > > On Fri, Oct 7, 2016 at 6:45 AM, Masatake Iwasaki <
> > > > > > iwasak...@oss.nttdata.co.jp> wrote:
> > > > > >
> > > > > > > +1(non-binding)
> > > > > > >
> > > > > > > * verified signature and md5.
> > > > > > > * built with -Pnative on CentOS6 and OpenJDK7.
> > > > > > > * built documentation and skimmed the contents.
> > > > > > > * built rpms by bigtop and ran smoke-tests of hdfs, yarn and
> > > > mapreduce
> > > > > on
> > > > > > > 3-node cluster.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Masatake Iwasaki
> > > > > > >
> > > > > > > On 10/3/16 09:12, Sangjin Lee wrote:
> > > > > > >
> > > > > > >> Hi folks,
> > > > > > >>
> > > > > > >> I have pushed a new release candidate (R1) for the Apache
> Hadoop
> > > > 2.6.5
> > > > > > >> release (the next maintenance release in the 2.6.x release
> > line).
> > > > RC1
> > > > > > >> contains fixes to CHANGES.txt, and is otherwise identical to
> > RC0.
> > > > > > >>
> > > > > > >> Below are the details of this release candidate:
> > > > > > >>
> > > > > > >> The RC is available for validation at:
> > > > > > >> http://home.apache.org/~sjlee/hadoop-2.6.5-RC1/.
> > > > > > >>
> > > > > > >> The RC tag in git is release-2.6.5-RC1 and its git commit is
> > > > > > >> e8c9fe0b4c252caf2ebf1464220599650f119997.
> > > > > > >>
> > > > > > >> The maven artifacts are staged via repository.apache.org at:
> > > > > > >> https://repository.apache.org/content/repositories/
> > > > > > orgapachehadoop-1050/.
> > > > > > >>
> > > > > > >> You can find my public key at
> > > > > > >> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS.
> > > > > > >>
> > > > > > >> Please try the release and vote. The vote will run for the
> > usual 5
> > > > > > days. I
> > > > > > >> would greatly appreciate your timely vote. Thanks!
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Sangjin
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Release cadence and EOL

2016-08-23 Thread Karthik Kambatla
>
>
> Here is just an idea to get started. How about "a minor release line is
> EOLed 2 years after it is released or there are 2 newer minor releases,
> whichever is sooner. The community reserves the right to extend or shorten
> the life of a release line if there is a good reason to do so."
>
>
Sounds reasonable, especially for our first commitment. For current
releases, this essentially means 2.6.x is maintained until Nov 2016 and Apr
2017 if 2.8 and 2.9 are not released by those dates.

IIUC EOL does two things - (1) eases the maintenance cost for developers
past EOL, and (2) indicates to the user when they must upgrade by. For the
latter, would users appreciate a specific timeline without any caveats about the
number of subsequent minor releases?

If we were to give folks a specific period for EOL for x.y.z, we should
plan on releasing at least x.y+1.1 by then. 2 years might be a good number
to start with given our current cadence, and adjusted in the future as
needed.


[DISCUSS] Release cadence and EOL

2016-08-12 Thread Karthik Kambatla
Forking off this discussion from 2.6.5 release thread. Junping and Chris T
have brought up important concerns regarding too many concurrent releases
and the lack of EOL for our releases.

First up, it would be nice to hear from others on our releases having the
notion of EOL and other predictability is indeed of interest.

Secondly, I believe EOLs work better in conjunction with a predictable
cadence. Given past discussions on this and current periods between our
minor releases, I would like to propose a minor release on the latest major
line every 6 months and a maintenance release on the latest minor release
every 2 months.

Eager to hear others thoughts.

Thanks
Karthik


Re: [Release thread] 2.6.5 release activities

2016-08-11 Thread Karthik Kambatla
Since there is sufficient interest in 2.6.5, we should probably do it. All
the reasons Allen outlines make sense.

That said, Junping brings up a very important point that we should think of
for future releases. For a new user, or a user who does not directly
contribute to the project, having more concurrent stable release lines makes it harder to pick one.

As Chris T mentioned, the notion of EOL for our releases seems like a good
idea. However, to come up with any EOLs, we need to have some sort of
cadence for the releases. While this is hard for major releases (big bang,
potentially incompatible features), it should be doable for minor releases.

How do people feel about doing a minor release every 6 months, with
follow-up maintenance releases every 2 months until the next minor and as
needed after that? That way, we could EOL a minor release a year after its
initial release? In the future, we could consider shrinking this window. In
addition to the EOL, this also makes our releases a little more predictable
for both users and vendors. Note that 2.6.0 went out almost 2 years ago and
we haven't had a new minor release in 14 months. I am happy to start
another DISCUSS thread around this if people think it is useful.

Thanks
Karthik

On Thu, Aug 11, 2016 at 12:50 PM, Allen Wittenauer  wrote:

>
> > On Aug 11, 2016, at 8:10 AM, Junping Du  wrote:
> >
> > Allen, to be clear, I am not against any branch release effort here.
> However,
>
> "I'm not an X but "
>
> > as RM for previous releases 2.6.3 and 2.6.4, I feel to have
> responsibility to take care branch-2.6 together with other RMs (Vinod and
> Sangjin) on this branch and understand current gap - especially, to get
> consensus from community on the future plan for 2.6.x.
> > Our bylaw give us freedom for anyone to do release effort, but our bylaw
> doesn't stop our rights for reasonable question/concern on any release
> plan. As you mentioned below, people can potentially fire up branch-1
> release effort. But if you call a release plan tomorrow for branch-1, I
> cannot imagine nobody will question on that effort. Isn't it?
>
> From previous discussions I've seen around releases, I
> think it would depend upon which employee from which vendor raised the
> question.
>
> > Let's keep discussions on releasing 2.6.5 more technical. IMO, to make
> 2.6.5 release more reasonable, shouldn't we check following questions first?
> > 1. Do we have any significant issues that should land on 2.6.5 comparing
> with 2.6.4?
> > 2. If so, any technical reasons (like: upgrade is not smoothly,
> performance downgrade, incompatibility with downstream projects, etc.) to
> stop our users to move from 2.6.4 to 2.7.2/2.7.3?
> > I believe having good answer on these questions can make our release
> plan more reasonable to the whole community. More thoughts?
>
> I think these questions are moot though:
>
> * Hadoop 2.6 is the last release to support JDK6.   That sort of ends any
> questions around moving to 2.7.
>
> * There are always bugs in software that can benefit from getting fixes.
> Given the JDK6 issue, yes, of course there are reasons why someone may want
> a 2.6.5.
>
> * If a company/vendor is willing to fund people to work on a release, I'd
> much rather they do that work in the ASF than off on their own somewhere.
> This way the community as a whole benefits.
>
>
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-08-09 Thread Karthik Kambatla
Most people I talked to found 3.0.0-alpha, 3.1.0-alpha/beta confusing. I am
not aware of any other software shipped that way. While being used by other
software does not make an approach right, I think we should adopt an
approach that is easy for our users to understand.

The notion of 3.0.0-alphaX and 3.0.0-betaX ending in 3.0.0 (GA) has been
proposed and okay for a long while. Do people still consider it okay? Is
there a specific need to consider alternatives?

On Mon, Aug 8, 2016 at 11:44 AM, Junping Du <j...@hortonworks.com> wrote:

> I think that incompatible API between 3.0.0-alpha and 3.1.0-beta is
> something less confusing than incompatible between 2.8/2.9 and 2.98.x
> alphas/2.99.x betas.
> Why not just follow our previous practice from the beginning of branch-2? We
> can have 3.0.0-alpha, 3.1.0-alpha/beta, but once we are finalizing our
> APIs, we should bump the trunk version to 4.x for landing new incompatible
> changes.
>
> Thanks,
>
> Junping
> ________
> From: Karthik Kambatla <ka...@cloudera.com>
> Sent: Monday, August 08, 2016 6:54 PM
> Cc: common-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
> hdfs-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Release numbering semantics with concurrent (>2)
> releases [Was Setting JIRA fix versions for 3.0.0 releases]
>
> I like the 3.0.0-alphaX approach primarily for simpler understanding of
> compatibility guarantees. Calling 3.0.0 alpha and 3.1.0 beta is confusing
> because, it is not immediately clear that 3.0.0 and 3.1.0 could be
> incompatible in APIs.
>
> I am open to something like 2.98.x for alphas and 2.99.x for betas leading
> to a 3.0.0 GA. I have seen other projects use this without causing much
> confusion.
>
> On Thu, Aug 4, 2016 at 6:01 PM, Konstantin Shvachko <shv.had...@gmail.com>
> wrote:
>
> > On Thu, Aug 4, 2016 at 11:20 AM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> > > Hi Konst, thanks for commenting,
> > >
> > > On Wed, Aug 3, 2016 at 11:29 PM, Konstantin Shvachko <
> > shv.had...@gmail.com
> > > > wrote:
> > >
> > >> 1. I probably missed something but I didn't get it how "alpha"s made
> > >> their way into release numbers again. This was discussed on several
> > >> occasions and I thought the common perception was to use just three
> > level
> > >> numbers for release versioning and avoid branding them.
> > >> It is particularly confusing to have 3.0.0-alpha1 and 3.0.0-alpha2.
> What
> > >> is alphaX - fourth level? I think releasing 3.0.0 and setting trunk to
> > >> 3.1.0 would be perfectly in line with our current release practices.
> > >>
> > >
> > > We discussed release numbering a while ago when discussing the release
> > > plan for 3.0.0, and agreed on this scheme. "-alphaX" is essentially a
> > > fourth level as you say, but the intent is to only use it (and
> "-betaX")
> > in
> > > the leadup to 3.0.0.
> > >
> > > The goal here is clarity for end users, since most other enterprise
> > > software uses a a.0.0 version to denote the GA of a new major version.
> > Same
> > > for a.b.0 for a new minor version, though we haven't talked about that
> > yet.
> > > The alphaX and betaX scheme also shares similarity to release
> versioning
> > of
> > > other enterprise software.
> > >
> >
> > As you remember we did this (alpha, beta) for Hadoop-2 and I don't think
> it
> > went well with user perception.
> > Say release 2.0.5-alpha turned out to be quite good even though still
> > branded "alpha", while 2.2 was not and not branded.
> > We should move a release to stable, when people ran it and agree it is GA
> > worthy. Otherwise you never know.
> >
> >
> > >
> > >> 2. I do not see any confusions with releasing 2.8.0 after 3.0.0.
> > >> The release number is not intended to reflect historical release
> > >> sequence, but rather the point in the source tree, which it was
> branched
> > >> off. So one can release 2.8, 2.9, etc. after or before 3.0.
> > >>
> > >
> > > As described earlier in this thread, the issue here is setting the fix
> > > versions such that the changelog is a useful diff from a previous
> > version,
> > > and also clear about what changes are present in each branch. If we do
> > not
> > > order a specific 2.x before 3.0, then we don't know what 2.x to diff
> > from.

Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-08-08 Thread Karthik Kambatla
I like the 3.0.0-alphaX approach primarily for simpler understanding of
compatibility guarantees. Calling 3.0.0 alpha and 3.1.0 beta is confusing
because, it is not immediately clear that 3.0.0 and 3.1.0 could be
incompatible in APIs.

I am open to something like 2.98.x for alphas and 2.99.x for betas leading
to a 3.0.0 GA. I have seen other projects use this without causing much
confusion.

On Thu, Aug 4, 2016 at 6:01 PM, Konstantin Shvachko 
wrote:

> On Thu, Aug 4, 2016 at 11:20 AM, Andrew Wang 
> wrote:
>
> > Hi Konst, thanks for commenting,
> >
> > On Wed, Aug 3, 2016 at 11:29 PM, Konstantin Shvachko <
> shv.had...@gmail.com
> > > wrote:
> >
> >> 1. I probably missed something but I didn't get it how "alpha"s made
> >> their way into release numbers again. This was discussed on several
> >> occasions and I thought the common perception was to use just three
> level
> >> numbers for release versioning and avoid branding them.
> >> It is particularly confusing to have 3.0.0-alpha1 and 3.0.0-alpha2. What
> >> is alphaX - fourth level? I think releasing 3.0.0 and setting trunk to
> >> 3.1.0 would be perfectly in line with our current release practices.
> >>
> >
> > We discussed release numbering a while ago when discussing the release
> > plan for 3.0.0, and agreed on this scheme. "-alphaX" is essentially a
> > fourth level as you say, but the intent is to only use it (and "-betaX")
> in
> > the leadup to 3.0.0.
> >
> > The goal here is clarity for end users, since most other enterprise
> > software uses a a.0.0 version to denote the GA of a new major version.
> Same
> > for a.b.0 for a new minor version, though we haven't talked about that
> yet.
> > The alphaX and betaX scheme also shares similarity to release versioning
> of
> > other enterprise software.
> >
>
> As you remember we did this (alpha, beta) for Hadoop-2 and I don't think it
> went well with user perception.
> Say release 2.0.5-alpha turned out to be quite good even though still
> branded "alpha", while 2.2 was not and not branded.
> We should move a release to stable, when people ran it and agree it is GA
> worthy. Otherwise you never know.
>
>
> >
> >> 2. I do not see any confusions with releasing 2.8.0 after 3.0.0.
> >> The release number is not intended to reflect historical release
> >> sequence, but rather the point in the source tree, which it was branched
> >> off. So one can release 2.8, 2.9, etc. after or before 3.0.
> >>
> >
> > As described earlier in this thread, the issue here is setting the fix
> > versions such that the changelog is a useful diff from a previous
> version,
> > and also clear about what changes are present in each branch. If we do
> not
> > order a specific 2.x before 3.0, then we don't know what 2.x to diff
> from.
> >
>
> So the problem is in determining the latest commit, which was not present
> in the last release, when the last release bears higher number than the one
> being released.
> Interesting problem. Don't have a strong opinion on that. I guess it's OK
> to have overlapping in changelogs.
> As long as we keep following the rule that commits should be made to trunk
> first and them propagated to lower branches until the target branch is
> reached.
>
>
> >
> >> 3. I agree that current 3.0.0 branch can be dropped and re-cut. We may
> >> think of another rule that if a release branch is not released in 3
> month
> >> it should be abandoned. Which is applicable to branch 2.8.0 and it is
> too
> >> much work syncing it with branch-2.
> >>
> >> Time-based rules are tough here. I'd prefer we continue to leave this up
> > to release managers. If you think we should recut branch-2.8, recommend
> > pinging Vinod and discussing on a new thread.
> >
>
> Not recut, but abandon 2.8.0. And Vinod (or anybody who volunteers to RM)
> can recut  from the desired point.
> People were committing to branch-2 and branch-2.8 for months. And they are
> out of sync anyways. So what's the point of the extra commit.
> Probably still a different thread.
>
> Thanks,
> --Konst
>


Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-28 Thread Karthik Kambatla
Inline.


>
>> BTW, I have never seen a clear definition for an alpha release. It has
>> previously meant unstable APIs (2.1-alpha, 2.2-alpha, etc.) but sometimes
>> means unstable production quality (2.7.0). I think we should clearly define
>> it with broad consensus so users won't misunderstand the risk here.
>>
>
> These are the definitions of "alpha" and "beta" used leading up to the 2.2
> GA release, so it's not something new. These are also the normal industry
> definitions. Alpha means no API compatibility guarantees, early software.
> Beta means API compatible, but still some bugs.
>
> If anything, we never defined the terms "alpha" and "beta" for 2.x
> releases post-2.2 GA. The thinking was that everything after would be
> compatible and thus (at the least) never alpha. I think this is why the
> website talks about the 2.7.x line as "stable" or "unstable" instead, but
> since I think we still guarantee API compatibility between 2.7.0 and 2.7.1,
> we could have just called 2.7.0 "beta".
>
> I think this would be good to have in our compat guidelines or somewhere.
> Happy to work with Karthik/Vinod/others on this.
>

I am not sure if we formally defined the terms "alpha" and "beta" for
Hadoop 2, but my understanding of them agrees with the general definitions
on the web.

Alpha:

   - Early version for testing - integration with downstream, deployment
   etc.
   - Not feature complete
   - No compatibility guarantees yet

Beta:

   - Feature complete
   - API compatibility guaranteed
   - Need clear definition for other kinds of compatibility (wire,
   client-dependencies, server-dependencies etc.)
   - Not ready for production deployments

GA

   - Ready for production
   - All the usual compatibility guarantees apply.

If there is general agreement, I can work towards getting this into our
documentation.


>
>> Also, if we treat our 3.0.0-alpha release work seriously, we should also
>> think about trunk's version number issue (bump up to 4.0.0-alpha?) or there
>> could be no room for 3.0 incompatible feature/bits soon.
>>
>> While we're still in alpha for 3.0.0, there's no need for a separate
> 4.0.0 version since there's no guarantee of API compatibility. I plan to
> cut a branch-3 for the beta period, at which point we'll upgrade trunk to
> 4.0.0-alpha1. This is something we discussed on another mailing list thread.
>

Branching at beta time seems reasonable.

Overall, are there any incompatible changes on trunk that we wouldn't be
comfortable shipping in 3.0.0? If yes, would we ever feel comfortable shipping
those bits?


>
> Best,
> Andrew
>


Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Karthik Kambatla
Inline.

> 1) Set the fix version for all a.b.c versions, where c > 0.
> 2) For each major release line, set the lowest a.b.0 version.
>

Sounds reasonable.


>
> The -alphaX versions we're using leading up to 3.0.0 GA can be treated as
> a.b.c versions, with alpha1 being the a.b.0 release.
>

Once 3.0.0 GA goes out, a user would want to see the diff from the latest
2.x.0 release (say 2.9.0).

Are you suggesting 3.0.0 GA would have c = 5 (say) and hence rule 1 would
apply, and it should show up in the release notes?


>
> As an example, if a JIRA was committed to branch-2.6, branch-2.7, branch-2,
> branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5, 2.7.3,
> 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
> rule 1, and the last two fix versions come from rule 2.
>
> I'm very eager to move this discussion forward, so feel free to reach out
> on or off list if I can help with anything.
>


I think it is good practice to set multiple fix versions. However, it might
take the committers a little while to learn.

Since the plan is to cut 3.0.0 off trunk, can we just bulk edit to add the
3.0.0-alphaX version?


>
> Best,
> Andrew
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-26 Thread Karthik Kambatla
IIRR, the vote is on source artifacts and binaries are for convenience.

If that is right, I am open to either options - do another RC or continue
this vote and fix the binary artifacts.

On Tue, Jul 26, 2016 at 12:11 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> Thanks Daniel and Wei.
>
> I think these are worth fixing, I’m withdrawing this RC. Will look at
> fixing these issues and roll a new candidate with the fixes as soon as
> possible.
>
> Thanks
> +Vinod
>
> > On Jul 26, 2016, at 11:05 AM, Wei-Chiu Chuang wrote:
> >
> > I noticed two issues:
> >
> > (1) I ran hadoop checknative, but it seems the binary tarball was not
> > compiled with the native library for Linux. In contrast, the Hadoop built
> > from the source tarball with maven -Pnative can find the native libraries on
> > the same host.
> >
> > (2) I noticed that the release dates in CHANGES.txt in tag
> release-2.7.3-RC0 are set to Release 2.7.3 - 2016-07-27.
> > However, the release dates in CHANGES.txt in the source and binary tar
> balls are set to Release 2.7.3 - 2016-08-01. This is probably a non-issue
> though.
> >
> > * Downloaded source and binary.
> > * Verified signature.
> > * Verified checksum.
> > * Built from source using 64-bit Java 7 (1.7.0.75) and 8 (1.8.0.05).
> Both went fine.
> > * Ran hadoop checknative
> >
> > On Tue, Jul 26, 2016 at 9:12 AM, Rushabh Shah
> >
> wrote:
> > Thanks Vinod for all the release work !
> > +1 (non-binding).
> > * Downloaded from source and built it.* Deployed a pseudo distributed
> cluster.
> > * Ran some sample jobs: sleep, pi* Ran some dfs commands.* Everything
> works fine.
> >
> >
> > On Friday, July 22, 2016 9:16 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org > wrote:
> >
> >
> >  Hi all,
> >
> > I've created a release candidate RC0 for Apache Hadoop 2.7.3.
> >
> > As discussed before, this is the next maintenance release to follow up
> 2.7.2.
> >
> > The RC is available for validation at:
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/
> >
> > The RC tag in git is: release-2.7.3-RC0
> >
> > The maven artifacts are available via repository.apache.org at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> >
> > The release-notes are inside the tar-balls at location
> > hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> > hosted this at
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html
> > for your quick perusal.
> >
> > As you may have noted, a very long fix-cycle for the License & Notice
> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
> to slip by quite a bit. This release's related discussion thread is linked
> below: [1].
> >
> > Please try the release and vote; the vote will run for the usual 5 days.
> >
> > Thanks,
> > Vinod
> >
> > [1]: 2.7.3 release plan:
> > https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html
> > http://markmail.org/thread/6yv2fyrs4jlepmmr
> >
> >
> >
>
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-25 Thread Karthik Kambatla
+1 (binding)

* Downloaded and build from source
* Checked LICENSE and NOTICE
* Pseudo-distributed cluster with FairScheduler
* Ran MR and HDFS tests
* Verified basic UI

On Sun, Jul 24, 2016 at 1:07 PM, larry mccay  wrote:

> +1 binding
>
> * downloaded and built from source
> * checked LICENSE and NOTICE files
> * verified signatures
> * ran standalone tests
> * installed pseudo-distributed instance on my mac
> * ran through HDFS and mapreduce tests
> * tested credential command
> * tested webhdfs access through Apache Knox
>
>
> On Fri, Jul 22, 2016 at 10:15 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
>
> > Hi all,
> >
> > I've created a release candidate RC0 for Apache Hadoop 2.7.3.
> >
> > As discussed before, this is the next maintenance release to follow up
> > 2.7.2.
> >
> > The RC is available for validation at:
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/
> >
> > The RC tag in git is: release-2.7.3-RC0
> >
> > The maven artifacts are available via repository.apache.org at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> >
> > The release-notes are inside the tar-balls at location
> > hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> > hosted this at
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html
> > for your quick perusal.
> >
> > As you may have noted, a very long fix-cycle for the License & Notice
> > issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop
> release)
> > to slip by quite a bit. This release's related discussion thread is
> linked
> > below: [1].
> >
> > Please try the release and vote; the vote will run for the usual 5 days.
> >
> > Thanks,
> > Vinod
> >
> > [1]: 2.7.3 release plan:
> > https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html
> > http://markmail.org/thread/6yv2fyrs4jlepmmr
>


Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Karthik Kambatla
Thanks for clarifying Andrew. Inline.

On Mon, Jun 13, 2016 at 3:59 PM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

>
> On Fri, Jun 10, 2016 at 9:39 PM, Karthik Kambatla <ka...@cloudera.com>
> wrote:
>
>> I would like to understand the trunk-incompat part of the proposal a
>> little better.
>>
>> Is trunk-incompat always going to be a superset of trunk? If yes, is it
>> just a change in naming convention with a hope that our approach to trunk
>> stability changes as Sangjin mentioned?
>>
>> Or, is it okay for trunk-incompat to be based off of an older commit in
>> trunk with (in)frequent rebases? This has the risk of incompatible changes
>> truly rotting. Periodic rebases will ensure these changes don't rot while
>> also easing the burden of hosting two branches; if we choose this route,
>> some guidance of the period and who rebases will be nice.
>>
>
> Based on my understanding from Vinod on the previous "Looking to..."
> thread, it would be the latter. The goal of trunk-incompat was to avoid
> adding yet-another-branch we need to commit to every time, compared to the
> branch-3 proposal.
>
> I agree with the concerns you raise around feature rot. For a feature like
> EC, it'd be untenable to leave it in trunk-incompat since the rebases would
> be impossible. I imagine we'd also need a very motivated maintainer (or
> maintainers) to handle the periodic integration of new trunk commits, since
> you'd potentially be doing it for multiple large features. If some brave
> and experienced committer is willing to own maintenance of the
> trunk-incompat branch, I think it could work. However, this is a big shift
> from how we've historically done development.
>

If an incompatible feature is ready (like EC here), should we consider
working towards the next major release? In other words, is it okay to defer
cutting branch-3 until we have a large incompatible feature that would be a
pain to keep up with?


>
> This is why I leaned toward Chris D's proposal, which is that we cut
> branch-3 for 3.0.0-beta1, at which point trunk moves on to 4.0. In my mind,
> this is the "default" proposal, since it's how we've previously done
> things, with the slight adjustment that we defer cutting branch-3 until we
> start enforcing compatibility. This is my current plan for the Hadoop 3
> series, and we already had a lot of +1's about releasing from trunk on the
> previous thread.
>

I guess this makes sense.


>
> If there's a strong advocate for trunk-incompat over branch-3, let's have
> that discussion. However, given that beta is still months (and multiple
> releases) away, I don't think this decision affects my near-term goal of
> getting 3.0.0-alpha1 released.
>
> Thanks,
> Andrew
>


Re: [DISCUSS] Increased use of feature branches

2016-06-10 Thread Karthik Kambatla
Even if we release from the
>>> trunk, if
>>> >our bar for merging to trunk is low, the quality will not improve
>>> >automatically. So I think we ought to tackle the quality question first.
>>> >
>>> >My 2 cents.
>>> >
>>> >
>>> >On Fri, Jun 10, 2016 at 8:57 AM, Zhe Zhang <z...@apache.org> wrote:
>>> >
>>> >> Thanks for the notes Andrew, Junping, Karthik.
>>> >>
>>> >> Here are some of my understandings:
>>> >>
>>> >> - Trunk is the "latest and greatest" of Hadoop. If a user starts using
>>> >> Hadoop today, without legacy workloads, trunk is what he/she should
>>> use.
>>> >> - Therefore, each commit to trunk should be transactional -- atomic,
>>> >> consistent, isolated (from other uncommitted patches); I'm not so sure
>>> >> about durability, Hadoop might be gone in 50 years :). As a
>>> committer, I
>>> >> should be able to look at a patch and determine whether it's a
>>> >> self-contained improvement of trunk, without looking at other
>>> uncommitted
>>> >> patches.
>>> >> - Some comments inline:
>>> >>
>>> >> On Fri, Jun 10, 2016 at 6:56 AM Junping Du <j...@hortonworks.com>
>>> wrote:
>>> >>
>>> >> > Comparing with advantages, I believe the disadvantages of shipping
>>> any
>>> >> > releases directly from trunk are more obvious and significant:
>>> >> > - A lot of commits (incompatible, risky, uncompleted feature, etc.)
>>> have
>>> >> > to wait to commit to trunk or put into a separated branch that could
>>> >> delay
>>> >> > feature development progress as additional vote process get
>>> involved even
>>> >> > the feature is simple and harmless.
>>> >> >
>>> >> Thanks Junping, those are valid concerns. I think we should clearly
>>> >> separate incompatible with  uncompleted / half-done work in this
>>> >> discussion. Whether people should commit incompatible changes to
>>> trunk is a
>>> >> much more tricky question (related to trunk-incompat etc.). But per my
>>> >> comment above, IMHO, *not committing uncompleted work to trunk*
>>> should be a
>>> >> much easier principle to agree upon.
>>> >>
>>> >>
>>> >> > - For small feature with only 1 or 2 commits, that need three +1
>>> from
>>> >> PMCs
>>> >> > will increase the bar largely for contributors who just start to
>>> >> contribute
>>> >> > on Hadoop features but no such sufficient support.
>>> >> >
>>> >> Development overhead is another valid concern. I think our
>>> rule-of-thumb
>>> >> should be that, small-medium new features should be proposed as a
>>> single
>>> >> JIRA/patch (as we recently did for HADOOP-12666). If the complexity
>>> goes
>>> >> beyond a single JIRA/patch, use a feature branch.
>>> >>
>>> >>
>>> >> >
>>> >> > Given these concerns, I am open to other options, like: proposed by
>>> Vinod
>>> >> > or Chris, but rather than to release anything directly from trunk.
>>> >> >
>>> >> > - This point doesn't necessarily need to be resolved now though,
>>> since
>>> >> > again we're still doing alphas.
>>> >> > No. I think we have to settle down this first. Without a common
>>> agreed
>>> >> and
>>> >> > transparent release process and branches in community, any release
>>> >> (alpha,
>>> >> > beta) bits is only called a private release but not a official
>>> apache
>>> >> > hadoop release (even alpha).
>>> >> >
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Junping
>>> >> > 
>>> >> > From: Karthik Kambatla <ka...@cloudera.com>
>>> >> > Sent: Friday, June 10, 2016 7:49 AM
>>> >> > To: Andrew Wang
>>> >> > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
>>> >> > mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>

Re: [DISCUSS] Increased use of feature branches

2016-06-10 Thread Karthik Kambatla
Inline.

On Fri, Jun 10, 2016 at 6:56 AM, Junping Du <j...@hortonworks.com> wrote:

> Comparing with advantages, I believe the disadvantages of shipping any
> releases directly from trunk are more obvious and significant:
> - A lot of commits (incompatible, risky, uncompleted feature, etc.) have
> to wait to commit to trunk or put into a separated branch that could delay
> feature development progress as additional vote process get involved even
> the feature is simple and harmless.
>

Including these sorts of commits in trunk is a major pain.

One example from a recent mistake I made:
YARN-2877 and YARN-1011 had some common changes. Instead of putting them in
a separate branch, I committed these common changes to trunk because, well,
we don't release from trunk, so what could go wrong? After a few days, other
contributors and committers started feeling annoyed about having to submit
two different patches for trunk and branch-2. This inconvenience led to
those patches being pulled into branch-2 even though they were not ready
for inclusion in branch-2 or a 2.x release.

I feel the major friction for feature branches comes from only some
features using it. If everyone uses feature branches and we have better
processes around quantifying the stability of a feature branch, feature
branches should make for a smoother experience for everyone.

It is not uncommon for features to get merged into trunk before being ready,
with promises of follow-up work. While that might very well be the intent
of contributors, other work items come up and things get sidelined. How
often have we seen features without HA and security?


>
> - These commits left in separated branches are isolated and get more
> chance to conflict each other, and more bugs could be involved due to
> conflicts and/or less eyes watching/bless on isolated branches.
>

Partially agree. There is a tradeoff here: if we keep putting them into
trunk, they (1) destabilize trunk, and (2) conflict with other bug fixes
and smaller improvements.


>
> - More unnecessary arguments/debates will happen on if some commits should
> land on trunk or a separated branch, just like what we have recently.
>

Again, clearly defining the requirements to be merged into trunk will make
this easier. How is this different from what we do today for branch-2? If
we still have debates, they are probably needed; not having them today is
actually a concern.


>
> - Because branches will get increased massively, more community efforts
> will be spent on review & vote for branches merge that means less effort
> will be spent on other commits review given our review bandwidth is quite
> short so far.
>

Yes and no. Strictly using feature branches will serialize features.
Integrating with other features becomes a one-time, albeit more involved,
process instead of multiple rebases/resolutions, each somewhat involved.


>
> - For small feature with only 1 or 2 commits, that need three +1 from PMCs
> will increase the bar largely for contributors who just start to contribute
> on Hadoop features but no such sufficient support.
>

If a feature/improvement is not supported by 3 committers (not PMC
members), it is probably worth looking at why. Maybe this feature should
not be included at all?

I am open to changing the requirements for a merge. What do you think of
one +1 (thorough review) and two +0s (high-level review)?

If the concern is finding enough committers, I would like for the PMC to
consider voting in more committers and increasing bandwidth.


>
> Given these concerns, I am open to other options, like: proposed by Vinod
> or Chris, but rather than to release anything directly from trunk.
>

I actually thought this was Vinod's proposal. My understanding is Andrew is
resurfacing this so we can finalize things.


>
> - This point doesn't necessarily need to be resolved now though, since
> again we're still doing alphas.
> No. I think we have to settle down this first. Without a common agreed and
> transparent release process and branches in community, any release (alpha,
> beta) bits is only called a private release but not a official apache
> hadoop release (even alpha).
>
>
I am absolutely with Junping here. Changing this process primarily requires
a change in our mental model. I think it is pretty important that we decide
on one approach, preferably before doing an alpha release.

To clarify: our current approach (trunk and branch-2) has been working
okay. The only issue I see is in the way we take merging into trunk
lightly. If we have well-defined requirements for merging to trunk and take
those seriously, I am comfortable with using the approach for 3.x. The new
proposal forces following these requirements and hence I like it more.


>
> Thanks,
>
> Junping
> 
> From: Karthik Kambatla <ka..

Re: [DISCUSS] Increased use of feature branches

2016-06-10 Thread Karthik Kambatla
Thanks for restarting this thread Andrew. I really hope we can get this
across to a VOTE so it is clear.

I see a few advantages shipping from trunk:

   - The lack of need for one additional backport each time.
   - Avoiding feature rot in trunk

Instead of creating branch-3, I recommend creating branch-3.x so we can
continue doing 3.x releases off branch-3.x even after we move trunk to 4.x (I
said it :))

On Thu, Jun 9, 2016 at 11:12 PM, Andrew Wang 
wrote:

> Hi all,
>
> On a separate thread, a question was raised about 3.x branching and use of
> feature branches going forward.
>
> We discussed this previously on the "Looking to a Hadoop 3 release" thread
> that has spanned the years, with Vinod making this proposal (building on
> ideas from others who also commented in the email thread):
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201604.mbox/browser
>
> Pasting here for ease:
>
> On an unrelated note, offline I was pitching to a bunch of
> contributors another idea to deal
> with rotting trunk post 3.x: *Make 3.x releases off of trunk directly*.
>
> What this gains us is that
>  - Trunk is always nearly stable or nearly ready for releases
>  - We no longer have some code lying around in some branch (today’s
> trunk) that is not releasable
> because it gets mixed with other undesirable and incompatible changes.
>  - This needs to be coupled with more discipline on individual
> features - medium to large
> features are always worked upon in branches and get merged into trunk
> (and a nearing release!)
> when they are ready
>  - All incompatible changes go into some sort of a trunk-incompat
> branch and stay there till
> we accumulate enough of those to warrant another major release.
>
> Regarding "trunk-incompat", since we're still in the alpha stage for 3.0.0,
> there's no need for this branch yet. This aspect of Vinod's proposal was
> still under a bit of discussion; Chris Douglas thought we should cut a
> branch-3 for the first 3.0.0 beta, which aligns with my original thinking.
> This point doesn't necessarily need to be resolved now though, since again
> we're still doing alphas.
>
> What we should get consensus on is the goal of keeping trunk stable, and
> achieving that by doing more development on feature branches and being
> judicious about merges. My sense from the Hadoop 3 email thread (and the
> more recent one on the async API) is that people are generally in favor of
> this.
>
> We're just about ready to do the first 3.0.0 alpha, so would greatly
> appreciate everyone's timely response in this matter.
>
> Thanks,
> Andrew
>


Re: Looking to a Hadoop 3 release

2016-05-12 Thread Karthik Kambatla
I am with Vinod on avoiding merging mostly_complete_branches to trunk since
we are not shipping any release off it. If 3.x releases going off of trunk
is going to help with this, I am fine with that approach. We should still
make sure to keep trunk-incompat small and not include large features.

On Sat, Apr 23, 2016 at 6:53 PM, Chris Douglas  wrote:

> If we're not starting branch-3/trunk, what would distinguish it from
> trunk/trunk-incompat? Is it the same mechanism with different labels?
>
> That may be a reasonable strategy when we create branch-3, as a
> release branch for beta. Releasing 3.x from trunk will help us figure
> out which incompatibilities can be called out in an upgrade guide
> (e.g., "new feature X is incompatible with uncommon configuration Y")
> and which require code changes (e.g., "data loss upgrading a cluster
> with feature X"). Given how long trunk has been unreleased, we need
> more data from deployments to triage. How to manage transitions
> between major versions will always be case-by-case; consensus on how
> we'll address generic incompatible changes is not saving any work.
>
> Once created, removing functionality from branch-3 (leaving it in
> trunk) _because_ nobody volunteers cycles to address urgent
> compatibility issues is fair. It's also more workable than asking that
> features be committed to a branch that we have no plan to release,
> even as alpha. -C
>
> On Fri, Apr 22, 2016 at 6:50 PM, Vinod Kumar Vavilapalli
>  wrote:
> > Tx for your replies, Andrew.
> >
> >>> For exit criteria, how about we time box it? My plan was to do monthly
> >> alphas through the summer, leading up to beta in late August / early
> Sep.
> >> At that point we freeze and stabilize for GA in Nov/Dec.
> >
> >
> > Time-boxing is a reasonable exit-criterion.
> >
> >
> >> In this case, does trunk-incompat essentially become the new trunk? Or
> are
> >> we treating trunk-incompat as a feature branch, which periodically
> merges
> >> changes from trunk?
> >
> >
> > It’s the later. Essentially
> >  - trunk-incompat = trunk + only incompatible changes, periodically kept
> up-to-date to trunk
> >  - trunk is always ready to ship
> >  - and no compatible code gets left behind
> >
> > The reason for my proposal like this is to address the tension between
> “there is lot of compatible code in trunk that we are not shipping” and
> “don’t ship trunk, it has incompatibilities”. With this, we will not have
> (compatible) code not getting shipped to users.
> >
> > Obviously, we can forget about all of my proposal completely if everyone
> puts in all compatible code into branch-2 / branch-3 or whatever the main
> releasable branch is. This didn’t work in practice, have seen this not
> happening prominently during 0.21, and now 3.x.
> >
> > There is another related issue - "my feature is nearly ready, so I’ll
> just merge it into trunk as we don’t release that anyways, but not the
> current releasable branch - I’m lazy to fix the last few stability related
> issues”. With this, we will (should) get more disciplined, take feature
> stability on a branch seriously and merge a feature branch only when it is
> truly ready!
> >
> >> For 3.x, my strawman was to release off trunk for the alphas, then
> branch a
> >> branch-3 for the beta and onwards.
> >
> >
> > Repeating above, I’m proposing continuing to make GA 3.x releases also
> off of trunk! This way only incompatible changes don’t get shipped to users
> - by design! Eventually, trunk-incompat will be latest 3.x GA + enough
> incompatible code to warrant a 4.x, 5.x etc.
> >
> > +Vinod
>


[jira] [Created] (MAPREDUCE-6673) Add a test example job that grows in memory usage over time

2016-04-11 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6673:
---

 Summary: Add a test example job that grows in memory usage over 
time
 Key: MAPREDUCE-6673
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6673
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: test
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


While working on YARN-1011, I needed to put together an example that would have 
tasks increase their resource usage deterministically over time. It would be 
useful for any other utilization related work or stress tests. 
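
For illustration only, here is a rough sketch of the kind of map task such an
example job might use; it is not the actual patch, and the class name and
growth knobs are made up (a real implementation would read them from the job
configuration). The idea is simply to retain references to newly allocated
buffers at a fixed rate so that heap usage grows deterministically:

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  /** Sketch of a mapper whose heap usage grows deterministically over time. */
  public class GrowingMemoryMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    // Hypothetical knobs; a real job would take these from the configuration.
    private static final int CHUNK_MB = 16;       // memory added per step
    private static final long STEP_MS = 10_000L;  // time between steps
    private static final int NUM_STEPS = 32;      // total growth steps

    // Keep references so the allocated memory cannot be garbage collected.
    private final List<byte[]> retained = new ArrayList<>();

    @Override
    public void run(Context context) throws IOException, InterruptedException {
      // Ignore the input entirely; just grow the heap at a predictable rate.
      for (int i = 0; i < NUM_STEPS; i++) {
        retained.add(new byte[CHUNK_MB * 1024 * 1024]);
        context.setStatus("Retained ~" + (CHUNK_MB * (i + 1)) + " MB");
        context.progress();      // keep the attempt from being timed out
        Thread.sleep(STEP_MS);
      }
    }
  }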



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6669) Jobs with encrypted spills don't tolerate AM failures

2016-04-05 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6669:
---

 Summary: Jobs with encrypted spills don't tolerate AM failures
 Key: MAPREDUCE-6669
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6669
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Priority: Critical


The key used for encrypting intermediate data is not persisted anywhere, and 
hence can't be recovered the same way other MR jobs can be. We should support 
recovering these jobs as well, hopefully without having to re-run completed 
tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6638) Jobs with encrypted spills don't recover if the AM goes down

2016-02-19 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6638:
---

 Summary: Jobs with encrypted spills don't recover if the AM goes 
down
 Key: MAPREDUCE-6638
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6638
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.7.2
Reporter: Karthik Kambatla
Priority: Critical


Post the fix to CVE-2015-1776, jobs with encrypted spills enabled cannot be 
recovered if the AM fails. We should store the key some place safe so they can 
actually be recovered. If there is no "safe" place, at least we should restart 
the job by re-running all mappers/reducers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: MAPREDUCE-6520 and MAPREDUCE-6543 review request

2016-02-17 Thread Karthik Kambatla
Dustin, one of us could definitely review the patches, but I highly
encourage involving folks in the community (and networking with them) for
non-urgent patches. Tsuyoshi is a cool guy and is generally keen on test
cleanups, as you can see from his chiming in on this JIRA. Want to try
hitting him up?

On Mon, Dec 28, 2015 at 12:32 PM, Dustin Cote  wrote:

> Hi folks,
>
> I've been working on upgrading the JUnits for the MR client and I've added
> patches to do so.  Can anyone help me out with a review?  I'd love to get
> these Junit upgrades in before the patches go stale.  Thanks!
>
> https://issues.apache.org/jira/browse/MAPREDUCE-6543
> https://issues.apache.org/jira/browse/MAPREDUCE-6520
>
> --
> Dustin Cote
> Customer Operations Engineer
> 
>


Re: [Release thread] 2.8.0 release activities

2016-02-03 Thread Karthik Kambatla
Thanks Vinod. Not labeling 2.8.0 stable sounds perfectly reasonable to me.
Let us not call it alpha or beta though; it is quite confusing. :)

On Wed, Feb 3, 2016 at 8:17 PM, Gangumalla, Uma 
wrote:

> Thanks Vinod. +1 for 2.8 release start.
>
> Regards,
> Uma
>
> On 2/3/16, 3:53 PM, "Vinod Kumar Vavilapalli"  wrote:
>
> >Seems like all the features listed in the Roadmap wiki are in. I’m going
> >to try cutting an RC this weekend for a first/non-stable release off of
> >branch-2.8.
> >
> >Let me know if anyone has any objections/concerns.
> >
> >Thanks
> >+Vinod
> >
> >> On Nov 25, 2015, at 5:59 PM, Vinod Kumar Vavilapalli
> >> wrote:
> >>
> >> Branch-2.8 is created.
> >>
> >> As mentioned before, the goal on branch-2.8 is to put improvements /
> >>fixes to existing features with a goal of converging on an alpha release
> >>soon.
> >>
> >> Thanks
> >> +Vinod
> >>
> >>
> >>> On Nov 25, 2015, at 5:30 PM, Vinod Kumar Vavilapalli
> >>> wrote:
> >>>
> >>> Forking threads now in order to track all things related to the
> >>>release.
> >>>
> >>> Creating the branch now.
> >>>
> >>> Thanks
> >>> +Vinod
> >>>
> >>>
>  On Nov 25, 2015, at 11:37 AM, Vinod Kumar Vavilapalli
>  wrote:
> 
>  I think we’ve converged at a high level w.r.t 2.8. And as I just sent
> out an email, I updated the Roadmap wiki reflecting the same:
> https://wiki.apache.org/hadoop/Roadmap
> 
> 
>  I plan to create a 2.8 branch EOD today.
> 
>  The goal for all of us should be to restrict improvements & fixes to
> only (a) the feature-set documented under 2.8 in the RoadMap wiki and
> (b) other minor features that are already in 2.8.
> 
>  Thanks
>  +Vinod
> 
> 
> > On Nov 11, 2015, at 12:13 PM, Vinod Kumar Vavilapalli
> >> wrote:
> >
> > - Cut a branch about two weeks from now
> > - Do an RC mid next month (leaving ~4weeks since branch-cut)
> > - As with 2.7.x series, the first release will still be called as
> >early / alpha release in the interest of
> >   — gaining downstream adoption
> >   — wider testing,
> >   — yet reserving our right to fix any inadvertent incompatibilities
> >introduced.
> 
> >>>
> >>
> >
>
>


Re: [UPDATE] New ASF git policy on force-pushes / Tags / Stale branches

2016-01-27 Thread Karthik Kambatla
Filed https://issues.apache.org/jira/browse/INFRA-11136

On Mon, Jan 25, 2016 at 3:41 PM, Vinod Kumar Vavilapalli  wrote:

> I believe this is still in place, though I am not sure how we can verify
> this (without doing a force-push of course)
>
> +Vinod
>
> > One thing that wasn't clear from the INFRA announcement: are trunk,
> > branch-* branches protected against force-pushes in the new world? If
> not,
> > should we ask them to be locked up?
> >
> > Thanks
> > Karthik
> >
> > On Thu, Jan 14, 2016 at 10:26 PM, Vinod Kumar Vavilapalli <
> > vino...@apache.org> wrote:
> >
> >> Hi all,
> >>
> >> As some of you have noticed, we have an update from ASF infra on git
> >> branching policy: We no longer have an ASF-wide mandate on disallowing
> >> force-pushes on all branches / tags.
> >>
> >> Summarizing information from the INFRA email for the sake of clarity in
> >> the midst of recent confusion
> >> - We now can do force pushes, and branch/tag deletion on any branch or
> >> tag except refs/tags/rel
> >> - Any force pushes will be annotated in the commit-email as “[Forced
> >> Update!]” for the community to watch out for undesired force-pushes
> >> - Only tags under refs/tags/rel are protected from force-push for the
> >> sake of release-provenance: Essentially, the releases that community
> votes
> >> on are archived in their entirety with the development history and we
> >> cannot alter that once a tag is created. As one might expect.
> >>
> >> What this means for us
> >> - Stale branches: There are a few stale branches that got accumulated.
> >>— During this branch moratorium, origin/bracnh-2.8 got created (May
> be
> >> as part of HDFS-8785, can’t say for sure)
> >>— A couple of stale branches that helped 2.6.1 release:
> >> origin/sjlee/hdfs-merge and origin/ajisakaa/common-merge
> >>— I’ll wait till EOD tomorrow for any yays/nays and delete them
> >> - Feature branch updates: Developers can now go rebase and force-push
> >> their feature branches.
> >> - Mainline branches: Mainline branches like trunk, branch-2 have always
> >> been configured to avoid force-pushes. In general, force-push continues
> to
> >> be recommended mainly for feature branches and definitely not on any
> >> mainline branches from which we make releases.
> >> - Release tags:
> >>— To follow ASF provenance policy, we will now push the final release
> >> tags under refs/tags/rel. We will first push the RC tags under where
> they
> >> reside now (refs/tags) and if the vote passes, the final tag will be
> >> created under refs/tags/rel.
> >>— I’ll update our release wiki page
> >> http://wiki.apache.org/hadoop/HowToRelease with the same details once I
> >> can get 2.7.2 release done using this updated process.
> >> - Existing release tags:
> >>— There is a general recommendation from INFRA team to take all of
> our
> >> existing release tags under "tags" and copy them to “rel”.
> >>— I’ll wait till EOD tomorrow for any yays/nays and copy existing
> >> releases under refs/tags/rel following general recommendations.
> >>
> >> Any comments / thoughts / questions welcome.
> >>
> >> Thanks
> >> +Vinod
>
>


Re: [UPDATE] New ASF git policy on force-pushes / Tags / Stale branches

2016-01-17 Thread Karthik Kambatla
+1 on all counts.

One thing that wasn't clear from the INFRA announcement: are trunk,
branch-* branches protected against force-pushes in the new world? If not,
should we ask them to be locked up?

Thanks
Karthik

On Thu, Jan 14, 2016 at 10:26 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> Hi all,
>
> As some of you have noticed, we have an update from ASF infra on git
> branching policy: We no longer have an ASF-wide mandate on disallowing
> force-pushes on all branches / tags.
>
> Summarizing information from the INFRA email for the sake of clarity in
> the midst of recent confusion
>  - We now can do force pushes, and branch/tag deletion on any branch or
> tag except refs/tags/rel
>  - Any force pushes will be annotated in the commit-email as “[Forced
> Update!]” for the community to watch out for undesired force-pushes
>  - Only tags under refs/tags/rel are protected from force-push for the
> sake of release-provenance: Essentially, the releases that community votes
> on are archived in their entirety with the development history and we
> cannot alter that once a tag is created. As one might expect.
>
> What this means for us
>  - Stale branches: There are a few stale branches that got accumulated.
> — During this branch moratorium, origin/bracnh-2.8 got created (May be
> as part of HDFS-8785, can’t say for sure)
> — A couple of stale branches that helped 2.6.1 release:
> origin/sjlee/hdfs-merge and origin/ajisakaa/common-merge
> — I’ll wait till EOD tomorrow for any yays/nays and delete them
>  - Feature branch updates: Developers can now go rebase and force-push
> their feature branches.
>  - Mainline branches: Mainline branches like trunk, branch-2 have always
> been configured to avoid force-pushes. In general, force-push continues to
> be recommended mainly for feature branches and definitely not on any
> mainline branches from which we make releases.
>  - Release tags:
> — To follow ASF provenance policy, we will now push the final release
> tags under refs/tags/rel. We will first push the RC tags under where they
> reside now (refs/tags) and if the vote passes, the final tag will be
> created under refs/tags/rel.
> — I’ll update our release wiki page
> http://wiki.apache.org/hadoop/HowToRelease with the same details once I
> can get 2.7.2 release done using this updated process.
>  - Existing release tags:
> — There is a general recommendation from INFRA team to take all of our
> existing release tags under "tags" and copy them to “rel”.
> — I’ll wait till EOD tomorrow for any yays/nays and copy existing
> releases under refs/tags/rel following general recommendations.
>
> Any comments / thoughts / questions welcome.
>
> Thanks
> +Vinod


Highlighting communication on releases

2015-12-20 Thread Karthik Kambatla
Hi folks

In the last few months, we have been shipping multiple releases -
maintenance and minor - elevating the quality and purpose of our releases.

With the increase in releases and related communication, I wonder if we
need to highlight release-related communication in some way. Otherwise, it
is easy to miss the details in the plethora of emails on dev lists. For
instance, it appears committers might have missed the detail that
branch-2.8 was cut.

A couple of options I see:

   1. Tag release-related emails with [RELEASE] or some other appropriate
   tag.
   2. Separate mailing list for release-specific information. Maybe this
   list could be used for other project-level topics as well. common-dev@
   has been overloaded for both common-specific and project-level discussions.
   Also, folks wouldn't have to email all -dev@ lists.

Would like to hear others' thoughts on the matter.

Karthik


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Karthik Kambatla
Did we consider cutting a branch-3 that borrows relatively compatible
patches from trunk to run the 3.x line? That said, I would like for us to
really tighten our compatibility policies and actually stick to them
starting the next major release.

On Wed, Nov 11, 2015 at 1:11 PM, Vinod Vavilapalli 
wrote:

> I’ll let others comment on specific features.
>
> Regarding the 3.x vs 2.x point, as I noted before on other threads, given
> all the incompatibilities in trunk it will be ways off before users can run
> their production workloads on a 3.x release. Therefore, as I was proposing
> before, we should continue the 2.x lines, but soon get started on rolling
> out a release candidate based off trunk.
>
> Like with 2.8, I’d like to go back and prepare some notes on trunk’s
> content so we can objectively discuss about it.
>
> +Vinod


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-12 Thread Karthik Kambatla
I am really against the notion of calling x.y.0 releases alpha/beta; it is
very confusing. If we think a release is alpha/beta quality, why not
release it as x.y.0-alpha or x.y.0-beta, and follow it up eventually with
x.y.0 GA?

I am in favor of labeling any of the newer features to be of alpha/beta
quality.

SharedCache is another close to done feature.

On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli  wrote:

> Agreed on not mixing this with major release discussions.
>
> Okay, I just finished my review of 2.8 content.
>
> A quick summary follows.
>
> Current state of originally planned items
>
>  - Nearly Done / Done and so need to close down quickly
> — Support *both* JDK7 and JDK8 runtimes HADOOP-11090
> — Supporting non-exclusive node-labels: YARN-3214: Done, can push as
> an alpha / beta feature
> — Support priorities across applications within the same queue
> YARN-1963: Can push as an alpha / beta feature
>
>  - Definitely have to move out into 2.9 and beyond
> — Early work for disk and network isolation in YARN: YARN-2139,
> YARN-2140: Early noise, some critical pieces designed, done but not a lot
> of movement of late
> — Classpath isolation for downstream clients HADOOP-11656: Lots of
> chatter a while ago, not much movement of late
> — Support for Erasure Codes in HDFS HDFS-7285<
> https://issues.apache.org/jira/browse/HDFS-7285>: Moved out to 2.9 in the
> interest of stability / bake-in
>
> Non-planned features that went into 2.8.0
>
> — The overall list of new features:
> https://issues.apache.org/jira/issues/?filter=12333994
> — HDFS-6200 Create a separate jar for hdfs-client: Compatible
> improvement - no dimension of alpha/betaness here.
> — HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
> — Stability improvements ready to use:
> — HDFS-8008 Support client-side back off when the datanodes are
> congested
> — HDFS-8009 Signal congestion on the DataNode
> — YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> most of it anyways. Can push as an alpha feature.
> — YARN-1197 Support changing resources of an allocated container: Can
> push as an alpha/beta feature
>
> Items in progress to think about in 2 weeks
>
> — YARN Timeline Service v1.5 - YARN-4233: A short term bridge before
> YARN-2928 comes around. I think this should go in given the tremendous
> activity recently.
> — YARN Timeline Service Next generation: YARN-2928: Lots of momentum,
> but clearly a work in progress. Two options here
> — If it is safe to ship it into 2.8 in a disable manner, we can
> get the early code into trunk and all the way int o2.8.
> — If it is not safe, it organically rolls over into 2.9
> — Compatibility tools to catch backwards, forwards compatibility
> issues at patch submission, release times. Some of it is captured at
> YARN-3292. This also involves resurrecting jdiff
> (HADOOP-11776/YARN-3426/MAPREDUCE-6310) and/or investing in new tools.
>
> This is my plan of action for now in terms of the release itself
>
>  - Cut a branch about two weeks from now
>  - Do an RC mid next month (leaving ~4weeks since branch-cut)
>  - As with 2.7.x series, the first release will still be called as early /
> alpha release in the interest of
> — gaining downstream adoption
> — wider testing,
> — yet reserving our right to fix any inadvertent incompatibilities
> introduced.
>
> If we can get answers on “Items to think about now” during this and next
> week, we will overall be in good shape.
>
> Thoughts?
>
> Thanks
> +Vinod
> PS: As you may have noted above, this time around, I want to do something
> that we’ve always wanted to do, but never explicitly did. I’m calling out
> readiness of each feature as they stand today so we can inform our users
> better of what they can start relying on in production clusters.
>
>
> On Oct 5, 2015, at 11:53 AM, Colin P. McCabe > wrote:
>
> I think it makes sense to have a 2.8 release since there are a
> tremendous number of JIRAs in 2.8 that are not in 2.7.  Doing a 3.x
> release seems like something we should consider separately since it
> would not have the same compatibility guarantees as a 2.8 release.
> There's a pretty big delta between trunk and 2.8 as well.
>
> cheers,
> Colin
>
> On Sat, Sep 26, 2015 at 1:36 PM, Chris Douglas  > wrote:
> With two active sustaining branches (2.6, 2.7), what would you think
> of releasing trunk as 3.x instead of pushing 2.8? There are many new
> features (EC, Y1197, etc.), and trunk could be the source of several
> alpha/beta releases before we fork the 3.x line. -C
>
> On Sat, Sep 26, 2015 at 12:49 PM, Vinod Vavilapalli
> > wrote:
> As you may have noted, 2.8.0 got completely derailed what with 2.7.x and
> the unusually long 2.6.1 

[jira] [Resolved] (MAPREDUCE-6531) CLONE - Mumak: Map-Reduce Simulator

2015-11-04 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-6531.
-
Resolution: Won't Fix

Resolving as "Won't Fix". 

> CLONE - Mumak: Map-Reduce Simulator
> ---
>
> Key: MAPREDUCE-6531
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6531
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.21.0
>Reporter: GD
>Assignee: Hong Tang
> Fix For: 0.21.0
>
> Attachments: 19-jobs.topology.json.gz, 19-jobs.trace.json.gz, 
> mapreduce-728-20090917-3.patch, mapreduce-728-20090917-4.patch, 
> mapreduce-728-20090917.patch, mapreduce-728-20090918-2.patch, 
> mapreduce-728-20090918-3.patch, mapreduce-728-20090918-5.patch, 
> mapreduce-728-20090918-6.patch, mapreduce-728-20090918.patch, mumak.png
>
>
> h3. Vision:
> We want to build a Simulator to simulate large-scale Hadoop clusters, 
> applications and workloads. This would be invaluable in furthering Hadoop by 
> providing a tool for researchers and developers to prototype features (e.g. 
> pluggable block-placement for HDFS, Map-Reduce schedulers etc.) and predict 
> their behaviour and performance with reasonable amount of confidence, 
> there-by aiding rapid innovation.
> 
> h3. First Cut: Simulator for the Map-Reduce Scheduler
> The Map-Reduce Scheduler is a fertile area of interest with at least four 
> schedulers, each with their own set of features, currently in existence: 
> Default Scheduler, Capacity Scheduler, Fairshare Scheduler & Priority 
> Scheduler.
> Each scheduler's scheduling decisions are driven by many factors, such as 
> fairness, capacity guarantee, resource availability, data-locality etc.
> Given that, it is non-trivial to accurately choose a single scheduler or even 
> a set of desired features to predict the right scheduler (or features) for a 
> given workload. Hence a simulator which can predict how well a particular 
> scheduler works for some specific workload by quickly iterating over 
> schedulers and/or scheduler features would be quite useful.
> So, the first cut is to implement a simulator for the Map-Reduce scheduler 
> which take as input a job trace derived from production workload and a 
> cluster definition, and simulates the execution of the jobs as defined in 
> the trace in this virtual cluster. As output, the detailed job execution 
> trace (recorded in relation to virtual simulated time) could then be analyzed 
> to understand various traits of individual schedulers (individual jobs turn 
> around time, throughput, fairness, capacity guarantee, etc). To support 
> this, we would need a simulator which could accurately model the conditions 
> of the actual system which would affect a schedulers decisions. These include 
> very large-scale clusters (thousands of nodes), the detailed characteristics 
> of the workload thrown at the clusters, job or task failures, data locality, 
> and cluster hardware (cpu, memory, disk i/o, network i/o, network topology) 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] About the details of JDK-8 support

2015-10-15 Thread Karthik Kambatla
On Wed, Oct 14, 2015 at 4:28 PM, Allen Wittenauer  wrote:

>
> If people want, I could setup a cut off of yetus master to run the jenkins
> test-patch.  (multiple maven repos, docker support, multijdk support, … )
> Yetus would get some real world testing out of it and hadoop common-dev
> could stop spinning in circles over some of the same issues month after
> month. ;)
>

Seems like a step in the right direction.

Should we expect any downtime, and is there a good/bad time to do this?


>
>
> On Oct 14, 2015, at 3:05 PM, Robert Kanter  wrote:
>
> > The only problem with trying to get the JDK 8 trunk builds green (or
> blue I
> > guess) is that it's like trying to hit a moving target because of how
> many
> > new commits keep coming in.  I was looking at fixing these a while ago,
> and
> > managed to at least make them compile and fixed (or worked with others to
> > fix) some of the unit tests.  I've been really busy on other tasks and
> > haven't had time to continue working on this in quite a while though.
> >
> > Currently, it looks like Common is still green mostly, Yarn is having a
> > build failure with checkstyle, MR has between 1 and 10 test failures, and
> > HDFS had between 3 and 10 test failures.
> >
> > I think it's going to be difficult to get these green, and to keep them
> > green, unless we get more buy in from everyone on new commits being
> tested
> > against JDK 8.  Otherwise, it's too hard to keep up with the number of
> > commits coming in, even if we do get it green.  Perhaps we could have
> > test-patch also run the patch against JDK 8?
> >
> >
> > - Robert
> >
> > On Wed, Oct 14, 2015 at 8:27 AM, Steve Loughran 
> > wrote:
> >
> >>
> >>> On 13 Oct 2015, at 17:32, Haohui Mai  wrote:
> >>>
> >>> Just to echo Steve's idea -- if we're seriously considering supporting
> >>> JDK 8, maybe the first thing to do is to set up the jenkins to run
> >>> with JDK 8? I'm happy to help. Does anyone know who I can talk to if I
> >>> need to play around with all the Jenkins knob?
> >>
> >> Jenkins is building with JAva 7 and 8. all that's needed is to turn off
> >> the Java 7 build, which I will  happily do. The POM can be changed to
> set
> >> the minimum JVM version -though that's most likely to be visible to
> people
> >> building locally, as you'll need to make sure that you have access to
> java
> >> 7 and java 8 JVMs if you want to build and test for both.
> >>
> >> Jenkins-wise, the big issue is one I've mentioned before: the builds are
> >> failing an not enough people are caring
> >>
> >>
> >>
> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Hdfs-trunk-Java8/488/
> >>
> >> Please, lets fix this
> >>
> >>
>
>


[jira] [Created] (MAPREDUCE-6506) Make the reducer-preemption configs consistent in how they handle defaults

2015-10-08 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6506:
---

 Summary: Make the reducer-preemption configs consistent in how 
they handle defaults
 Key: MAPREDUCE-6506
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6506
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: applicationmaster
Affects Versions: 2.8.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6501) Improve reducer preemption

2015-10-05 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6501:
---

 Summary: Improve reducer preemption
 Key: MAPREDUCE-6501
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6501
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


As discussed on MAPREDUCE-6302, we could improve the reducer preemption as 
follows:
# preempt enough reducers so there are enough mappers to match slowstart 
# prioritize preempting reducers that are still in SHUFFLE phase
# add an option to not preempt reducers that are past SHUFFLE phase 
irrespective of slowstart as long as one mapper can run

We could do it all in one patch or create subtasks as necessary. 
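
To make the first two items concrete, here is a self-contained sketch of the
selection logic. It is not code from the MR AM or from an actual patch; the
names are invented, and it simplifies by assuming that preempting one reducer
frees enough room to schedule one mapper:

  import java.util.ArrayList;
  import java.util.List;

  /** Sketch of the reducer-preemption selection described above. */
  public class ReducerPreemptionSketch {

    enum Phase { SHUFFLE, SORT, REDUCE }

    static final class Reducer {
      final String id;
      final Phase phase;
      Reducer(String id, Phase phase) { this.id = id; this.phase = phase; }
    }

    /**
     * Pick just enough reducers to preempt so that the mappers needed to
     * satisfy slowstart can be scheduled, preferring reducers still in
     * SHUFFLE. Reducers past SHUFFLE are preempted only if the option allows
     * it, or when otherwise not even one mapper could run.
     */
    static List<Reducer> selectReducersToPreempt(List<Reducer> running,
        int mappersNeededForSlowstart, int mappersSchedulable,
        boolean preemptPastShuffle) {
      List<Reducer> victims = new ArrayList<>();
      int toFree = Math.max(0, mappersNeededForSlowstart - mappersSchedulable);

      // Consider SHUFFLE-phase reducers first, then the rest.
      List<Reducer> ordered = new ArrayList<>(running);
      ordered.sort((a, b) -> Integer.compare(
          a.phase == Phase.SHUFFLE ? 0 : 1, b.phase == Phase.SHUFFLE ? 0 : 1));

      for (Reducer r : ordered) {
        if (victims.size() >= toFree) {
          break;
        }
        if (r.phase != Phase.SHUFFLE && !preemptPastShuffle
            && mappersSchedulable + victims.size() >= 1) {
          continue;  // leave post-shuffle reducers alone; a mapper can run
        }
        victims.add(r);
      }
      return victims;
    }
  }

In a real patch, mappersNeededForSlowstart would presumably be derived from the
slowstart fraction and the scheduler's headroom, which is where most of the
actual work lies.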



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Release Apache Hadoop 2.6.1 RC1

2015-09-22 Thread Karthik Kambatla
+1 (binding)

Ran a few MR jobs on a pseudo-distributed cluster on Java 8.

On Tue, Sep 22, 2015 at 8:09 AM, Masatake Iwasaki <
iwasak...@oss.nttdata.co.jp> wrote:

> +1(non-binding)
>
> - verified mds and signature of source and binary tarball
> - built from source tarball with -Pnative on CentOS 7 and OpenJDK 7
> - built site documentation
> - started pseudo distributed cluster and ran example jobs
>
> Masatake Iwasaki
>
>
>
> On 9/17/15 11:10, Vinod Kumar Vavilapalli wrote:
>
>> Hi all,
>>
>> After a nearly month long [1] toil, with loads of help from Sangjin Lee
>> and
>> Akira Ajisaka, and 153 (RC0)+7(RC1) commits later, I've created a release
>> candidate RC1 for hadoop-2.6.1.
>>
>> RC1 is RC0 [0] (for which I opened and closed a vote last week) + UI fixes
>> for the issue Sangjin raised (YARN-3171 and the dependencies YARN-3779,
>> YARN-3248), additional fix to avoid incompatibility (YARN-3740), other UI
>> bugs (YARN-1884, YARN-3544) and the MiniYARNCluster issue (right patch for
>> YARN-2890) that Jeff Zhang raised.
>>
>> The RC is available at:
>> http://people.apache.org/~vinodkv/hadoop-2.6.1-RC1/
>>
>> The RC tag in git is: release-2.6.1-RC1
>>
>> The maven artifacts are available via repository.apache.org at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1021
>>
>> Some notes from our release process
>>   -  - Sangjin and I moved out a bunch of items pending from 2.6.1 [2] -
>> non-committed but desired patches. 2.6.1 is already big as is and is late
>> by any standard, we can definitely include them in the next release.
>>   - The 2.6.1 wiki page [3] captures some (but not all) of the context of
>> the patches that we pushed in.
>>   - Given the number of fixes pushed [4] in, we had to make a bunch of
>> changes to our original plan - we added a few improvements that helped us
>> backport patches easier (or in many cases made backports possible), and we
>> dropped a few that didn't make sense (HDFS-7831, HDFS-7926, HDFS-7676,
>> HDFS-7611, HDFS-7843, HDFS-8850).
>>   - I ran all the unit tests which (surprisingly?) passed. (Except for
>> one,
>> which pointed out a missing fix HDFS-7552).
>>
>> As discussed before [5]
>>   - This release is the first point release after 2.6.0
>>   - I’d like to use this as a starting release for 2.6.2 in a few weeks
>> and
>> then follow up with more of these.
>>
>> Please try the release and vote; the vote will run for the usual 5 days.
>>
>> Thanks,
>> Vinod
>>
>> [0] Hadoop 2.6.1 RC0 vote: http://markmail.org/thread/ubut2rn3lodc55iy
>> [1] Hadoop 2.6.1 Release process thread:
>> http://markmail.org/thread/wkbgkxkhntx5tlux
>> [2] 2.6.1 Pending tickets:
>> https://issues.apache.org/jira/issues/?filter=12331711
>> [3] 2.6.1 Wiki page: https://wiki.apache.org/hadoop/Release-2.6.1
>> -Working-Notes
>> [4] List of 2.6.1 patches pushed:
>> https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%202.6.1
>> %20and%20labels%20%3D%20%222.6.1-candidate%22
>> [5] Planning Hadoop 2.6.1 release:
>> http://markmail.org/thread/sbykjn5xgnksh6wg
>>
>> PS:
>>   - Note that branch-2.6 which will be the base for 2.6.2 doesn't have
>> these
>> fixes yet. Once 2.6.1 goes through, I plan to rebase branch-2.6 based off
>> 2.6.1.
>>   - The additional patches in RC1 that got into 2.6.1 all the way from 2.8
>> are NOT in 2.7.2 yet, this will be done as a followup.
>>
>>
>


[jira] [Resolved] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run

2015-09-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-6485.
-
Resolution: Duplicate

I believe this is a duplicate of MAPREDUCE-6302, and an artifact of the 
scheduler's headroom calculation not being accurate. 

Please re-open if you disagree. 



> MR job hanged forever because all resources are taken up by reducers and the 
> last map attempt never get resource to run
> ---
>
> Key: MAPREDUCE-6485
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>Reporter: Bob
>Priority: Critical
>
> The scenario is like this:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8 configured, reduces 
> will take resources and start to run before all the maps have finished. 
> It can then happen that all the resources are taken up by running reduces 
> while there is still one map unfinished. 
> Under this condition, the last map has two task attempts.
> The first attempt was killed due to timeout (mapreduce.task.timeout), and its 
> state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP, so this failed 
> map attempt would not be restarted. 
> The second attempt, which was started because map task speculation is 
> enabled, is pending in the UNASSIGNED state because no resource is available. 
> But the second map attempt's request has a lower priority than the reduces, 
> so preemption would not happen.
> As a result, none of the reduces can finish because there is one map left, 
> and the last map hangs there because no resource is available, so the job 
> never finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Scaling JHS

2015-09-02 Thread Karthik Kambatla
Hi folks

Just wanted to bring this up and see what people think.

IIUC, JHS memory consumption depends on the number of jobs, tasks per job,
and concurrent accesses. There might be a few orthogonal approaches to
improving its scalability:

   - It appears we process jhist files on every access. Maybe we could store
   the parsed results in a separate file and consult that first. We might be
   able to store all these events in ATS and use it for aggregation etc., but
   it might be a while before ATS is production-ready.
   - Active/active HA: We could bring up multiple instances of JHS behind a
   load-balancer. Moving/deleting history files needs to be done by only one
   of them - we could have a leader that does all of this, or use ZK locks for
   the directories being processed (see the sketch below).
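
To make the ZK-lock idea a bit more concrete, here is a minimal sketch of how
one JHS instance might claim a history directory before moving/deleting files
in it, using an ephemeral znode as the lock. The znode layout and class name
are made up for the example, the parent path is assumed to already exist, and
this is not JHS code:

  import java.io.IOException;

  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.KeeperException;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  /** Sketch: claim a history directory via an ephemeral znode. */
  public class JhsDirectoryLockSketch {

    private final ZooKeeper zk;

    public JhsDirectoryLockSketch(String connectString) throws IOException {
      // A no-op watcher is enough for this sketch.
      this.zk = new ZooKeeper(connectString, 30_000, event -> { });
    }

    /**
     * Returns true if this instance won the lock for the given directory.
     * The lock disappears automatically if the instance dies, because the
     * znode is ephemeral. Assumes /jhs/locks already exists.
     */
    public boolean tryClaim(String dirName)
        throws KeeperException, InterruptedException {
      try {
        zk.create("/jhs/locks/" + dirName, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        return true;                          // we own this directory
      } catch (KeeperException.NodeExistsException e) {
        return false;                         // another instance owns it
      }
    }

    public void release(String dirName)
        throws KeeperException, InterruptedException {
      zk.delete("/jhs/locks/" + dirName, -1); // -1 ignores the znode version
    }
  }

A production version would also need to handle session expiry, and would
likely lean on a recipe library such as Curator, but the ephemeral-node
pattern above is the core of the per-directory locking idea.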

Would like to hear experiences/ thoughts/ suggestions from the community.

Thanks
Karthik


Re: [VOTE] Release Apache Hadoop 2.7.1 RC0

2015-07-03 Thread Karthik Kambatla
+1 (binding)

Stood up a pseudo-dist cluster with FairScheduler and ran a few jobs.

On Mon, Jun 29, 2015 at 1:45 AM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 Hi all,

 I've created a release candidate RC0 for Apache Hadoop 2.7.1.

 As discussed before, this is the next stable release to follow up 2.6.0,
 and the first stable one in the 2.7.x line.

 The RC is available for validation at:
 http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/

 The RC tag in git is: release-2.7.1-RC0

 The maven artifacts are available via repository.apache.org at
 https://repository.apache.org/content/repositories/orgapachehadoop-1019/

 Please try the release and vote; the vote will run for the usual 5 days.

 Thanks,
 Vinod

 PS: It took 2 months instead of the planned [1] 2 weeks in getting this
 release out: post-mortem in a separate thread.

 [1]: A 2.7.1 release to follow up 2.7.0
 http://markmail.org/thread/zwzze6cqqgwq4rmw




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


[jira] [Created] (MAPREDUCE-6369) MR compile shouldn't depend on nodemanager

2015-05-19 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-6369:
---

 Summary: MR compile shouldn't depend on nodemanager
 Key: MAPREDUCE-6369
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6369
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Karthik Kambatla


Dependency analysis shows MR depends on nodemanager. If possible, MR shouldn't 
depend on any yarn-server APIs at all. This might require some changes in Yarn. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-1439) Learning Scheduler

2015-05-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-1439.
-
Resolution: Won't Fix

Given the lack of activity for 5 years and that there haven't been any releases 
on Hadoop 1, I think it is fair to close this as Won't Fix.

If there is interest in building something along these lines for Yarn, please 
open a JIRA with a design doc. 

 Learning Scheduler
 --

 Key: MAPREDUCE-1439
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1439
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Jaideep
Assignee: Jaideep
 Attachments: learning-scheduler-description.pdf


 I would like to contribute the scheduler I have written to the MapReduce 
 project. Presently the scheduler source code is available on 
 http://code.google.com/p/learnsched/. It has been tested to work with Hadoop 
 0.20, although the code available at the URL had been modified to build with 
 trunk and needs testing. Currently the scheduler is in experimental stages, 
 and any feedback for improvement will be extremely useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] branch-1

2015-05-08 Thread Karthik Kambatla
Closing out the JIRAs as Auto Closed or Closed due to Inactivity seems
reasonable to me. For branch-1, we can be more aggressive. We should
probably do the same less aggressively for other branches too.

On Fri, May 8, 2015 at 11:01 AM, Arun C Murthy acmur...@apache.org wrote:

 +1

 Arun

 On May 8, 2015, at 10:41 AM, Allen Wittenauer a...@altiscale.com wrote:

 
May we declare this branch dead and just close bugs (but not
 necessarily concepts, ideas, etc) with won’t fix?  I don’t think anyone has
 any intention of working on the 1.3 release, especially given that 1.2.1
 was Aug 2013 ….
 
I guess we need a PMC member to declare a vote or whatever….
 
 




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: [DISCUSS] branch-1

2015-05-08 Thread Karthik Kambatla
I would be -1 to declaring the branch dead just yet. There have been 7
commits to that branch this year. I know this isn't comparable to trunk or
branch-2, but it is not negligible either.

I propose we come up with a policy for deprecating past major release
branches. Maybe something along the lines of: deprecate branch-x when
release x+3.0.0 goes GA?



On Fri, May 8, 2015 at 10:41 AM, Allen Wittenauer a...@altiscale.com wrote:


 May we declare this branch dead and just close bugs (but not
 necessarily concepts, ideas, etc) with won’t fix?  I don’t think anyone has
 any intention of working on the 1.3 release, especially given that 1.2.1
 was Aug 2013 ….

 I guess we need a PMC member to declare a vote or whatever….





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


[jira] [Resolved] (MAPREDUCE-6343) JobConf.parseMaximumHeapSizeMB() fails to parse value greater than 2GB expressed in bytes

2015-04-28 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-6343.
-
   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed

Thanks [~jasonxh] for reporting and working on this. I just committed this to 
trunk. 

 JobConf.parseMaximumHeapSizeMB() fails to parse value greater than 2GB 
 expressed in bytes
 -

 Key: MAPREDUCE-6343
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6343
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Hao Xia
Assignee: Hao Xia
 Fix For: 3.0.0

 Attachments: parse-heap-size.patch, parse-heap-size.v2.patch


 It currently tries to parse the value as an integer, which blows up whenever 
 the value is greater than 2GB and expressed in bytes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Release numbering for stable 2.8 and beyond

2015-04-22 Thread Karthik Kambatla
Approach (1) seems like a good way to handle stability concerns people
might have, provided we explicitly distinguish between current and stable (i.e.,
not set them both to the latest release). It would be nice to do a VOTE for
calling a release stable.

I would use approach (2) for compatibility concerns. Ideally, I wouldn't
ship anything in a release unless we are sure we can preserve its
compatibility. If we really think a feature is not ready, we could always
guard them with private configs that devs or beta-testers could enable to
use. In the past, there have been proposals about labeling specific
features alpha and beta. I am open to considering it provided we define
what those terms mean, ideally in our compat guidelines.

On Wed, Apr 22, 2015 at 2:46 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Thanks for forking this Vinod,

 Linux used to do the odd/even minor versions for unstable/stable, but that
 went away when 2.6 lasted forever. With the 3.x and 4.x I think it's just
 always stable. The odd/even though was at least a convention everyone knew
 about.

 Stack can comment better than I about HBase versioning, but my impression
 is that they do something similar. Evens are stable, odds are not.

 I'd be okay with an even/odd convention, but it would still be our third
 versioning convention in as many years.

 I like (2) since it's similar to what we did with 2.0 and 2.1. Our contract
 since 2.2 has been that everything is compatible / stable. Adding a tag
 makes it very clear that we are now doing unstable releases. This also has
 the pleasing property that we culminate in a 2.x.0, rather than stable
 starting at 2.x.4 or whatever. Much simpler than having to consult a
 website.

 Re (2), we should also add a number (e.g. alpha2) in case there's more than
 one alpha or beta.

 Best,
 Andrew

 On Wed, Apr 22, 2015 at 2:17 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

  Forking the thread.
 
  In the previous 2.7.1 thread [1], there were enough yays to my proposal
 to
  wait for a bug-fix release or two before calling a 2.x release stable.
  There were some concerns about the naming.
 
  We have two options, taking 2.8 as an example
   (1) Release 2.8.0, call it an alpha in documentation and release
  notes, wait for a 2.8.1/2.8.2 reasonably stable enough to be called the
  first stable release of 2.8.
   (2) Release 2.8.0-alpha, 2.8.0-beta etc before culminating in a 2.8.0
  stable release.
 
  (1) is what I preferred first up. This is what HBase used to do, and, far
  beyond that, what the linux kernel releases did. It helps in scenarios where we are
  forced to downgrade a release, say due to major issues. We can simply
  announce it as not stable retroactively, change the pointers on our
 website
  and move on.
 
  Thoughts?
 
  Thanks,
  +Vinod
 
  [1] http://markmail.org/thread/ogzk4phj6wsdpssu
 
  On Apr 21, 2015, at 4:59 PM, Vinod Kumar Vavilapalli 
  vino...@hortonworks.com wrote:
 
  
   Sure, I agree it's better to have clear guidelines and scheme. Let me
  fork this thread about that.
  
   Re 2.7.0, I just forgot about the naming initially though I was clear
 in
  the discussion/voting. So I ended up calling it alpha outside of the
  release artifact naming.
  
   Thanks
   +Vinod
  
   On Apr 21, 2015, at 4:26 PM, Andrew Wang andrew.w...@cloudera.com
  wrote:
  
   I would also like to support Karthik's proposal on the release thread
  about
   version numbering. 2.7.0 being alpha is pretty confusing since all
 of
  the
   other 2.x releases since GA have been stable. I think users would
  prefer a
   version like 2.8.0-alpha1 instead, which is very clear and similar
 to
   what we did for 2.0 and 2.1. Then we release 2.8.0 when we're actually
   stable.
  
   I don't know if it's retroactively possible to do this for 2.7.0, but
  it's
   something to consider for the next 2.7 alpha or beta or whatever.
  
 
 




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: [DISCUSS] Looking to a 2.8.0 release

2015-04-21 Thread Karthik Kambatla
The feature set here seems pretty long, even for 2 - 3 months. Can we come
up with a minimum set of features (or a number of features) that justify a
new minor release, and start stabilizing as soon as those are in?

On Tue, Apr 21, 2015 at 2:39 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 With 2.7.0 out of the way, and with more maintenance releases to stabilize
 it, I propose we start thinking about 2.8.0.

 Here's my first cut of the proposal, will update the Roadmap wiki.
  - Support *both* JDK7 and JDK8 runtimes: HADOOP-11090
  - Compatibility tools to catch backwards, forwards compatibility issues at
 patch submission, release times. Some of it is captured at YARN-3292. This
 also involves resurrecting jdiff (HADOOP-11776/YARN-3426/MAPREDUCE-6310)
 and/or investing in new tools.
  - HADOOP-11656 Classpath isolation for downstream clients
  - Support for Erasure Codes in HDFS HDFS-7285
  - Early work for disk and network isolation in YARN: YARN-2139, YARN-2140
  - YARN Timeline Service Next generation: YARN-2928. At least branch-merge
 + early peek.
  - Supporting non-exclusive node-labels: YARN-3214

 I'm experimenting with more agile 2.7.x releases and would like to continue
 the same by volunteering as the RM for 2.8.x too.

 Given the long time we took with 2.7.0, the timeline I am looking at is
 8-12 weeks. We can pick up features as they finish along the way and make
 more predictable releases instead of holding up releases forever.

 Thoughts?

 Thanks
 +Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: [VOTE] Release Apache Hadoop 2.7.0 RC0

2015-04-15 Thread Karthik Kambatla
Thanks for putting this together, Vinod.

Stood up a pseudo-dist cluster using the binary bits and ran a pi job.
- The MR-AM got allocated, but never launched because the NM and RM didn't talk
to each other. Filed YARN-3492. Was able to run jobs after restart.

I believe we should at least investigate this further to see what could
have been going on.

On Wed, Apr 15, 2015 at 1:11 PM, Kihwal Lee kih...@yahoo-inc.com.invalid
wrote:

 +1 (binding)
 Built the source from the tag and ran basic tests on a pseudo-distributed cluster.
 Kihwal

   From: Vinod Kumar Vavilapalli vino...@apache.org
  To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
 yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
 Cc: vino...@apache.org
  Sent: Friday, April 10, 2015 6:44 PM
  Subject: [VOTE] Release Apache Hadoop 2.7.0 RC0

 Hi all,

 I've created a release candidate RC0 for Apache Hadoop 2.7.0.

  The RC is available at:
 http://people.apache.org/~vinodkv/hadoop-2.7.0-RC0/

 The RC tag in git is: release-2.7.0-RC0

  The maven artifacts are available via repository.apache.org at
 https://repository.apache.org/content/repositories/orgapachehadoop-1017/

 As discussed before
  - This release will only work with JDK 1.7 and above
  - I’d like to use this as a starting release for 2.7.x [1], depending on
 how it goes, get it stabilized and potentially use a 2.7.1 in a few
 weeks as the stable release.

  Please try the release and vote; the vote will run for the usual 5 days.

  Thanks,
  Vinod

  [1]: A 2.7.1 release to follow up 2.7.0
 http://markmail.org/thread/zwzze6cqqgwq4rmw






-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Removal of unused properties

2015-04-09 Thread Karthik Kambatla
Should be okay to remove.

The policy was intended for configs that continue to be relevant like
mapred.child.java.opts. In this case, since there is no TaskTracker
anymore, leaving this config purely for compat reasons seems unnecessary.
We could also improve the policies to clarify this.

On Thu, Apr 9, 2015 at 4:33 AM, Akira AJISAKA ajisa...@oss.nttdata.co.jp
wrote:

 Hi Folks,

 In MAPREDUCE-6307, I'd like to remove the unused
 mapreduce.tasktracker.taskmemorymanager.monitoringinterval property; however, the
 compatibility document says Hadoop-defined properties are to be deprecated
 at least for one major release before being removed.

 http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/Compatibility.html#Hadoop_Configuration_Files

 Is it applicable for unused properties?
 Can we remove unused properties right now?

 Regards,
 Akira




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: A 2.7.1 release to follow up 2.7.0

2015-04-09 Thread Karthik Kambatla
Inline.

On Thu, Apr 9, 2015 at 11:48 AM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 Hi all,

 I feel like we haven't done a great job of maintaining the previous 2.x
 releases. Seeing as how long 2.7.0 release has taken, I am sure we will
 spend more time stabilizing it, fixing issues etc.

 I propose that we immediately follow up 2.7.0 with a 2.7.1 within 2-3
 weeks. The focus obviously is to have blocker issues, bug-fixes and *no*
 features.


+1. Having a 2.7.2/2.7.3 to continue stabilizing is also appealing. Would
greatly help folks who upgrade to later releases for major bug fixes
instead of the new and shiny features.



 Improvements are going to be slightly hard to reason about, but I
 propose limiting ourselves to very small improvements, if at all.


I would avoid any improvements unless they are to fix severe regressions -
performance or otherwise. I guess they become blockers in that case. So,
yeah, I suggest no improvements at all.



 The other area of concern with the previous releases had been
 compatibility. With help from Li Lu, I got jdiff reinstated in branch-2
 (though patches are not yet in), and did a pass. In the unavoidable event
 that we find incompatibilities with 2.7.0, we can fix those in 2.7.1 and
 promote that to be the stable release.


Sounds reasonable.



 Thoughts?

 Thanks,+Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Uncertainty of pre-commit builds

2015-03-31 Thread Karthik Kambatla
Hi devs,

I am sure people must have noticed, but thought I would bring this up.

I have noticed that our pre-commit builds could end up running the wrong
set of unit tests for patches. For instance, YARN-3412
https://issues.apache.org/jira/browse/YARN-3412 changes only YARN files,
but the tests were run against one of the MR modules.

I suspect there is a race condition when there are multiple builds
executing on the same node, or remnants from a previous run are getting
picked up.

Filed HADOOP-11779 for this.

-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Uncertainty of pre-commit builds

2015-03-31 Thread Karthik Kambatla
Ravi just pointed me to HADOOP-11746. Thanks a bunch for working on
improving our test-patch scripts, Allen.

On Tue, Mar 31, 2015 at 9:37 AM, Allen Wittenauer a...@altiscale.com wrote:


 Very likely.  One of the things I noticed during HADOOP-11746 is
 that there is a HUGE, catastrophic race if Jenkins doesn’t setup the
 environment correctly or leaks variables between runs.  shellcheck prints
 out so many messages on the current code I’m surprised it doesn’t crash.

 On Mar 31, 2015, at 9:21 AM, Karthik Kambatla ka...@cloudera.com wrote:

  Hi devs,
 
  I am sure people must have noticed, but thought I would bring this up.
 
  I have noticed that our pre-commit builds could end up running the
 wrong
  set of unit tests for patches. For instance, YARN-3412
  https://issues.apache.org/jira/browse/YARN-3412 changes only YARN
 files,
  but the test were run against one of the MR modules.
 
  I suspect there is a race condition when there are multiple builds
  executing on the same node, or remnants from a previous run are getting
  picked up.
 
  Filed HADOOP-11779 for this.
 
  --
  Karthik Kambatla
  Software Engineer, Cloudera Inc.
  
  http://five.sentenc.es




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?

2015-03-10 Thread Karthik Kambatla
On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran ste...@hortonworks.com
wrote:


 If 3.x is going to be Java 8 & not backwards compatible, I don't expect
 anyone wanting to use this in production until some time deep into 2016.

 Issue: JDK 8 vs 7

 It will require Hadoop clusters to move up to Java 8. While there's dev
 pull for this, there's ops pull against this: people are still in the
 moving-off-Java-6 phase due to the "it's working, don't update it"
 philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
 want it.

 You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
 the main thing is setting up JAVA_HOME. That's something we could make
 easier somehow (maybe some min Java version field in resource requests that
 will let apps say java 8, java 9, ...). YARN could not only set up JVM
 paths, it could fail-fast if a Java version wasn't available.
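
 Roughly, per job that is just env overrides in the job conf - a sketch, with
 illustrative paths, assuming the standard mapreduce.map.env /
 mapreduce.reduce.env / yarn.app.mapreduce.am.env knobs:

   <property>
     <name>yarn.app.mapreduce.am.env</name>
     <value>JAVA_HOME=/usr/lib/jvm/java-8</value>
   </property>
   <property>
     <name>mapreduce.map.env</name>
     <value>JAVA_HOME=/usr/lib/jvm/java-8</value>
   </property>
   <property>
     <name>mapreduce.reduce.env</name>
     <value>JAVA_HOME=/usr/lib/jvm/java-8</value>
   </property>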

 What we can't do in hadoop core today is set javac.version=1.8 & use java
 8 code. Downstream code can do that (Hive, etc); they just need to accept
 that they don't get to play on JDK7 clusters if they embrace lambda expressions.

 So...we need to stay on java 7 for some time due to ops pull; downstream
 apps get to choose what they want. We can/could enhance YARN to make JVM
 choice more declarative.

 Issue: Incompatible changes

 Without knowing what is proposed for an incompatible classpath change, I
 can't say whether this is something that could be made optional. If it
 isn't, then it is a python-3-class "rewrite your code" event, which
 is going to be particularly traumatic to things like Hive that already do
 complex CP games. I'm currently against any mandatory change here, though
 would love to see an optional one. And if optional, it ceases to become an
 incompatible change...


We should probably start qualifying the word incompatible more often.

Are we okay with an API incompatible Hadoop-3? No.

Are we okay with a wire-incompatible Hadoop-3? Likely not.

Are we okay with breaking other forms of compatibility for Hadoop-3, like
behavior, dependencies, JDK, classpath, environment? I think so. Are we
okay with breaking these forms of compatibility in future Hadoop-2.x?
Likely not. Does our compatibility policy allow these changes in 2.x?
Mostly yes, but that is because we don't have policies for a lot of these
things that affect end-users. The reason we don't have a policy, IMO, is a
combination of (1) we haven't spent enough time thinking about them, (2)
without things like classpath isolation, we end up tying developers' hands
if we don't let them change the dependencies. I propose we update our
compat guidelines to be stricter, and do whatever is required to get there.
Is it okay to change our compat guidelines incompatibly? Maybe it
warrants a Hadoop-3? I don't know yet.

And, some other policies like bumping min JDK requirement are allowed in
minor releases. Users might be okay with certain JDK bumps (6 to 7, since
no one seems to be using 6 anymore), but users most definitely care about
some other bumps (7 to 8). If we want to remove this subjective evaluation,
I am open to requiring a major version for JDK upgrades (not support, but
language features) even if it means we have to wait until 3.0 for a JDK
upgrade.




 Issue: Getting trunk out the door

 The main diff between branch-2 and trunk is currently the bash script
 changes. These don't break client apps. They may or may not break bigtop & other
 downstream hadoop stacks, but developers don't need to worry about this:
 no recompilation necessary.

 Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.

 It seems to me that I could go

 git checkout trunk
 mvn versions:set -DnewVersion=2.8.0-SNAPSHOT

 We'd then have a version of Hadoop-trunk we could ship later this year,
 compatible at the JDK and API level with the existing java code & JDK7+
 clusters.

 A classpath fix that is optional/compatible can then go out on the 2.x
 line, saving the 3.x tag for something that really breaks things, forces
 all downstream apps to set up new hadoop profiles, have separate modules &
 generally hate the hadoop dev team

 This lets us tick off the "recent trunk release" and "fixed shell scripts"
 items, pushing out those benefits to people sooner rather than later, and
 puts off the "Hello, we've just broken your code" event for another 12+
 months.

 Comments?

 -Steve






-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Reviving HADOOP-7435: Making Jenkins pre-commit build work with branches

2015-03-04 Thread Karthik Kambatla
Thanks for reviving this on email, Vinod. Newer folks like me might not be
aware of this JIRA/effort.

This would be wonderful to have so (1) we know the status of release
branches (branch-2, etc.) and also (2) feature branches (YARN-2928).
Jonathan's or Matt's proposal for including branch name looks reasonable to
me.

If no one has any objections, I think we can continue on JIRA and get this
in.

On Wed, Mar 4, 2015 at 1:20 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Hi all,

 I'd like us to revive the effort at
 https://issues.apache.org/jira/browse/HADOOP-7435 to make precommit
 builds being able to work with branches. Having the Jenkins verify patches
 on branches is very useful even if there may be relaxed review oversight on
 the said-branch.

 Unless there are objections, I'd request help from Giri who already has a
 patch sitting there for more than a year before. This may need us to
 collectively agree on some convention - the last comment says that the
 branch patch name should be in some format for this to work.

 Thanks,
 +Vinod




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Karthik Kambatla
On Wed, Mar 4, 2015 at 10:46 AM, Stack st...@duboce.net wrote:

 In general +1 on 3.0.0. It's time. If we start now, it might make it out by
 2016. If we start now, downstreamers can start aligning themselves to land
 versions that suit at about the same time.

 While two big items have been called out as possible incompatible changes,
 and there is ongoing discussion as to whether they are or not*, is there
 any chance of getting a longer list of big differences between the
 branches? In particular I'd be interested in improvements that are 'off' by
 default that would be better defaulted 'on'.

 Thanks,
 St.Ack

 * Let me note that 'compatible' around these parts is a trampled concept
 seemingly open to interpretation with a definition other than what
 prevails elsewhere in software. See Allen's list above, and, in our
 downstream project, the recent HBASE-13149 (HBase server MR tools are
 broken on Hadoop 2.5+ Yarn), among others.  Let 3.x be incompatible with
 2.x if only so we can leave behind all current notions of 'compatibility'
 and just start over (as per Allen).


Unfortunately, our compatibility policies
(http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html)
are rather loose and allow for changes that break downstream projects. Fixing
the classpath issues would let us tighten our policies and bring our
compatibility story more in line with general expectations.






 On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:

  Hi devs,
 
  It's been a year and a half since 2.x went GA, and I think we're about
 due
  for a 3.x release.
  Notably, there are two incompatible changes I'd like to call out, that
 will
  have a tremendous positive impact for our users.
 
  First, classpath isolation being done at HADOOP-11656, which has been a
  long-standing request from many downstreams and Hadoop users.
 
  Second, bumping the source and target JDK version to JDK8 (related to
  HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
  months from now). In the past, we've had issues with our dependencies
  discontinuing support for old JDKs, so this will future-proof us.
 
  Between the two, we'll also have quite an opportunity to clean up and
  upgrade our dependencies, another common user and developer request.
 
  I'd like to propose that we start rolling a series of monthly-ish series
 of
  3.0 alpha releases ASAP, with myself volunteering to take on the RM and
  other cat herding responsibilities. There are already quite a few changes
  slated for 3.0 besides the above (for instance the shell script rewrite)
 so
  there's already value in a 3.0 alpha, and the more time we give
 downstreams
  to integrate, the better.
 
  This opens up discussion about inclusion of other changes, but I'm hoping
  to freeze incompatible changes after maybe two alphas, do a beta (with no
  further incompat changes allowed), and then finally a 3.x GA. For those
  keeping track, that means a 3.x GA in about four months.
 
  I would also like to stress though that this is not intended to be a big
  bang release. For instance, it would be great if we could maintain wire
  compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
  branch-2 and branch-3 similar also makes backports easier, since we're
  likely maintaining 2.x for a while yet.
 
  Please let me know any comments / concerns related to the above. If
 people
  are friendly to the idea, I'd like to cut a branch-3 and start working on
  the first alpha.
 
  Best,
  Andrew
 




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-03 Thread Karthik Kambatla
I am surprised classpath-isolation is being called a minor issue. We have
been hearing users complain about Hadoop leaking its dependencies into the
classpath for a while now, Guava being the culprit often. Not being able to
upgrade our dependencies without affecting users has started to hamper our
development too; e.g. Guava conflict with upgrading Curator version.

If we preserve API compat and try to preserve wire compat, I don't see the
harm in bumping the major release. It allows us to include several
fixes/features in trunk in a release. If we are not actively thinking of a
way to release items in trunk, why even have it?

If there are any disadvantages to doing a major release, I would like to
know. May be, we could arrive at a plan to accomplish it without those
problems.

Thanks
Karthik

On Tue, Mar 3, 2015 at 9:02 AM, Steve Loughran ste...@hortonworks.com
wrote:


  I want to understand a lot more about the classpath isolation
 (HADOOP-11656) proposal, specifically, what is proposed and does it have to
 be tagged as incompatible? That's a bigger change than just setting
 javac.version=8 in the POM — though given what a fundamental problem it
 addresses, I'm in favour of doing something there.

 On 3 March 2015 at 08:05:46, Andrew Wang (andrew.w...@cloudera.com) wrote:

 I view branch-3 as essentially the same size as our recent 2.x releases,
 with the exception of incompatible changes like classpath isolation and
 JDK8 target version. These, while perhaps not revolutionary, are still
 incompatible, and require a major version bump.




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Looking to a Hadoop 3 release

2015-03-02 Thread Karthik Kambatla
+1

On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.


 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.


Guava etc. have been such a pain in the past. Can't wait to have a release
where we don't have to worry about what versions of dependencies users want
to use.



 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.


Are you saying we can use lambdas without re-writing all of Hadoop in
Scala?



 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a series of monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities.


Will be glad to help.


 There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.


 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: DISCUSSION: Patch commit criteria.

2015-03-02 Thread Karthik Kambatla
On Mon, Mar 2, 2015 at 11:29 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 We always needed another committer's +1 even if it isn't that clear in the
 bylaws. At a minimum, we should codify this in the bylaws to avoid stuff
 like people committing their own patches.

 Regarding trivial changes, I always distinguish between trivial *patches*
 and trivial changes to *existing* patches. Patches even if trivial need to
 be +1ed by another committer. OTOH, many a times, for patches that are
 extensively reviewed, potentially for months on, I sometimes end up making
 a small javadoc/documentation change in the last version of patch before
 committing. It just avoids one more cycle and more delay. It's hard to
 codify this distinction though.


In the past, I have made trivial (new lines, indentation, etc.) changes to
well reviewed patches before committing. Even then, I believe we should
upload the updated patch or the diff of trivial changes and wait for
someone else (potentially a non-committer contributor) to quickly check to
avoid making silly mistakes.


 Thanks
 +Vinod

 On Feb 27, 2015, at 1:04 PM, Konstantin Shvachko shv.had...@gmail.com
 wrote:

  There were discussions on several jiras and threads recently about how
 RTC
  actually works in Hadoop.
  My opinion has always been that for a patch to be committed it needs an
  approval  (+1) of at least one committer other than the author and no
 -1s.
  The Bylaws seem to be stating just that:
  Consensus approval of active committers, but with a minimum of one +1.
  See the full version under Actions / Code Change
  http://hadoop.apache.org/bylaws.html#Decision+Making
 
  Turned out people have different readings of that part of Bylaws, and
  different opinions on how RTC should work in different cases. Some of the
  questions that were raised include:
  - Should we clarify the Code Change decision making clause in Bylaws?
  - Should there be a relaxed criteria for trivial changes?
  - Can a patch be committed if approved only by a non committer?
  - Can a patch be committed based on self-review by a committer?
  - What is the point for a non-committer to review the patch?
  Creating this thread to discuss these (and other that I sure missed)
 issues
  and to combine multiple discussions into one.
 
  My personal opinion is that we should just stick to tradition. Good or bad,
  it has worked for this community so far.
  I think most of the discrepancies arise from the fact that reviewers are
  hard to find. May be this should be the focus of improvements rather than
  the RTC rules.
 
  Thanks,
  --Konst




-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: 2.7 status

2015-02-13 Thread Karthik Kambatla
2 weeks from now (end of Feb) sounds reasonable. The one feature I would
like to see included is shared-cache: we are pretty close - two more
main items to take care of.

In an offline conversation, Steve mentioned building Windows binaries for
our releases. Do we want to do that for 2.7? If so, can anyone with Windows
expertise set up a Jenkins job to build these artifacts, and maybe hook it
up to https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder/



On Fri, Feb 13, 2015 at 11:07 AM, Arun Murthy a...@hortonworks.com wrote:

 My bad, been sort of distracted.

 I agree, we should just roll fwd a 2.7 ASAP with all the goodies.

 What sort of timing makes sense? 2 week hence?

 thanks,
 Arun

 
 From: Jason Lowe jl...@yahoo-inc.com.INVALID
 Sent: Friday, February 13, 2015 8:11 AM
 To: common-...@hadoop.apache.org
 Subject: Re: 2.7 status

 I'd like to see a 2.7 release sooner than later.  It has been almost 3
 months since Hadoop 2.6 was released, and there have already been 634 JIRAs
 committed to 2.7.  That's a lot of changes waiting for an official release.

 https://issues.apache.org/jira/issues/?jql=project%20in%20%28hadoop%2Chdfs%2Cyarn%2Cmapreduce%29%20AND%20fixversion%3D2.7.0%20AND%20resolution%3DFixed
 Jason

   From: Sangjin Lee sj...@apache.org
  To: common-...@hadoop.apache.org common-...@hadoop.apache.org
  Sent: Tuesday, February 10, 2015 1:30 PM
  Subject: 2.7 status

 Folks,

 What is the current status of the 2.7 release? I know initially it started
 out as a java-7 only release, but looking at the JIRAs that is very much
 not the case.

 Do we have a certain timeframe for 2.7 or is it time to discuss it?

 Thanks,
 Sangjin





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


[jira] [Resolved] (MAPREDUCE-5842) uber job with LinuxContainerExecutor cause exception

2015-02-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-5842.
-
Resolution: Duplicate

This appears to be a duplicate of MAPREDUCE-5799. Reopen if that is not the 
case. 

 uber job with LinuxContainerExecutor cause exception
 

 Key: MAPREDUCE-5842
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5842
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Atkins

 enable ubertask with linux container executer cause exception:
 {noformat}
 2014-04-17 23:26:07,859 DEBUG [localfetcher#1] 
 org.apache.hadoop.mapreduce.task.reduce.LocalFetcher: LocalFetcher 1 going to 
 fetch: attempt_1397748070416_0001_m_06_0
 2014-04-17 23:26:07,860 WARN [uber-SubtaskRunner] 
 org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local 
 (uberized) 'child' : 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in localfetcher#1
   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:351)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.ExceptionInInitializerError
   at org.apache.hadoop.mapred.SpillRecord.init(SpillRecord.java:70)
   at org.apache.hadoop.mapred.SpillRecord.init(SpillRecord.java:62)
   at org.apache.hadoop.mapred.SpillRecord.init(SpillRecord.java:57)
   at 
 org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:123)
   at 
 org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:101)
   at 
 org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:84)
 Caused by: java.lang.RuntimeException: Secure IO is not possible without 
 native code extensions.
   at org.apache.hadoop.io.SecureIOUtils.clinit(SecureIOUtils.java:75)
   ... 6 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Upgrading findbugs

2014-12-08 Thread Karthik Kambatla
Thanks for initiating this, Haohui. +1 to upgrading findbugs version.

Inline.

On Mon, Dec 8, 2014 at 9:57 PM, Haohui Mai h...@hortonworks.com wrote:

 Hi,

 The recent changes for moving to Java 7 trigger a bug in findbugs
 (http://sourceforge.net/p/findbugs/bugs/918), which causes all pre-commit
 runs (e.g., HADOOP-11287) to fail.

 The current version of findbugs (1.3.9) used by Hadoop is released in 2009.
 Given that:

 (1) The current bug that we hit is fixed by a later version of findbugs.
 (2) A newer findbugs (3.0.0) is required to analyze Hadoop that is compiled
 against Java 8.
 (3) Newer findbugs are capable of catching more bugs. :-)

 Is it a good time to consider upgrading findbugs, which gives us better
 tools for ensuring the quality of the code base?

 I ran findbugs 3.0.0 against trunk today. It reported 111 warnings for
 hadoop-common, 44 for HDFS and 40+ for YARN. Many of them are possible
 NPEs, resource leaks, and ignored exception which are indeed bugs and are
 worthwhile to address.

 However, one issue that needs to be considered is that how to deal with the
 additional warnings reported by the newer findbugs without breaking the
 Jenkins pre-commit runs.

 Personally I can see three possible routes if we decide to upgrade
 findbugs:

 (1) Fix all warnings before upgrading to newer findbugs.


This might take a while. We might want to use the newer findbugs sooner?


 (2) Add all new warnings to the exclude list and fix them slowly.


I have my doubts about how soon we would fix these warnings unless we make the
associated JIRAs (assuming we have one per exclude) blockers for the next
release. A findbugs Fix It day would be ideal to get this done.
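
For context, each such exclude is just a Match entry in the module's findbugs
exclude filter - roughly like the following sketch, where the class name and
bug pattern are placeholders rather than real warnings:

  <FindBugsFilter>
    <Match>
      <Class name="org.apache.hadoop.example.SomeClass"/>
      <Bug pattern="NP_NULL_ON_SOME_PATH"/>
    </Match>
  </FindBugsFilter>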


 (3) Update test-patch.sh to make sure that new code won't introduce any new
 findbugs warnings.


Seems the best, especially if test-patch.sh shows the warnings, but doesn't
-1 unless there are new findbugs warnings. This way, the contributor can
choose to fix related warnings at the least.



 I proposed upgrading to findbugs 2.0.2 and fixing new warnings in
 HADOOP-10476, which dates back to April 2014. I volunteer to
 accelerate the effort if it is required.


 Thoughts?

 Regards,
 Haohui





-- 
Karthik Kambatla
Software Engineer, Cloudera Inc.

http://five.sentenc.es


Re: Thinking ahead to hadoop-2.7

2014-12-05 Thread Karthik Kambatla
It would be nice to cut the branch for the next feature release (not just
Java 7) in the first week of January, so we can get the RC out by the end
of the month?

Yesterday, this came up in an offline discussion on ATS. Given people can
run 2.6 on Java 7, is there merit to doing 2.7 with the exact same bits
targeting Java 7? I am okay with going through with it, as long as it
doesn't delay the next feature release.

Thoughts?

On Wed, Dec 3, 2014 at 8:59 AM, Sangjin Lee sj...@apache.org wrote:

 Late January sounds fine to me. I think we should be able to wrap it up
 much earlier than that (hopefully).

 Thanks,
 Sangjin

 On Tue, Dec 2, 2014 at 5:19 PM, Arun C Murthy a...@hortonworks.com wrote:

  Sangjin/Karthik,
 
   How about planning on hadoop-2.8 by late Jan? Thoughts?
 
  thanks,
  Arun
 
  On Dec 2, 2014, at 11:09 AM, Sangjin Lee sjl...@gmail.com wrote:
 
   If 2.7 is being positioned as the JDK7-only release, then it would be
  good
   to know how 2.8 lines up in terms of timing. Our interest is landing
 the
   shared cache feature (YARN-1492)... Thanks.
  
   Sangjin
  
   On Mon, Dec 1, 2014 at 2:55 PM, Karthik Kambatla ka...@cloudera.com
  wrote:
  
   Thanks for starting this thread, Arun.
  
   Your proposal seems reasonable to me. I suppose we would like new
  features
   and improvements to go into 2.8 then? If yes, what time frame are we
   looking at for 2.8? Looking at YARN, it would be nice to get a release
  with
   shared-cache and a stable version of reservation work. I believe they
  are
   well under way and should be ready in a few weeks.
  
   Regarding 2.7 release specifics, do you plan to create a branch off of
   current branch-2.6 and update all issues marked fixed for 2.7 to be
  fixed
   for 2.8?
  
   Thanks
   Karthik
  
   On Mon, Dec 1, 2014 at 2:42 PM, Arun Murthy a...@hortonworks.com
  wrote:
  
   Folks,
  
   With hadoop-2.6 out it's time to think ahead.
  
   As we've discussed in the past, 2.6 was the last release which
 supports
   JDK6.
  
   I'm thinking it's best to try get 2.7 out in a few weeks (maybe by
 the
   holidays) with just the switch to JDK7 (HADOOP-10530) and possibly
   support for JDK-1.8 (as a runtime) via HADOOP-11090.
  
   This way we can start with the stable base of 2.6 and switch over to
   JDK7 to allow our downstream projects to use either for a short time
   (hadoop-2.6 or hadoop-2.7).
  
   I'll update the Roadmap wiki accordingly.
  
   Thoughts?
  
   thanks,
   Arun
  
  
  
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/hdp/
 
 
 
 




-- 
-- Karthik Kambatla, Software Engineer, Cloudera

Q: Why is this email five sentences or less?
A: http://five.sentenc.es


Re: Thinking ahead to hadoop-2.7

2014-12-01 Thread Karthik Kambatla
Thanks for starting this thread, Arun.

Your proposal seems reasonable to me. I suppose we would like new features
and improvements to go into 2.8 then? If yes, what time frame are we
looking at for 2.8? Looking at YARN, it would be nice to get a release with
shared-cache and a stable version of reservation work. I believe they are
well under way and should be ready in a few weeks.

Regarding 2.7 release specifics, do you plan to create a branch off of
current branch-2.6 and update all issues marked fixed for 2.7 to be fixed
for 2.8?

Thanks
Karthik

On Mon, Dec 1, 2014 at 2:42 PM, Arun Murthy a...@hortonworks.com wrote:

 Folks,

 With hadoop-2.6 out it's time to think ahead.

 As we've discussed in the past, 2.6 was the last release which supports
 JDK6.

 I'm thinking it's best to try get 2.7 out in a few weeks (maybe by the
 holidays) with just the switch to JDK7 (HADOOP-10530) and possibly
 support for JDK-1.8 (as a runtime) via HADOOP-11090.

 This way we can start with the stable base of 2.6 and switch over to
 JDK7 to allow our downstream projects to use either for a short time
 (hadoop-2.6 or hadoop-2.7).

 I'll update the Roadmap wiki accordingly.

 Thoughts?

 thanks,
 Arun




Re: [VOTE] Merge branch MAPREDUCE-2841 to trunk

2014-09-11 Thread Karthik Kambatla
+1

I skimmed over the initial import, but looked at the follow-up patches more
closely. There is very little change to the existing code, most (all?) of
which is already committed to trunk. Ran wordcount with the default
collector and the native collector on a single node setup - the latter
takes ~ 10% less wall-clock time. Haven't verified the CPU usage myself.
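
For reference, switching between the two was just a per-job config flip along
these lines (assuming I have the collector class name right - the native one
lives in the hadoop-mapreduce-client-nativetask module):

  <property>
    <name>mapreduce.job.map.output.collector.class</name>
    <value>org.apache.hadoop.mapred.nativetask.NativeMapOutputCollectorDelegator</value>
  </property>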



On Thu, Sep 11, 2014 at 5:00 AM, Devaraj K deva...@apache.org wrote:

 +1

 Good performance improvement. Nice work…



 On Sat, Sep 6, 2014 at 6:05 AM, Chris Douglas cdoug...@apache.org wrote:

  +1
 
  The change to the existing code is very limited and the perf is
  impressive. -C
 
  On Fri, Sep 5, 2014 at 4:58 PM, Todd Lipcon t...@apache.org wrote:
   Hi all,
  
   As I've reported recently [1], work on the MAPREDUCE-2841 branch has
   progressed well and the development team working on it feels that it is
   ready to be merged into trunk.
  
   For those not familiar with the JIRA (it's a bit lengthy to read from
  start
   to finish!) the goal of this work is to build a native implementation
 of
   the map-side sort code. The native implementation's primary advantage
 is
   its speed: for example, terasort is 30% faster on a wall-clock basis
 and
   60% faster on a resource consumption basis. For clusters which make
 heavy
   use of MapReduce, this is a substantial improvement to their
 efficiency.
   Users may enable the feature by switching a single configuration flag,
  and
   it will fall back to the original implementation in cases where the
  native
   code doesn't support the configured features/types.
  
   The new work is entirely pluggable and off-by-default to mitigate risk.
  The
   merge patch itself does not modify even a single line of existing code:
  all
   necessary plug-points have already been committed to trunk for some
 time.
  
   Though we do not yet have a full +1 precommit Jenkins run on the JIRA,
   there are only a few small nits to fix before merge, so I figured that
 we
   could start the vote in parallel. Of course we will not merge until it
  has
   a positive precommit run.
  
   Though this branch is a new contribution to the Apache repository, it
   represents work done over several years by a large community of
  developers
   including the following:
  
   Binglin Chang
   Yang Dong
   Sean Zhong
   Manu Zhang
   Zhongliang Zhu
   Vincent Wang
   Yan Dong
   Cheng Lian
   Xusen Yin
   Fangqin Dai
   Jiang Weihua
   Gansha Wu
   Avik Dey
  
   The vote will run for 7 days, ending Friday 9/12 EOD PST.
  
   I'll start the voting with my own +1.
  
   -Todd
  
   [1]
  
 
 http://search-hadoop.com/m/09oay13EwlV/native+task+progresssubj=Native+task+branch+progress
 



 --


 Thanks
 Devaraj K



Re: [VOTE] Release Apache Hadoop 2.5.1 RC0

2014-09-10 Thread Karthik Kambatla
Thanks for reporting the mistake in the documentation, Akira. While it is
good to fix it, I am not sure it is big enough to warrant another RC,
particularly because 2.5.1 is very much 2.5.0 done right.

I just updated the how-to-release wiki to capture this step in the release
process, so we don't miss it in the future.

On Mon, Sep 8, 2014 at 11:37 PM, Akira AJISAKA ajisa...@oss.nttdata.co.jp
wrote:

 -0 (non-binding)

 In the document, Apache Hadoop 2.5.1 is a minor release in the 2.x.y
 release line, building upon the previous stable release 2.4.1.

 Hadoop 2.5.1 is a point release. Filed HADOOP-11078 to track this.

 Regards,
 Akira


 (2014/09/09 0:51), Karthik Kambatla wrote:

 +1 (non-binding)

 Built the source tarball, brought up a pseudo-distributed cluster and ran
 a
 few MR jobs. Verified documentation and size of the binary tarball.

 On Fri, Sep 5, 2014 at 5:18 PM, Karthik Kambatla ka...@cloudera.com
 wrote:

  Hi folks,

 I have put together a release candidate (RC0) for Hadoop 2.5.1.

 The RC is available at: http://people.apache.org/~kasha/hadoop-2.5.1-RC0/
 The RC git tag is release-2.5.1-RC0
 The maven artifacts are staged at:
 https://repository.apache.org/content/repositories/orgapachehadoop-1010/

 You can find my public key at:
 http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

 Please try the release and vote. The vote will run for the now usual 5
 days.

 Thanks
 Karthik






Re: [VOTE] Release Apache Hadoop 2.5.1 RC0

2014-09-08 Thread Karthik Kambatla
+1 (non-binding)

Built the source tarball, brought up a pseudo-distributed cluster and ran a
few MR jobs. Verified documentation and size of the binary tarball.

On Fri, Sep 5, 2014 at 5:18 PM, Karthik Kambatla ka...@cloudera.com wrote:

 Hi folks,

 I have put together a release candidate (RC0) for Hadoop 2.5.1.

 The RC is available at: http://people.apache.org/~kasha/hadoop-2.5.1-RC0/
 The RC git tag is release-2.5.1-RC0
 The maven artifacts are staged at:
 https://repository.apache.org/content/repositories/orgapachehadoop-1010/

 You can find my public key at:
 http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

 Please try the release and vote. The vote will run for the now usual 5
 days.

 Thanks
 Karthik



[VOTE] Release Apache Hadoop 2.5.1 RC0

2014-09-05 Thread Karthik Kambatla
Hi folks,

I have put together a release candidate (RC0) for Hadoop 2.5.1.

The RC is available at: http://people.apache.org/~kasha/hadoop-2.5.1-RC0/
The RC git tag is release-2.5.1-RC0
The maven artifacts are staged at:
https://repository.apache.org/content/repositories/orgapachehadoop-1010/

You can find my public key at:
http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

Please try the release and vote. The vote will run for the now usual 5
days.

Thanks
Karthik


[jira] [Resolved] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-09-03 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved MAPREDUCE-5956.
-
  Resolution: Fixed
Target Version/s: 2.6.0  (was: 2.5.1)

Spoke to [~zjshen] offline. Looks like we can leave it out of 2.5.1. 

 MapReduce AM should not use maxAttempts to determine if this is the last retry
 --

 Key: MAPREDUCE-5956
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 2.4.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: MR-5956.patch, MR-5956.patch


 Found this while reviewing YARN-2074. The problem is that after YARN-2074, we 
 don't count AM preemption towards AM failures on RM side, but MapReduce AM 
 itself checks the attempt id against the max-attempt count to determine if 
 this is the last attempt.
 {code}
 public void computeIsLastAMRetry() {
    isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
 }
 {code}
 This causes issues w.r.t deletion of staging directory etc..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] Hadoop 2.5.1

2014-09-03 Thread Karthik Kambatla
Hi folks

Now that all issues with target 2.5.1 are committed, I am planning to cut
an RC for 2.5.1 this Friday. The fixes going into 2.5.1 are -
http://s.apache.org/2Mz

Are there any other Blocker issues that we would like to get into 2.5.1?
If there are any, please mark them as Blocker and target 2.5.1.

Committers - please avoid committing non-blockers to 2.5.1.

I have put together a new wiki for the release process
https://wiki.apache.org/hadoop/HowToReleasePostMavenizationWithGit. Would
greatly appreciate any eyeballs I can get on improving it. The wiki
reflects our migration to git, the updated create-release script that
avoids manual massaging of tarballs and other minor improvements.

Thanks
Karthik


Git repo ready to use

2014-08-27 Thread Karthik Kambatla
Hi folks,

I am very excited to let you know that the git repo is now writable. I
committed a few changes (CHANGES.txt fixes and branching for 2.5.1) and
everything looks good.

Current status:

   1. All branches have the same names, including trunk.
   2. Force push is disabled on trunk, branch-2 and tags.
   3. Even if you are experienced with git, take a look at
   https://wiki.apache.org/hadoop/HowToCommitWithGit . Particularly, let us
   avoid merge commits; a quick sketch of the rebase-based flow is below.
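
A minimal sketch of what that works out to day-to-day (the JIRA number and
commit hash below are placeholders):

  # keep local trunk linear - rebase instead of merge when pulling
  git checkout trunk
  git pull --rebase

  # apply and commit a contributor's patch
  git apply HADOOP-XXXXX.patch
  git commit -a -m "HADOOP-XXXXX. Description. (Contributor via Committer)"
  git push origin trunk

  # backport to branch-2, recording the original commit
  git checkout branch-2
  git cherry-pick -x <commit-hash>
  git push origin branch-2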

Follow-up items:

   1. Update rest of the wiki documentation
   2. Update precommit Jenkins jobs and get HADOOP-11001 committed (reviews
   appreciated). Until this is done, the precommit jobs will run against our
   old svn repo.
   3. git mirrors etc. to use the new repo instead of the old svn repo.

Thanks again for your cooperation through the migration process. Please
reach out to me (or the list) if you find anything missing or have
suggestions.

Cheers!
Karthik


Re: Git repo ready to use

2014-08-27 Thread Karthik Kambatla
Oh.. a couple more things.

The git commit hashes have changed and are different from what we had on
our github. This might interfere with any build automations that folks
have.

Another follow-up item: email and JIRA integration


On Wed, Aug 27, 2014 at 1:33 AM, Karthik Kambatla ka...@cloudera.com
wrote:

 Hi folks,

 I am very excited to let you know that the git repo is now writable. I
 committed a few changes (CHANGES.txt fixes and branching for 2.5.1) and
 everything looks good.

 Current status:

1. All branches have the same names, including trunk.
2. Force push is disabled on trunk, branch-2 and tags.
3. Even if you are experienced with git, take a look at
https://wiki.apache.org/hadoop/HowToCommitWithGit . Particularly, let
us avoid merge commits.

 Follow-up items:

1. Update rest of the wiki documentation
2. Update precommit Jenkins jobs and get HADOOP-11001 committed
(reviews appreciated). Until this is done, the precommit jobs will run
against our old svn repo.
3. git mirrors etc. to use the new repo instead of the old svn repo.

 Thanks again for your cooperation through the migration process. Please
 reach out to me (or the list) if you find anything missing or have
 suggestions.

 Cheers!
 Karthik




Re: Git repo ready to use

2014-08-27 Thread Karthik Kambatla
The emails for commits apparently all go to common-commits@, irrespective
of the project; the git filtering is not as good as the svn filtering.
Daniel offered to look into alternatives this coming weekend.

I filed INFRA-8250 to restore updates to JIRA on commits.







Re: Git repo ready to use

2014-08-27 Thread Karthik Kambatla
It appears the comments from Hudson on our JIRAs (post commits) are not
set up by the INFRA team. Do we use any other scripts for this? If yes, do
we want to fix those scripts or use svngit2jira?








Re: Git repo ready to use

2014-08-27 Thread Karthik Kambatla
svngit2jira would write a message like this -
https://issues.apache.org/jira/browse/CLOUDSTACK-1638?focusedCommentId=13714929page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13714929
with a link to the commit like this -
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=7260e8d

This is more concise and easier to look at than the Hudson list.









Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
Last I heard, the import is still going on and appears closer to getting
done. Thanks for your patience with the migration.

I'll update you as and when there is something. Eventually, the git repo
should be at the location in the wiki.







Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
Hi Suresh

There was one vote thread on whether to migrate to git, and the
implications to the commit process for individual patches and feature
branches -
https://www.mail-archive.com/common-dev@hadoop.apache.org/msg13447.html .
Prior to that, there was a discuss thread on the same topic.

As INFRA handles the actual migration from subversion to git, the vote
didn't include those specifics. The migration is going on as we speak (See
INFRA-8195). The initial expectation was that the migration would be done
in a few hours, but it has been several hours and the last I heard the
import was still running.

I have elaborated on the points in the vote thread and drafted up a wiki
page on how-to-commit - https://wiki.apache.org/hadoop/HowToCommitWithGit .
We can work on improving this further and call a vote thread on those items
if need be.

Thanks
Karthik


On Tue, Aug 26, 2014 at 11:41 AM, Suresh Srinivas sur...@hortonworks.com
wrote:

 Karthik,

 I would like to see detailed information on how this migration will be
 done, how it will affect the existing project and commit process. This
 should be done in a document that can be reviewed instead of in an email
 thread on an ad-hoc basis. Was there any voting on this in PMC and should
 we have a vote to ensure everyone is one the same page on doing this and
 how to go about it?

 Regards,
 Suresh



Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
The git repository is now ready for inspection. I'll take a look shortly,
but it would be great if a few others could too.

Once we are okay with it, we can ask it to be writable.


Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
I compared the new asf git repo against the svn and github repos (mirrored
from svn). Here is what I see:
- for i in *; do git diff $i ../hadoop-github/$i; done showed no
differences between the two. So, I think all the source is there.
- The branches match
- All svn tags exist in git, but git has a few more. These additional ones
are those that we deleted from svn.
- git rev-list --remotes | wc -l shows 27006 revisions in the new git repo
and 29549 revisions in the github repo. Checking with Daniel, he said the
git svn import works differently compared to the git mirroring.
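
For anyone who wants to repeat these checks, a rough sketch, assuming the two
clones sit side by side as hadoop-asf and hadoop-github:

    cd hadoop-asf
    # per-directory content comparison against the github mirror
    for i in *; do git diff --no-index "$i" "../hadoop-github/$i"; done
    # branch, tag and revision-count comparisons
    git branch -r
    git tag
    git rev-list --remotes | wc -l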

Are we comfortable with making the git repo writable under these
conditions? I'll let other people poke around and report.

Thanks for your cooperation,
Karthik



Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
Looks like our git repo is good to go.

On INFRA-8195, I am asking Daniel to enable writing to it. In case you find
any issues, please comment on the JIRA.

Thanks
Karthik


On Tue, Aug 26, 2014 at 3:28 PM, Arpit Agarwal aagar...@hortonworks.com
wrote:

 I cloned the new repo, built trunk and branch-2, verified all the branches
 are present. Also checked a few branches and the recent commit history
 matches our existing repo. Everything looks good so far.
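
A minimal version of that check might look like this; the clone URL is an
assumption, and the usual build prerequisites (protobuf, native libraries)
are taken as already installed:

    git clone https://git-wip-us.apache.org/repos/asf/hadoop.git   # URL assumed
    cd hadoop
    git branch -r                                   # confirm the expected branches exist
    git checkout trunk    && mvn clean install -DskipTests
    git checkout branch-2 && mvn clean install -DskipTests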



Re: Updates on migration to git

2014-08-26 Thread Karthik Kambatla
Yes, we have requested that force-push be disabled on trunk and the branch-*
branches. I haven't tested it though :P, as it is not writable yet.


On Tue, Aug 26, 2014 at 5:48 PM, Todd Lipcon t...@cloudera.com wrote:

 Hey Karthik,

 Just to confirm, have we disabled force-push support on the repo?

 In my experience, especially when a project has committers new to git,
 force-push support causes more trouble than it's worth.

 -Todd



Re: Updates on migration to git

2014-08-25 Thread Karthik Kambatla
Thanks for your input, Steve. Sorry for sending the email out that late, I
sent it as soon as I could.


On Mon, Aug 25, 2014 at 2:20 AM, Steve Loughran ste...@hortonworks.com
wrote:

 just caught up with this after some offlininess...15:48 PST is too late for
 me.

 I'd be -1 to a change to master because of that risk that it does break
 existing code -especially people that have trunk off the git mirrors and
 automated builds/merges to go with it.


Fair enough. It makes sense to leave it as trunk, unless someone is
against it being trunk.



 master may be viewed as the official git way, but it doesn't have to be.
 For git-flow workflows (which we use in slider) master/ is for releases,
 develop/ for dev.







Re: Updates on migration to git

2014-08-25 Thread Karthik Kambatla
Thanks for bringing these points up, Zhijie.

By the way, a revised How-to-commit wiki is at:
https://wiki.apache.org/hadoop/HowToCommitWithGit . Please feel free to
make changes and improve it.

On Mon, Aug 25, 2014 at 11:00 AM, Zhijie Shen zs...@hortonworks.com wrote:

 Do we have any convention about user.name and user.email? For example,
 we'd like to use @apache.org for the email.


Maybe we can ask people to use project-specific configs here and use
their real name and @apache.org address.

Is there any downside to letting people use their global values for these
configs?




 Moreover, do we want to use --author=Author Name em...@address.com
 when committing on behalf of a particular contributor?


Fetching the email-address is complicated here. Should we use the
contributor's email from JIRA? What if that is not their @apache address?
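
To make the options concrete, a sketch of a per-clone identity plus the
--author form; the names and addresses below are placeholders, not a proposed
convention:

    # repository-local identity (leaves --global settings untouched)
    git config user.name  "Jane Committer"
    git config user.email "jcommitter@apache.org"

    # committing a contributor's patch while crediting them as the author
    git commit -a --author="Some Contributor <contributor@example.com>" \
        -m "HADOOP-XXXXX. Description of the change."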







Re: Updates on migration to git

2014-08-24 Thread Karthik Kambatla
Thanks Giri.

By the way, writes to svn are now disabled.

On Saturday, August 23, 2014, Giridharan Kesavan gkesa...@hortonworks.com
wrote:

 ​I can take a look at this on Monday. ​

 -giri







Re: Updates on migration to git

2014-08-23 Thread Karthik Kambatla
Couple of things:

1. Since no one expressed any reservations against doing this on Sunday or
renaming trunk to master, I'll go ahead and confirm that. I think that
serves us better in the long run.

2. Arpit brought up the precommit builds - we should definitely fix them as
soon as we can. I understand Giri maintains those builds, do we have anyone
else who has access in case Giri is not reachable? Giri - please shout out
if you can help us with this either on Sunday or Monday.

Thanks
Karthik









Updates on migration to git

2014-08-22 Thread Karthik Kambatla
Hi folks,

For the SCM migration, feel free to follow
https://issues.apache.org/jira/browse/INFRA-8195

Most of this is planned to be handled this Sunday. As a result, the
subversion repository would be read-only. If this is a major issue for you,
please shout out.

Daniel Gruno, the one helping us with the migration, was asking if we are
open to renaming trunk to master to better conform to git lingo. I am
tempted to say yes, but wanted to check.

Would greatly appreciate any help with checking the git repo has
everything.

Thanks
Karthik


Re: Updates on migration to git

2014-08-22 Thread Karthik Kambatla
Also, does anyone know what we use for integration between JIRA and svn? I
am assuming svn2jira.





[jira] [Reopened] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-08-21 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reopened MAPREDUCE-5956:
-


Reopening this to include in 2.5.1. 

 MapReduce AM should not use maxAttempts to determine if this is the last retry
 --

 Key: MAPREDUCE-5956
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Affects Versions: 2.4.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.6.0

 Attachments: MR-5956.patch, MR-5956.patch


 Found this while reviewing YARN-2074. The problem is that after YARN-2074, we 
 don't count AM preemption towards AM failures on RM side, but MapReduce AM 
 itself checks the attempt id against the max-attempt count to determine if 
 this is the last attempt.
 {code}
 public void computeIsLastAMRetry() {
   isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
 }
 {code}
 This causes issues w.r.t deletion of staging directory etc..



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Apache Hadoop 2.5.0 published tarballs are missing some txt files

2014-08-18 Thread Karthik Kambatla
Hi devs

Tsuyoshi just brought it to my notice that the published tarballs don't
have LICENSE, NOTICE and README at the top-level. Instead, they are only
under common, hdfs, etc.

Now that we have already announced the release and the jars/functionality
doesn't change, I propose we just update the tarballs with ones that include
those files. I just untar-ed the published tarballs and copied
LICENSE, NOTICE and README from under common to the top directory and
tar-ed them back again.

The updated tarballs are at: http://people.apache.org/~kasha/hadoop-2.5.0/
. Can someone please verify the signatures?
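
For reference, the repackaging and a signature check roughly look like this;
the file names and the path under common are illustrative, not exact:

    # add the top-level txt files and repack (paths illustrative)
    tar xzf hadoop-2.5.0.tar.gz
    cp hadoop-2.5.0/share/doc/hadoop/common/LICENSE.txt \
       hadoop-2.5.0/share/doc/hadoop/common/NOTICE.txt  \
       hadoop-2.5.0/share/doc/hadoop/common/README.txt  hadoop-2.5.0/
    tar czf hadoop-2.5.0.tar.gz hadoop-2.5.0

    # verify the detached signature against the KEYS file from the dist area
    gpg --import KEYS
    gpg --verify hadoop-2.5.0.tar.gz.asc hadoop-2.5.0.tar.gz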

If you would prefer an alternate action, please suggest.

Thanks
Karthik

PS: HADOOP-10956 should include the fix for these files also.


Re: [VOTE] Migration from subversion to git for version control

2014-08-16 Thread Karthik Kambatla
Thanks everyone for voting. Here is my +1 (non-binding).

The vote closes with 27 +1s (15 binding). I'll work with INFRA on the
migration, and draft a wiki for How To with git.


On Thu, Aug 14, 2014 at 4:34 PM, Jonathan Eagles jeag...@gmail.com wrote:

 +1
 On Aug 14, 2014 5:56 PM, Hitesh Shah hit...@apache.org wrote:

  +1
 
  — Hitesh
 
  On Aug 8, 2014, at 7:57 PM, Karthik Kambatla ka...@cloudera.com wrote:
 
   I have put together this proposal based on recent discussion on this
  topic.
  
   Please vote on the proposal. The vote runs for 7 days.
  
  1. Migrate from subversion to git for version control.
  2. Force-push to be disabled on trunk and branch-* branches. Applying
     changes from any of trunk/branch-* to any of branch-* should be through
     "git cherry-pick -x".
  3. Force-push on feature branches is allowed. Before pulling in a feature,
     the feature branch should be rebased on the latest trunk and the changes
     applied to trunk through "git rebase --onto" or "git cherry-pick
     <commit-range>".
  4. Every time a feature branch is rebased on trunk, a tag that identifies
     the state before the rebase needs to be created (e.g.
     tag_feature_JIRA-2454_2014-08-07_rebase). These tags can be deleted once
     the feature is pulled into trunk and the tags are no longer useful.
  5. The relevance/use of tags stays the same after the migration.
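
For reference, a rough sketch of the feature-branch flow in items 3 and 4;
the branch name, tag name and commit range below are placeholders:

    # tag the pre-rebase state, then rebase the feature branch on latest trunk
    git checkout feature-JIRA-2454
    git tag tag_feature_JIRA-2454_2014-08-07_rebase
    git rebase trunk
    git push --force origin feature-JIRA-2454      # force-push allowed on feature branches only

    # when the feature is ready, replay its commits onto trunk (no merge commit)
    git checkout trunk
    git cherry-pick <first-feature-commit>^..<last-feature-commit>
    git push origin trunk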
  
   Thanks
   Karthik
  
   PS: Per Andrew Wang, this should be a Adoption of New Codebase kind
 of
   vote and will be Lazy 2/3 majority of PMC members.
 
 



Re: [VOTE] Release Apache Hadoop 2.5.0 RC2

2014-08-12 Thread Karthik Kambatla
Thanks for trying the new binary out, Akira. I just updated the binary to
include the site files as well.


On Mon, Aug 11, 2014 at 11:28 PM, Akira AJISAKA ajisa...@oss.nttdata.co.jp
wrote:

 Thanks Karthik for the update, but some documents are not included.

 It looks to me that *.apt.vm files are not compiled.
 Would you please generate documents with
 'mvn site site:stage /path/to/deploy'?

 I'm +1 (non-binding) if the documents are included.

 Thanks,
 Akira
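
For anyone rebuilding the docs locally, the staging step is usually along
these lines; the staging directory and the exact flag are assumptions, so
check the maven-site-plugin documentation:

    mvn clean site
    mvn site:stage -DstagingDirectory=/tmp/hadoop-site   # flag name assumed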








Re: [VOTE] Release Apache Hadoop 2.5.0 RC2

2014-08-12 Thread Karthik Kambatla
Thanks everyone for trying out the RC and voting.

The vote ends with 17 +1s (5 binding) and no -1s. I'll work on publishing
the release.

On Tue, Aug 12, 2014 at 8:46 AM, Arun C Murthy a...@hortonworks.com wrote:

 +1 (binding)

 Verified sigs and ran sample jobs. Thanks for taking the lead on this
 Karthik.

 Arun

 On Aug 6, 2014, at 1:59 PM, Karthik Kambatla ka...@cloudera.com wrote:

  Hi folks,
 
  I have put together a release candidate (rc2) for Hadoop 2.5.0.
 
  The RC is available at:
 http://people.apache.org/~kasha/hadoop-2.5.0-RC2/
  The RC tag in svn is here:
  https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc2/
  The maven artifacts are staged at:
  https://repository.apache.org/content/repositories/orgapachehadoop-1009/
 
  You can find my public key at:
  http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
 
  Please try the release and vote. The vote will run for the now usual 5
  days.
 
  Thanks






Re: [VOTE] Release Apache Hadoop 2.5.0 RC2

2014-08-11 Thread Karthik Kambatla
Thanks Akira for catching the missing docs. Let me work on *updating* the
binary tarball to include the docs *without* modifying the source. Will
send an update as soon as I sort that out.

The vote itself is on the source and the binary is just a convenience. I
hope we can let the vote continue so long as we don't change the source.

Thanks
Karthik


On Mon, Aug 11, 2014 at 5:51 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
wrote:

 -1 (non-binding).

 * Downloaded source and verified signature.
 * Built from source.
 * Ran tests and some MR jobs.
 * Ran RM-HA with manual failover mode.
 - Documents are missing.

 Good catch, Akira. As he mentioned, the documents including javadocs
 are missing from the binary tar ball. We should include them. It only
 includes following docs:
 $ find . -name *.html
 ./share/doc/hadoop/httpfs/dependency-analysis.html
 ./share/doc/hadoop/httpfs/UsingHttpTools.html
 ./share/doc/hadoop/httpfs/project-reports.html
 ./share/doc/hadoop/httpfs/ServerSetup.html
 ./share/doc/hadoop/httpfs/index.html
 ./share/hadoop/httpfs/tomcat/webapps/ROOT/index.html
 ./share/hadoop/hdfs/webapps/secondary/status.html
 ./share/hadoop/hdfs/webapps/secondary/index.html
 ./share/hadoop/hdfs/webapps/hdfs/dfshealth.html
 ./share/hadoop/hdfs/webapps/hdfs/explorer.html
 ./share/hadoop/hdfs/webapps/hdfs/index.html
 ./share/hadoop/hdfs/webapps/journal/index.html
 ./share/hadoop/hdfs/webapps/datanode/index.html
 ./share/hadoop/tools/sls/html/showSimulationTrace.html

 Karthik, could you create new tar ball with the documentations?

 Thanks,
 - Tsuyoshi

 On Thu, Aug 7, 2014 at 5:59 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
  Hi folks,
 
  I have put together a release candidate (rc2) for Hadoop 2.5.0.
 
  The RC is available at:
 http://people.apache.org/~kasha/hadoop-2.5.0-RC2/
  The RC tag in svn is here:
  https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc2/
  The maven artifacts are staged at:
  https://repository.apache.org/content/repositories/orgapachehadoop-1009/
 
  You can find my public key at:
  http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
 
  Please try the release and vote. The vote will run for the now usual 5
  days.
 
  Thanks



 --
 - Tsuyoshi



Re: [VOTE] Release Apache Hadoop 2.5.0 RC2

2014-08-11 Thread Karthik Kambatla
I have updated the binary tar ball to include the docs, by building the
docs locally and copying them over. Filed HADOOP-10956 to fix the
create-release script to handle this.
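
For anyone wanting to reproduce this locally, a rough sketch (the docs Maven
profile and the output paths below are assumptions, not necessarily the exact
steps used for this RC):

  # build a distribution that includes the generated documentation
  $ mvn package -Pdist,docs -DskipTests -Dtar
  # copy the generated docs into the unpacked binary tarball
  $ cp -r hadoop-dist/target/hadoop-2.5.0/share/doc hadoop-2.5.0/share/
  # re-create and re-sign the binary tarball
  $ tar czf hadoop-2.5.0.tar.gz hadoop-2.5.0
  $ gpg --armor --detach-sign hadoop-2.5.0.tar.gz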

The RC is here: http://people.apache.org/~kasha/hadoop-2.5.0-RC2/

Please note that the binary tar ball is signed by a new gpg key, so please
re-import keys. (I lost my original private key). The source tar ball,
signature and checksum are untouched.
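
To pick up the new key and check the updated binary, something along these
lines should work (file names are examples; adjust to the artifacts in the RC
directory):

  $ curl -O http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
  $ gpg --import KEYS
  $ gpg --verify hadoop-2.5.0.tar.gz.asc hadoop-2.5.0.tar.gz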


On Mon, Aug 11, 2014 at 1:03 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
wrote:

 Thank you for dealing with this problem, Karthik. It sounds reasonable
 to me to update binary tarball without modifying source code.
 +1(non-binding) to continue the voting.

 Thanks,
 - Tsuyoshi

 On Tue, Aug 12, 2014 at 3:47 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
  Thanks Akira for catching the missing docs. Let me work on *updating* the
  binary tarball to include the docs *without* modifying the source. Will
  send an update as soon as I sort that out.
 
  The vote itself is on the source and the binary is just a convenience. I
  hope we can let the vote continue so long as we don't change the source.
 
  Thanks
  Karthik
 
 
  On Mon, Aug 11, 2014 at 5:51 AM, Tsuyoshi OZAWA 
 ozawa.tsuyo...@gmail.com
  wrote:
 
  -1 (non-binding).
 
  * Downloaded source and verified signature.
  * Built from source.
  * Ran tests and some MR jobs.
  * Ran RM-HA with manual failover mode.
  - Documents are missing.
 
  Good catch, Akira. As he mentioned, the documents including javadocs
  are missing from the binary tar ball. We should include them. It only
  includes the following docs:
  $ find . -name *.html
  ./share/doc/hadoop/httpfs/dependency-analysis.html
  ./share/doc/hadoop/httpfs/UsingHttpTools.html
  ./share/doc/hadoop/httpfs/project-reports.html
  ./share/doc/hadoop/httpfs/ServerSetup.html
  ./share/doc/hadoop/httpfs/index.html
  ./share/hadoop/httpfs/tomcat/webapps/ROOT/index.html
  ./share/hadoop/hdfs/webapps/secondary/status.html
  ./share/hadoop/hdfs/webapps/secondary/index.html
  ./share/hadoop/hdfs/webapps/hdfs/dfshealth.html
  ./share/hadoop/hdfs/webapps/hdfs/explorer.html
  ./share/hadoop/hdfs/webapps/hdfs/index.html
  ./share/hadoop/hdfs/webapps/journal/index.html
  ./share/hadoop/hdfs/webapps/datanode/index.html
  ./share/hadoop/tools/sls/html/showSimulationTrace.html
 
  Karthik, could you create new tar ball with the documentations?
 
  Thanks,
  - Tsuyoshi
 
  On Thu, Aug 7, 2014 at 5:59 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
   Hi folks,
  
   I have put together a release candidate (rc2) for Hadoop 2.5.0.
  
   The RC is available at:
  http://people.apache.org/~kasha/hadoop-2.5.0-RC2/
   The RC tag in svn is here:
  
 https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc2/
   The maven artifacts are staged at:
  
 https://repository.apache.org/content/repositories/orgapachehadoop-1009/
  
   You can find my public key at:
   http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
  
   Please try the release and vote. The vote will run for the now usual 5
   days.
  
   Thanks
 
 
 
  --
  - Tsuyoshi
 



 --
 - Tsuyoshi



Re: [VOTE] Release Apache Hadoop 2.5.0 RC2

2014-08-11 Thread Karthik Kambatla
Can someone please verify the signatures on the new binary and the old
source tarballs to make sure it is all good? If it is, I believe we can go
ahead and close the vote.
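
A quick way to check both, assuming the .asc signatures and .mds checksum
files are published next to the tarballs (names below are illustrative):

  $ gpg --verify hadoop-2.5.0-src.tar.gz.asc hadoop-2.5.0-src.tar.gz
  $ gpg --verify hadoop-2.5.0.tar.gz.asc hadoop-2.5.0.tar.gz
  # compare the digests against the published checksum files
  $ md5sum hadoop-2.5.0-src.tar.gz hadoop-2.5.0.tar.gz
  $ cat hadoop-2.5.0-src.tar.gz.mds hadoop-2.5.0.tar.gz.mds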


On Mon, Aug 11, 2014 at 9:49 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 I have updated the binary tar ball to include the docs, by building the
 docs locally and copying them over. Filed HADOOP-10956 to fix the
 create-release script to handle this.

 The RC is here: http://people.apache.org/~kasha/hadoop-2.5.0-RC2/

 Please note that the binary tar ball is signed by a new gpg key, so please
 re-import keys. (I lost my original private key). The source tar ball,
 signature and checksum are untouched.



 On Mon, Aug 11, 2014 at 1:03 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
 wrote:

 Thank you for dealing with this problem, Karthik. It sounds reasonable
 to me to update binary tarball without modifying source code.
 +1(non-binding) to continue the voting.

 Thanks,
 - Tsuyoshi

 On Tue, Aug 12, 2014 at 3:47 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
  Thanks Akira for catching the missing docs. Let me work on *updating*
 the
  binary tarball to include the docs *without* modifying the source. Will
  send an update as soon as I sort that out.
 
  The vote itself is on the source and the binary is just a convenience. I
  hope we can let the vote continue so long as we don't change the source.
 
  Thanks
  Karthik
 
 
  On Mon, Aug 11, 2014 at 5:51 AM, Tsuyoshi OZAWA 
 ozawa.tsuyo...@gmail.com
  wrote:
 
  -1 (non-binding).
 
  * Downloaded source and verified signature.
  * Built from source.
  * Ran tests and some MR jobs.
  * Ran RM-HA with manual failover mode.
  - Documents are missing.
 
  Good catch, Akira. As he mentioned, the documents including javadocs
  are missing from the binary tar ball. We should include them. It only
  includes the following docs:
  $ find . -name *.html
  ./share/doc/hadoop/httpfs/dependency-analysis.html
  ./share/doc/hadoop/httpfs/UsingHttpTools.html
  ./share/doc/hadoop/httpfs/project-reports.html
  ./share/doc/hadoop/httpfs/ServerSetup.html
  ./share/doc/hadoop/httpfs/index.html
  ./share/hadoop/httpfs/tomcat/webapps/ROOT/index.html
  ./share/hadoop/hdfs/webapps/secondary/status.html
  ./share/hadoop/hdfs/webapps/secondary/index.html
  ./share/hadoop/hdfs/webapps/hdfs/dfshealth.html
  ./share/hadoop/hdfs/webapps/hdfs/explorer.html
  ./share/hadoop/hdfs/webapps/hdfs/index.html
  ./share/hadoop/hdfs/webapps/journal/index.html
  ./share/hadoop/hdfs/webapps/datanode/index.html
  ./share/hadoop/tools/sls/html/showSimulationTrace.html
 
  Karthik, could you create new tar ball with the documentations?
 
  Thanks,
  - Tsuyoshi
 
  On Thu, Aug 7, 2014 at 5:59 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
   Hi folks,
  
   I have put together a release candidate (rc2) for Hadoop 2.5.0.
  
   The RC is available at:
  http://people.apache.org/~kasha/hadoop-2.5.0-RC2/
   The RC tag in svn is here:
  
 https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.5.0-rc2/
   The maven artifacts are staged at:
  
 https://repository.apache.org/content/repositories/orgapachehadoop-1009/
  
   You can find my public key at:
   http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
  
   Please try the release and vote. The vote will run for the now usual
 5
   days.
  
   Thanks
 
 
 
  --
  - Tsuyoshi
 



 --
 - Tsuyoshi





Re: [DISCUSS] Migrate from svn to git for source control?

2014-08-08 Thread Karthik Kambatla
Thanks Steve. Including that in the proposal.

By the way, from our project bylaws (http://hadoop.apache.org/bylaws.html),
I can't tell what kind of a vote this would be.


On Thu, Aug 7, 2014 at 1:22 AM, Steve Loughran ste...@hortonworks.com
wrote:

 On 6 August 2014 22:16, Karthik Kambatla ka...@cloudera.com wrote:

  3. Force-push on feature-branches is allowed. Before pulling in a
 feature,
  the feature-branch should be rebased on latest trunk and the changes
  applied to trunk through git rebase --onto or git cherry-pick
  commit-range.
 

 I'd add to this process the requirement to tag any feature branch before a
 rebase, with some standard naming like

 tag_feature_JIRA-2454_2014-08-07_rebase

 Why? It keeps the state of the branch before the rebase in case you ever
 want it back again. Without the tag, that state is lost. Once the feature is
 merged in you can remove the tags, but until then they give you a log of what
 changes went on and make it possible to switch back to the pre-rebase version.

 Without those tags you do lose history of the development.
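
 As a concrete illustration of that workflow (the branch name and date below
 are made up, reusing the JIRA number from the example tag):

  # snapshot the feature branch before rewriting its history
  $ git checkout feature-JIRA-2454
  $ git tag tag_feature_JIRA-2454_2014-08-07_rebase
  # rebase onto the latest trunk
  $ git fetch origin
  $ git rebase origin/trunk
  # if the rebase goes wrong, the pre-rebase state is still reachable
  $ git reset --hard tag_feature_JIRA-2454_2014-08-07_rebase
  # once the feature has been merged, the tag can be dropped
  $ git tag -d tag_feature_JIRA-2454_2014-08-07_rebase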



