Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-27 Thread Sangjin Lee
+1 (binding)

- downloaded both source and binary tarballs and verified the signatures
- set up a pseudo-distributed cluster
- ran some simple mapreduce jobs
- checked the basic web UI
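For reference, the digest half of this kind of RC verification can be sketched as below. The tarball is a locally created stand-in so the commands are self-contained; the hadoop-2.7.3.tar.gz filename is illustrative, and a real check would additionally run `gpg --verify` against the published .asc file with the release manager's key from the project KEYS file.

```shell
# Stand-in artifact so this sketch runs offline; for a real RC you would
# download hadoop-2.7.3.tar.gz plus its .asc/.mds files instead.
printf 'stand-in for hadoop-2.7.3.tar.gz\n' > hadoop-2.7.3.tar.gz

# Publisher side: record the digest (releases publish these checksums).
sha256sum hadoop-2.7.3.tar.gz > hadoop-2.7.3.tar.gz.sha256

# Verifier side: recompute and compare; prints "hadoop-2.7.3.tar.gz: OK".
sha256sum -c hadoop-2.7.3.tar.gz.sha256

# Signature half (needs the real .asc and imported KEYS; not run here):
# gpg --verify hadoop-2.7.3.tar.gz.asc hadoop-2.7.3.tar.gz
```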

Sangjin

On Wed, Jul 27, 2016 at 12:57 PM, John Zhuge  wrote:

> +1 (non-binding)
>
> - Build source with Java 1.8.0_101 on Centos 7.2 with native
> - Build source with Java 1.7.0_79 on Mac
> - Verify license and notice using the shell script in HADOOP-13374
> - Deploy a pseudo cluster
> - Run basic dfs, distcp, ACL, webhdfs commands
> - Run MapReduce wordcount and pi examples
> - Start balancer
>
> Thanks,
> John
>
> John Zhuge
> Software Engineer, Cloudera
>
> On Wed, Jul 27, 2016 at 11:38 AM, Robert Kanter 
> wrote:
>
> > +1 (binding)
> >
> > - Downloaded binary tarball
> > - verified signatures
> > - setup pseudo cluster
> > - ran some of the example jobs, clicked around the UI a bit
> >
> >
> > - Robert
> >
> >
> > On Mon, Jul 25, 2016 at 3:28 PM, Jason Lowe  >
> > wrote:
> >
> > > +1 (binding)
> > > - Verified signatures and digests
> > > - Built from source with native support
> > > - Deployed a pseudo-distributed cluster
> > > - Ran some sample jobs
> > > Jason
> > >
> > >   From: Vinod Kumar Vavilapalli 
> > >  To: "common-dev@hadoop.apache.org" ;
> > > hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; "
> > > mapreduce-...@hadoop.apache.org" 
> > > Cc: Vinod Kumar Vavilapalli 
> > >  Sent: Friday, July 22, 2016 9:15 PM
> > >  Subject: [VOTE] Release Apache Hadoop 2.7.3 RC0
> > >
> > > Hi all,
> > >
> > > I've created a release candidate RC0 for Apache Hadoop 2.7.3.
> > >
> > > As discussed before, this is the next maintenance release to follow up
> > > 2.7.2.
> > >
> > > The RC is available for validation at:
> > > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> > > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
> > >
> > > The RC tag in git is: release-2.7.3-RC0
> > >
> > > The maven artifacts are available via repository.apache.org <
> > > http://repository.apache.org/> at
> > >
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> > <
> > >
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> > >
> > >
> > > The release-notes are inside the tar-balls at location
> > > hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> > > hosted this at
> > > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> > > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html>
> > for
> > > your quick perusal.
> > >
> > > As you may have noted, a very long fix-cycle for the License & Notice
> > > issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop
> > release)
> > > to slip by quite a bit. This release's related discussion thread is
> > linked
> > > below: [1].
> > >
> > > Please try the release and vote; the vote will run for the usual 5
> days.
> > >
> > > Thanks,
> > > Vinod
> > >
> > > [1]: 2.7.3 release plan:
> > >
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html
> > <
> > > http://markmail.org/thread/6yv2fyrs4jlepmmr>
> > >
> > >
> > >
> >
>


[jira] [Created] (HADOOP-13434) Add quoting to Shell class

2016-07-27 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-13434:
--

 Summary: Add quoting to Shell class
 Key: HADOOP-13434
 URL: https://issues.apache.org/jira/browse/HADOOP-13434
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The Shell class assumes that its parameters won't contain spaces or
other special characters, even when it invokes bash.
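The failure mode, and the usual fix of wrapping arguments in single quotes with embedded quotes escaped as `'\''`, can be demonstrated in bash itself. The `quote` helper below is a hypothetical sketch for illustration, not the patch attached to this issue:

```shell
# Interpolating a parameter into a bash command line without quoting
# word-splits it and exposes shell metacharacters. `quote` wraps the
# value in single quotes and escapes embedded single quotes as '\''.
quote() { printf "'%s'" "$(printf %s "$1" | sed "s/'/'\\\\''/g")"; }

arg="file with spaces.txt"

# Naive interpolation: bash sees three separate words.
naive=$(bash -c "set -- $arg; echo \$#")

# Quoted interpolation: the parameter survives as one argument.
safe=$(bash -c "set -- $(quote "$arg"); echo \$#")

echo "naive=$naive safe=$safe"   # naive=3 safe=1
```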



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: Feedback on IRC channel

2016-07-27 Thread Martin Rosse
Regarding approaches to cleaning up the Wiki content--how about an approach
similar to the Spark cwiki:

https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage

My take is that the Hadoop product docs on hadoop.apache.org generally
target (or should target) the audiences described in 1-4 below, while the
Wiki is (should be) primarily for audience #5 or "Hadoop staff"--internal
Hadoop development, product management, QA, etc.

Definitely current Wiki content such as "Overview of Hadoop" and the link
to "Single Node Hadoop Cluster" installation is redundant, unnecessary doc
maintenance, and annoying to come across as a user because one must assess
its value relative to the same/similar content in the product doc on
hadoop.apache.org.

BTW, I did some random testing of ASF project wikis hosted on
cwiki.apache.org, and the pages for those sites definitely load much, much
faster than ASF wiki pages using MoinMoin.

Best,
Martin

On Wed, Jul 27, 2016 at 10:29 AM, Ray Chiang  wrote:

> Good to know.  It's certainly easier to set up an alternate location in
> any case and then do a wholesale migration.  It saves from having that
> "under construction" look before it's complete.
>
> I'll get on the appropriate infra@ list and ask about recommendations.
>
> -Ray
>
>
> On 7/26/16 10:49 PM, Andrew Wang wrote:
>
>> Hi Ray, if you're going to do a wiki cleanup, fair warning that I filed
>> this INFRA JIRA about the wiki being terribly slow, and they closed it as
>> WONTFIX:
>>
>> https://issues.apache.org/jira/browse/INFRA-12283
>>
>> So if you'd actually like to undertake a wiki cleanup, we should also
>> consider migrating the content to a wiki that isn't terribly slow.
>>
>> I think cwiki.apache.org is better, but maybe we should ask infra what
>> the
>> preferred option is here. They might be able to help with a content
>> migration too.
>>
>> On Tue, Jul 26, 2016 at 3:27 PM, Ray Chiang  wrote:
>>
>> Coming in late to an old thread.
>>>
>>> I was looking around at the Hadoop documentation (hadoop.apache.org and
>>> wiki.apache.org/hadoop) and I'd sum up the current state of the
>>> documentation as follows:
>>>
>>> 1. hadoop.apache.org is pretty clearly full of technical information.
>>> My only minor nit here is that the wiki pointer and the Git pointer
>>> at the top are really tiny.
>>> 2. wiki.apache.org is simultaneously targeted to at least four audiences
>>>  1. Industry Users (broadest sense of Big Data Industry)
>>>  2. Industry Developers (mostly those adding a layer like Hive does
>>> to MapReduce)
>>>  3. Hadoop Users (those who just want to set up a small cluster)
>>>  4. Hadoop Developers (e.g. using MapReduce APIs)
>>>  5. Hadoop Internal Developers (eventual contributors)
>>>
>>> I'd like to initiate some cleanup of the wiki, but before I even start,
>>> I'd like to see if anyone has constructive suggestions or other
>>> approaches
>>> that would make this transition smoother.
>>>
>>> 1. Some sections, like Industry Users and Industry Developers, are
>>> growing so fast, I'm not sure whether it's worth maintaining in any
>>> meaningful format. I'd be inclined to make suggestions on where to
>>> start and let Google take them forward from there.
>>> 2. Organize the developer section based on the pieces a new reader
>>> wants to learn (new to everything, new to Hadoop, all the tools for
>>> Hadoop development, "just check out code and go", etc).
>>> 3. Organize the Users section a bit more.  The "Setting up a Hadoop
>>> Cluster" is grouped well, but I'd perhaps rearrange the ordering a
>>> bit.
>>>
>>> -Ray
>>>
>>>
>>> On 7/14/16 3:49 PM, Andrew Wang wrote:
>>>
>>> I think we should try to keep ownership over the #hadoop channel (do we
 have ownership?) but make it clear on the website and in the channel
 greeting that this is for user-on-user discussion, and it's not actively
 monitored by developers.

 On Thu, Jul 14, 2016 at 3:37 PM, Akira AJISAKA <
 ajisa...@oss.nttdata.co.jp>
 wrote:

> I'm not using the IRC channel (#hadoop at irc.freenode.net).
> I'm using slack (hadoopdev.slack.com) instead.
>
> -Akira
>
>
> On 7/14/16 14:48, Ravi Prakash wrote:
>
> I've never gone there either. +1 for retiring.
>
>> On Wed, Jul 13, 2016 at 11:34 PM, J. Rottinghuis <
>> jrottingh...@gmail.com>
>> wrote:
>>
>> Uhm, there is an IRC channel?!?
>>
>> Joep
>>>
>>> On Wed, Jul 13, 2016 at 3:13 PM, Sangjin Lee 
>>> wrote:
>>>
>>> I seldom check out IRC (as my experience was the same). I'm OK with
>>>
>>> retiring it if no committers are around.

 On a related note, I know Tsuyoshi set up a slack channel for the
 committers. Even that one is pretty idle. :) Should we use it more
 often?
 If that starts to gain traction, we could 

Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Andrew Wang
Hi Junping, thanks for sharing your thoughts, inline,

On Wed, Jul 27, 2016 at 9:10 AM, 俊平堵  wrote:

> Thanks Vinod for bringing up this topic for discussion. I share the same
> concern here from my previous experience and I doubt some simple rules
> proposed below could make life easier.
>
> > The question now is what we do for the 2.8.0 and 3.0.0-alpha1 fix
> versions.
> > Allen's historical perspective is that we've based each minor or major
> > release off of the previous minor release. So, 2.8.0 would be based off
> of
> > 2.7.0. Assuming 3.0.0-alpha1 happens before 2.8.0, 3.0.0-alpha1 would
> also
> > be based off of 2.7.0. This also makes sense from a user POV; someone on
> a
> > 2.6.x going to 3.0.0-alpha1 can look at the 2.7.0 and 3.0.0-alpha1 notes
> to
> > see what's changed.
> This is not correct - it neither reflects the past nor helps the
> future. There is no benefit to claiming 3.0.0-alpha1 is based on 2.7.0
> rather than 2.7.3 (in case 2.8.0 is not there).
> In the past, for example, when we cut 2.7, we already had 2.6.0 and
> 2.6.1 released, so 2.7.0 took all commits from 2.6.1 (not 2.6.0). In the
> future, assume we start the release effort for 3.1.0 and we have 3.0.1,
> 3.0.2, etc.; 3.0.x should be more stable than 3.0.0-alpha, so there is
> no need to redo everything from scratch (3.0.0-alpha).
>

Based on the website, 2.7.0 immediately followed 2.6.0, so that's not quite
accurate. However your point does hold for 2.5.2 -> 2.6.0.

Vinod already described this earlier, where for a while we only had a
single chronological release line. Now though, we have the concurrent 2.6.x
and 2.7.x release lines, which do fix versions independently.

One goal here is a versioning scheme that extends this concurrency to
additional major and minor releases, i.e. 2.8 and 3.0. It doesn't make
sense to have a global ordering of releases, since not all patches belong
in all branches. It also poses additional coordination cost for release
management, particularly since it's hard to predict release timings.

Another goal is a scheme that is easy for users to understand. I believe
the above scheme accurately encodes the versioning system used by most
other enterprise software.


> So the rule here should be: a new major or minor release should come from
> a release that is:
> 1. tagged as stable
> 2. released most recently
> 3. with the maximum version number
> If conditions 2 and 3 conflict, we should give priority to 3. For
> example, when 3.0.0-alpha1 is about to be released, assume we have 2.8.0
> and 2.7.4, and 2.7.4 was released after 2.8.0; then we should claim
> 3.0.0-alpha1 is based on 2.8.0 instead of 2.7.4.
>
These rules seem to require a high degree of coordination when it comes to
release ordering, but one goal is to reduce coordination. For example, the
2.7.0 notes say "Production users should wait for a 2.7.1/2.7.2 release."
We didn't know in advance if 2.7.1 or 2.7.2 would be the stable release. We
also don't know the release ordering of 2.7.4 and 2.8.0.

Given the uncertainty around release timing and stability, it's
advantageous to avoid basing versioning on these two pieces of information.
With these rules, a change in stability or release ordering means we need
to update JIRA fix versions.

I also didn't quite follow your example, since I assume that 2.8.0 will
*not* be marked stable similar to how 2.7.0 is not stable. If this is true,
does rule 1 take precedence and we would base 3.0.0-alpha1 off of 2.7.4
(the latest stable release)? That seems confusing given the presence of
2.8.0.

Could you elaborate on what you see as the advantages of this scheme from a
user POV? It seems like users now need to be aware of the total ordering
and also stability of releases to know what changelogs to read. And as
described previously, it's not how other enterprise software versioning
works.


> > As an example, if a JIRA was committed to branch-2.6, branch-2.7,
> branch-2,
> > branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5,
> 2.7.3,
> > 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
> > rule 1, and the last two fix versions come from rule 2.
> I don't think setting more than 3 fix versions is a good practice.
> The example above means we need to backport this patch to 5 branches,
> which makes our committers' lives really tough - it takes more effort to
> commit a patch and also increases the risk of bugs caused by backports.
> Given realistic community review bandwidth (please check another thread
> from Chris D.), I strongly suggest we keep the number of active release
> trains below 3, so we have either 2 stable releases or 1 stable release
> + 1 alpha release in flight.
>
We already need to backport to many branches, setting some more fix
versions doesn't change that. Setting more fix versions is also not related
to review bandwidth, particularly in this case since things normally land
in trunk first. It's also not related to the 

Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-27 Thread John Zhuge
+1 (non-binding)

- Build source with Java 1.8.0_101 on Centos 7.2 with native
- Build source with Java 1.7.0_79 on Mac
- Verify license and notice using the shell script in HADOOP-13374
- Deploy a pseudo cluster
- Run basic dfs, distcp, ACL, webhdfs commands
- Run MapReduce wordcount and pi examples
- Start balancer

Thanks,
John

John Zhuge
Software Engineer, Cloudera

On Wed, Jul 27, 2016 at 11:38 AM, Robert Kanter 
wrote:

> +1 (binding)
>
> - Downloaded binary tarball
> - verified signatures
> - setup pseudo cluster
> - ran some of the example jobs, clicked around the UI a bit
>
>
> - Robert
>
>
> On Mon, Jul 25, 2016 at 3:28 PM, Jason Lowe 
> wrote:
>
> > +1 (binding)
> > - Verified signatures and digests
> > - Built from source with native support
> > - Deployed a pseudo-distributed cluster
> > - Ran some sample jobs
> > Jason
> >
> >   From: Vinod Kumar Vavilapalli 
> >  To: "common-dev@hadoop.apache.org" ;
> > hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; "
> > mapreduce-...@hadoop.apache.org" 
> > Cc: Vinod Kumar Vavilapalli 
> >  Sent: Friday, July 22, 2016 9:15 PM
> >  Subject: [VOTE] Release Apache Hadoop 2.7.3 RC0
> >
> > Hi all,
> >
> > I've created a release candidate RC0 for Apache Hadoop 2.7.3.
> >
> > As discussed before, this is the next maintenance release to follow up
> > 2.7.2.
> >
> > The RC is available for validation at:
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
> >
> > The RC tag in git is: release-2.7.3-RC0
> >
> > The maven artifacts are available via repository.apache.org <
> > http://repository.apache.org/> at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> <
> > https://repository.apache.org/content/repositories/orgapachehadoop-1040/
> >
> >
> > The release-notes are inside the tar-balls at location
> > hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> > hosted this at
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> > http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html>
> for
> > your quick perusal.
> >
> > As you may have noted, a very long fix-cycle for the License & Notice
> > issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop
> release)
> > to slip by quite a bit. This release's related discussion thread is
> linked
> > below: [1].
> >
> > Please try the release and vote; the vote will run for the usual 5 days.
> >
> > Thanks,
> > Vinod
> >
> > [1]: 2.7.3 release plan:
> > https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html
> <
> > http://markmail.org/thread/6yv2fyrs4jlepmmr>
> >
> >
> >
>


Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Andrew Wang
>
> The -alphaX versions we're using leading up to 3.0.0 GA can be treated as
>> a.b.c versions, with alpha1 being the a.b.0 release.
>>
>
> Once 3.0.0 GA goes out, a user would want to see the diff from the latest
> 2.x.0 release (say 2.9.0).
>
> Are you suggesting 3.0.0 GA would have c = 5 (say) and hence rule 1 would
> apply, and it should show up in the release notes?
>
>
Yea. It means that if you're coming from 2.7, you'd read the 3.0.0-alphas,
3.0.0-betas, and finally the 3.0.0 GA notes.

The website notes will aggregate all the alpha and beta changes leading up
to 3.0.0 GA, so that is likely where users will turn first.

>
>> As an example, if a JIRA was committed to branch-2.6, branch-2.7,
>> branch-2,
>> branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5,
>> 2.7.3,
>> 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
>> rule 1, and the last two fix versions come from rule 2.
>>
>> I'm very eager to move this discussion forward, so feel free to reach out
>> on or off list if I can help with anything.
>>
>
>
> I think it is good practice to set multiple fix versions. However, it
> might take the committers a little bit to learn.
>
> Since the plan is to cut 3.0.0 off trunk, can we just bulk edit to add the
> 3.0.0-alphaX version?
>
>
Yea, I have a script and some JIRA queries that can help with this. I'll
also plan to compare with git log contents for extra verification.
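The two-rule fix-version scheme discussed above can be sketched as a tiny script. This is a hypothetical illustration only; the branch-to-version mapping is taken from the five-branch example in this thread, not from any real project tooling:

```shell
# Sketch: derive JIRA fix versions from the branches a patch landed on.
fix_versions() {
  for b in "$@"; do
    case "$b" in
      branch-2.6) echo 2.6.5 ;;                       # rule 1: a.b.c, c > 0
      branch-2.7) echo 2.7.3 ;;                       # rule 1
      branch-2)   echo 2.8.0 ;;                       # rule 2: lowest a.b.0
      branch-3.0.0-alpha1|trunk) echo 3.0.0-alpha1 ;; # rule 2 (deduplicated)
    esac
  done | sort -u -V   # GNU sort: unique entries in version-aware order
}

fix_versions branch-2.6 branch-2.7 branch-2 branch-3.0.0-alpha1 trunk
# -> 2.6.5, 2.7.3, 2.8.0, 3.0.0-alpha1
```

Trunk and branch-3.0.0-alpha1 collapse to a single fix version, matching the thread's point that more branches do not mean proportionally more fix-version bookkeeping.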


Wiki migration and clean-up

2016-07-27 Thread Martin Rosse
Hi Ray,

The migration is much needed, and thanks for initiating it.

Regarding approaches to cleaning up the Wiki content--my 2 cents is in
favor of an approach similar to the Spark cwiki:

https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage

My take is that the Hadoop product docs on hadoop.apache.org generally
target (or should target) the audiences you describe in 1-4, while the Wiki
is (should be) primarily for audience #5 or "Hadoop staff"--internal Hadoop
development, product management, QA, etc.

Definitely current Wiki content such as "Overview of Hadoop" and the link
to "Single Node Hadoop Cluster" installation is redundant, unnecessary doc
maintenance, and annoying to come across as a user because you have to
assess its value relative to the same/similar content in the product doc on
hadoop.apache.org.

BTW, I did some random testing of ASF project wikis hosted on
cwiki.apache.org, and the pages for those sites definitely load much, much
faster than ASF wiki pages using MoinMoin. Clearly no surprise.

Best,
Martin


On Wed, Jul 27, 2016 at 10:29 AM, Ray Chiang  wrote:

> Good to know.  It's certainly easier to set up an alternate location in
> any case and then do a wholesale migration.  It saves from having that
> "under construction" look before it's complete.
>
> I'll get on the appropriate infra@ list and ask about recommendations.
>
> -Ray
>
>
> On 7/26/16 10:49 PM, Andrew Wang wrote:
>
>> Hi Ray, if you're going to do a wiki cleanup, fair warning that I filed
>> this INFRA JIRA about the wiki being terribly slow, and they closed it as
>> WONTFIX:
>>
>> https://issues.apache.org/jira/browse/INFRA-12283
>>
>> So if you'd actually like to undertake a wiki cleanup, we should also
>> consider migrating the content to a wiki that isn't terribly slow.
>>
>> I think cwiki.apache.org is better, but maybe we should ask infra what
>> the
>> preferred option is here. They might be able to help with a content
>> migration too.
>>
>> On Tue, Jul 26, 2016 at 3:27 PM, Ray Chiang  wrote:
>>
>> Coming in late to an old thread.
>>>
>>> I was looking around at the Hadoop documentation (hadoop.apache.org and
>>> wiki.apache.org/hadoop) and I'd sum up the current state of the
>>> documentation as follows:
>>>
>>> 1. hadoop.apache.org is pretty clearly full of technical information.
>>> My only minor nit here is that the wiki pointer and the Git pointer
>>> at the top are really tiny.
>>> 2. wiki.apache.org is simultaneously targeted to at least four audiences
>>>  1. Industry Users (broadest sense of Big Data Industry)
>>>  2. Industry Developers (mostly those adding a layer like Hive does
>>> to MapReduce)
>>>  3. Hadoop Users (those who just want to set up a small cluster)
>>>  4. Hadoop Developers (e.g. using MapReduce APIs)
>>>  5. Hadoop Internal Developers (eventual contributors)
>>>
>>> I'd like to initiate some cleanup of the wiki, but before I even start,
>>> I'd like to see if anyone has constructive suggestions or other
>>> approaches
>>> that would make this transition smoother.
>>>
>>> 1. Some sections, like Industry Users and Industry Developers, are
>>> growing so fast, I'm not sure whether it's worth maintaining in any
>>> meaningful format. I'd be inclined to make suggestions on where to
>>> start and let Google take them forward from there.
>>> 2. Organize the developer section based on the pieces a new reader
>>> wants to learn (new to everything, new to Hadoop, all the tools for
>>> Hadoop development, "just check out code and go", etc).
>>> 3. Organize the Users section a bit more.  The "Setting up a Hadoop
>>> Cluster" is grouped well, but I'd perhaps rearrange the ordering a
>>> bit.
>>>
>>> -Ray
>>>
>>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-27 Thread Robert Kanter
+1 (binding)

- Downloaded binary tarball
- verified signatures
- setup pseudo cluster
- ran some of the example jobs, clicked around the UI a bit


- Robert


On Mon, Jul 25, 2016 at 3:28 PM, Jason Lowe 
wrote:

> +1 (binding)
> - Verified signatures and digests
> - Built from source with native support
> - Deployed a pseudo-distributed cluster
> - Ran some sample jobs
> Jason
>
>   From: Vinod Kumar Vavilapalli 
>  To: "common-dev@hadoop.apache.org" ;
> hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; "
> mapreduce-...@hadoop.apache.org" 
> Cc: Vinod Kumar Vavilapalli 
>  Sent: Friday, July 22, 2016 9:15 PM
>  Subject: [VOTE] Release Apache Hadoop 2.7.3 RC0
>
> Hi all,
>
> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>
> As discussed before, this is the next maintenance release to follow up
> 2.7.2.
>
> The RC is available for validation at:
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>
> The RC tag in git is: release-2.7.3-RC0
>
> The maven artifacts are available via repository.apache.org <
> http://repository.apache.org/> at
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/ <
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/>
>
> The release-notes are inside the tar-balls at location
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> hosted this at
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html> for
> your quick perusal.
>
> As you may have noted, a very long fix-cycle for the License & Notice
> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
> to slip by quite a bit. This release's related discussion thread is linked
> below: [1].
>
> Please try the release and vote; the vote will run for the usual 5 days.
>
> Thanks,
> Vinod
>
> [1]: 2.7.3 release plan:
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
> http://markmail.org/thread/6yv2fyrs4jlepmmr>
>
>
>


Re: Feedback on IRC channel

2016-07-27 Thread Ray Chiang
Good to know.  It's certainly easier to set up an alternate location in 
any case and then do a wholesale migration.  It saves from having that 
"under construction" look before it's complete.


I'll get on the appropriate infra@ list and ask about recommendations.

-Ray


On 7/26/16 10:49 PM, Andrew Wang wrote:

Hi Ray, if you're going to do a wiki cleanup, fair warning that I filed
this INFRA JIRA about the wiki being terribly slow, and they closed it as
WONTFIX:

https://issues.apache.org/jira/browse/INFRA-12283

So if you'd actually like to undertake a wiki cleanup, we should also
consider migrating the content to a wiki that isn't terribly slow.

I think cwiki.apache.org is better, but maybe we should ask infra what the
preferred option is here. They might be able to help with a content
migration too.

On Tue, Jul 26, 2016 at 3:27 PM, Ray Chiang  wrote:


Coming in late to an old thread.

I was looking around at the Hadoop documentation (hadoop.apache.org and
wiki.apache.org/hadoop) and I'd sum up the current state of the
documentation as follows:

1. hadoop.apache.org is pretty clearly full of technical information.
My only minor nit here is that the wiki pointer and the Git pointer
at the top are really tiny.
2. wiki.apache.org is simultaneously targeted to at least four audiences
 1. Industry Users (broadest sense of Big Data Industry)
 2. Industry Developers (mostly those adding a layer like Hive does
to MapReduce)
 3. Hadoop Users (those who just want to set up a small cluster)
 4. Hadoop Developers (e.g. using MapReduce APIs)
 5. Hadoop Internal Developers (eventual contributors)

I'd like to initiate some cleanup of the wiki, but before I even start,
I'd like to see if anyone has constructive suggestions or other approaches
that would make this transition smoother.

1. Some sections, like Industry Users and Industry Developers, are
growing so fast, I'm not sure whether it's worth maintaining in any
meaningful format. I'd be inclined to make suggestions on where to
start and let Google take them forward from there.
2. Organize the developer section based on the pieces a new reader
wants to learn (new to everything, new to Hadoop, all the tools for
Hadoop development, "just check out code and go", etc).
3. Organize the Users section a bit more.  The "Setting up a Hadoop
Cluster" is grouped well, but I'd perhaps rearrange the ordering a bit.

-Ray


On 7/14/16 3:49 PM, Andrew Wang wrote:


I think we should try to keep ownership over the #hadoop channel (do we
have ownership?) but make it clear on the website and in the channel
greeting that this is for user-on-user discussion, and it's not actively
monitored by developers.

On Thu, Jul 14, 2016 at 3:37 PM, Akira AJISAKA <
ajisa...@oss.nttdata.co.jp>
wrote:

I'm not using the IRC channel (#hadoop at irc.freenode.net.)

I'm using slack (hadoopdev.slack.com) instead.

-Akira


On 7/14/16 14:48, Ravi Prakash wrote:

I've never gone there either. +1 for retiring.

On Wed, Jul 13, 2016 at 11:34 PM, J. Rottinghuis <
jrottingh...@gmail.com>
wrote:

Uhm, there is an IRC channel?!?


Joep

On Wed, Jul 13, 2016 at 3:13 PM, Sangjin Lee  wrote:

I seldom check out IRC (as my experience was the same). I'm OK with


retiring it if no committers are around.

On a related note, I know Tsuyoshi set up a slack channel for the
committers. Even that one is pretty idle. :) Should we use it more
often?
If that starts to gain traction, we could set up a more open room for

users

as well.

Sangjin

On Wed, Jul 13, 2016 at 9:13 AM, Karthik Kambatla 

Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread 俊平堵
Thanks Vinod for bringing up this topic for discussion. I share the same
concern here from my previous experience and I doubt some simple rules
proposed below could make life easier.

> The question now is what we do for the 2.8.0 and 3.0.0-alpha1 fix
versions.
> Allen's historical perspective is that we've based each minor or major
> release off of the previous minor release. So, 2.8.0 would be based off of
> 2.7.0. Assuming 3.0.0-alpha1 happens before 2.8.0, 3.0.0-alpha1 would also
> be based off of 2.7.0. This also makes sense from a user POV; someone on a
> 2.6.x going to 3.0.0-alpha1 can look at the 2.7.0 and 3.0.0-alpha1 notes
to
> see what's changed.
This is not correct - it neither reflects the past nor helps the
future. There is no benefit to claiming 3.0.0-alpha1 is based on 2.7.0
rather than 2.7.3 (in case 2.8.0 is not there).
In the past, for example, when we cut 2.7, we already had 2.6.0 and
2.6.1 released, so 2.7.0 took all commits from 2.6.1 (not 2.6.0). In the
future, assume we start the release effort for 3.1.0 and we have 3.0.1,
3.0.2, etc.; 3.0.x should be more stable than 3.0.0-alpha, so there is no
need to redo everything from scratch (3.0.0-alpha). So the rule here
should be: a new major or minor release should come from a release that is:
1. tagged as stable
2. released most recently
3. with the maximum version number
If conditions 2 and 3 conflict, we should give priority to 3. For
example, when 3.0.0-alpha1 is about to be released, assume we have 2.8.0
and 2.7.4, and 2.7.4 was released after 2.8.0; then we should claim
3.0.0-alpha1 is based on 2.8.0 instead of 2.7.4.
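As a sketch of that rule-3 tie-break (the version strings are the hypothetical ones from the example above), preferring the maximum version number over release date amounts to a version-aware sort:

```shell
# Rule 3: version number wins over release date. Even if 2.7.4 shipped
# after 2.8.0, the base is still 2.8.0. GNU `sort -V` (version sort)
# expresses the tie-break directly.
candidates="2.8.0
2.7.4"
base=$(printf '%s\n' "$candidates" | sort -V | tail -n 1)
echo "base=$base"   # base=2.8.0
```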


> As an example, if a JIRA was committed to branch-2.6, branch-2.7,
branch-2,
> branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5,
2.7.3,
> 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
> rule 1, and the last two fix versions come from rule 2.
I don't think setting more than 3 fix versions is a good practice.
The example above means we need to backport this patch to 5 branches,
which makes our committers' lives really tough - it takes more effort to
commit a patch and also increases the risk of bugs caused by backports.
Given realistic community review bandwidth (please check another thread
from Chris D.), I strongly suggest we keep the number of active release
trains below 3, so we have either 2 stable releases or 1 stable release +
1 alpha release in flight.

BTW, I have never seen a clear definition of an alpha release. It has
previously meant unstable APIs (2.1-alpha, 2.2-alpha, etc.) but sometimes
means unstable production quality (2.7.0). I think we should clearly
define it with major consensus so users won't misunderstand the risk
here.
Also, if we take our 3.0.0-alpha release work seriously, we should also
think about trunk's version number (bump it to 4.0.0-alpha?) or there
could soon be no room for 3.0-incompatible features/bits.

Just 2 cents.

Thanks,

Junping

2016-07-27 15:34 GMT+01:00 Karthik Kambatla :

> Inline.
>
> > 1) Set the fix version for all a.b.c versions, where c > 0.
> > 2) For each major release line, set the lowest a.b.0 version.
> >
>
> Sounds reasonable.
>
>
> >
> > The -alphaX versions we're using leading up to 3.0.0 GA can be treated as
> > a.b.c versions, with alpha1 being the a.b.0 release.
> >
>
> Once 3.0.0 GA goes out, a user would want to see the diff from the latest
> 2.x.0 release (say 2.9.0).
>
> Are you suggesting 3.0.0 GA would have c = 5 (say) and hence rule 1 would
> apply, and it should show up in the release notes?
>
>
> >
> > As an example, if a JIRA was committed to branch-2.6, branch-2.7,
> branch-2,
> > branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5,
> 2.7.3,
> > 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
> > rule 1, and the last two fix versions come from rule 2.
> >
> > I'm very eager to move this discussion forward, so feel free to reach out
> > on or off list if I can help with anything.
> >
>
>
> I think it is good practice to set multiple fix versions. However, it might
> take the committers a little bit to learn.
>
> Since the plan is to cut 3.0.0 off trunk, can we just bulk edit to add the
> 3.0.0-alphaX version?
>
>
> >
> > Best,
> > Andrew
> >
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-07-27 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/

[Jul 26, 2016 1:30:02 PM] (stevel) Revert "HDFS-10668. Fix intermittently 
failing UT
[Jul 26, 2016 1:53:37 PM] (kai.zheng) HADOOP-13041. Adding tests for coder 
utilities. Contributed by Kai
[Jul 26, 2016 3:01:42 PM] (weichiu) HDFS-9937. Update dfsadmin command line 
help and HdfsQuotaAdminGuide.
[Jul 26, 2016 3:19:06 PM] (varunsaxena) YARN-5431. TimelineReader daemon start 
should allow to pass its own
[Jul 26, 2016 3:43:12 PM] (varunsaxena) Revert "YARN-5431. TimelineReader 
daemon start should allow to pass its
[Jul 26, 2016 7:27:46 PM] (arp) HDFS-10642.
[Jul 26, 2016 9:54:03 PM] (Arun Suresh) YARN-5392. Replace use of Priority in 
the Scheduling infrastructure with
[Jul 26, 2016 10:33:20 PM] (cnauroth) HADOOP-13422. 
ZKDelegationTokenSecretManager JaasConfig does not work
[Jul 26, 2016 11:01:50 PM] (weichiu) HDFS-10598. DiskBalancer does not execute 
multi-steps plan. Contributed
[Jul 27, 2016 1:14:09 AM] (wangda) YARN-5342. Improve non-exclusive node 
partition resource allocation in
[Jul 27, 2016 2:08:30 AM] (Arun Suresh) YARN-5351. ResourceRequest should take 
ExecutionType into account during
[Jul 27, 2016 4:22:59 AM] (wangda) YARN-5195. RM intermittently crashed with 
NPE while handling
[Jul 27, 2016 4:56:42 AM] (brahma) HDFS-10668. Fix intermittently failing UT




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.cli.TestHDFSCLI 
   hadoop.yarn.client.api.impl.TestYarnClient 
   hadoop.yarn.server.nodemanager.TestDirectoryCollection 
   hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth 
   
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler 
   hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler 
   hadoop.yarn.server.resourcemanager.TestResourceManager 
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization 
   hadoop.yarn.server.TestContainerManagerSecurity 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-compile-javac-root.txt
  [172K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-checkstyle-root.txt
  [16M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-patch-pylint.txt
  [16K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-patch-shelldocs.txt
  [16K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/diff-javadoc-javadoc-root.txt
  [2.3M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [144K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-nativetask.txt
  [124K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [36K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [64K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [268K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/115/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org



-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Karthik Kambatla
Inline.

> 1) Set the fix version for all a.b.c versions, where c > 0.
> 2) For each major release line, set the lowest a.b.0 version.
>

Sounds reasonable.


>
> The -alphaX versions we're using leading up to 3.0.0 GA can be treated as
> a.b.c versions, with alpha1 being the a.b.0 release.
>

Once 3.0.0 GA goes out, a user would want to see the diff from the latest
2.x.0 release (say 2.9.0).

Are you suggesting 3.0.0 GA would have c = 5 (say) and hence rule 1 would
apply, and it should show up in the release notes?


>
> As an example, if a JIRA was committed to branch-2.6, branch-2.7, branch-2,
> branch-3.0.0-alpha1, and trunk, it could have fix versions of 2.6.5, 2.7.3,
> 2.8.0, 3.0.0-alpha1. The first two fix versions come from application of
> rule 1, and the last two fix versions come from rule 2.
>
> I'm very eager to move this discussion forward, so feel free to reach out
> on or off list if I can help with anything.
>


I think it is good practice to set multiple fix versions. However, it might
take the committers a little while to learn.

Since the plan is to cut 3.0.0 off trunk, can we just bulk edit to add the
3.0.0-alphaX version?
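For concreteness, the two rules can be sketched mechanically. Below is an illustrative Python helper (hypothetical name, not part of any actual JIRA tooling), whose input is the next release to be cut from every branch the patch landed on:

```python
def pick_fix_versions(candidate_versions):
    """Apply the two proposed rules to candidate fix versions.
    Rule 1: keep every a.b.c maintenance version with c > 0.
    Rule 2: per major line, keep only the lowest a.b.0 version
            (an a.b.0-alphaX tag is treated as the a.b.0 of its line)."""
    def parse(v):
        core, _, _tag = v.partition("-")        # "3.0.0-alpha1" -> "3.0.0"
        a, b, c = (int(x) for x in core.split("."))
        return a, b, c

    maintenance = [v for v in candidate_versions if parse(v)[2] > 0]   # rule 1
    lowest_minor = {}                                                  # rule 2
    for v in candidate_versions:
        a, b, c = parse(v)
        if c == 0 and (a not in lowest_minor or b < parse(lowest_minor[a])[1]):
            lowest_minor[a] = v
    return sorted(maintenance + list(lowest_minor.values()))

# The example from this thread: committed to branch-2.6, branch-2.7,
# branch-2, and branch-3.0.0-alpha1/trunk.
print(pick_fix_versions(["2.6.5", "2.7.3", "2.8.0", "3.0.0-alpha1"]))
# -> ['2.6.5', '2.7.3', '2.8.0', '3.0.0-alpha1']
```

2.6.5 and 2.7.3 survive via rule 1; 2.8.0 and 3.0.0-alpha1 are each the lowest a.b.0 of their major line, via rule 2.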


>
> Best,
> Andrew
>


[jira] [Created] (HADOOP-13433) Race in UGI.reloginFromKeytab

2016-07-27 Thread Duo Zhang (JIRA)
Duo Zhang created HADOOP-13433:
--

 Summary: Race in UGI.reloginFromKeytab
 Key: HADOOP-13433
 URL: https://issues.apache.org/jira/browse/HADOOP-13433
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Duo Zhang


This is a problem that has troubled us for several years. For our HBase 
cluster, sometimes the RS will be stuck due to

{noformat}
2016-06-20,03:44:12,936 INFO org.apache.hadoop.ipc.SecureClient: Exception 
encountered while connecting to the server :
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: The ticket isn't for us (35) - 
BAD TGS SERVER NAME)]
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
at 
org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:140)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupSaslConnection(SecureClient.java:187)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.access$700(SecureClient.java:95)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:325)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$2.run(SecureClient.java:322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
at org.apache.hadoop.hbase.security.User.call(User.java:607)
at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:461)
at 
org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:321)
at 
org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1164)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1004)
at 
org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:107)
at $Proxy24.replicateLogEntries(Unknown Source)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:962)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.runLoop(ReplicationSource.java:466)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:515)
Caused by: GSSException: No valid credentials provided (Mechanism level: The 
ticket isn't for us (35) - BAD TGS SERVER NAME)
at 
sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:663)
at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:180)
at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
... 23 more
Caused by: KrbException: The ticket isn't for us (35) - BAD TGS SERVER NAME
at sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:64)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:185)
at 
sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:294)
at 
sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:106)
at 
sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:557)
at 
sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:594)
... 26 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:133)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:58)
at sun.security.krb5.internal.TGSRep.(TGSRep.java:53)
at sun.security.krb5.KrbTgsRep.(KrbTgsRep.java:46)
... 31 more​
{noformat}

It rarely happens, but if it happens, the regionserver will be stuck and can 
never recover.

Recently we added a log statement after a successful re-login that prints the 
private credentials, and we finally caught the direct cause. After a successful 
re-login, there are two kerberos tickets in the credentials: one is the TGT, and 
the other is a service ticket. The strange thing is that the service ticket is 
placed before the TGT. This breaks an assumption of the JDK's kerberos library. See 
http://hg.openjdk.java.net/jdk8u/jdk8u60/jdk/file/935758609767/src/share/classes/sun/security/jgss/krb5/Krb5InitCredential.java,
 the
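Abstractly, the failure mode looks like this. Below is an illustrative Python model with made-up principal names (the real logic is in the JDK, which effectively takes the first KerberosTicket from the subject's private credentials):

```python
def first_ticket(creds):
    """The fragile assumption: the first ticket in the subject's
    private credentials is the TGT."""
    return creds[0]

def find_tgt(creds):
    """Defensive variant: locate the ticket whose server principal is the
    ticket-granting service, regardless of ordering."""
    for ticket in creds:
        if ticket.startswith("krbtgt/"):
            return ticket
    return None

# Ordering observed after the racy re-login: service ticket before the TGT.
creds = ["hbase/rs1.example.com@EXAMPLE.COM",   # service ticket (first!)
         "krbtgt/EXAMPLE.COM@EXAMPLE.COM"]      # TGT (second)
print(first_ticket(creds))   # picks the service ticket as the "TGT"
print(find_tgt(creds))       # finds the real TGT
```

Treating the service ticket as the TGT is what produces the "BAD TGS SERVER NAME" errors in the stack trace above.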

Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-27 Thread Sunil Govind
Hi All
+1 (non-binding)
- Compiled and created tar ball from source
- Tested few MR jobs with node labels
- Verified UI

Thanks
Sunil

On Sat, Jul 23, 2016 at 7:45 AM Vinod Kumar Vavilapalli 
wrote:

> Hi all,
>
> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>
> As discussed before, this is the next maintenance release to follow up
> 2.7.2.
>
> The RC is available for validation at:
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>
> The RC tag in git is: release-2.7.3-RC0
>
> The maven artifacts are available via repository.apache.org <
> http://repository.apache.org/> at
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/ <
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/>
>
> The release-notes are inside the tar-balls at location
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> hosted this at
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for
> your quick perusal.
>
> As you may have noted, a very long fix-cycle for the License & Notice
> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
> to slip by quite a bit. This release's related discussion thread is linked
> below: [1].
>
> Please try the release and vote; the vote will run for the usual 5 days.
>
> Thanks,
> Vinod
>
> [1]: 2.7.3 release plan:
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
> http://markmail.org/thread/6yv2fyrs4jlepmmr>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-27 Thread Jian He
+1 for the source tarball.

- Compiled and built from the source code
- Deployed a cluster
- Successfully ran some sample jobs.

Thanks,
Jian

> On Jul 27, 2016, at 10:11 AM, Akira Ajisaka  
> wrote:
> 
> +1 for the source tarball.
> 
> - Downloaded source tarball and binary tarball
> - Verified signatures and checksums
> - Compiled and built a single node cluster
> - Compiled Hive 2.1.0/1.2.1 and Tez 0.8.4/0.7.1 using Hadoop 2.7.3 pom 
> successfully
> - Ran some Hive on Tez queries successfully
> 
> Thanks,
> Akira
> 
> On 7/27/16 04:12, Vinod Kumar Vavilapalli wrote:
>> But, everyone please do continue your sanity checking on RC0 in case there 
>> are more issues to be fixed.
>> 
>> Thanks
>> +Vinod
>> 
>>> On Jul 26, 2016, at 12:11 PM, Vinod Kumar Vavilapalli  
>>> wrote:
>>> 
>>> Thanks Daniel and Wei.
>>> 
>>> I think these are worth fixing, I’m withdrawing this RC. Will look at 
>>> fixing these issues and roll a new candidate with the fixes as soon as 
>>> possible.
>>> 
>>> Thanks
>>> +Vinod
>>> 
 On Jul 26, 2016, at 11:05 AM, Wei-Chiu Chuang > wrote:
 
 I noticed two issues:
 
 (1) I ran hadoop checknative, but it seems the binary tarball was not 
 compiled with native library for Linux. On the contrary, the Hadoop built 
 from source tarball with maven -Pnative can find the native libraries on 
 the same host.
 
 (2) I noticed that the release dates in CHANGES.txt in tag 
 release-2.7.3-RC0 are set to Release 2.7.3 - 2016-07-27.
 However, the release dates in CHANGES.txt in the source and binary tar 
 balls are set to Release 2.7.3 - 2016-08-01. This is probably a non-issue 
 though.
 
 * Downloaded source and binary.
 * Verified signature.
 * Verified checksum.
 * Built from source using 64-bit Java 7 (1.7.0.75) and 8 (1.8.0.05). Both 
 went fine.
 * Ran hadoop checknative
 
 On Tue, Jul 26, 2016 at 9:12 AM, Rushabh Shah 
 > 
 wrote:
 Thanks Vinod for all the release work !
 +1 (non-binding).
 * Downloaded from source and built it.* Deployed a pseudo distributed 
 cluster.
 * Ran some sample jobs: sleep, pi* Ran some dfs commands.* Everything 
 works fine.
 
 
On Friday, July 22, 2016 9:16 PM, Vinod Kumar Vavilapalli 
 > wrote:
 
 
 Hi all,
 
 I've created a release candidate RC0 for Apache Hadoop 2.7.3.
 
 As discussed before, this is the next maintenance release to follow up 
 2.7.2.
 
 The RC is available for validation at: 
 http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ 
  
 >
 
 The RC tag in git is: release-2.7.3-RC0
 
 The maven artifacts are available via repository.apache.org 
  > at 
 https://repository.apache.org/content/repositories/orgapachehadoop-1040/ 
  
 >
 
 The release-notes are inside the tar-balls at location 
 hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I 
 hosted this at 
 http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html 
  
 > 
 for your quick perusal.
 
 As you may have noted, a very long fix-cycle for the License & Notice 
 issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release) 
 to slip by quite a bit. This release's related discussion thread is linked 
 below: [1].
 
 Please try the release and vote; the vote will run for the usual 5 days.
 
 Thanks,
 Vinod
 
 [1]: 2.7.3 release plan: 
 https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
  
 >
 
 
 
>>> 
>> 
>> 
> 
> 
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
> 
> 



Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Wangda Tan
Thanks Andrew for sharing your thoughts,

It looks better if we can put multiple versions on the fix version, with
that we can at least do some queries on JIRA to check the issues like "in
branch-2.6.5 but not in branch-2.7.4".

I still have a couple of questions:

*1) What should CHANGES.txt (or the release notes) look like?*
Not sure if this has been discussed before. Previously we put the *next*
release version of the earliest branch into CHANGES.txt. However, this can
be confusing and needs a lot of manual work.

For example, say we have two parallel release branches: branch-2.6.5 and
branch-2.7.4. When we backport a commit X from branch-2.7.4 to
branch-2.6.5, we update CHANGES.txt in branch-2.7.4 to say that commit X is
included in Hadoop-2.6.5.

However, if we release Hadoop-2.7.4 before Hadoop-2.6.5, users will find
that Hadoop-2.6.5 is not released yet.

To me, we should set the fix version in CHANGES.txt to the released Hadoop
from the earliest branch; in the above example, Hadoop-2.7.4 should be the
fix version of commit X in the release notes of Hadoop-2.7.4.

Instead, I suggest adding a suffix ("released") to the fix version after
the release is done. Then the release-note generator can query more easily,
and other JIRA users can benefit from this to understand which releases
include a given JIRA.
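As a sketch of what that suffix would buy (illustrative Python; the "(released)" convention is only a proposal in this thread, not an existing JIRA feature):

```python
def split_fix_versions(fix_versions):
    """Partition a JIRA's fix versions into those already shipped
    (marked with the proposed "(released)" suffix) and those pending."""
    released = [v.split()[0] for v in fix_versions if v.endswith("(released)")]
    pending = [v for v in fix_versions if not v.endswith("(released)")]
    return released, pending

# Commit X backported to branch-2.6.5 after 2.7.4 has already shipped:
print(split_fix_versions(["2.6.5", "2.7.4 (released)"]))
# -> (['2.7.4'], ['2.6.5'])
```

A user reading the JIRA can then see at a glance that 2.7.4 already contains the fix while 2.6.5 is still pending.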

*2) Do we need to update historical JIRAs?*

It's better to make a consistent rule for the active release branches (to
me, they're branch-2.6 and up). So it would be better to update the fix
version for all resolved JIRAs in the release branches.

Thoughts?

Wangda

On Tue, Jul 26, 2016 at 11:40 PM, Andrew Wang 
wrote:

> I think I understand a bit better, though now I ask how this date is
> different from the release date. Based on the HowToRelease instructions, we
> set the release date to when the release vote passes. So, start of release
> vote vs. end of release vote doesn't seem that different, and these dates
> are still totally ordered.
>
> For the user in this scenario, she can upgrade from 2.7.3 to any later
> 2.7.c release (found easily since a.b.c releases are ordered), and when
> jumping to a new minor or major version, any version released
> chronologically after 2.7.3. This means you need to check the website, but
> given that this is the way most enterprise software is versioned, I think
> it'll be okay by users.
>
> I think this problem is also pretty rare in practice, since users normally
> upgrade to the highest maintenance release within a major/minor. Thus
> they'll only hit this if their upgrade cycle is faster than it takes for a
> change released in e.g. 2.6.x to then also be released in a 2.7.x.
>
> Best,
> Andrew
>
> On Tue, Jul 26, 2016 at 11:13 PM, Tsuyoshi Ozawa  wrote:
>
> > > Andrew: I bet many would assume it's the release date, like how Ubuntu
> > releases are numbered.
> >
> > Good point. Maybe I confuse you because of lack of explanation.
> >
> > I assume that "branch-cut off timing" mean the timing of freezing branch
> > like when starting the release vote. It's because that the release can
> > be delayed after the release pass. Does it make sense to you?
> >
> > > Even if we have the branch-cut date in the version string, devs still
> > need to be aware of other branches and backport appropriately.
> >
> > Yes, you're right. The good point of including date is that we can
> declare
> > which version includes the latest changes. It helps users, not devs
> > basically, to decide which version users will use: e.g. if
> > 2.8.1-20160801 is released after 2.9.0-20160701 and a user uses
> > 2.7.3-20160701, she can update their cluster 2.8.1, which include bug
> fixes
> > against 2.7.3. Please let me know if I have some missing points.
> >
> > Thanks,
> > - Tsuyoshi
> >
> > On Wednesday, 27 July 2016, Andrew Wang 
> wrote:
> >
> >> Thanks for replies Akira and Tsuyoshi, inline:
> >>
> >> Akira: Assuming 3.0.0-alpha1 will be released between 2.7.0 and 2.8.0,
> we
> >>> need to add 3.0.0-alphaX if 2.8.0 is in the fix versions of a jira and
> we
> >>> don't need to add 3.0.0-alphaX if 2.7.0 is in the fix versions of a
> jira.
> >>> Is it right?
> >>
> >>
> >> Yes, correct.
> >>
> >>
> >>> Tsuyoshi: My suggestion is adding the date when branch cut is done:
> like
> >>> 3.0.0-alpha1-20160724, 2.8.0-20160730 or something.
> >>>
> >>> Pros:-) It's totally ordered. If we have a policy such as backporting
> >>> to maintainance branches after the date, users can find that which
> >>> version
> >>> is cutting edge. In the example of above, 2.8.0-20160730 can include
> bug
> >>> fixes which is not included in 3.0.0-alpha1-20160724.
> >>>
> >>> Cons:-( A bit redundant.
> >>>
> >>> Could you elaborate on the problem this scheme addresses? We always
> want
> >> our releases, when ordered chronologically, to incorporate all the known
> >> relevant bug fixes. Even if we have the branch-cut date in the version
> >> string, devs still need to be aware of 

Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Tsuyoshi Ozawa
> I think I understand a bit better, though now I ask how this date is 
> different from the release date.

OIC. I also assume that the frozen branch cannot include changes made
between the freeze date and the release date. This gives a strict ordering
that makes clear which release is newer. If we have lots of maintenance
branches, it helps us understand which branches include the fix for a
problem in my cluster.

> I think this problem is also pretty rare in practice, since users normally 
> upgrade to the highest maintenance release within a major/minor.

If there are lots of maintenance branches in parallel (2.6.x, 2.7.x,
2.8.x, 2.9.x, 3.0.x), we can hit this problem more easily: if a user plans
to upgrade a 2.7.3 cluster to 2.8.3 or 2.9.1 or 3.0.1, which version
should the user choose? This is my concern.

However, as you mentioned, if we decide to reduce the number of branches
we keep maintaining, we don't need to do that.

Best,
- Tsuyoshi

On Wed, Jul 27, 2016 at 3:40 PM, Andrew Wang  wrote:
> I think I understand a bit better, though now I ask how this date is
> different from the release date. Based on the HowToRelease instructions, we
> set the release date to when the release vote passes. So, start of release
> vote vs. end of release vote doesn't seem that different, and these dates
> are still totally ordered.
>
> For the user in this scenario, she can upgrade from 2.7.3 to any later 2.7.c
> release (found easily since a.b.c releases are ordered), and when jumping to
> a new minor or major version, any version released chronologically after
> 2.7.3. This means you need to check the website, but given that this is the
> way most enterprise software is versioned, I think it'll be okay by users.
>
> I think this problem is also pretty rare in practice, since users normally
> upgrade to the highest maintenance release within a major/minor. Thus
> they'll only hit this if their upgrade cycle is faster than it takes for a
> change released in e.g. 2.6.x to then also be released in a 2.7.x.
>
> Best,
> Andrew
>
> On Tue, Jul 26, 2016 at 11:13 PM, Tsuyoshi Ozawa  wrote:
>>
>> > Andrew: I bet many would assume it's the release date, like how Ubuntu
>> > releases are numbered.
>>
>> Good point. Maybe I confuse you because of lack of explanation.
>>
>> I assume that "branch-cut off timing" mean the timing of freezing branch
>> like when starting the release vote. It's because that the release can be
>> delayed after the release pass. Does it make sense to you?
>>
>> > Even if we have the branch-cut date in the version string, devs still
>> > need to be aware of other branches and backport appropriately.
>>
>> Yes, you're right. The good point of including date is that we can declare
>> which version includes the latest changes. It helps users, not devs
>> basically, to decide which version users will use: e.g. if 2.8.1-20160801 is
>> released after 2.9.0-20160701 and a user uses 2.7.3-20160701, she can update
>> their cluster 2.8.1, which include bug fixes against 2.7.3. Please let me
>> know if I have some missing points.
>>
>> Thanks,
>> - Tsuyoshi
>>
>> On Wednesday, 27 July 2016, Andrew Wang  wrote:
>>>
>>> Thanks for replies Akira and Tsuyoshi, inline:
>>>
 Akira: Assuming 3.0.0-alpha1 will be released between 2.7.0 and 2.8.0,
 we need to add 3.0.0-alphaX if 2.8.0 is in the fix versions of a jira and 
 we
 don't need to add 3.0.0-alphaX if 2.7.0 is in the fix versions of a jira. 
 Is
 it right?
>>>
>>>
>>> Yes, correct.
>>>

 Tsuyoshi: My suggestion is adding the date when branch cut is done: like
 3.0.0-alpha1-20160724, 2.8.0-20160730 or something.

 Pros:-) It's totally ordered. If we have a policy such as backporting
 to maintainance branches after the date, users can find that which
 version
 is cutting edge. In the example of above, 2.8.0-20160730 can include bug
 fixes which is not included in 3.0.0-alpha1-20160724.

 Cons:-( A bit redundant.

>>> Could you elaborate on the problem this scheme addresses? We always want
>>> our releases, when ordered chronologically, to incorporate all the known
>>> relevant bug fixes. Even if we have the branch-cut date in the version
>>> string, devs still need to be aware of other branches and backport
>>> appropriately.
>>>
>>> Given that branch cuts and releases might not happen in the same order
>>> (e.g. if 3.0.0-alpha1 precedes 2.8.0), I think this also would be confusing
>>> for users. I bet many would assume it's the release date, like how Ubuntu
>>> releases are numbered.
>>>
>>> Best,
>>> Andrew
>
>

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13432) S3A: Consider using TransferManager.download for copyToLocalFile

2016-07-27 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13432:
-

 Summary: S3A: Consider using TransferManager.download for 
copyToLocalFile
 Key: HADOOP-13432
 URL: https://issues.apache.org/jira/browse/HADOOP-13432
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


Currently it relies on the default implementation in FileSystem, but it would 
be good to explore using TransferManager.download() (Ref: 
https://java.awsblog.com/post/Tx3Z7NO7C2TVLB/Parallelizing-Large-Downloads-for-Optimal-Speed
 for recent aws-sdk-java). When the aws-sdk version is bumped, it would 
automatically get the benefit of parallel downloads as well.
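The core idea — splitting an object into byte ranges fetched concurrently and reassembled in order — can be sketched independently of the AWS SDK. Illustrative Python follows; `fetch_range` stands in for a ranged GET and `parallel_download` is a hypothetical helper, not real S3A code:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_download(fetch_range, size, part_size=8 * 1024, workers=4):
    """Fetch [0, size) in fixed-size parts concurrently and reassemble
    them in order; fetch_range(start, end) must return bytes for [start, end)."""
    ranges = [(s, min(s + part_size, size)) for s in range(0, size, part_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)

blob = bytes(range(256)) * 100          # stand-in for an S3 object
copy = parallel_download(lambda s, e: blob[s:e], len(blob), part_size=4096)
assert copy == blob
```

TransferManager-style downloads apply the same pattern over HTTP range requests, which is where the speedup for large objects comes from.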



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Andrew Wang
I think I understand a bit better, though now I ask how this date is
different from the release date. Based on the HowToRelease instructions, we
set the release date to when the release vote passes. So, start of release
vote vs. end of release vote doesn't seem that different, and these dates
are still totally ordered.

For the user in this scenario, she can upgrade from 2.7.3 to any later
2.7.c release (found easily since a.b.c releases are ordered), and when
jumping to a new minor or major version, any version released
chronologically after 2.7.3. This means you need to check the website, but
given that this is the way most enterprise software is versioned, I think
it'll be okay by users.

I think this problem is also pretty rare in practice, since users normally
upgrade to the highest maintenance release within a major/minor. Thus
they'll only hit this if their upgrade cycle is faster than it takes for a
change released in e.g. 2.6.x to then also be released in a 2.7.x.
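The upgrade rule above can be stated as a small predicate. The following is an illustrative Python sketch with made-up release dates (`safe_upgrades` is a hypothetical helper, not real tooling):

```python
from datetime import date

def safe_upgrades(current, releases):
    """releases: version -> release date (illustrative dates, not real ones).
    A safe target is a later maintenance release on the same minor line, or
    any release on a higher minor/major line that shipped chronologically
    after `current`."""
    cur = tuple(int(x) for x in current.split("."))
    cur_date = releases[current]
    targets = []
    for v, d in releases.items():
        t = tuple(int(x) for x in v.split("."))
        same_line_later = t[:2] == cur[:2] and t[2] > cur[2]
        newer_line = t[:2] > cur[:2] and d > cur_date
        if same_line_later or newer_line:
            targets.append(v)
    return sorted(targets)

releases = {"2.6.5": date(2016, 10, 1), "2.7.3": date(2016, 8, 1),
            "2.7.4": date(2016, 12, 1), "2.8.0": date(2017, 3, 1)}
print(safe_upgrades("2.7.3", releases))   # -> ['2.7.4', '2.8.0']
```

Note that 2.6.5 is excluded even though it shipped after 2.7.3: it sits on a lower minor line, so it may lack changes already present in 2.7.3.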

Best,
Andrew

On Tue, Jul 26, 2016 at 11:13 PM, Tsuyoshi Ozawa  wrote:

> > Andrew: I bet many would assume it's the release date, like how Ubuntu
> releases are numbered.
>
> Good point. Maybe I confuse you because of lack of explanation.
>
> I assume that "branch-cut off timing" mean the timing of freezing branch
> like when starting the release vote. It's because that the release can
> be delayed after the release pass. Does it make sense to you?
>
> > Even if we have the branch-cut date in the version string, devs still
> need to be aware of other branches and backport appropriately.
>
> Yes, you're right. The good point of including date is that we can declare
> which version includes the latest changes. It helps users, not devs
> basically, to decide which version users will use: e.g. if
> 2.8.1-20160801 is released after 2.9.0-20160701 and a user uses
> 2.7.3-20160701, she can update their cluster 2.8.1, which include bug fixes
> against 2.7.3. Please let me know if I have some missing points.
>
> Thanks,
> - Tsuyoshi
>
> On Wednesday, 27 July 2016, Andrew Wang  wrote:
>
>> Thanks for replies Akira and Tsuyoshi, inline:
>>
>> Akira: Assuming 3.0.0-alpha1 will be released between 2.7.0 and 2.8.0, we
>>> need to add 3.0.0-alphaX if 2.8.0 is in the fix versions of a jira and we
>>> don't need to add 3.0.0-alphaX if 2.7.0 is in the fix versions of a jira.
>>> Is it right?
>>
>>
>> Yes, correct.
>>
>>
>>> Tsuyoshi: My suggestion is adding the date when branch cut is done: like
>>> 3.0.0-alpha1-20160724, 2.8.0-20160730 or something.
>>>
>>> Pros:-) It's totally ordered. If we have a policy such as backporting
>>> to maintainance branches after the date, users can find that which
>>> version
>>> is cutting edge. In the example of above, 2.8.0-20160730 can include bug
>>> fixes which is not included in 3.0.0-alpha1-20160724.
>>>
>>> Cons:-( A bit redundant.
>>>
>>> Could you elaborate on the problem this scheme addresses? We always want
>> our releases, when ordered chronologically, to incorporate all the known
>> relevant bug fixes. Even if we have the branch-cut date in the version
>> string, devs still need to be aware of other branches and backport
>> appropriately.
>>
>> Given that branch cuts and releases might not happen in the same order
>> (e.g. if 3.0.0-alpha1 precedes 2.8.0), I think this also would be confusing
>> for users. I bet many would assume it's the release date, like how Ubuntu
>> releases are numbered.
>>
>> Best,
>> Andrew
>>
>


Re: [DISCUSS] Release numbering semantics with concurrent (>2) releases [Was Setting JIRA fix versions for 3.0.0 releases]

2016-07-27 Thread Tsuyoshi Ozawa
> Andrew: I bet many would assume it's the release date, like how Ubuntu
releases are numbered.

Good point. Maybe I confused you because of a lack of explanation.

I assume that "branch-cut off timing" means the timing of freezing the
branch, i.e. when the release vote starts. That is because the release can
be delayed after the vote passes. Does it make sense to you?

> Even if we have the branch-cut date in the version string, devs still
need to be aware of other branches and backport appropriately.

Yes, you're right. The good point of including the date is that we can
declare which version includes the latest changes. It helps users, not devs
basically, to decide which version to use: e.g., if 2.8.1-20160801 is
released after 2.9.0-20160701 and a user runs 2.7.3-20160701, she can
update her cluster to 2.8.1, which includes bug fixes against 2.7.3.
Please let me know if I have missed something.
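With such a date suffix, chronological comparison becomes a trivial string operation. A minimal sketch, using the hypothetical versions from the examples in this thread:

```python
from datetime import datetime

def cut_date(version):
    """'2.8.1-20160801' -> datetime(2016, 8, 1): the branch-cut date is the
    last dash-separated field, so '3.0.0-alpha1-20160724' also parses."""
    return datetime.strptime(version.rsplit("-", 1)[1], "%Y%m%d")

releases = ["2.9.0-20160701", "2.8.1-20160801", "2.7.3-20160701"]
newest = max(releases, key=cut_date)
print(newest)   # -> 2.8.1-20160801
```

This is the "totally ordered" property: a user can rank any set of releases across lines without consulting the website.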

Thanks,
- Tsuyoshi

On Wednesday, 27 July 2016, Andrew Wang  wrote:

> Thanks for replies Akira and Tsuyoshi, inline:
>
> Akira: Assuming 3.0.0-alpha1 will be released between 2.7.0 and 2.8.0, we
>> need to add 3.0.0-alphaX if 2.8.0 is in the fix versions of a jira and we
>> don't need to add 3.0.0-alphaX if 2.7.0 is in the fix versions of a jira.
>> Is it right?
>
>
> Yes, correct.
>
>
>> Tsuyoshi: My suggestion is adding the date when branch cut is done: like
>> 3.0.0-alpha1-20160724, 2.8.0-20160730 or something.
>>
>> Pros:-) It's totally ordered. If we have a policy such as backporting
>> to maintainance branches after the date, users can find that which version
>> is cutting edge. In the example of above, 2.8.0-20160730 can include bug
>> fixes which is not included in 3.0.0-alpha1-20160724.
>>
>> Cons:-( A bit redundant.
>>
>> Could you elaborate on the problem this scheme addresses? We always want
> our releases, when ordered chronologically, to incorporate all the known
> relevant bug fixes. Even if we have the branch-cut date in the version
> string, devs still need to be aware of other branches and backport
> appropriately.
>
> Given that branch cuts and releases might not happen in the same order
> (e.g. if 3.0.0-alpha1 precedes 2.8.0), I think this also would be confusing
> for users. I bet many would assume it's the release date, like how Ubuntu
> releases are numbered.
>
> Best,
> Andrew
>