Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Matthias J. Sax
+1 for dropping

On 09/04/2015 11:04 AM, Maximilian Michels wrote:
> +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
> release is hardly used and complicates the important high-availability
> changes in Flink.
> 
> On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen  wrote:
>> I am good with that as well. Mind that we are not only dropping a binary
>> distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.
>>
>>
>>
>> Let's also reconfigure Travis to test
>>
>>  - Hadoop1
>>  - Hadoop 2.3
>>  - Hadoop 2.4
>>  - Hadoop 2.6
>>  - Hadoop 2.7
>>
>>
>> On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park  wrote:
>>>
>>> +1 for dropping Hadoop 2.2.0
>>>
>>> Regards,
>>> Chiwan Park
>>>
>>>> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi wrote:
>>>>
>>>> +1 to what Robert said.
>>>>
>>>> On Thursday, September 3, 2015, Robert Metzger wrote:
>>>> I think most cloud providers moved beyond Hadoop 2.2.0.
>>>> Google's Click-To-Deploy is on 2.4.1
>>>> AWS EMR is on 2.6.0
>>>>
>>>> The situation for the distributions seems to be the following:
>>>> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
>>>> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>>>>
>>>> HDP 2.0 (October 2013) is using 2.2.0
>>>> HDP 2.1 (April 2014) uses 2.4.0 already
>>>>
>>>> So both vendors and cloud providers are multiple releases away from
>>>> Hadoop 2.2.0.
>>>>
>>>> Spark does not offer a binary distribution lower than 2.3.0.
>>>>
>>>> In addition to that, I don't think that the HDFS client in 2.2.0 is
>>>> really usable in production environments. Users were reporting
>>>> ArrayIndexOutOfBounds exceptions for some jobs, I also had these
>>>> exceptions sometimes.
>>>>
>>>> The easiest approach to resolve this issue would be (a) dropping the
>>>> support for Hadoop 2.2.0
>>>> An alternative approach (b) would be:
>>>>  - ship a binary version for Hadoop 2.3.0
>>>>  - make the source of Flink still compatible with 2.2.0, so that users
>>>> can compile a Hadoop 2.2.0 version if needed.
>>>>
>>>> I would vote for approach (a).
>>>>
>>>>
>>>> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann wrote:
>>>> While working on high availability (HA) for Flink's YARN execution I
>>>> stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
>>>> 2.3.0, Hadoop introduced new functionality which is required for an
>>>> efficient HA implementation. Therefore, I was wondering whether there is
>>>> actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively
>>>> used by someone?
>>>>
>>>> Cheers,
>>>> Till

>>>
>>>
>>>
>>>
>>>
>>





Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Maximilian Michels
+1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
release is hardly used and complicates the important high-availability
changes in Flink.

On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen  wrote:
> I am good with that as well. Mind that we are not only dropping a binary
> distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.
>
>
>
> Let's also reconfigure Travis to test
>
>  - Hadoop1
>  - Hadoop 2.3
>  - Hadoop 2.4
>  - Hadoop 2.6
>  - Hadoop 2.7
>
>
> On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park  wrote:
>>
>> +1 for dropping Hadoop 2.2.0
>>
>> Regards,
>> Chiwan Park
>>
>> > On Sep 4, 2015, at 5:58 AM, Ufuk Celebi  wrote:
>> >
>> > +1 to what Robert said.
>> >
>> > On Thursday, September 3, 2015, Robert Metzger 
>> > wrote:
>> > I think most cloud providers moved beyond Hadoop 2.2.0.
>> > Google's Click-To-Deploy is on 2.4.1
>> > AWS EMR is on 2.6.0
>> >
>> > The situation for the distributions seems to be the following:
>> > MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
>> > CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>> >
>> > HDP 2.0  (October 2013) is using 2.2.0
>> > HDP 2.1 (April 2014) uses 2.4.0 already
>> >
>> > So both vendors and cloud providers are multiple releases away from
>> > Hadoop 2.2.0.
>> >
>> > Spark does not offer a binary distribution lower than 2.3.0.
>> >
>> > In addition to that, I don't think that the HDFS client in 2.2.0 is
>> > really usable in production environments. Users were reporting
>> > ArrayIndexOutOfBounds exceptions for some jobs, I also had these exceptions
>> > sometimes.
>> >
>> > The easiest approach  to resolve this issue would be  (a) dropping the
>> > support for Hadoop 2.2.0
>> > An alternative approach (b) would be:
>> >  - ship a binary version for Hadoop 2.3.0
>> >  - make the source of Flink still compatible with 2.2.0, so that users
>> > can compile a Hadoop 2.2.0 version if needed.
>> >
>> > I would vote for approach (a).
>> >
>> >
>> > On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann 
>> > wrote:
>> > While working on high availability (HA) for Flink's YARN execution I
>> > stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
>> > 2.3.0, Hadoop introduced new functionality which is required for an
>> > efficient HA implementation. Therefore, I was wondering whether there is
>> > actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively 
>> > used
>> > by someone?
>> >
>> > Cheers,
>> > Till
>> >
>>
>>
>>
>>
>>
>


Re: Usage of Hadoop 2.2.0

2015-09-04 Thread Stephan Ewen
I am good with that as well. Mind that we are not only dropping a binary
distribution for Hadoop 2.2.0, but also the source compatibility with 2.2.0.



Let's also reconfigure Travis to test

 - Hadoop1
 - Hadoop 2.3
 - Hadoop 2.4
 - Hadoop 2.6
 - Hadoop 2.7


On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park  wrote:

> +1 for dropping Hadoop 2.2.0
>
> Regards,
> Chiwan Park
>
> > On Sep 4, 2015, at 5:58 AM, Ufuk Celebi  wrote:
> >
> > +1 to what Robert said.
> >
> > On Thursday, September 3, 2015, Robert Metzger wrote:
> > I think most cloud providers moved beyond Hadoop 2.2.0.
> > Google's Click-To-Deploy is on 2.4.1
> > AWS EMR is on 2.6.0
> >
> > The situation for the distributions seems to be the following:
> > MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> > CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
> >
> > HDP 2.0 (October 2013) is using 2.2.0
> > HDP 2.1 (April 2014) uses 2.4.0 already
> >
> > So both vendors and cloud providers are multiple releases away from
> > Hadoop 2.2.0.
> >
> > Spark does not offer a binary distribution lower than 2.3.0.
> >
> > In addition to that, I don't think that the HDFS client in 2.2.0 is
> > really usable in production environments. Users were reporting
> > ArrayIndexOutOfBounds exceptions for some jobs, I also had these
> > exceptions sometimes.
> >
> > The easiest approach to resolve this issue would be (a) dropping the
> > support for Hadoop 2.2.0
> > An alternative approach (b) would be:
> >  - ship a binary version for Hadoop 2.3.0
> >  - make the source of Flink still compatible with 2.2.0, so that users
> > can compile a Hadoop 2.2.0 version if needed.
> >
> > I would vote for approach (a).
> >
> >
> > On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann wrote:
> > While working on high availability (HA) for Flink's YARN execution I
> > stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
> > 2.3.0, Hadoop introduced new functionality which is required for an
> > efficient HA implementation. Therefore, I was wondering whether there is
> > actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively
> > used by someone?
> >
> > Cheers,
> > Till
> >
>
>
>
>
>
>


Re: Usage of Hadoop 2.2.0

2015-09-03 Thread Robert Metzger
I think most cloud providers have moved beyond Hadoop 2.2.0.
Google's Click-To-Deploy is on 2.4.1
AWS EMR is on 2.6.0

The situation for the distributions seems to be the following:
MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)

HDP 2.0 (October 2013) is using 2.2.0
HDP 2.1 (April 2014) uses 2.4.0 already

So both vendors and cloud providers are multiple releases away from Hadoop
2.2.0.

Spark does not offer a binary distribution lower than 2.3.0.

In addition to that, I don't think that the HDFS client in 2.2.0 is really
usable in production environments. Users were reporting
ArrayIndexOutOfBounds exceptions for some jobs, and I also ran into these
exceptions sometimes.

The easiest approach to resolve this issue would be (a) dropping the
support for Hadoop 2.2.0.
An alternative approach (b) would be:
 - ship a binary version for Hadoop 2.3.0
 - make the source of Flink still compatible with 2.2.0, so that users can
compile a Hadoop 2.2.0 version if needed.

I would vote for approach (a).


On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann  wrote:

> While working on high availability (HA) for Flink's YARN execution I
> stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
> 2.3.0, Hadoop introduced new functionality which is required for an
> efficient HA implementation. Therefore, I was wondering whether there is
> actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively
> used by someone?
>
> Cheers,
> Till
>


Re: Usage of Hadoop 2.2.0

2015-09-03 Thread Ufuk Celebi
+1 to what Robert said.

On Thursday, September 3, 2015, Robert Metzger  wrote:

> I think most cloud providers moved beyond Hadoop 2.2.0.
> Google's Click-To-Deploy is on 2.4.1
> AWS EMR is on 2.6.0
>
> The situation for the distributions seems to be the following:
> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>
> HDP 2.0  (October 2013) is using 2.2.0
> HDP 2.1 (April 2014) uses 2.4.0 already
>
> So both vendors and cloud providers are multiple releases away from Hadoop
> 2.2.0.
>
> Spark does not offer a binary distribution lower than 2.3.0.
>
> In addition to that, I don't think that the HDFS client in 2.2.0 is really
> usable in production environments. Users were reporting
> ArrayIndexOutOfBounds exceptions for some jobs, I also had these exceptions
> sometimes.
>
> The easiest approach  to resolve this issue would be  (a) dropping the
> support for Hadoop 2.2.0
> An alternative approach (b) would be:
>  - ship a binary version for Hadoop 2.3.0
>  - make the source of Flink still compatible with 2.2.0, so that users can
> compile a Hadoop 2.2.0 version if needed.
>
> I would vote for approach (a).
>
>
> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann wrote:
>
>> While working on high availability (HA) for Flink's YARN execution I
>> stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0 to
>> 2.3.0, Hadoop introduced new functionality which is required for an
>> efficient HA implementation. Therefore, I was wondering whether there is
>> actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively
>> used by someone?
>>
>> Cheers,
>> Till
>>
>
>


Re: Usage of Hadoop 2.2.0

2015-09-03 Thread Chiwan Park
+1 for dropping Hadoop 2.2.0

Regards,
Chiwan Park

> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi  wrote:
> 
> +1 to what Robert said.
> 
> On Thursday, September 3, 2015, Robert Metzger  wrote:
> I think most cloud providers moved beyond Hadoop 2.2.0.
> Google's Click-To-Deploy is on 2.4.1
> AWS EMR is on 2.6.0
> 
> The situation for the distributions seems to be the following:
> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
> 
> HDP 2.0  (October 2013) is using 2.2.0
> HDP 2.1 (April 2014) uses 2.4.0 already
> 
> So both vendors and cloud providers are multiple releases away from Hadoop 
> 2.2.0.
> 
> Spark does not offer a binary distribution lower than 2.3.0.
> 
> In addition to that, I don't think that the HDFS client in 2.2.0 is really 
> usable in production environments. Users were reporting ArrayIndexOutOfBounds 
> exceptions for some jobs, I also had these exceptions sometimes.
> 
> The easiest approach  to resolve this issue would be  (a) dropping the 
> support for Hadoop 2.2.0
> An alternative approach (b) would be:
>  - ship a binary version for Hadoop 2.3.0
>  - make the source of Flink still compatible with 2.2.0, so that users can 
> compile a Hadoop 2.2.0 version if needed.
> 
> I would vote for approach (a).
> 
> 
> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann  wrote:
> While working on high availability (HA) for Flink's YARN execution I stumbled 
> across some limitations with Hadoop 2.2.0. From version 2.2.0 to 2.3.0, 
> Hadoop introduced new functionality which is required for an efficient HA 
> implementation. Therefore, I was wondering whether there is actually a need 
> to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively used by someone?
> 
> Cheers,
> Till
>