Re: Release to support Hadoop 3

2018-05-11 Thread Boglarka Egyed

Fwd: Release to support Hadoop 3

2018-05-10 Thread Attila Szabó

Re: Release to support Hadoop 3

2018-05-10 Thread Boglarka Egyed

Re: Release to support Hadoop 3

2018-05-10 Thread Dániel Vörös

Re: Release to support Hadoop 3

2018-04-16 Thread Szabolcs Vasas

Re: Release to support Hadoop 3

2018-04-16 Thread Dániel Vörös
Hi All,

I believe we're all on the same page on removing Kite, so I've opened
SQOOP-3313 to track that. @Attila I'm glad to see your interest in the
ORC part. It would be highly appreciated if you could take a look at this
review request[1].

I'm not that familiar with Flume, but it seems they've added NG after
architectural changes and released FlumeNG 1.0 after Flume 0.9.4 [2]. Even
if we go with NG, I'd suggest calling it 3.0, to avoid confusion with
earlier releases.

I think the biggest part of keeping Hadoop 2 (and previous versions of
downstream projects like Hive) supported would be testing against those. It
would also require at least another build profile to build against them,
and probably another layer of abstraction in the code (like Hadoop shims in
Hive).
Not sure about vendors, but I think they're usually not adding new features
to older release lines. In my opinion we should branch off from current
trunk to track the 1.x release line (where we keep supporting Hadoop 2) and
keep adding bugfixes there, but add new features to trunk only and don't
worry about Hadoop 2 there.
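
For illustration only, a minimal sketch of what such an abstraction layer might look like. It assumes one small interface with an implementation per Hadoop line, chosen from the version string. Every name in it (HadoopShim, Hadoop2Shim, Hadoop3Shim, the "example.v2."/"example.v3." key prefixes) is hypothetical and is not an existing Sqoop, Hive or Hadoop API.

```java
// Hypothetical sketch: none of these types or keys exist in Sqoop, Hive or Hadoop.
final class HadoopShims {
    private HadoopShims() {}

    /** Hides differences between Hadoop lines behind one small interface. */
    interface HadoopShim {
        /** Maps a logical setting name to the key used by this Hadoop line. */
        String configKey(String logicalName);
    }

    static final class Hadoop2Shim implements HadoopShim {
        @Override
        public String configKey(String logicalName) {
            return "example.v2." + logicalName; // placeholder prefix, not a real key
        }
    }

    static final class Hadoop3Shim implements HadoopShim {
        @Override
        public String configKey(String logicalName) {
            return "example.v3." + logicalName; // placeholder prefix, not a real key
        }
    }

    /** Picks a shim from the major component of a version string such as "3.0.0". */
    static HadoopShim forVersion(String hadoopVersion) {
        int major = Integer.parseInt(hadoopVersion.split("\\.")[0]);
        switch (major) {
            case 2:
                return new Hadoop2Shim();
            case 3:
                return new Hadoop3Shim();
            default:
                throw new IllegalArgumentException(
                        "Unsupported Hadoop version: " + hadoopVersion);
        }
    }

    public static void main(String[] args) {
        // The rest of the code would only ever talk to the HadoopShim interface.
        HadoopShim shim = forVersion("3.0.0");
        System.out.println(shim.configKey("queue.name"));
    }
}
```

The cost mentioned above is visible even in this toy version: every behavioural difference needs its own method on the interface, and each implementation has to be built and tested against its own Hadoop line.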

I agree with Attila on the dependencies. We shouldn't release based on
non-final releases. We might bump the dependencies to some alpha/beta
during development, but we must not forget to move to the final version in the
end.
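
As a rough sketch of how this rule could be checked before a release, the snippet below flags version strings that carry a pre-release qualifier. The qualifier list (alpha, beta, rc, snapshot) and the class name ReleaseVersionCheck are assumptions made for this example, not a Maven, Gradle or Apache rule, and the version strings are just sample inputs.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the qualifier list below is an assumption for this sketch.
final class ReleaseVersionCheck {
    private ReleaseVersionCheck() {}

    // Qualifiers treated as "non-final" here, matched case-insensitively.
    private static final Pattern NON_FINAL =
            Pattern.compile("(alpha|beta|rc|snapshot)", Pattern.CASE_INSENSITIVE);

    /** Returns true when the version string carries none of the qualifiers above. */
    static boolean isFinal(String version) {
        return !NON_FINAL.matcher(version).find();
    }

    public static void main(String[] args) {
        // Sample dependency versions; replace with the ones from the build file.
        String[] versions = {"3.0.0", "2.0.0-alpha4", "3.1.0-SNAPSHOT"};
        for (String v : versions) {
            System.out.println(v + " final? " + isFinal(v));
        }
    }
}
```

A check like this could run as part of the release checklist, so that an alpha or beta bump made during development cannot slip into a release candidate unnoticed.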

+1 for Bogi as release manager.

Regards,
Daniel

[1] https://reviews.apache.org/r/66548/
[2] https://blogs.apache.org/flume/entry/flume_ng_architecture

On Fri, Apr 13, 2018 at 5:24 PM Szabó Attila <mau...@inf.elte.hu> wrote:

>
>
> Hello everyone,
>
>
> I'd like to also attach my thoughts:
>
>
> New Sqoop version: Last time when I'd the chance to talk about this with
> some of the PMC members (e.g. Jarcec, Kate ) we've been on the front to
> create Sqoop-NG (NG == Next Generation), quite the same what the Flume
> community did (and AFAIK from Mike Percy it's been a quite successful act
> from their POV). Don't get me wrong, I'm totally NOT against 3.0, though
> IMHO Sqoop-NG 1.0 would be a better choice.
>
>
> Kite: I would totally split this effort into two subtasks. First I would
> get in contact with the Parquet team, and would create a KITE independent
> execution path in Sqoop for the Parquet backed tables (Hive/Impala/etc.).
> As a part of this effort I would also add direct support for ORC format (in
> the past few years I've found it very useful in several different
> situation, and usually it's quite inconvenient that Sqoop does not support
> it "out of the box").
>
> As the second subtask I would start to remove every KITE based dependency
> (but according to my gut feeling it could break the codebase on too many
> places, and might not be that EZ to succeed on that front).
>
>
> Hadoop 2:
>
> Could anyone please highlight me what would be the pros/cons on this
> front? AFAIK several vendors (including Cloudera, Hortonworks, MapR, EMR,
> etc.) are still supporting Hadoop 2, and according to my best knowledge
> most of the userbase are connected to their releases, so I'd like to
> provide the chance for those users to use the newest features of Sqoop,
> thus I would vote for the compatibility for a bit more time/versions.
>
>
> Dependencies:
>
> I'd like to cast my very direct and LOUD vote against any alpha
> dependencies (including HBase or anything else!). IMHO Sqoop is already a
> stable component of the Apache Foundation, and the users can depend on it,
> thus I'd like to avoid any kind of "immature" dependency related issues. Of
> course this is also just my solo opinion, but as a community I think we
> must not undermine our stability.
>
> On the other fronts I totally agree and +1 with the planned efforts,
>
> Best regards,
> Attila
>
> 
> From: Szabolcs Vasas <va...@apache.org>
> Sent: Friday, April 13, 2018 3:43 PM
> To: dev@sqoop.apache.org
> Subject: Re: Release to support Hadoop 3
>
> Hi all,
>
> I also think that completely eliminating the Kite dependency from Sqoop
> would be the easiest way of going forward, I will try to analyze this topic
> a bit more next week and come up with subtasks so we could work on it in
> parallel potentially.
>
> I am happy with the Sqoop 3.0 scope proposal too and Bogi being the release
> manager of it.
>
> Szabolcs
>
>
> On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org> wrote:
>
> > Hi Daniel et al,
> >
> > Thanks for bringing up this topic and the detailed status update.
> >
> > I am sharing my thoughts point by point, please find them below.
> >
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> > > dependency altogether (as Szabolcs hinted in comments of SQOOP-317

Re: Release to support Hadoop 3

2018-04-13 Thread Szabó Attila


Hello everyone,


I'd like to also attach my thoughts:


New Sqoop version: The last time I had the chance to talk about this with some
of the PMC members (e.g. Jarcec, Kate), we were leaning towards creating
Sqoop-NG (NG == Next Generation), much as the Flume community did
(and AFAIK from Mike Percy it was quite a successful move from their POV).
Don't get me wrong, I'm totally NOT against 3.0, though IMHO Sqoop-NG 1.0 would
be a better choice.


Kite: I would split this effort into two subtasks. First, I would get in
contact with the Parquet team and create a Kite-independent execution
path in Sqoop for Parquet-backed tables (Hive/Impala/etc.). As part of
this effort I would also add direct support for the ORC format (in the past few
years I've found it very useful in several different situations, and it's
usually quite inconvenient that Sqoop does not support it "out of the box").

As the second subtask I would start removing every Kite-based dependency (but
my gut feeling is that this could break the codebase in too many places, and
it might not be that easy to succeed on that front).


Hadoop 2:

Could anyone please highlight for me what the pros/cons would be on this front?
AFAIK several vendors (including Cloudera, Hortonworks, MapR, EMR, etc.)
still support Hadoop 2, and to the best of my knowledge most of the
user base is tied to their releases, so I'd like to give those users the
chance to use the newest features of Sqoop. Thus I would vote for keeping
compatibility for a bit more time/versions.


Dependencies:

I'd like to cast my very direct and LOUD vote against any alpha dependencies
(including HBase or anything else!). IMHO Sqoop is already a stable component
of the Apache Foundation, and users can depend on it, so I'd like to
avoid any kind of "immature" dependency-related issues. Of course this is
just my own opinion, but as a community I think we must not undermine our
stability.

On the other fronts I totally agree with, and +1, the planned efforts.

Best regards,
Attila


From: Szabolcs Vasas <va...@apache.org>
Sent: Friday, April 13, 2018 3:43 PM
To: dev@sqoop.apache.org
Subject: Re: Release to support Hadoop 3

Hi all,

I also think that completely eliminating the Kite dependency from Sqoop
would be the easiest way of going forward, I will try to analyze this topic
a bit more next week and come up with subtasks so we could work on it in
parallel potentially.

I am happy with the Sqoop 3.0 scope proposal too and Bogi being the release
manager of it.

Szabolcs


On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed <b...@apache.org> wrote:

> Hi Daniel et al,
>
> Thanks for bringing up this topic and the detailed status update.
>
> I am sharing my thoughts point by point, please find them below.
>
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
>
>
> I think making a new Kite release would be a huge effort as it would
> require upgrading the versions, making the necessary code modifications,
> testing it thoroughly, etc. then making the release itself meanwhile Kite
> is a very passively handled tool having minimal activity on it thus it
> would definitely mean a lot of effort to get it done. It would have a
> dependency on Solr community too as the Morphlines module of Kite is
> heavily used and somewhat actively developed by them. Also indeed there is
> a shorter/longer term goal to get rid of Kite dependency in Sqoop entirely,
> i.e. all release efforts would become throw-away very soon.
>
> Focusing on the Kite removal seems to be more reasonable to me. However it
> would be great to see an estimation regarding this effort, @Szabolcs could
> you maybe share your thoughts on this?
>
> 2) Should we drop support for Hadoop 2?
> >
>
> I think we can drop support for Hadoop 2 especially if we use
> straightforward versioning with the new release.
>
>
> > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
> >
>
> I like this idea, +1 for making a 3.0 release containing these changes.
>
>
> > 4) Does (should?) this affect the 1.5 release?
>
>
> I think the answer is yes. Currently the following breaking changes are on
> the horizon which could be part of a next Sqoop release:
> * com.cloudera package removal (done)
> * Gradle introduction (in progress)
> * Hadoop/Hive/HBase version upgrade (in progress)
> * Kite deprecation/removal (planned)
> * Bump Java version to 8 (planned)
>
> Looking at this list I would say that making a Sqoop 1.5 release containing
> only the com.cloudera package removal, the Gradle introduction and the Java
> version bu

Re: Release to support Hadoop 3

2018-04-13 Thread Szabolcs Vasas
Hi all,

I also think that completely eliminating the Kite dependency from Sqoop
would be the easiest way of going forward. I will try to analyze this topic
a bit more next week and come up with subtasks so we can potentially work on
it in parallel.

I am happy with the Sqoop 3.0 scope proposal too, and with Bogi being its
release manager.

Szabolcs


On Fri, Apr 13, 2018 at 2:37 PM, Boglarka Egyed  wrote:

> Hi Daniel et al,
>
> Thanks for bringing up this topic and the detailed status update.
>
> I am sharing my thoughts point by point, please find them below.
>
> > 1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
>
>
> I think making a new Kite release would be a huge effort, as it would
> require upgrading the versions, making the necessary code modifications,
> testing it thoroughly, etc., and then making the release itself. Meanwhile,
> Kite is a very passively maintained tool with minimal activity on it. It
> would also depend on the Solr community, as the Morphlines module of Kite
> is heavily used and somewhat actively developed by them. Also, there is
> indeed a shorter/longer term goal to get rid of the Kite dependency in Sqoop
> entirely, i.e. all release efforts would become throw-away very soon.
>
> Focusing on the Kite removal seems more reasonable to me. However, it
> would be great to see an estimate for this effort. @Szabolcs, could
> you maybe share your thoughts on this?
>
> > 2) Should we drop support for Hadoop 2?
> >
>
> I think we can drop support for Hadoop 2, especially if we use
> straightforward versioning with the new release.
>
>
> > 3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
> >
>
> I like this idea, +1 for making a 3.0 release containing these changes.
>
>
> > 4) Does (should?) this affect the 1.5 release?
>
>
> I think the answer is yes. Currently the following breaking changes are on
> the horizon and could be part of the next Sqoop release:
> * com.cloudera package removal (done)
> * Gradle introduction (in progress)
> * Hadoop/Hive/HBase version upgrade (in progress)
> * Kite deprecation/removal (planned)
> * Bump Java version to 8 (planned)
>
> Looking at this list, I would say that a Sqoop 1.5 release containing
> only the com.cloudera package removal, the Gradle introduction and the Java
> version bump would have a rather small scope that is not very relevant from
> a user perspective, so having two releases (1.5 and 3.0) might be a little
> bit overkill. I would instead suggest going with a Sqoop 3.0 release
> containing all the changes listed above. What do you think?
>
> Summarizing it up I see the following dependencies for a next Sqoop release
> currently:
> * Finishing up the Gradle patch
> * Hive 3 release
> * Kite removal - this could be the next common effort in the community
>
> In any case, I would be happy to take the Release Manager role for the
> next release. Please let me know if everyone is OK with that.
>
> I am looking forward to hearing others' thoughts on this too.
>
> Many thanks,
> Bogi
>
> On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös wrote:
>
> > Dear All,
> >
> > After some development towards supporting Hadoop 3 (and the latest versions
> > of downstream components) I'd like to summarize the current state of the
> > upgrade and start the conversation about releasing a new version of Sqoop
> > with Hadoop 3 support.
> >
> > Here's what happened so far:
> >  - Upgraded Hadoop dependency to 3.0.0
> >  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
> >  - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
> >  - Dealt with a bunch of minor issues like changed Hadoop configuration
> > names and different packaging of Maven artifacts.
> >
> > For details please refer to this ticket and the attached review request:
> > https://issues.apache.org/jira/browse/SQOOP-3305
> >
> > Remaining work:
> >  - Parquet importing doesn't work. It was broken by a standalone-metastore
> > change in Hive, and fixing it would require a new Kite version to be built
> > against Hive 3.
> >  - Hive 3 is going to enable ACID tables by default. We should support
> > importing into these. Details:
> > https://issues.apache.org/jira/browse/SQOOP-3311
> >
> > Other blocking issues:
> >  - There's no Hive 3 release (no alpha/beta) yet.
> >
> > I'd like to kindly ask you all to share any other tasks/issues you know of
> > that we should address to support the latest versions. Also, there are a
> > couple of open questions:
> >  1) How to get a new Kite release? Maybe we should remove the Kite
> > dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
> >  2) Should we drop support for Hadoop 2?
> >  3) What version number should we use? To avoid confusion with Sqoop2 I'd
> > go with 3.0.
> >  

Re: Release to support Hadoop 3

2018-04-13 Thread Boglarka Egyed
Hi Daniel et al,

Thanks for bringing up this topic and the detailed status update.

I am sharing my thoughts point by point; please find them below.

> 1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?


I think making a new Kite release would be a huge effort: it would require
upgrading the versions, making the necessary code modifications, testing
thoroughly, and then making the release itself. Meanwhile, Kite is a very
passively maintained tool with minimal activity on it, so it would
definitely take a lot of effort to get it done. It would also depend on the
Solr community, as the Morphlines module of Kite is heavily used and
somewhat actively developed by them. There is also a goal, in the shorter or
longer term, to get rid of the Kite dependency in Sqoop entirely, so all
release efforts would become throw-away work very soon.

Focusing on the Kite removal seems more reasonable to me. However, it
would be great to see an estimate of this effort. @Szabolcs, could you
maybe share your thoughts on this?

> 2) Should we drop support for Hadoop 2?
>

I think we can drop support for Hadoop 2, especially if we use
straightforward versioning with the new release.


> 3) What version number should we use? To avoid confusion with Sqoop2 I'd go
> with 3.0.
>

I like this idea, +1 for making a 3.0 release containing these changes.


> 4) Does (should?) this affect the 1.5 release?


I think the answer is yes. Currently, the following breaking changes are on
the horizon and could be part of the next Sqoop release:
* com.cloudera package removal (done)
* Gradle introduction (in progress)
* Hadoop/Hive/HBase version upgrade (in progress)
* Kite deprecation/removal (planned)
* Bump Java version to 8 (planned)

Looking at this list, I would say that a Sqoop 1.5 release containing only
the com.cloudera package removal, the Gradle introduction and the Java
version bump would have a rather small scope that is of little relevance
from a user perspective, so having two releases (1.5 and 3.0) would be a
bit of overkill. I would instead suggest going with a Sqoop 3.0 release
containing all the changes listed above. What do you think?

To summarize, I currently see the following dependencies for the next
Sqoop release:
* Finishing up the Gradle patch
* Hive 3 release
* Kite removal - this could be the next common effort in the community

In any case, I would be happy to take the Release Manager role for the
next release. Please let me know if everyone is OK with that.

I am looking forward to hearing others' thoughts on this too.

Many thanks,
Bogi

On Thu, Apr 12, 2018 at 5:17 PM, Dániel Vörös wrote:

> Dear All,
>
> After some development towards supporting Hadoop 3 (and the latest versions
> of downstream components) I'd like to summarize the current state of the
> upgrade and start the conversation about releasing a new version of Sqoop
> with Hadoop 3 support.
>
> Here's what happened so far:
>  - Upgraded Hadoop dependency to 3.0.0
>  - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
>  - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
>  - Dealt with a bunch of minor issues like changed Hadoop configuration
> names and different packaging of Maven artifacts.
>
> For details please refer to this ticket and the attached review request:
> https://issues.apache.org/jira/browse/SQOOP-3305
>
> Remaining work:
>  - Parquet importing doesn't work. It was broken by a standalone-metastore
> change in Hive, and fixing it would require a new Kite version to be built
> against Hive 3.
>  - Hive 3 is going to enable ACID tables by default. We should support
> importing into these. Details:
> https://issues.apache.org/jira/browse/SQOOP-3311
>
> Other blocking issues:
>  - There's no Hive 3 release (no alpha/beta) yet.
>
> I'd like to kindly ask you all to share any other tasks/issues you know of
> that we should address to support the latest versions. Also, there are a
> couple of open questions:
>  1) How to get a new Kite release? Maybe we should remove the Kite
> dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
>  2) Should we drop support for Hadoop 2?
>  3) What version number should we use? To avoid confusion with Sqoop2 I'd
> go with 3.0.
>  4) Does (should?) this affect the 1.5 release?
>
> Regards,
> Daniel
>


Release to support Hadoop 3

2018-04-12 Thread Dániel Vörös
Dear All,

After some development towards supporting Hadoop 3 (and the latest versions
of downstream components) I'd like to summarize the current state of the
upgrade and start the conversation about releasing a new version of Sqoop
with Hadoop 3 support.

Here's what happened so far:
 - Upgraded Hadoop dependency to 3.0.0
 - Hive had to be upgraded, since old Hive didn't work with Hadoop 3.
 - HBase had to be upgraded, since Hive 3 depends on HBase 2 (alpha).
 - Dealt with a bunch of minor issues like changed Hadoop configuration
names and different packaging of Maven artifacts.

For details please refer to this ticket and the attached review request:
https://issues.apache.org/jira/browse/SQOOP-3305

Remaining work:
 - Parquet importing doesn't work. It was broken by a standalone-metastore
change in Hive, and fixing it would require a new Kite version to be built
against Hive 3.
 - Hive 3 is going to enable ACID tables by default. We should support
importing into these. Details:
https://issues.apache.org/jira/browse/SQOOP-3311

Other blocking issues:
 - There's no Hive 3 release (no alpha/beta) yet.

I'd like to kindly ask you all to share any other tasks/issues you know of
that we should address to support the latest versions. Also, there are a
couple of open questions:
 1) How to get a new Kite release? Maybe we should remove the Kite
dependency altogether (as Szabolcs hinted in comments of SQOOP-3171)?
 2) Should we drop support for Hadoop 2?
 3) What version number should we use? To avoid confusion with Sqoop2 I'd
go with 3.0.
 4) Does (should?) this affect the 1.5 release?

Regards,
Daniel