Re: Hadoop Metrics2 and JMX
Thank you all very much for the responses!

On Wed, Oct 12, 2022 at 2:06 PM Dave Marion wrote:
> Looking at [1], specifically the overview section, I think they are the
> same metrics, just accessible via JMX instead of configuring a sink.
>
> [1] https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/metrics2/package-summary.html
Re: Hadoop Metrics2 and JMX
Looking at [1], specifically the overview section, I think they are the
same metrics, just accessible via JMX instead of configuring a sink.

[1] https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/metrics2/package-summary.html

On Wed, Oct 12, 2022 at 1:39 PM Christopher wrote:
> I don't think we're doing anything special to publish to JMX. I think
> this is something that is a feature of Hadoop Metrics2 that we're simply
> enabling. So, this might be a question for the Hadoop general mailing
> list if nobody knows the answer here.
Re: Hadoop Metrics2 and JMX
I don't think we're doing anything special to publish to JMX. I think this
is something that is a feature of Hadoop Metrics2 that we're simply
enabling. So, this might be a question for the Hadoop general mailing list
if nobody knows the answer here.

On Wed, Oct 12, 2022 at 1:06 PM Logan Jones wrote:
> Hello:
>
> I'm trying to figure out more about the metrics coming out of Accumulo
> 1.9.3 and 1.10.2. I'm currently configuring the Hadoop Metrics2 system
> and sending that to InfluxDB. In theory, I could also look at the JMX
> metrics.
>
> Are the JMX metrics a superset of what comes out of Hadoop Metrics2?
>
> Thanks in advance,
>
> - Logan
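For readers finding this thread later: a Metrics2 sink of the kind Logan
describes is wired up in a hadoop-metrics2 properties file. The sketch
below is illustrative only -- the file name and the `accumulo` prefix are
assumptions based on how Metrics2 is conventionally configured (the prefix
must match what the process registers), and FileSink is simply the easiest
bundled sink to demonstrate with; check the example config shipped with
your Accumulo version for the real names.

```properties
# hadoop-metrics2-accumulo.properties -- file name and "accumulo" prefix
# are assumptions; match them to what your Accumulo release actually uses.

# Route all metrics records from the "accumulo" prefix to a file sink.
accumulo.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
accumulo.sink.file.filename=accumulo-metrics.out

# How often (in seconds) the metrics system snapshots and flushes sources.
accumulo.period=10
```

Whether or not a sink is configured, Metrics2 also registers its sources
as MBeans, which is consistent with the docs' statement above that the JMX
view and the sink view carry the same metrics.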
Re: Hadoop
Well, I wouldn't be surprised to see some issues across a fedup (which is
a way to do an upgrade between Fedora versions), but it should have been
stable with normal/routine yum/dnf upgrades. Were you using the
Fedora-provided packages, or the BigTop ones? Or another set?

On Thu, Jun 2, 2016 at 5:04 PM Corey Nolet wrote:
> This may not be directly related, but I've noticed Hadoop packages have
> not been uninstalling/updating well the past year or so. The last couple
> times I've run fedup, I've had to go back in manually and remove/update
> a bunch of the Hadoop packages like Zookeeper and Parquet.
Re: Hadoop
This may not be directly related, but I've noticed Hadoop packages have
not been uninstalling/updating well the past year or so. The last couple
times I've run fedup, I've had to go back in manually and remove/update a
bunch of the Hadoop packages like Zookeeper and Parquet.

On Thu, Jun 2, 2016 at 4:59 PM, Christopher wrote:
> That first post was intended for the Fedora developer list. Apologies
> for sending to the wrong list.
Re: Hadoop
That first post was intended for the Fedora developer list. Apologies for
sending to the wrong list.

If anybody is curious, it seems the Fedora community support around Hadoop
and Big Data is really dying... the packager for Flume and HTrace has
abandoned their efforts to package for Fedora, and now it looks like the
Hadoop package maintainer abandoned Hadoop, leaving Accumulo with
unsatisfied dependencies. This is actually kind of a sad state of affairs,
because better packaging downstream could really help users, and expose
more ways to improve the upstream products.

As it stands, I think there is a disconnect between the upstream
communities and the downstream packagers in the Big Data space, which
includes Accumulo. I would love to see more interest in better packaging
for downstream users through these existing downstream packager
communities (Homebrew, Fedora, Debian, EPEL, Ubuntu, etc.), and I would
love to see more volunteers come from these downstream communities to make
improvements upstream.

As an upstream community, I believe the responsibility is for us to reach
down first, rather than wait for them to come to us. I've tried to do that
within Fedora, with the hope that others would follow for the downstream
communities they care about. Unfortunately, things haven't turned out how
I'd have preferred, but I'm still hopeful. If there is anybody interested
in downstream community packaging, let me know if I can help you get
started.

On Thu, Jun 2, 2016 at 4:28 PM Christopher wrote:
> Sorry, wrong list.
Re: Hadoop
Sorry, wrong list.

On Thu, Jun 2, 2016 at 4:20 PM Christopher wrote:
> So, it would seem at some point, without me noticing (certainly my
> fault, for not paying attention enough), the Hadoop packages got
> orphaned and/or retired? in Fedora.
>
> This is a big problem for me, because the main package I work on is
> dependent upon Hadoop.
>
> What's the state of Hadoop in Fedora these days? Are there packaging
> problems? Not enough support from the upstream Apache community? Missing
> dependencies in Fedora? Not enough time to work on it? No interest from
> users?
>
> Whatever the issue is... I'd like to help wherever I can... I'd like to
> keep this stuff going.
Re: Hadoop Summit 2015 Talk
+3

v/r

Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714

On 2/12/15, 11:13 PM, Josh Elser josh.el...@gmail.com wrote:
> FYI -- Billie and I have submitted a talk to Hadoop Summit 2015 in San
> Jose, CA in June.
>
> http://hadoopsummit.uservoice.com/forums/283260-committer-track/suggestions/7073993-a-year-in-the-life-of-apache-accumulo
>
> I'd be overjoyed if anyone would vote for the talk if they'd like to see
> it happen. Thanks!
>
> - Josh
Re: Hadoop Summit (San Jose June 3-5)
Just announced: an Accumulo Birds of a Feather session at the Hadoop
Summit: http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/179840512/

It looks like we have an hour and a half; exact schedule TBD. Feel free to
contact me if there is any particular content you'd like to see at this
session.

Billie

On Mon, Apr 28, 2014 at 8:52 AM, Donald Miner dmi...@clearedgeit.com wrote:
> I'll be there. Is there interest in having an accumulo meetup like last
> year? Adam/Billie?
>
> On Mon, Apr 28, 2014 at 11:50 AM, Marc Reichman
> mreich...@pixelforensics.com wrote:
>> Will anyone be there? I wouldn't mind meeting up for a drink, talk
>> about Accumulo, projects, etc. Looking forward to coming to my first
>> Hadoop-based conference!
>>
>> Marc
>
> --
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The main thing is that I would not want to see an ACCUMULO-1790 *without*
ACCUMULO-1795. Having 1792 alone would be insufficient for me.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, Nov 12, 2013 at 9:22 AM, Sean Busbey bus...@clouderagovt.com wrote:
> On Fri, Oct 18, 2013 at 12:29 AM, Sean Busbey bus...@cloudera.com wrote:
>> On Tue, Oct 15, 2013 at 10:20 AM, Sean Busbey bus...@cloudera.com wrote:
>>> On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:
>>>> On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:
>>>>> Just to be clear, we are talking about adding profile support to the
>>>>> pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We
>>>>> are not talking about changing the default build profile for these
>>>>> branches, are we?
>>>>
>>>> For 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop
>>>> 2.2.0. I am not suggesting we change the default from building
>>>> against Hadoop 0.23.203.
>>>
>>> I mean 0.20.203.0. Ugh, Hadoop versions.
>>
>> Okay, barring additional suggestions, tomorrow afternoon I'll break
>> things down into an umbrella and 3 sub tasks:
>>
>> 1) addition of hadoop 2 support
>>    - to include backports of commits
>>    - to include making the target hadoop 2 version 2.2.0
>>    - to include test changes that flex hadoop 2 features like fail over
>> 2) ensuring compatibility for 0.20.203
>>    - presuming some subset of the commits in 1) will break it since
>>      0.20 support was left behind in 1.5
>> 3) doc / packaging updates
>>    - the issue of binary releases per distro
>>    - doc patch for what version(s) the release tests are expected to
>>      run against
>>
>> Once work is put against those tickets, I'd expect things to go into a
>> branch based on the umbrella ticket until such time as the complete
>> work can pass the test suite that we'll use at the next release. Then
>> it can get rebased onto the 1.4.x dev branch.
>>
>> --
>> Sean
>
> Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to
> resurrect this thread to make sure everyone's concerns are addressed.
> For context, here's a link to the start of the last thread:
> http://bit.ly/1aPqKuH
>
> From ACCUMULO-1792, ctubbsii:
>
>> I'd be reluctant to support any Hadoop 2.x support in the 1.4 release
>> line that breaks compatibility with 0.20. I don't think breaking 0.20
>> and then possibly fixing it again as a second step is acceptable
>> (because that subsequent work may not ever be done, and I don't think
>> we should break the compatibility contract that we've established with
>> 1.4.0).
>
> Chris, I believe keeping all of the work in a branch under the umbrella
> jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4
> release that doesn't have proper support for 0.20.203. Is there
> something beyond making sure the branch passes a full set of release
> tests on 0.20.203 that you'd like to see? In the event that the branch
> only ever contains the work for adding Hadoop 2, it's a simple matter to
> abandon without rolling into the 1.4 development line.
>
> From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii):
>
>> I'm very uncomfortable with risking breaking continuity in such an old
>> release, and I don't think managing two lines of 1.4 releases is worth
>> the effort. Though we have no official EOL policy, 1.3 was practically
>> dead in the water once 1.4 was around, and I hope we start encouraging
>> more adoption of 1.5 (and soon 1.6) versus continually propping up 1.4.
>
> I'd love to get people to move off of 1.4. However, I think adding
> Hadoop 2 support to 1.4 encourages this more than leaving it out.
> Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
> surprised people find relying on 0.20 for the 1.5 WAL intimidating.
> Upgrading both HDFS and Accumulo across major versions at once is asking
> them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
> allow them to break the risk up into steps: they can upgrade HDFS
> versions first, get comfortable, then upgrade Accumulo to 1.5.
>
> I think the existing tickets under the umbrella of ACCUMULO-1790 should
> ensure that we end up with a single 1.4 line that can work with either
> the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or
> Josh or Chris), is there stronger language you'd like to see around docs
> / packaging (area #3 in the original plan and currently ACCUMULO-1796)?
> Maybe expressly only doing a binary convenience package for 0.20.203.0?
> Are you looking for something beyond a full release suite to ensure 1.4
> is still maintaining compatibility on Hadoop 0.20.203?
>
> -Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 4:49 PM, Sean Busbey busbey...@clouderagovt.com wrote:
> On Tue, Nov 12, 2013 at 3:14 PM, William Slacum
> wilhelm.von.cl...@accumulo.net wrote:
>> The language of ACCUMULO-1795 indicated that an acceptable state was
>> something that wasn't binary compatible. That's my #1 thing to avoid.
>
> Ah. So I see; not sure why I phrased it that way. Since the default
> build should still be 0.20.203.0, I'm not sure how it'd end up not being
> binary compatible. I can update the ticket to clarify the language. Any
> need to compile should be limited to running Hadoop 2.2.0. Sound good?

+1 (The confusing wording was the basis for my concerns also.)

>>> Maybe expressly only doing a binary convenience package for 0.20.203.0?
>>
>> If we need an extra package, doesn't that mean a user can't just
>> upgrade Accumulo?
>
> By binary convenience package I mean the binary distribution tarball (or
> rpms, or whatevs) that we make as a part of the release process. For
> users of Hadoop 0.20.203.0, upgrading should be unchanged from how they
> would normally get their Accumulo 1.4.x distribution. ACCUMULO-1796 has
> some leeway about the convenience packages for people who want Hadoop 2
> support. On the extreme end, they'd have to build from source and then
> run a normal upgrade process.

I'd prefer binary compatibility with a single build, but if that's too
hard to achieve, I have no objection to providing a mechanism to perform
an alternate build against 2.x (whether or not we provide a pre-built
binary package for it), so long as the default build is 0.20.x.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote:
> The main thing is that I would not want to see an ACCUMULO-1790
> *without* ACCUMULO-1795. Having 1792 alone would be insufficient for me.

That is precisely the intention of ACCUMULO-1790. All of the subtasks
(including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things
to get into the 1.4 branch. Until that time the work would just go into a
feature branch for ACCUMULO-1790 (to make working and testing easier for
those implementing the subtasks). If you wanted to see the full
implementation, you would just wait until all of the subtasks were
committed to the feature branch.

Am I missing something?

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Nope, I think we're on the same page now.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Thu, Nov 14, 2013 at 7:39 PM, Sean Busbey busbey...@clouderagovt.com wrote:
> That is precisely the intention of ACCUMULO-1790. All of the subtasks
> (including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for
> things to get into the 1.4 branch.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
> I'd love to get people to move off of 1.4. However, I think adding
> Hadoop 2 support to 1.4 encourages this more than leaving it out.

I'm not sure I agree that adding Hadoop 2 support to 1.4 encourages people
to upgrade Accumulo. My gut reaction would be that it allows people to
completely ignore Accumulo updates (ignoring moving to 1.4.5, which would
allow them to do Hadoop 2 with your proposed changes).

> Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
> surprised people find relying on 0.20 for the 1.5 WAL intimidating.
> Upgrading both HDFS and Accumulo across major versions at once is asking
> them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
> allow them to break the risk up into steps: they can upgrade HDFS
> versions first, get comfortable, then upgrade Accumulo to 1.5.

Personally, maintaining 0.20 compatibility is not a big concern on my
radar. If you're still running an 0.20 release, I'd *really* hope that you
have an upgrade path to 1.2.x (if not 2.2.x) scheduled.

I think claiming that 1.5 places a higher burden than 1.4 is a bit of a
fallacy. There were many problems and pains regarding WALs in <=1.4 that
are very difficult to work with in a large environment (try finding WALs
in server failure cases). I think the increased I/O on HDFS is a much
smaller cost than the completely different I/O path that the old loggers
have. I also think upgrading Accumulo is much less scary than upgrading
HDFS, but that's just me.

To me, it seems like the argument may be coming down to whether or not we
break 0.20 Hadoop compatibility on a bug-fix release and how concerned we
are about letting users lag behind the upstream development.

> I think the existing tickets under the umbrella of ACCUMULO-1790 should
> ensure that we end up with a single 1.4 line that can work with either
> the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or
> Josh or Chris), is there stronger language you'd like to see around docs
> / packaging (area #3 in the original plan and currently ACCUMULO-1796)?
> Maybe expressly only doing a binary convenience package for 0.20.203.0?
> Are you looking for something beyond a full release suite to ensure 1.4
> is still maintaining compatibility on Hadoop 0.20.203?

Again, my biggest concern here is not following our own guidelines of
breaking changes across minor releases, but I'd hope 0.20 users have an
upgrade path outlined for themselves.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
A user of 1.4.a should be able to move to 1.4.b without any major
infrastructure changes, such as swapping out HDFS or installing extra
add-ons. I don't find much merit in debating local WAL vs HDFS WAL
cost/benefit since the only quantifiable evidence we have supported the
move.

I should note, Sean, that if you see merit in the work, you don't need
community approval for forking and sharing. However, I do not think it is
in the community's best interest to continue to upgrade 1.4.

On Tue, Nov 12, 2013 at 2:12 PM, Josh Elser josh.el...@gmail.com wrote:
> I'm not sure I agree that adding Hadoop 2 support to 1.4 encourages
> people to upgrade Accumulo. My gut reaction would be that it allows
> people to completely ignore Accumulo updates (ignoring moving to 1.4.5,
> which would allow them to do Hadoop 2 with your proposed changes).
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote:
> Again, my biggest concern here is not following our own guidelines of
> breaking changes across minor releases, but I'd hope 0.20 users have an
> upgrade path outlined for themselves.

The plan outlined in the original thread, and in the subtasks under
ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in
the 1.4 bugfix line. If there's anything we can do besides running through
the release test suite on a 0.20 cluster to help ensure that, I am
interested in adding it to the existing plan.

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:28 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
> A user of 1.4.a should be able to move to 1.4.b without any major
> infrastructure changes, such as swapping out HDFS or installing extra
> add-ons.

Right, exactly. Hopefully no part of the original plan contradicts this.
Is there something that appears to?

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On 11/12/13, 12:24 PM, Sean Busbey wrote:
> The plan outlined in the original thread, and in the subtasks under
> ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in
> the 1.4 bugfix line. If there's anything we can do besides running
> through the release test suite on a 0.20 cluster to help ensure that, I
> am interested in adding it to the existing plan.

What about the other half: encouraging users to lag (soon to be) two major
releases behind?
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The language of ACCUMULO-1795 indicated that an acceptable state was something that wasn't binary compatible. That's my #1 thing to avoid. Maybe expressly only doing a binary convenience package for 0.20.203.0? If we need an extra package, doesn't that mean a user can't just upgrade Accumulo? As a side note, 0.20.203.0 is 1.4, On Tue, Nov 12, 2013 at 3:28 PM, Sean Busbey busbey...@clouderagovt.comwrote: On Tue, Nov 12, 2013 at 1:28 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: A user of 1.4.a should be able to move to 1.4.b without any major infrastructure changes, such as swapping out HDFS or installing extra add-ons. Right, exactly. Hopefully no part of the original plan contradicts this. Is there something that appears to? -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 2:48 PM, Josh Elser josh.el...@gmail.com wrote: What about the other half: encouraging users to lag (soon to be) two major releases behind? I don't think our current user base needs to be encouraged strongly to upgrade. And as I said previously I think this change provides them with an upgrade path that's easier to stomach, but I suspect this is a point we disagree on. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Monday, October 14, 2013 11:57:40 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch Thanks for the note, Ted. That vote is for 2.2.0, not -beta. On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run. [1]
[1]: http://accumulo.apache.org/governance/releasing.html#testing
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this.
For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I'm not sure about the change to 1.5.1-SNAPSHOT. I believe we're talking about changing the hadoop.profile for 2.0 to use the 2.2.0 release. I don't think it makes sense to change the default off of the version in the hadoop.profile for 1.0. Presumably this change would also happen in master. Now that Hadoop 2.x is going to have a GA release, I think it makes sense to have a discussion about changing the default to be the hadoop 2.0 profile for master, but this is not that discussion. -- Sean
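[For context, the mechanics under discussion are ordinary Maven profiles keyed off a hadoop.profile property. A minimal sketch of what such a pom change could look like — illustrative only; the actual 1.4/1.5 poms differ in structure and also swap dependency sets per profile:]

```xml
<!-- Hypothetical sketch, not the actual patch. The default profile keeps the
     0.20 line; passing -Dhadoop.profile=2.0 switches to a Hadoop 2 release. -->
<profiles>
  <profile>
    <id>hadoop-default</id>
    <activation>
      <!-- active when no hadoop.profile property is set -->
      <property><name>!hadoop.profile</name></property>
    </activation>
    <properties>
      <hadoop.version>0.20.203.0</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <property><name>hadoop.profile</name><value>2.0</value></property>
    </activation>
    <properties>
      <!-- the bump being proposed: 2.0.4-alpha -> 2.2.0 GA -->
      <hadoop.version>2.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

[Under this shape, the default build is untouched and a Hadoop 2 build is opt-in, which matches Sean's point that only the version inside the 2.0 profile changes.]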
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I mean 0.20.203.0. Ugh, Hadoop versions. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I think you meant: Ugh, Hadoop versions.[1] [1] http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/ On Tue, Oct 15, 2013 at 11:20 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I mean 0.20.203.0. Ugh, Hadoop versions. -- Sean -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application.
=requirements
Here's the requirements I have from the last thread:
1) Maintain existing 1.4 compatibility
The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep.
2) Gain Hadoop 2 support
At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither.
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run.
[1] [1]: http://accumulo.apache.org/governance/releasing.html#testing
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.)
[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
-- Sean
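[The "4b" option discussed above amounts to publishing per-version build instructions rather than shipping one binary per Hadoop release. A hedged sketch of what those instructions might look like, assuming the 1.5-style hadoop.profile switch were backported to 1.4 — the exact flags are illustrative, not the branch's actual build options:]

```
# Illustrative only; flags follow the 1.5-style hadoop.profile convention
# and may differ on the 1.4 branch.

# Build against the default Hadoop 1 line (0.20.203.0):
mvn clean package -DskipTests

# Build against a Hadoop 2 release instead:
mvn clean package -DskipTests -Dhadoop.profile=2.0 -Dhadoop.version=2.2.0
```

[The trade-off Mike raises is then explicit: 4a means one artifact that must link cleanly against every supported Hadoop, while 4b means each site compiles against a Hadoop version compatible with its running cluster.]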
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile. I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that it was more testing "does Accumulo run on both hadoop 1 and 2". If we can maintain a single artifact, that would definitely be easiest for users, but falling back to user-built artifacts or convenience releases isn't the end of the world. As far as commits, I'd like to see as much separation as possible, but it's understandable if the changes overlap and don't make sense to split out. On 10/14/13 12:55 PM, Sean Busbey wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application.
=requirements
Here's the requirements I have from the last thread:
1) Maintain existing 1.4 compatibility
The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported.
2) Gain Hadoop 2 support
At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2.
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list.
Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.)
[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 9:24 PM, Mike Drob md...@mdrob.com wrote:
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run. [1]
[1]: http://accumulo.apache.org/governance/releasing.html#testing
Just to confirm, the change from Keith's request is
* 72hr continuous + agitation + cluster running
* Something to test that HA NN failover doesn't take out Accumulo
Would the latter be addressed by an additional functional test? Or would it need to be some kind of addition to the agitation?
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier. I think this means you're +0 on 4b?
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch?
Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. It's the difference between the 1.4.x branch having all the commits that are backported from 1.5.x vs just having squashed ones. The former maintains more of the original authorship and ties to original jiras. The latter has less noise. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 10:02 PM, Josh Elser josh.el...@gmail.com wrote: For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile. so 1.5.1-SNAPSHOT and this proposed change to 1.4.5-SNAPSHOT should both target 2.2.0-beta, presuming the RC passes (and 2.1.0-beta prior). This sounds in line with Mike's comment re: alpha v beta. anyone have an objection? I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that it was more testing does Accumulo run on both hadoop 1 and 2. I figured whatever bar I end up passing for Hadoop 2 support on 1.4.x should help with testing the same for 1.5.x and 1.6.x. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Thanks for the note, Ted. That vote is for 2.2.0, not -beta. On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 3) Test for correctness on given versions, with = 5 node cluster * Unit Tests * Functional Tests * 24hr continuous + verification * 24hr continuous + verification + agitation * 24hr random walk * 24hr random walk + agitation Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. 
Hadoop 2 introduces some neat new things like NN HA, which I think it might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS Clients switch correctly. This is in addition to the standard release suite that we run. [1] [1]: http://accumulo.apache.org/governance/releasing.html#testing 4) Binary packaging 4a) Either source produces a single binary for all accepted versions or 4b) Instructions for building from source for each versions and somehow flag what (if any) convenience binaries are made for the release. Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be back ported, I suppose. 4b seems easier. =application There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. 
In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. 
I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I don't think that's a good idea unless you can come up with very clear version number change. -Joey On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote: Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? 
I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc. -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Which version of 0.20 are you testing against? Vanilla, or cdh3 flavored? On Fri, Aug 2, 2013 at 2:37 PM, Joey Echeverria j...@cloudera.com wrote: I don't think that's a good idea unless you can come up with very clear version number change. -Joey On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote: Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. 
* unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc. -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
RE: Hadoop 2.0 Support for Accumulo 1.4 Branch
Any update?

-Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

We're testing this today. I'll report back what we find.

-Joey — Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, Dave Marion dlmar...@comcast.net wrote:

Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We're testing this today. I'll report back what we find.

-Joey — Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, Dave Marion dlmar...@comcast.net wrote:

Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
> My question is if the community would be interested in us pulling those back ports upstream?

Yes, please.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We have both the unit tests and the full system test suite hooked up to a Jenkins build server. There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those, which is why our current release is just a beta.

There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons.

-Joey

On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote:

On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote:

Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches, in whole or in part, from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream?

What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts?

-Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 12:24 PM, Joey Echeverria j...@cloudera.com wrote:

We have both the unit tests and the full system test suite hooked up to a Jenkins build server.

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those, which is why our current release is just a beta. There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

-Joey

On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote:

On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote:

Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches, in whole or in part, from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream?

What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts?

-Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 2:33 PM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

There are some instructions:

test/system/continuous/README
test/system/randomwalk/README

Continuous ingest has a lot of options. For release testing we do something like the following:

#configure, may need to adjust max mappers and max reducers to make map reduce job run faster
start-ingest.sh
start-walker.sh
#sleep 24hr
stop-ingest.sh
stop-walker.sh
run-verify.sh

The continuous dir has scripts for starting and stopping the agitator. We also use this script to agitate while running the random walk test. For random walk we use the All.xml graph, configure it to log errors to NFS, and run a walker on each node. We look in NFS for walkers that died or got stuck. The random walk framework will log a message if a node in the graph gets stuck. It will also log a message when it gets unstuck.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: hadoop-2.0 incompatibility
Is this something else we can resolve via reflection or are we back to square 1? On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote: Ugh. While running the continuous ingest verify, yarn spit this out: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected This is preventing the reduce step from completing. -Eric
Re: hadoop-2.0 incompatibility
I'm testing a fix, but I'm not for holding up the release for this. First, calling a method by reflection is quite a bit slower, so even if we fix it, it might not be appropriate. On Tue, May 21, 2013 at 11:49 AM, John Vines vi...@apache.org wrote: Is this something else we can resolve via reflection or are we back to square 1? On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote: Ugh. While running the continuous ingest verify, yarn spit this out: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected This is preventing the reduce step from completing. -Eric
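The "Found interface ... but class was expected" error comes from compile-time binding: a direct call compiles to invokeinterface or invokevirtual depending on whether the declared type was an interface or a class, so bytecode built against one Hadoop line fails against the other. The reflection workaround being weighed here can be illustrated with plain JDK types. This is a hypothetical sketch, not Accumulo's actual patch; the class and method names are invented for the example:

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    // Call "length" through reflection instead of a compile-time reference.
    // The lookup happens at runtime, so this same bytecode works no matter
    // what the declared type of the target is -- which is the property that
    // a direct call to Counter methods lacks across Hadoop 1 and Hadoop 2.
    public static int lengthOf(Object obj) {
        try {
            Method m = obj.getClass().getMethod("length");
            return (Integer) m.invoke(obj);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Static types differ (interface vs. class); the reflective call
        // doesn't care, since binding is deferred to runtime.
        CharSequence asInterface = "hadoop";
        String asClass = "hadoop";
        System.out.println(lengthOf(asInterface)); // 6
        System.out.println(lengthOf(asClass));     // 6
    }
}
```

As noted above, a reflective call is considerably slower than a direct one, which is why it may not be appropriate on a hot path like counter updates.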
Re: hadoop-2.0 incompatibility
We still have the option of putting out a separate build for 1.5.0 compatibility with hadoop 2. Should we vote on that release separately? Seems like it should be easy to add more binary packages that correspond to the same source release, even after the initial vote.

Adam

On Tue, May 21, 2013 at 11:55 AM, Keith Turner ke...@deenlo.com wrote:

On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote:

Ugh. While running the continuous ingest verify, yarn spit this out:

Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

This is preventing the reduce step from completing.

Could fix it in 1.5.1. I am starting to think that hadoop compat was so important, it should have been mostly completed before the feature freeze.

-Eric
Re: Hadoop 2 compatibility issues
I also just snuck in that Hadoop 1/2 compatibility fix with JobContext (ACCUMULO-1421). Not sure if that's the only change needed, but it should be a step forward. Adam On Thu, May 16, 2013 at 11:23 AM, Eric Newton eric.new...@gmail.com wrote: I've snuck some necessary changes in... doing integration testing on it right now. -Eric On Wed, May 15, 2013 at 8:03 PM, John Vines vi...@apache.org wrote: I will gladly do it next week, but I'd rather not have it delay the release. The question from there is, is doing this type of packaging change too large to put in 1.5.1? On Wed, May 15, 2013 at 2:44 PM, Christopher ctubb...@apache.org wrote: So, I think that'd be great, if it works, but who is willing to do this work and get it in before I make another RC? I'd like to cut RC3 tomorrow if I have time. So, feel free to patch these in to get it to work before then... or, by the next RC if RC3 fails to pass a vote. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Wed, May 15, 2013 at 5:31 PM, Adam Fuchs afu...@apache.org wrote: It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up? One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. 
This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues - tangent
Awesome Chris, thanks. I didn't know where to begin looking for that one.

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:11 PM, Christopher ctubb...@apache.org wrote:

With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place.

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote:

On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included?

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote:

One note about option 4. When using 1.4, users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles.
This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii
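For readers unfamiliar with the pattern under debate, a profile-scoped Hadoop dependency looks roughly like the sketch below. The coordinates and version are illustrative, not copied from Accumulo's POM; the point is that a project depending on accumulo-core never activates this profile, so it never inherits the dependency and must redeclare it:

```xml
<!-- Sketch of a Hadoop-2 build profile. Dependencies declared inside a
     profile are only in effect when the profile is active, and profile
     activation does not propagate to downstream consumers. -->
<profiles>
  <profile>
    <id>hadoop-2.0</id>
    <properties>
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```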
Re: Hadoop 2 compatibility issues - tangent
No problem. FYI, this is essentially what we do to drop the non-provided deps into lib/ in the first place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Wed, May 15, 2013 at 3:03 AM, John Vines vi...@apache.org wrote: Awesome Chris, thanks. I didn't know where to begin looking for that one. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:11 PM, Christopher ctubb...@apache.org wrote: With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). 
What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii
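The copy-dependencies approach Christopher mentions above can be wired up along these lines. This is a sketch only; the execution id, phase, and output directory are assumptions, not Accumulo's actual build configuration:

```xml
<!-- Sketch: copy the project's dependencies into target/lib at package
     time with the maven-dependency-plugin, so an application doesn't
     have to hand-assemble a classpath from "provided" dependencies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-deps</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```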
Re: Hadoop 2 compatibility issues
It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up?

One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues
So, I think that'd be great, if it works, but who is willing to do this work and get it in before I make another RC? I'd like to cut RC3 tomorrow if I have time. So, feel free to patch these in to get it to work before then... or, by the next RC if RC3 fails to pass a vote.

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Wed, May 15, 2013 at 5:31 PM, Adam Fuchs afu...@apache.org wrote:

It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up?

One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues
If a user is referencing any of the Hadoop classes, aren't they supposed to add a dependency on the appropriate Hadoop artifact anyways?

FWIW, option 4 is what Avro does. Their discussion: https://issues.apache.org/jira/browse/AVRO-1170

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

-- Sean Busbey Solutions Architect Cloudera, Inc. Phone: MAN-VS-BEARD
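Under option 4 (the approach Avro took), the burden lands in the downstream project's POM: the user redeclares the Hadoop artifact next to the Accumulo one. A sketch of what that looks like; the coordinates and versions here are illustrative:

```xml
<!-- Sketch: the user supplies the Hadoop dependency that the Accumulo
     build profile would otherwise have provided, since profile-scoped
     dependencies never reach transitive consumers. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.4-alpha</version>
  </dependency>
</dependencies>
```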
Re: Hadoop 2 compatibility issues
I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning is about standardizing on Maven conventions, but the other options, while more 'correct' from a Maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while that's not good, we should pick the one that's less bad for US. The important thing here, now, is that the POMs work, and we should go with the method that leaves the least work for our end users.

I do agree that 1 is the correct option in the long run. More specifically, I think it boils down to having a single-module compatibility layer, which is how HBase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie, and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
CXF does (4) for the various competing JAX-WS implementations. The different options are API-compatible, and the profiles just switch the deps around.

There would be slightly more Maven correctness in marking the deps optional, forcing each user to pick one explicitly. However, (4) with good doc on what to put in the POM is really not a cause for shame. Maven is weak in this area, and it's all tradeoffs.

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote:

> [quoted text snipped]
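Benson's "mark the deps optional" suggestion could be sketched roughly as follows (artifact and version are illustrative). An optional dependency is used to build the artifact but is deliberately excluded from transitive resolution, so every consumer must declare their chosen Hadoop explicitly:

```xml
<!-- Illustrative sketch, not Accumulo's actual POM: <optional>true</optional>
     means this dependency is available at build time but is NOT passed on
     transitively, forcing each downstream project to pick its own Hadoop. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <optional>true</optional>
</dependency>
```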
Re: Hadoop 2 compatibility issues
I tend to agree with Sean, John, and Benson. Option 4 works for now, and until we can define something that works better (e.g. runtime compatibility with both Hadoop 1 and 2 using reflection and crazy class loaders), we should not delay the release. Good docs are always helpful where engineering is less than ideal (egad, I hope I didn't just volunteer!).

Adam

On Tue, May 14, 2013 at 5:16 PM, Benson Margulies bimargul...@gmail.com wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 becomes a simple documentation issue: how to apply the patch and rebuild.

Option 4 doesn't wait... it explicitly introduces a problem for users.

Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered.

If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote:

> I think it's interesting that Option 4 seems to be the most preferred...
> because it's the *only* option that is explicitly advised against by the
> Maven developers (from the information I've read). I can see its appeal,
> but I really don't think we should introduce an explicit problem for users
> (one that applies even to users of the Hadoop version we directly build
> against... not just those using Hadoop 2... I don't know if that point was
> clear), to only partially support a version of Hadoop that is still alpha
> and has never had a stable release.
>
> BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but I
> am reluctant to apply that patch with this issue outstanding, as it may
> exacerbate the problem.
>
> Another implication of Option 4 (the current solution) is for 1.6.0, with
> the planned accumulo-maven-plugin... because it means that the
> accumulo-maven-plugin will need to be configured like this:
>
>   <plugin>
>     <groupId>org.apache.accumulo</groupId>
>     <artifactId>accumulo-maven-plugin</artifactId>
>     <dependencies>
>       ... all the required hadoop 1 dependencies to make the plugin work,
>       even though this version only works against hadoop 1 anyway ...
>     </dependencies>
>     ...
>   </plugin>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> [earlier quoted messages snipped]
Re: Hadoop 2 compatibility issues
We can easily fix the breakage in the Hadoop dependencies by switching to hadoop-client and relying on hadoop.version to set/override the version. The hadoop 2 profile is then only needed to bring in additional dependencies, and possibly to set the hadoop version for convenience.

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 5:48 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
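The hadoop-client suggestion above could be sketched like this (the default version shown is illustrative). The dependency lives in the main POM, so it resolves transitively, and the version is driven by a property that users can override on the command line rather than copying a profile:

```xml
<!-- Sketch of the suggestion, with an illustrative default version:
     hadoop-client is a normal (non-profile) dependency, and the version is
     a property that downstream builds can override. -->
<properties>
  <hadoop.version>1.0.4</hadoop.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

A build against Hadoop 2 would then look like `mvn package -Dhadoop.version=2.0.0-alpha`, with a profile needed only for any extra Hadoop 2-specific dependencies.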
Re: Hadoop 2 compatibility issues
One note about option 4: when using 1.4, users have to include hadoop-core as a dependency in their POM anyway. This must be done because the 1.4 Accumulo POM marks hadoop-core as provided. So maybe option 4 is OK if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
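Keith's observation rests on how provided scope behaves; a minimal sketch of the 1.4-style declaration (version property is illustrative) would be:

```xml
<!-- Illustrative sketch: provided scope makes the jar available at compile
     time but excludes it from transitive resolution and from packaging, so
     downstream users must already declare Hadoop themselves -- profile or
     no profile. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```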
Re: Hadoop 2 compatibility issues - tangent
On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries or our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications to user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. 
There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.
2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)
3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.
4. The last option is to do nothing and continue to build with the separate profiles as we are, making users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
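For readers following along, the profile-based pattern the thread is debating can be sketched as a POM fragment like the following. This is illustrative only: the profile ids, artifact ids, and versions here are assumptions, not copied from Accumulo's actual build files.

```xml
<!-- Hypothetical sketch of per-Hadoop-line profiles in a POM.
     Dependencies declared inside a profile are invisible to downstream
     consumers, because profile activation does not happen transitively. -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.0.4</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-alpha</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Whichever profile is active at build time determines what the artifact compiles against, but the published POM carries both profiles, and neither set of dependencies is resolved by consumers of the published artifact.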
Re: Hadoop 2 compatibility issues
On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ...
</plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2.
The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are
Re: Hadoop 2 compatibility issues
This is part of my thinking. All of the dependencies included in the profiles for Avro are marked provided. Provided scope, by definition, is not transitive. Thus, it doesn't really matter that they aren't transitive *also* because of the profile. Is Accumulo including anything other than things provided by either Hadoop 1 or 2?

On Tue, May 14, 2013 at 6:08 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4, users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant. 1.
The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.
2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)
3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.
4. The last option is to do nothing and continue to build with the separate profiles as we are, making users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii -- Sean
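Keith's and Sean's observation about provided scope can be made concrete: because provided dependencies are never resolved transitively, a downstream project already has to declare Hadoop itself, profiles or no profiles. A hypothetical user POM fragment (artifact ids and versions here are illustrative assumptions, not a documented recipe):

```xml
<dependencies>
  <!-- Accumulo's POM marks its Hadoop dependencies as provided, so they
       never arrive transitively; the user declares Hadoop explicitly. -->
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.0.4</version>
  </dependency>
</dependencies>
```

This is the same burden whether the provided dependency lives in a profile or in the main POM body, which is why Keith suggests option 4 may be tolerable after all.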
Re: Hadoop 2 compatibility issues
They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway...
</dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2.
The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support ( http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries or our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications to user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop
Re: Hadoop 2 compatibility issues
I'm not sure what the best solution would be, but I'd easily assume any worthwhile solution would extend the 1.5.0 release date even farther than I'd be happy about. So, by that stance, I'm for #4 or another quick fix, even if it does perpetuate some sort of hack.

On 05/14/2013 07:09 PM, Benson Margulies wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

This makes sense to me. Although I don't know exactly how one would go about doing this, I trust Benson enough not to throw something non-feasible at us :)
Re: Hadoop 2 compatibility issues
Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.
On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4.
You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer
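For readers unfamiliar with the classifier mechanism debated above: the ACCUMULO-1402 idea was to publish the Hadoop2-compiled jar under a Maven classifier, which a consumer would select roughly like this. This is a hypothetical sketch; the hadoop2 classifier was a proposal under discussion, not a released artifact.

```xml
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <version>1.5.0</version>
  <!-- The classifier selects an alternate jar, but both jars share one
       POM, so there is only one dependency list for both builds (the
       "no pom-per-classifier" problem Benson describes later). -->
  <classifier>hadoop2</classifier>
</dependency>
```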
Re: Hadoop 2 compatibility issues
Sorry for the dupe Benson, meant to reply all. Oh no Benson, the compiled code is different. The fundamental issue is that some interfaces got changed to abstract classes or vice versa. The source is the same, but class files are different. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2...
I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US.
The important thing here, now, is that the poms work and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that 1. is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right. On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger
Re: Hadoop 2 compatibility issues
We've written the code such that it works in either, and then we have profiles which set the hadoop.version for convenience. The profiles also alternate between using hadoop-client and hadoop-core, but as I mentioned above, that is unnecessary. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:42 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished.

whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask.

All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira.
Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ...
</plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem,
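The "safe" variant John describes, profiles that only set hadoop.version, is the kind Benson's "same byte code" test permits. A rough sketch follows; the property name and versions are illustrative, and the real POMs also alternated the artifact id between hadoop-core and hadoop-client, which this glosses over.

```xml
<!-- Profiles override only a version property; the dependency itself
     stays in the main POM body, so it still resolves transitively. -->
<properties>
  <hadoop.version>1.0.4</hadoop.version>
</properties>
<profiles>
  <profile>
    <id>hadoop-2</id>
    <properties>
      <hadoop.version>2.0.0-alpha</hadoop.version>
    </properties>
  </profile>
</profiles>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```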
Re: Hadoop 2 compatibility issues
Response to Benson inline, but additional note here: It should be noted that the situation will be made worse by the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed.

On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished.

whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it?

The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same.

In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask.

Agreed. I just don't see a good alternative here.

All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms.
However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos. On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had have achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... 
because it means that the accumulo-maven-plugin will need to be configured like this:

<plugin>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-maven-plugin</artifactId>
  <dependencies>
    ... all the required hadoop 1 dependencies to make the plugin work,
    even though this version only works against hadoop 1 anyway ...
  </dependencies>
  ...
</plugin>

-- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. --
Re: Hadoop 2 compatibility issues
Maven will malfunction in various entertaining ways if you try to change the GAV of the output of the build using a profile. Maven will malfunction in various entertaining ways if you use classifiers on real-live-JAR files that get used as real-live-dependencies, because it has no concept of a pom-per-classifier. Where does this leave you/us? (I'm not sure that I've earned an 'us' recently around here.) First, I note that 'Apache releases are source releases'. So, one resort of scoundrels here would be to support only one hadoop in the convenience binaries that get pushed to Maven Central, and let other hadoop users take the source release and build for themselves. Second, I am reduced to suggesting an elaboration of the build in which some tool edits poms and runs builds. The maven-invoker-plugin could be used to run that, but a plain old script in a plain old language might be less painful. I appreciate that this may not be an appealing contribution to where things are, but it might be the best of the evil choices. On Tue, May 14, 2013 at 7:50 PM, John Vines vi...@apache.org wrote: The compiled code is compiled code. There are no concerns of dependency resolution. So I see no issues in using the profile to define the gav if that is feasible. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:47 PM, Christopher ctubb...@apache.org wrote: Response to Benson inline, but additional note here: It should be noted that the situation will be made worse for the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed. On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. 
ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same. In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask. Agreed. I just don't see a good alternative here. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAV's for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. 
If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos. On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against...
Re: Hadoop 2 compatibility issues
I'm very much partial to the First option, as it's far less effort for approximately the same value (in my opinion, but in light of the enthusiasm above for hadoop2, I could be very wrong on my assessment of the value). I'm going to upload a patch to ACCUMULO-1402 soon (tiny polishing left), to demonstrate a way to push redundant jars, with an extra classifier (though I still have to build twice, to avoid maven-invoker-plugin complexity) for hadoop2-compatible binaries. If you don't mind, I'll tag you with a request to review that patch, as I'd like more details about the classifier issues you mention, in context. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 8:27 PM, Benson Margulies bimargul...@gmail.com wrote: Maven will malfunction in various entertaining ways if you try to change the GAV of the output of the build using a profile. Maven will malfunction in various entertaining ways if you use classifiers on real-live-JAR files that get used as real-live-dependencies, because it has no concept of a pom-per-classifier. Where does this leave you/us? (I'm not sure that I've earned an 'us' recently around here.) First, I note that 'Apache releases are source releases'. So, one resort of scoundrels here would be to support only one hadoop in the convenience binaries that get pushed to Maven Central, and let other hadoop users take the source release and build for themselves. Second, I am reduced to suggesting an elaboration of the build in which some tool edits poms and runs builds. The maven-invoker-plugin could be used to run that, but a plain old script in a plain old language might be less painful. I appreciate that this may not be an appealing contribution to where things are, but it might be the best of the evil choices. On Tue, May 14, 2013 at 7:50 PM, John Vines vi...@apache.org wrote: The compiled code is compiled code. There are no concerns of dependency resolution. 
So I see no issues in using the profile to define the gav if that is feasible. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:47 PM, Christopher ctubb...@apache.org wrote: Response to Benson inline, but additional note here: It should be noted that the situation will be made worse for the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed. On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same. In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask. Agreed. I just don't see a good alternative here. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. 
-- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAV's for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all
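Benson's warning about classifiers becomes concrete if you sketch what a downstream POM would have to contain. The coordinates below are a hypothetical illustration (the hadoop2 classifier and the 1.5.0 version are assumptions, not published artifacts):

```xml
<!-- Hypothetical downstream dependency on a classified artifact.
     Maven would fetch accumulo-core-1.5.0-hadoop2.jar, but the metadata it
     consults for transitive dependencies is the single accumulo-core-1.5.0.pom;
     there is no separate POM per classifier, so the hadoop2 jar would still be
     described by the hadoop1-oriented dependency tree. -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <version>1.5.0</version>
  <classifier>hadoop2</classifier>
</dependency>
```

This is the "no concept of a pom-per-classifier" problem in miniature: the classifier selects a different jar, never different metadata.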
Re: Hadoop 2 compatibility issues - tangent
You can have Maven generate a file with the classpath dependencies and also make a shaded jar. I use the classpath file for normal Java processes and the shaded jar file with 'hadoop jar'. On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any Maven utilities to automatically dissect provided dependencies and make them included? Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. 
Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work, and in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo, and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
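The pattern Christopher describes looks roughly like the following. This is a sketch, not the actual 1.5 POM (profile id, property name, and versions are illustrative):

```xml
<!-- Sketch of dependencies declared inside a profile. The profile can be
     activated at build time (e.g. -Dhadoop.profile=2.0), but activation is
     not recorded in the deployed POM, so downstream projects resolving this
     artifact never see these dependencies transitively. -->
<profiles>
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <property>
        <name>hadoop.profile</name>
        <value>2.0</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-alpha</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A downstream user who wants the same dependencies has to copy the profile's dependency list into their own POM by hand, which is the transitive-resolution problem described above.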
Re: Hadoop 2 compatibility issues - tangent
With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any Maven utilities to automatically dissect provided dependencies and make them included? Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. 
Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work, and in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo, and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
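For anyone trying the copy-dependencies suggestion, a generic configuration could look like this. It is a sketch of the maven-dependency-plugin goal, not Accumulo's actual build; by default the goal copies dependencies of every scope, including provided, and scope filters are available if that is too much:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-deps</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <!-- gather every dependency jar (provided scope included) so that
             java -cp "target/lib/*:target/myapp.jar" works without the
             accumulo launcher script -->
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, the application can be launched with a plain `java -cp` against `target/lib/*` instead of relying on the accumulo script to assemble the classpath.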
Re: Hadoop Summit Community Choice
On Tue, Mar 5, 2013 at 6:37 AM, Jim Klucar klu...@gmail.com wrote: The Hadoop Summit is coming up in San Jose this summer ( http://hadoopsummit.org/san-jose/ ), and they just released abstracts for a Community Choice vote. Community voting plays a role in what abstracts are selected to be presented at the conference. From what I saw, there are two Accumulo-related abstracts proposed in the Enterprise Data Architecture track. If you're so inclined, please vote on them to help spread Accumulo. http://hadoopsummit2013.uservoice.com/forums/196821-enterprise-data-architecture?query=accumulo Full disclosure, I submitted the Clojure one, and I guess it's time to show what I've been up to, so stay tuned. Sorry for spamming both lists, but I know that not everyone subscribes to all the lists. I submitted the other Accumulo talk. If you vote for it, I'll have to start researching and documenting the real differences between HBase and Accumulo and their effects ... Billie Jim
Re: hadoop classpath causing an exception (sub-command not defined?)
It looks to me like the change of Nov 21, 2012 added the 'hadoop classpath' call to the accumulo script. ACCUMULO-708 initial implementation of VFS class loader … git-svn-id: https://svn.apache.org/repos/asf/accumulo/trunk@1412398 13f79535-47bb-0310-9956-ffa450edef68 Dave Marion authored 23 days ago Could the classpath sub-command be part of a version of hadoop newer than 0.20.2? On Fri, Dec 14, 2012 at 12:18 AM, John Vines jvi...@gmail.com wrote: I didn't think hadoop had a classpath argument, just Accumulo. Sent from my phone, please pardon the typos and brevity. On Dec 13, 2012 10:43 PM, David Medinets david.medin...@gmail.com wrote: I am at a loss to explain what I am seeing. I have installed Accumulo many times without a hitch. But today, I am running into a problem getting the hadoop classpath.

$ /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters. 
I am using the following version of hadoop:

$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

Inside the accumulo script is the line: HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` This line results in the following exception:

$ $HADOOP_HOME/bin/hadoop classpath
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.

Am I missing something basic? What?
Re: hadoop classpath causing an exception (sub-command not defined?)
Should we add a hadoop version check to the accumulo script? On Fri, Dec 14, 2012 at 7:45 AM, Jason Trost jason.tr...@gmail.com wrote: We saw the same issue recently. We upgraded our dev nodes to hadoop 1.1.1 and it fixed this issue. I'm not sure when classpath was added to the hadoop command so a minor upgrade may work too. --Jason sent from my DROID On Dec 14, 2012 7:34 AM, David Medinets david.medin...@gmail.com wrote: It looks to me like the change of Nov 21, 2012 added the 'hadoop classpath' call to the accumulo script. ACCUMULO-708 initial implementation of VFS class loader … git-svn-id: https://svn.apache.org/repos/asf/accumulo/trunk@1412398 13f79535-47bb-0310-9956-ffa450edef68 Dave Marion authored 23 days ago Could the classpath sub-command be part of a version of hadoop newer than 0.20.2? On Fri, Dec 14, 2012 at 12:18 AM, John Vines jvi...@gmail.com wrote: I didn't think hadoop had a classpath argument, just Accumulo. Sent from my phone, please pardon the typos and brevity. On Dec 13, 2012 10:43 PM, David Medinets david.medin...@gmail.com wrote: I am at a loss to explain what I am seeing. I have installed Accumulo many times without a hitch. But today, I am running into a problem getting the hadoop classpath. 
$ /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

I am using the following version of hadoop:

$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

Inside the accumulo script is the line: HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` This line results in the following exception:

$ $HADOOP_HOME/bin/hadoop classpath
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.

Am I missing something basic? What?
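The version check suggested above could be sketched as follows. None of this is from the actual accumulo script: version_ge is a hypothetical helper doing a numeric field-wise compare, and 1.0.0 is only a stand-in cutoff (Jason confirmed the sub-command exists in 1.1.1; the exact release that introduced `hadoop classpath` would need checking).

```shell
# Hypothetical sketch of a version guard for the accumulo script.
# version_ge A B: succeeds when dot-separated version A >= version B,
# comparing numeric fields via sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)" = "$1" ]
}

# In the real script this would be parsed from `$HADOOP_HOME/bin/hadoop version`;
# hard-coded here so the sketch is self-contained.
HADOOP_VERSION="0.20.2"

# 1.0.0 is only an assumed cutoff for when `hadoop classpath` appeared.
if version_ge "$HADOOP_VERSION" "1.0.0"; then
  echo "hadoop classpath is available"
else
  echo "hadoop $HADOOP_VERSION predates 'hadoop classpath'; construct the classpath manually"
fi
```

An alternative that avoids version parsing entirely is to probe the sub-command directly (run `hadoop classpath` once and fall back if it exits non-zero), which is more robust than guessing the cutoff release.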
Re: Hadoop Summit
I wish; at this point it looks like no for me. On Wed, Nov 21, 2012 at 8:48 AM, Billie Rinaldi bil...@apache.org wrote: Is anyone thinking about going to the Hadoop Summit in Amsterdam in March? http://hadoopsummit.org/amsterdam I'm thinking of proposing a talk on improvements in Accumulo 1.5. Billie