Re: Hadoop Metrics2 and JMX
Thank you all very much for the responses!

On Wed, Oct 12, 2022 at 2:06 PM Dave Marion wrote:
> Looking at [1], specifically the overview section, I think they are the
> same metrics, just accessible via JMX instead of configuring a sink.
>
> [1] https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/metrics2/package-summary.html
Re: Hadoop Metrics2 and JMX
Looking at [1], specifically the overview section, I think they are the
same metrics, just accessible via JMX instead of configuring a sink.

[1] https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/metrics2/package-summary.html

On Wed, Oct 12, 2022 at 1:39 PM Christopher wrote:
> I don't think we're doing anything special to publish to JMX. I think
> this is something that is a feature of Hadoop Metrics2 that we're simply
> enabling. So, this might be a question for the Hadoop general mailing
> list if nobody knows the answer here.
Re: Hadoop Metrics2 and JMX
I don't think we're doing anything special to publish to JMX. I think this
is something that is a feature of Hadoop Metrics2 that we're simply
enabling. So, this might be a question for the Hadoop general mailing list
if nobody knows the answer here.

On Wed, Oct 12, 2022 at 1:06 PM Logan Jones wrote:
> Hello:
>
> I'm trying to figure out more about the metrics coming out of Accumulo
> 1.9.3 and 1.10.2. I'm currently configuring the Hadoop Metrics2 system
> and sending that to InfluxDB. In theory, I could also look at the JMX
> metrics.
>
> Are the JMX metrics a superset of what comes out of Hadoop Metrics2?
>
> Thanks in advance,
>
> - Logan
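For readers finding this thread later: a Metrics2 sink of the kind Logan
describes is wired up in a hadoop-metrics2 properties file. The sketch
below is illustrative only -- the file name and the `accumulo` prefix are
assumptions based on how Metrics2 is conventionally configured (the prefix
must match what the process registers), and FileSink is simply the easiest
bundled sink to demonstrate with; check the example config shipped with
your Accumulo version for the real names.

```properties
# hadoop-metrics2-accumulo.properties -- file name and "accumulo" prefix
# are assumptions; match them to what your Accumulo release actually uses.

# Route all metrics records from the "accumulo" prefix to a file sink.
accumulo.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
accumulo.sink.file.filename=accumulo-metrics.out

# How often (in seconds) the metrics system snapshots and flushes sources.
accumulo.period=10
```

Whether or not a sink is configured, Metrics2 also registers its sources
as MBeans, which is consistent with the docs' statement above that the JMX
view and the sink view carry the same metrics.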
Re: Hadoop
Well, I wouldn't be surprised to see some issues across a fedup (which is
a way to do an upgrade between Fedora versions), but it should have been
stable with normal/routine yum/dnf upgrades. Were you using the
Fedora-provided packages, or the BigTop ones? Or another set?

On Thu, Jun 2, 2016 at 5:04 PM Corey Nolet wrote:
> This may not be directly related, but I've noticed Hadoop packages have
> not been uninstalling/updating well the past year or so. The last couple
> times I've run fedup, I've had to go back in manually and remove/update
> a bunch of the Hadoop packages like Zookeeper and Parquet.
Re: Hadoop
This may not be directly related, but I've noticed Hadoop packages have
not been uninstalling/updating well the past year or so. The last couple
times I've run fedup, I've had to go back in manually and remove/update a
bunch of the Hadoop packages like Zookeeper and Parquet.

On Thu, Jun 2, 2016 at 4:59 PM, Christopher wrote:
> That first post was intended for the Fedora developer list. Apologies
> for sending to the wrong list.
Re: Hadoop
That first post was intended for the Fedora developer list. Apologies for
sending to the wrong list.

If anybody is curious, it seems the Fedora community support around Hadoop
and Big Data is really dying... the packager for Flume and HTrace has
abandoned their efforts to package for Fedora, and now it looks like the
Hadoop package maintainer abandoned Hadoop, leaving Accumulo with
unsatisfied dependencies. This is actually kind of a sad state of affairs,
because better packaging downstream could really help users, and expose
more ways to improve the upstream products.

As it stands, I think there is a disconnect between the upstream
communities and the downstream packagers in the Big Data space, which
includes Accumulo. I would love to see more interest in better packaging
for downstream users through these existing downstream packager
communities (Homebrew, Fedora, Debian, EPEL, Ubuntu, etc.), and I would
love to see more volunteers come from these downstream communities to make
improvements upstream.

As an upstream community, I believe the responsibility is for us to reach
down first, rather than wait for them to come to us. I've tried to do that
within Fedora, with the hope that others would follow for the downstream
communities they care about. Unfortunately, things haven't turned out how
I'd have preferred, but I'm still hopeful. If there is anybody interested
in downstream community packaging, let me know if I can help you get
started.

On Thu, Jun 2, 2016 at 4:28 PM Christopher wrote:
> Sorry, wrong list.
Re: Hadoop
Sorry, wrong list.

On Thu, Jun 2, 2016 at 4:20 PM Christopher wrote:
> So, it would seem at some point, without me noticing (certainly my
> fault, for not paying attention enough), the Hadoop packages got
> orphaned and/or retired? in Fedora.
>
> This is a big problem for me, because the main package I work on is
> dependent upon Hadoop.
>
> What's the state of Hadoop in Fedora these days? Are there packaging
> problems? Not enough support from the upstream Apache community? Missing
> dependencies in Fedora? Not enough time to work on it? No interest from
> users?
>
> Whatever the issue is... I'd like to help wherever I can... I'd like to
> keep this stuff going.
Re: Hadoop Summit 2015 Talk
+3

v/r

Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714

On 2/12/15, 11:13 PM, Josh Elser josh.el...@gmail.com wrote:
> FYI -- Billie and I have submitted a talk to Hadoop Summit 2015 in San
> Jose, CA in June.
>
> http://hadoopsummit.uservoice.com/forums/283260-committer-track/suggestions/7073993-a-year-in-the-life-of-apache-accumulo
>
> I'd be overjoyed if anyone would vote for the talk if they'd like to see
> it happen. Thanks!
>
> - Josh
Re: Hadoop Summit (San Jose June 3-5)
Just announced: an Accumulo Birds of a Feather session at the Hadoop
Summit: http://www.meetup.com/Hadoop-Summit-Community-San-Jose/events/179840512/

It looks like we have an hour and a half; exact schedule TBD. Feel free to
contact me if there is any particular content you'd like to see at this
session.

Billie

On Mon, Apr 28, 2014 at 8:52 AM, Donald Miner dmi...@clearedgeit.com wrote:
> I'll be there. Is there interest in having an accumulo meetup like last
> year? Adam/Billie?
>
> On Mon, Apr 28, 2014 at 11:50 AM, Marc Reichman
> mreich...@pixelforensics.com wrote:
>> Will anyone be there? I wouldn't mind meeting up for a drink, talk
>> about Accumulo, projects, etc. Looking forward to coming to my first
>> Hadoop-based conference!
>>
>> Marc
>
> --
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The main thing is that I would not want to see an ACCUMULO-1790 *without*
ACCUMULO-1795. Having 1792 alone would be insufficient for me.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, Nov 12, 2013 at 9:22 AM, Sean Busbey bus...@clouderagovt.com wrote:
> On Fri, Oct 18, 2013 at 12:29 AM, Sean Busbey bus...@cloudera.com wrote:
>> On Tue, Oct 15, 2013 at 10:20 AM, Sean Busbey bus...@cloudera.com wrote:
>>> On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote:
>>>> On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote:
>>>>> Just to be clear, we are talking about adding profile support to the
>>>>> pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We
>>>>> are not talking about changing the default build profile for these
>>>>> branches, are we?
>>>>
>>>> For 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop
>>>> 2.2.0. I am not suggesting we change the default from building
>>>> against Hadoop 0.23.203.
>>>
>>> I mean 0.20.203.0. Ugh, Hadoop versions.
>>
>> Okay, barring additional suggestions, tomorrow afternoon I'll break
>> things down into an umbrella and 3 sub tasks:
>>
>> 1) addition of hadoop 2 support
>>    - to include backports of commits
>>    - to include making the target hadoop 2 version 2.2.0
>>    - to include test changes that flex hadoop 2 features like fail over
>> 2) ensuring compatibility for 0.20.203
>>    - presuming some subset of the commits in 1) will break it since
>>      0.20 support was left behind in 1.5
>> 3) doc / packaging updates
>>    - the issue of binary releases per distro
>>    - doc patch for what version(s) the release tests are expected to
>>      run against
>>
>> Once work is put against those tickets, I'd expect things to go into a
>> branch based on the umbrella ticket until such time as the complete
>> work can pass the test suite that we'll use at the next release. Then
>> it can get rebased onto the 1.4.x dev branch.
>>
>> --
>> Sean
>
> Based on recent feedback on ACCUMULO-1792 and ACCUMULO-1795, I want to
> resurrect this thread to make sure everyone's concerns are addressed.
> For context, here's a link to the start of the last thread:
> http://bit.ly/1aPqKuH
>
> From ACCUMULO-1792, ctubbsii:
>
>> I'd be reluctant to support any Hadoop 2.x support in the 1.4 release
>> line that breaks compatibility with 0.20. I don't think breaking 0.20
>> and then possibly fixing it again as a second step is acceptable
>> (because that subsequent work may not ever be done, and I don't think
>> we should break the compatibility contract that we've established with
>> 1.4.0).
>
> Chris, I believe keeping all of the work in a branch under the umbrella
> jira of ACCUMULO-1790 will ensure that we don't end up with a 1.4
> release that doesn't have proper support for 0.20.203. Is there
> something beyond making sure the branch passes a full set of release
> tests on 0.20.203 that you'd like to see? In the event that the branch
> only ever contains the work for adding Hadoop 2, it's a simple matter to
> abandon without rolling into the 1.4 development line.
>
> From ACCUMULO-1795, bills (and +1ed by elserj and ctubbsii):
>
>> I'm very uncomfortable with risking breaking continuity in such an old
>> release, and I don't think managing two lines of 1.4 releases is worth
>> the effort. Though we have no official EOL policy, 1.3 was practically
>> dead in the water once 1.4 was around, and I hope we start encouraging
>> more adoption of 1.5 (and soon 1.6) versus continually propping up 1.4.
>
> I'd love to get people to move off of 1.4. However, I think adding
> Hadoop 2 support to 1.4 encourages this more than leaving it out.
> Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
> surprised people find relying on 0.20 for the 1.5 WAL intimidating.
> Upgrading both HDFS and Accumulo across major versions at once is asking
> them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
> allow them to break the risk up into steps: they can upgrade HDFS
> versions first, get comfortable, then upgrade Accumulo to 1.5.
>
> I think the existing tickets under the umbrella of ACCUMULO-1790 should
> ensure that we end up with a single 1.4 line that can work with either
> the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or
> Josh or Chris), is there stronger language you'd like to see around docs
> / packaging (area #3 in the original plan and currently ACCUMULO-1796)?
> Maybe expressly only doing a binary convenience package for 0.20.203.0?
> Are you looking for something beyond a full release suite to ensure 1.4
> is still maintaining compatibility on Hadoop 0.20.203?
>
> -Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 4:49 PM, Sean Busbey busbey...@clouderagovt.com wrote:
> On Tue, Nov 12, 2013 at 3:14 PM, William Slacum
> wilhelm.von.cl...@accumulo.net wrote:
>> The language of ACCUMULO-1795 indicated that an acceptable state was
>> something that wasn't binary compatible. That's my #1 thing to avoid.
>
> Ah. So I see; not sure why I phrased it that way. Since the default
> build should still be 0.20.203.0, I'm not sure how it'd end up not being
> binary compatible. I can update the ticket to clarify the language. Any
> need to compile should be limited to running Hadoop 2.2.0. Sound good?

+1 (The confusing wording was the basis for my concerns also.)

>>> Maybe expressly only doing a binary convenience package for 0.20.203.0?
>>
>> If we need an extra package, doesn't that mean a user can't just
>> upgrade Accumulo?
>
> By binary convenience package I mean the binary distribution tarball (or
> rpms, or whatevs) that we make as a part of the release process. For
> users of Hadoop 0.20.203.0, upgrading should be unchanged from how they
> would normally get their Accumulo 1.4.x distribution. ACCUMULO-1796 has
> some leeway about the convenience packages for people who want Hadoop 2
> support. On the extreme end, they'd have to build from source and then
> run a normal upgrade process.

I'd prefer binary compatibility with a single build, but if that's too
hard to achieve, I have no objection to providing a mechanism to perform
an alternate build against 2.x (whether or not we provide a pre-built
binary package for it), so long as the default build is 0.20.x.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Thu, Nov 14, 2013 at 6:27 PM, Christopher ctubb...@apache.org wrote:
> The main thing is that I would not want to see an ACCUMULO-1790
> *without* ACCUMULO-1795. Having 1792 alone would be insufficient for me.

That is precisely the intention of ACCUMULO-1790. All of the subtasks
(including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for things
to get into the 1.4 branch. Until that time the work would just go into a
feature branch for ACCUMULO-1790 (to make working and testing easier for
those implementing the subtasks). If you wanted to see the full
implementation, you would just wait until all of the subtasks were
committed to the feature branch.

Am I missing something?

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Nope, I think we're on the same page now.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Thu, Nov 14, 2013 at 7:39 PM, Sean Busbey busbey...@clouderagovt.com wrote:
> That is precisely the intention of ACCUMULO-1790. All of the subtasks
> (including ACCUMULO-1792 and ACCUMULO-1795) have to be complete for
> things to get into the 1.4 branch.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
> I'd love to get people to move off of 1.4. However, I think adding
> Hadoop 2 support to 1.4 encourages this more than leaving it out.

I'm not sure I agree that adding Hadoop 2 support to 1.4 encourages people
to upgrade Accumulo. My gut reaction would be that it allows people to
completely ignore Accumulo updates (ignoring moving to 1.4.5, which would
allow them to do Hadoop 2 with your proposed changes).

> Accumulo 1.5.x places a higher burden on HDFS than 1.4 did, and I'm not
> surprised people find relying on 0.20 for the 1.5 WAL intimidating.
> Upgrading both HDFS and Accumulo across major versions at once is asking
> them to take on a bunch of risk. By adding in Hadoop 2 support to 1.4 we
> allow them to break the risk up into steps: they can upgrade HDFS
> versions first, get comfortable, then upgrade Accumulo to 1.5.

Personally, maintaining 0.20 compatibility is not a big concern on my
radar. If you're still running an 0.20 release, I'd *really* hope that you
have an upgrade path to 1.2.x (if not 2.2.x) scheduled.

I think claiming that 1.5 places a higher burden than 1.4 is a bit of a
fallacy. There were many problems and pains regarding WALs in <=1.4 that
are very difficult to work with in a large environment (try finding WALs
in server failure cases). I think the increased I/O on HDFS is a much
smaller cost than the completely different I/O path that the old loggers
have. I also think upgrading Accumulo is much less scary than upgrading
HDFS, but that's just me.

To me, it seems like the argument may be coming down to whether or not we
break 0.20 Hadoop compatibility on a bug-fix release and how concerned we
are about letting users lag behind the upstream development.

> I think the existing tickets under the umbrella of ACCUMULO-1790 should
> ensure that we end up with a single 1.4 line that can work with either
> the existing 0.20.203.0 claimed in releases or against 2.2.0. Bill (or
> Josh or Chris), is there stronger language you'd like to see around docs
> / packaging (area #3 in the original plan and currently ACCUMULO-1796)?
> Maybe expressly only doing a binary convenience package for 0.20.203.0?
> Are you looking for something beyond a full release suite to ensure 1.4
> is still maintaining compatibility on Hadoop 0.20.203?

Again, my biggest concern here is not following our own guidelines of
breaking changes across minor releases, but I'd hope 0.20 users have an
upgrade path outlined for themselves.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
A user of 1.4.a should be able to move to 1.4.b without any major
infrastructure changes, such as swapping out HDFS or installing extra
add-ons. I don't find much merit in debating local WAL vs HDFS WAL
cost/benefit since the only quantifiable evidence we have supported the
move.

I should note, Sean, that if you see merit in the work, you don't need
community approval for forking and sharing. However, I do not think it is
in the community's best interest to continue to upgrade 1.4.

On Tue, Nov 12, 2013 at 2:12 PM, Josh Elser josh.el...@gmail.com wrote:
> I'm not sure I agree that adding Hadoop 2 support to 1.4 encourages
> people to upgrade Accumulo. My gut reaction would be that it allows
> people to completely ignore Accumulo updates (ignoring moving to 1.4.5,
> which would allow them to do Hadoop 2 with your proposed changes).
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:12 PM, Josh Elser josh.el...@gmail.com wrote:
> Again, my biggest concern here is not following our own guidelines of
> breaking changes across minor releases, but I'd hope 0.20 users have an
> upgrade path outlined for themselves.

The plan outlined in the original thread, and in the subtasks under
ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in
the 1.4 bugfix line. If there's anything we can do besides running through
the release test suite on a 0.20 cluster to help ensure that, I am
interested in adding it to the existing plan.

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 1:28 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
> A user of 1.4.a should be able to move to 1.4.b without any major
> infrastructure changes, such as swapping out HDFS or installing extra
> add-ons.

Right, exactly. Hopefully no part of the original plan contradicts this.
Is there something that appears to?

--
Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On 11/12/13, 12:24 PM, Sean Busbey wrote:
> The plan outlined in the original thread, and in the subtasks under
> ACCUMULO-1790, is expressly aimed at not breaking 0.20 compatibility in
> the 1.4 bugfix line. If there's anything we can do besides running
> through the release test suite on a 0.20 cluster to help ensure that, I
> am interested in adding it to the existing plan.

What about the other half: encouraging users to lag (soon to be) two major
releases behind?
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
The language of ACCUMULO-1795 indicated that an acceptable state was something that wasn't binary compatible. That's my #1 thing to avoid. Maybe expressly only doing a binary convenience package for 0.20.203.0? If we need an extra package, doesn't that mean a user can't just upgrade Accumulo? As a side note, 0.20.203.0 is 1.4, On Tue, Nov 12, 2013 at 3:28 PM, Sean Busbey busbey...@clouderagovt.comwrote: On Tue, Nov 12, 2013 at 1:28 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: A user of 1.4.a should be able to move to 1.4.b without any major infrastructure changes, such as swapping out HDFS or installing extra add-ons. Right, exactly. Hopefully no part of the original plan contradicts this. Is there something that appears to? -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Nov 12, 2013 at 2:48 PM, Josh Elser josh.el...@gmail.com wrote: What about the other half: encouraging users to lag (soon to be) two major releases behind? I don't think our current user base needs to be encouraged strongly to upgrade. And as I said previously I think this change provides them with an upgrade path that's easier to stomach, but I suspect this is a point we disagree on. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Monday, October 14, 2013 11:57:40 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch Thanks for the note, Ted. That vote is for 2.2.0, not -beta. On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run. [1]
[1]: http://accumulo.apache.org/governance/releasing.html#testing
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this.
For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support for Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I'm not sure about the change to 1.5.1-SNAPSHOT. I believe we're talking about changing the hadoop.profile for 2.0 to use the 2.2.0 release. I don't think it makes sense to change the default off of the version in the hadoop.profile for 1.0. Presumably this change would also happen in master. Now that Hadoop 2.x is going to have a GA release, I think it makes sense to have a discussion about changing the default to be the hadoop 2.0 profile for master, but this is not that discussion. -- Sean
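[For context, the mechanics under discussion are ordinary Maven profiles keyed off a hadoop.profile property. A minimal sketch of what such a pom change could look like — illustrative only; the actual 1.4/1.5 poms differ in structure and also swap dependency sets per profile:]

```xml
<!-- Hypothetical sketch, not the actual patch. The default profile keeps the
     0.20 line; passing -Dhadoop.profile=2.0 switches to a Hadoop 2 release. -->
<profiles>
  <profile>
    <id>hadoop-default</id>
    <activation>
      <!-- active when no hadoop.profile property is set -->
      <property><name>!hadoop.profile</name></property>
    </activation>
    <properties>
      <hadoop.version>0.20.203.0</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <property><name>hadoop.profile</name><value>2.0</value></property>
    </activation>
    <properties>
      <!-- the bump being proposed: 2.0.4-alpha -> 2.2.0 GA -->
      <hadoop.version>2.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

[Under this shape, the default build is untouched and a Hadoop 2 build is opt-in, which matches Sean's point that only the version inside the 2.0 profile changes.]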
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I mean 0.20.203.0. Ugh, Hadoop versions. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I think you meant: Ugh, Hadoop versions.[1] [1] http://blog.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/ On Tue, Oct 15, 2013 at 11:20 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 10:16 AM, Sean Busbey bus...@cloudera.com wrote: On Tue, Oct 15, 2013 at 7:16 AM, dlmar...@comcast.net wrote: Just to be clear, we are talking about adding profile support to the pom's for Hadoop 2.2.0 for a 1.4.5 and 1.5.1 release, correct? We are not talking about changing the default build profile for these branches are we? for 1.4.5-SNAPSHOT I am only talking about adding support Hadoop 2.2.0. I am not suggesting we change the default from building against Hadoop 0.23.203. I mean 0.20.203.0. Ugh, Hadoop versions. -- Sean -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application.
=requirements
Here's the requirements I have from the last thread:
1) Maintain existing 1.4 compatibility
The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep.
2) Gain Hadoop 2 support
At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither.
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run.
[1] [1]: http://accumulo.apache.org/governance/releasing.html#testing
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.)
[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
-- Sean
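[The "4b" option discussed above amounts to publishing per-version build instructions rather than shipping one binary per Hadoop release. A hedged sketch of what those instructions might look like, assuming the 1.5-style hadoop.profile switch were backported to 1.4 — the exact flags are illustrative, not the branch's actual build options:]

```
# Illustrative only; flags follow the 1.5-style hadoop.profile convention
# and may differ on the 1.4 branch.

# Build against the default Hadoop 1 line (0.20.203.0):
mvn clean package -DskipTests

# Build against a Hadoop 2 release instead:
mvn clean package -DskipTests -Dhadoop.profile=2.0 -Dhadoop.version=2.2.0
```

[The trade-off Mike raises is then explicit: 4a means one artifact that must link cleanly against every supported Hadoop, while 4b means each site compiles against a Hadoop version compatible with its running cluster.]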
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile. I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that it was more testing "does Accumulo run on both hadoop 1 and 2". If we can maintain a single artifact, that would definitely be easiest for users, but falling back to user-built artifacts or convenience releases isn't the end of the world. As far as commits, I'd like to see as much separation as possible, but it's understandable if the changes overlap and don't make sense to split out. On 10/14/13 12:55 PM, Sean Busbey wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application.
=requirements
Here's the requirements I have from the last thread:
1) Maintain existing 1.4 compatibility
The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported.
2) Gain Hadoop 2 support
At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2.
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list.
Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha.
4) Binary packaging
4a) Either source produces a single binary for all accepted versions, or
4b) Instructions for building from source for each version, and somehow flag what (if any) convenience binaries are made for the release.
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.)
[1] http://bit.ly/1fxucMe
[2] http://bit.ly/192zUAJ
[3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 9:24 PM, Mike Drob md...@mdrob.com wrote:
3) Test for correctness on given versions, with >= 5 node cluster
* Unit Tests
* Functional Tests
* 24hr continuous + verification
* 24hr continuous + verification + agitation
* 24hr random walk
* 24hr random walk + agitation
Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. Hadoop 2 introduces some neat new things like NN HA, which I think might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS clients switch correctly. This is in addition to the standard release suite that we run. [1]
[1]: http://accumulo.apache.org/governance/releasing.html#testing
Just to confirm, the change from Keith's request is
* 72hr continuous + agitation + cluster running
* Something to test that HA NN failover doesn't take out Accumulo
Would the latter be addressed by an additional functional test? Or would it need to be some kind of addition to the agitation?
Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be backported, I suppose. 4b seems easier. I think this means you're +0 on 4b?
=application
There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch?
Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. It's the difference between the 1.4.x branch having all the commits that are backported from 1.5.x vs just having squashed ones. The former maintains more of the original authorship and ties to original jiras. The latter has less noise. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Mon, Oct 14, 2013 at 10:02 PM, Josh Elser josh.el...@gmail.com wrote: For #2, from what I've read, we should definitely bump up the dependency on 1.5.1-SNAPSHOT to 2.1.0-beta, and, given what Ted replied with, to 2.2.0-beta for that hadoop-2 profile. so 1.5.1-SNAPSHOT and this proposed change to 1.4.5-SNAPSHOT should both target 2.2.0-beta, presuming the RC passes (and 2.1.0-beta prior). This sounds in line with Mike's comment re: alpha v beta. anyone have an objection? I probably stated this before, but I'd much rather see more effort in testing Accumulo 1.5.x (and 1.6.0 as that will be feature frozen soon) against hadoop-2 (like Mike's point about HA). I'm not sure if anyone ever did testing of Accumulo with the hadoop-2 features -- I seem to recall that it was more testing does Accumulo run on both hadoop 1 and 2. I figured whatever bar I end up passing for Hadoop 2 support on 1.4.x should help with testing the same for 1.5.x and 1.6.x. -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Thanks for the note, Ted. That vote is for 2.2.0, not -beta. On Oct 14, 2013 7:30 PM, Ted Yu yuzhih...@gmail.com wrote: w.r.t. hadoop-2 release, see this thread: http://search-hadoop.com/m/YSTny19y1Ha1/hadoop+2.2.0 Looks like 2.2.0-beta would pass votes. Cheers On Mon, Oct 14, 2013 at 7:24 PM, Mike Drob md...@mdrob.com wrote: Responses Inline. - Mike On Mon, Oct 14, 2013 at 12:55 PM, Sean Busbey bus...@cloudera.com wrote: Hey All, I'd like to restart the conversation from end July / start August about Hadoop 2 support on the 1.4 branch. Specifically, I'd like to get some requirements ironed out so I can file one or more jiras. I'd also like to get a plan for application. =requirements Here's the requirements I have from the last thread: 1) Maintain existing 1.4 compatibility The only thing I see listed in the pom is Apache release 0.20.203.0. (1.4.4 tag)[1] I don't see anything in the README[2] nor the user manual[3] on other versions being supported. Yep. 2) Gain Hadoop 2 support At the moment, I'm presuming this means Apache release 2.0.4-alpha since that's what 1.5.0 builds against for Hadoop 2. I haven't been following the Hadoop 2 release schedule that closely, but I think the latest is a 2.1.0-beta? Pretty sure it was released after we finished Accumulo 1.5, so there's no reason not to support it in my mind. Depending on an alpha of something strikes me as either unstable or lazy, although I fully understand that it may be neither. 3) Test for correctness on given versions, with = 5 node cluster * Unit Tests * Functional Tests * 24hr continuous + verification * 24hr continuous + verification + agitation * 24hr random walk * 24hr random walk + agitation Keith mentioned running these against a CDH4 cluster, but I presume that since Apache Releases are our stated compatibilities it would actually be against whatever versions we list. Based on #1 and #2 above, I would expect that to be Apache Hadoop 0.20.203.0 and Apache Hadoop 2.0.4-alpha. 
Hadoop 2 introduces some neat new things like NN HA, which I think it might be worthwhile to test with. At that level it might be more of a verification of the Hadoop code, but I'd like to be comfortable that our DFS Clients switch correctly. This is in addition to the standard release suite that we run. [1] [1]: http://accumulo.apache.org/governance/releasing.html#testing 4) Binary packaging 4a) Either source produces a single binary for all accepted versions or 4b) Instructions for building from source for each versions and somehow flag what (if any) convenience binaries are made for the release. Having run the binary packaging for 1.4.4, I can tell you that it is not in great shape. Christopher cleaned up a lot of the issues in the 1.5 line, so I didn't bother spending a ton of time on them here, but I think RPM and DEB are both broken. It would be nice to be able to specify a Hadoop 2 version for compilation, similar to what happens in the newer code base, which could be back ported, I suppose. 4b seems easier. =application There will be many back-ported patches. Not much active development happens on 1.4.x now, but I presume this should still all go onto a feature branch? Is the community preference that eventually all the changes become a single commit (or one-per-subtask if there are multiple jiras) on the active 1.4 development branch, or that the original patches remain broken out? Not sure what you mean by this. For what it's worth, I'd recommend keeping them broken out. (And that's how the initial development against CDH4 has been done.) [1] http://bit.ly/1fxucMe [2] http://bit.ly/192zUAJ [3] http://accumulo.apache.org/1.4/user_manual/Administration.html#Dependencies -- Sean
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. 
In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. 
I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
I don't think that's a good idea unless you can come up with very clear version number change. -Joey On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote: Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. * unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? 
I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc. -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Which version of 0.20 are you testing against? Vanilla, or cdh3 flavored? On Fri, Aug 2, 2013 at 2:37 PM, Joey Echeverria j...@cloudera.com wrote: I don't think that's a good idea unless you can come up with very clear version number change. -Joey On Fri, Aug 2, 2013 at 2:31 PM, Christopher ctubb...@apache.org wrote: Would it be reasonable to consider a version of 1.4 that breaks compatibility with 0.20? I'm not really a fan of this, personally, but am curious what others think. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Fri, Aug 2, 2013 at 2:22 PM, Joey Echeverria j...@cloudera.com wrote: Sorry for the delay, it's been one of those weeks. The current version would probably not be backwards compatible to 0.20.2 just based on changes in dependencies. We're looking right now to see how hard it is to have three way compatibility (0.20, 1.0, 2.0). -Joey On Thu, Aug 1, 2013 at 7:33 PM, Dave Marion dlmar...@comcast.net wrote: Any update? -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch We're testing this today. I'll report back what we find. -Joey — Sent from Mailbox for iPhone On Fri, Jul 26, 2013 at 3:34 PM, null dlmar...@comcast.net wrote: Will 1.4 still work with 0.20 with these patches? Great point Billie. - Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote: If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following test run using CDH4 on at least a 5 node cluster. More nodes would be better. 
* unit test * Functional test * 24 hr Continuous ingest + verification * 24 hr Continuous ingest + verification + agitation * 24 hr Random walk * 24 hr Random walk + agitation I may be able to assist with this, but I can not make any promises. Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this. Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes? Yup. It works the same way as 1.5 where all of the dependency changes are in a Hadoop 2.0 profile. In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4. Billie -Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc. -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
RE: Hadoop 2.0 Support for Accumulo 1.4 Branch
Any update?

-Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Monday, July 29, 2013 1:24 PM To: dev@accumulo.apache.org Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

We're testing this today. I'll report back what we find.

-Joey — Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, Dave Marion dlmar...@comcast.net wrote:

Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We're testing this today. I'll report back what we find.

-Joey — Sent from Mailbox for iPhone

On Fri, Jul 26, 2013 at 3:34 PM, Dave Marion dlmar...@comcast.net wrote:

Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
> My question is if the community would be interested in us pulling those back ports upstream?

Yes, please.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
We have both the unit tests and the full system test suite hooked up to a Jenkins build server. There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those, which is why our current release is just a beta.

There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons.

-Joey

On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote:

On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote:

Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches, in whole or in part, from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream?

What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts?

-Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 12:24 PM, Joey Echeverria j...@cloudera.com wrote:

We have both the unit tests and the full system test suite hooked up to a Jenkins build server.

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

There are still a couple of tests that fail periodically with the full system test due to timeouts. We're working on those, which is why our current release is just a beta. There are no API changes or Accumulo behavior changes. You can use unmodified 1.4.x clients with our release of the server daemons.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

-Joey

On Fri, Jul 26, 2013 at 11:45 AM, Keith Turner ke...@deenlo.com wrote:

On Fri, Jul 26, 2013 at 11:02 AM, Joey Echeverria j...@cloudera.com wrote:

Cloudera announced last night our support for Accumulo 1.4.3 on CDH4: http://www.slideshare.net/JoeyEcheverria/apache-accumulo-and-cloudera This required back porting about 11 patches, in whole or in part, from the 1.5 line on top of 1.4.3. Our release is still in a semi-private beta, but when it's fully public it will be downloadable along with all of the extra patches that we committed. My question is if the community would be interested in us pulling those back ports upstream?

What testing has been done? It would be nice to run accumulo's full test suite against 1.4.3+CDH4. Are there any Accumulo API changes or Accumulo behavior changes? I believe this would violate the previously agreed upon rule of no feature back ports to 1.4.3, depending on how we label support for Hadoop 2.0. Thoughts?

-Joey -- Joey Echeverria Director, Federal FTS Cloudera, Inc.
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
On Fri, Jul 26, 2013 at 2:33 PM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

There are some instructions:

test/system/continuous/README
test/system/randomwalk/README

Continuous ingest has a lot of options. For release testing we do something like the following:

#configure, may need to adjust max mappers and max reducers to make map reduce job run faster
start-ingest.sh
start-walker.sh
#sleep 24hr
stop-ingest.sh
stop-walker.sh
run-verify.sh

The continuous dir has scripts for starting and stopping the agitator. We also use this script to agitate while running the random walk test. For random walk we use the All.xml graph, configure it to log errors to NFS, and run a walker on each node. We look in NFS for walkers that died or got stuck. The random walk framework will log a message if a node in the graph gets stuck. It will also log a message when it gets unstuck.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

-Joey
Re: Hadoop 2.0 Support for Accumulo 1.4 Branch
Will 1.4 still work with 0.20 with these patches? Great point Billie.

- Original Message - From: Billie Rinaldi billie.rina...@gmail.com To: dev@accumulo.apache.org Sent: Friday, July 26, 2013 3:02:41 PM Subject: Re: Hadoop 2.0 Support for Accumulo 1.4 Branch

On Fri, Jul 26, 2013 at 11:33 AM, Joey Echeverria j...@cloudera.com wrote:

If these patches are going to be included with 1.4.4 or 1.4.5, I would like to see the following tests run using CDH4 on at least a 5 node cluster. More nodes would be better.

* unit test
* Functional test
* 24 hr Continuous ingest + verification
* 24 hr Continuous ingest + verification + agitation
* 24 hr Random walk
* 24 hr Random walk + agitation

I may be able to assist with this, but I can not make any promises.

Sure thing. Is there already a write-up on running this full battery of tests? I have a 10 node cluster that I can use for this.

Great. I think this would be a good patch for 1.4. I assume that if a user stays with Hadoop 1 there are no dependency changes?

Yup. It works the same way as 1.5, where all of the dependency changes are in a Hadoop 2.0 profile.

In 1.5.0, we gave up on compatibility with 0.20 (and early versions of 1.0) to make the compatibility requirements simpler; we ended up without dependency changes in the hadoop version profiles. Will 1.4 still work with 0.20 with these patches? If there are dependency changes in the profiles, 1.4 would have to be compiled against a hadoop version compatible with the running version of hadoop, correct? We had some trouble in the 1.5 release process with figuring out how to provide multiple binary artifacts (each compiled against a different version of hadoop) for the same release. Just something we should consider before we are in the midst of releasing 1.4.4.

Billie

-Joey
Re: hadoop-2.0 incompatibility
Is this something else we can resolve via reflection or are we back to square 1? On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote: Ugh. While running the continuous ingest verify, yarn spit this out: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected This is preventing the reduce step from completing. -Eric
Re: hadoop-2.0 incompatibility
I'm testing a fix, but I'm not for holding up the release for this. First, calling a method by reflection is quite a bit slower, so even if we fix it, it might not be appropriate. On Tue, May 21, 2013 at 11:49 AM, John Vines vi...@apache.org wrote: Is this something else we can resolve via reflection or are we back to square 1? On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote: Ugh. While running the continuous ingest verify, yarn spit this out: Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected This is preventing the reduce step from completing. -Eric
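The "Found interface ... but class was expected" error comes from compile-time binding: a direct call compiles to invokeinterface or invokevirtual depending on whether the declared type was an interface or a class, so bytecode built against one Hadoop line fails against the other. The reflection workaround being weighed here can be illustrated with plain JDK types. This is a hypothetical sketch, not Accumulo's actual patch; the class and method names are invented for the example:

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    // Call "length" through reflection instead of a compile-time reference.
    // The lookup happens at runtime, so this same bytecode works no matter
    // what the declared type of the target is -- which is the property that
    // a direct call to Counter methods lacks across Hadoop 1 and Hadoop 2.
    public static int lengthOf(Object obj) {
        try {
            Method m = obj.getClass().getMethod("length");
            return (Integer) m.invoke(obj);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Static types differ (interface vs. class); the reflective call
        // doesn't care, since binding is deferred to runtime.
        CharSequence asInterface = "hadoop";
        String asClass = "hadoop";
        System.out.println(lengthOf(asInterface)); // 6
        System.out.println(lengthOf(asClass));     // 6
    }
}
```

As noted above, a reflective call is considerably slower than a direct one, which is why it may not be appropriate on a hot path like counter updates.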
Re: hadoop-2.0 incompatibility
We still have the option of putting out a separate build for 1.5.0 compatibility with hadoop 2. Should we vote on that release separately? Seems like it should be easy to add more binary packages that correspond to the same source release, even after the initial vote.

Adam

On Tue, May 21, 2013 at 11:55 AM, Keith Turner ke...@deenlo.com wrote:

On Tue, May 21, 2013 at 11:02 AM, Eric Newton eric.new...@gmail.com wrote:

Ugh. While running the continuous ingest verify, yarn spit this out:

Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

This is preventing the reduce step from completing.

Could fix it in 1.5.1. I am starting to think that hadoop compat was so important, it should have been mostly completed before the feature freeze.

-Eric
Re: Hadoop 2 compatibility issues
I also just snuck in that Hadoop 1/2 compatibility fix with JobContext (ACCUMULO-1421). Not sure if that's the only change needed, but it should be a step forward. Adam On Thu, May 16, 2013 at 11:23 AM, Eric Newton eric.new...@gmail.com wrote: I've snuck some necessary changes in... doing integration testing on it right now. -Eric On Wed, May 15, 2013 at 8:03 PM, John Vines vi...@apache.org wrote: I will gladly do it next week, but I'd rather not have it delay the release. The question from there is, is doing this type of packaging change too large to put in 1.5.1? On Wed, May 15, 2013 at 2:44 PM, Christopher ctubb...@apache.org wrote: So, I think that'd be great, if it works, but who is willing to do this work and get it in before I make another RC? I'd like to cut RC3 tomorrow if I have time. So, feel free to patch these in to get it to work before then... or, by the next RC if RC3 fails to pass a vote. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Wed, May 15, 2013 at 5:31 PM, Adam Fuchs afu...@apache.org wrote: It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up? One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. 
This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues - tangent
Awesome Chris, thanks. I didn't know where to begin looking for that one.

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:11 PM, Christopher ctubb...@apache.org wrote:

With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place.

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote:

On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included?

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote:

One note about option 4. When using 1.4, users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles.
This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii
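For readers unfamiliar with the pattern under debate, a profile-scoped Hadoop dependency looks roughly like the sketch below. The coordinates and version are illustrative, not copied from Accumulo's POM; the point is that a project depending on accumulo-core never activates this profile, so it never inherits the dependency and must redeclare it:

```xml
<!-- Sketch of a Hadoop-2 build profile. Dependencies declared inside a
     profile are only in effect when the profile is active, and profile
     activation does not propagate to downstream consumers. -->
<profiles>
  <profile>
    <id>hadoop-2.0</id>
    <properties>
      <hadoop.version>2.0.4-alpha</hadoop.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```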
Re: Hadoop 2 compatibility issues - tangent
No problem. FYI, this is essentially what we do to drop the non-provided deps into lib/ in the first place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Wed, May 15, 2013 at 3:03 AM, John Vines vi...@apache.org wrote: Awesome Chris, thanks. I didn't know where to begin looking for that one. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:11 PM, Christopher ctubb...@apache.org wrote: With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). 
What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii
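The copy-dependencies approach Christopher mentions above can be wired up along these lines. This is a sketch only; the execution id, phase, and output directory are assumptions, not Accumulo's actual build configuration:

```xml
<!-- Sketch: copy the project's dependencies into target/lib at package
     time with the maven-dependency-plugin, so an application doesn't
     have to hand-assemble a classpath from "provided" dependencies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-deps</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```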
Re: Hadoop 2 compatibility issues
It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up?

One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues
So, I think that'd be great, if it works, but who is willing to do this work and get it in before I make another RC? I'd like to cut RC3 tomorrow if I have time. So, feel free to patch these in to get it to work before then... or, by the next RC if RC3 fails to pass a vote.

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Wed, May 15, 2013 at 5:31 PM, Adam Fuchs afu...@apache.org wrote:

It seems like the ideal option would be to have one binary build that determines Hadoop version and switches appropriately at runtime. Has anyone attempted to do this yet, and do we have an enumeration of the places in Accumulo code where the incompatibilities show up?

One of the incompatibilities is in org.apache.hadoop.mapreduce.JobContext switching between an abstract class and an interface. This can be fixed with something to the effect of:

public static Configuration getConfiguration(JobContext context) {
  Configuration configuration = null;
  try {
    Class c = TestCompatibility.class.getClassLoader().loadClass("org.apache.hadoop.mapreduce.JobContext");
    Method m = c.getMethod("getConfiguration");
    Object o = m.invoke(context, new Object[0]);
    configuration = (Configuration) o;
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
  return configuration;
}

Based on a test I just ran, using that getConfiguration method instead of just calling the getConfiguration method on context should avoid the one incompatibility. Maybe with a couple more changes like that we can get down to one bytecode release for all known Hadoop versions?

Adam
Re: Hadoop 2 compatibility issues
If a user is referencing any of the Hadoop classes, aren't they supposed to add a dependency on the appropriate Hadoop artifact anyways?

FWIW, option 4 is what Avro does. Their discussion: https://issues.apache.org/jira/browse/AVRO-1170

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo.

There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing: continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue.

Thoughts? Discussion? Vote on option?

-- Christopher L Tubbs II http://gravatar.com/ctubbsii

-- Sean Busbey Solutions Architect Cloudera, Inc. Phone: MAN-VS-BEARD
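Under option 4 (the approach Avro took), the burden lands in the downstream project's POM: the user redeclares the Hadoop artifact next to the Accumulo one. A sketch of what that looks like; the coordinates and versions here are illustrative:

```xml
<!-- Sketch: the user supplies the Hadoop dependency that the Accumulo
     build profile would otherwise have provided, since profile-scoped
     dependencies never reach transitive consumers. -->
<dependencies>
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.4-alpha</version>
  </dependency>
</dependencies>
```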
Re: Hadoop 2 compatibility issues
I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning is about standardizing on Maven conventions, but the other options, while more 'correct' from a Maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while that's not good, we should pick the one that's less bad for US. The important thing here, now, is that the POMs work, and we should go with the method that leaves the least work for our end users.

I do agree that 1 is the correct option in the long run. More specifically, I think it boils down to having a single-module compatibility layer, which is how HBase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie, and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
CXF does (4) for the various competing JAX-WS implementations. The different options are API-compatible, and the profiles just switch the deps around.

There would be slightly more Maven correctness in marking the deps optional, forcing each user to pick one explicitly. However, (4) with good doc on what to put in the POM is really not a cause for shame. Maven is weak in this area, and it's all tradeoffs.

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote:

> [quoted text snipped]
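Benson's "mark the deps optional" suggestion could be sketched roughly as follows (artifact and version are illustrative). An optional dependency is used to build the artifact but is deliberately excluded from transitive resolution, so every consumer must declare their chosen Hadoop explicitly:

```xml
<!-- Illustrative sketch, not Accumulo's actual POM: <optional>true</optional>
     means this dependency is available at build time but is NOT passed on
     transitively, forcing each downstream project to pick its own Hadoop. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <optional>true</optional>
</dependency>
```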
Re: Hadoop 2 compatibility issues
I tend to agree with Sean, John, and Benson. Option 4 works for now, and until we can define something that works better (e.g. runtime compatibility with both Hadoop 1 and 2 using reflection and crazy class loaders), we should not delay the release. Good docs are always helpful where engineering is less than ideal (egad, I hope I didn't just volunteer!).

Adam

On Tue, May 14, 2013 at 5:16 PM, Benson Margulies bimargul...@gmail.com wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 becomes a simple documentation issue: how to apply the patch and rebuild.

Option 4 doesn't wait... it explicitly introduces a problem for users.

Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote:

> [quoted text snipped]
Re: Hadoop 2 compatibility issues
I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered.

If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote:

> I think it's interesting that Option 4 seems to be the most preferred...
> because it's the *only* option that is explicitly advised against by the
> Maven developers (from the information I've read). I can see its appeal,
> but I really don't think we should introduce an explicit problem for users
> (one that applies even to users of the Hadoop version we directly build
> against... not just those using Hadoop 2... I don't know if that point was
> clear), to only partially support a version of Hadoop that is still alpha
> and has never had a stable release.
>
> BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but I
> am reluctant to apply that patch with this issue outstanding, as it may
> exacerbate the problem.
>
> Another implication of Option 4 (the current solution) is for 1.6.0, with
> the planned accumulo-maven-plugin... because it means that the
> accumulo-maven-plugin will need to be configured like this:
>
>   <plugin>
>     <groupId>org.apache.accumulo</groupId>
>     <artifactId>accumulo-maven-plugin</artifactId>
>     <dependencies>
>       ... all the required hadoop 1 dependencies to make the plugin work,
>       even though this version only works against hadoop 1 anyway ...
>     </dependencies>
>     ...
>   </plugin>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> [earlier quoted messages snipped]
Re: Hadoop 2 compatibility issues
We can easily fix the breakage in the Hadoop dependencies by switching to hadoop-client and relying on hadoop.version to set/override the version. The hadoop 2 profile is then only needed to bring in additional dependencies, and possibly to set the hadoop version for convenience.

Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 5:48 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
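The hadoop-client suggestion above could be sketched like this (the default version shown is illustrative). The dependency lives in the main POM, so it resolves transitively, and the version is driven by a property that users can override on the command line rather than copying a profile:

```xml
<!-- Sketch of the suggestion, with an illustrative default version:
     hadoop-client is a normal (non-profile) dependency, and the version is
     a property that downstream builds can override. -->
<properties>
  <hadoop.version>1.0.4</hadoop.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

A build against Hadoop 2 would then look like `mvn package -Dhadoop.version=2.0.0-alpha`, with a profile needed only for any extra Hadoop 2-specific dependencies.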
Re: Hadoop 2 compatibility issues
One note about option 4: when using 1.4, users have to include hadoop-core as a dependency in their POM anyway. This must be done because the 1.4 Accumulo POM marks hadoop-core as provided. So maybe option 4 is OK if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote:

> [quoted text snipped]
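Keith's observation rests on how provided scope behaves; a minimal sketch of the 1.4-style declaration (version property is illustrative) would be:

```xml
<!-- Illustrative sketch: provided scope makes the jar available at compile
     time but excludes it from transitive resolution and from packaging, so
     downstream users must already declare Hadoop themselves -- profile or
     no profile. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
```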
Re: Hadoop 2 compatibility issues - tangent
On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any maven utilities to automatically dissect provided dependencies and make them included. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries or our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications to user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. 
There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.
2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)
3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.
4. The last option is to do nothing and continue to build with the separate profiles as we are, making users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
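For readers following along, the profile-based pattern the thread is debating can be sketched as a POM fragment like the following. This is illustrative only: the profile ids, artifact ids, and versions here are assumptions, not copied from Accumulo's actual build files.

```xml
<!-- Hypothetical sketch of per-Hadoop-line profiles in a POM.
     Dependencies declared inside a profile are invisible to downstream
     consumers, because profile activation does not happen transitively. -->
<profiles>
  <profile>
    <id>hadoop-1</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.0.4</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>hadoop-2</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-alpha</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Whichever profile is active at build time determines what the artifact compiles against, but the published POM carries both profiles, and neither set of dependencies is resolved by consumers of the published artifact.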
Re: Hadoop 2 compatibility issues
On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ...
</plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2.
The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are
Re: Hadoop 2 compatibility issues
This is part of my thinking. All of the dependencies included in the profiles for Avro are marked provided. Provided scope, by definition, is not transitive. Thus, it doesn't really matter that they aren't transitive *also* because of the profile. Is Accumulo including anything other than things provided by either Hadoop 1 or 2?

On Tue, May 14, 2013 at 6:08 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4, users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided?

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant. 1.
The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work and, in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.
2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)
3. A third option is to fork Accumulo and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.
4. The last option is to do nothing and continue to build with the separate profiles as we are, making users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii -- Sean
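Keith's and Sean's observation about provided scope can be made concrete: because provided dependencies are never resolved transitively, a downstream project already has to declare Hadoop itself, profiles or no profiles. A hypothetical user POM fragment (artifact ids and versions here are illustrative assumptions, not a documented recipe):

```xml
<dependencies>
  <!-- Accumulo's POM marks its Hadoop dependencies as provided, so they
       never arrive transitively; the user declares Hadoop explicitly. -->
  <dependency>
    <groupId>org.apache.accumulo</groupId>
    <artifactId>accumulo-core</artifactId>
    <version>1.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>1.0.4</version>
  </dependency>
</dependencies>
```

This is the same burden whether the provided dependency lives in a profile or in the main POM body, which is why Keith suggests option 4 may be tolerable after all.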
Re: Hadoop 2 compatibility issues
They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway...
</dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right.

On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2.
The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers point this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support ( http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries or our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications to user code that depends on Accumulo Maven artifacts. Every user will essentially have to explicitly add Hadoop
Re: Hadoop 2 compatibility issues
I'm not sure what the best solution would be, but I'd easily assume any worthwhile solution would extend the 1.5.0 release date even farther than I'd be happy about. So, by that stance, I'm for #4 or another quick fix, even if it does perpetuate some sort of hack.

On 05/14/2013 07:09 PM, Benson Margulies wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

This makes sense to me. Although I don't know exactly how one would go about doing this, I trust Benson enough not to throw something non-feasible at us :)
Re: Hadoop 2 compatibility issues
Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.
On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4.
You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US. The important thing here, now, is that the poms work, and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that option 1 is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer
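For readers unfamiliar with the classifier mechanism debated above: the ACCUMULO-1402 idea was to publish the Hadoop2-compiled jar under a Maven classifier, which a consumer would select roughly like this. This is a hypothetical sketch; the hadoop2 classifier was a proposal under discussion, not a released artifact.

```xml
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <version>1.5.0</version>
  <!-- The classifier selects an alternate jar, but both jars share one
       POM, so there is only one dependency list for both builds (the
       "no pom-per-classifier" problem Benson describes later). -->
  <classifier>hadoop2</classifier>
</dependency>
```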
Re: Hadoop 2 compatibility issues
Sorry for the dupe Benson, meant to reply all. Oh no Benson, the compiled code is different. The fundamental issue is that some interfaces got changed to abstract classes or vice versa. The source is the same, but class files are different. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2...
I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ... </plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem, whereas I think it's waiting until we have the time to solve the problem correctly. Your reasoning for this is standardizing on maven conventions, but the other options, while more 'correct' from a maven standpoint, are a larger headache for our user base and ourselves. In either case, we're going to be breaking some sort of convention, and while it's not good, we should be doing the one that's less bad for US.
The important thing here, now, is that the poms work and we should go with the method that leaves the work minimal for our end users to utilize them. I do agree that 1. is the correct option in the long run. More specifically, I think it boils down to having a single module compatibility layer, which is how hbase deals with this issue. But like you said, we don't have the time to engineer a proper solution. So let sleeping dogs lie and we can revamp the whole system for 1.5.1 or 1.6.0 when we have the cycles to do it right. On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger
Re: Hadoop 2 compatibility issues
We've written the code such that it works in either, and then we have profiles which set the hadoop.version for convenience. The profiles also alternate between using hadoop-client and hadoop-core, but as I mentioned above, that is unnecessary. Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 7:42 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished.

whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask.

All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build.

On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate GAVs for hadoop 2. It's been on the mailing list and jira.
Sent from my phone, please pardon the typos and brevity.

On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos.

On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... because it means that the accumulo-maven-plugin will need to be configured like this: <plugin> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-maven-plugin</artifactId> <dependencies> ... all the required hadoop 1 dependencies to make the plugin work, even though this version only works against hadoop 1 anyway... </dependencies> ...
</plugin> -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 4:56 PM, John Vines vi...@apache.org wrote: I'm an advocate of option 4. You say that it's ignoring the problem,
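The "safe" variant John describes, profiles that only set hadoop.version, is the kind Benson's "same byte code" test permits. A rough sketch follows; the property name and versions are illustrative, and the real POMs also alternated the artifact id between hadoop-core and hadoop-client, which this glosses over.

```xml
<!-- Profiles override only a version property; the dependency itself
     stays in the main POM body, so it still resolves transitively. -->
<properties>
  <hadoop.version>1.0.4</hadoop.version>
</properties>
<profiles>
  <profile>
    <id>hadoop-2</id>
    <properties>
      <hadoop.version>2.0.0-alpha</hadoop.version>
    </properties>
  </profile>
</profiles>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```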
Re: Hadoop 2 compatibility issues
Response to Benson inline, but additional note here: It should be noted that the situation will be made worse by the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed.

On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished.

whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it?

The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same.

In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask.

Agreed. I just don't see a good alternative here.

All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii

On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAVs for the very same .class files, just to get different dependencies in the poms.
However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a maven developer, and I'm offering this advice based on my understanding of reason why that generic advice is offered. If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos. On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against... not just those using Hadoop 2... I don't know if that point was clear), to only partially support a version of Hadoop that is still alpha and has never had a stable release. BTW, Option 4 was how I had have achieved a solution for ACCUMULO-1402, but am reluctant to apply that patch, with this issue outstanding, as it may exacerbate the problem. Another implication for Option 4 (the current solution) is for 1.6.0, with the planned accumulo-maven-plugin... 
because it means that the accumulo-maven-plugin will need to be configured like this:

<plugin>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-maven-plugin</artifactId>
  <dependencies>
    ... all the required hadoop 1 dependencies to make the plugin work,
    even though this version only works against hadoop 1 anyway ...
  </dependencies>
  ...
</plugin>

-- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 5:42 PM, Christopher ctubb...@apache.org wrote: I think Option 2 is the best solution for waiting until we have the time to solve the problem correctly, as it ensures that transitive dependencies work for the stable version of Hadoop, and using Hadoop2 is a very simple documentation issue for how to apply the patch and rebuild. Option 4 doesn't wait... it explicitly introduces a problem for users. Option 1 is how I'm tentatively thinking about fixing it properly in 1.6.0. --
Re: Hadoop 2 compatibility issues
Maven will malfunction in various entertaining ways if you try to change the GAV of the output of the build using a profile. Maven will malfunction in various entertaining ways if you use classifiers on real-live-JAR files that get used as real-live-dependencies, because it has no concept of a pom-per-classifier. Where does this leave you/us? (I'm not sure that I've earned an 'us' recently around here.) First, I note that 'Apache releases are source releases'. So, one resort of scoundrels here would be to support only one hadoop in the convenience binaries that get pushed to Maven Central, and let other hadoop users take the source release and build for themselves. Second, I am reduced to suggesting an elaboration of the build in which some tool edits poms and runs builds. The maven-invoker-plugin could be used to run that, but a plain old script in a plain old language might be less painful. I appreciate that this may not be an appealing contribution to where things are, but it might be the best of the evil choices. On Tue, May 14, 2013 at 7:50 PM, John Vines vi...@apache.org wrote: The compiled code is compiled code. There are no concerns of dependency resolution. So I see no issues in using the profile to define the gav if that is feasible. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:47 PM, Christopher ctubb...@apache.org wrote: Response to Benson inline, but additional note here: It should be noted that the situation will be made worse for the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed. On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. 
ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same. In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask. Agreed. I just don't see a good alternative here. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAV's for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. 
If you have different profiles that _build different results_ but all deliver the same GAV, you have chaos. What GAV are we currently producing for hadoop 1 and hadoop 2? If you have different profiles that test against different versions of dependencies, but all deliver the same byte code at the end of the day, you don't have chaos. On Tue, May 14, 2013 at 5:48 PM, Christopher ctubb...@apache.org wrote: I think it's interesting that Option 4 seems to be most preferred... because it's the *only* option that is explicitly advised against by the Maven developers (from the information I've read). I can see its appeal, but I really don't think that we should introduce an explicit problem for users (that applies to users using even the Hadoop version we directly build against...
Re: Hadoop 2 compatibility issues
I'm very much partial to the First option, as it's far less effort for approximately the same value (in my opinion, but in light of the enthusiasm above for hadoop2, I could be very wrong on my assessment of the value). I'm going to upload a patch to ACCUMULO-1402 soon (tiny polishing left), to demonstrate a way to push redundant jars, with an extra classifier (though I still have to build twice, to avoid maven-invoker-plugin complexity) for hadoop2-compatible binaries. If you don't mind, I'll tag you with a request to review that patch, as I'd like more details about the classifier issues you mention, in context. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 8:27 PM, Benson Margulies bimargul...@gmail.com wrote: Maven will malfunction in various entertaining ways if you try to change the GAV of the output of the build using a profile. Maven will malfunction in various entertaining ways if you use classifiers on real-live-JAR files that get used as real-live-dependencies, because it has no concept of a pom-per-classifier. Where does this leave you/us? (I'm not sure that I've earned an 'us' recently around here.) First, I note that 'Apache releases are source releases'. So, one resort of scoundrels here would be to support only one hadoop in the convenience binaries that get pushed to Maven Central, and let other hadoop users take the source release and build for themselves. Second, I am reduced to suggesting an elaboration of the build in which some tool edits poms and runs builds. The maven-invoker-plugin could be used to run that, but a plain old script in a plain old language might be less painful. I appreciate that this may not be an appealing contribution to where things are, but it might be the best of the evil choices. On Tue, May 14, 2013 at 7:50 PM, John Vines vi...@apache.org wrote: The compiled code is compiled code. There are no concerns of dependency resolution. 
So I see no issues in using the profile to define the gav if that is feasible. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 7:47 PM, Christopher ctubb...@apache.org wrote: Response to Benson inline, but additional note here: It should be noted that the situation will be made worse for the solution I was considering for ACCUMULO-1402, which would move the accumulo artifacts, classified by the hadoop2 variant, into the profiles... meaning they will no longer resolve transitively when they did before. Can go into details on that ticket, if needed. On Tue, May 14, 2013 at 7:41 PM, Benson Margulies bimargul...@gmail.com wrote: On Tue, May 14, 2013 at 7:36 PM, Christopher ctubb...@apache.org wrote: Benson- They produce different byte-code. That's why we're even considering this. ACCUMULO-1402 is the ticket under which our intent is to add classifiers, so that they can be distinguished. whoops, missed that. Then how do people succeed in just fixing up their dependencies and using it? The specific differences are things like changes from abstract class to an interface. Apparently an import of these does not produce compatible byte-code, even though the method signature looks the same. In any case, speaking as a Maven-maven, classifiers are absolutely, positively, a cure worse than the disease. If you want the details just ask. Agreed. I just don't see a good alternative here. All- To Keith's point, I think perhaps all this concern is a non-issue... because as Keith points out, the dependencies in question are marked as provided, and dependency resolution doesn't occur for provided dependencies anyway... so even if we leave off the profiles, we're in the same boat. Maybe not the boat we should be in... but certainly not a sinking one as I had first imagined. It's as afloat as it was before, when they were not in a profile, but still marked as provided. 
-- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 7:09 PM, Benson Margulies bimargul...@gmail.com wrote: It just doesn't make very much sense to me to have two different GAV's for the very same .class files, just to get different dependencies in the poms. However, if someone really wanted that, I'd look to make some scripting that created this downstream from the main build. On Tue, May 14, 2013 at 6:16 PM, John Vines vi...@apache.org wrote: They're the same currently. I was requesting separate gavs for hadoop 2. It's been on the mailing list and jira. Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:14 PM, Keith Turner ke...@deenlo.com wrote: On Tue, May 14, 2013 at 5:51 PM, Benson Margulies bimargul...@gmail.com wrote: I am a Maven developer, and I'm offering this advice based on my understanding of the reason why that generic advice is offered. If you have different profiles that _build different results_ but all
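Benson's warning about classifiers becomes concrete if you sketch what a downstream POM would have to contain. The coordinates below are a hypothetical illustration (the hadoop2 classifier and the 1.5.0 version are assumptions, not published artifacts):

```xml
<!-- Hypothetical downstream dependency on a classified artifact.
     Maven would fetch accumulo-core-1.5.0-hadoop2.jar, but the metadata it
     consults for transitive dependencies is the single accumulo-core-1.5.0.pom;
     there is no separate POM per classifier, so the hadoop2 jar would still be
     described by the hadoop1-oriented dependency tree. -->
<dependency>
  <groupId>org.apache.accumulo</groupId>
  <artifactId>accumulo-core</artifactId>
  <version>1.5.0</version>
  <classifier>hadoop2</classifier>
</dependency>
```

This is the "no concept of a pom-per-classifier" problem in miniature: the classifier selects a different jar, never different metadata.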
Re: Hadoop 2 compatibility issues - tangent
You can have Maven generate a file with the classpath dependencies and also make a shaded jar. I use the classpath file for normal Java processes and the shaded jar file with 'hadoop jar'. On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any Maven utilities to automatically dissect provided dependencies and make them included? Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. 
Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work, and in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo, and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
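The pattern Christopher describes looks roughly like the following. This is a sketch, not the actual 1.5 POM (profile id, property name, and versions are illustrative):

```xml
<!-- Sketch of dependencies declared inside a profile. The profile can be
     activated at build time (e.g. -Dhadoop.profile=2.0), but activation is
     not recorded in the deployed POM, so downstream projects resolving this
     artifact never see these dependencies transitively. -->
<profiles>
  <profile>
    <id>hadoop-2.0</id>
    <activation>
      <property>
        <name>hadoop.profile</name>
        <value>2.0</value>
      </property>
    </activation>
    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.0.0-alpha</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

A downstream user who wants the same dependencies has to copy the profile's dependency list into their own POM by hand, which is the transitive-resolution problem described above.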
Re: Hadoop 2 compatibility issues - tangent
With the right configuration, you could use the copy-dependencies goal of the maven-dependency-plugin to gather your dependencies to one place. -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Tue, May 14, 2013 at 6:14 PM, John Vines vi...@apache.org wrote: On that note, I was wondering if there were any suggestions for how to deal with the laundry list of provided dependencies that Accumulo core has? Writing packages against it is a bit ugly if not using the accumulo script to start. Are there any Maven utilities to automatically dissect provided dependencies and make them included? Sent from my phone, please pardon the typos and brevity. On May 14, 2013 6:09 PM, Keith Turner ke...@deenlo.com wrote: One note about option 4. When using 1.4 users have to include hadoop core as a dependency in their pom. This must be done because the 1.4 Accumulo pom marks hadoop-core as provided. So maybe option 4 is ok if the deps in the profile are provided? On Tue, May 14, 2013 at 4:40 PM, Christopher ctubb...@apache.org wrote: So, I've run into a problem with ACCUMULO-1402 that requires a larger discussion about how Accumulo 1.5.0 should support Hadoop2. The problem is basically that profiles should not contain dependencies, because profiles don't get activated transitively. A slide deck by the Maven developers points this out as a bad practice... yet it's a practice we rely on for our current implementation of Hadoop2 support (http://www.slideshare.net/aheritier/geneva-jug-30th-march-2010-maven slide 80). What this means is that even if we go through the work of publishing binary artifacts compiled against Hadoop2, neither our Hadoop1 binaries nor our Hadoop2 binaries will be able to transitively resolve any dependencies defined in profiles. This has significant implications for user code that depends on Accumulo Maven artifacts. 
Every user will essentially have to explicitly add Hadoop dependencies for every Accumulo artifact that has dependencies on Hadoop, either because we directly or transitively depend on Hadoop (they'll have to peek into the profiles in our POMs and copy/paste the profile into their project). This becomes more complicated when we consider how users will try to use things like Instamo. There are workarounds, but none of them are really pleasant.

1. The best way to support both major Hadoop APIs is to have separate modules with separate dependencies directly in the POM. This is a fair amount of work, and in my opinion, would be too disruptive for 1.5.0. This solution also gets us separate binaries for separate supported versions, which is useful.

2. A second option, and the preferred one I think for 1.5.0, is to put a Hadoop2 patch in the branch's contrib directory (branches/1.5/contrib) that patches the POM files to support building against Hadoop2. (Acknowledgement to Keith for suggesting this solution.)

3. A third option is to fork Accumulo, and maintain two separate builds (a more traditional technique). This adds a merging nightmare for features/patches, but gets around some reflection hacks that we may have been motivated to do in the past. I'm not a fan of this option, particularly because I don't want to replicate the fork nightmare that has been the history of early Hadoop itself.

4. The last option is to do nothing and to continue to build with the separate profiles as we are, and make users discover and specify transitive dependencies entirely on their own. I think this is the worst option, as it essentially amounts to ignoring the problem.

At the very least, it does not seem reasonable to complete ACCUMULO-1402 for 1.5.0, given the complexity of this issue. Thoughts? Discussion? Vote on option? -- Christopher L Tubbs II http://gravatar.com/ctubbsii
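For anyone trying the copy-dependencies suggestion, a generic configuration could look like this. It is a sketch of the maven-dependency-plugin goal, not Accumulo's actual build; by default the goal copies dependencies of every scope, including provided, and scope filters are available if that is too much:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-deps</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <!-- gather every dependency jar (provided scope included) so that
             java -cp "target/lib/*:target/myapp.jar" works without the
             accumulo launcher script -->
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, the application can be launched with a plain `java -cp` against `target/lib/*` instead of relying on the accumulo script to assemble the classpath.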
Re: Hadoop Summit Community Choice
On Tue, Mar 5, 2013 at 6:37 AM, Jim Klucar klu...@gmail.com wrote: The Hadoop Summit is coming up in San Jose this summer ( http://hadoopsummit.org/san-jose/ ), and they just released abstracts for a Community Choice vote. Community voting plays a role in what abstracts are selected to be presented at the conference. From what I saw, there are two Accumulo-related abstracts proposed in the Enterprise Data Architecture track. If you're so inclined, please vote on them to help spread Accumulo. http://hadoopsummit2013.uservoice.com/forums/196821-enterprise-data-architecture?query=accumulo Full disclosure, I submitted the Clojure one, and I guess it's time to show what I've been up to, so stay tuned. Sorry for spamming both lists, but I know that not everyone subscribes to all the lists. I submitted the other Accumulo talk. If you vote for it, I'll have to start researching and documenting the real differences between HBase and Accumulo and their effects ... Billie Jim
Re: hadoop classpath causing an exception (sub-command not defined?)
It looks to me like the change of Nov 21, 2012 added the 'hadoop classpath' call to the accumulo script. ACCUMULO-708 initial implementation of VFS class loader … git-svn-id: https://svn.apache.org/repos/asf/accumulo/trunk@1412398 13f79535-47bb-0310-9956-ffa450edef68 Dave Marion authored 23 days ago Could the classpath sub-command be part of a version of hadoop newer than 0.20.2? On Fri, Dec 14, 2012 at 12:18 AM, John Vines jvi...@gmail.com wrote: I didn't think hadoop had a classpath argument, just Accumulo. Sent from my phone, please pardon the typos and brevity. On Dec 13, 2012 10:43 PM, David Medinets david.medin...@gmail.com wrote: I am at a loss to explain what I am seeing. I have installed Accumulo many times without a hitch. But today, I am running into a problem getting the hadoop classpath.

$ /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters. 
I am using the following version of hadoop:

$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

Inside the accumulo script is the line: HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` This line results in the following exception:

$ $HADOOP_HOME/bin/hadoop classpath
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.

Am I missing something basic? What?
Re: hadoop classpath causing an exception (sub-command not defined?)
Should we add a hadoop version check to the accumulo script? On Fri, Dec 14, 2012 at 7:45 AM, Jason Trost jason.tr...@gmail.com wrote: We saw the same issue recently. We upgraded our dev nodes to hadoop 1.1.1 and it fixed this issue. I'm not sure when classpath was added to the hadoop command so a minor upgrade may work too. --Jason sent from my DROID On Dec 14, 2012 7:34 AM, David Medinets david.medin...@gmail.com wrote: It looks to me like the change of Nov 21, 2012 added the 'hadoop classpath' call to the accumulo script. ACCUMULO-708 initial implementation of VFS class loader … git-svn-id: https://svn.apache.org/repos/asf/accumulo/trunk@1412398 13f79535-47bb-0310-9956-ffa450edef68 Dave Marion authored 23 days ago Could the classpath sub-command be part of a version of hadoop newer than 0.20.2? On Fri, Dec 14, 2012 at 12:18 AM, John Vines jvi...@gmail.com wrote: I didn't think hadoop had a classpath argument, just Accumulo. Sent from my phone, please pardon the typos and brevity. On Dec 13, 2012 10:43 PM, David Medinets david.medin...@gmail.com wrote: I am at a loss to explain what I am seeing. I have installed Accumulo many times without a hitch. But today, I am running into a problem getting the hadoop classpath. 
$ /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

I am using the following version of hadoop:

$ hadoop version
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
Compiled by chrisdo on Fri Feb 19 08:07:34 UTC 2010

Inside the accumulo script is the line: HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath` This line results in the following exception:

$ $HADOOP_HOME/bin/hadoop classpath
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.

Am I missing something basic? What?
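The version check suggested above could be sketched as follows. None of this is from the actual accumulo script: version_ge is a hypothetical helper doing a numeric field-wise compare, and 1.0.0 is only a stand-in cutoff (Jason confirmed the sub-command exists in 1.1.1; the exact release that introduced `hadoop classpath` would need checking).

```shell
# Hypothetical sketch of a version guard for the accumulo script.
# version_ge A B: succeeds when dot-separated version A >= version B,
# comparing numeric fields via sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)" = "$1" ]
}

# In the real script this would be parsed from `$HADOOP_HOME/bin/hadoop version`;
# hard-coded here so the sketch is self-contained.
HADOOP_VERSION="0.20.2"

# 1.0.0 is only an assumed cutoff for when `hadoop classpath` appeared.
if version_ge "$HADOOP_VERSION" "1.0.0"; then
  echo "hadoop classpath is available"
else
  echo "hadoop $HADOOP_VERSION predates 'hadoop classpath'; construct the classpath manually"
fi
```

An alternative that avoids version parsing entirely is to probe the sub-command directly (run `hadoop classpath` once and fall back if it exits non-zero), which is more robust than guessing the cutoff release.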
Re: Hadoop Summit
I wish; at this point it looks like no for me. On Wed, Nov 21, 2012 at 8:48 AM, Billie Rinaldi bil...@apache.org wrote: Is anyone thinking about going to the Hadoop Summit in Amsterdam in March? http://hadoopsummit.org/amsterdam I'm thinking of proposing a talk on improvements in Accumulo 1.5. Billie