Re: [DISCUSSION]Pig releases with different versions of Hadoop

Daniel Dai Mon, 07 Nov 2011 13:42:31 -0800

Hi, Alejandro,
I understand your concern, but creating multiple pig.jar files is inevitable.
See my comments below.


Daniel

On Mon, Nov 7, 2011 at 11:40 AM, Alejandro Abdelnur <[email protected]> wrote:

> Hi Olga,
>
> Regarding #1, does this mean we'd have a build of Pig X for each
> version of Hadoop we support? It seems to me this would be a bit
> complex to maintain.
>
Yes. Currently we only plan to support 20.x and 23 (there is some work
for Hadoop 22 in PIG-2277 <https://issues.apache.org/jira/browse/PIG-2277>,
but I don't know how it will end up). This is complex, but I cannot see how
we can avoid it. Hopefully Hadoop will converge and become API stable, so
that we don't need this trick in future Hadoop releases.

>
> Regarding #2, if Hadoop does a good job at maintaining public API
> backwards compatibility and Pig uses only Hadoop public API we would
> be good.
>
That's not true, at least for the new APIs in 23.

>
> Regarding #3, still I can see potential issues (from my experience
> with Hadoop-Oozie) where the API did not change but the behavior did.
> This means we'll have to be able to if/then/else within Pig whenever
> necessary based on the version of Hadoop.
>
We already use such tricks when the version divergence can be solved with
if/then/else or reflection. In that case we only need to maintain one pig.jar.
However, there are some static dependencies which cannot be solved by these
tricks. That's why we need a shims layer and generate a different pig.jar
for each version of Hadoop.
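
To illustrate the reflection trick mentioned above, here is a minimal
sketch, not the actual Pig code. The class VersionSwitch and its helper
methods are hypothetical names made up for this example. The only Hadoop
name used is org.apache.hadoop.util.VersionInfo.getVersion(), and it is
looked up by name so the sketch still compiles and runs when Hadoop is not
on the classpath (it then returns the fallback value).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class VersionSwitch {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean classPresent(String className) {
        try {
            // initialize=false: only check that the class is loadable
            Class.forName(className, false, VersionSwitch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Calls a public static no-arg method by name. Returns the fallback if the
     * class or method is missing, is not static, or the call fails.
     */
    static Object callStaticOrDefault(String className, String methodName, Object fallback) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            if (!Modifier.isStatic(m.getModifiers())) {
                return fallback;
            }
            return m.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // No compile-time dependency on Hadoop: the class is resolved at run time.
        Object version = callStaticOrDefault(
                "org.apache.hadoop.util.VersionInfo", "getVersion", "unknown");
        System.out.println("Hadoop version: " + version);
    }
}
```

This only helps when the difference can be hidden behind a run-time lookup.
Code that is compiled directly against a Hadoop type whose shape differs
between versions still needs a separate build, which is the case the shims
layer is meant to cover.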

>
> A possible way of addressing this would be:
>
> * Pig should use the 'hadoop' command to run Pig (this would help to cleanly
> bring into the classpath the Hadoop dependencies).
>
We've already done this in PIG-2239.


> * Pig could have a whitelist of Hadoop versions it supports and fail if
> the current hadoop version is not supported (we could use version
> regex/ranges)
> * (what I'm suggesting in #3 above) Pig could use the Hadoop version
> as a code selector whenever necessary.
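
A rough sketch of what the whitelist idea in the list above could look
like. The class name HadoopVersionWhitelist and the two patterns are
assumptions chosen for illustration (they mirror the 20.x and 23 lines
discussed in this thread), not an agreed list of supported versions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class HadoopVersionWhitelist {

    // Hypothetical whitelist: one regex per supported Hadoop release line.
    private static final List<Pattern> SUPPORTED = Arrays.asList(
            Pattern.compile("0\\.20\\.\\d+.*"),
            Pattern.compile("0\\.23\\.\\d+.*"));

    /** Returns true if the whole version string matches a whitelisted pattern. */
    static boolean isSupported(String version) {
        for (Pattern p : SUPPORTED) {
            if (p.matcher(version).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Fails fast with a clear message when the version is not whitelisted. */
    static void requireSupported(String version) {
        if (!isSupported(version)) {
            throw new IllegalStateException(
                    "Unsupported Hadoop version: " + version);
        }
    }

    public static void main(String[] args) {
        // The version string would come from the Hadoop installation in use.
        requireSupported(args.length > 0 ? args[0] : "0.20.2");
        System.out.println("Hadoop version accepted");
    }
}
```

The same isSupported-style check could also serve as the code selector in
the last bullet, by branching on which pattern matched.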
>
> Thanks.
>
> Alejandro
>
> On Mon, Nov 7, 2011 at 11:15 AM, Olga Natkovich <[email protected]>
> wrote:
> > Hi,
> >
> > In the past we have for the most part avoided supporting multiple
> versions of Hadoop with the same version of Pig. This is about to change
> with release of Hadoop 23. We need to come up with a strategy on how to
> support that. There are a couple of issues to consider:
> >
> >
> > (1)    Version numbering. Seems like encoding the information in the
> last version number makes sense. The details of the encoding need to be
> hashed out
> >
> > (2)    Code changes required to support different versions of Hadoop.
> This time around we made an effort to make sure that the same code can work
> with both. In the future that might not work and we would need to figure
> out how to maintain different code bases. Most likely we would have to have
> additional branches off of main release branch
> >
> > (3)    Anything else we need to consider?
> >
> > Olga
> >
>
