Hi Olga,

Regarding #1, does this mean we'd have a build of Pig X for each
version of Hadoop we support? It seems to me this would be a bit
complex to maintain.

Regarding #2, if Hadoop does a good job of maintaining backwards
compatibility of its public API, and Pig uses only the Hadoop public
API, we would be fine.

Regarding #3, I can still see potential issues (from my experience
with Hadoop-Oozie) where the API did not change but the behavior did.
This means we'll have to be able to if/then/else within Pig, whenever
necessary, based on the version of Hadoop.

A possible way of addressing this would be:

* Pig should use the 'hadoop' command to run Pig (this would help to
cleanly bring the Hadoop dependencies into the classpath).
* Pig could have a whitelist of the Hadoop versions it supports and
fail if the current Hadoop version is not supported (we could use
version regexes/ranges).
* (what I'm suggesting in #3 above) Pig could use the Hadoop version
as a code selector whenever necessary.
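
To make the last two bullets a bit more concrete, here is a rough
sketch of what I have in mind. Everything in it is an assumption for
illustration: the class name HadoopVersionGate, its method names, the
regexes and the "hadoop20"/"hadoop23" labels are made up, and the
patterns are not a real support matrix. The version string is passed
in as a plain argument; in Pig it would presumably come from Hadoop
itself (for example VersionInfo.getVersion()).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: a whitelist check plus a
// version-based code selector.
final class HadoopVersionGate {

    // Assumed whitelist; placeholder patterns, not Pig's real support matrix.
    private static final List<Pattern> SUPPORTED = Arrays.asList(
            Pattern.compile("0\\.20\\.\\d+.*"),
            Pattern.compile("0\\.23\\.\\d+.*"));

    private HadoopVersionGate() {}

    // True if the version string fully matches one of the whitelisted patterns.
    static boolean isSupported(String version) {
        for (Pattern p : SUPPORTED) {
            if (p.matcher(version).matches()) {
                return true;
            }
        }
        return false;
    }

    // Fail fast when running against a Hadoop version outside the whitelist.
    static void requireSupported(String version) {
        if (!isSupported(version)) {
            throw new IllegalStateException(
                    "Unsupported Hadoop version: " + version);
        }
    }

    // Code selector: returns a label for the version-specific code path.
    // Real code would branch to different behavior instead of returning a label.
    static String selectCodePath(String version) {
        requireSupported(version);
        if (version.startsWith("0.23.")) {
            return "hadoop23";
        } else {
            return "hadoop20";
        }
    }
}
```

Pig would call requireSupported() once at startup and selectCodePath()
(or an equivalent check) only at the few places where the behavior
actually differs between Hadoop versions.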

Thanks.

Alejandro

On Mon, Nov 7, 2011 at 11:15 AM, Olga Natkovich <[email protected]> wrote:
> Hi,
>
> In the past we have for the most part avoided supporting multiple versions of 
> Hadoop with the same version of Pig. This is about to change with release of 
> Hadoop 23. We need to come up with a strategy on how to support that. There 
> are a couple of issues to consider:
>
>
> (1)    Version numbering. Seems like encoding the information in the last 
> version number makes sense. The details of the encoding need to be hashed out
>
> (2)    Code changes required to support different versions of Hadoop. This
> time around we made an effort to make sure that the same code can work with
> both. In the future that might not work and we would need to figure out how
> to maintain different code bases. Most likely we would have to have additional
> branches off of the main release branch
>
> (3)    Anything else we need to consider?
>
> Olga
>
