Hi Olga,

Regarding #1, does this mean we'd have a build of Pig X for each version of Hadoop we support? It seems to me this would be a bit complex to maintain.

Regarding #2, if Hadoop does a good job of maintaining backward compatibility in its public API, and Pig uses only Hadoop's public API, we would be good.

Regarding #3, I can still see potential issues (from my experience with Hadoop-Oozie) where the API did not change but the behavior did. This means we'll need to be able to branch (if/then/else) within Pig, whenever necessary, based on the Hadoop version. A possible way of addressing this would be:

* Pig should use the 'hadoop' script to run Pig (this would help cleanly bring the Hadoop dependencies into the classpath).
* Pig could have a whitelist of the Hadoop versions it supports and fail if the current Hadoop version is not supported (we could use version regexes/ranges).
* (What I'm suggesting in #3 above) Pig could use the Hadoop version as a code selector whenever necessary.

Thanks.

Alejandro

On Mon, Nov 7, 2011 at 11:15 AM, Olga Natkovich <[email protected]> wrote:
> Hi,
>
> In the past we have for the most part avoided supporting multiple versions of
> Hadoop with the same version of Pig. This is about to change with the release of
> Hadoop 23. We need to come up with a strategy on how to support that. There
> are a couple of issues to consider:
>
> (1) Version numbering. Seems like encoding the information in the last
> version number makes sense. The details of the encoding need to be hashed out.
>
> (2) Code changes required to support different versions of Hadoop. This
> time around we made an effort to make sure that the same code can work with
> both. In the future that might not work and we would need to figure out how
> to maintain different code bases. Most likely we would have to have additional
> branches off of the main release branch.
>
> (3) Anything else we need to consider?
>
> Olga
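The whitelist and version-as-code-selector suggestions could look roughly like the Java sketch below. It is hypothetical, not existing Pig code: the class name `HadoopVersionCheck`, the two whitelisted patterns, and the 0.23 cut-off are illustrative assumptions. Inside Pig the version string could come from Hadoop's `org.apache.hadoop.util.VersionInfo.getVersion()`; here it is passed in as a parameter so the sketch runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a whitelist of supported Hadoop versions plus a
 * version-based code selector. The version string is a parameter here; in Pig
 * it could be obtained from org.apache.hadoop.util.VersionInfo.getVersion().
 */
public class HadoopVersionCheck {

    // Illustrative whitelist (assumption): the 0.20.x and 0.23.x lines.
    private static final List<Pattern> SUPPORTED = Arrays.asList(
            Pattern.compile("0\\.20\\..*"),
            Pattern.compile("0\\.23\\..*"));

    /** Fails fast if the given Hadoop version matches no whitelist entry. */
    public static void checkSupported(String version) {
        for (Pattern p : SUPPORTED) {
            if (p.matcher(version).matches()) {
                return;
            }
        }
        throw new IllegalStateException("Unsupported Hadoop version: " + version);
    }

    /** Extracts {major, minor} from a string such as "0.23.0-SNAPSHOT". */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /**
     * Code selector: true when the 0.23-specific code path should be used.
     * Assumes the 0.x numbering in the whitelist (0.20.x vs 0.23.x).
     */
    public static boolean isHadoop23OrLater(String version) {
        int[] mm = majorMinor(version);
        return mm[0] == 0 && mm[1] >= 23;
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "0.23.0";
        checkSupported(version);
        if (isHadoop23OrLater(version)) {
            System.out.println("Using 0.23 code path for Hadoop " + version);
        } else {
            System.out.println("Using 0.20 code path for Hadoop " + version);
        }
    }
}
```

Failing fast in `checkSupported` keeps an unsupported Pig/Hadoop combination from surfacing later as an obscure runtime error, while `isHadoop23OrLater` gives one place to hang behavior differences that the public API itself does not reveal.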
