Hi Alejandro,

I understand your concern, but creating multiple pig.jar files is inevitable. See my comments below.

Daniel

On Mon, Nov 7, 2011 at 11:40 AM, Alejandro Abdelnur <[email protected]> wrote:

> Hi Olga,
>
> Regarding #1, does this mean we'd have a build of Pig X for each
> version of Hadoop we support? It seems to me this would be a bit
> complex to maintain.

Yes. Currently we only plan to support 20.x and 23 (there is some work for
Hadoop 22 in PIG-2277 <https://issues.apache.org/jira/browse/PIG-2277>, but I
don't know how it will end up). This is complex, but I cannot see how we can
avoid it. Hopefully Hadoop will converge and become API stable, so that we
don't need this trick in future Hadoop releases.

> Regarding #2, if Hadoop does a good job at maintaining public API
> backwards compatibility and Pig uses only Hadoop public API we would
> be good.

That's not true, at least for the new APIs in 23.

> Regarding #3, still I can see potential issues (from my experience
> with Hadoop-Oozie) where the API did not change but the behavior did.
> This means we'll have to be able to if/then/else within Pig whenever
> necessary based on the version of Hadoop.

We already do this wherever the version divergence can be solved with
if/then/else or reflection. In those cases we only need to maintain one
pig.jar. However, there are some static dependencies which cannot be solved by
these tricks; that is why we need a shims layer and generate a different
pig.jar for each version of Hadoop.

> A possible way of addressing this would be:
>
> * Pig should use the 'hadoop' to run Pig (this would help to cleanly
> bring into the classpath the Hadoop dependencies).

We've already done this in PIG-2239.

> * Pig could have a whitelist of Hadoop versions it supports and fail if
> the current Hadoop version is not supported (we could use version
> regex/ranges)
> * (what I'm suggesting in #3 above) Pig could use the Hadoop version
> as a code selector whenever necessary.
>
> Thanks.
>
> Alejandro
>
> On Mon, Nov 7, 2011 at 11:15 AM, Olga Natkovich <[email protected]> wrote:
> > Hi,
> >
> > In the past we have for the most part avoided supporting multiple
> > versions of Hadoop with the same version of Pig. This is about to change
> > with the release of Hadoop 23. We need to come up with a strategy on how to
> > support that. There are a couple of issues to consider:
> >
> > (1) Version numbering. Seems like encoding the information in the
> > last version number makes sense. The details of the encoding need to be
> > hashed out.
> >
> > (2) Code changes required to support different versions of Hadoop.
> > This time around we made an effort to make sure that the same code can work
> > with both. In the future that might not work and we would need to figure
> > out how to maintain different code bases. Most likely we would have to have
> > additional branches off of the main release branch.
> >
> > (3) Anything else we need to consider?
> >
> > Olga
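
The whitelist and code-selector suggestions discussed in this thread could be sketched roughly as follows. This is only an illustration, not Pig code: the class name, the regexes, and the variant names hadoop20/hadoop23 are assumptions made for the example, and the version string is hard-coded where a real check would ask Hadoop for it (for instance via org.apache.hadoop.util.VersionInfo.getVersion()).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: maps a Hadoop version string to a code variant,
// and rejects versions that are not on the whitelist.
final class HadoopVersionCheck {
    // Assumed whitelist: version regex -> name of the variant to select.
    private static final Map<Pattern, String> SUPPORTED = new LinkedHashMap<>();
    static {
        SUPPORTED.put(Pattern.compile("0\\.20\\..*"), "hadoop20");
        SUPPORTED.put(Pattern.compile("0\\.23\\..*"), "hadoop23");
    }

    private HadoopVersionCheck() {}

    /** Returns the variant for a supported version, or throws if unsupported. */
    static String selectShims(String hadoopVersion) {
        for (Map.Entry<Pattern, String> e : SUPPORTED.entrySet()) {
            if (e.getKey().matcher(hadoopVersion).matches()) {
                return e.getValue();
            }
        }
        throw new IllegalStateException(
                "Unsupported Hadoop version: " + hadoopVersion);
    }

    public static void main(String[] args) {
        // Hard-coded for the sketch; a real check would query Hadoop at startup.
        System.out.println(selectShims("0.20.205.0"));
    }
}
```

A runtime selector like this only covers divergence that if/then/else or reflection can handle. The static dependencies mentioned in the reply would still require a separately built pig.jar per Hadoop version.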
