Option 2 is consistent with 'Pigs eat anything.'

Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com

On Nov 8, 2011, at 8:05 AM, Alan Gates <[email protected]> wrote:

>
> On Nov 7, 2011, at 11:15 AM, Olga Natkovich wrote:
>
>> Hi,
>>
>> In the past we have for the most part avoided supporting multiple versions
>> of Hadoop with the same version of Pig. This is about to change with the
>> release of Hadoop 23. We need to come up with a strategy on how to support
>> that. There are a couple of issues to consider:
>>
>>
>> (1) Version numbering. Seems like encoding the information in the last
>> version number makes sense. The details of the encoding need to be hashed out
>
> I can see two options. One is to do major.minor.patch.hadoopversion, so for
> example 0.10.1.h23 and 0.10.1.h20. The problem I see with that is we *have*
> to guarantee that they have the same functionality. That is, 0.10.1 has all
> the same patches regardless of which Hadoop version it is (excepting maybe
> patches specific to a particular Hadoop version); the only difference is
> which one it's compiled for. Another problem is that this will proliferate
> versions, cluttering up our website, confusing our users, and causing the PMC
> members vote after vote.
>
> The second option would be to rework the pig package so that it had the jars
> for both, and the pig shell script figures out based on the Hadoop it finds
> which version is being used. This has the nice feature of guaranteeing the
> same features, but it has a few downsides. One, it bloats our package (since
> it's carrying multiple jars). Two, what happens when someone wants to add
> support for a new version (say Hadoop 22) to an existing release? Three, now
> a release manager must have access to all versions of Hadoop we claim to
> cover, or wait for help from those who do, in order to test a release.
>
> Hive chose the second option, and dealt with the bloating issue by isolating
> all the version specific code in one jar.
>
> We could deal with the concern of adding new versions to an existing release
> by saying it's not allowed. If you want to add a new supported version then
> you create a new version. This will devolve into versions 0.10 and 0.12 work
> on 20 and 23, but 0.11 works on 22. That will be horribly confusing for our
> users.
>
> I think the third issue of testability is going to mean certain Pig versions
> only support certain Hadoop versions without it being explicitly marked as
> well. Again, I think this is really bad.
>
> So I vote for the major.minor.patch.hadoopversion solution, though I think we
> should work hard to make it clear to users how to select the right version of
> Pig when downloading it.
>
>
>>
>> (2) Code changes required to support different versions of Hadoop. This
>> time around we made an effort to make sure that the same code can work with
>> both. In the future that might not work and we would need to figure out how
>> to maintain different code bases. Most likely we would have to have
>> additional branches off of the main release branch
>
> Hopefully we can continue to do this via conditional compilation. Having
> different branches isn't maintainable. How do I push a Hadoop version
> specific patch to the next release? We'll get an ever growing collection of
> patches that have to be applied on a Hadoop specific branch for every
> release. We need to continue the rule that any patch must apply to the
> trunk, even when it's version specific.
>
>>
>> (3) Anything else we need to consider?
>>
>> Olga
>
> Alan.
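
For illustration only, here is a rough sketch of the selection step in the second option, where one package carries a jar per Hadoop line and picks one based on the Hadoop it finds. It is written in Python rather than as the actual pig shell script, and it assumes the installed version is available as text in the style of `hadoop version` output (a line such as "Hadoop 0.23.0"). The jar names and the function names `hadoop_line` and `select_pig_jar` are hypothetical; nothing in the thread specifies them.

```python
import re

# Hypothetical jar names: the thread does not say how the package would be laid out.
JARS = {"20": "pig-h20.jar", "23": "pig-h23.jar"}


def hadoop_line(version_output):
    """Return the Hadoop line ("20", "23", ...) found in `hadoop version`-style text."""
    match = re.search(r"Hadoop\s+(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a Hadoop version in: %r" % version_output)
    major, minor = match.groups()
    # Assumption: "Hadoop 20" and "Hadoop 23" in the thread mean the 0.20.x and 0.23.x lines.
    return minor if major == "0" else major


def select_pig_jar(version_output, jars=JARS):
    """Pick the bundled jar that matches the detected Hadoop line."""
    line = hadoop_line(version_output)
    if line not in jars:
        # A Hadoop line that was not bundled at release time cannot be served.
        raise LookupError("no Pig jar bundled for Hadoop %s" % line)
    return jars[line]
```

With this mapping, `select_pig_jar("Hadoop 0.22.0")` raises `LookupError`, which is the "add Hadoop 22 to an existing release" problem raised above: the table of bundled jars is fixed when the release is cut.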
