Dmitriy V. Ryaboy commented on PIG-924:

Daniel, you've hit the nail on the head.

This patch is specifically written to enable us to compile against all the 
versions of hadoop, and let the user pick which one he wants at runtime (by 
virtue of including the right hadoop on the path -- no flags needed).  In fact 
the default ant task in the shims directory compiles all the shims at once.

The version string hack is safe, as long as hadoop is built correctly (the 
zebra version is not, as it returns "Unknown", hence the last-resort hack of 
defaulting to 20).
If hadoop came from its own jar I could use reflection to get the jar name, and 
use that as a fallback for an Unknown version -- but in pig, hadoop comes from 
the pig.jar!
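
As a rough illustration of the version-string approach described above, here
is a minimal sketch, not the code from the patch. It assumes version strings
of the form "0.20.1", which in Hadoop would presumably come from
org.apache.hadoop.util.VersionInfo.getVersion(). The class and method names
are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a Hadoop version string to the shim version
// (18, 19, 20, ...) that should be loaded at runtime.
final class HadoopShimVersionSketch {
    // Last-resort default for builds that report "Unknown".
    static final int DEFAULT_VERSION = 20;

    // Matches the minor number in strings such as "0.18.3" or "0.20.1-dev".
    private static final Pattern MINOR =
            Pattern.compile("^0\\.(\\d{1,3})(?!\\d)");

    static int shimVersion(String versionString) {
        if (versionString == null) {
            return DEFAULT_VERSION;
        }
        Matcher m = MINOR.matcher(versionString.trim());
        if (m.find()) {
            return Integer.parseInt(m.group(1));
        }
        // "Unknown" or any other unparseable string falls back to the default.
        return DEFAULT_VERSION;
    }
}
```

The result could then be used to pick a shim class by name; how that lookup
is wired up is left out here because it depends on the patch itself.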

Ideally, Pig would compile all the versions of shims into its jars, and the pig 
jar would not include hadoop. Then the user would include the right hadoop on 
the path (or bin/pig would do it for him), and everything would happen 
automatically.

With hadoop bundled into the jar, however, switching hadoop versions on the fly 
is next to impossible (or at least I don't know how) -- we have multiple jars 
on the classpath, and the classloader will use whatever is the latest (or is it 
the earliest?). Finding the right resource becomes fraught with peril.
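
For reference, the standard classpath loader searches entries in classpath
order, so the first matching entry wins. One way to check which jar a class
was actually loaded from is to ask for its code source. The sketch below is
only an illustration with a hypothetical class name; it is not part of the
patch.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class was loaded from.
final class ClassOriginSketch {
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        // Typically a jar or directory URL.
        return source.getLocation().toString();
    }
}
```

When hadoop is bundled into pig.jar, a call like this on a hadoop class
would be expected to point at pig.jar rather than a hadoop jar, which is why
the jar name cannot serve as a fallback for an Unknown version.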

If existing deployments need a single pig.jar with no external hadoop dependency, it 
might be possible to create a new target (pig-all) that would create a 
statically bundled jar; but I think the default behavior should be to not 
bundle, build all the shims, and use whatever hadoop is on the path.

The current patch is written the way it is so that it can be applied to trunk, 
letting people compile statically for now, and requiring only a change to the 
ant build files to switch to a dynamic compile later on (after 0.4, probably).

> Make Pig work with multiple versions of Hadoop
> ----------------------------------------------
>                 Key: PIG-924
>                 URL: https://issues.apache.org/jira/browse/PIG-924
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
> The current Pig build scripts package hadoop and other dependencies into the 
> pig.jar file.
> This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 
> 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
> use the correct calls for any of the above versions of Hadoop. Unfortunately, 
> the building process precludes us from the ability to do this at runtime, and 
> forces an unnecessary Pig rebuild even if dynamic shims are created.
