Hi,

I've got some Pig scripts that access data via HCat. They run fine on the command line, but when I try to run them as part of an Oozie action they fail, unfortunately with very little detail in the error messages.
So before I go into the specifics, can I clarify what is needed to get Pig/HCat integration working with Oozie?

I'm running this on CDH5, and the output of "oozie admin -shareliblist" includes Pig and HCatalog. Within my Pig scripts I refer to HCat tables by name alone, i.e. no hcat:// URI. The hive-site.xml that works for Hive actions is available. I have other non-HCat workflows running fine, including Pig and Hive actions.

When I run Pig scripts that use HCatalog from the CLI, I need to specify -useHCatalog and have HCAT.BIN defined. Should I be passing values for these to the Pig script within <argument> elements in the action definition? (I've tried both with and without.)

Is anything else required for this to work? Or are there pointers to any documentation with examples/specs for what's needed? I found different parts of the picture spread around, but no definitive spec or full examples.

I cut my script down to the following. Note the commented-out store statement: even with it disabled, we don't get as far as trying to read the table (which exists and contains data):

    REGISTER /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/pig/piggybank.jar
    mydata = LOAD 'testtable_hcat' USING org.apache.hcatalog.pig.HCatLoader();
    -- store mydata into '/tmp/zz.out' using PigStorage();

I commented out the store because the only error I get includes the error code JA018, and a web search suggested this may be permission related. Is there anything I need to consider here?
The cluster isn't using any external security provider, only basic authentication. The error is:

    job_1402172905909_0054 FAILED/KILLED JA018

Here's the workflow.xml:

    <workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
        <start to="pig-node"/>
        <action name="pig-node">
            <pig>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <job-xml>${workflowRoot}/hive-site.xml</job-xml>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <script>${workflowRoot}/pig/simple.pig</script>
            </pig>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Pig action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Thanks,
Garry
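
P.S. To make the <argument> question concrete, here is a rough sketch of what passing the CLI flag through the action definition could look like. This is an illustration under assumptions, not a known-good configuration: it assumes the flag is carried over verbatim as a single <argument> element placed after <script>, and it leaves out HCAT.BIN. Whether the Oozie Pig action honours the flag at all is exactly what I'm unsure of.

```xml
<pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>${workflowRoot}/hive-site.xml</job-xml>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <script>${workflowRoot}/pig/simple.pig</script>
    <!-- Assumption: the CLI flag passed through unchanged as one argument -->
    <argument>-useHCatalog</argument>
</pig>
```

Everything except the <argument> line is unchanged from the workflow.xml in this post.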