Hi,

I've got some Pig scripts that access data via HCat. They run fine on the
command line, but when I try to execute them as part of an Oozie action they
fail, unfortunately with very little in the way of detailed error messages.

So before I go into the specifics, could someone clarify what is needed to get
Pig/HCat integration working with Oozie?

I'm running this on CDH5, and the output of "oozie admin -listsharelib" includes
Pig and HCatalog. Within my Pig scripts I refer to HCat tables by name alone,
i.e. with no hcat:// URI. The hive-site.xml that works for Hive actions is
available. I have other, non-HCat workflows running fine, including Pig and
Hive actions.

When I run Pig scripts that use HCatalog from the CLI, I need to specify
-useHCatalog and have HCAT.BIN defined. Should I be passing values for these to
the Pig script within <argument> elements in the action definition? (I've tried
both with and without.)
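For reference, the Oozie workflow specification describes an
oozie.action.sharelib.for.<action-type> property that selects which share-lib
directories an action gets on its classpath. Below is a sketch of the Pig
action's <configuration> using it. The directory names "pig" and "hcatalog" are
assumptions based on the -listsharelib output. I also haven't confirmed that
this Oozie version reads the property from the action configuration rather than
only from job.properties or oozie-site.xml. Share libs are only added at all
when oozie.use.system.libpath=true is set for the job.

```xml
<!-- Hedged sketch: share-lib names "pig" and "hcatalog" are assumed to match
     the directories listed by "oozie admin -listsharelib". -->
<configuration>
    <property>
        <name>mapred.job.queue.name</name>
        <value>${queueName}</value>
    </property>
    <property>
        <!-- request both the Pig and the HCatalog share libs for this Pig action -->
        <name>oozie.action.sharelib.for.pig</name>
        <value>pig,hcatalog</value>
    </property>
</configuration>
```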

Is anything else required for this to work? Or can anyone point me to
documentation with examples/specs for what's needed? I found different parts of
the picture spread around, but no definitive spec or full examples.

I cut my script down to the following. Note the commented-out second statement:
we don't even get as far as trying to read the table (which exists and contains
data):

REGISTER  /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/pig/piggybank.jar

mydata = LOAD 'testtable_hcat' USING org.apache.hcatalog.pig.HCatLoader();
-- store mydata into '/tmp/zz.out' using PigStorage();

I commented out the STORE because the only error I get includes what seems to
be the error code JA018, and searching Google suggested this may be
permission-related. Is there anything I need to consider here? The cluster
isn't using any external security provider, only basic authentication:

job_1402172905909_0054 FAILED/KILLED JA018

Here's the workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${workflowRoot}/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>${workflowRoot}/pig/simple.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pig action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Thanks
Garry

