Hi,

Having dug into this some more, I have a little more information about the problem.

The simple Pig script is as follows:

REGISTER /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/pig/piggybank.jar

mydata = LOAD 'testtable_hcat' USING org.apache.hive.hcatalog.pig.HCatLoader();

This fails with the following message in the Oozie CLI and UI:

0000003-140615021945919-oozie-oozi-W@pig-node   ERROR   job_1402823993415_0004   FAILED/KILLED   JA018

However, that particular MR job ID is marked as successful in the MR and YARN
logs, which I think is why it's proving difficult to find any more logging.

What does work:

* Hive actions within Oozie
* Other Pig actions (that don't use HCatalog) within Oozie
* This Pig script run from the CLI as either the submitting or yarn user

I did change two things: I switched the package name for HCatLoader, as
org.apache.hcatalog.* is now deprecated in favour of org.apache.hive.hcatalog.*,
and I created the /user/yarn directory, which was not present. Neither made a
difference.

I think JA018, which oozie-default.xml describes as being due to the output
directory already existing, is actually referring to something else here,
possibly a missing library.

To run the script from the command line, I add the -useHCatalog argument to Pig,
which explicitly adds jars to the classpath, though many of these would be for
the hcat binary etc., which I'm not using. The HCatalog adapter for Pig does,
however, appear to be in the Oozie sharelib:

[cloudera@localhost ~]$ oozie admin -shareliblist hcatalog | grep -i pig
hdfs://localhost.localdomain:8020/user/oozie/share/lib/lib_20140404112820/hcatalog/hive-hcatalog-pig-adapter-0.12.0-cdh5.0.0.jar
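
For reference, here is a sketch of how the hcatalog sharelib might be requested
explicitly for the Pig action, rather than relying on the Pig sharelib alone.
It assumes the Oozie shipped with CDH5 honours the oozie.action.sharelib.for.pig
property when it is set in the action's <configuration>. That is an assumption
to check against the Oozie ShareLib documentation, not a confirmed fix for the
JA018 above. The other elements are copied from my existing pig-node action.

```xml
<!-- Sketch only: assumes this Oozie version supports oozie.action.sharelib.for.pig -->
<pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>${workflowRoot}/hive-site.xml</job-xml>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
        <!-- Assumed property: ask Oozie to ship both the pig and hcatalog sharelib dirs -->
        <property>
            <name>oozie.action.sharelib.for.pig</name>
            <value>pig,hcatalog</value>
        </property>
    </configuration>
    <script>${workflowRoot}/pig/simple.pig</script>
</pig>
```

If the action-level property is not picked up, the same key can reportedly also
be set in job.properties (oozie.action.sharelib.for.pig=pig,hcatalog).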

Does anyone have any insight into any of the above? The fact that I can't find
any working examples of this Oozie/Pig/HCat combination isn't filling me with
confidence.

One thing that would help: if Pig is writing an error log file, is there any way
of capturing it or making it available? I tried the equivalent of a
"pig -l > <destination>" in the workflow.xml, but that didn't seem to work
either.

Alternatively, any thoughts on how things could fail in such a way that the
MapReduce job is logged as successful but Oozie sees the action as
failed/killed?

Any pointers well received,

Garry

-----Original Message-----
From: Garry Turkington [mailto:g.turking...@improvedigital.com]
Sent: 11 June 2014 00:11
To: user@oozie.apache.org
Subject: RE: Using HCat within a Pig action

Mona,

Thanks for the response.

That doesn't quite look like my problem, though: my Hive Oozie actions are
working fine, as are my Pig Oozie actions. Things only start breaking when I
try to use HCat from within the Pig action.

Are any additional arguments or configuration options required for a Pig job
using HCat? Or are there any working examples anywhere?

Thanks

Garry

-----Original Message-----
From: Mona Chitnis [mailto:chit...@yahoo-inc.com.INVALID]
Sent: 09 June 2014 19:20
To: user@oozie.apache.org
Subject: Re: Using HCat within a Pig action

It looks like there's already some discussion of this problem here:
https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/m8NnJvzxGAQ

On 6/9/14, 6:00 AM, "Garry Turkington" <g.turking...@improvedigital.com> wrote:

>Hi,
>
>I've got some Pig scripts that access data via HCat. They run fine on
>the command line, but if I try to execute some of them as part of an
>Oozie action they fail, unfortunately with very few detailed error
>messages.
>
>So before I go into the specifics, can I clarify what is needed to get
>Pig/HCat integration working with Oozie?
>
>I'm running this on CDH5, and the output of "oozie admin -listsharelib"
>includes Pig and HCatalog. Within my Pig scripts I am referring to HCat
>tables by name alone, i.e. no hcat:// URI. The hive-site.xml that
>works for Hive actions is available. I have other non-HCat workflows
>running fine, including Pig and Hive actions.
>
>When I run Pig scripts that use HCatalog from the CLI, I need to specify
>-useHcatalog and have HCAT.BIN defined; should I be passing values for
>these to the Pig script within <argument> elements in the action
>definition? (I've tried both with and without.)
>
>Is anything else required for this to work? Or are there pointers to any
>documentation with examples/specs for what's needed? I found different
>parts of the picture spread around, but no definitive spec or full
>examples.
>
>I cut my script down to the following; note the commented-out second
>statement. We don't even get as far as trying to read the (existing and
>non-empty) table:
>
>REGISTER /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/pig/piggybank.jar
>
>mydata = LOAD 'testtable_hcat' USING org.apache.hcatalog.pig.HCatLoader();
>-- store mydata into '/tmp/zz.out' using PigStorage();
>
>I commented out the store because the only error I get includes the
>apparent error code JA018, and it was suggested on Google that this may be
>permission related. Is there anything I need to consider here? The cluster
>isn't using any external security provider, only basic authentication:
>
>job_1402172905909_0054 FAILED/KILLED JA018
>
>Here's the workflow.xml:
><workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
>    <start to="pig-node"/>
>    <action name="pig-node">
>        <pig>
>            <job-tracker>${jobTracker}</job-tracker>
>            <name-node>${nameNode}</name-node>
>            <job-xml>${workflowRoot}/hive-site.xml</job-xml>
>            <configuration>
>                <property>
>                    <name>mapred.job.queue.name</name>
>                    <value>${queueName}</value>
>                </property>
>            </configuration>
>            <script>${workflowRoot}/pig/simple.pig</script>
>        </pig>
>        <ok to="end"/>
>        <error to="fail"/>
>    </action>
>
>    <kill name="fail">
>        <message>Pig action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>    </kill>
>    <end name="end"/>
></workflow-app>
>
>Thanks
>Garry

-----

No virus found in this message.

Checked by AVG - www.avg.com<http://www.avg.com>

Version: 2014.0.4570 / Virus Database: 3955/7637 - Release Date: 06/07/14