Hi, Oozie users:
I recently started using Oozie 3.3.2, which ships with IBM BigInsights 3.0.0.2, and I am trying to make the Oozie Hive action work in my environment. I finally got it working, but I have two questions and would appreciate it if anyone here could give me a bit more detail.
Here is my workflow XML for the Hive action, which currently works:

<action name="hiveAction">
    <hive>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${nameNode}/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
        <script>hql/hive.hql</script>
    </hive>
    <ok to="success"/>
    <error to="emailError"/>
</action>

I point <job-xml> at an HDFS location because I wanted to avoid bundling hive-site.xml into my Oozie application every time a Hive action is used. I know the correct hive-site.xml is stored at that HDFS location by my vendor.
My first question is the following: if I omit <job-xml> from workflow.xml but set the parameter oozie.use.system.libpath=true, I get a "database/table not found" error in my Hive action. It looks like the Oozie Hive action cannot find the correct Hive metastore information in this case, but why? If I am using the shared libraries, shouldn't sharedLibraries/hive/conf already be on the classpath for the Oozie runtime?
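For reference, this is the relevant part of my job.properties (the jobTracker value and application path below are placeholders, not my real values):

```
nameNode=hdfs://namenode:9000
jobTracker=jobtracker:9001
# tell Oozie to put the shared libraries on the action classpath
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/yong/hive-app
```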
When the action failed, I checked the log and saw the following among the parameters:

  Dir: conf
    File: hive-site.xml
    File: .hive-site.xml.crc

If I omitted <job-xml>, where is this hive-site.xml coming from? If it is in fact coming from the shared libraries, why didn't it work? Do I really have to set it explicitly in <job-xml>?
The second question is more troublesome for me. In Hive, I have the following table defined as a Hive Avro table:
create external table test
  row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  stored as
    inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  location '/xxx/data/'
  TBLPROPERTIES ('avro.schema.url'='hdfs:///location/schema/coupon.avsc');
The table works fine and can be queried in the Hive CLI without any issue. But if I try to query it from the Oozie Hive action, I get the following error in the log:
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/location/schema/coupon.avsc, expected: hdfs://namenode:9000
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:651)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:181)

In the Oozie runtime, all the Hadoop configuration files are passed in. For example, core-site.xml defines the following:

  <property>
    <!-- The default file system used by Hadoop -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
But I don't know why the Oozie Hive action dumps out the above error. For now, I have to go into Hive and manually change the table definition to the following to make it work in the Oozie Hive action:
create external table coupon
  row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  stored as
    inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  location '/xxx/coupon/'
  TBLPROPERTIES ('avro.schema.url'='hdfs://namenode:9000/location/schema/coupon.avsc');
I have many existing Avro-format tables in Hive, which means I would have to manually change the avro.schema.url of every one of them to make them work with the Oozie Hive action.
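If I do end up changing them, each table can at least be updated in place with a single statement instead of being dropped and re-created; for the example table above, that would be something like:

```
-- rewrite the schema URL to the fully qualified form;
-- the host:port must match fs.defaultFS in core-site.xml
ALTER TABLE coupon SET TBLPROPERTIES (
  'avro.schema.url'='hdfs://namenode:9000/location/schema/coupon.avsc'
);
```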
Any idea why? Do I have any option that does not involve changing the Hive table definitions?
Thanks
Yong