Hi, Oozie users:
I recently started using Oozie 3.3.2, which comes with IBM BigInsights 3.0.0.2, and I
am trying to make the Oozie Hive action work in my environment. I finally got it
working, but I have two questions and would appreciate it if anyone here could give
me a little more detail.
Here is my workflow XML for the Hive action, which currently works:

<action name="hiveAction">
<hive>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${nameNode}/biginsights/oozie/sharedLibraries/hive/conf/hive-site.xml</job-xml>
<script>hql/hive.hql</script>
</hive>
<ok to="success"/>
<error to="emailError"/>
</action>

I point <job-xml> at an HDFS location, because I wanted to avoid bundling
hive-site.xml into my Oozie application every time a Hive action is used. I know the
correct hive-site.xml is stored at that HDFS location by my vendor.
My first question is this: if I omit <job-xml> in the workflow.xml but set the
parameter "oozie.use.system.libpath=true", I get a "database/table not found" error
in my Hive action. It looks like the Oozie Hive action cannot find the correct Hive
metastore information in this case, but why? If the shared libraries are used in
this case, shouldn't sharedLibraries/hive/conf already be included in the classpath
for the Oozie runtime?
When the Oozie action failed, I checked the log and saw the following among the
parameters:

Dir: conf
File: hive-site.xml
File: .hive-site.xml.crc

If I omit <job-xml>, where does the above hive-site.xml come from? If it does in
fact come from the shared libraries, why doesn't it work, and why do I have to set
it explicitly in <job-xml>?
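For completeness, the relevant part of my job.properties looks roughly like this
(the JobTracker host and the application path below are placeholders, not my real
values):

```properties
nameNode=hdfs://namenode:9000
jobTracker=jobtracker:9001
# tell Oozie to put the shared libraries on the action classpath
oozie.use.system.libpath=true
# placeholder path to the deployed workflow application
oozie.wf.application.path=${nameNode}/user/yong/hiveApp
```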
The second question is more troublesome for me. In Hive, I have the following table
defined as a Hive Avro table:

create external table test
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as
  inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location '/xxx/data/'
TBLPROPERTIES ('avro.schema.url'='hdfs:///location/schema/coupon.avsc');
The table works fine and can be queried in the Hive CLI without any issue. But if I
try to query it in the Oozie Hive action, I get the following error in the log:
Caused by: java.lang.IllegalArgumentException: Wrong FS:
hdfs:/location/schema/coupon.avsc, expected: hdfs://namenode:9000
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:651)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:181)

In the Oozie runtime, all the Hadoop configuration files are passed in. For example,
the following is defined in core-site.xml:

<property>
  <!-- The default file system used by Hadoop -->
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:9000</value>
</property>
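My reading of the "Wrong FS" check is that Hadoop compares both the scheme and the
authority (host:port) of the path URI against fs.defaultFS, and "hdfs:///..." carries
an empty authority. Here is a small standalone sketch of that comparison (my
assumption about the logic, not the actual Hadoop source):

```java
import java.net.URI;

public class WrongFsDemo {
    // Sketch of the scheme/authority comparison I believe checkPath performs:
    // both the scheme and the authority (host:port) must match the default FS.
    public static boolean matchesDefaultFs(String pathUri, String defaultFs) {
        URI p = URI.create(pathUri);
        URI d = URI.create(defaultFs);
        // "hdfs:///..." parses with no authority, so normalize null to ""
        String pAuth = p.getAuthority() == null ? "" : p.getAuthority();
        String dAuth = d.getAuthority() == null ? "" : d.getAuthority();
        return d.getScheme().equals(p.getScheme()) && dAuth.equals(pAuth);
    }

    public static void main(String[] args) {
        // schema URL with no authority -> mismatch, like my "Wrong FS" error
        System.out.println(matchesDefaultFs(
            "hdfs:///location/schema/coupon.avsc", "hdfs://namenode:9000"));
        // fully qualified schema URL -> matches the default FS
        System.out.println(matchesDefaultFs(
            "hdfs://namenode:9000/location/schema/coupon.avsc", "hdfs://namenode:9000"));
    }
}
```

If that is right, it would explain why the triple-slash URL works in the Hive CLI
(where it is resolved against the local client configuration) but not inside the
Oozie launcher.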
But I don't know why the Oozie Hive action dumps out the above error. Currently, to
make it work in the Oozie Hive action, I have to go into Hive and manually change
the table definition to the following:
create external table coupon
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as
  inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location '/xxx/coupon/'
TBLPROPERTIES ('avro.schema.url'='hdfs://namenode:9000/location/schema/coupon.avsc');
I have lots of existing Avro-format tables in Hive, which means I would have to
manually change the "avro.schema.url" of every one of them to make them work with
the Oozie Hive action.
Any idea why? Is there any option that does not require changing the Hive table
definitions?
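If I do end up having to touch every table, the least painful route I have found so
far is to update the property in place rather than dropping and recreating each
table (the table name here is just an example):

```sql
ALTER TABLE coupon SET TBLPROPERTIES (
  'avro.schema.url'='hdfs://namenode:9000/location/schema/coupon.avsc'
);
```

But I would still prefer a fix that leaves the table definitions alone.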
Thanks
Yong