I'm evaluating Oozie for my team and work to decide if we want to use it.
I have been working through the examples that Apache Oozie project has for
download. I have spent much of the day trying to get basic Hive jobs to
work with mostly disappointing results. I'd like some help on what to do.
I've tried using the hive action and running hive via a shell action.
After struggling with write permissions to the /user/hive HDFS warehouse
directory, the former only partially works (even when it reports success)
and the latter never works and doesn't report any useful errors.
*Hive via Hive action:*
$ tree hive2
hive2
|-- hive-settings.xml
|-- job.properties
|-- script.sql
|-- workflow.xml
$ cat script.sql
DROP TABLE IF EXISTS yytest2;
CREATE EXTERNAL TABLE yytest2 (x int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/root/examples/input-data/yytable2/yytable';
INSERT OVERWRITE DIRECTORY '/user/root/examples/output-data/hive2yy' SELECT
* FROM yytest2;
CREATE TABLE IF NOT EXISTS xmas (x int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/root/examples/input-data/yytable2/yytable' INTO
TABLE xmas;
$ cat workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
<start to="hive-node"/>
<action name="hive-node">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot
}/output-data/hive2"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot
}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.hive.defaults</name>
<value>hive-settings.xml</value>
</property>
</configuration>
<script>script.sql</script>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>BOO HOO2: Hive2 failed, error message[${wf:errorMessage(wf:
lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
$ cat job.properties
nameNode=hdfs://10.230.138.159:8020
jobTracker=http://10.230.138.159:50300
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot
}/apps/hive2
$ cat hive-settings.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://10.230.138.159:3306/hive?createDatabaseIfNotExist
=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hiveuser</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>
</configuration>
When I run it now reports SUCCESS.
It worked in the sense that the
/user/root/examples/output-data/hive2yy/000000_0 file is present in
HDFS(created via the first table) and the
/user/root/examples/input-data/yytable2/yytable/integers.out file is
missing (because it got moved when Hive created the "xmas" managed table).
But the tables I just created are not present in Hive nor are they in the
Hive metastore (MySQL):
$ hive -e "show tables"
OK
tab_name
Time taken: 2.153 seconds
But they must have been created in order for the other results to be true.
When I run the script.sql by hand in Hive it all works fine and the tables
remain. *Why do the tables disappear when I run this via Oozie?
*
This email is long enough. I'll send my other issue (Hive via shell
action) in another email.
*
*
*For reference:*
I'm using a CentOS VMWare VM running Hortonworks HDP 1.3
Oozie version: 3.3.2.1.3.2.0-111
Hive version: 0.11 (0.11.0.1.3.2.0-111)
Thanks for any help you can provide,
-Michael