I put the metastore database connection settings into a hive-site.xml file
bundled with my workflow definition and reference it using the <job-xml>
element within the Hive action. My hive-site.xml looks almost exactly the same
as your hive-settings.xml.
I used to be able to explain exactly what is happening, since I had a similarly
maddening learning curve setting it up, but it's been over a year now and I
can't remember the details. Give it a shot and let us know how it goes.
Snippet from a workflow template I use:
<action name="Process">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>
        <script>${wf_script_path}</script>
        <param>...</param>
    </hive>
    <ok to="end"/>
    <error to="FailureEmail"/>
</action>
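For completeness, the metastore-related part of that hive-site.xml is essentially the
same set of javax.jdo.option.* properties you already have in your hive-settings.xml.
Roughly something like this (the host, user, and password below are placeholders; point
them at your own MySQL metastore):

<configuration>
    <!-- Placeholder values: substitute your own metastore host, user, and password -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://metastore-host:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hivepassword</value>
    </property>
</configuration>

I upload that file next to workflow.xml in the workflow application directory in HDFS,
so the relative path in <job-xml> resolves against the application directory.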
Hope that helps,
Paul Chavez
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Thursday, October 17, 2013 1:47 PM
To: [email protected]
Subject: Why don't tables I create in Hive jobs in Oozie stay around?
I'm evaluating Oozie for my team at work to decide if we want to use it.
I have been working through the examples that the Apache Oozie project provides
for download. I have spent much of the day trying to get basic Hive jobs to work,
with mostly disappointing results, and I'd like some help on what to do.
I've tried using the Hive action and running Hive via a shell action.
After struggling with write permissions to the /user/hive HDFS warehouse
directory, the former only partially works (even when it reports success) and
the latter never works and doesn't report any useful errors.
*Hive via Hive action:*
$ tree hive2
hive2
|-- hive-settings.xml
|-- job.properties
|-- script.sql
|-- workflow.xml
$ cat script.sql
-- Recreate an external table over the existing input data
DROP TABLE IF EXISTS yytest2;
CREATE EXTERNAL TABLE yytest2 (x int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/root/examples/input-data/yytable2/yytable';

-- Write its contents out to a directory in HDFS
INSERT OVERWRITE DIRECTORY '/user/root/examples/output-data/hive2yy'
SELECT * FROM yytest2;

-- Create a managed table and load the input data into it
CREATE TABLE IF NOT EXISTS xmas (x int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/root/examples/input-data/yytable2/yytable' INTO TABLE xmas;
$ cat workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.hive.defaults</name>
                    <value>hive-settings.xml</value>
                </property>
            </configuration>
            <script>script.sql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>BOO HOO2: Hive2 failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
$ cat job.properties
nameNode=hdfs://10.230.138.159:8020
jobTracker=http://10.230.138.159:50300
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive2
$ cat hive-settings.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Hive Execution Parameters -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.230.138.159:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hiveuser</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.PersistenceManagerFactoryClass</name>
        <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
        <description>class implementing the jdo persistence</description>
    </property>
</configuration>
When I run it, it now reports SUCCESS.
It worked in the sense that the
/user/root/examples/output-data/hive2yy/000000_0 file is present in HDFS
(created via the first table) and the
/user/root/examples/input-data/yytable2/yytable/integers.out file is missing
(because the LOAD DATA INPATH statement moved it into the "xmas" managed table).
But the tables I just created are not present in Hive, nor are they in the Hive
metastore (MySQL):
$ hive -e "show tables"
OK
tab_name
Time taken: 2.153 seconds
But they must have been created in order for the other results to be true.
When I run script.sql by hand in Hive, it all works fine and the tables remain.
*Why do the tables disappear when I run this via Oozie?*
This email is long enough. I'll send my other issue (Hive via shell
action) in another email.
*For reference:*
I'm using a CentOS VMware VM running Hortonworks HDP 1.3.
Oozie version: 3.3.2.1.3.2.0-111
Hive version: 0.11 (0.11.0.1.3.2.0-111)
Thanks for any help you can provide,
-Michael