I am trying to set up Oozie on a Cassandra cluster with the configuration
changes listed below. When I try to run an example job, the Oozie server
freezes and stops responding.
Can you give me more information on why this is happening?
Thanks,
Deepak
The details of my setup are below.
oozie-site.xml:
<property>
<name>oozie.service.HadoopAccessorService.supported.filesystems</name>
<value>hdfs,hftp,webhdfs,cfs</value>
<description>
Enlist the different filesystems supported for
federation. If wildcard "*" is specified,
then ALL file schemes will be allowed.
</description>
</property>
<property>
<name>cassandra.thrift.address</name>
<value>sjc-prd-dt21</value>
</property>
<property>
<name>cassandra.thrift.port</name>
<value>9160</value>
</property>
<property>
<name>cassandra.partitioner.class</name>
<value>org.apache.cassandra.dht.RandomPartitioner</value>
</property>
<property>
<name>cassandra.consistencylevel.read</name>
<value>LOCAL_QUORUM</value>
</property>
<property>
<name>cassandra.consistencylevel.write</name>
<value>LOCAL_QUORUM</value>
</property>
<property>
<name>cassandra.range.batch.size</name>
<value>1024</value>
</property>
<property>
<name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
<value>false</value>
</property>
hadoop-site.xml:
<property>
<name>fs.cfs.impl</name>
<value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value>
</property>
<property>
<name>cassandra.thrift.address</name>
<value>sjc-prd-dt21</value>
</property>
<property>
<name>cassandra.thrift.port</name>
<value>9160</value>
</property>
<property>
<name>fs.default.name</name>
<value>cfs://sjc-prd-dt21:9160</value>
</property>
After starting the Oozie server, I run an example job as follows:
/usr/local/oozie/bin/oozie job -oozie http://sjc-prd-dt21:11000/oozie -config examples/apps/no-op/job.properties -run
The job.properties file looks like this:
nameNode=cfs://sjc-prd-dt21:9160
jobTracker=sjc-prd-dt22:8012
oozie.wf.application.path=${nameNode}/user/cassandra/examples/apps/no-op
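As a sanity check on the values above, here is a minimal Python sketch (not Oozie's own code) that expands ${nameNode} in job.properties and tests whether the resulting URI scheme appears in the oozie.service.HadoopAccessorService.supported.filesystems list. The helper names parse_properties and scheme_is_supported are hypothetical, and the wildcard rule is taken only from the property description quoted earlier.

```python
from urllib.parse import urlparse

# Values copied from the oozie-site.xml and job.properties shown in this post.
SUPPORTED_FILESYSTEMS = "hdfs,hftp,webhdfs,cfs"
JOB_PROPERTIES = """\
nameNode=cfs://sjc-prd-dt21:9160
jobTracker=sjc-prd-dt22:8012
oozie.wf.application.path=${nameNode}/user/cassandra/examples/apps/no-op
"""


def parse_properties(text):
    """Parse key=value lines and expand ${key} references (hypothetical helper)."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        raw[key.strip()] = value.strip()

    def expand(value):
        # Single-pass substitution; enough for one level of references.
        for name, replacement in raw.items():
            value = value.replace("${" + name + "}", replacement)
        return value

    return {key: expand(value) for key, value in raw.items()}


def scheme_is_supported(uri, supported):
    """Assumed rule: a "*" entry allows every scheme, otherwise exact match."""
    allowed = {s.strip() for s in supported.split(",") if s.strip()}
    return "*" in allowed or urlparse(uri).scheme in allowed


props = parse_properties(JOB_PROPERTIES)
app_path = props["oozie.wf.application.path"]
print(app_path)
print(scheme_is_supported(app_path, SUPPORTED_FILESYSTEMS))
```

With these values the expanded path uses the cfs scheme, which is in the supported list, so a rejected scheme does not look like the reason for the freeze.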
In the Oozie logs I see the following before Oozie freezes:
2014-11-04 14:20:20,022 DEBUG HadoopAccessorService:545 - USER[cassandra]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Checking if filesystem cfs is
supported
2014-11-04 14:20:20,030 DEBUG UserGroupInformation:146 - hadoop login
2014-11-04 14:20:20,031 DEBUG UserGroupInformation:95 - hadoop login commit
2014-11-04 14:20:20,032 DEBUG UserGroupInformation:125 - using local
user:UnixPrincipal: root
2014-11-04 14:20:20,034 DEBUG UserGroupInformation:493 - UGI loginUser:root
2014-11-04 14:20:20,036 DEBUG UserGroupInformation:1143 - PriviledgedAction
as:cassandra via root
from:org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:420)
2014-11-04 14:20:20,047 DEBUG FileSystem:1381 - Creating filesystem for
cfs://sjc-prd-dt21:9160/user/cassandra/examples/apps/no-op
2014-11-04 14:20:20,714 INFO StatusTransitService$StatusTransitRunnable:539 -
USER[-] GROUP[-] Acquired lock for
[org.apache.oozie.service.StatusTransitService]
2014-11-04 14:20:20,714 INFO PauseTransitService:539 - USER[-] GROUP[-]
Acquired lock for [org.apache.oozie.service.PauseTransitService]
2014-11-04 14:20:20,715 INFO StatusTransitService$StatusTransitRunnable:539 -
USER[-] GROUP[-] Running coordinator status service first instance
2014-11-04 14:20:20,957 INFO StatusTransitService$StatusTransitRunnable:539 -
USER[-] GROUP[-] Running bundle status service first instance
2014-11-04 14:20:20,959 INFO
CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable:539 - USER[-]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr
Date= Tue Nov 04 14:25:20 EST 2014, Num jobs to materialize = 0
2014-11-04 14:20:20,977 DEBUG ActionCheckerService$ActionCheckRunnable:545 -
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] QUEUING [] for potential
checking
2014-11-04 14:20:20,987 INFO StatusTransitService$StatusTransitRunnable:539 -
USER[-] GROUP[-] Released lock for
[org.apache.oozie.service.StatusTransitService]
2014-11-04 14:20:21,029 DEBUG PurgeXCommand:545 - USER[-] GROUP[-] TOKEN[-]
APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
2014-11-04 14:20:21,030 DEBUG PurgeXCommand:545 - USER[-] GROUP[-] TOKEN[-]
APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [30]
days, Coordinator Jobs older than [7] days, and Bundlejobs older than [7] days.
2014-11-04 14:20:21,030 DEBUG PurgeXCommand:545 - USER[-] GROUP[-] TOKEN[-]
APP[-] JOB[-] ACTION[-] ENDED Purge deleted [0] workflows, [0] coordinators,
[0] bundles
2014-11-04 14:20:21,033 INFO PauseTransitService:539 - USER[-] GROUP[-]
Released lock for [org.apache.oozie.service.PauseTransitService]
2014-11-04 14:20:21,047 DEBUG RecoveryService$RecoveryRunnable:545 - USER[-]
GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] QUEUING [ WF_ACTIONS 0, COORD_ACTIONS
: 0, COORD_READY_JOBS : 0, BUNDLE_ACTIONS : 0] for potential recovery
2014-11-04 14:20:23,624 INFO Services:539 - Shutdown