I am running DAGs generated by Hive for Tez in offline mode; that is, I store
the DAGs to disk and run them later using my own Tez client.

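I have been able to get this setup working in local mode. However, when
running on the cluster, I hit a class-loading exception (snippet below). I
figure this is because the custom classes defined in Hive (e.g.
HiveSplitGenerator) are not visible on the cluster. Judging by the trace, the
failure happens in the Tez DAG AppMaster (org.apache.tez.dag.app) while it
starts the root input initializer for Map 4, before any mapper runs.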

I have uploaded the Hive exec jar (apache-hive-2.0.0-SNAPSHOT-bin.tar.gz) to
HDFS and pointed ${tez.aux.uris} to that location. I'm not sure what more is
needed to make the Hive classes visible to Tez. Does a "tar.gz" not work here?
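A JVM loads classes only from directories and JAR/ZIP files on its classpath, so it is worth confirming where HiveSplitGenerator actually lives inside the file that ${tez.aux.uris} points to. The Python sketch below is a hypothetical diagnostic, not part of Hive or Tez, and its helper names (class_entry, jar_contains, find_class) are made up for illustration. It assumes you have a local copy of the file (fetched, for example, with `hdfs dfs -get`). A jar is checked directly. A .tar.gz is scanned for nested jars that contain the class. It does not tell you whether Tez unpacks archives listed in tez.aux.uris; that needs to be checked separately.

```python
import io
import sys
import tarfile
import zipfile
from typing import List


def class_entry(class_name: str) -> str:
    """Map a fully qualified Java class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains(jar_bytes: bytes, class_name: str) -> bool:
    """Return True if the jar (a zip archive) has an entry for class_name."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return class_entry(class_name) in jar.namelist()


def find_class(path: str, class_name: str) -> List[str]:
    """List the places where class_name can be found in the file at path.

    A jar is checked directly. A tarball is not a valid JVM classpath entry,
    so its members are scanned for nested jars that contain the class;
    hits are reported as "<tarball>!<member>".
    """
    if zipfile.is_zipfile(path):
        with open(path, "rb") as f:
            return [path] if jar_contains(f.read(), class_name) else []
    hits = []
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
        with tarfile.open(path, "r:*") as tar:
            for member in tar.getmembers():
                if member.isfile() and member.name.endswith(".jar"):
                    data = tar.extractfile(member).read()
                    if zipfile.is_zipfile(io.BytesIO(data)) and jar_contains(data, class_name):
                        hits.append(f"{path}!{member.name}")
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_class.py <local file> [fully.qualified.ClassName]
    target = sys.argv[2] if len(sys.argv) > 2 else "org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator"
    for hit in find_class(sys.argv[1], target) or ["not found"]:
        print(hit)
```

If the class shows up only as a "tarball!lib/....jar" hit, the jar itself (rather than the tarball) is probably what needs to be referenced.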


2015-09-11 00:59:02,973 INFO [Dispatcher thread: Central] impl.VertexImpl: Recovered Vertex State, vertexId=vertex_1441949856963_0006_1_02 [Map 1], state=NEW, numInitedSourceVertices=0, numStartedSourceVertices=0, numRecoveredSourceVertices=0, recoveredEvents=0, tasksIsNull=false, numTasks=0
2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: Root Inputs exist for Vertex: Map 4 : {a={InputName=a}, {Descriptor=ClassName=org.apache.tez.mapreduce.input.MRInputLegacy, hasPayload=true}, {ControllerDescriptor=ClassName=org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator, hasPayload=false}}
2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: Starting root input initializer for input: a, with class: [org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator]
2015-09-11 00:59:02,974 INFO [Dispatcher thread: Central] impl.VertexImpl: Setting vertexManager to RootInputVertexManager for vertex_1441949856963_0006_1_00 [Map 4]
2015-09-11 00:59:02,979 INFO [Dispatcher thread: Central] impl.VertexImpl: Num tasks is -1. Expecting VertexManager/InputInitializers/1-1 split to set #tasks for the vertex vertex_1441949856963_0006_1_00 [Map 4]
2015-09-11 00:59:02,979 INFO [Dispatcher thread: Central] impl.VertexImpl: Vertex will initialize from input initializer. vertex_1441949856963_0006_1_00 [Map 4]
2015-09-11 00:59:02,980 INFO [Dispatcher thread: Central] impl.VertexImpl: Vertex will initialize via inputInitializers vertex_1441949856963_0006_1_00 [Map 4]. Starting root input initializers: 1
2015-09-11 00:59:02,981 ERROR [Dispatcher thread: Central] common.AsyncDispatcher: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Unable to load class: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
        at org.apache.tez.common.ReflectionUtils.getClazz(ReflectionUtils.java:45)
        at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:96)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
