Avro Oozie Example

M, Paul Tue, 12 Mar 2013 13:02:27 -0700

Can anyone provide a sample workflow.xml file showing the properties needed 
to run an Avro job with the <map-reduce> action? I have included my current 
settings below, but so far I have had no luck. The m/r job works fine when run 
from a driver, so I believe the issue is a property configuration. My plan so 
far has been to take each property I set in the driver and apply it to the xml, 
but that transfer of key/values is proving difficult. FYI, I am using CDH 4.2.0 
(hadoop-2.0.0-mr1-cdh4.2.0), with the Oozie package that ships in CDH 4.2.0.
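
To make the key/value transfer concrete, here is a minimal sketch of what I 
mean, using only the JDK. It assumes the driver's settings have already been 
collected into a plain Map; the class name OozieProperties and the method 
toPropertyXml are hypothetical names for illustration, not part of Oozie or 
Hadoop. In a real driver the map could presumably be filled by iterating over 
the JobConf, but that step is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OozieProperties {

    // Escape characters that have special meaning in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render each key/value pair as one <property> element, trimming
    // stray whitespace so class names are not padded with newlines.
    static String toPropertyXml(Map<String, String> settings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            sb.append("<property>\n")
              .append("<name>").append(escape(e.getKey().trim())).append("</name>\n")
              .append("<value>").append(escape(e.getValue().trim())).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical subset of the settings made in the driver.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapred.mapper.class", "org.apache.avro.mapred.HadoopMapper");
        settings.put("mapred.mapoutput.key.class", "org.apache.avro.mapred.AvroKey");
        System.out.print(toPropertyXml(settings));
    }
}
```

The output can be pasted inside the <configuration> element of the action. 
JSON schema values pass through unchanged apart from the escaping of &, < and >.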


Also, for "avro.output.schema", is it really necessary to include the actual 
JSON schema for the Pair, or is there a way to just specify a file or class?


This is the exception I am currently receiving:

2013-03-12 12:44:00,053 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201303111118_0046_m_000000_1: java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:842)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:377)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:407)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1407)
at org.apache.hadoop.mapred.Child.main(Child.java:262)


<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${outputDir}" />
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>

<!-- Basic mapred Config -->
<property>
<name>mapred.reducer.class</name>
<value>org.apache.avro.mapred.HadoopReducer</value>
</property>
<property>
<name>avro.reducer</name>
<value>org.myproject.mapreduce.CombineAvroRecordsByHourReducer</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.avro.mapred.HadoopMapper</value>
</property>

<property>
<name>avro.mapper</name>
<value>org.myproject.mapreduce.ParseMetadataAsTextIntoAvroMapper</value>
</property>
<property>
<name>avro.output.schema</name>
<value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields"...}]}</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.avro.mapred.AvroKey</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.avro.mapred.AvroValue</value>
</property>


<property>
    <name>avro.schema.output.key</name>
    <value>{"type":"record","name":"MyRecord","namespace":...]}</value>
</property>


<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
</property>

<property>
<name>mapred.output.key.comparator.class</name>
<value>org.apache.avro.mapred.AvroKeyComparator</value>
</property>

<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>



<!--Input/Output -->
<property>
<name>mapred.input.dir</name>
<value>/user/${wf:user()}/input/</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/user/${wf:user()}/${outputDir}</value>
</property>
</configuration>
</map-reduce>
