I'm trying to build a simple MapReduce job that reads Avro files and outputs
plain text.
I pulled data from a MySQL database with Sqoop and wrote the files out as
Snappy-compressed Avro files. I've operated on the files using AvroStorage in
Pig, but I think the current task would be better suited to a plain MR job.
I'm using Avro 1.7.4, but I think the version used to generate the files is
1.5.3 (whatever ships with HDP 1.2). I've tried depending on Avro 1.5.3 in my
project, but I get the same error (at a different line).
When I try to execute my job the following exception is printed:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
    at org.apache.avro.Schema$Parser.parse(Schema.java:924)
    at org.apache.avro.Schema$Parser.parse(Schema.java:917)
    at my.classpath.jobs.CandidatesFor.run(CandidatesFor.java:44)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at my.classpath.jobs.CandidatesFor.main(CandidatesFor.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
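A NoSuchMethodError generally means the classes loaded at runtime differ from the ones the code was compiled against, so it may help to check which jar the Avro Schema class is actually loaded from when the job runs. The following is only a sketch: the class name WhichJar and the helper locationOf are made up for illustration and are not part of Avro or Hadoop.

```java
import java.security.CodeSource;

public class WhichJar {
    // Hypothetical helper: reports where a class was loaded from, or a
    // placeholder when the JVM does not expose a code source for it.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) {
        // Defaults to the Avro Schema class; another class name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "org.apache.avro.Schema";
        try {
            System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Calling locationOf(Schema.class) at the top of run() and comparing the printed location under the unit test and under the hadoop jar command should show whether the two environments pick up different Avro jars.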
Below is the setup portion of my MapReduce job. The failure occurs when I call
parse on the Schema.Parser object.
public class CandidatesFor extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), getClass());
        conf.setJobName("CandidatesFor");
        InputStream is =
            getClass().getClassLoader().getResourceAsStream("avro/data.avsc");
        assert null != is;
        String schemaString = IOUtils.toString(is);
        System.out.println(schemaString);
        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(schemaString); // exception is thrown here
        …
Looking at the source for Schema, the line in question is:
    boolean saved = validateNames.get();
I wrote a unit test to help me understand how things hang together. The test
passes:
@Test
public void canReadSchema() throws Exception {
    ClassLoader loader = getClass().getClassLoader();
    InputStream schema_is = loader.getResourceAsStream("avro/data.avsc");
    File data = new File(loader.getResource("avro/part-m-00000.avro").toURI());
    assertNotNull(data);
    Schema schema = new Schema.Parser().parse(schema_is);
    DatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>(schema);
    DataFileReader<GenericRecord> dataFileReader =
        new DataFileReader<GenericRecord>(data, datumReader);
    GenericRecord case_ = null;
    while (dataFileReader.hasNext()) {
        case_ = dataFileReader.next(case_);
        assertNotNull(case_.get("subject"));
        assertNotNull(case_.get("description"));
    }
}
I'm sure I'm missing something obvious, but I don't know enough to recognize it.
Any help would be greatly appreciated.
Thanks