I'm looking into improving the performance of one of my Pig jobs. I
figured that storing the data I keep reusing in a binary/serialized
format could help a little, and that's how I stumbled upon Zebra.
It looks like a nice abstraction and seems to do exactly what I want.

I started with something simple, but even that doesn't work:

register zebra-0.6.0-dev.jar;
dim_calendar = load '/user/dwh/dim/calendar.csv' using PigStorage('\t')
    as (cldr_id: long, iso_date: chararray);
outfile = order dim_calendar by iso_date parallel 1;
store outfile into '/user/dwh/calendar.zebra' using
    org.apache.hadoop.zebra.pig.TableStorer('cldr_id: long, iso_date:string');

When I run this, I get:
---------------
ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 97
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1003)
at org.apache.pig.PigServer.registerQuery(PigServer.java:385)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:352)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:194)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
at org.apache.pig.PigServer.execute(PigServer.java:773)
at org.apache.pig.PigServer.access$100(PigServer.java:89)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:951)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:998)
... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.zebra.pig.TableOutputFormat
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:428)
at java.lang.Thread.dispatchUncaughtException(Thread.java:1831)
-----

Any idea why?
TableOutputFormat is an inner class of TableStorer, so I'm a little
puzzled that Pig can find one but not the other.
FYI, I'm using Hadoop 0.20.1 and Pig/Zebra from trunk, although I
haven't updated Pig in a few weeks.
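One related detail, which is my own inference from the trace and not something I've checked against the Zebra source: the JVM looks classes up by their binary name, and a class nested inside TableStorer would be requested as `org.apache.hadoop.zebra.pig.TableStorer$TableOutputFormat`. The trace asks for `org.apache.hadoop.zebra.pig.TableOutputFormat` with no `$`, which would suggest a separate top-level class (perhaps declared in the same source file). A minimal standalone sketch of the naming difference, using made-up classes `BinaryNameDemo` and `Inner` rather than anything from Zebra:

```java
// Hypothetical demo classes; BinaryNameDemo/Inner stand in for an outer class
// and a class nested inside it. Nothing here is Zebra or Pig API.
public class BinaryNameDemo {

    // A static nested class: its binary name is "BinaryNameDemo$Inner".
    public static class Inner {}

    // Returns true if the given name resolves via Class.forName.
    static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // getName() reports the binary name, with '$' separating outer and nested class.
        System.out.println(Inner.class.getName());

        // Looking the nested class up by its binary name succeeds.
        System.out.println(resolves("BinaryNameDemo$Inner"));

        // The dotted, source-style name is treated as package "BinaryNameDemo",
        // class "Inner", so the lookup fails with ClassNotFoundException.
        System.out.println(resolves("BinaryNameDemo.Inner"));
    }
}
```

Running it should print `BinaryNameDemo$Inner`, then `true`, then `false`, i.e. only the `$` form finds the nested class.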

Thanks,
Bennie.
