I'm looking into improving the performance of one of my Pig jobs. I figured that storing the data I keep reusing in a binary/serialized format could help a little, and so I stumbled upon Zebra. It seems like a nice abstraction and appears to do exactly what I want to achieve.
I started with something simple but that doesn't work.

    register zebra-0.6.0-dev.jar;
    dim_calendar = load '/user/dwh/dim/calendar.csv' using PigStorage('\t') as (cldr_id: long, iso_date: chararray);
    outfile = order dim_calendar by iso_date parallel 1;
    store outfile into '/user/dwh/calendar.zebra' using org.apache.hadoop.zebra.pig.TableStorer('cldr_id: long, iso_date:string');

On running this I get:

---------------
ERROR 2117: Unexpected error when launching map reduce job.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 97
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1003)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:385)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:352)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:194)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:780)
        at org.apache.pig.PigServer.execute(PigServer.java:773)
        at org.apache.pig.PigServer.access$100(PigServer.java:89)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:951)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:998)
        ... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.zebra.pig.TableOutputFormat
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:428)
        at java.lang.Thread.dispatchUncaughtException(Thread.java:1831)
-----

Any idea why? TableOutputFormat is an inner class of TableStorer so I'm a little puzzled how it could find one but not the other.

FYI: I'm using hadoop-0.20.1 and pig/zebra from trunk but haven't updated pig in a few weeks.

Thanks,
Bennie.