You still need to map the Tuple fields to the Avro schema fields. See the unit test (TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile) for an example, or section 4.C of the documentation. The documentation example reads the schema from a data file, but the same mapping applies when you use schema_file instead.
https://cwiki.apache.org/confluence/display/PIG/AvroStorage

On Tue, Dec 13, 2011 at 10:15 AM, IGZ Nick wrote:

> Hi Bill,
>
> I tried schema_file but I get this error:
>
> grunt> STORE A INTO '/user/hshankar/out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file": "/user/hshankar/schema1.schema"}');
> 2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
> Details at logfile: /export/home/hshankar/pig_scripts/pig_1323798959597.log
>
> This is what the logfile contains:
>
> ================================================================================
> Pig Stack Trace
> ---------------
> ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>
> Failed to parse: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>     at org.apache.pig.Main.run(Main.java:487)
>     at org.apache.pig.Main.main(Main.java:108)
> Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
>     at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
>     at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
>     at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
>     at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
>     ... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
>     ... 19 more
> Caused by: java.io.IOException: Invalid parameter:schema_file
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
>     ... 24 more
>
> ================================================================================
>
> I am using Pig version 0.9.1-SNAPSHOT.
>
> On Tue, Dec 13, 2011 at 10:29 PM, Bill Graham wrote:
>
>> Yes, you can reference an Avro schema file in HDFS with the "schema_file" param. See TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile here for an example:
>>
>> http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>>
>> On Tue, Dec 13, 2011 at 2:49 AM, IGZ Nick wrote:
>>
>>> Hi all,
>>>
>>> I want to keep the Pig script and the storage schema separate. Is it possible to do this in a clean way? The only way that has worked so far is something like:
>>>
>>> AvroStorage('schema', '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>>>
>>> And even then, the whole schema has to be on one line. If I split it across multiple lines, I get a MismatchException (93-3) or something like that. Is there no way to do AvroStorage('file', <hdfs path of schema file>) or something similar, or at least to specify the schema across multiple lines?
>>>
>>> Thanks,
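For the other half of the original question (keeping the schema readable across several lines), one workaround is to keep the schema in a multi-line file and collapse it to one line before it reaches AvroStorage('schema', ...). That one-line form is the only one reported to work in this thread. The following is a minimal sketch in Python. Python itself and the helper names `avro_schema_to_one_line` and `store_statement` are assumptions for illustration, not part of Pig or AvroStorage. It also assumes the schema contains no single quotes, because it is embedded in a single-quoted Pig string.

```python
import json


def avro_schema_to_one_line(schema_text: str) -> str:
    """Parse an Avro schema written as (possibly multi-line) JSON and
    re-serialize it as compact single-line JSON, keeping key order."""
    schema = json.loads(schema_text)  # also rejects malformed JSON early
    return json.dumps(schema, separators=(",", ":"))


def store_statement(alias: str, output_path: str, schema_text: str) -> str:
    """Hypothetical helper: build a one-line Pig STORE statement that passes
    the compacted schema to AvroStorage through the 'schema' parameter."""
    one_line = avro_schema_to_one_line(schema_text)
    # Assumption: the schema goes inside a single-quoted Pig string literal,
    # so refuse schemas containing single quotes rather than guess at escaping.
    if "'" in one_line:
        raise ValueError("schema contains a single quote; escape it manually")
    return (
        f"STORE {alias} INTO '{output_path}' USING "
        f"org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{one_line}');"
    )
```

For example, with a local copy of the schema file, `store_statement("A", "/user/hshankar/out1", open("schema1.schema").read())` returns a single STORE line that can be written into the script before running it. Parsing with `json.loads` first also means a malformed schema is caught before Pig ever sees it.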
