AFAIK, there is no way to implicitly map tuple fields to those loaded from a schema file.
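The explicit mapping I mean looks something like the following (an untested sketch built from the example schema in your first mail; the relation A and the positional reference are made up for illustration):

    -- rename the tuple field(s) so they match the field names in the Avro record
    B = FOREACH A GENERATE $0 AS abc;
    STORE B INTO '/user/hshankar/out1' USING
        org.apache.pig.piggybank.storage.avro.AvroStorage('schema',
        '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');

The schema string still has to sit on one line, and the names in the GENERATE still have to track the schema, so this doesn't remove the coupling you're trying to avoid; it just makes it explicit in one place.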
On Tue, Dec 13, 2011 at 11:45 AM, IGZ Nick <[email protected]> wrote:

> Ah OK. Isn't there anything that would take the elements in order as they
> are? Mapping each field would lead to almost the same coupling between the
> schema file and the Pig script that I am trying to avoid.
>
>
> On Wed, Dec 14, 2011 at 12:21 AM, Bill Graham <[email protected]> wrote:
>
>> You still need to map the Tuple fields to the Avro schema fields. See the
>> unit test for an example, or section 4.C of the documentation. It reads the
>> schema from a data file, but the same approach is used when using
>> schema_file instead.
>>
>> https://cwiki.apache.org/confluence/display/PIG/AvroStorage
>>
>>
>> On Tue, Dec 13, 2011 at 10:15 AM, IGZ Nick <[email protected]> wrote:
>>
>>> Hi Bill,
>>>
>>> I tried schema_file but I get this error:
>>>
>>> grunt> STORE A INTO '/user/hshankar/out1' USING
>>> org.apache.pig.piggybank.storage.avro.AvroStorage('{"schema_file":
>>> "/user/hshankar/schema1.schema"}');
>>> 2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>> Details at logfile:
>>> /export/home/hshankar/pig_scripts/pig_1323798959597.log
>>>
>>> This is what the logfile contains:
>>>
>>> ================================================================================
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>
>>> Failed to parse: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>         at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>>>         at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
>>>         at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
>>>         at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
>>>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>>>         at org.apache.pig.Main.run(Main.java:487)
>>>         at org.apache.pig.Main.main(Main.java:108)
>>> Caused by: java.lang.RuntimeException: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file": "/user/hshankar/schema1.schema"}]'
>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
>>>         at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
>>>         at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
>>>         at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
>>>         ... 10 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
>>>         ... 19 more
>>> Caused by: java.io.IOException: Invalid parameter:schema_file
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
>>>         ... 24 more
>>>
>>> ================================================================================
>>>
>>> I am using Pig version 0.9.1-SNAPSHOT.
>>>
>>>
>>> On Tue, Dec 13, 2011 at 10:29 PM, Bill Graham <[email protected]> wrote:
>>>
>>>> Yes, you can reference an Avro schema file in HDFS with the "schema_file"
>>>> param. See TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile
>>>> here for an example:
>>>>
>>>> http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>>>>
>>>> On Tue, Dec 13, 2011 at 2:49 AM, IGZ Nick <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I want to keep the Pig script and the storage schema separate. Is it
>>>>> possible to do this in a clean way? The only way that has worked so far
>>>>> is to do something like:
>>>>>
>>>>> AvroStorage('schema',
>>>>> '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>>>>>
>>>>> And even then, the whole schema has to be on one line. If I split it
>>>>> onto multiple lines, I get a MismatchException (93-3) or something like
>>>>> that. Is there no way to do AvroStorage('file', <hdfs path of schema
>>>>> file>) or something of that sort, or at least to specify the schema
>>>>> across multiple lines?
>>>>>
>>>>> Thanks,
