Hi Bill,
I tried schema_file but I get this error:
grunt> STORE A INTO '/user/hshankar/out1' USING org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file": "/user/hshankar/schema1.schema"}');
2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
Details at logfile: /export/home/hshankar/pig_scripts/pig_1323798959597.log
This is what the logfile contains:
================================================================================
Pig Stack Trace
---------------
ERROR 1200: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
Failed to parse: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
at org.apache.pig.Main.run(Main.java:487)
at org.apache.pig.Main.main(Main.java:108)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments '[{"schema_file": "/user/hshankar/schema1.schema"}]'
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
... 10 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
... 19 more
Caused by: java.io.IOException: Invalid parameter:schema_file
at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
... 24 more
================================================================================
I am using Pig version 0.9.1-SNAPSHOT.
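
Until schema_file works on my build, one possible workaround is to keep the schema pretty-printed in its own file and collapse it to a single line before embedding it in the script with the 'schema' argument form from my first mail (quoted below). This is only a rough sketch: the helper names minify_schema and store_statement are made up for illustration, and the schema is inlined here as a string instead of being read from a file.

```python
import json


def minify_schema(schema_text):
    """Parse an Avro schema (JSON) and re-serialize it on a single line."""
    schema = json.loads(schema_text)
    # Compact separators drop all optional whitespace, including newlines.
    return json.dumps(schema, separators=(",", ":"))


def store_statement(alias, out_path, schema_text):
    """Build a Pig STORE statement with the schema inlined on one line.

    Hypothetical helper: it only formats text and does not talk to Pig.
    """
    one_line = minify_schema(schema_text)
    if "'" in one_line:
        # A single quote would end the Pig string literal early.
        raise ValueError("schema contains a single quote")
    return (
        "STORE {0} INTO '{1}' USING "
        "org.apache.pig.piggybank.storage.avro.AvroStorage"
        "('schema', '{2}');".format(alias, out_path, one_line)
    )


if __name__ == "__main__":
    # The multi-line schema would normally live in a separate file.
    pretty = """
    {
      "name": "xyz",
      "type": "record",
      "fields": [
        {"name": "abc", "type": "string"}
      ]
    }
    """
    print(store_statement("A", "/user/hshankar/out1", pretty))
```

The printed statement could then be pasted into the script or generated as part of a wrapper, so the schema itself stays readable in its own file.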
On Tue, Dec 13, 2011 at 10:29 PM, Bill Graham <[email protected]> wrote:
> Yes, you can reference an Avro schema file in HDFS with the "schema_file"
> param. See TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile
> here for an example:
>
>
> http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>
> On Tue, Dec 13, 2011 at 2:49 AM, IGZ Nick <[email protected]> wrote:
>
> > Hi all,
> >
> > I want to keep the pig script and storage schema separate. Is it possible
> > to do this in a clean way? The only way that has worked so far is
> > something like:
> > AvroStorage('schema', '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
> >
> > And even then, the whole schema has to be on one line. If I split it
> > onto multiple lines, I get a MismatchException (93-3) or something like
> > that. Is there no way to
> > do AvroStorage('file', <hdfs path of schema file>) or something of that
> > sort, or at least be able to specify the schema in multiple lines?
> >
> > Thanks,
> >
>