Ah, OK. Isn't there anything that would take the elements in order, as they
are? Mapping each field would lead to almost the same coupling between the
schema file and the Pig script, which is what I am trying to avoid.

On Wed, Dec 14, 2011 at 12:21 AM, Bill Graham <[email protected]> wrote:

> You still need to map the Tuple fields to the avro schema fields. See the
> unit test for an example, or section 4.C of the documentation. It reads the
> schema from a data file, but the same approach is used when using
> schema_file instead.
>
> https://cwiki.apache.org/confluence/display/PIG/AvroStorage
>
>
> On Tue, Dec 13, 2011 at 10:15 AM, IGZ Nick <[email protected]> wrote:
>
>> Hi Bill,
>>
>> I tried schema_file but I get this error:
>>
>> grunt> STORE A INTO '/user/hshankar/out1'  USING
>> org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file":
>> "/user/hshankar/schema1.schema"}');
>> 2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1200: could not instantiate
>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>> Details at logfile:
>> /export/home/hshankar/pig_scripts/pig_1323798959597.log
>>
>> This is what the logfile contains:
>>
>> ================================================================================
>> Pig Stack Trace
>> ---------------
>> ERROR 1200: could not instantiate
>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>
>> Failed to parse: could not instantiate
>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>         at
>> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>>         at
>> org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
>>         at
>> org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
>>         at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
>>         at
>> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>         at
>> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>         at
>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>         at
>> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>>         at org.apache.pig.Main.run(Main.java:487)
>>         at org.apache.pig.Main.main(Main.java:108)
>> Caused by: java.lang.RuntimeException: could not instantiate
>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>         at
>> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
>>         at
>> org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
>>         at
>> org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
>>         at
>> org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
>>         at
>> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
>>         ... 10 more
>> Caused by: java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>         at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>         at
>> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
>>         ... 19 more
>> Caused by: java.io.IOException: Invalid parameter:schema_file
>>         at
>> org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
>>         at
>> org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
>>         at
>> org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
>>         ... 24 more
>>
>> ================================================================================
>>
>> I am using pig version 0.9.1-SNAPSHOT
>>
>>
>>
>>
>>
>> On Tue, Dec 13, 2011 at 10:29 PM, Bill Graham <[email protected]> wrote:
>>
>>> Yes, you can reference an Avro schema file in HDFS with the "schema_file"
>>> param. See
>>> TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile
>>> here for an example:
>>>
>>>
>>> http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>>>
>>> On Tue, Dec 13, 2011 at 2:49 AM, IGZ Nick <[email protected]> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I want to keep the Pig script and storage schema separate. Is it possible
>>> > to do this in a clean way? The only way that has worked so far is
>>> > something like:
>>> >
>>> > AvroStorage('schema',
>>> > '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>>> >
>>> > Even then, the whole schema has to be on one line. If I split it onto
>>> > multiple lines, I get a MismatchException (93-3) or something like that.
>>> > Is there no way to do AvroStorage('file', <hdfs path of schema file>) or
>>> > something of that sort, or at least to specify the schema on multiple
>>> > lines?
>>> >
>>> > Thanks,
>>> >
>>>
>>
>>
>
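One way to avoid hand-writing the field mapping discussed above is to generate it from the schema file itself, so the Pig script never lists the field names. The following is only a sketch under stated assumptions: the `fieldN` / `def:<name>` keys follow the convention recalled from section 4.C of the AvroStorage wiki page and should be checked against the piggybank build in use, the helper names `field_mapping` and `avro_storage_args` are hypothetical, and the installed AvroStorage must accept `schema_file` in the first place (the 0.9.1-SNAPSHOT build in the stack trace rejects it).

```python
import json


def field_mapping(schema_json):
    """Map tuple positions to Avro field names, in schema order.

    Assumes the AvroStorage convention "field<N>": "def:<name>";
    verify this against the AvroStorage documentation before relying on it.
    """
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected a top-level Avro record schema")
    return {
        "field%d" % index: "def:%s" % field["name"]
        for index, field in enumerate(schema["fields"])
    }


def avro_storage_args(schema_path, schema_json):
    """Build the JSON argument string to pass to AvroStorage (hypothetical helper)."""
    args = {"schema_file": schema_path}
    args.update(field_mapping(schema_json))
    return json.dumps(args)


# The one-field schema quoted earlier in the thread.
SCHEMA = '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}'

if __name__ == "__main__":
    print(avro_storage_args("/user/hshankar/schema1.schema", SCHEMA))
```

The generated string could then be passed to the script as a Pig parameter, so the schema file stays the only place where field names and their order are defined.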
