AFAIK, there is no way to implicitly map tuple fields to those loaded from
a schema file.


On Tue, Dec 13, 2011 at 11:45 AM, IGZ Nick wrote:

> Ah, OK. Isn't there anything that would take the elements in order, as they
> are? Mapping each field would lead to almost the same coupling between the
> schema file and the Pig script that I am trying to avoid.
>
>
> On Wed, Dec 14, 2011 at 12:21 AM, Bill Graham wrote:
>
>> You still need to map the Tuple fields to the Avro schema fields. See the
>> unit test for an example, or section 4.C of the documentation. That example
>> reads the schema from a data file, but the same approach applies when using
>> schema_file instead.
>>
>> https://cwiki.apache.org/confluence/display/PIG/AvroStorage
>>
>>
>> On Tue, Dec 13, 2011 at 10:15 AM, IGZ Nick wrote:
>>
>>> Hi Bill,
>>>
>>> I tried schema_file but I get this error:
>>>
>>> grunt> STORE A INTO '/user/hshankar/out1'  USING
>>> org.apache.pig.piggybank.storage.avro.AvroStorage ('{"schema_file":
>>> "/user/hshankar/schema1.schema"}');
>>> 2011-12-13 18:06:00,879 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>> Details at logfile:
>>> /export/home/hshankar/pig_scripts/pig_1323798959597.log
>>>
>>> This is what the logfile contains:
>>>
>>> ================================================================================
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 1200: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>>
>>> Failed to parse: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>>         at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:180)
>>>         at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1622)
>>>         at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1595)
>>>         at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
>>>         at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>>>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
>>>         at org.apache.pig.Main.run(Main.java:487)
>>>         at org.apache.pig.Main.main(Main.java:108)
>>> Caused by: java.lang.RuntimeException: could not instantiate
>>> 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments
>>> '[{"schema_file":  "/user/hshankar/schema1.schema"}]'
>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:492)
>>>         at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:699)
>>>         at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:688)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:3956)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:5450)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1041)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:638)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:459)
>>>         at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:357)
>>>         at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
>>>         ... 10 more
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>         at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:482)
>>>         ... 19 more
>>> Caused by: java.io.IOException: Invalid parameter:schema_file
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:601)
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.init(AvroStorage.java:518)
>>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.<init>(AvroStorage.java:433)
>>>         ... 24 more
>>>
>>> ================================================================================
>>>
>>> I am using Pig version 0.9.1-SNAPSHOT.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Dec 13, 2011 at 10:29 PM, Bill Graham wrote:
>>>
>>>> Yes, you can reference an Avro schema file in HDFS with the "schema_file"
>>>> param. See TestAvroStorage.testRecordWithFieldSchemaFromTextWithSchemaFile
>>>> here for an example:
>>>>
>>>>
>>>> http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
>>>>
>>>> On Tue, Dec 13, 2011 at 2:49 AM, IGZ Nick wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > I want to keep the Pig script and the storage schema separate. Is it
>>>> > possible to do this in a clean way? The only way that has worked so far
>>>> > is this:
>>>> >
>>>> > AvroStorage('schema', '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
>>>> >
>>>> > Even then, the whole schema has to be on one line. If I split it across
>>>> > multiple lines, I get a MismatchException (93-3) or something like that.
>>>> > Is there no way to do AvroStorage('file', <hdfs path of schema file>) or
>>>> > something of that sort, or at least to specify the schema across multiple
>>>> > lines?
>>>> >
>>>> > Thanks,
>>>> >
>>>>
>>>
>>>
>>
>
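
For the one-line limitation raised in the original question, one possible workaround (not suggested in this thread, and untested against Pig 0.9.x) is to keep the schema pretty-printed in its own file and generate the single-line AvroStorage('schema', '...') clause from it before running the script. The Python sketch below does that with the standard json module. The function name avro_storage_clause, the script name compact_schema.py, and the assumption that Pig accepts \\ and \' escapes inside a single-quoted string literal are all hypothetical, not part of AvroStorage or of anything stated above. It reads a local copy of the schema file, not the HDFS path.

```python
import json
import sys


def avro_storage_clause(schema_text: str) -> str:
    """Turn a (possibly multi-line) Avro schema into a one-line AvroStorage clause."""
    # Parsing validates the JSON; re-serializing with tight separators drops
    # all line breaks and indentation while keeping the field order.
    compact = json.dumps(json.loads(schema_text), separators=(",", ":"))
    # Assumption: the Pig parser treats backslash-backslash and backslash-quote
    # inside a single-quoted string literal as escapes, so both are escaped here.
    escaped = compact.replace("\\", "\\\\").replace("'", "\\'")
    return ("org.apache.pig.piggybank.storage.avro.AvroStorage"
            f"('schema', '{escaped}')")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python compact_schema.py schema1.schema
    with open(sys.argv[1]) as schema_file:
        print(avro_storage_clause(schema_file.read()))
```

For a multi-line version of the xyz/abc schema from the original question, this would print a clause equivalent to the one-line form that is reported to work. Note that it still embeds the schema in the script at generation time, so it reduces the manual copying but does not remove the field-mapping coupling discussed above.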
