Hi Stan,
Here is my pig script:
REGISTER avro-1.4.0.jar
REGISTER joda-time-1.6.jar
REGISTER json-simple-1.1.jar
REGISTER jackson-core-asl-1.5.5.jar
REGISTER jackson-mapper-asl-1.5.5.jar
REGISTER pig-0.9.1-SNAPSHOT.jar
REGISTER dwh-udf-0.1.jar
REGISTER piggybank.jar
REGISTER linkedin-pig-0.8.jar
REGISTER google-collect-1.0-rc2.jar;
A = LOAD '/user/hshankar/temp' USING PigStorage();
RMF '/user/hshankar/out1';
STORE A INTO '/user/hshankar/out1' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('{"type": "record",
"name": "test", "fields": [{"name":"my_region", "type": "string"}]}');
On executing it, I get this error:
2011-12-13 18:16:35,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)
Details at logfile: /export/home/hshankar/pig_scripts/pig_1323800194535.log
Log file contains:
Pig Stack Trace
---------------
ERROR 1200: Pig script failed to parse: MismatchedTokenException(93!=3)
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Pig script failed to parse: MismatchedTokenException(93!=3)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1652)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1597)
at org.apache.pig.PigServer.registerQuery(PigServer.java:583)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:553)
at org.apache.pig.Main.main(Main.java:108)
Caused by: Failed to parse: Pig script failed to parse: MismatchedTokenException(93!=3)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1644)
... 9 more
Caused by: MismatchedTokenException(93!=3)
at org.apache.pig.parser.AstValidator.recoverFromMismatchedToken(AstValidator.java:209)
at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.pig.parser.AstValidator.func_clause(AstValidator.java:3497)
at org.apache.pig.parser.AstValidator.store_clause(AstValidator.java:4626)
at org.apache.pig.parser.AstValidator.op_clause(AstValidator.java:970)
at org.apache.pig.parser.AstValidator.general_statement(AstValidator.java:574)
at org.apache.pig.parser.AstValidator.statement(AstValidator.java:396)
at org.apache.pig.parser.AstValidator.query(AstValidator.java:306)
at org.apache.pig.parser.QueryParserDriver.validateAst(QueryParserDriver.java:236)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
... 10 more
================================================================================
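Since the single-line form is the only one that has parsed for me so far, one workaround I am considering is to keep the schema in its own file, formatted however I like, and collapse it to a single line before it goes into the script. Below is a minimal Python sketch of that idea. The function name schema_to_single_line and the inline sample schema are made up for illustration, and I have not confirmed that this avoids the parse error.

```python
import json


def schema_to_single_line(schema_text: str) -> str:
    """Parse a (possibly multi-line) JSON schema and re-serialize it on one line."""
    schema = json.loads(schema_text)  # raises ValueError if the JSON is malformed
    # Compact separators keep the output free of newlines and padding.
    return json.dumps(schema, separators=(",", ":"))


# Hypothetical stand-in for the contents of a separate schema file.
multi_line_schema = """
{
  "type": "record",
  "name": "test",
  "fields": [
    {"name": "my_region", "type": "string"}
  ]
}
"""

one_line = schema_to_single_line(multi_line_schema)
print(one_line)
```

The resulting string could then be pasted into the AvroStorage(...) call, or possibly handed to the script through a Pig parameter in the same way $LOGS is used in Stan's example; the quoting needed for that is something I would still have to work out.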
On Tue, Dec 13, 2011 at 9:05 PM, Stan Rosenberg <[email protected]> wrote:
> The following test script works for me:
> =============================================
>
> A = load '$LOGS' using org.apache.pig.piggybank.storage.avro.AvroStorage();
> describe A;
>
> B = foreach A generate region as my_region, google_ip;
>
> dump B;
>
> store B into './output' using
> org.apache.pig.piggybank.storage.avro.AvroStorage(
> '{"debug": 5,
> "schema": {"type": "record", "name": "test", "fields": [{"name":
> "my_region", "type": ["null", "string"]}, {"name": "ip", "type":
> ["null", "string"]}]}
> }');
> =============================================================
> Note you don't need to pass the first parameter, i.e., 'schema'; you
> can just pass a string formatted in json.
> If you're still getting MismatchException, please compile a small
> repro and send it to the list.
>
> stan
>
> On Tue, Dec 13, 2011 at 5:49 AM, IGZ Nick <[email protected]> wrote:
> > Hi all,
> >
> > I want to keep the pig script and storage schema separate. Is it possible
> > to do this in a clean way? The only way that has worked so far is to do
> > something like:
> > AvroStorage('schema',
> > '{"name":"xyz","type":"record","fields":[{"name":"abc","type":"string"}]}');
> >
> > Even then, the whole schema has to be on one line. If I split it onto
> > multiple lines, I
> > get a MismatchException (93-3) or something like that. Is there no way to
> > do AvroStorage('file', <hdfs path of schema file>) or something of that
> > sort, or at least be able to specify the schema in multiple lines?
> >
> > Thanks,
>