Hello,

I am trying to

(i) read Avro files in Pig on Amazon EMR. I created these files on my local
cluster from JSON documents (complex and nested, including arrays) and
uploaded them to S3.

(ii) create Avro files on EMR from those complex JSON documents uploaded to S3.

On my local Cloudera cluster I was able to load and work with the data in the
Avro files.

On Amazon EMR, however, I was not able to load the same existing Avro files.

My EMR cluster is:
´´
AMI version: 3.0.4
Amazon 2.2.0
Hive 0.11.0.2
Pig 0.11.1.1
Impala 1.2.1
´´

I searched a lot, but I could not find much about using Avro on EMR, and I am
stuck. Is there an example somewhere with data, schemas, and Pig scripts that
I can try?

I hope this, my first post to this mailing list, meets your standards in terms
of the information provided and tone ;-). If not, my apologies, and let me try
a second time.

In Pig I tried this:
´´
REGISTER s3://p3insight/libs/avro-1.7.4.jar;
-- REGISTER s3://p3insight/libs/pig/piggybank.jar;
REGISTER s3://p3insight/libs/jackson-mapper-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/jackson-core-2.3.4.jar
-- REGISTER s3://p3insight/libs/jackson-core-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/json-simple-1.1.1.jar;
REGISTER /home/hadoop/pig/lib/piggybank.jar

a = LOAD 's3://p3iqubole/data/avro/' USING 
org.apache.pig.piggybank.storage.avro.AvroStorage();
´´

The output is as follows:
´´
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
Details at logfile: /mnt/var/log/apps/pig.log
´´

The content of the log file is:
´´
Pig Stack Trace
---------------
ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'

Failed to parse: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
        at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1571)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1544)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:542)
        at org.apache.pig.Main.main(Main.java:159)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
        at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3235)
        at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1314)
        at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:798)
        at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:516)
        at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:391)
        at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
        ... 15 more
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:618)
        at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:823)
        ... 21 more
Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
        at java.lang.Class.getConstructor0(Class.java:2803)
        at java.lang.Class.newInstance(Class.java:345)
        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:588)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 27 more
´´
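
The innermost "Caused by" in the log says that the class
org.json.simple.parser.ParseException could not be found while AvroStorage was
being instantiated. As a rough sketch for narrowing this down, the following
Python helper checks whether a locally downloaded copy of a jar actually
contains that class. The function name jar_contains_class and the local file
path in the usage comment are hypothetical and not part of the setup above; the
sketch only relies on a jar being a zip archive whose entries mirror the
package path.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has a .class entry for class_name.

    jar_path is a local file path; class_name is a fully qualified Java class
    name such as 'org.json.simple.parser.ParseException'.
    """
    # A class a.b.C is stored in the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Hypothetical usage, assuming the jar was first copied from S3 to local disk:
#   jar_contains_class("json-simple-1.1.1.jar",
#                      "org.json.simple.parser.ParseException")
```

If the class is present in the jar, the jar itself is probably fine and the
question becomes whether Pig actually picked it up from the REGISTER statement.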

Kind regards,
Ralf Klüber