Hello,
I am trying to
(i) read Avro files in Pig on Amazon EMR. I created these files on my local
cluster from JSON documents (complex and nested, including arrays) and
uploaded them to S3.
(ii) create Avro files on EMR from those complex JSON documents uploaded
to S3.
On my local Cloudera cluster I was able to load and work with the data in the
Avro file.
On Amazon EMR I was not able to load the existing Avro files.
My EMR cluster is:
´´
AMI version: 3.0.4
Amazon 2.2.0
Hive 0.11.0.2
Pig 0.11.1.1
Impala 1.2.1
´´
I searched a lot but could not find much about Avro on EMR, and I am stuck. Is
there an example somewhere, with data, schemas and Pig scripts, that I can try?
I hope this, my first post to this mailing list, meets your standards in terms
of provided information and tone ;-). If not, apologies, and let me try a
second time.
In Pig I tried this:
´´
REGISTER s3://p3insight/libs/avro-1.7.4.jar;
-- REGISTER s3://p3insight/libs/pig/piggybank.jar;
REGISTER s3://p3insight/libs/jackson-mapper-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/jackson-core-2.3.4.jar
-- REGISTER s3://p3insight/libs/jackson-core-asl-1.9.9.jar;
REGISTER s3://p3insight/libs/json-simple-1.1.1.jar;
REGISTER /home/hadoop/pig/lib/piggybank.jar
a = LOAD 's3://p3iqubole/data/avro/' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
´´
The output is as follows:
´´
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
Details at logfile: /mnt/var/log/apps/pig.log
´´
The content of the log file is:
´´
Pig Stack Trace
---------------
ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
Failed to parse: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1571)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1544)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:542)
    at org.apache.pig.Main.main(Main.java:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:835)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3235)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1314)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:798)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:516)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:391)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
    ... 15 more
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.pig.piggybank.storage.avro.AvroStorage' with arguments 'null'
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:618)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:823)
    ... 21 more
Caused by: java.lang.NoClassDefFoundError: org/json/simple/parser/ParseException
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
    at java.lang.Class.getConstructor0(Class.java:2803)
    at java.lang.Class.newInstance(Class.java:345)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:588)
    ... 22 more
Caused by: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 27 more
´´
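The innermost "Caused by" in the log names the class
org.json.simple.parser.ParseException as not found. One way to narrow this
down might be to check whether a local copy of a registered jar actually
contains that class. The following is only a minimal sketch, not a fix: it
assumes the jar has first been copied from S3 to the local filesystem, and the
helper name jar_contains_class and the script name check_jar.py are
hypothetical. It relies only on the fact that a jar is a zip archive in which
class a.b.C is stored as the entry a/b/C.class.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar at jar_path has an entry for class_name.

    jar_contains_class is a hypothetical helper for illustration only.
    """
    # A class a.b.C is stored in a jar as the zip entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python check_jar.py /local/path/to/some.jar
    missing = "org.json.simple.parser.ParseException"
    found = jar_contains_class(sys.argv[1], missing)
    print("%s contains %s: %s" % (sys.argv[1], missing, found))
```

If the jar does contain the class, the problem is more likely in how the jar
reaches the Pig classpath than in the jar itself.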
Kind regards.
Ralf Klüber