Correction: when I read the file in Python, I get the error below. It
looks like a Unicode problem. Is there a way to tell Avro how to handle it?
Traceback (most recent call last):
  File "./cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 654, in read_union
    return self.read_data(selected_writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 458, in read_data
    return self.read_data(writers_schema, s, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 468, in read_data
    return decoder.read_utf8()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543: invalid start byte
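
For anyone trying to reproduce this outside of Avro: the last frame just decodes the raw bytes of a string field as UTF-8, and a lone 0xa0 byte is not valid UTF-8. The Avro spec defines strings as UTF-8, so the real fix is probably on the writer side. The sketch below only illustrates the failure and one possible way to inspect the bad data. The helper name decode_avro_string and the Latin-1 fallback are my assumptions (0xa0 is a non-breaking space in Latin-1), not part of the Avro API.

```python
def decode_avro_string(raw, fallback="latin-1"):
    """Hypothetical helper, not part of Avro: decode bytes as UTF-8 and
    fall back to another codec (assumed Latin-1 here) if that fails.
    Returns the decoded text and the name of the codec that worked."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A lone 0xa0 byte lands here: it is a continuation byte in
        # UTF-8 and cannot start a character.
        return raw.decode(fallback), fallback


# Made-up sample bytes containing a Latin-1 non-breaking space.
sample = b"price:\xa0100"
text, codec = decode_avro_string(sample)
print(repr(text), codec)
```

If the fallback path is taken for a field, that suggests the writer put non-UTF-8 bytes into a string field, and re-encoding the value as UTF-8 before writing would be the place to look.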
On Thu, Feb 2, 2012 at 2:06 PM, Russell Jurney <[email protected]> wrote:
> I am writing Avro records in Ruby 1.8.7 using the avro gem. Loading
> these files sometimes fails. As a result, I am unable to write large
> files that are readable.
>
> The exception I get is below. Anyone have an idea what this means? It
> looks like Avro is having trouble parsing the schema. The avro files parse
> in Ruby and Python, just not Pig. Are there more rigorous checks in Java?
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error.
> org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>
> java.lang.NoSuchMethodError:
> org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
> at org.apache.avro.Schema.<clinit>(Schema.java:82)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
> at
> org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
> at
> org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
> at
> org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
> at
> org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
> at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
> at
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at
> org.apache.pig.newplan.logical.visitor.CastLineageSetter.<init>(CastLineageSetter.java:57)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1679)
> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
> at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.run(Main.java:495)
> at org.apache.pig.Main.main(Main.java:111)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> ================================================================================
>
> --
> Russell Jurney
> twitter.com/rjurney
> [email protected]
> datasyndrome.com
>
--
Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com