FYI: https://issues.apache.org/jira/browse/AVRO-993
I expect that Avro 1.6.2 will add these methods back in.

On 1/11/12 1:47 AM, "Andrew Kenworthy" <[email protected]> wrote:

>Hi Stan,
>
>Thank you for your feedback. I've run the script passing
>"-D mapred.child.java.opts=-verbose:class" and have the following in my logs:
>
>[Loaded org.apache.avro.generic.GenericDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
>[Loaded org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
>
>I assume the .../job_201111230039_0146/jars/job.jar is the one prepared
>by pig using the jars I have REGISTER-ed, in which case the classes are
>the ones I expect, or have I misread that?
>
>Regards,
>
>Andrew
>
>>________________________________
>> From: Stan Rosenberg <[email protected]>
>>To: [email protected]; Andrew Kenworthy <[email protected]>
>>Sent: Tuesday, January 10, 2012 5:36 PM
>>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>>
>>Andrew,
>>
>>Something looks odd in this stack trace:
>>
>>Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>>        at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>>>        at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>>>        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>>>        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>>>        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>>
>>PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order
>>to extract values from a tuple. Thus, I would expect the third
>>method invocation to be PigAvroDatumWriter.writeRecord. Perhaps someone
>>else has more insight as to why it's not getting invoked. In the
>>meantime, please confirm that both PigAvroDatumWriter and
>>GenericDatumWriter are loaded from the right jar files. (You can do
>>this by temporarily changing the pig script to invoke the JVM with 'java
>>-verbose' and 'grep' the output for these classes.)
>>
>>Best,
>>
>>stan
>>
>>On Tue, Jan 10, 2012 at 8:03 AM, Andrew Kenworthy <[email protected]> wrote:
>>> Hi Stan,
>>>
>>> here's the full stacktrace:
>>>
>>> org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>>        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
>>>        at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>>>        at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>>>        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
>>>        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>>        at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>>>        at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>>>        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>>>        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>>>        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>>>        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>>>        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
>>>        ... 18 more
>>>
>>> Andrew
>>>
>>>>________________________________
>>>> From: Stan Rosenberg <[email protected]>
>>>>To: [email protected]; Andrew Kenworthy <[email protected]>
>>>>Sent: Monday, January 9, 2012 5:30 PM
>>>>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>>>>
>>>>Andrew,
>>>>
>>>>The source of the problem may be AvroStorage in piggybank. Could you
>>>>please include the entire stack trace?
>>>>
>>>>stan
>>>>
>>>>On Mon, Jan 9, 2012 at 4:15 AM, Andrew Kenworthy <[email protected]> wrote:
>>>>> Hallo,
>>>>>
>>>>> When I run a simple pig script to LOAD and STORE avro data, I get:
>>>>>
>>>>> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>>>>
>>>>> Script:
>>>>>
>>>>> REGISTER /tmp/avro-1.6.0.jar;
>>>>> --REGISTER /tmp/avro-1.5.4.jar
>>>>> --REGISTER /tmp/avro-1.4.1.jar;
>>>>>
>>>>> REGISTER /tmp/piggybank-0.9.1.jar;
>>>>> REGISTER /tmp/json-simple-1.1.jar;
>>>>> REGISTER /tmp/jackson-core-asl-1.8.4.jar;
>>>>> REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;
>>>>>
>>>>> avroData=LOAD '$DATA_INPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>>
>>>>> dataSubset = FOREACH avroData GENERATE myField1, myField2;
>>>>> describe dataSubset;
>>>>> -----------------------------------------------
>>>>> -- shows:
>>>>> -- dataSubset : {myField1: int,myField2: int}
>>>>> -----------------------------------------------
>>>>> STORE dataSubset INTO '$OUTPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>>
>>>>> If I use the 1.5.4 jar I get the same error, but the script works
>>>>> with the 1.4.1 version. If I just write one field, then it works with
>>>>> 1.6.0.
>>>>>
>>>>> I see there's been a related issue fixed here:
>>>>>
>>>>> https://issues.apache.org/jira/browse/PIG-2202
>>>>> https://issues.apache.org/jira/browse/PIG-2195
>>>>>
>>>>> Can anyone confirm that this or similar works with avro 1.6.0,
>>>>> and/or point me in the right direction concerning where the problem
>>>>> may lie?
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Andrew
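
For readers following the thread: the symptom discussed above (a subclass override that is expected in the stack trace but never appears, followed by a ClassCastException in the base library) can arise in general when a base class stops calling an overridable method that a subclass relied on. The sketch below is a minimal, self-contained illustration of that general mechanism only. All class and method names in it (IndexedRow, OldBaseWriter, NewBaseWriter, ListWriterOnOld, ListWriterOnNew, readField) are hypothetical stand-ins, and it is an assumption that this resembles the situation here; it does not reproduce Avro's or Piggybank's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class OverrideDrift {
    /** Hypothetical stand-in for a library record type with positional access. */
    interface IndexedRow {
        Object get(int pos);
    }

    /** "Old" base writer: writeRecord reads every field through an overridable hook. */
    static class OldBaseWriter {
        protected Object getField(Object datum, int pos) {
            return ((IndexedRow) datum).get(pos);
        }

        String writeRecord(Object datum, int fieldCount) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fieldCount; i++) {
                if (i > 0) out.append(',');
                out.append(getField(datum, i)); // virtual call: a subclass can intercept it
            }
            return out.toString();
        }
    }

    /** "New" base writer: the hook is gone and field access moved to a static helper. */
    static class NewBaseWriter {
        static Object readField(Object datum, int pos) {
            return ((IndexedRow) datum).get(pos); // throws for anything that is not an IndexedRow
        }

        String writeRecord(Object datum, int fieldCount) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fieldCount; i++) {
                if (i > 0) out.append(',');
                out.append(readField(datum, i)); // static call: subclasses cannot intercept it
            }
            return out.toString();
        }
    }

    /** Subclass written against the old base: adapts a List by overriding the hook. */
    static class ListWriterOnOld extends OldBaseWriter {
        @Override
        protected Object getField(Object datum, int pos) {
            return ((List<?>) datum).get(pos);
        }
    }

    /** Same subclass on the new base: getField still exists, but nothing calls it. */
    static class ListWriterOnNew extends NewBaseWriter {
        // No @Override here, so this compiles even though it overrides nothing.
        protected Object getField(Object datum, int pos) {
            return ((List<?>) datum).get(pos);
        }
    }

    static String writeWithOld(List<?> row) {
        return new ListWriterOnOld().writeRecord(row, row.size());
    }

    static String writeWithNew(List<?> row) {
        return new ListWriterOnNew().writeRecord(row, row.size());
    }

    public static void main(String[] args) {
        List<Integer> row = Arrays.asList(1, 2);
        System.out.println(writeWithOld(row));
        try {
            System.out.println(writeWithNew(row));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

In this sketch the old base class writes the list correctly, while the new one fails with a ClassCastException raised from the base library's own field access, and the subclass's getField never shows up in the trace. Annotating such hooks with @Override makes the drift a compile error once the subclass is recompiled against the changed base class.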
