This looks like a Pig bug, perhaps caused by AvroStorage. According to the log, Pig read 4 records and output 4 records.
On Wed, Sep 21, 2011 at 1:55 PM, Scott Carey <[email protected]> wrote:
> You will want to ask the pig user mailing list this question.
>
> org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
> project and you will get more help from there.
>
> On 9/21/11 4:34 AM, "Alex Holmes" <[email protected]> wrote:
>
> >Hi all,
> >
> >I have a simple schema
> >
> >{"name": "Record", "type": "record",
> > "fields": [
> >   {"name": "name", "type": "string"},
> >   {"name": "id", "type": "int"}
> > ]
> >}
> >
> >which I use to write 2 records to an Avro file, and my reader code
> >(which reads the file and dumps the records) verifies that there are 2
> >records in the file:
> >
> >Record@1e9e5c73[name=r1,id=1]
> >Record@ed42d08[name=r2,id=2]
> >
> >When using this file with pig and AvroStorage, pig seems to think
> >there are 4 records:
> >
> >grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
> >grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-core-asl-1.6.0.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-mapper-asl-1.6.0.jar;
> >grunt> raw = LOAD 'test.v1.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage;
> >grunt> dump raw;
> >..
> >Input(s):
> >Successfully read 4 records (825 bytes) from:
> >"hdfs://localhost:9000/user/aholmes/test.v1.avro"
> >
> >Output(s):
> >Successfully stored 4 records (46 bytes) in:
> >"hdfs://localhost:9000/tmp/temp2039109003/tmp1924774585"
> >
> >Counters:
> >Total records written : 4
> >Total bytes written : 46
> >..
> >(r1,1)
> >(r2,2)
> >(r1,1)
> >(r2,2)
> >
> >I'm sure I'm doing something wrong (again)!
> >
> >Many thanks,
> >Alex
>
>

--
Best Regards

Jeff Zhang
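For reference, the thread does not include Alex's writer/reader code, so the following is only a minimal sketch of how a file like test.v1.avro could be written and then verified to hold two records using Avro's generic Java API. The Record@... output above suggests the original used a code-generated specific class instead; the generic API, the WriteReadRecords class name, and the local file path are assumptions made purely for illustration.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteReadRecords {
  // Same schema as in the quoted message.
  private static final String SCHEMA_JSON =
      "{\"name\": \"Record\", \"type\": \"record\","
    + " \"fields\": ["
    + "   {\"name\": \"name\", \"type\": \"string\"},"
    + "   {\"name\": \"id\", \"type\": \"int\"}"
    + " ]}";

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    File file = new File("test.v1.avro");

    // Write two records (r1,1) and (r2,2) to the Avro data file.
    DataFileWriter<GenericRecord> writer =
        new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, file);
    for (int i = 1; i <= 2; i++) {
      GenericRecord r = new GenericData.Record(schema);
      r.put("name", "r" + i);
      r.put("id", i);
      writer.append(r);
    }
    writer.close();

    // Read the file back, dump each record, and count them.
    DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>(schema));
    int count = 0;
    for (GenericRecord r : reader) {
      System.out.println(r);
      count++;
    }
    reader.close();
    System.out.println("records in file: " + count);
  }
}

If a reader like this sees two records while Pig's dump reports four, the duplication would be happening on the Pig/AvroStorage side rather than in the data file itself, which is what the reply above is suggesting.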
