AvroStorage taking long time to load and iterate over records

Jaikit Savla Thu, 26 Mar 2015 15:51:12 -0700

Folks,
I am noticing odd behavior: loading and iterating over Avro records via
AvroStorage takes much longer than iterating over them in a MapReduce job.
Are there any known issues, or any clue as to why AvroStorage would take so
long?
Example:
Schema which I am using:
{
  "type": "record",
  "name": "Timber",
  "namespace": "com.timber.avro",
  "fields": [
    {
      "name": "identifier",
      "type": "string",
      "doc": "Identifier. NonNull."
    },
    {
      "name": "reservation",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "name": "Reservation",
            "type": "record",
            "fields": [
              {
                "name": "bookingDate",
                "type": "long",
                "doc": "Timestamp in UTC. NonNull"
              },
              {
                "name": "code",
                "type": [
                  "null",
                  "string"
                ],
                "doc": "Code.",
                "default": null
              }
            ]
          }
        }
      ],
      "default": null,
      "doc": "array of segment id which this urn belongs."
    }
  ]
}
---> Pig
Using Pig AvroStorage, it takes more than 30 minutes simply to iterate. I have
been adding more optional fields (like code) to the Reservation record above.
Does that affect how I am using AvroStorage?
register /json-simple-1.1.jar;
register /piggybank.jar;

records = LOAD '/data/*/one.avro'
          USING org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check');

reservation = FOREACH records {
    selectHotelAtt = FOREACH reservation GENERATE bookingDate;
    GENERATE FLATTEN(selectHotelAtt.bookingDate) as bookingDate;
};

DUMP reservation;


--> MapReduce
When I use a MapReduce job to iterate through all the records, it completes
in less than 2 minutes for about a million records.
Mapper interface:
        @Override
        public void map(final AvroKey<Timber> key, final NullWritable value,
                        final Context context)
                throws IOException, InterruptedException {
Thanks,
Jaikit
