If your job is I/O bound, Avro will be faster. Avro's raw field serialization is very fast, but marshaling of some object types is not yet as fast. Then again, Hadoop's Writables aren't all that fast themselves.
I don't know of any public benchmarks that directly compare the two in a standard Hadoop MapReduce job. In a test with Pig, Avro was faster (PIG-794):

Storage       job_1 time     job_1 output size  job_2 mapper tasks  job_2 time     Total Pig script time
AvroStorage   3 min 51 sec   7.96G              120                 17 min 09 sec  21 min 0 sec
InterStorage  4 min 33 sec   9.55G              143                 17 min 17 sec  21 min 50 sec

On 3/14/11 1:59 PM, "Aleksey Maslov" <[email protected]> wrote:

>Hi,
>
>Has there been any benchmarking done to determine which serialization
>architecture is better - Hadoop vs. Avro;
>I understand Avro has language neutrality as its big plus; but what about
>the perf?
>
>and yes, it's a loaded question - all depends on the nature of the data:
>text vs. numeric - but still, are they close?
>
>Aleksey
>
>
>--
>View this message in context:
>http://apache-avro.679487.n3.nabble.com/Avro-vs-Hadoop-serialization-performance-tp2677357p2677357.html
>Sent from the Avro - Users mailing list archive at Nabble.com.
