Re: Avro vs. Hadoop serialization performance?

Scott Carey Mon, 14 Mar 2011 14:21:11 -0700

If you're I/O bound, Avro will be faster.  Avro's raw field serialization
is very fast, but some types of object marshaling are not yet that fast.
Hadoop's Writables aren't all that fast themselves anyway.
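
One plausible contributor to the I/O-bound case is encoded size. This is an
assumption for illustration, not something measured in this thread. The Avro
spec writes int and long values with zig-zag variable-length coding, while
Hadoop's IntWritable writes a fixed 4 bytes. A rough Python sketch of the size
difference follows. The function names are made up for this example, and it
says nothing about CPU cost:

```python
import struct


def avro_encode_long(n):
    """Encode an integer the way the Avro spec describes for int/long:
    zig-zag mapping followed by a base-128 variable-length encoding."""
    # Zig-zag: map small negative and positive values to small unsigned ones
    # (valid for values in the signed 64-bit range).
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)                      # final byte, continuation bit clear
    return bytes(out)


def fixed_int32(n):
    """Fixed-width big-endian 4-byte int, like Java's DataOutput.writeInt."""
    return struct.pack(">i", n)


if __name__ == "__main__":
    for value in (0, -1, 64, 100000):
        print(value, len(avro_encode_long(value)), len(fixed_int32(value)))
```

Small values take one or two bytes in the variable-length form against a
constant four in the fixed form, so how much this helps depends on the data.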


I don't know of any public direct benchmarks comparing the two in a
standard Hadoop MapReduce.


When attempted with Pig, Avro was faster (PIG-794):
Storage       Time spent   Output size  Mapper task      Time spent   Total time spent
              on job_1     of job_1     number of job_2  on job_2     on pig script
AvroStorage   3min 51sec   7.96G        120              17min 09sec  21min 0sec
InterStorage  4min 33sec   9.55G        143              17min 17sec  21min 50sec
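
To put those numbers in relative terms, here is a small Python sketch that
works out how much lower each AvroStorage figure is than the InterStorage one.
The helper names are made up for this example. The inputs are copied from the
PIG-794 rows above:

```python
def seconds(minutes, secs):
    """Convert a 'Xmin Ysec' reading to seconds."""
    return minutes * 60 + secs


def pct_less(avro, inter):
    """Percentage by which the Avro figure is lower than the InterStorage one."""
    return round(100.0 * (inter - avro) / inter, 1)


# (AvroStorage, InterStorage) pairs taken from the table above.
results = {
    "job_1 time (s)": (seconds(3, 51), seconds(4, 33)),
    "job_1 output (G)": (7.96, 9.55),
    "job_2 mapper tasks": (120, 143),
    "job_2 time (s)": (seconds(17, 9), seconds(17, 17)),
    "total time (s)": (seconds(21, 0), seconds(21, 50)),
}

if __name__ == "__main__":
    for name, (avro, inter) in results.items():
        print(f"{name}: Avro is {pct_less(avro, inter)}% lower")
```

The first job is roughly 15% faster with about 17% less output, but the total
script time differs by under 4%, since job_2 dominates the run.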


On 3/14/11 1:59 PM, "Aleksey Maslov" <[email protected]> wrote:

>Hi,
>
>Has there been any benchmarking done to determine which serialization
>architecture is better: Hadoop or Avro?
>I understand Avro has language neutrality as its big plus, but what about
>the perf?
>
>And yes, it's a loaded question. It all depends on the nature of the data
>(text vs. numeric), but still, are they close?
>
>Aleksey
>
>
>--
>View this message in context:
>http://apache-avro.679487.n3.nabble.com/Avro-vs-Hadoop-serialization-performance-tp2677357p2677357.html
>Sent from the Avro - Users mailing list archive at Nabble.com.
