Are there any strong technical reasons why we couldn't fold Avro's functionality into Thrift?
Back when I was trying to get Thrift into a position to replace Hadoop
record I/O, we talked about doing much of this. I think the main challenges
here are:

1. Runtime parsing of schemas
2. Lack of static typing for some items

I suggest that we talk with Doug, perhaps in a face-to-face meeting with
some of us, and figure out whether we can get him to come on board with
Thrift. At the end of the day, it will be a win for both the Thrift and the
Hadoop communities to make these two Apache projects work closely together,
especially since I know there is a huge overlap between the two communities.

Chad

On 4/2/09 10:48 PM, "David Reiss" <[email protected]> wrote:
> For those of you who don't have git, forrest, *and* Java 5
> (not 6! 5!) installed, I built the docs and put them online:
>
> http://www.projectornation.com/avro-doc/spec.html
>
> AFAICT, the main differences from Thrift are:
>
> - No code generation. The schema is all in JSON files that are parsed
>   at runtime. For Python, this is probably fine. I'm not really clear
>   on how it looks for Java (maybe someone can look at the Java tests and
>   explain it to the rest of us). For C++, this will definitely make
>   the Avro objects feel clunky, because you'll have to access properties
>   by name, and the lists won't be statically typed.
> - The full schema is included with the messages, rather than having
>   field ids delimit the contents. This is nice for big Hadoop files,
>   since you only include the schema once. (It was a technique that
>   we discussed for Thrift.) For a system like (I guess) Hadoop, which
>   has long-lived RPC connections passing multiple messages, I guess
>   it is not that big a deal either. For a system like we have at
>   Facebook, where the web server must connect to the feed/search/chat
>   server once for each RPC, it is a killer.
>
> --David
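For concreteness, the runtime-schema model described above looks roughly
like this in the Avro Python API (a minimal sketch; the schema and file
names are made up, and the exact module layout may vary across Avro
versions):

    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # The schema is plain JSON, parsed at runtime -- no generated classes.
    schema = avro.schema.parse("""
    {"type": "record", "name": "User",
     "fields": [{"name": "name", "type": "string"},
                {"name": "favorite_number", "type": "int"}]}
    """)

    # Records are generic dicts: fields are accessed by name, which is
    # what makes this feel clunky from a statically typed language.
    writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
    writer.append({"name": "Alyssa", "favorite_number": 256})
    writer.close()

    reader = DataFileReader(open("users.avro", "rb"), DatumReader())
    for user in reader:
        print(user["name"], user["favorite_number"])
    reader.close()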
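The second difference, schema-once versus per-field tags, is also easy to
see with a hand-rolled sketch. This is illustrative only, not either
project's exact wire format (Thrift's binary protocol does tag each field
with a type byte and a two-byte id, but Avro uses zigzag varints rather
than the fixed-width ints shown here):

    import struct

    # Thrift-style framing: each field carries a type byte and a 2-byte id,
    # and the struct ends with a STOP byte, so every message is
    # self-describing.
    def thrift_style(name, number):
        buf = struct.pack(">bh", 11, 1)   # type STRING (11), field id 1
        buf += struct.pack(">i", len(name)) + name.encode()
        buf += struct.pack(">bh", 8, 2)   # type I32 (8), field id 2
        buf += struct.pack(">i", number)
        buf += struct.pack(">b", 0)       # STOP
        return buf

    # Avro-style framing: the schema travels once (file header or
    # connection handshake), and each record is just the values in
    # schema order, with no per-field tags.
    def avro_style(name, number):
        return struct.pack(">i", len(name)) + name.encode() \
            + struct.pack(">i", number)

    print(len(thrift_style("Alyssa", 256)))  # 21 bytes
    print(len(avro_style("Alyssa", 256)))    # 14 bytes

The per-record savings is what makes schema-once attractive for big Hadoop
files and long-lived connections, and the one-time schema cost is what
hurts the connect-per-RPC pattern David describes.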
