Re: Avro, a cross-language serialization framework from Doug Cutting, proposed as Hadoop subproject

David Reiss Thu, 02 Apr 2009 22:50:54 -0700

For those of you who don't have git, forrest, *and* Java 5
(not 6! 5!) installed, I built the docs and put them online:


http://www.projectornation.com/avro-doc/spec.html

AFAICT, the main differences from Thrift are:

- No code generation.  The schema is all in JSON files that are parsed
  at runtime.  For Python, this is probably fine.  I'm not really clear
  on how it looks for Java (maybe someone can look at the Java tests and
  explain it to the rest of us).  For C++, this will definitely make
  the Avro objects feel clunky because you'll have to access properties
  by name, and the lists won't be statically typed.
- The full schema is included with the messages, rather than having
  field ids delimit the contents.  This is nice for big Hadoop files
  since you only include the schema once.  (It was a technique that
  we discussed for Thrift.)  For a system like Hadoop that (I guess)
  has long-lived RPC connections with multiple messages passed, it is
  probably not that big of a deal either.  For a system like we have at
  Facebook where the web server must connect to the feed/search/chat
  server once for each RPC, resending the schema on every connection
  is a killer.
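
To make the size trade-off in the second point concrete, here is a toy
sketch in Python.  It is neither the real Avro encoding nor the real Thrift
encoding: the FeedRequest schema, the fixed-width struct packing, and all of
the function names are assumptions made up for illustration.  It only shows
why sending a schema up front pays off over many messages and hurts when a
connection carries a single RPC.

```python
import json
import struct

# Hypothetical record schema, written as JSON in the spirit of the spec.
SCHEMA = {
    "type": "record",
    "name": "FeedRequest",  # made-up name, not from either project
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "limit", "type": "int"},
    ],
}

# Toy fixed-width formats; real encodings differ.
_FORMATS = {"long": ">q", "int": ">i"}


def encode_untagged(schema, record):
    """Schema-driven body: values only, in schema field order."""
    return b"".join(
        struct.pack(_FORMATS[f["type"]], record[f["name"]])
        for f in schema["fields"]
    )


def decode_untagged(schema, data):
    """Walk the schema at runtime; fields come back keyed by name."""
    record, offset = {}, 0
    for f in schema["fields"]:
        fmt = _FORMATS[f["type"]]
        (record[f["name"]],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return record


def encode_tagged(schema, record):
    """Field-id-delimited body: a 2-byte id precedes every value."""
    parts = []
    for field_id, f in enumerate(schema["fields"], start=1):
        parts.append(struct.pack(">h", field_id))
        parts.append(struct.pack(_FORMATS[f["type"]], record[f["name"]]))
    return b"".join(parts)


def total_schema_first(schema, records):
    """Bytes sent when the schema goes once, then untagged bodies."""
    schema_bytes = len(json.dumps(schema).encode("utf-8"))
    return schema_bytes + sum(len(encode_untagged(schema, r)) for r in records)


def total_tagged(schema, records):
    """Bytes sent when every message carries its own field ids."""
    return sum(len(encode_tagged(schema, r)) for r in records)


if __name__ == "__main__":
    rec = {"user_id": 42, "limit": 10}
    for n in (1, 1000):
        print(n, total_schema_first(SCHEMA, [rec] * n),
              total_tagged(SCHEMA, [rec] * n))
```

In this toy encoding one record costs 12 bytes untagged versus 16 bytes
tagged, so the schema-first approach only wins once enough messages share
the same connection or file to amortize the JSON schema.  Note also that
decode_untagged hands back a dict keyed by field name, which is the
access-by-name style the first point is about.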

--David



Bryan Duxbury wrote:
> Indeed, I am very curious about how this differs from Thrift.
> 
> On Apr 2, 2009, at 7:48 PM, Kevin Clark wrote:
> 
>> Reposting from thrift-user.
>>
>> On Thu, Apr 2, 2009 at 7:19 PM, Jeff Hammerbacher  
>> <[email protected]> wrote:
>>> See http://markmail.org/thread/7cgrwoc4er4mr3bp
>>>
>> Is this a vote of no confidence on Doug's part? Last I heard, he was
>> still one of our mentors, and this project sounds an awful lot like
>> Thrift.
>>
>>
>>
>> -- 
>> Kevin Clark
>> http://glu.ttono.us
>
