I agree with Chad here. We should get in touch with Doug and see why
he can't use Thrift.
-Bryan
On Apr 3, 2009, at 5:47 AM, Chad Walters wrote:
Are there any strong technical reasons why we couldn't fold Avro's
functionality into Thrift?
Back when I was trying to get Thrift into a position to replace Hadoop
record IO, we talked about doing much of this.
I think the main challenges here are:
1. Runtime parsing of schemas
2. Lack of static typing for some items
I suggest that we talk with Doug, perhaps a face-to-face meeting with
some of us, and figure out if we can get him to come on board Thrift.
At the end of the day, it will be a win for both the Thrift and the
Hadoop communities for us to make these two Apache projects work
closely together, especially since I know there is a huge overlap
between the two communities.
Chad
On 4/2/09 10:48 PM, "David Reiss" <[email protected]> wrote:
For those of you who don't have git, forrest, *and* Java 5
(not 6! 5!) installed, I built the docs and put them online:
http://www.projectornation.com/avro-doc/spec.html
AFAICT, the main differences from Thrift are:
- No code generation. The schema is all in JSON files that are parsed
at runtime. For Python, this is probably fine. I'm not really clear
on how it looks for Java (maybe someone can look at the Java tests
and explain it to the rest of us). For C++, this will definitely make
the avro objects feel clunky because you'll have to access properties
by name. And the lists won't be statically typed.
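To make the name-based access concern concrete, here is a minimal
sketch using only the standard `json` module rather than an actual
Avro runtime; the `LogEntry` schema and its field names are invented
for illustration:

```python
import json

# A hypothetical Avro-style record schema, parsed at runtime from JSON
# rather than compiled into generated classes.
schema = json.loads("""
{
  "type": "record",
  "name": "LogEntry",
  "fields": [
    {"name": "host", "type": "string"},
    {"name": "status", "type": "int"}
  ]
}
""")

# With no generated code, a decoded record is essentially a dict:
# fields are looked up by name as strings, and nothing is statically
# typed -- which is what would feel clunky from C++.
record = {"host": "web01", "status": 200}
field_names = [f["name"] for f in schema["fields"]]
values = [record[name] for name in field_names]
print(values)
```

With generated code, `record.host` would be a compile-time-checked,
typed accessor; here a typo in the field name only fails at runtime.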
- The full schema is included with the messages, rather than having
field ids delimit the contents. This is nice for big Hadoop files
since you only include the schema once. (It was a technique that we
discussed for Thrift.) For a system like (I guess) Hadoop that has
long-lived RPC connections with multiple messages passed, I guess it
is not that big of a deal either. For a system like we have at
Facebook where the web server must connect to the feed/search/chat
server once for each RPC, it is a killer.
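The amortization argument above can be sketched with rough arithmetic
(the byte counts are invented for illustration, not measured from
Avro):

```python
# Hypothetical sizes, for illustration only.
schema_bytes = 500      # JSON schema written once per file/connection
record_bytes = 40       # one encoded record

# Hadoop-style: one big file, schema amortized over many records.
n_records = 1_000_000
file_total = schema_bytes + n_records * record_bytes
file_overhead = schema_bytes / file_total   # tiny fraction of the file

# Facebook-style: a fresh connection per RPC, so the schema is resent
# for every single message.
rpc_total = schema_bytes + record_bytes
rpc_overhead = schema_bytes / rpc_total     # dominates the wire bytes

print(file_overhead, rpc_overhead)
```

Under these assumed sizes the schema is a negligible fraction of a
large file but the large majority of every one-shot RPC, which is the
"killer" case David describes.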
--David