I agree with Chad here. We should get in touch with Doug and see why
he can't use Thrift.
-Bryan
On Apr 3, 2009, at 5:47 AM, Chad Walters wrote:
Are there any strong technical reasons why we couldn't fold Avro's
functionality into Thrift?
Back when I was trying to get Thrift into a position to replace Hadoop
record IO, we talked about doing much of this.
I think the main challenges here are:
1. Runtime parsing of schemas
2. Lack of static typing for some items
I suggest that we talk with Doug, perhaps a face-to-face meeting with
some of us, and figure out if we can get him to come on board Thrift.
At the end of the day, it will be a win for both the Thrift and the
Hadoop communities for us to make these two Apache projects work
closely together, especially since I know there is a huge overlap
between the two communities.
Chad
On 4/2/09 10:48 PM, "David Reiss" <[email protected]> wrote:
For those of you who don't have git, forrest, *and* Java 5
(not 6! 5!) installed, I built the docs and put them online:
http://www.projectornation.com/avro-doc/spec.html
AFAICT, the main differences from Thrift are:
- No code generation. The schema is all in JSON files that are parsed
at runtime. For Python, this is probably fine. I'm not really clear
on how it looks for Java (maybe someone can look at the Java tests
and explain it to the rest of us). For C++, this will definitely make
the avro objects feel clunky because you'll have to access properties
by name. And the lists won't be statically typed.
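To make the name-based access concern concrete, here is a minimal
sketch using only the standard `json` module rather than an actual
Avro runtime; the `LogEntry` schema and its field names are invented
for illustration:

```python
import json

# A hypothetical Avro-style record schema, parsed at runtime from JSON
# rather than compiled into generated classes.
schema = json.loads("""
{
  "type": "record",
  "name": "LogEntry",
  "fields": [
    {"name": "host", "type": "string"},
    {"name": "status", "type": "int"}
  ]
}
""")

# With no generated code, a decoded record is essentially a dict:
# fields are looked up by name as strings, and nothing is statically
# typed -- which is what would feel clunky from C++.
record = {"host": "web01", "status": 200}
field_names = [f["name"] for f in schema["fields"]]
values = [record[name] for name in field_names]
print(values)
```

With generated code, `record.host` would be a compile-time-checked,
typed accessor; here a typo in the field name only fails at runtime.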
- The full schema is included with the messages, rather than having
field ids delimit the contents. This is nice for big Hadoop files
since you only include the schema once. (It was a technique that we
discussed for Thrift.) For a system like (I guess) Hadoop that has
long-lived RPC connections with multiple messages passed, I guess it
is not that big of a deal either. For a system like we have at
Facebook where the web server must connect to the feed/search/chat
server once for each RPC, it is a killer.
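The amortization argument above can be sketched with rough arithmetic
(the byte counts are invented for illustration, not measured from
Avro):

```python
# Hypothetical sizes, for illustration only.
schema_bytes = 500      # JSON schema written once per file/connection
record_bytes = 40       # one encoded record

# Hadoop-style: one big file, schema amortized over many records.
n_records = 1_000_000
file_total = schema_bytes + n_records * record_bytes
file_overhead = schema_bytes / file_total   # tiny fraction of the file

# Facebook-style: a fresh connection per RPC, so the schema is resent
# for every single message.
rpc_total = schema_bytes + record_bytes
rpc_overhead = schema_bytes / rpc_total     # dominates the wire bytes

print(file_overhead, rpc_overhead)
```

Under these assumed sizes the schema is a negligible fraction of a
large file but the large majority of every one-shot RPC, which is the
"killer" case David describes.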
--David