Are there any strong technical reasons why we couldn't fold Avro's functionality into Thrift?
Back when I was trying to get Thrift into a position to replace Hadoop
record I/O, we talked about doing much of this. I think the main challenges
here are:

1. Runtime parsing of schemas
2. Lack of static typing for some items

I suggest that we talk with Doug, perhaps in a face-to-face meeting with
some of us, and figure out whether we can get him to come on board with
Thrift. At the end of the day, it will be a win for both the Thrift and the
Hadoop communities to make these two Apache projects work closely together,
especially since I know there is a huge overlap between the two communities.

Chad

On 4/2/09 10:48 PM, "David Reiss" <[email protected]> wrote:
> For those of you who don't have git, forrest, *and* Java 5
> (not 6! 5!) installed, I built the docs and put them online:
>
> http://www.projectornation.com/avro-doc/spec.html
>
> AFAICT, the main differences from Thrift are:
>
> - No code generation. The schema is all in JSON files that are parsed
>   at runtime. For Python, this is probably fine. I'm not really clear
>   on how it looks for Java (maybe someone can look at the Java tests and
>   explain it to the rest of us). For C++, this will definitely make
>   the Avro objects feel clunky, because you'll have to access properties
>   by name, and the lists won't be statically typed.
> - The full schema is included with the messages, rather than having
>   field ids delimit the contents. This is nice for big Hadoop files,
>   since you only include the schema once. (It was a technique that
>   we discussed for Thrift.) For a system like (I guess) Hadoop, which
>   has long-lived RPC connections passing multiple messages, I guess
>   it is not that big a deal either. For a system like we have at
>   Facebook, where the web server must connect to the feed/search/chat
>   server once for each RPC, it is a killer.
>
> --David
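For concreteness, the runtime-schema model described above looks roughly
like this in the Avro Python API (a minimal sketch; the schema and file
names are made up, and the exact module layout may vary across Avro
versions):

    import avro.schema
    from avro.datafile import DataFileReader, DataFileWriter
    from avro.io import DatumReader, DatumWriter

    # The schema is plain JSON, parsed at runtime -- no generated classes.
    schema = avro.schema.parse("""
    {"type": "record", "name": "User",
     "fields": [{"name": "name", "type": "string"},
                {"name": "favorite_number", "type": "int"}]}
    """)

    # Records are generic dicts: fields are accessed by name, which is
    # what makes this feel clunky from a statically typed language.
    writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
    writer.append({"name": "Alyssa", "favorite_number": 256})
    writer.close()

    reader = DataFileReader(open("users.avro", "rb"), DatumReader())
    for user in reader:
        print(user["name"], user["favorite_number"])
    reader.close()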
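The second difference, schema-once versus per-field tags, is also easy to
see with a hand-rolled sketch. This is illustrative only, not either
project's exact wire format (Thrift's binary protocol does tag each field
with a type byte and a two-byte id, but Avro uses zigzag varints rather
than the fixed-width ints shown here):

    import struct

    # Thrift-style framing: each field carries a type byte and a 2-byte id,
    # and the struct ends with a STOP byte, so every message is
    # self-describing.
    def thrift_style(name, number):
        buf = struct.pack(">bh", 11, 1)   # type STRING (11), field id 1
        buf += struct.pack(">i", len(name)) + name.encode()
        buf += struct.pack(">bh", 8, 2)   # type I32 (8), field id 2
        buf += struct.pack(">i", number)
        buf += struct.pack(">b", 0)       # STOP
        return buf

    # Avro-style framing: the schema travels once (file header or
    # connection handshake), and each record is just the values in
    # schema order, with no per-field tags.
    def avro_style(name, number):
        return struct.pack(">i", len(name)) + name.encode() \
            + struct.pack(">i", number)

    print(len(thrift_style("Alyssa", 256)))  # 21 bytes
    print(len(avro_style("Alyssa", 256)))    # 14 bytes

The per-record savings is what makes schema-once attractive for big Hadoop
files and long-lived connections, and the one-time schema cost is what
hurts the connect-per-RPC pattern David describes.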
