1. This decision was made because lex/yacc, despite being C, is still one of the most common lexing/parsing toolkits around. It's also the easiest to install on a *nix system (almost every Linux distro has the libs installed off the shelf). The standard Java release doesn't include lexing/parsing tools, so a Java compiler would already require 3rd-party libs, which is enough to turn some people off.
2. Yes, some of the generator code has gotten a bit unwieldy over time as we've added more features. I wouldn't mind a templating system either, but this is relatively low-leverage work given that the end result is the same, just with cleaner code. Generally, I give much higher priority to the quality of the language runtime libraries than to the quality of the code generator internals. Most Thrift users should never need to touch the code generator. We should work on improving it to the extent that it'll help us develop faster in the future.
3. What if you decide you gave a variable a stupid name and want to change it, but you've already deployed production code? Separating the names used in code from what goes over the transport makes renaming painless, and saves a lot of frustration and confusing legacy naming issues. Also, to make a hash-based system provably correct we'd need a conflict resolution system for hash collisions, which would be quite complex.
4. Inheritance can be problematic due to the use of unique field identifiers. If developer A owns struct A and developer B subclasses it with struct B, problems ensue: if B chooses field identifiers that A later adds to struct A, downstream breakage happens. Composition is free from these problems, albeit less convenient in some instances. I'd definitely endorse development work on tools to make composition easier.
5. This is optional, for readable protocols that would like to include them. The TProtocol abstraction supports sending names, but if you look at the TBinaryProtocol implementation, it actually doesn't send the names over the wire. You're correct, the sequence ids are good enough.
6. I would do this in 2 steps. First, move from "required a, required b" to "optional a, optional b, optional ab." These changes can be rolled out without any breakage. Then, you can switch your client side to "required ab" and finally switch the server side to "required ab," dropping the individual fields.
   I cannot think of any way to do this in only one switch without breakage.
7. I'm not sure exactly what you mean here. Which parts do you feel are not separated? The versioning and encoding are all isolated in the TProtocol abstraction, transfer lives exclusively in TTransport, and the generated TProcessors deal only with actual message dispatching.

Cheers,
Mark

-----Original Message-----
From: Torsten Curdt [mailto:[EMAIL PROTECTED]
Sent: Monday, August 25, 2008 4:36 PM
To: [email protected]
Subject: why...

Hey guys,

I've looked into Thrift recently and a few questions came up:

1. Why a native compiler? Would it be a little bit simpler to have the compiler/code generator written in Java? No language debate, just a curious question about the reason :)
2. Wouldn't it make sense to have a bit better separation than having all the code mixed up in the t_*_generator.cc files? Maybe more of a template approach, so adjusting the code that gets generated becomes a little bit easier?
3. Why not use the hash code of the attribute names as the sequence id?
4. Why only composition? Even a flattening model of multiple inheritance should be quite easy to implement (if overloading is forbidden). While in OOP I am a big fan of composition over inheritance, it makes the generated API kind of ugly. Maybe an include mechanism would be another way of simplifying composed structures. (Although I do realize that with the current model of sequence ids that might be a PITA to maintain.)
5. If I noticed correctly, the names of the attributes are included when serialized. Why is that? Shouldn't knowing the sequence id be good enough?
6. How do you guys suggest dealing with deterministic semantic changes? Let's say you have

   struct test {
     required string a;
     required string b;
   }

and then you want to combine those values into one attribute

   struct test {
     required string ab; // = a + b
   }

There are a couple of problems I see here. For one, ab will have to have a different sequence id. And I guess then the 'required' will become a problem for the sequence ids of a and b(?). And finally, the conversion ab = a + b needs to be handled at the application level, while the rule is very straightforward and deterministic and *could* be expressed in a more generic manner.
7. Wouldn't it make sense to separate out the service and exception stuff from the actual message versioning/serialization code?

cheers
--
Torsten
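To make answer 4 concrete, here is a minimal Python sketch of what flattening a hypothetical `struct B extends A` would involve. Thrift has no such inheritance feature; the `flatten` helper, the struct contents, and the field names below are all made up for illustration. Each struct is modeled as a dict mapping field id to field name, and the check fires as soon as parent and child claim the same id.

```python
def flatten(parent: dict, child: dict) -> dict:
    """Flatten a hypothetical 'struct B extends A' into one field table.

    Both arguments map field id -> field name. This is an illustration
    of why a shared field-id space is fragile, not a Thrift feature.
    """
    clashes = parent.keys() & child.keys()
    if clashes:
        raise ValueError(f"field id(s) {sorted(clashes)} defined by both parent and child")
    return {**parent, **child}


# Developer A's struct, version 1 (hypothetical fields).
struct_a_v1 = {1: "id", 2: "name"}
# Developer B "subclasses" A and picks field id 3 for a new field.
struct_b = {3: "email"}
print(flatten(struct_a_v1, struct_b))  # {1: 'id', 2: 'name', 3: 'email'}

# Later, developer A adds a field that happens to use id 3.
struct_a_v2 = {1: "id", 2: "name", 3: "created_at"}
try:
    flatten(struct_a_v2, struct_b)
except ValueError as err:
    print(err)  # field id(s) [3] defined by both parent and child
```

Here the clash is caught by an explicit check; without one, readers and writers built against different versions would disagree about what field 3 means. With composition, B would instead hold an A as a single field (for example `1: A base`), so A's ids live in their own namespace and A can grow without affecting B.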
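Answer 5 says TBinaryProtocol doesn't put field names on the wire. The minimal Python sketch below hand-encodes a struct of string fields using the binary protocol's field layout: a 1-byte type code, a 2-byte big-endian field id, then for a string a 4-byte big-endian length and the bytes, with a single 0 (STOP) byte ending the struct. It is a standalone illustration using only the standard `struct` module, not the Thrift library's API; `encode_string_field` and `encode_struct` are hypothetical helpers, and only string fields are handled.

```python
import struct

# Type codes from Thrift's TType enumeration.
T_STOP = 0
T_STRING = 11


def encode_string_field(field_id: int, value: str) -> bytes:
    """Encode one string field: type byte, i16 field id, i32 length, bytes.

    Note that no field name is written, only the numeric id.
    """
    data = value.encode("utf-8")
    header = struct.pack(">bh", T_STRING, field_id)
    return header + struct.pack(">i", len(data)) + data


def encode_struct(fields: dict) -> bytes:
    """Encode a struct of string fields (field id -> value), ending with STOP.

    This sketch writes fields in ascending id order; neither the struct
    name nor any field name appears in the output.
    """
    out = bytearray()
    for field_id, value in sorted(fields.items()):
        out += encode_string_field(field_id, value)
    out += struct.pack(">b", T_STOP)
    return bytes(out)


# Roughly: struct test { 1: required string a; 2: required string b; }
wire = encode_struct({1: "x", 2: "y"})
print(wire.hex())  # 0b000100000001780b0002000000017900
```

Renaming `a` or `b` in the IDL leaves these bytes unchanged, which is the property answer 3 relies on; only changing a field id would break compatibility.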
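Answer 6 leaves the ab = a + b conversion to the application during the transition. The sketch below shows one way a server might read the transitional struct (all three fields optional) while old and new clients coexist. It assumes a hypothetical Python view of `struct test { 1: optional string a; 2: optional string b; 3: optional string ab; }`; the field ids, the `Test` dataclass, and `read_ab` are illustrative, not generated Thrift code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Test:
    """Hypothetical stand-in for the transitional struct where a, b and ab
    are all optional (unset fields are None)."""
    a: Optional[str] = None
    b: Optional[str] = None
    ab: Optional[str] = None


def read_ab(msg: Test) -> str:
    """Server-side read during the transition.

    Prefer the new field; fall back to combining the legacy fields sent
    by clients that haven't been upgraded yet.
    """
    if msg.ab is not None:
        return msg.ab
    if msg.a is not None and msg.b is not None:
        return msg.a + msg.b
    raise ValueError("neither ab nor both of a and b are set")


old_client = Test(a="foo", b="bar")
new_client = Test(ab="foobar")
print(read_ab(old_client), read_ab(new_client))  # foobar foobar
```

Once every client sends `ab` (the "required ab" client step), the fallback branch is dead code, and the server can switch to "required ab" and drop `a` and `b`.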
