On the idea of writing generators in languages other than C++:
I've been thinking that if the compiler itself emitted some
common intermediate format like JSON, it would be really easy to
write a generator. JSON (or XML or YAML or something like it)
probably already has a parser in most languages, so you'd just treat
it like an AST and generate code however you want. It could be hooked
up via stdin/stdout. Then, I could generate my Ruby classes with a
Ruby script :).
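
A minimal sketch of that pipeline, written in Python rather than Ruby purely for illustration. The JSON shape (a "structs" list with "name" and "fields") is a made-up assumption, not a format the Thrift compiler actually emits, and generate_ruby is a hypothetical helper name.

```python
import json

# Hypothetical JSON AST, standing in for what the compiler might write to stdout.
EXAMPLE_AST = """
{
  "structs": [
    {"name": "Test",
     "fields": [
       {"id": 1, "name": "a", "type": "string"},
       {"id": 2, "name": "b", "type": "string"}
     ]}
  ]
}
"""

def generate_ruby(ast):
    """Render each struct in the AST as a plain Ruby class with accessors."""
    lines = []
    for struct in ast["structs"]:
        names = [field["name"] for field in struct["fields"]]
        lines.append("class " + struct["name"])
        lines.append("  attr_accessor " + ", ".join(":" + n for n in names))
        lines.append("end")
    return "\n".join(lines)

# In the stdin/stdout setup this would be json.load(sys.stdin) instead.
print(generate_ruby(json.loads(EXAMPLE_AST)))
```

The generator never sees the IDL grammar; it only walks a parsed data structure, which is what would let each language community maintain its own generator.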
On Aug 25, 2008, at 5:11 PM, Chad Walters wrote:
Some quick thoughts:
1. Somewhat historical - Facebook's language of choice for backend
stuff was C++ and they were not using Java very much (although
their usage seems to have expanded somewhat, what with their use of
Hadoop and Zookeeper and their development of Cassandra).
2. That would be great. However, the current belief is that there
is a lot of special-casing for the specifics of each target
language and that it's not clear how much commonality could be
found to help here.
3. The current seqid mechanism guarantees uniqueness and also
allows the seqids to be small, which is better for the
DenseProtocol and other compact protocols.
4. Yep, sounds like a PITA. Does it buy that much? Can it be
supported across all the languages we are trying to support?
5. They are available for use by protocols if desired but the seqid
is really the important piece of data -- the names are not actually
used in the binary protocol or other compact protocols.
6 and 7. I'll let someone else speak to these issues.
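
To illustrate point 3, here is a rough sketch comparing small hand-assigned ids with ids derived from a hash of the field name. It assumes a base-128 varint encoding of the id, which is only a stand-in for how a compact protocol might write it, and it uses CRC32 as an arbitrary example hash; neither is taken from the Thrift source.

```python
import zlib

def varint_len(n):
    """Number of bytes needed to encode a non-negative int as a base-128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

def hashed_id(name):
    """Hypothetical scheme: derive a field id from a 32-bit hash of the name."""
    return zlib.crc32(name.encode("utf-8"))

for assigned, name in enumerate(["a", "b"], start=1):
    print(name,
          "assigned id bytes:", varint_len(assigned),
          "hashed id bytes:", varint_len(hashed_id(name)))
```

Hand-assigned ids stay in the one-byte range for the first 127 fields, while a 32-bit hash can need up to five bytes and, unlike assigned ids, is not guaranteed to be unique across field names.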
WRT 1 and 2, I would actually love to see some mechanism to allow
for the compiler to be abstracted to the point where we could
implement it in a broad choice of languages (C++, Java, Ruby, etc.)
and still produce the same target language bindings. This would
free non-C++ shops from needing the C++ tool chain. Sounds like a
pretty interesting and extensive project in and of itself -- if you
can figure out how to make this happen, more power to you.
Chad
On 8/25/08 4:36 PM, "Torsten Curdt" <[EMAIL PROTECTED]> wrote:
Hey guys,
I've looked into Thrift recently and a few questions came up:
1. Why a native compiler? Wouldn't it be a little bit simpler to have the
compiler/code generator written in Java? No language debate - just a
curious question about the reason :)
2. Wouldn't it make sense to have somewhat better separation than
having all the code mixed up in the t_*_generator.cc files? Maybe more of a
template approach, so adjusting the code that gets generated becomes a
little bit easier?
3. Why not use the hash code of the attribute names as the sequence
id?
4. Why only composition? Even a flattening model of multiple
inheritance should be quite easy to implement (if overloading is
forbidden). While in OOP I am a big fan of composition over inheritance,
it makes the generated API kind of ugly. Maybe an include mechanism
would be another way of simplifying composed structures. (Although I
do realize that with the current model of sequence ids that might be a
PITA to maintain)
5. If I noticed correctly the names of the attributes are included
when serialized. Why is that? Shouldn't knowing the sequence id be
good enough?
6. How do you guys suggest dealing with deterministic semantic
changes? Let's say you have
struct test {
  required string a;
  required string b;
}
and then you want to combine those values into one attribute
struct test {
  required string ab; // = a + b
}
There are a couple of problems I see here. For one, ab will have to
have a different sequence id. And I guess then the 'required' will
become a problem for the sequence ids of a and b (?). And finally, the
conversion ab = a + b needs to be handled at the application level,
while the rule is very straightforward and deterministic and *could* be
expressed in a more generic manner.
7. Wouldn't it make sense to separate out the service and exception
stuff from the actual message versioning/serialization code?
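
For question 6, the application-level conversion could be sketched roughly like this. Plain dicts stand in for deserialized structs, and upgrade is a hypothetical helper, not anything Thrift generates.

```python
def upgrade(record):
    """Return the record in the new shape, deriving ab = a + b when needed."""
    if "ab" in record:
        # Already written by a new-style producer.
        return {"ab": record["ab"]}
    # Old-style record: apply the deterministic rule from the example.
    return {"ab": record["a"] + record["b"]}

print(upgrade({"a": "foo", "b": "bar"}))
print(upgrade({"ab": "foobar"}))
```

Every reader of the data has to carry a rule like this by hand, which is the duplication the question is pointing at.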
cheers
--
Torsten