RE: why...

Mark Slee Mon, 25 Aug 2008 17:09:39 -0700

1. This decision was made because lex/yacc, despite being C, is still
one of the most common lexing/parsing toolkits around. It's also the
easiest to install on a *nix system (almost every Linux distro has all
the libs installed off the shelf). The standard Java release doesn't
include lexing/parsing tools, so a Java compiler would already require
3rd party libs, which is enough to turn some people off.


2. Yes, some of the generator stuff has gotten a bit unwieldy over time
as we've added more features. I wouldn't mind a templating system
either, but this is relatively low-leverage work given that the end
result is the same, just with cleaner code. Generally, I give much
higher priority to the quality of the language runtime libraries than to
the quality of the code generator internals. Most Thrift users should
never need to touch the code generator. We should work on improving it
to the extent that it'll help us continue to develop faster in the
future.
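For comparison, the template approach from the question can be sketched in a few lines. This is a hypothetical illustration, assuming Python's standard `string.Template` and a made-up `generate_struct` helper that emits a Python class. It is not how the Thrift compiler is structured, but it shows the appeal: the shape of the output lives in one place, separate from the logic that fills it in.

```python
from string import Template

# Hypothetical template for a generated Python struct class. Unlike the
# t_*_generator.cc files, the output layout is kept apart from the logic.
STRUCT_TEMPLATE = Template(
    "class $name:\n"
    "    def __init__(self, $args):\n"
    "$body"
)


def generate_struct(name: str, fields: list) -> str:
    """Render a struct class from (field_id, field_name) pairs."""
    args = ", ".join(f"{fname}=None" for _, fname in fields)
    body = "".join(
        f"        self.{fname} = {fname}  # field id {fid}\n"
        for fid, fname in fields
    ) or "        pass\n"  # keep the class valid when there are no fields
    return STRUCT_TEMPLATE.substitute(name=name, args=args, body=body)


if __name__ == "__main__":
    print(generate_struct("Test", [(1, "a"), (2, "b")]))
```

Changing the generated code then means editing `STRUCT_TEMPLATE` rather than the generator logic. As Mark notes, though, the end result for users is the same.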

3. What if you decide you gave a variable a stupid name and want to
change it, but you've already deployed production code? Separating names
in code from transport makes this painless, and saves a lot of
frustration and confusing legacy naming. Also, to make a hash-based
system provably correct we'd need a conflict resolution scheme for
hash collisions, which would be quite complex.
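Both problems can be shown concretely with a sketch of the hash-based scheme the question proposes. Thrift does not work this way: `hash_field_id` and `find_collision` are hypothetical helpers, and reducing CRC-32 to the positive 16-bit range is an arbitrary choice of hash. Renaming a field changes its id. And because any number of names must share only 32,767 ids, collisions are guaranteed to exist by the pigeonhole principle.

```python
import zlib


def hash_field_id(name: str) -> int:
    """Hypothetical scheme: derive a positive 16-bit field id from a name.
    Thrift does NOT do this; field ids are assigned explicitly in the IDL."""
    return zlib.crc32(name.encode("utf-8")) % 32767 + 1  # ids in 1..32767


def find_collision(prefix: str = "field"):
    """Return two distinct names whose hashed ids collide.
    32768 names over 32767 possible ids must contain a collision."""
    seen = {}
    for i in range(32768):
        name = f"{prefix}{i}"
        fid = hash_field_id(name)
        if fid in seen:
            return seen[fid], name
        seen[fid] = name
    raise AssertionError("unreachable: pigeonhole guarantees a collision")


if __name__ == "__main__":
    # A rename almost certainly changes the wire id, so readers deployed
    # with the old name no longer recognize the field.
    print(hash_field_id("usr_name"), hash_field_id("user_name"))
    a, b = find_collision()
    print(f"{a!r} and {b!r} both hash to {hash_field_id(a)}")
```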

4. Inheritance can be problematic due to the use of unique field
identifiers. If developer A owns struct A and developer B subclasses it
with struct B, problems ensue. If B chooses to use field identifiers
that A later adds to struct A, downstream breakage happens. Composition
is free from these problems, albeit less convenient in some instances.
I'd definitely endorse development work on tools to make composition
easier.
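The failure mode can be modeled with plain dictionaries that map field ids to field names. This is a hypothetical model: `flatten` stands in for a flattening form of inheritance, which Thrift does not provide.

```python
def flatten(parent: dict, child: dict) -> dict:
    """Hypothetical inheritance by flattening: merge a parent's field ids
    into a child's. Fails if both sides claim the same field id."""
    clash = parent.keys() & child.keys()
    if clash:
        raise ValueError(f"field id conflict: {sorted(clash)}")
    return {**parent, **child}


if __name__ == "__main__":
    # Developer A owns struct A; developer B "subclasses" it.
    struct_a_v1 = {1: "name"}
    struct_b = {2: "email"}  # B picks id 2, unused in A today
    print(flatten(struct_a_v1, struct_b))  # {1: 'name', 2: 'email'}

    # Later A adds a field with id 2, without knowing about B.
    struct_a_v2 = {1: "name", 2: "age"}
    try:
        flatten(struct_a_v2, struct_b)
    except ValueError as err:
        print(err)  # field id conflict: [2]

    # Composition: B holds A as a single field, so A's ids live in their
    # own namespace and A can keep adding fields without touching B.
    struct_b_composed = {1: ("base", struct_a_v2), 2: "email"}
    print(struct_b_composed)
```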

5. Sending names is optional, for human-readable protocols that would
like to include them. The TProtocol abstraction supports sending names,
but if you look at the TBinaryProtocol implementation it actually doesn't
send the names over the wire. You're correct: the sequence ids are good enough.
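A hand-rolled sketch of how TBinaryProtocol lays out a string field shows why the name is not needed. The field header is a type byte plus a 16-bit field id, followed by a 32-bit length and the UTF-8 bytes, and a struct ends with a stop byte. The helpers below are illustrative only. They are not the Thrift Python library and handle only string fields.

```python
import struct

# Thrift type codes used by TBinaryProtocol.
T_STOP = 0
T_STRING = 11


def write_string_field(field_id: int, value: str) -> bytes:
    """Encode one string field: type byte, big-endian i16 field id,
    i32 length, then the UTF-8 bytes. No field name is written."""
    data = value.encode("utf-8")
    return struct.pack(">bhi", T_STRING, field_id, len(data)) + data


def write_struct(fields: dict) -> bytes:
    """Encode {field_id: str_value} as a struct body ending in T_STOP."""
    body = b"".join(
        write_string_field(fid, v) for fid, v in sorted(fields.items())
    )
    return body + struct.pack(">b", T_STOP)


if __name__ == "__main__":
    print(write_struct({1: "hello"}).hex())  # 0b00010000000568656c6c6f00
```

A reader only needs the id (1) to find the right field, so the 13 bytes above are all that travel over the wire.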

6. I would do this in 2 steps. First, move from "required a, required b"
to "optional a, optional b, optional ab." These changes can be rolled
out without any breakage. Then, you can switch your client side to
"required ab" and finally switch the server side to "required ab,"
dropping the individual fields. I cannot think of any way to do this in
only one switch without breakage.
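The two steps can be checked with a small model of required-field validation. The schemas, `accepts`, and the `read_ab` shim are hypothetical. The shim applies the a + b rule from the question at the application level, and real Thrift validation differs in detail.

```python
# Hypothetical schemas: field id -> (name, required?).
V0 = {1: ("a", True), 2: ("b", True)}
STEP1 = {1: ("a", False), 2: ("b", False), 3: ("ab", False)}
FINAL = {3: ("ab", True)}


def accepts(schema: dict, message: dict) -> bool:
    """A reader accepts a message if every required field is present.
    Unknown field ids are simply skipped, as Thrift readers do."""
    return all(
        fid in message for fid, (_, required) in schema.items() if required
    )


def read_ab(message: dict) -> str:
    """Application-level shim during the migration: prefer ab, else
    derive it from the legacy a and b fields."""
    if 3 in message:
        return message[3]
    return message[1] + message[2]


if __name__ == "__main__":
    old_client = {1: "foo", 2: "bar"}  # still writes a and b
    new_client = {3: "foobar"}         # writes only ab
    # Skipping step 1 breaks one side or the other.
    print(accepts(FINAL, old_client), accepts(V0, new_client))      # False False
    # With step 1 deployed everywhere, both kinds of client are accepted.
    print(accepts(STEP1, old_client), accepts(STEP1, new_client))   # True True
    print(read_ab(old_client) == read_ab(new_client))               # True
```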

7. I'm not sure exactly what you mean here. Which parts do you feel are
not separated? The versioning and encoding are all isolated in the
TProtocol abstraction, transfer lives exclusively in TTransport, and the
generated TProcessors deal only with actual message dispatching.
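The layering can be illustrated with toy classes named after Thrift's three layers. They are a sketch under simplifying assumptions (an in-memory transport, strings only, dispatch via `getattr`) and not the real Thrift API. The transport only moves bytes, the protocol only encodes, and the processor only dispatches.

```python
import io
import struct


class MemoryTransport:
    """Transport layer: moves raw bytes, knows nothing about encoding."""

    def __init__(self, data: bytes = b""):
        self._buf = io.BytesIO(data)

    def write(self, data: bytes) -> None:
        self._buf.write(data)

    def read(self, n: int) -> bytes:
        return self._buf.read(n)

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


class BinaryProtocol:
    """Protocol layer: encodes values onto whatever transport it wraps."""

    def __init__(self, transport):
        self.trans = transport

    def write_string(self, s: str) -> None:
        data = s.encode("utf-8")
        self.trans.write(struct.pack(">i", len(data)) + data)

    def read_string(self) -> str:
        (n,) = struct.unpack(">i", self.trans.read(4))
        return self.trans.read(n).decode("utf-8")


class Processor:
    """Processor layer: reads a method name and argument, calls the
    handler, writes the result. It talks only to protocols."""

    def __init__(self, handler):
        self.handler = handler

    def process(self, iprot, oprot) -> None:
        method = iprot.read_string()
        arg = iprot.read_string()
        oprot.write_string(getattr(self.handler, method)(arg))


class Handler:
    def echo(self, s: str) -> str:
        return s


if __name__ == "__main__":
    request = MemoryTransport()
    BinaryProtocol(request).write_string("echo")
    BinaryProtocol(request).write_string("hi")
    response = MemoryTransport()
    Processor(Handler()).process(
        BinaryProtocol(MemoryTransport(request.getvalue())),
        BinaryProtocol(response),
    )
    print(BinaryProtocol(MemoryTransport(response.getvalue())).read_string())  # hi
```

Swapping `MemoryTransport` for a socket, or `BinaryProtocol` for a text encoding, would not touch the other two layers.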

Cheers,
Mark

-----Original Message-----
From: Torsten Curdt [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 25, 2008 4:36 PM
To: [email protected]
Subject: why...

Hey guys,

I've looked into Thrift recently and a few questions came up:

1. Why a native compiler? Wouldn't it be a little bit simpler to have the
compiler/code generator written in Java? No language debate - just a
curious question about the reason :)

2. Wouldn't it make sense to have better separation than having all the
code mixed up in the t_*_generator.cc files? Maybe more of a template
approach, so adjusting the code that gets generated becomes a little
easier?

3. Why not use the hash code of the attribute names as the sequence id?

4. Why only composition? Even a flattening model of multiple inheritance
should be quite easy to implement (if overloading is forbidden). While
in OOP I am a big fan of composition over inheritance, it makes the
generated API kind of ugly. Maybe an include mechanism would be another
way of simplifying composed structures. (Although I do realize that with
the current model of sequence ids that might be a PITA to maintain.)

5. If I noticed correctly, the names of the attributes are included when
serialized. Why is that? Shouldn't knowing the sequence id be good
enough?

6. How do you guys suggest dealing with deterministic semantic
changes? Let's say you have

struct test {
   required string a;
   required string b;
}

and then you want to combine those values into one attribute

struct test {
   required string ab; // = a + b
}

There are a couple of problems I see here. For one, ab will have to have
a different sequence id. And I guess 'required' will then become a
problem for the sequence ids of a and b(?). Finally, the conversion
ab = a + b needs to be handled at the application level, even though the
rule is very straightforward and deterministic and *could* be expressed
in a more generic manner.

7. Wouldn't it make sense to separate out the service and exception
stuff from the actual message versioning/serialization code?

cheers
--
Torsten
