On Aug 26, 2008, at 02:08, Mark Slee wrote:
> 1. This decision was made because lex/yacc, despite being C, is still
> one of the most common lexing/parsing toolkits around. It's also the
> easiest to install on a *nix system (almost every linux distro has all
> the libs installed off the shelf). The standard Java release doesn't
> have lexing/parsing tools, so already Java would require 3rd party
> libs which is enough to turn some people off.

It's more Chad's point about this being historical that I can
understand :)
As for standard vs. 3rd party and being installed: with ANTLR you can
then bundle everything in one jar. At least if you expect Java to be
installed on the system, it can't be much easier than that ;)

> 2. Yes, some of the generator stuff has gotten a bit unwieldy over
> time as we've added more features. I wouldn't mind a templating system
> either, but this is relatively low-leverage work given that the end
> result is the same, just with cleaner code.

"just" ;)

> Generally, I give much higher priority to the quality of the language
> runtime libraries than to the quality of the code generator internals.
> Most Thrift users should never need to touch the code generator. We
> should work on improving it to the extent that it'll help us continue
> to develop faster in the future.

Well, true. Hopefully the code generation is minimal and the most
crucial stuff will be handled by the runtime. So I agree with that
perspective. On the other hand, I would imagine that implementing new
languages would be much easier with a clean templating approach.

> 3. What if you decide you gave a variable a stupid name and want to
> change it, but you've already deployed production code? Separating
> names in code from transport makes this painless, and saves a lot of
> frustration/confusing legacy naming issues.

Indeed - but using the hash does not make it any different. Thinking
about this:
    required string somename               -> sequence id = "somename".hashCode()
    required string somenamenew [somename] -> sequence id = "somename".hashCode()
You can easily change the name. All that really matters is its hash
code. For the API users it would make no difference, except that
maintaining the sequence ids would become an order of magnitude less
of a hassle.
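
To make the idea concrete, here is a minimal Java sketch. The class and
method names are made up for illustration and are not part of Thrift;
the optional alias argument stands in for the [somename] annotation
above. String.hashCode() yields a 32-bit int, so fitting it into
whatever id width the wire format uses is left out here.

```java
// Hypothetical sketch, not Thrift code: derive a field's wire id from a hash.
final class HashFieldIds {
    private HashFieldIds() {}

    /**
     * Returns the id for a field. If the field was renamed, wireAlias holds
     * the original name and keeps the id stable; pass null otherwise.
     */
    static int fieldId(String name, String wireAlias) {
        String key = (wireAlias != null) ? wireAlias : name;
        return key.hashCode();
    }

    public static void main(String[] args) {
        int before = fieldId("somename", null);
        int after = fieldId("somenamenew", "somename");
        System.out.println(before == after); // prints "true"
    }
}
```

The rename only changes what API users see; the id on the wire stays
tied to the original name.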

> Also, to make the hash system provably correct we'd have to have a
> conflict resolution system, which would be quite complex.

Indeed. You would have to check the hash codes for uniqueness.
But being pragmatic here you could
1. Just check and fail if the same hash code is used twice
2. Ask the user to rename one of the clashing fields or
3. Offer a generated unique id that could be used instead in such
(probably rare) cases
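
The "check and fail" option could look roughly like the Java sketch
below. FieldIdCheck and assignIds are hypothetical names, and the error
message just points the user at options 2 and 3. Clashes do happen with
String.hashCode(): "Aa" and "BB" both hash to 2112.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Thrift code: fail fast on clashing hash ids.
final class FieldIdCheck {
    private FieldIdCheck() {}

    /**
     * Maps each field name to its hash-based id and throws if two fields
     * end up with the same id (a repeated name counts as a clash too).
     */
    static Map<Integer, String> assignIds(List<String> fieldNames) {
        Map<Integer, String> byId = new HashMap<>();
        for (String name : fieldNames) {
            int id = name.hashCode();
            String previous = byId.putIfAbsent(id, name);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "fields '" + previous + "' and '" + name + "' share id " + id
                        + "; rename one or give it an explicit id");
            }
        }
        return byId;
    }
}
```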

> 4. Inheritance can be problematic due to the use of unique field
> identifiers. If developer A owns struct A and developer B subclasses
> it with struct B, problems ensue. If B chooses to use field
> identifiers that A later adds to struct A, downstream breakage
> happens.

Of course you have to check for uniqueness. If you flatten the structs
as a last step, that should be fairly straightforward to do. The pain
comes only from the way the sequence ids are currently maintained :)
..which is why I was thinking about the hash based approach.
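
A rough Java sketch of the flatten-then-check step, under the
assumption that a struct can name a single parent. StructDef is an
invented model for illustration only; nothing here is Thrift API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of struct inheritance; Thrift itself is not involved.
final class StructDef {
    final String name;
    final StructDef parent;            // null when there is no base struct
    final Map<String, Integer> fields; // field name -> field id

    StructDef(String name, StructDef parent, Map<String, Integer> fields) {
        this.name = name;
        this.parent = parent;
        this.fields = fields;
    }

    /** Flattens the inheritance chain into id -> "Struct.field", base first. */
    Map<Integer, String> flatten() {
        Map<Integer, String> flat = new LinkedHashMap<>();
        if (parent != null) {
            flat.putAll(parent.flatten());
        }
        for (Map.Entry<String, Integer> e : fields.entrySet()) {
            String qualified = name + "." + e.getKey();
            String owner = flat.putIfAbsent(e.getValue(), qualified);
            if (owner != null) {
                throw new IllegalStateException(
                    "id " + e.getValue() + " used by both " + owner + " and " + qualified);
            }
        }
        return flat;
    }
}
```

If A later adds a field whose id B already uses, flattening B fails at
generation time instead of breaking downstream.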

> 5. This is optional, for readable protocols which would like to
> include them. The TProtocol abstraction supports sending names, but if
> you look at the TBinaryProtocol implementation it actually doesn't
> send the names over the wire. You're correct, the sequence ids are
> good enough.

Ah ...OK. Sorry - missed that :) Thanks for clarification.

> 6. I would do this in 2 steps. First, move from "required a,
> required b" to "optional a, optional b, optional ab." These changes
> can be rolled out without any breakage. Then, you can switch your
> client side to "required ab" and finally switch the server side to
> "required ab," dropping the individual fields. I cannot think of any
> way to do this in only one switch without breakage.

Now you focused more on the optional/required and so on. Indeed all
correct. But my focus was more on the fact that ab can be derived from
a and b. That means that even old structs implicitly have ab. So when
you make the switch you either have to support this logic in (every)
client, or you switch over to rely only on ab and can no longer read
older structs.
See what I mean now?
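
In Java, the logic every client would need to carry looks something
like the sketch below. AbReader and resolveAb are hypothetical names,
unset fields are modelled as null, and string concatenation is only a
placeholder for whatever the real derivation of ab from a and b is.

```java
// Hypothetical reader-side fallback; field names follow the example above.
final class AbReader {
    private AbReader() {}

    /**
     * Returns ab, preferring the explicit field and falling back to deriving
     * it from a and b for structs written before ab existed. Unset fields
     * are passed as null. Concatenation is only a placeholder derivation.
     */
    static String resolveAb(String a, String b, String ab) {
        if (ab != null) {
            return ab;
        }
        if (a != null && b != null) {
            return a + b;
        }
        throw new IllegalArgumentException("neither ab nor both a and b are set");
    }
}
```

Dropping the fallback branch is exactly the point where older structs
stop being readable.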

> 7. I'm not sure exactly what you mean here. Which parts do you feel
> are not separated? The versioning and encoding is all isolated into
> the TProtocol abstraction, transfer lives exclusively in TTransport,
> and the generated TProcessors deal only with actual message
> dispatching.

Well, if you only use Thrift for serialization and versioning you
might not always have a need for the service stub generation. While
this isn't really a big problem, I am wondering if these aren't two
separate things.

cheers
--
Torsten