Re: why...

Torsten Curdt Tue, 26 Aug 2008 01:10:33 -0700


On Aug 26, 2008, at 02:08, Mark Slee wrote:

1. This decision was made because lex/yacc, despite being C, is still
one of the most common lexing/parsing toolkits around. It's also the
easiest to install on a *nix system (almost every linux distro has all
the libs installed off the shelf). The standard Java release doesn't
have lexing/parsing tools, so already Java would require 3rd party libs
which is enough to turn some people off.

Chad's point that this is historical is the one I can understand much
better :) As for standard vs. 3rd party and being installed: with antlr
you can then bundle everything in one jar. At least if you expect java
to be installed on the system, it can't be much easier than that ;)

2. Yes, some of the generator stuff has gotten a bit unwieldy over time
as we've added more features. I wouldn't mind a templating system
either, but this is relatively low-leverage work given that the end
result is the same, just with cleaner code.


"just" ;)

Generally, I give much higher priority to the quality of the language
runtime libraries than to the quality of the code generator internals.
Most Thrift users should never need to touch the code generator. We
should work on improving it to the extent that it'll help us continue
to develop faster in the future.

Well, true. Hopefully the code generation is minimal and the most
crucial stuff will be handled by the runtime. So I agree with that
perspective. On the other hand, I would imagine that implementing new
languages would be much easier with a clean templating approach.

3. What if you decide you gave a variable a stupid name and want to
change it, but you've already deployed production code? Separating names
in code from transport makes this painless, and saves a lot of
frustration/confusing legacy naming issues.

Indeed - but using the hash does not make it any different. Thinking
about this:

  required string somename               -> sequence id = "somename".hashCode()
  required string somenamenew [somename] -> sequence id = "somename".hashCode()

You can easily change the name. All that really matters is its hash
code. For the API users it would make no difference - except that
maintaining the sequence id would become an order of magnitude less of
a hassle.
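A minimal sketch of the idea in Java, assuming a hypothetical
HashedField class that stands in for a field declaration. The
"[somename]" alias from the lines above is modelled as originalName;
none of this is existing Thrift syntax or API. The sketch uses the full
32-bit hashCode() and ignores how such an id would fit the width of the
field id on the wire.

```java
/** Hypothetical field declaration: a current name plus an optional original name. */
final class HashedField {
    final String name;          // name used in the generated code today
    final String originalName;  // name the id was first derived from, or null if never renamed

    HashedField(String name, String originalName) {
        this.name = name;
        this.originalName = originalName;
    }

    /** The id depends only on the original name, so a rename does not change it. */
    int sequenceId() {
        String idSource = (originalName != null) ? originalName : name;
        return idSource.hashCode();
    }
}
```

With this, new HashedField("somename", null) and
new HashedField("somenamenew", "somename") report the same sequenceId(),
which is the property the rename relies on.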

Also, to make the hash system provably correct we'd have to have a
conflict resolution system, which would be quite complex.


Indeed. You would have to check the hash codes for uniqueness.
But being pragmatic here you could

1. Just check and fail if the same hash code is used twice
2. Ask the user to rename one of the clashing fields or
3. Offer a generated unique id that could be used instead in such
   (probably rare) cases
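The "check and fail" option could look roughly like the following Java
sketch. HashIdCheck and assignIds are hypothetical names made up for
illustration, not part of Thrift; the error message simply asks for a
rename, which corresponds to option 2.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashIdCheck {
    /** Returns name -> id, or throws if two different names hash to the same id. */
    static Map<String, Integer> assignIds(List<String> fieldNames) {
        Map<Integer, String> ownerOfId = new HashMap<>();
        Map<String, Integer> ids = new HashMap<>();
        for (String name : fieldNames) {
            int id = name.hashCode();
            // putIfAbsent returns the name already registered for this id, if any
            String previous = ownerOfId.putIfAbsent(id, name);
            if (previous != null && !previous.equals(name)) {
                throw new IllegalStateException(
                    "fields '" + previous + "' and '" + name
                    + "' share hash id " + id + "; please rename one of them");
            }
            ids.put(name, id);
        }
        return ids;
    }
}
```

Real clashes do exist for String.hashCode(): "Aa" and "BB" both hash to
2112, so a struct containing both would be rejected by this check.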

4. Inheritance can be problematic due to the use of unique field
identifiers. If developer A owns struct A and developer B subclasses it
with struct B, problems ensue. If B chooses to use field identifiers
that A later adds to struct A, downstream breakage happens.

Of course you have to check for uniqueness. If you flatten the structs
as a last step, that should be fairly straightforward to do. The pain
comes only from the way the sequence ids are currently maintained :)
..which is why I was thinking about the hash-based approach.

5. This is optional, for readable protocols which would like to include
them. The TProtocol abstraction supports sending names, but if you look
at the TBinaryProtocol implementation it actually doesn't send the names
over the wire. You're correct, the sequence ids are good enough.


Ah ...OK. Sorry - missed that :) Thanks for clarification.

6. I would do this in 2 steps. First, move from "required a, required b"
to "optional a, optional b, optional ab." These changes can be rolled
out without any breakage. Then, you can switch your client side to
"required ab" and finally switch the server side to "required ab,"
dropping the individual fields. I cannot think of any way to do this in
only one switch without breakage.


Now you focused more on the optional/required and so on. Indeed all
correct. But my focus was more on the fact that ab can be derived from
a and b. That means that even old structs implicitly have ab. So when
you make the switch you either have to support this logic in (every)
client, or you switch over to rely only on ab and can no longer read
older structs.


See what I mean now?
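To make the concern concrete, here is a rough Java sketch of the logic
every reader would have to carry if old structs must stay readable.
AbReader, Decoded and derive are hypothetical names, and plain string
concatenation is only an assumed stand-in for however ab is really
derived from a and b.

```java
final class AbReader {
    /** Hypothetical decoded struct; a field that was not on the wire is null. */
    static final class Decoded {
        final String a;
        final String b;
        final String ab;

        Decoded(String a, String b, String ab) {
            this.a = a;
            this.b = b;
            this.ab = ab;
        }
    }

    /** Assumed derivation rule, for illustration only. */
    static String derive(String a, String b) {
        return a + b;
    }

    /** Prefers an explicit ab, and falls back to deriving it for old structs. */
    static String readAb(Decoded d) {
        if (d.ab != null) {
            return d.ab;                 // new-style struct
        }
        if (d.a != null && d.b != null) {
            return derive(d.a, d.b);     // old-style struct: ab is only implicit
        }
        throw new IllegalArgumentException("struct carries neither ab nor both a and b");
    }
}
```

Dropping the fallback branch is the "rely only on ab" alternative, at
which point old structs can no longer be read.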

7. I'm not sure exactly what you mean here. Which parts do you feel are
not separated? The versioning and encoding is all isolated into the
TProtocol abstraction, transfer lives exclusively in TTransport, and the
generated TProcessors deal only with actual message dispatching.

Well, if you only use Thrift for serialization and versioning you might
not always have a need for the service stub generation. While this isn't
really a big problem, I am wondering if these aren't two separate things.

cheers
--
Torsten
