On Thu, May 5, 2011 at 9:36 PM, Mark Slee <ms...@fb.com> wrote:
> Ironically, using C++ for the compiler was originally a decision motivated 
> entirely by
> portability, as at the time (2006) it was not standard for many typical 
> distributions to
> have Java pre-installed, there was not really a "standard" Java 
> lexing/parsing library,
> and ditto for Python. But pretty much everyone has lex/yacc, and boost was
> intentionally not a dependency of the original compiler.

I can understand your motivations but I disagree with the conclusions.   :-)

I would probably have made the same choices at some point.

in general, for typical utilities, having someone else do all the work
of porting a runtime is a lot easier than writing C code (or C++) that
needs to be ported to lots of platforms.  and for a utility where the
point _is_ to work on multiple platforms, that is a big win.

besides, the thrift compiler isn't performance-sensitive.

as for "no standard Lexer/Parser in <language X>", this is not really
relevant.  lex/yacc is "to be expected in some form" on a subset of
unixen, so I would hesitate to use the word "standard".  everywhere
else it is something you have to install -- or upgrade, since if it is
on an old, or non-open-source, OS it is likely to be some antiquated
relic.

besides, the grammar for Thrift isn't that complicated (and for such a
domain specific language, a restrictive grammar would have been okay).
 you would have managed to hand-roll a parser in ANSI C and I am
certain you would have done a lot better than Rasmus Lerdorf did when
he wrote the original PHP/FI parser (this is not crapping on Rasmus.
I fully respect Rasmus, but he will tell you the same thing.  in fact
he told me I was fucking crazy during a dinner when I told him we had
written a 20-30kLOC system in PHP/FI at the time).  that thing was
certainly ... untouched by ... clues.  but it worked and it set off an
explosion.  (and eventually Zeev and Andi came in and replaced it with
a ... lex/yacc based parser that was much better).  but PHP is a bit
beside the point, since it is not specifically a tool for dealing with
other languages and (more importantly) a gazillion build systems.

and if you can do it in ANSI C you can do it in any other language you
know as well.
(hey, I wrote a fairly large SGML parser in the 1990s in Perl, and if
you have ever read Goldfarb's SGML handbook you will have noted that
SGML wasn't exactly designed with someone actually writing parsers in
mind :-).
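to make the hand-rolling argument concrete, here is a minimal
recursive-descent parser for an invented toy subset of the Thrift IDL
(struct declarations with numbered, typed fields) -- sketched in Java
rather than ANSI C, and nowhere near the real grammar, but it shows
how little machinery such a parser actually needs:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled recursive-descent parser for a tiny, made-up subset of the
// Thrift IDL:  struct ::= 'struct' IDENT '{' field* '}'
//              field  ::= INT ':' IDENT IDENT (',' | ';')?
// This is an illustration of the argument above, not the real grammar.
public class MiniThriftParser {
    public static final class Field {
        public final int id; public final String type; public final String name;
        Field(int id, String type, String name) {
            this.id = id; this.type = type; this.name = name;
        }
    }
    public static final class Struct {
        public final String name; public final List<Field> fields;
        Struct(String name, List<Field> fields) {
            this.name = name; this.fields = fields;
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private MiniThriftParser(String input) {
        // crude tokenizer: pad punctuation with spaces, split on whitespace
        this.tokens = input.replaceAll("([{}:,;])", " $1 ").trim().split("\\s+");
    }

    private String next() { return tokens[pos++]; }

    private void expect(String t) {
        String got = next();
        if (!got.equals(t))
            throw new IllegalArgumentException("expected " + t + ", got " + got);
    }

    public static Struct parse(String input) {
        MiniThriftParser p = new MiniThriftParser(input);
        p.expect("struct");
        String name = p.next();
        p.expect("{");
        List<Field> fields = new ArrayList<>();
        while (!p.tokens[p.pos].equals("}")) {
            int id = Integer.parseInt(p.next());
            p.expect(":");
            String type = p.next();
            String fname = p.next();
            // optional field separator
            if (p.tokens[p.pos].equals(",") || p.tokens[p.pos].equals(";")) p.pos++;
            fields.add(new Field(id, type, fname));
        }
        return new Struct(name, fields);
    }

    public static void main(String[] args) {
        Struct s = parse("struct User { 1: i64 id, 2: string name }");
        System.out.println(s.name + " has " + s.fields.size() + " fields");
    }
}
```

scaling this up to the full grammar is more typing, but it is the same
technique all the way down -- which is the point.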

as for tools and libraries for Java (and C, C# and Python), there is
certainly Antlr, and I have used JavaCC for certain domain specific
languages.  I don't know how mature the Python support in Antlr is,
but Antlr does support Python.

>>> as for criticisms of Thrift I'd rather hold off on those until I have 
>>> enough spare time to actually contribute solutions.
>
> Non-malicious criticisms are always welcome! You don't have to be able or 
> willing to provide solutions yourself to raise issues. Others may be unaware 
> of the issues and able to sort them out, or at the very least you will 
> probably get an explanation of why something is the way it is.

I've touched on this discussion (writing the compiler in a different
way) before and I backed off because I was afraid that I wouldn't be
able to express why I disagree without sounding like a dick.

anyway, the two things I would do would be to:

1) re-do the parser using Antlr and do the compiler in Java or Python.
 a Python runtime is more common and probably the most portable
alternative, but I think a Java compiler would be more useful.
especially since the pain is felt very strongly in, for instance,
Mavenized Java projects that have to deal with multiple platforms (we
use OSX, Linux, Windows and FreeBSD).  (build systems for C/C++
environments tend to be less....well, braindead, than for Java, so the
pain is less apparent there)

(if the code generation was done using some sort of templating system
that is already portable, it might be feasible to offer Thrift
compilers in more languages without much extra effort, since adding a
new language would mean expressing the code to be generated just once
while being able to process it in multiple languages)
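a toy illustration of that idea (the template text and all the names
here are invented for the sketch -- the real Thrift compiler generates
code directly from C++): the generated code becomes data with
placeholders, so any host language with string substitution could
process the same templates:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy template-driven code generation: the shape of the generated code
// lives in a template with {{placeholders}}, so the same template could be
// rendered by a compiler written in Java, Python, or anything else.
public class TemplateCodegen {
    // replace every {{key}} in the template with its value
    public static String render(String template, Map<String, String> vars) {
        String out = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            out = out.replace("{{" + e.getKey() + "}}", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String template =
            "public class {{name}} {\n" +
            "    public {{type}} {{field}};\n" +
            "}\n";
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("name", "User");
        vars.put("type", "long");
        vars.put("field", "id");
        System.out.print(render(template, vars));
    }
}
```

a real system would need loops and conditionals in the templates, but
the principle -- one definition of the output, many possible hosts --
is the same.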

someone on my team actually did write an almost complete Thrift
grammar for Antlr and I have been pestering him to release it,  but he
wanted to finish it first.  (we have an internal serialization
protocol for a domain specific use and we needed to automate
conversion between thrift structures and this protocol.  so we used
thrift as the common language for specifying data structures and
generated code for both formats and for the conversion between them).
we're in a race towards a deadline (again), but if anyone is
interested in the grammar I can probably squeeze it out of him :-)

2) I'd replace the networking code for Java with Netty and I would
look at doing similar things in other languages.

Like you I hate unnecessary dependencies.  Years ago (2003?) I wrote
my own NIO networking library for a piece of software that had
insanely stringent latency requirements, but if I were doing the same
project today my first instinct would have been to use Netty rather
than write my own.  Not because Netty is faster (it may or may not
be), but because it gives you a lot of stuff for free.  For instance
we would not have had the issue of servers being easy to crash by
feeding them random data in earlier versions.  There is plenty of code
for doing packet framing with Netty that works well.  And adding SSL,
well, that is pretty much just a couple of lines of extra code.
Adding compression?  ditto.
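for illustration, here is roughly what that framing protection looks
like, sketched in plain Java without Netty (the constant and the names
are mine; in Netty itself this is approximately what a
LengthFieldBasedFrameDecoder configured with a maxFrameLength gives
you for free):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Length-prefixed framing with a sanity check on the declared length.
// Without the check, a random or hostile 4-byte prefix can make the server
// try to allocate a gigantic buffer and fall over with an OutOfMemoryError.
public class FrameReader {
    static final int MAX_FRAME = 16 * 1024 * 1024; // 16 MB, an arbitrary cap

    // read one [int32 length][payload] frame from the stream
    public static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_FRAME) {
            // reject instead of blindly allocating `length` bytes
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // a well-formed 3-byte frame: length prefix 3, then "abc"
        byte[] wire = {0, 0, 0, 3, 'a', 'b', 'c'};
        byte[] frame = readFrame(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(new String(frame)); // prints "abc"

        // random data claiming a ~2 GB frame is rejected, not allocated
        byte[] junk = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(junk)));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

trivial code, but exactly the kind of thing that gets forgotten when
everyone writes their own framing layer -- and exactly what a
framework makes it hard to get wrong.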

I hate most Java frameworks but Netty would be a worthwhile dependency
since it is actually a quality piece of software.  It is the sort of
quality software that inspires people to do things in more sensible ways.
 If you haven't looked at it, do a simple project in it.  You'll be
pleased.   Besides, in Java, lots of dependencies are the norm.
Depending on Netty is less of a hassle than depending on ... well even
just a compiler to begin with on OSX.

Netty would also make it a bit simpler to add neat things like a FORM
interface to the RPC system (you might have seen this in a previous
job ;-).  doing an HTTP interface when you have Netty already in place
is not a lot of work at all.


I can also offer some perspective on why we chose Thrift early on and
why we are now gradually drifting away from it and reverting to
REST/JSON APIs.

about half of my current team came from Google.  since you have worked
there you have used Google's RPC layer (is the name officially known
outside Google?).  the good thing about the RPC layer in Google is
that it works and you spend almost no time dealing with it.  we
genuinely hated the complexity of SOAP, we found Java RPC to be too
limiting, and even REST/JSON requires some painful level of engagement
in Java (which is a typical disease of most Java frameworks that
implement anything "hip").

so we chose Thrift hoping that we would be able to make it work
painlessly for us.  but it didn't.  because it was a hassle to use on
anything but Linux and it even behaved a bit differently on different
macs in the office (I have forgotten the root cause).

we couldn't get other people to adopt it.  (the main challenge was
when we had some Windows "customers" and parts of the system were
using a newer Thrift version to get around the OOM-problem in the
framing code.  we'd have to assist them in getting the thrift compiler
built on Windows and then make it work with Maven.  with none of the
developers in my office having a Windows machine.  and of course, you
couldn't get pre-built windows binaries for arbitrary SVN revisions
from anywhere)

and to be honest, we spent a fair amount of work ensuring that all the
code across all projects, in different states of development and built
with different versions of the thrift compiler, would work on every
platform we used.  it wasn't the painless development we were used to.

so gradually we ditched Thrift for all outward facing APIs.  and about
3-4 months ago we started ditching it for most new internal APIs as
well.  in favor of JSON/REST.  which is somewhat less efficient, but
well within the acceptable range for the things we use it for.
(besides, it is cheaper to add a few servers than it is to spend a lot
of time streamlining Thrift usage for people scattered across the
globe inside the company --  not to mention outside).

(and adding to all these problems was the fact that since Thrift
wasn't released into Maven central at that time, projects like
Cassandra were trailing behind current versions.  this meant that
anything that needed to talk to Cassandra would either have to use the
same version of Thrift as Cassandra or we would have to go the long
and painful route of having to OSGi'ify things to be able to support
multiple Thrift generations in the same JVM.  Trust me, that can be a
real pain in the ass :-).

I'm sure a lot of these problems have been fixed, but the awkwardness
of using Thrift in a large, heterogeneous development environment sort
of killed it for us.  And we really wanted to use it.  (I have
probably forgotten a truckload of the problems we had initially by
now. A lot has happened with our projects in the last year).

so when I say that I'd rather have the Thrift compiler in Java or
Python it isn't because I'm a Java or Python fanboy.  in fact, I don't
even use Python.  it is because I want to reduce complexity.  I also
think that solving a lot of problems for Java is more important.
People really need more things that actually *work* in Java without
dragging in more complexity.

but as I said, if I had the time I'd much rather contribute code than
what could be construed as whiny, long-winded criticisms.

-Bjørn
