On Thu, May 5, 2011 at 9:36 PM, Mark Slee <ms...@fb.com> wrote:
> Ironically, using C++ for the compiler was originally a decision motivated
> entirely by portability, as at the time (2006) it was not standard for many
> typical distributions to have Java pre-installed, there was not really a
> "standard" Java lexing/parsing library, and ditto for Python. But pretty
> much everyone has lex/yacc, and boost was intentionally not a dependency
> of the original compiler.
I can understand your motivations but I disagree with the conclusions. :-) I would probably have made the same choices at some point. in general, for typical utilities, having someone else do all the work of porting a runtime is a lot easier than writing C (or C++) code that needs to be ported to lots of platforms. and for a utility where the point _is_ to work on multiple platforms, that is a big win. besides, the thrift compiler isn't performance sensitive.

as for "no standard lexer/parser in <language X>", this is not really relevant. lex/yacc is "to be expected in some form" on a subset of unixen, so I would hesitate to use the word "standard". everywhere else it is something you have to install -- or upgrade, since if it is on an old, or non open source, OS it is likely to be some antiquated relic.

besides, the grammar for Thrift isn't that complicated (and for such a domain-specific language, a restrictive grammar would have been okay). you would have managed to hand-roll a parser in ANSI C, and I am certain you would have done a lot better than Rasmus Lerdorf did when he wrote the original PHP/FI parser. (this is not crapping on Rasmus. I fully respect Rasmus, but he will tell you the same thing. in fact he told me I was fucking crazy during a dinner when I told him we had written a 20-30kLOC system in PHP/FI at the time). that thing was certainly ... untouched by ... clues. but it worked and it set off an explosion. (and eventually Zeev and Andi came in and replaced it with a ... lex/yacc based parser that was much better). but PHP is a bit beside the point, since it is not specifically a tool for dealing with other languages and (more importantly) a gazillion build systems. and if you can do it in ANSI C you can do it in any other language you know as well.
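just to show how small "hand-roll a parser" really is for a Thrift-like struct declaration, here's a minimal recursive-descent sketch (in Python purely for brevity -- the structure translates line for line to ANSI C; field options, containers, includes etc. are deliberately omitted):

```python
import re

# split source into numbers, identifiers and single punctuation characters
TOKEN = re.compile(r'\s*(?:(\d+)|(\w+)|(.))')

def tokenize(src):
    return [num or word or punct for num, word, punct in TOKEN.findall(src)]

def parse_struct(tokens):
    """Parse: 'struct' IDENT '{' field* '}' where field is NUM ':' TYPE IDENT [',']"""
    pos = 0
    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}, got {tokens[pos]!r}")
        pos += 1
    expect('struct')
    name = tokens[pos]; pos += 1
    expect('{')
    fields = []
    while tokens[pos] != '}':
        tag = int(tokens[pos]); pos += 1   # field id
        expect(':')
        ftype = tokens[pos]; pos += 1      # type name
        fname = tokens[pos]; pos += 1      # field name
        if tokens[pos] == ',':             # optional separator
            pos += 1
        fields.append((tag, ftype, fname))
    return name, fields

name, fields = parse_struct(tokenize("struct User { 1: i64 id, 2: string name }"))
```

a restrictive grammar like this one is exactly the kind of thing that stays maintainable without any parser generator at all.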
(hey, I wrote a fairly large SGML parser in the 1990s in Perl, and if you have ever read Goldfarb's SGML handbook you will have noted that SGML wasn't exactly designed with someone actually writing parsers in mind :-)

as for tools and libraries for Java (and C, C# and Python), there's certainly Antlr, and I have used JavaCC for certain domain-specific languages. I don't know how old the Python support in Antlr is, but Antlr does support Python.

>>> as for criticisms of Thrift I'd rather hold off on those until I have
>>> enough spare time to actually contribute solutions.
>
> Non-malicious criticisms are always welcome! You don't have to be able or
> willing to provide solutions yourself to raise issues. Others may be unaware
> of the issues and able to sort them out, or at the very least you will
> probably get an explanation of why something is the way it is.

I've touched on this discussion (writing the compiler in a different way) before, and I backed off because I was afraid that I wouldn't be able to express why I disagree without sounding like a dick. anyway, the two things I would do would be to:

1) re-do the parser using Antlr and do the compiler in Java or Python. a Python runtime is more common and probably the most portable alternative, but I think a Java compiler would be more useful, especially since the pain is felt very strongly in, for instance, Mavenized Java projects that have to deal with multiple platforms (we use OSX, Linux, Windows and FreeBSD).
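to make (1) a bit more concrete: the core of the Thrift IDL is small enough that a grammar for it fits on a page. an illustrative fragment for struct declarations (my own sketch in ANTLR 4 notation -- untested, deliberately incomplete, and not anyone's official grammar) could look roughly like:

```antlr
// illustrative fragment only -- containers, consts, services etc. omitted
document   : definition* EOF ;
definition : 'struct' IDENTIFIER '{' field* '}' ;
field      : INTEGER ':' fieldType IDENTIFIER (','|';')? ;
fieldType  : 'bool' | 'byte' | 'i16' | 'i32' | 'i64' | 'double'
           | 'string' | 'binary' | IDENTIFIER ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_.]* ;
INTEGER    : [0-9]+ ;
WS         : [ \t\r\n]+ -> skip ;
```

from a grammar like this, Antlr can generate parsers for several target languages from the single definition, which is the whole point.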
(build systems for C/C++ environments tend to be less ... well, braindead, than for Java, so the pain is less apparent there)

(if the code generation was done using some sort of templating system that is already portable, it might be feasible to offer Thrift compilers in more languages without much extra effort, since adding a new language would mean expressing the code to be generated just once, but being able to process it in multiple languages)

someone on my team actually did write an almost complete Thrift grammar for Antlr, and I have been pestering him to release it, but he wanted to finish it first. (we have an internal serialization protocol for a domain-specific use and we needed to automate conversion between thrift structures and this protocol. so we used thrift as the common language for specifying data structures and generated code for both formats and for the conversion between them). we're in a race towards a deadline (again), but if anyone is interested in the grammar I can probably squeeze it out of him :-)

2) I'd replace the networking code for Java with Netty, and I would look at doing similar things in other languages. Like you I hate unnecessary dependencies. Years ago (2003?) I wrote my own NIO networking library for a piece of software that had insanely stringent latency requirements, but if I were doing the same project today my first instinct would be to use Netty rather than write my own. Not because Netty is faster (it may or may not be), but because it gives you a lot of stuff for free. For instance, we would not have had the issue of servers being easy to crash by feeding them random data, as in earlier versions. There is a lot of code for doing packet framing for Netty that works well. And adding SSL? That is pretty much just a couple of lines of extra code. Adding compression? Ditto. I hate most Java frameworks, but Netty would be a worthwhile dependency since it is actually a quality piece of software.
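to illustrate the framing point: the classic failure mode is a framed transport that trusts a 4-byte length prefix from the wire and allocates whatever it is told, so a few bytes of random input can ask the server for a multi-gigabyte buffer. the guard is a single comparison (Netty's LengthFieldBasedFrameDecoder takes it as its maxFrameLength constructor argument). a stdlib-only sketch, with the 16 MB cap being an arbitrary number I picked for illustration:

```python
import struct

MAX_FRAME = 16 * 1024 * 1024  # refuse anything claiming to be larger than 16 MB

def encode_frame(payload: bytes) -> bytes:
    # 4-byte big-endian length prefix, as in Thrift's framed transport
    return struct.pack(">I", len(payload)) + payload

def decode_frame(buf: bytes) -> bytes:
    (length,) = struct.unpack(">I", buf[:4])
    if length > MAX_FRAME:
        # without this check, hostile/random input such as b'\xff\xff\xff\xff'
        # asks the server to allocate ~4 GB and fall over
        raise ValueError(f"frame length {length} exceeds limit")
    if len(buf) - 4 < length:
        raise ValueError("incomplete frame")
    return buf[4:4 + length]

roundtrip = decode_frame(encode_frame(b"hello"))
```

with Netty you get this, plus buffering of partial frames, for free instead of rediscovering it the hard way.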
Netty is the sort of quality software that inspires people to do things in more sensible ways. If you haven't looked at it, do a simple project in it. You'll be pleased. Besides, in Java, lots of dependencies are the norm. Depending on Netty is less of a hassle than depending on ... well, even just a compiler to begin with on OSX. Netty would also make it a bit simpler to add neat things like a FORM interface to the RPC system (you might have seen this in a previous job ;-). doing an HTTP interface when you already have Netty in place is not a lot of work at all.

I can also offer some perspective on why we chose Thrift early on and why we are now gradually drifting away from it and reverting to REST/JSON APIs. about half of my current team came from Google. since you have worked there you have used Google's RPC layer (is the name officially known outside Google?). the good thing about the RPC layer at Google is that it works and you spend almost no time dealing with it. we genuinely hated the complexity of SOAP, we found Java RPC to be too limiting, and even REST/JSON requires some painful level of engagement in Java (which is a typical disease of most Java frameworks that implement anything "hip"). so we chose Thrift hoping that we would be able to make it work painlessly for us. but it didn't. it was a hassle to use on anything but Linux, and it even behaved a bit differently on different macs in the office (I have forgotten the root cause). we couldn't get other people to adopt it.

(the main challenge was when we had some Windows "customers" and parts of the system were using a newer Thrift version to get around the OOM problem in the framing code. we'd have to assist them in getting the thrift compiler built on Windows and then make it work with Maven, with none of the developers in my office having a Windows machine.
and of course, you couldn't get pre-built Windows binaries for arbitrary SVN revisions from anywhere.)

and to be honest, we spent a fair share of work ensuring that all the code across all projects, in different states of development and with different versions of the thrift compiler, would work on every platform we used. it wasn't the painless development we were used to. so gradually we ditched Thrift for all outward-facing APIs. and about 3-4 months ago we started ditching it for most new internal APIs as well, in favor of JSON/REST. which is somewhat less efficient, but well within the acceptable range for the things we use it for. (besides, it is cheaper to add a few servers than it is to spend a lot of time streamlining Thrift usage for people scattered across the globe inside the company -- not to mention outside).

(and adding to all these problems was the fact that since Thrift wasn't released into Maven central at that time, projects like Cassandra were trailing behind current versions. this meant that anything that needed to talk to Cassandra would either have to use the same version of Thrift as Cassandra, or we would have to go the long and painful route of OSGi'ifying things to be able to support multiple Thrift generations in the same JVM. trust me, that can be a real pain in the ass :-)

I'm sure a lot of these problems have been fixed, but the awkwardness of using Thrift in a large, heterogeneous development environment sort of killed it for us. and we really wanted to use it. (I have probably forgotten a truckload of the problems we had initially by now. a lot has happened with our projects in the last year).

so when I say that I'd rather have the Thrift compiler in Java or Python, it isn't because I'm a Java or Python fanboy. in fact, I don't even use Python. it is because I want to reduce complexity. I also think that solving a lot of problems for Java is more important.
People really need more things that actually *work* in Java without dragging in more complexity. but as I said, if I had the time I'd much rather contribute code than offer what could be construed as whiny, long-winded criticisms.

-Bjørn