You might want to consider Pound (http://www.apsis.ch/pound/) as well. I've been using it for years to load balance HTTP services.
On Fri, Apr 24, 2009 at 1:49 PM, Brian Hammond <[email protected]> wrote:

> Hi,
>
> HAProxy looks good. The problem I have with it is that it doesn't support SSL. Some of my Thrift requests must go over SSL (e.g. login/logout/update-profile). Thus, if I used HAProxy I'd need to incorporate a "token server" or "auth server" that uses SSL, and at that point I may as well stick to nginx.
>
> Thanks,
> Brian
>
> On Apr 24, 2009, at 3:46 AM, David Balatero wrote:
>
>> I don't see why not -- having a fast proxy seems like the best thing to do given 8 slower instances behind it. Also, you might look into HAProxy, as I hear it does arbitrary TCP load-balancing as well as specific HTTP balancing.
>>
>> On Thu, Apr 23, 2009 at 9:01 PM, Brian Hammond <[email protected]> wrote:
>>
>>> Hi David,
>>>
>>> I've been working on a completely different project for the past few weeks. I'm now getting back into this.
>>>
>>>> The Python THttpServer implementation might be a good starting point for you in terms of the nuts and bolts of connecting your server to Thrift. I would *not* recommend using it for production use (I use it as a mock backend for some integration tests) for performance reasons.
>>>
>>> Right, I wouldn't expect *one* of the THttpServer instances to perform well -- too much of a funnel. However, this made me think that it might be worthwhile to load-balance a number of them.
>>>
>>> I set up nginx with 4 worker processes (one per core) as a load balancer in front of 8 (an arbitrary number of) Python processes. These upstream processes are -- at first stab (no Thrift yet) -- just running a BaseHTTPServer do_GET that returns "hello world". Nginx simply does round-robin between the 8 upstream processes.
>>>
>>> I figured this would be a good way to test whether THttpServer would perform well enough for my purposes, since THttpServer.RequestHandler is based on BaseHTTPServer.
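For anyone reproducing the benchmark, a "hello world" upstream like the one described above can be sketched with Python 3's `http.server`, which replaced the Python 2 `BaseHTTPServer` module used in the thread. This is a minimal sketch under those assumptions, not Brian's actual code. The helper name `make_upstream` is hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed 'hello world' body, like the first-stab upstreams."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Skip per-request logging to stderr so it doesn't skew benchmark numbers.
        pass


def make_upstream(port, host="127.0.0.1"):
    """Hypothetical helper: create one single-threaded upstream server for host:port."""
    return HTTPServer((host, port), HelloHandler)
```

Each of the eight processes would then call `make_upstream(port).serve_forever()` with its own port, and nginx's upstream block would list those ports. A range such as 8001 through 8008 is only an example. Separate processes, rather than threads in one process, let the upstreams use more than one core despite CPython's GIL.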
>>>
>>> Over loopback:
>>>
>>>     $ ab -n 20000 -c 1000 127.0.0.1/index.html
>>>     ...
>>>     Requests per second:    11644.32 [#/sec] (mean)
>>>     ...
>>>
>>> From my laptop here in NY to my server at The Planet (Dallas, TX):
>>>
>>>     $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>>>     ...
>>>     Requests per second:    788.20 [#/sec] (mean)
>>>     ...
>>>
>>> I'm pretty happy with these numbers, but of course the upstream processes do nothing interesting. My data store is redis [1], however, which is extremely efficient given its nature (an in-memory key-value "database"). Thus, I don't expect much overhead from Thrift or redis. But I'll test this assumption, of course.
>>>
>>> Sorry if this is obvious to a lot of you on this list. This might be useful to others getting started.
>>>
>>> Does anyone see any huge glaring problem with the idea of putting a fast nginx in front of a number of "slow" THttpServer-based processes?
>>>
>>> Thanks,
>>> Brian
>>>
>>> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>>>
>>>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>>>
>>>> The Python THttpServer implementation might be a good starting point for you in terms of the nuts and bolts of connecting your server to Thrift. I would *not* recommend using it for production use (I use it as a mock backend for some integration tests) for performance reasons.
>>>>
>>>> In order to avoid having a Thrift thread blocked on over-the-net-to-a-poorly-connected-client I/O, I would suggest using a server that will buffer up the whole request, then hand it to Thrift, then buffer up the Thrift response, then send the response to the client.
>>>>
>>>> You probably want to put the POST data in a TMemoryBuffer (not a TBufferedTransport, which uses a fixed-size buffer).
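David's buffer-everything pattern can be sketched on the Python side with `http.server`. The handler reads the complete POST body, processes it, and writes the complete reply in one go with an explicit Content-Length.

This is a hedged sketch. `process_rpc` is a hypothetical stand-in that just upper-cases the payload, so the buffering logic runs without Thrift installed. A real version would do the following:

- wrap the request bytes in Thrift's `TTransport.TMemoryBuffer`;
- build input and output protocols on that buffer and on a second, empty `TMemoryBuffer`;
- call the generated processor's `process(iprot, oprot)`;
- return the output buffer's `getvalue()`.

Note that this Python server still blocks while reading. Protecting it from slow clients depends on something in front, such as nginx, doing the network-facing buffering.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_rpc(request_bytes):
    """Hypothetical stand-in for the Thrift processor call.

    A real version would wrap request_bytes in a TMemoryBuffer, run
    processor.process(iprot, oprot), and return the output buffer's bytes.
    """
    return request_bytes.upper()


class BufferedRPCHandler(BaseHTTPRequestHandler):
    """Reads the whole POST body before processing; writes the whole reply at once."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # rfile is a buffered stream: read(n) returns n bytes unless EOF comes first,
        # so the full request is in memory before the RPC runs.
        request_bytes = self.rfile.read(length)
        if len(request_bytes) != length:
            self.send_error(400, "Truncated request body")
            return
        # The full response is built in memory before anything is sent.
        response_bytes = process_rpc(request_bytes)
        self.send_response(200)
        self.send_header("Content-Type", "application/x-thrift")
        self.send_header("Content-Length", str(len(response_bytes)))
        self.end_headers()
        self.wfile.write(response_bytes)

    def log_message(self, format, *args):
        pass
```

Because the body is a plain byte string in memory, it fits a growable in-memory transport naturally. That matches David's preference for `TMemoryBuffer` over the fixed-size buffer in `TBufferedTransport`.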
>>>>
>>>> --David
>>>>
>>>> Brian Hammond wrote:
>>>>
>>>>> Hi Garrett,
>>>>>
>>>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>>>
>>>>>> ----- "Brian Hammond" <[email protected]> wrote:
>>>>>>
>>>>>>> What I'm curious about is how I can do all of the following:
>>>>>>>
>>>>>>> 1) use SSL to encrypt user credentials
>>>>>>> 2) write my service implementation in Python
>>>>>>>
>>>>>>> I guess there are a few options for Python, but none completely solves both of these requirements:
>>>>>>>
>>>>>>> 1) use the Twisted Python generator and run a daemon with twistd
>>>>>>> 2) deploy to nginx/Apache with mod_wsgi and somehow hook in support for decoding HTTP/HTTPS requests as Thrift RPCs.
>>>>>>
>>>>>> Unless you need an asynchronous server-side framework for high concurrency and a low memory footprint, I would steer clear of Twisted.
>>>>>
>>>>> It turns out that I need a highly efficient server. I'm a one-man shop and am limited in the number of servers I can afford to deploy. I plan on starting with a bare minimum of two load-balanced VPS instances, so memory is tight. I do also need high concurrency. I'm developing a turn-based game server, already have a very large user base (iPhone app), and would like to license my solution to other similar iPhone developers ... of course I can enlarge my cluster of servers linearly with the number of licensees. I digress...
>>>>>
>>>>>> I think a standard threaded WSGI server would work fine.
>>>>>
>>>>> Suggestions? CherryPy?
>>>>>
>>>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's outstanding WSGI implementation for Apache. The Nginx WSGI interface is good as well, but beware if your app needs to block -- you'll be serializing your requests.
>>>>>
>>>>> True. Nginx is indeed single-threaded. I'm not leaning toward any particular serving tech at this point. I just want to ensure that whatever tech I choose is as efficient as possible.
>>>>>
>>>>> I don't actually have any points of blocking in the front end, not on disk I/O at least. My datastore is a file-backed key-value database that runs in a separate process and writes to disk on every Nth database modification.
>>>>>
>>>>>> Both options would let you run SSL as well as handle basic or digest auth.
>>>>>
>>>>> True.
>>>>>
>>>>>> As far as tying in Thrift, I haven't done this myself and unfortunately can't offer much. Hopefully there are others here who can. As you've already suggested, taking a look at the RPC layer and seeing how you can tie it into the backend from WSGI is a start.
>>>>>
>>>>> Yeah, that's what I gather. I'll play with it over the weekend.
>>>>>
>>>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not sure what discussions there have been to address this. I started to implement SSL support for Java and Python, but found I had to modify a fair amount of Thrift code and ended up punting by using stunnel to set up a secure connection between client and server. You might find this the path of least resistance as well, in particular if you can add the authentication layer to your Thrift IDL.
>>>>>
>>>>> Yeah, built-in SSL support would be nice.
>>>>>
>>>>> My client will be running on an iPhone -- no stunnel. Oh, yeah, I should mention that it seems most people use Thrift for talking from, say, their web server to *internal* web services, but I'm planning on using it as a public-facing web service, like the EverNote folks are. It was actually good to see another instance of someone planning on using Thrift this way.
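Garrett's "standard threaded WSGI server" can be illustrated with the standard library alone. Combining `wsgiref.simple_server` with `socketserver.ThreadingMixIn` runs each request on its own thread, so one blocking request does not serialize the rest.

This is a sketch for illustration only; `wsgiref` is a reference implementation, not a production server. `thrift_app` is a hypothetical placeholder that echoes the request body, where a real app would hand those bytes to a Thrift processor.

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGIServer that handles each request in its own thread."""

    daemon_threads = True  # in-flight request threads won't block interpreter exit


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request access logging


def thrift_app(environ, start_response):
    """Hypothetical WSGI app: reads the whole body and echoes it back.

    A real app would pass the body bytes to a Thrift processor instead.
    """
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    start_response("200 OK", [
        ("Content-Type", "application/x-thrift"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def make_threaded_server(host, port, app=thrift_app):
    """Build a thread-per-request WSGI server bound to host:port."""
    return make_server(host, port, app,
                       server_class=ThreadingWSGIServer,
                       handler_class=QuietHandler)
```

Calling `make_threaded_server("0.0.0.0", 8080).serve_forever()` would start it; the port is arbitrary. SSL termination would still have to come from something in front, such as Apache, nginx, or stunnel, as discussed in the thread.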
>>>>>
>>>>>> As one other approach, you can use a symmetric key to sign a request and send the signature in the clear with the rest of your Thrift data. As long as you keep the signing key secret, this would let you validate the origin and integrity of the request. If there's anything sensitive in the request itself, though, this is no good.
>>>>>
>>>>> Right. I cannot really trust the client -- iPhone apps are getting cracked left and right. Once an app is cracked, someone will poke around in the binary enough to find my secret symmetric key, even if it is not stored as a literal string.
>>>>>
>>>>> Thus, I want to use SSL for anything sensitive.
>>>>>
>>>>> I'll create the equivalent of an auth token (same idea as login cookies) with opaque data encrypted using a symmetric key available only on the service side. The client will send back the auth token with each Thrift RPC. There's a lot more to this to fight replay attacks, client spoofing, etc., but that isn't relevant here.
>>>>>
>>>>> I need to be able to register a user account from the client (I know, spammers will try to automate that, but I have countermeasures) and log the user in as well. This requires sending sensitive user information which, while essentially obfuscated from eavesdroppers by virtue of using a binary protocol, could be reverse-engineered easily enough, I bet.
>>>>>
>>>>>> Alas, message signing is another application-layer measure -- it would be sweet to see auth work its way into the Thrift spec.
>>>>>
>>>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services. Some data used in the request signature calculation will only be available to the client and the service and never transmitted between them in the clear -- it would be transmitted to the client during a login over HTTPS.
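The symmetric-key signing Garrett describes, and the AWS-style signatures Brian plans, both come down to computing an HMAC over a canonical form of the request. Below is a minimal sketch with Python's standard `hmac` module.

The canonical-string layout, the choice of HMAC-SHA256, and the 300-second timestamp window are assumptions for illustration. This is not AWS's actual signing scheme, and it is not anything Thrift provides.

As Brian points out, signing only helps if the key never ships inside the client binary, for example a per-session secret delivered over HTTPS at login. It gives integrity and origin checks, not confidentiality.

```python
import hashlib
import hmac
import time


def canonical_string(method, path, body, timestamp):
    """Hypothetical canonical form: method, path, timestamp, and a hash of the body."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, str(timestamp), body_hash])


def sign_request(secret, method, path, body, timestamp):
    """Return a hex HMAC-SHA256 signature over the canonical request string."""
    msg = canonical_string(method, path, body, timestamp).encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, body, timestamp, signature,
                   max_skew=300, now=None):
    """Check the signature; reject timestamps outside +/- max_skew seconds.

    The timestamp window is only a rough replay guard, not full replay protection.
    """
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False
    expected = sign_request(secret, method, path, body, timestamp)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, signature)
```

The client computes `sign_request(...)` and sends the hex signature and timestamp alongside the Thrift payload. The service recomputes the signature with its copy of the secret and accepts the request only if `verify_request(...)` returns True.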
>>>>>
>>>>> Auth in Thrift would be wonderful, but I wonder if that's feature creep?
>>>>>
>>>>>> Good luck!
>>>>>>
>>>>>> Garrett
>>>>>
>>>>> Thanks!
>>>>> Brian
