Re: Python server over HTTP, HTTPS -- How?

Chad Maine Fri, 24 Apr 2009 12:17:22 -0700

You might want to consider Pound (http://www.apsis.ch/pound/) as well.  I've
been using it for years to load balance HTTP services.


On Fri, Apr 24, 2009 at 1:49 PM, Brian Hammond <[email protected]> wrote:

> Hi,
>
> HAProxy looks good. The problem I have with it is that it doesn't support
> SSL. Some of my thrift requests must go over SSL (e.g.
> login/logout/update-profile). Thus, if I used HAProxy I'd need to
> incorporate a "token server" or "auth server" that uses SSL, and at that
> point I may as well stick to nginx.
>
> Thanks,
> Brian
>
>
> On Apr 24, 2009, at 3:46 AM, David Balatero wrote:
>
>> I don't see why not -- having a fast proxy seems like the best thing to do
>> given 8 slower instances behind it. Also, you might look into HAProxy, as I
>> hear it does arbitrary TCP load-balancing as well as specific HTTP
>> balancing.
>>
>> On Thu, Apr 23, 2009 at 9:01 PM, Brian Hammond <[email protected]> wrote:
>>
>>> Hi David,
>>>
>>> I've been working on a completely different project for the past few
>>> weeks.  I'm now getting back into this.
>>>
>>>> The Python THttpServer implementation might be a good starting point
>>>> for you in terms of the nuts and bolts of connecting your server to
>>>> Thrift.  I would *not* recommend using it for production use (I use
>>>> it as a mock backend for some integration tests) for performance
>>>> reasons.
>>>
>>> Right, I wouldn't expect *one* of the THttpServer instances to perform
>>> well -- too much of a funnel.  However, this made me think that it might
>>> be worthwhile to load-balance a number of them.
>>>
>>> I set up nginx with 4 worker processes (one per core) as a load balancer
>>> to 8 (arbitrary) Python processes.  These upstream processes are -- at
>>> first stab (no Thrift yet) -- just running a BaseHTTPServer do_GET that
>>> returns "hello world".  Nginx simply does round-robin between the 8
>>> upstream processes.
>>>
>>> I figured this would be a good way to test if THttpServer would perform
>>> well enough for my purposes since THttpServer.RequestHandler is based on
>>> BaseHTTPServer.
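For anyone trying to reproduce the benchmark setup described above, here is a minimal sketch of one "hello world" upstream process. It assumes Python 3, where BaseHTTPServer lives in http.server. The names HelloHandler, start_backend and fetch are made up for illustration and are not from the thread, and the nginx side is not shown.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer  # BaseHTTPServer in Python 2


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed body, like the benchmark backend."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging during benchmark runs


def start_backend(port=0):
    """Start one backend on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path="/index.html"):
    """Issue one GET against a running backend and return (status, body)."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Each of the 8 upstream processes would run one such server on its own port. HTTPServer handles one request at a time, which is the "funnel" the proxy is meant to spread out.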
>>>
>>> Over loopback:
>>>
>>> $ ab -n 20000 -c 1000 127.0.0.1/index.html
>>>
>>> ...
>>> Requests per second:    11644.32 [#/sec] (mean)
>>> ...
>>>
>>> From my laptop here in NY to my server in The Planet (Dallas, TX):
>>>
>>> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>>>
>>> ...
>>> Requests per second:    788.20 [#/sec] (mean)
>>> ...
>>>
>>> I'm pretty happy with these numbers, but of course the upstream processes
>>> do nothing interesting.  However, my data store is redis [1], which is
>>> extremely efficient given its nature (an in-memory key-value "database").
>>> Thus, I don't expect much overhead from Thrift or redis.  But I'll test
>>> this assumption, of course.
>>>
>>> Sorry if this is obvious to a lot of you on this list.  This might be
>>> useful to others getting started.
>>>
>>> Does anyone see any huge glaring problem with the idea of putting fast
>>> nginx in front of a number of "slow" THttpServer-based processes?
>>>
>>> Thanks,
>>> Brian
>>>
>>> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>>>
>>>
>>>
>>>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>>>
>>>> The Python THttpServer implementation might be a good starting point
>>>> for you in terms of the nuts and bolts of connecting your server to
>>>> Thrift.  I would *not* recommend using it for production use (I use
>>>> it as a mock backend for some integration tests) for performance
>>>> reasons.
>>>> In order to avoid having a Thrift thread blocked on
>>>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>>>> a server that will buffer up the whole request, then hand it to Thrift,
>>>> then buffer up the Thrift response, then send the response to the
>>>> client.
>>>> You probably want to put the POST data in a TMemoryBuffer (not a
>>>> TBufferedTransport, which uses a fixed-size buffer).
>>>>
>>>> --David
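A rough sketch of the buffer-then-process pattern suggested above, written as a plain WSGI app so it runs without Thrift installed. The `process` callable (bytes in, bytes out) is a hypothetical stand-in. In a real service it would presumably wrap the request bytes in a TMemoryBuffer and run the Thrift processor, which is not shown here. The "application/x-thrift" content type and the call_app helper are assumptions for illustration too.

```python
import io


def make_buffered_app(process):
    """Build a WSGI app around `process`, a hypothetical bytes -> bytes handler.

    The whole POST body is read into memory before `process` runs, and the
    whole reply exists in memory before anything is sent, so `process`
    itself never waits on the client's connection.
    """

    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request_bytes = environ["wsgi.input"].read(length)  # full request buffered
        response_bytes = process(request_bytes)             # pure in-memory work
        start_response("200 OK", [
            ("Content-Type", "application/x-thrift"),       # assumed content type
            ("Content-Length", str(len(response_bytes))),
        ])
        return [response_bytes]

    return app


def call_app(app, body):
    """Drive the app without a network, returning (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], captured["headers"], b"".join(chunks)
```

With a buffering front end such as nginx in front, slow clients are absorbed by the proxy, and the Python process only ever sees complete requests.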
>>>>
>>>> Brian Hammond wrote:
>>>>
>>>>> Hi Garrett,
>>>>>
>>>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>>>
>>>>> ----- "Brian Hammond" <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>> What I'm curious about is how I can do all of the following:
>>>>>>>
>>>>>>> 1) use SSL to encrypt user credentials
>>>>>>> 2) write my service implementation in python
>>>>>>>
>>>>>>> I guess there are a few options for python but none completely solve
>>>>>>> both of these requirements.
>>>>>>>
>>>>>>> 1) use the Twisted python generator and run a daemon with twistd
>>>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook-in support
>>>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>>>>>
>>>>>> Unless you need an asynchronous server-side framework for high
>>>>>> concurrency and low memory footprint, I would steer clear of Twisted.
>>>>>>
>>>>>>
>>>>> It turns out that I need a highly efficient server.  I'm a one-man
>>>>> shop and am limited in the number of servers I can afford to deploy.
>>>>> I plan on starting with a bare minimum of two load-balanced VPS
>>>>> instances so memory is tight.  I do also need high concurrency.  I'm
>>>>> developing a turn-based game server and have a very large user base
>>>>> already (iPhone app) and would like to license my solution to other
>>>>> similar iPhone developers ... of course I can enlarge my cluster of
>>>>> servers linearly with the number of licensees.  I digress...
>>>>>
>>>>>> I think a standard threaded WSGI server would work fine.
>>>>>>
>>>>> Suggestions?  CherryPy?
>>>>>
>>>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's
>>>>>> outstanding WSGI implementation for Apache. The Nginx WSGI interface
>>>>>> is good as well, but beware if your app needs to block -- you'll be
>>>>>> serializing your requests.
>>>>>>
>>>>> True.  Nginx is indeed single-threaded.  I'm not leaning in any way to
>>>>> any particular serving tech. at this point actually.  I just want to
>>>>> ensure that whatever tech. I choose is as efficient as possible.
>>>>>
>>>>> I don't have any points of blocking in the front-end, actually,
>>>>> not on disk I/O at least.  My datastore is a file-backed key-value
>>>>> database that runs in a separate process and writes to disk on
>>>>> every Nth database modification.
>>>>>
>>>>>> Both options would let you run SSL as well as handle basic or digest
>>>>>> auth.
>>>>>>
>>>>> True.
>>>>>
>>>>>> As far as tying in Thrift, I haven't done this myself and
>>>>>> unfortunately can't offer much. Hopefully there are others here who
>>>>>> can. As you've already suggested, taking a look at the RPC layer and
>>>>>> seeing how you can tie it into the backend from wsgi is a start.
>>>>>>
>>>>>>
>>>>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>>>>
>>>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>>>>> sure what discussions there have been to address this. I started to
>>>>>> implement SSL support for Java and Python, but found I had to modify
>>>>>> a fair amount of Thrift code and ended up punting by using stunnel to
>>>>>> set up a secure connection between client and server. You might find
>>>>>> this the path of least resistance as well, in particular if you can
>>>>>> add the authentication layer to your Thrift IDL.
>>>>>>
>>>>>>
>>>>> Yeah, built-in SSL support would be nice.
>>>>>
>>>>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>>>>> should mention that it seems most people use Thrift for talking from
>>>>> say their web server to *internal* web services but I'm planning on
>>>>> using it as a public-facing web service, like the EverNote folks are.
>>>>> It was actually good to see another instance of someone planning on
>>>>> using Thrift this way.
>>>>>
>>>>>> As one other approach, you can use a symmetric key to sign a request
>>>>>> and send the signature in the clear with the rest of your thrift data.
>>>>>> As long as you keep the signing key secret, this would let you
>>>>>> validate the origin and integrity of the request. If there's anything
>>>>>> sensitive in the request itself, though, this is no good.
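The symmetric-key signing idea above might look roughly like this with Python's standard hmac module. HMAC-SHA256 is an assumed choice, since the thread does not name an algorithm, and sign_request and verify_request are illustrative names only.

```python
import hashlib
import hmac


def sign_request(key, payload):
    """Return a hex HMAC-SHA256 signature over the serialized request bytes."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(key, payload, signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(key, payload)
    return hmac.compare_digest(expected, signature)


# Example: the payload travels in the clear, alongside its signature.
shared_key = b"example-shared-secret"          # placeholder, not a real key
payload = b"serialized thrift request bytes"   # placeholder payload
signature = sign_request(shared_key, payload)
```

As noted above, this only authenticates the request and protects its integrity. The payload itself stays readable to anyone on the wire.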
>>>>>>
>>>>>>
>>>>> Right.  I cannot really trust the client -- iPhone apps are getting
>>>>> cracked left and right.  Once cracked, someone will poke around enough
>>>>> in the binary to find out my secret symmetric key even if not stored
>>>>> as a literal string.
>>>>>
>>>>> Thus, I want to use SSL for anything sensitive.
>>>>>
>>>>> I'll create the equivalent of an auth token (same idea as login
>>>>> cookies) with opaque data encrypted using a symmetric key only
>>>>> available on the service-side.  The client will send back the auth
>>>>> token with each Thrift RPC.  There's a lot more to this to fight
>>>>> replay attacks, client spoofing, etc. but that isn't relevant here.
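A simplified sketch of the auth-token idea, with one substitution stated plainly: the token here is signed with a server-only key rather than encrypted, because the Python standard library has no symmetric cipher. That means the claims are tamper-evident but not opaque. The token layout, the field names "uid" and "exp", and the function names are all assumptions for illustration, and replay protection is not covered.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_token(server_key, user_id, ttl_seconds=3600, now=None):
    """Return b'<base64 claims>.<hex hmac>' for the given user."""
    now = time.time() if now is None else now
    claims = {"uid": user_id, "exp": int(now) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode("utf-8"))
    tag = hmac.new(server_key, body, hashlib.sha256).hexdigest().encode("ascii")
    return body + b"." + tag


def check_token(server_key, token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, tag = token.rsplit(b".", 1)
    except ValueError:
        return None  # no separator, so not one of our tokens
    expected = hmac.new(server_key, body, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(expected, tag):
        return None  # forged or altered
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < now:
        return None  # expired
    return claims["uid"]
```

The client would send the token back with each RPC, and only the service, which holds server_key, can mint or validate one.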
>>>>>
>>>>> I need to be able to register a user account from the client (I know,
>>>>> spammers will try to automate that but I have countermeasures) and
>>>>> login the user as well.  This requires sending the sensitive user
>>>>> information which, while essentially obfuscated to eavesdroppers by
>>>>> virtue of using a binary protocol, can be reverse engineered easily
>>>>> enough I bet.
>>>>>
>>>>>> Alas, message signing is another application layer measure -- it would
>>>>>> be sweet to see auth work its way into the Thrift spec.
>>>>>>
>>>>>>
>>>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services.
>>>>> Some data used in the request signature calculation will only be
>>>>> available to the client and the service and never transmitted between
>>>>> them in the clear -- it would be transmitted to the client during a
>>>>> login over HTTPS.
>>>>>
>>>>> Auth in Thrift would be wonderful but I wonder if that's feature creep?
>>>>>
>>>>>> Good luck!
>>>>>>
>>>>>> Garrett
>>>>>>
>>>>>>
>>>>> Thanks!
>>>>> Brian
>>>>>
>>>>>
>>>>>
>>>
>
