I've been experimenting with the retry logic in adapters.  It turns out to 
be very difficult to get it exactly right, so that no requests are lost 
while you're restarting the server.

Suppose the app server isn't running.  The adapter calls sock.connect() to 
connect to the server.  This will time out.  The adapter will then wait a 
little while under the assumption that the app server is being restarted, 
then it will retry the connect.  This part works fine, and the request will 
be successfully handled when the app server starts up.
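
For reference, the connect-and-retry behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in Adapter.py: the function name connect_with_retry and the retries, delay and timeout parameters are made up for the example, and it assumes Python 3, where timeouts and refused connections both surface as OSError.

```python
import socket
import time

def connect_with_retry(host, port, retries=10, delay=0.05, timeout=1.0):
    """Connect to (host, port); if that fails, wait briefly and try again.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    last_error = OSError("no connection attempts were made")
    for _ in range(retries):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock
        except OSError as exc:
            # Covers both a timed-out and a refused connect; assume the
            # app server is restarting and give it a moment.
            last_error = exc
            sock.close()
            time.sleep(delay)
    raise last_error
```

The caller gets back a connected socket as soon as the app server is listening again, or the last connect error once the attempts run out.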

But what if the app server shuts down right when a request comes in?  Let's 
suppose the adapter successfully calls sock.connect(), sock.send(data), and 
sock.shutdown(1) and is now blocking in sock.recv().  Now suppose that at 
this point the app server's select() hasn't yet awakened and noticed the 
new request.  If it gets a shutdown request at this point, it will close 
the main socket, which causes a socket error in the adapter (specifically, 
errno=WSAECONNRESET on Windows; I assume Unix raises some similar 
error).  Right now the adapters just fail at this point without retrying; 
they only retry on timeouts, not on genuine broken-socket errors.

So when you restart your app server, you run the risk of killing requests 
that were happening right at the time that you restarted it.  This is 
actually pretty easy for me to provoke.

One possible fix is for the adapter to retry the whole transaction with the 
app server if a WSAECONNRESET or the corresponding Unix error occurs, as 
long as the adapter hasn't actually received any response data yet.
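
A minimal sketch of that idea, under some assumptions: it is written for Python 3, where a reset connection is raised as ConnectionResetError (which, as far as I know, covers ECONNRESET on Unix and WSAECONNRESET on Windows). The function name transact and the retries and delay parameters are hypothetical. It is not the actual Adapter.py change, and it leaves retrying the initial connect to the existing logic.

```python
import socket
import time

def transact(address, data, retries=5, delay=0.05):
    """Send data to the app server and return the complete response.

    Hypothetical helper: if the connection is reset before any response
    data has arrived, the whole transaction is retried from the start.
    """
    for attempt in range(retries + 1):
        received = []
        sock = socket.create_connection(address)
        try:
            sock.sendall(data)
            sock.shutdown(socket.SHUT_WR)  # same as sock.shutdown(1)
            while True:
                chunk = sock.recv(8192)
                if not chunk:
                    return b"".join(received)
                received.append(chunk)
        except ConnectionResetError:
            # Retrying is only safe while nothing has been received;
            # give up if we have partial data or no attempts left.
            if received or attempt == retries:
                raise
            time.sleep(delay)
        finally:
            sock.close()
```

The check on received is the important part: once part of a response has come back, the request may already have been processed, so resending it could run it twice.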

I tried implementing this in Adapter.py and it seems to fix the retry 
problems -- I can now restart the app server without losing any requests 
even under heavy load.  But I have no way to test this on Unix and I don't 
know what the corresponding socket error is there, so I figured I'd solicit 
some feedback before I checked in the change.

Also, from eyeballing mod_webkit's code, it looks like it also only retries 
the initial socket connect() -- mod_webkit would also have to be fixed up.

Any thoughts?  Is it even worth the trouble to fix this?


--

- Geoff Talvola
   [EMAIL PROTECTED]

_______________________________________________
Webware-devel mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/webware-devel
