On Tue, Dec 17, 2013 at 6:26 PM, Ævar Arnfjörð Bjarmason <[email protected]> wrote: > I have a key-value API implemented with nginx/uWSGI that just does two > types of requests. A GET request with a key=SOME_KEY query parameter > which returns a body with a blob as a payload, and a corresponding > POST request with the same key=SOME_KEY that returns the blob. > > Everything works just fine, but I'm running nginx 1.2.8 and uWSGI > 1.9.20 and when I do the zerg dance to deploy a new server, i.e. just: > > mv uwsgi.pid uwsgi.pid.old > uwsgi [...] --zerg-server --pidfile2 uwsgi.pid > uwsgi --pause uwsgi.pid.old > uwsgi --stop uwsgi.pid.old > > I found that in the --stop step (and I added a generous sleep before > that) I'd get a storm of requests where I'd done a GET and got no > payload. > > To debug this I added a UUID parameter to both the GET and the POST > requests, the server will get take the (unique) UUID, and send it back > in its response via a X-UUID header. > > What I'm finding is that during the zerg dance I'm mixing & matching > requests, i.e. usually the UUID parameter always matches the X-UUID > header, but during the zerg dance I get the a response in reply to > *other* requests. > > The reason I spotted this with the GET requests was that there I was > expecting a payload but didn't get it, I was instead getting the 200 > OK of "your blob was saved" from some random POST requests. > > I think that the POST requests are also being mixed up too but since > they (currently) would accept a reply from a GET request as a valid > reply they didn't error out. > > I also find that the GET request doesn't even show up in the nginx > access log, so it might be some nginx <-> uWSGI issue. Here's a GET > request for the 178c754f66fa0070 UUID: > > $ grep 178c754f66fa0070 /tmp/port-80-traffic.dump /var/log/nginx/*.log > /tmp/port-80-traffic.dump:010.187.186.026.56611-010.187.215.034.00080: > GET /get?key=0031496763b44b8eb3305a44f58eb0fa&uuid=178c754f66fa0070 > HTTP/1.1 > > Note that there's nothing in the access log for it. That request ended > up getting a reply for the 178c754e66fa006f UUID: > > $ grep 178c754e66fa006f /tmp/port-80-traffic.dump /var/log/nginx/*.log > /tmp/port-80-traffic.dump:010.187.186.026.56611-010.187.215.034.00080: > POST > /set?category=1&key=003121d2cac040e7950089e039be8693&uuid=178c754e66fa006f > HTTP/1.1 > /tmp/port-80-traffic.dump:X-UUID: 178c754e66fa006f > /var/log/nginx/access.log:10.187.186.26 - - [17/Dec/2013:17:41:02 > +0100] "POST /set?key=003121d2cac040e7950089e039be8693&uuid=178c754e66fa006f > HTTP/1.1" 200 0 0.316/0.026 "-" "-" > > I'll poke some more at this myself, I just thought I'd send something > about it to the list to ask if someone else has run into this before.
I wrote some standalone test scripts to reproduce this and found out that it wasn't an issue with nginx/uWSGI at all, but with the HTTP library I was using. It had a bug where if it ran into read timeouts it would return a timeout but never flush the buffer incoming response that it got from the request that produced the timeout. So if it made two requests, A and B and A timed out when it requested B it would get the reply to A. So all the mixing up of requests that was going on was between two subsequent requests from the same requesting process, and the reason it was associated with the Zerg Dance was that I was really close to running out of timeout, and spawning two sets of processes on a loaded machine would result in some timeouts. _______________________________________________ uWSGI mailing list [email protected] http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
