On Tue, Dec 17, 2013 at 6:26 PM, Ævar Arnfjörð Bjarmason
<[email protected]> wrote:
> I have a key-value API implemented with nginx/uWSGI that just does two
> types of requests. A GET request with a key=SOME_KEY query parameter
> which returns a body with a blob as a payload, and a corresponding
> POST request with the same key=SOME_KEY that returns the blob.
>
> Everything works just fine, but I'm running nginx 1.2.8 and uWSGI
> 1.9.20 and when I do the zerg dance to deploy a new server, i.e. just:
>
>     mv uwsgi.pid uwsgi.pid.old
>     uwsgi [...] --zerg-server --pidfile2 uwsgi.pid
>     uwsgi --pause uwsgi.pid.old
>     uwsgi --stop uwsgi.pid.old
>
> I found that in the --stop step (and I added a generous sleep before
> that) I'd get a storm of requests where I'd done a GET and got no
> payload.
>
> To debug this I added a UUID parameter to both the GET and the POST
> requests, the server will get take the (unique) UUID, and send it back
> in its response via a X-UUID header.
>
> What I'm finding is that during the zerg dance I'm mixing & matching
> requests, i.e. usually the UUID parameter always matches the X-UUID
> header, but during the zerg dance I get the a response in reply to
> *other* requests.
>
> The reason I spotted this with the GET requests was that there I was
> expecting a payload but didn't get it, I was instead getting the 200
> OK of "your blob was saved" from some random POST requests.
>
> I think that the POST requests are also being mixed up too but since
> they (currently) would accept a reply from a GET request as a valid
> reply they didn't error out.
>
> I also find that the GET request doesn't even show up in the nginx
> access log, so it might be some nginx <-> uWSGI issue. Here's a GET
> request for the 178c754f66fa0070 UUID:
>
>     $ grep 178c754f66fa0070 /tmp/port-80-traffic.dump /var/log/nginx/*.log
>     /tmp/port-80-traffic.dump:010.187.186.026.56611-010.187.215.034.00080:
> GET /get?key=0031496763b44b8eb3305a44f58eb0fa&uuid=178c754f66fa0070
> HTTP/1.1
>
> Note that there's nothing in the access log for it. That request ended
> up getting a reply for the 178c754e66fa006f UUID:
>
>     $ grep 178c754e66fa006f /tmp/port-80-traffic.dump /var/log/nginx/*.log
>     /tmp/port-80-traffic.dump:010.187.186.026.56611-010.187.215.034.00080:
> POST 
> /set?category=1&key=003121d2cac040e7950089e039be8693&uuid=178c754e66fa006f
> HTTP/1.1
>     /tmp/port-80-traffic.dump:X-UUID: 178c754e66fa006f
>     /var/log/nginx/access.log:10.187.186.26 - - [17/Dec/2013:17:41:02
> +0100] "POST /set?key=003121d2cac040e7950089e039be8693&uuid=178c754e66fa006f
> HTTP/1.1" 200 0 0.316/0.026 "-" "-"
>
> I'll poke some more at this myself, I just thought I'd send something
> about it to the list to ask if someone else has run into this before.

I wrote some standalone test scripts to reproduce this and found
out that it wasn't an issue with nginx/uWSGI at all, but with the
HTTP library I was using.

It had a bug where if it ran into read timeouts it would return a
timeout but never flush the buffer incoming response that it got
from the request that produced the timeout.

So if it made two requests, A and B and A timed out when it
requested B it would get the reply to A.

So all the mixing up of requests that was going on was between
two subsequent requests from the same requesting process, and the
reason it was associated with the Zerg Dance was that I was
really close to running out of timeout, and spawning two sets of
processes on a loaded machine would result in some timeouts.
_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi

Reply via email to