Hi all! TL;DR: We found the issue. It's a race condition in python-werkzeug (used internally by Flask), so NOT uwsgi-related.
Thank you all for the help you provided! I just wanted to post a quick follow-up on this anyway. A colleague of mine (much smarter than me ;) ) just found the root cause of the issue. It is NOT related to uwsgi, but is a race condition in Flask's routing: werkzeug sorts its routing rules lazily, and that lazy sorting races during set-up. In the spirit of http://xkcd.com/979/, I'm posting a few details even though it's not uwsgi-related (and from here on I'm quoting my colleague):

I was able to reproduce the issue reliably with 32 parallel requests by adding the following to the wsgi script:

    sys.setcheckinterval(1)

This forces the Python interpreter to switch threads more aggressively - the default is 100. At that point I even saw it happen right after startup (with or without lazy loading), so it's not an isolated issue of the respawn functionality - it appears to be more general.

Looking at the route handling of Flask, which uses werkzeug internally, there is a data structure werkzeug.routing.Map that holds all registered URLs. To match rules efficiently on a request, werkzeug needs the rules sorted. The sorting, however, is done lazily in a method called update() - self._remap is set to True whenever a new rule is added:

    def update(self):
        """Called before matching and building to keep
        the compiled rules in the correct order after
        things changed.
        """
        if self._remap:
            self._rules.sort(key=lambda x: x.match_compare_key())
            for rules in self._rules_by_endpoint.itervalues():
                rules.sort(key=lambda x: x.build_compare_key())
            self._remap = False

In the setup code of our application we register blueprints. This adds rules to the map and sets self._remap to True - the Map is not sorted at this point. After our setup code, we return to uwsgi, which starts accepting possibly multiple requests in parallel that are all dispatched to Python. This means, however, that two threads can both see self._remap as True and both start sorting the map in place.
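(A brief aside from me before the rest of the quote: the failure mode described above boils down to an unsynchronized check-then-act on a "needs sorting" flag. Here is a minimal, self-contained toy model of that pattern - LazyMap and its methods are hypothetical names for illustration, not werkzeug's actual API - together with the eager pre-sort that works around it:)

```python
import bisect
import threading

class LazyMap:
    """Toy model of a lazily sorted rule map (illustrative, not werkzeug's API)."""

    def __init__(self):
        self._rules = []
        self._remap = False

    def add(self, rule):
        self._rules.append(rule)
        self._remap = True              # map is now unsorted

    def update(self):
        # Unsynchronized check-then-act: two threads can both observe
        # _remap == True and sort self._rules in place at the same time,
        # while a third thread (whose update() returned early because the
        # first sorter already cleared _remap) matches against a list that
        # the second sorter is still shuffling.
        if self._remap:
            self._rules.sort()
            self._remap = False

    def match(self, rule):
        self.update()                   # called on every "request"
        i = bisect.bisect_left(self._rules, rule)  # assumes sorted order
        return i < len(self._rules) and self._rules[i] == rule

m = LazyMap()
for r in ["/zoo", "/api", "/foo"]:
    m.add(r)

# The workaround, transplanted to this model: sort once, eagerly,
# before any request-handling threads exist.
m.update()
assert not m._remap                     # nothing left to race on

# Concurrent matching is now read-only and therefore safe.
threads = [threading.Thread(target=m.match, args=("/api",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(m.match("/api"))                  # prints True
```

The point of the sketch: once update() has run before any worker threads exist, _remap is False, so concurrent requests never enter the sorting branch and have nothing to race on.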
The problem happens when the first thread sets self._remap to False and a third thread starts using the Map assuming it is sorted, while the second thread may have just started its own sorting operation. I assume that while sort() runs in one thread, another thread can see an empty or partially sorted rule list. Matching then does not work correctly and you get a 404. I guess that's what one calls a race condition, and I'd actually be surprised if nothing worse could happen.

Workaround: if one never registers new URLs at runtime, preemptively calling app.url_map.update() in the setup code seems to be an acceptable solution. After adding the following patch, I was no longer able to reproduce the 404s (even with the check interval set to 1):

    def create_app(...):
        app = flask.Flask(__name__)
        [...]
        app.register_blueprint(mymodel, url_prefix="/foo")
    +   # Make sure the routing.Map used by werkzeug is sorted
    +   # to avoid race conditions in Map.update().
    +   app.url_map.update()
        return app

If URLs are registered at runtime, this will obviously not help. /quote

Maybe this will help others stumbling over the problem.

-Clemens

On Wed, Apr 9, 2014 at 8:56 AM, Clemens Kolbitsch <[email protected]> wrote:
> Hi,
>
> thanks for the suggestion! However, it seems like all server processes are
> ready at this point.
>
> ## modified wsgi file:
>
> import os
> pid = os.getpid()
> print "loading stage 1", pid
> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
> print "loading stage 2", pid
>
> application.logger.setLevel(logging.DEBUG)
> for handler in log.logger().handlers:
>     application.logger.addHandler(handler)
> print "loading stage 3", pid
>
> ## updated logs
>
> ...The work of process 17248 is done. Seeya!
> ...The work of process 17249 is done. Seeya!
> worker 1 killed successfully (pid: 17248)
> Respawned uWSGI worker 1 (new pid: 17424)
> worker 2 killed successfully (pid: 17249)
> Respawned uWSGI worker 2 (new pid: 17425)
> loading stage 1 17425
> loading stage 1 17424
> loading stage 2 17424
> loading stage 2 17425
> loading stage 3 17424
> loading stage 3 17425
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17424 (default app)
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17425 (default app)
> ...The work of process 17425 is done. Seeya!
> ...The work of process 17424 is done. Seeya!
> worker 1 killed successfully (pid: 17424)
> Respawned uWSGI worker 1 (new pid: 17956)
> worker 2 killed successfully (pid: 17425)
> Respawned uWSGI worker 2 (new pid: 17957)
> loading stage 1 17957
> loading stage 1 17956
> loading stage 2 17956
> loading stage 2 17957
> loading stage 3 17956
> loading stage 3 17957
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17956 (default app)
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17957 (default app)
> [pid: 17956|app: 0|req: 2/3571] 192.168.100.11 () {32 vars in 432 bytes}
> [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 7 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
> [pid: 17956|app: 0|req: 10/3573] 192.168.100.11 () {32 vars in 432 bytes}
> [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 16 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
> [pid: 17956|app: 0|req: 18/3575] 192.168.100.11 () {32 vars in 432 bytes}
> [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 24 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 6)
>
> ## FWIW, I decided to do another test to see if new threads are spawned
> before the function returns
>
> import os, threading
> pid = os.getpid()
> print "loading stage 1", pid, threading.active_count()
> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
> print "loading stage 2", pid, threading.active_count()
>
> application.logger.setLevel(logging.DEBUG)
> for handler in log.logger().handlers:
>     application.logger.addHandler(handler)
> print "loading stage 3", pid, threading.active_count()
>
> and get
>
> Respawned uWSGI worker 1 (new pid: 23145)
> worker 2 killed successfully (pid: 22771)
> Respawned uWSGI worker 2 (new pid: 23146)
> loading stage 1 23146 1
> loading stage 1 23145 1
> loading stage 2 23145 1
> loading stage 3 23145 1
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23145 (default app)
> loading stage 2 23146 1
> loading stage 3 23146 1
> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23146 (default app)
> [pid: 23146|app: 0|req: 31/2387] 192.168.100.11 () {32 vars in 432 bytes}
> [Wed Apr 9 15:52:38 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 28)
>
> so, could it be that it's not uwsgi's fault after all? As mentioned
> initially, the flask guys sent me to this mailing list because they said
> it's not flask's fault. But from all my debug logs, it seems like the
> first thread does the set-up, and at the point where all the blueprints have
> been set up there is still only one thread... but of course I have no
> idea of the uwsgi internals (and neither of the flask ones) ;)
>
> -Clemens
>
> On Wed, Apr 9, 2014 at 2:40 AM, Roberto De Ioris <[email protected]> wrote:
>
>> > I modified the code as follows:
>> >
>> > [...]
>> > print "loading stage 1"
>> > application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>> > print "loading stage 2"
>> > application.logger.setLevel(logging.DEBUG)
>> > for handler in log.logger().handlers:
>> >     application.logger.addHandler(handler)
>> > print "loading stage 3"
>> >
>> > and get the following logs
>> >
>> > /usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini 2>&1 | grep -v HTTP.....200
>> > [uWSGI] getting INI configuration from /etc/uwsgi/apps-enabled/myapp.ini
>> > *** Starting uWSGI 2.0.3 (64bit) on [Tue Apr 8 18:16:20 2014] ***
>> > compiled with version: 4.6.3 on 08 April 2014 02:43:31
>> > os: Linux-3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014
>> > nodename: mynode
>> > machine: x86_64
>> > clock source: unix
>> > detected number of CPU cores: 8
>> > current working directory: /var/log
>> > detected binary path: /usr/local/bin/uwsgi
>> > !!! no internal routing support, rebuild with pcre support !!!
>> > setgid() to 1010
>> > setuid() to 1002
>> > your processes number limit is 128020
>> > your memory page size is 4096 bytes
>> > detected max file descriptor number: 1024
>> > lock engine: pthread robust mutexes
>> > thunder lock: disabled (you can enable it with --thunder-lock)
>> > uwsgi socket 0 bound to TCP address 127.0.0.1:4003 fd 3
>> > Python version: 2.7.3 (default, Feb 27 2014, 20:09:21) [GCC 4.6.3]
>> > Python main interpreter initialized at 0x1d5d480
>> > python threads support enabled
>> > your server socket listen backlog is limited to 100 connections
>> > your mercy for graceful operations on workers is 60 seconds
>> > mapped 2503992 bytes (2445 KB) for 150 cores
>> > *** Operational MODE: preforking+threaded ***
>> > *** uWSGI is running in multiple interpreter mode ***
>> > spawned uWSGI master process (pid: 19627)
>> > spawned uWSGI worker 1 (pid: 19629, cores: 75)
>> > spawned uWSGI worker 2 (pid: 19630, cores: 75)
>> > loading stage 1
>> > loading stage 1
>> > loading stage 2
>> > loading stage 2
>> > loading stage 3
>> > loading stage 3
>> >
>> > [...]
>> >
>> > worker 2 killed successfully (pid: 22063)
>> > Respawned uWSGI worker 2 (new pid: 22494)
>> > worker 1 killed successfully (pid: 22064)
>> > Respawned uWSGI worker 1 (new pid: 22495)
>> > loading stage 1
>> > loading stage 1
>> > loading stage 2
>> > loading stage 3
>> > loading stage 2
>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22494 (default app)
>> > loading stage 3
>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22495 (default app)
>> >
>> > [...]
>> >
>> > ...The work of process 22494 is done. Seeya!
>> > ...The work of process 22495 is done. Seeya!
>> > worker 2 killed successfully (pid: 22494)
>> > Respawned uWSGI worker 2 (new pid: 22685)
>> > worker 1 killed successfully (pid: 22495)
>> > Respawned uWSGI worker 1 (new pid: 22686)
>> > loading stage 1
>> > loading stage 1
>> > loading stage 2
>> > loading stage 2
>> > loading stage 3
>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22686 (default app)
>> > loading stage 3
>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22685 (default app)
>> > [pid: 22685|app: 0|req: 3/4660] 192.168.100.11 () {32 vars in 432 bytes}
>> > [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 6 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
>> > [pid: 22686|app: 0|req: 16/4665] 192.168.100.11 () {32 vars in 432 bytes}
>> > [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 5)
>> > [pid: 22686|app: 0|req: 54/4677] 192.168.100.11 () {32 vars in 432 bytes}
>> > [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 56 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 7)
>> > [pid: 22686|app: 0|req: 54/4678] 192.168.100.11 () {32 vars in 432 bytes}
>> > [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 58 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
>> >
>> > after this, the logs are empty again - that is, no more 404s for a while
>> > (note that I grep out the HTTP 200s before and after)
>> >
>> > thanks!
>> > -Clemens
>>
>> can you repeat the test printing the process pid (os.getpid()) at each
>> line? In this way we can see if all the stages are executed before the
>> "readiness" announce.
>>
>> Thanks
>>
>> --
>> Roberto De Ioris
>> http://unbit.it
>> _______________________________________________
>> uWSGI mailing list
>> [email protected]
>> http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
