I've reported it to https://github.com/mitsuhiko/werkzeug/issues/546 .
On Sat, Jun 14, 2014 at 10:24 AM, Clemens Kolbitsch <[email protected]> wrote:
> Hi all!
>
> TL;DR: We found the issue; it's a race condition in python-werkzeug (used
> internally by Flask), so NOT uwsgi-related.
>
> Thank you all for the help you provided!
>
> I just wanted to post a quick follow-up on this anyway. A colleague of mine
> (much smarter than me ;) ) just found the root cause of the issue. It is NOT
> related to uwsgi, but is a race condition in werkzeug's routing layer, which
> orders the registered url rules lazily; that lazy ordering races during
> set-up.
>
> In the spirit of http://xkcd.com/979/, I'm posting a few details anyway,
> even though it's not uwsgi-related (quoting my colleague from here on):
>
> I was able to reproduce the issue reliably with 32 parallel requests by
> adding the following to the wsgi script:
>
>     sys.setcheckinterval(1)
>
> This forces the Python interpreter to switch threads more aggressively (the
> default is 100). At that point I even saw it happen right after startup,
> with or without lazy loading. So it's not an isolated issue of the respawn
> functionality - it appears to be more general.
>
> Looking at the route handling of flask, which uses werkzeug internally,
> there is a data structure, werkzeug.routing.Map, which holds all registered
> urls. To match rules efficiently upon a request, werkzeug needs the rules
> sorted. The sorting, however, is done lazily by a method called update() -
> self._remap is set to True whenever a new rule is added:
>
>     def update(self):
>         """Called before matching and building to keep the compiled rules
>         in the correct order after things changed.
>         """
>         if self._remap:
>             self._rules.sort(key=lambda x: x.match_compare_key())
>             for rules in self._rules_by_endpoint.itervalues():
>                 rules.sort(key=lambda x: x.build_compare_key())
>             self._remap = False
>
> In the setup code of our application, we register blueprints.
> This adds rules to the map, and self._remap is set to True - the Map is not
> sorted at this point. After our setup code, we return to uwsgi, which starts
> accepting possibly multiple requests in parallel that are all dispatched to
> Python. This means, however, that two threads can both see self._remap as
> True, and both of them start sorting the map in place.
>
> The problem happens when the first thread sets self._remap to False and a
> third thread starts using the Map assuming it is sorted: the second thread
> might just have started its sorting operation. I assume that during sort()
> in one thread, another thread can see an empty or partially sorted rule
> list. The matching then does not work correctly and you get a 404. I guess
> that's what one calls a race condition, and I'd actually be surprised if
> nothing worse could happen.
>
> Workaround:
>
> If one never registers new urls at runtime, preemptively calling
> app.url_map.update() in the setup code seems to be an acceptable solution.
> I was not able to reproduce the 404s afterwards (even with the check
> interval set to 1) after adding the following patch:
>
>     def create_app(...):
>         app = flask.Flask(__name__)
>         [...]
>
>         app.register_blueprint(mymodel, url_prefix="/foo")
>   +     # Make sure the routing.Map used by werkzeug is sorted, to avoid
>   +     # race conditions in Map.update().
>   +     app.url_map.update()
>         return app
>
> If urls are registered at runtime, this will obviously not help.
>
> /quote
>
> Maybe this will help others stumbling over the problem.
>
> -Clemens
>
>
> On Wed, Apr 9, 2014 at 8:56 AM, Clemens Kolbitsch <[email protected]>
> wrote:
>>
>> Hi,
>>
>> thanks for the suggestion!
>> However, it seems like all server processes are ready at this point.
>>
>> ## modified wsgi file:
>>
>> import os
>> pid = os.getpid()
>> print "loading stage 1", pid
>> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>> print "loading stage 2", pid
>>
>> application.logger.setLevel(logging.DEBUG)
>> for handler in log.logger().handlers:
>>     application.logger.addHandler(handler)
>> print "loading stage 3", pid
>>
>> ## updated logs
>>
>> ...The work of process 17248 is done. Seeya!
>> ...The work of process 17249 is done. Seeya!
>> worker 1 killed successfully (pid: 17248)
>> Respawned uWSGI worker 1 (new pid: 17424)
>> worker 2 killed successfully (pid: 17249)
>> Respawned uWSGI worker 2 (new pid: 17425)
>> loading stage 1 17425
>> loading stage 1 17424
>> loading stage 2 17424
>> loading stage 2 17425
>> loading stage 3 17424
>> loading stage 3 17425
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17424 (default app)
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17425 (default app)
>> ...The work of process 17425 is done. Seeya!
>> ...The work of process 17424 is done. Seeya!
>> worker 1 killed successfully (pid: 17424)
>> Respawned uWSGI worker 1 (new pid: 17956)
>> worker 2 killed successfully (pid: 17425)
>> Respawned uWSGI worker 2 (new pid: 17957)
>> loading stage 1 17957
>> loading stage 1 17956
>> loading stage 2 17956
>> loading stage 2 17957
>> loading stage 3 17956
>> loading stage 3 17957
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17956 (default app)
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17957 (default app)
>> [pid: 17956|app: 0|req: 2/3571] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 7 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
>> [pid: 17956|app: 0|req: 10/3573] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 16 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
>> [pid: 17956|app: 0|req: 18/3575] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 24 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 6)
>>
>> ## FWIW, I decided to do another test to see if new threads are spawned
>> before the function returns
>>
>> import os, threading
>> pid = os.getpid()
>> print "loading stage 1", pid, threading.active_count()
>> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>> print "loading stage 2", pid, threading.active_count()
>>
>> application.logger.setLevel(logging.DEBUG)
>> for handler in log.logger().handlers:
>>     application.logger.addHandler(handler)
>> print "loading stage 3", pid, threading.active_count()
>>
>> and get
>>
>> Respawned uWSGI worker 1 (new pid: 23145)
>> worker 2 killed successfully (pid: 22771)
>> Respawned uWSGI worker 2 (new pid: 23146)
>> loading stage 1 23146 1
>> loading stage 1 23145 1
>> loading stage 2 23145 1
>> loading stage 3 23145 1
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23145 (default app)
>> loading stage 2 23146 1
>> loading stage 3 23146 1
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23146 (default app)
>> [pid: 23146|app: 0|req: 31/2387] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:52:38 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 28)
>>
>> So, could it be that it's not uwsgi's fault after all? As mentioned
>> initially, the flask guys sent me to this mailing list, because they said
>> it's not flask's fault.
>> But, from all my debug logs, it seems like the first thread does the
>> set-up, and at the point where all the blueprints have been set up, there
>> is still only one thread... but of course I have no idea of the uwsgi
>> internals (and neither the flask ones) ;)
>>
>> -Clemens
>>
>>
>> On Wed, Apr 9, 2014 at 2:40 AM, Roberto De Ioris <[email protected]> wrote:
>>>
>>> > I modified the code as follows:
>>> >
>>> > [...]
>>> > print "loading stage 1"
>>> > application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>>> > print "loading stage 2"
>>> > application.logger.setLevel(logging.DEBUG)
>>> > for handler in log.logger().handlers:
>>> >     application.logger.addHandler(handler)
>>> > print "loading stage 3"
>>> >
>>> > and get the following logs
>>> >
>>> > /usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini 2>&1 | grep -v HTTP.....200
>>> > [uWSGI] getting INI configuration from /etc/uwsgi/apps-enabled/myapp.ini
>>> > *** Starting uWSGI 2.0.3 (64bit) on [Tue Apr 8 18:16:20 2014] ***
>>> > compiled with version: 4.6.3 on 08 April 2014 02:43:31
>>> > os: Linux-3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014
>>> > nodename: mynode
>>> > machine: x86_64
>>> > clock source: unix
>>> > detected number of CPU cores: 8
>>> > current working directory: /var/log
>>> > detected binary path: /usr/local/bin/uwsgi
>>> > !!! no internal routing support, rebuild with pcre support !!!
>>> > setgid() to 1010
>>> > setuid() to 1002
>>> > your processes number limit is 128020
>>> > your memory page size is 4096 bytes
>>> > detected max file descriptor number: 1024
>>> > lock engine: pthread robust mutexes
>>> > thunder lock: disabled (you can enable it with --thunder-lock)
>>> > uwsgi socket 0 bound to TCP address 127.0.0.1:4003 fd 3
>>> > Python version: 2.7.3 (default, Feb 27 2014, 20:09:21) [GCC 4.6.3]
>>> > Python main interpreter initialized at 0x1d5d480
>>> > python threads support enabled
>>> > your server socket listen backlog is limited to 100 connections
>>> > your mercy for graceful operations on workers is 60 seconds
>>> > mapped 2503992 bytes (2445 KB) for 150 cores
>>> > *** Operational MODE: preforking+threaded ***
>>> > *** uWSGI is running in multiple interpreter mode ***
>>> > spawned uWSGI master process (pid: 19627)
>>> > spawned uWSGI worker 1 (pid: 19629, cores: 75)
>>> > spawned uWSGI worker 2 (pid: 19630, cores: 75)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 2
>>> > loading stage 3
>>> > loading stage 3
>>> >
>>> > [...]
>>> >
>>> > worker 2 killed successfully (pid: 22063)
>>> > Respawned uWSGI worker 2 (new pid: 22494)
>>> > worker 1 killed successfully (pid: 22064)
>>> > Respawned uWSGI worker 1 (new pid: 22495)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 3
>>> > loading stage 2
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22494 (default app)
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22495 (default app)
>>> >
>>> > [...]
>>> >
>>> > ...The work of process 22494 is done. Seeya!
>>> > ...The work of process 22495 is done. Seeya!
>>> > worker 2 killed successfully (pid: 22494)
>>> > Respawned uWSGI worker 2 (new pid: 22685)
>>> > worker 1 killed successfully (pid: 22495)
>>> > Respawned uWSGI worker 1 (new pid: 22686)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 2
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22686 (default app)
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22685 (default app)
>>> > [pid: 22685|app: 0|req: 3/4660] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 6 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
>>> > [pid: 22686|app: 0|req: 16/4665] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 5)
>>> > [pid: 22686|app: 0|req: 54/4677] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 56 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 7)
>>> > [pid: 22686|app: 0|req: 54/4678] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 58 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
>>> >
>>> > after this, the logs are empty again - that is, no more 404s for a
>>> > while (note that I grep out the HTTP 200s before and after)
>>> >
>>> > thanks!
>>> > -Clemens
>>>
>>> can you repeat the test printing the process pid (os.getpid()) at each
>>> line?
>>> In this way we can see if all the stages are executed before the
>>> "readiness" announce.
>>>
>>> Thanks
>>>
>>> --
>>> Roberto De Ioris
>>> http://unbit.it
>>> _______________________________________________
>>> uWSGI mailing list
>>> [email protected]
>>> http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi

--
INADA Naoki <[email protected]>
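A note for anyone replaying the reproduction on Python 3: sys.setcheckinterval() is Python 2 only (deprecated in 3.2, removed in 3.9); the analogue is sys.setswitchinterval(), which takes a time in seconds rather than a bytecode count. A version-agnostic sketch (the helper name is mine, not from the thread):

```python
import sys

def force_aggressive_thread_switching():
    """Shrink the interpreter's thread switch interval so that latent
    races (like the Map.update() one discussed in the thread) surface
    quickly. Returns the previous setting so callers can restore it."""
    if hasattr(sys, "setswitchinterval"):
        # Python 3.2+: time-based switching, default 0.005 seconds.
        old = sys.getswitchinterval()
        sys.setswitchinterval(1e-6)  # offer a switch as often as possible
        return old
    # Python 2: switching is considered every N bytecode instructions
    # (default 100); this is the sys.setcheckinterval(1) call from the
    # thread above.
    old = sys.getcheckinterval()
    sys.setcheckinterval(1)
    return old
```

Calling this once at the top of the wsgi file, as Clemens's colleague did, makes the 404s reproducible within a handful of parallel requests instead of appearing sporadically.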
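The failure mode in the thread is a classic check-then-act race: update() tests self._remap, sorts in place, then clears the flag, all without synchronization. A minimal sketch of the fix pattern, using a hypothetical LazyRuleMap class (not werkzeug's actual code, and Python 3 here for brevity), re-checks the dirty flag under a lock so no thread can ever observe a half-sorted list:

```python
import threading

class LazyRuleMap:
    """Hypothetical sketch of a lazily sorted rule map.

    Like werkzeug's Map, rules are appended unsorted and _remap marks
    the list as dirty. Unlike the code quoted in the thread, update()
    double-checks the flag under a lock: a concurrent caller either
    performs the sort itself or blocks until it is complete, so match()
    never runs against a partially sorted list.
    """

    def __init__(self):
        self._rules = []
        self._remap = True
        self._lock = threading.Lock()

    def add(self, rule):
        self._rules.append(rule)
        self._remap = True

    def update(self):
        if not self._remap:      # fast path: already sorted
            return
        with self._lock:
            if self._remap:      # re-check: another thread may have sorted
                self._rules.sort()
                self._remap = False

    def match(self, rule):
        self.update()            # called on every request, as in werkzeug
        return rule in self._rules
```

Until a fix of this shape lands upstream (issue #546 above tracks the report), the preemptive app.url_map.update() call in create_app() remains the practical workaround for apps that never register urls at runtime.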
