I've reported it to https://github.com/mitsuhiko/werkzeug/issues/546 .
On Sat, Jun 14, 2014 at 10:24 AM, Clemens Kolbitsch <[email protected]> wrote:
> Hi all!
>
> TL;DR: We found the issue; it's a race condition in python-werkzeug (used
> internally by Flask), so NOT uwsgi-related.
>
> Thank you all for the help you provided!
>
> I just wanted to post a quick follow-up on this anyway. A colleague of mine
> (much smarter than me ;) ) just found the root cause of the issue. It is NOT
> related to uwsgi, but is a race condition in werkzeug's routing layer, which
> orders the registered url rules lazily; that lazy ordering races during
> set-up.
>
> In the spirit of http://xkcd.com/979/, I'm posting a few details anyway,
> even though it's not uwsgi-related (quoting my colleague from here on):
>
> I was able to reproduce the issue reliably with 32 parallel requests by
> adding the following to the wsgi script:
>
>     sys.setcheckinterval(1)
>
> This forces the Python interpreter to switch threads more aggressively (the
> default is 100). At that point I even saw it happen right after startup,
> with or without lazy loading. So it's not an isolated issue of the respawn
> functionality - it appears to be more general.
>
> Looking at the route handling of flask, which uses werkzeug internally,
> there is a data structure, werkzeug.routing.Map, which holds all registered
> urls. To match rules efficiently upon a request, werkzeug needs the rules
> sorted. The sorting, however, is done lazily by a method called update() -
> self._remap is set to True whenever a new rule is added:
>
>     def update(self):
>         """Called before matching and building to keep the compiled rules
>         in the correct order after things changed.
>         """
>         if self._remap:
>             self._rules.sort(key=lambda x: x.match_compare_key())
>             for rules in self._rules_by_endpoint.itervalues():
>                 rules.sort(key=lambda x: x.build_compare_key())
>             self._remap = False
>
> In the setup code of our application, we register blueprints.
> This adds rules to the map, and self._remap is set to True - the Map is not
> sorted at this point. After our setup code, we return to uwsgi, which starts
> accepting possibly multiple requests in parallel that are all dispatched to
> Python. This means, however, that two threads can both see self._remap as
> True, and both of them start sorting the map in place.
>
> The problem happens when the first thread sets self._remap to False and a
> third thread starts using the Map assuming it is sorted: the second thread
> might just have started its sorting operation. I assume that during sort()
> in one thread, another thread can see an empty or partially sorted rule
> list. The matching then does not work correctly and you get a 404. I guess
> that's what one calls a race condition, and I'd actually be surprised if
> nothing worse could happen.
>
> Workaround:
>
> If one never registers new urls at runtime, preemptively calling
> app.url_map.update() in the setup code seems to be an acceptable solution.
> I was not able to reproduce the 404s afterwards (even with the check
> interval set to 1) after adding the following patch:
>
>     def create_app(...):
>         app = flask.Flask(__name__)
>         [...]
>
>         app.register_blueprint(mymodel, url_prefix="/foo")
>   +     # Make sure the routing.Map used by werkzeug is sorted, to avoid
>   +     # race conditions in Map.update().
>   +     app.url_map.update()
>         return app
>
> If urls are registered at runtime, this will obviously not help.
>
> /quote
>
> Maybe this will help others stumbling over the problem.
>
> -Clemens
>
>
> On Wed, Apr 9, 2014 at 8:56 AM, Clemens Kolbitsch <[email protected]>
> wrote:
>>
>> Hi,
>>
>> thanks for the suggestion!
>> However, it seems like all server processes are ready at this point.
>>
>> ## modified wsgi file:
>>
>> import os
>> pid = os.getpid()
>> print "loading stage 1", pid
>> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>> print "loading stage 2", pid
>>
>> application.logger.setLevel(logging.DEBUG)
>> for handler in log.logger().handlers:
>>     application.logger.addHandler(handler)
>> print "loading stage 3", pid
>>
>> ## updated logs
>>
>> ...The work of process 17248 is done. Seeya!
>> ...The work of process 17249 is done. Seeya!
>> worker 1 killed successfully (pid: 17248)
>> Respawned uWSGI worker 1 (new pid: 17424)
>> worker 2 killed successfully (pid: 17249)
>> Respawned uWSGI worker 2 (new pid: 17425)
>> loading stage 1 17425
>> loading stage 1 17424
>> loading stage 2 17424
>> loading stage 2 17425
>> loading stage 3 17424
>> loading stage 3 17425
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17424 (default app)
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17425 (default app)
>> ...The work of process 17425 is done. Seeya!
>> ...The work of process 17424 is done. Seeya!
>> worker 1 killed successfully (pid: 17424)
>> Respawned uWSGI worker 1 (new pid: 17956)
>> worker 2 killed successfully (pid: 17425)
>> Respawned uWSGI worker 2 (new pid: 17957)
>> loading stage 1 17957
>> loading stage 1 17956
>> loading stage 2 17956
>> loading stage 2 17957
>> loading stage 3 17956
>> loading stage 3 17957
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17956 (default app)
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2278480 pid: 17957 (default app)
>> [pid: 17956|app: 0|req: 2/3571] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 7 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
>> [pid: 17956|app: 0|req: 10/3573] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 16 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
>> [pid: 17956|app: 0|req: 18/3575] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:48:11 2014] GET /myapp/... => generated 238 bytes in 24 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 6)
>>
>> ## FWIW, I decided to do another test to see if new threads are spawned
>> before the function returns
>>
>> import os, threading
>> pid = os.getpid()
>> print "loading stage 1", pid, threading.active_count()
>> application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>> print "loading stage 2", pid, threading.active_count()
>>
>> application.logger.setLevel(logging.DEBUG)
>> for handler in log.logger().handlers:
>>     application.logger.addHandler(handler)
>> print "loading stage 3", pid, threading.active_count()
>>
>> and get
>>
>> Respawned uWSGI worker 1 (new pid: 23145)
>> worker 2 killed successfully (pid: 22771)
>> Respawned uWSGI worker 2 (new pid: 23146)
>> loading stage 1 23146 1
>> loading stage 1 23145 1
>> loading stage 2 23145 1
>> loading stage 3 23145 1
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23145 (default app)
>> loading stage 2 23146 1
>> loading stage 3 23146 1
>> WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x2340480 pid: 23146 (default app)
>> [pid: 23146|app: 0|req: 31/2387] 192.168.100.11 () {32 vars in 432 bytes} [Wed Apr 9 15:52:38 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 28)
>>
>> So, could it be that it's not uwsgi's fault after all? As mentioned
>> initially, the flask guys sent me to this mailing list, because they said
>> it's not flask's fault.
>> But, from all my debug logs, it seems like the first thread does the
>> set-up, and at the point where all the blueprints have been set up, there
>> is still only one thread... but of course I have no idea of the uwsgi
>> internals (and neither the flask ones) ;)
>>
>> -Clemens
>>
>>
>> On Wed, Apr 9, 2014 at 2:40 AM, Roberto De Ioris <[email protected]> wrote:
>>>
>>> > I modified the code as follows:
>>> >
>>> > [...]
>>> > print "loading stage 1"
>>> > application = myapp.create_app(CONFIG_FILE, None, log, strict=True)
>>> > print "loading stage 2"
>>> > application.logger.setLevel(logging.DEBUG)
>>> > for handler in log.logger().handlers:
>>> >     application.logger.addHandler(handler)
>>> > print "loading stage 3"
>>> >
>>> > and get the following logs
>>> >
>>> > /usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini 2>&1 | grep -v HTTP.....200
>>> > [uWSGI] getting INI configuration from /etc/uwsgi/apps-enabled/myapp.ini
>>> > *** Starting uWSGI 2.0.3 (64bit) on [Tue Apr 8 18:16:20 2014] ***
>>> > compiled with version: 4.6.3 on 08 April 2014 02:43:31
>>> > os: Linux-3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014
>>> > nodename: mynode
>>> > machine: x86_64
>>> > clock source: unix
>>> > detected number of CPU cores: 8
>>> > current working directory: /var/log
>>> > detected binary path: /usr/local/bin/uwsgi
>>> > !!! no internal routing support, rebuild with pcre support !!!
>>> > setgid() to 1010
>>> > setuid() to 1002
>>> > your processes number limit is 128020
>>> > your memory page size is 4096 bytes
>>> > detected max file descriptor number: 1024
>>> > lock engine: pthread robust mutexes
>>> > thunder lock: disabled (you can enable it with --thunder-lock)
>>> > uwsgi socket 0 bound to TCP address 127.0.0.1:4003 fd 3
>>> > Python version: 2.7.3 (default, Feb 27 2014, 20:09:21) [GCC 4.6.3]
>>> > Python main interpreter initialized at 0x1d5d480
>>> > python threads support enabled
>>> > your server socket listen backlog is limited to 100 connections
>>> > your mercy for graceful operations on workers is 60 seconds
>>> > mapped 2503992 bytes (2445 KB) for 150 cores
>>> > *** Operational MODE: preforking+threaded ***
>>> > *** uWSGI is running in multiple interpreter mode ***
>>> > spawned uWSGI master process (pid: 19627)
>>> > spawned uWSGI worker 1 (pid: 19629, cores: 75)
>>> > spawned uWSGI worker 2 (pid: 19630, cores: 75)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 2
>>> > loading stage 3
>>> > loading stage 3
>>> >
>>> > [...]
>>> >
>>> > worker 2 killed successfully (pid: 22063)
>>> > Respawned uWSGI worker 2 (new pid: 22494)
>>> > worker 1 killed successfully (pid: 22064)
>>> > Respawned uWSGI worker 1 (new pid: 22495)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 3
>>> > loading stage 2
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22494 (default app)
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22495 (default app)
>>> >
>>> > [...]
>>> >
>>> > ...The work of process 22494 is done. Seeya!
>>> > ...The work of process 22495 is done. Seeya!
>>> > worker 2 killed successfully (pid: 22494)
>>> > Respawned uWSGI worker 2 (new pid: 22685)
>>> > worker 1 killed successfully (pid: 22495)
>>> > Respawned uWSGI worker 1 (new pid: 22686)
>>> > loading stage 1
>>> > loading stage 1
>>> > loading stage 2
>>> > loading stage 2
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22686 (default app)
>>> > loading stage 3
>>> > WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1d5d480 pid: 22685 (default app)
>>> > [pid: 22685|app: 0|req: 3/4660] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 6 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 0)
>>> > [pid: 22686|app: 0|req: 16/4665] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 26 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 5)
>>> > [pid: 22686|app: 0|req: 54/4677] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 56 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 7)
>>> > [pid: 22686|app: 0|req: 54/4678] 192.168.100.11 () {32 vars in 432 bytes} [Tue Apr 8 18:18:07 2014] GET /myapp/... => generated 238 bytes in 58 msecs (HTTP/1.1 404) 2 headers in 72 bytes (1 switches on core 1)
>>> >
>>> > after this, the logs are empty again - that is, no more 404s for a
>>> > while (note that I grep out the HTTP 200s before and after)
>>> >
>>> > thanks!
>>> > -Clemens
>>>
>>> can you repeat the test printing the process pid (os.getpid()) at each
>>> line?
>>> In this way we can see if all the stages are executed before the
>>> "readiness" announce.
>>>
>>> Thanks
>>>
>>> --
>>> Roberto De Ioris
>>> http://unbit.it
>>> _______________________________________________
>>> uWSGI mailing list
>>> [email protected]
>>> http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi

--
INADA Naoki <[email protected]>
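A note for anyone replaying the reproduction on Python 3: sys.setcheckinterval() is Python 2 only (deprecated in 3.2, removed in 3.9); the analogue is sys.setswitchinterval(), which takes a time in seconds rather than a bytecode count. A version-agnostic sketch (the helper name is mine, not from the thread):

```python
import sys

def force_aggressive_thread_switching():
    """Shrink the interpreter's thread switch interval so that latent
    races (like the Map.update() one discussed in the thread) surface
    quickly. Returns the previous setting so callers can restore it."""
    if hasattr(sys, "setswitchinterval"):
        # Python 3.2+: time-based switching, default 0.005 seconds.
        old = sys.getswitchinterval()
        sys.setswitchinterval(1e-6)  # offer a switch as often as possible
        return old
    # Python 2: switching is considered every N bytecode instructions
    # (default 100); this is the sys.setcheckinterval(1) call from the
    # thread above.
    old = sys.getcheckinterval()
    sys.setcheckinterval(1)
    return old
```

Calling this once at the top of the wsgi file, as Clemens's colleague did, makes the 404s reproducible within a handful of parallel requests instead of appearing sporadically.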
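The failure mode in the thread is a classic check-then-act race: update() tests self._remap, sorts in place, then clears the flag, all without synchronization. A minimal sketch of the fix pattern, using a hypothetical LazyRuleMap class (not werkzeug's actual code, and Python 3 here for brevity), re-checks the dirty flag under a lock so no thread can ever observe a half-sorted list:

```python
import threading

class LazyRuleMap:
    """Hypothetical sketch of a lazily sorted rule map.

    Like werkzeug's Map, rules are appended unsorted and _remap marks
    the list as dirty. Unlike the code quoted in the thread, update()
    double-checks the flag under a lock: a concurrent caller either
    performs the sort itself or blocks until it is complete, so match()
    never runs against a partially sorted list.
    """

    def __init__(self):
        self._rules = []
        self._remap = True
        self._lock = threading.Lock()

    def add(self, rule):
        self._rules.append(rule)
        self._remap = True

    def update(self):
        if not self._remap:      # fast path: already sorted
            return
        with self._lock:
            if self._remap:      # re-check: another thread may have sorted
                self._rules.sort()
                self._remap = False

    def match(self, rule):
        self.update()            # called on every request, as in werkzeug
        return rule in self._rules
```

Until a fix of this shape lands upstream (issue #546 above tracks the report), the preemptive app.url_map.update() call in create_app() remains the practical workaround for apps that never register urls at runtime.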
