On Fri, Jun 14, 2013 at 6:46 AM, Roberto De Ioris <[email protected]> wrote:
> > Hi, uwsgi users,
> >
> > I need help trying to determine where a strange problem comes from and how
> > to fix it.
> >
> > TL;DR: the socket backlog is full for one of the vassals, with no visible
> > reason for it; only restarting the emperor helps.
> >
> > I've had this problem a couple of times, and each time only restarting
> > uwsgi solved it. nginx in the frontend started giving out 502 errors,
> > saying
> >
> > connect() to unix:///var/lib/crm/uwsgi_includes/slontur.socket failed (11: Resource temporarily unavailable)
> >
> > This led me to think that this might be due to an *overfilled socket
> > backlog*.
> >
> > "netstat" output had *hundreds of lines like this*:
> >
> > unix 2 [ ] STREAM CONNECTING 0 /var/lib/crm/uwsgi_includes/slontur.socket
> > unix 2 [ ] STREAM CONNECTING 0 /var/lib/crm/uwsgi_includes/slontur.socket
> > ....
> >
> > Unfortunately, I did not have time to inspect netstat's output more
> > carefully or to experiment with its command-line parameters.
> >
> > I could not understand where all of these connections were coming from.
> > Only nginx and the emperor could potentially make all these connections,
> > so I tried restarting nginx to see if it helped. It did not.
> >
> > Touching the ".ini" file did not reload the vassal either. Only
> > restarting the emperor itself helped.
> >
> > The project is not under heavy load, so *these connections were definitely
> > not coming from browsers at the same time.*
> > None of the background threads in the app (it runs background threads
> > with gevent) were running either, because there was nothing in the logs.
> >
> > *uwsgi.log did not have any messages either.*
> >
> > The app has "heartbeat = 20" in the .ini file, but that did not make it
> > reload by itself.
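A note on the symptom: a non-blocking connect() to a unix stream socket fails with EAGAIN (error 11, "Resource temporarily unavailable") when the listener's backlog is full, which matches the nginx error. On Linux, connections that are queued on a listener but not yet accept()ed are listed in /proc/net/unix with the listener's path and state 02 (SS_CONNECTING), which netstat prints as CONNECTING. The sketch below counts those entries without netstat. It assumes Linux and the /proc/net/unix column layout (Num RefCount Protocol Flags Type St Inode Path); `count_connecting` is a hypothetical helper name. It only counts queued connections and does not tell you which process initiated them.

```python
import sys

# Value of the hex "St" column in /proc/net/unix for SS_CONNECTING.
SS_CONNECTING = 0x02


def count_connecting(socket_path, proc_file="/proc/net/unix"):
    """Count sockets bound to socket_path that /proc/net/unix reports as CONNECTING.

    Columns: Num RefCount Protocol Flags Type St Inode [Path]. Connections
    queued on a listener but not yet accept()ed are listed with the
    listener's path and state SS_CONNECTING.
    """
    count = 0
    with open(proc_file) as f:
        next(f, None)  # skip the header line
        for line in f:
            # maxsplit=7 keeps a path containing spaces in one field
            fields = line.rstrip("\n").split(None, 7)
            if len(fields) < 8:
                continue  # unbound socket: no Path column
            if int(fields[5], 16) == SS_CONNECTING and fields[7] == socket_path:
                count += 1
    return count


if __name__ == "__main__":
    print(count_connecting(sys.argv[1]))
```

Run every few seconds against the vassal's socket path, a count that keeps growing while nginx returns 502s would suggest the uwsgi side is not accepting, rather than nginx opening an unusual number of connections.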
> >
> > Here are the configs:
> >
> > *The vassal, which stopped working*
> > [uwsgi]
> > logfile-chown = crm
> > disable-logging =
> > home = /var/lib/crm/.virtualenvs/crm
> > logto = /var/lib/crm/homes/%n/logs/uwsgi.log
> > gid = crm-%n
> > env = LC_ALL=en_US.UTF-8
> > env = LANG=en_US.UTF-8
> > env = SERVER_SOFTWARE=gevent
> > env = DJANGO_SETTINGS_MODULE=%n_settings
> > worker-reload-mercy = 5
> > gevent = 100
> > idle = 86400
> > harakiri = 30
> > reload-mercy = 5
> > lazy-apps = true
> > cheap = true
> > heartbeat = 20
> > pythonpath = /var/lib/crm/src
> > pythonpath = /var/lib/crm/homes/%n
> > harakiri-verbose =
> > uid = crm-%n
> > chdir = /var/lib/crm/homes/%n
> > wsgi = crm.deploy.gevent_wsgi
> > die-on-idle = true
> >
> > *Emperor:*
> > exec uwsgi --logto /var/log/uwsgi/emperor.log \
> >     --die-on-term --emperor "/var/lib/crm/uwsgi_includes" \
> >     --emperor-tyrant \
> >     --emperor-on-demand-directory "/var/lib/crm/uwsgi_includes"
> >
> > Thanks for your time,
> > Igor Katson.
> > _______________________________________________
> >
>
> You have a single process, so if your app is blocked (for whatever reason)
> your whole instance will be blocked. The best way to understand what is
> going on would be to add the stats server, so when the app is blocked you
> can ask it for the whole server status.
>
> The heartbeat ensures the master is alive, while in your case the worker is
> stuck.
>
> By the way, when you experience the problem, try only touching the config
> file. There is no need to reload the whole emperor stack.
>

I enabled the stats server so I can inspect the problem next time.
Unfortunately, I still can't find a way to see who is connecting to the unix
socket; there seems to be no suitable tool for that.

In the meantime, is there any way to make uwsgi kill the worker, or do
something similar, so that it self-heals in a situation like the one above?

Thanks, Roberto.
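Following Roberto's suggestion: once the stats server is enabled (for example with a `stats = /path/to/stats.socket` line in the vassal's .ini; the path is a placeholder), it writes a JSON document describing the instance to any client that connects and then closes the connection. Below is a minimal Python sketch that reads that dump over a unix socket and prints the fields most relevant to a stuck worker: the top-level `listen_queue` and each worker's `status` and `requests`. Those key names are assumed from typical uWSGI stats output and should be checked against what your version emits. `listen_queue` in particular may not be populated for every socket type. `read_stats` and `summarize` are hypothetical helper names.

```python
import json
import socket
import sys


def read_stats(stats_path, timeout=5.0):
    """Connect to a uWSGI stats server on a unix socket and return the parsed JSON.

    The stats server writes the whole document and then closes the
    connection, so we read until EOF.
    """
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(stats_path)
        while True:
            data = s.recv(65536)
            if not data:
                break  # server closed the connection: document complete
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def summarize(stats):
    """Pick out the fields that matter for a stuck worker.

    Key names (listen_queue, workers, id, status, requests) are assumed;
    .get() keeps this tolerant of versions that omit some of them.
    """
    lines = ["listen_queue: %s" % stats.get("listen_queue", "n/a")]
    for w in stats.get("workers", []):
        lines.append("worker %s: status=%s requests=%s"
                     % (w.get("id"), w.get("status"), w.get("requests")))
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize(read_stats(sys.argv[1])))
```

Run against the stats socket while the vassal is stuck. A high `listen_queue` together with the single worker reported as busy, with its request count no longer increasing, would point at the app blocking rather than at nginx or the emperor.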
_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
