I don't think it's a problem with that particular app. It was basically a vanilla Django install, and it works fine after the restart.

The real problem is that it cascades. Once one vassal starts to experience the problem, any vassal created or restarted from that point on also starts to see problems...

Just happened again :/


--
Harry Percival
Developer

On 10/03/14 13:56, Roberto De Ioris wrote:
Hi there,

It happened again today, so I tried to capture some more debug info.

Here are the logs from the emperor when I try to reload the vassal:

     2014-03-10 12:19:28 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
     2014-03-10 12:19:31 +0000 EMPEROR - emperor_respawn/write(): Broken pipe [core/emperor.c line 656]
     2014-03-10 12:19:31 +0000 EMPEROR - [emperor] reload the uwsgi instance redacted.pythonanywhere.com.ini
     2014-03-10 12:19:31 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
     2014-03-10 12:19:34 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
     2014-03-10 12:19:37 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]

You can see that the "No such process" error keeps recurring every few
seconds.
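To put a number on "every few seconds", the gaps between the failed kill() entries can be measured from the log text. This is only a rough sketch: the function name and the regular expression are made up for illustration, and the sample is the emperor excerpt above with the mail wrapping undone.

```python
import re
from datetime import datetime

# Matches the timestamp of an emperor log line that reports a failed kill().
KILL_FAILED = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"EMPEROR - \[emperor\] kill: No such process"
)


def kill_failure_gaps(log_text):
    """Return the gaps in seconds between consecutive failed-kill log lines."""
    stamps = []
    for line in log_text.splitlines():
        match = KILL_FAILED.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Emperor log lines from this message, one entry per line.
SAMPLE = """\
2014-03-10 12:19:28 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
2014-03-10 12:19:31 +0000 EMPEROR - emperor_respawn/write(): Broken pipe [core/emperor.c line 656]
2014-03-10 12:19:31 +0000 EMPEROR - [emperor] reload the uwsgi instance redacted.pythonanywhere.com.ini
2014-03-10 12:19:31 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
2014-03-10 12:19:34 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
2014-03-10 12:19:37 +0000 EMPEROR - [emperor] kill: No such process [core/emperor.c line 1699]
"""

print(kill_failure_gaps(SAMPLE))  # [3, 3, 3]
```

On this excerpt the failed kill is retried at a steady three-second interval.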

Here is the vassal's server log:

     2014-03-10 11:58:51 VACUUM: unix socket /var/sockets/redacted.pythonanywhere.com/socket removed.
     2014-03-10 11:58:53 *** Starting uWSGI 2.0 (64bit) on [Mon Mar 10 11:58:52 2014] ***
     2014-03-10 11:58:53 compiled with version: 4.8.1 on 07 February 2014 19:06:17
     2014-03-10 11:58:53 os: Linux-3.11.0-15-generic #25-Ubuntu SMP Thu Jan 30 17:22:01 UTC 2014
     2014-03-10 11:58:53 nodename: giles-liveweb2
     2014-03-10 11:58:53 machine: x86_64
     2014-03-10 11:58:53 clock source: unix
     2014-03-10 11:58:53 pcre jit disabled
     2014-03-10 11:58:53 detected number of CPU cores: 4
     2014-03-10 11:58:53 current working directory: /etc/uwsgi/vassals
     2014-03-10 11:58:53 detected binary path: /usr/local/bin/uwsgi
     2014-03-10 11:58:53 using Linux cgroup /mnt/cgroups/cpu/user_types/free with mode 700
     2014-03-10 11:58:53 assigned process 16789 to cgroup /mnt/cgroups/cpu/user_types/free/tasks
     2014-03-10 11:58:53 using Linux cgroup /mnt/cgroups/cpuacct/users/Redacted with mode 700
     2014-03-10 11:58:53 assigned process 16789 to cgroup /mnt/cgroups/cpuacct/users/Redacted/tasks
     2014-03-10 11:58:53 using Linux cgroup /mnt/cgroups/memory/user_types/free with mode 700
     2014-03-10 11:58:53 assigned process 16789 to cgroup /mnt/cgroups/memory/user_types/free/tasks
     2014-03-10 11:58:53 uWSGI running as root, you can use --uid/--gid/--chroot options
     2014-03-10 11:58:53 chroot() to /mnt/chroots/Redacted
     2014-03-10 11:58:53 setgid() to 60000
     2014-03-10 11:58:53 setuid() to 231762
     2014-03-10 11:58:53 limiting number of processes to 64...
     2014-03-10 11:58:53 your processes number limit is 64
     2014-03-10 11:58:53 your memory page size is 4096 bytes
     2014-03-10 11:58:53 detected max file descriptor number: 123456
     2014-03-10 11:58:53 building mime-types dictionary from file /etc/mime.types...
     2014-03-10 11:58:53 536 entry found
     2014-03-10 11:58:53 lock engine: pthread robust mutexes
     2014-03-10 11:58:53 thunder lock: disabled (you can enable it with --thunder-lock)
     2014-03-10 11:58:53 uwsgi socket 0 bound to UNIX address /var/sockets/redacted.pythonanywhere.com/socket fd 7
     2014-03-10 11:58:53 Python version: 2.7.5+ (default, Sep 19 2013, 13:52:09)  [GCC 4.8.1]
     2014-03-10 11:58:53 *** Python threads support is disabled. You can enable it with --enable-threads ***
     2014-03-10 11:58:53 Python main interpreter initialized at 0x1021bb0
     2014-03-10 11:58:53 your server socket listen backlog is limited to 100 connections
     2014-03-10 11:58:53 your mercy for graceful operations on workers is 60 seconds
     2014-03-10 11:58:53 setting request body buffering size to 65536 bytes
     2014-03-10 11:58:53 mapped 333936 bytes (326 KB) for 1 cores
     2014-03-10 11:58:53 *** Operational MODE: single process ***
     2014-03-10 11:58:53 WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x1021bb0 pid: 16789 (default app)
     2014-03-10 11:58:53 *** uWSGI is running in multiple interpreter mode ***
     2014-03-10 11:58:53 spawned uWSGI master process (pid: 16789)
     2014-03-10 11:58:53 spawned uWSGI worker 1 (pid: 16790, cores: 1)
     2014-03-10 11:58:53 spawned 2 offload threads for uWSGI worker 1
     2014-03-10 11:58:57 announcing my loyalty to the Emperor...
     2014-03-10 12:01:14 Mon Mar 10 12:01:14 2014 - received message 0 from emperor
     2014-03-10 12:01:14 SIGINT/SIGQUIT received...killing workers...
     2014-03-10 12:01:15 worker 1 buried after 1 seconds
     2014-03-10 12:01:15 goodbye to uWSGI.
     2014-03-10 12:01:15 chdir(): No such file or directory [core/uwsgi.c line 1472]
     2014-03-10 12:01:15 VACUUM: unix socket /var/sockets/redacted.pythonanywhere.com/socket removed.

You'll notice the logs are from an earlier reload. Later reloads don't
even seem to log anything any more.

And here is the vassal config:

     [uwsgi]
     plugins = python27
     uid = 231762
     gid = 60000

     if-not-exists = /mnt/chroots/Redacted/bin/ls
     exec-pre-jail = python /home/anywhere/django/anywhere/jails/create.py Redacted
     endif =
     chroot = /mnt/chroots/Redacted
     limit-nproc = 64
     # shutdown app (but not master) after 26hrs of no hits
     idle=93600
     # kill any requests that take too long to process
     harakiri = 300
     buffer-size = 32768
     post-buffering = 65536
     vacuum =
     # chrooted master cannot reload itself, so just exit
     exit-on-reload = true
     # file lock prevents respawning vassals from racing dying ones
     flock = %p

     log-encoder = format redacted.pythonanywhere.com ${strftime:%%F %%T} ${msg}
     logger = rsyslog:10.124.106.197:10515,uwsgi,142

     workers = 1
     cgroup = /mnt/cgroups/cpu/user_types/free
     cgroup = /mnt/cgroups/cpuacct/users/Redacted
     cgroup = /mnt/cgroups/memory/user_types/free

     auto-procname
     procname-prefix-spaced = Redacted Redacted.pythonanywhere.com
     disable-logging = true

     check-static=/var/www/static

     static-map = /static/admin/=/home/Redacted/.virtualenvs/django16/lib/python2.7/site-packages/django/contrib/admin/static/admin

     static-index = index.html
     offload-threads = 2

     touch-reload = /var/www/redacted_pythonanywhere_com_wsgi.py
     socket = /var/sockets/redacted.pythonanywhere.com/socket
     chmod-socket = 666
     chdir = /var/www
     env = HOST_NAME=redacted.pythonanywhere.com
     env = WSGI_MODULE=redacted_pythonanywhere_com_wsgi

     env = no_proxy=localhost,127.0.0.1,localaddress,.localdomain.com

     env = HOME=/home/Redacted

     env = http_proxy=http://proxy.server:3128

     env = PYENCHANT_LIBRARY_PATH=/usr/lib/libenchant.so.1

     env = https_proxy=http://proxy.server:3128

     env = PATH=/home/Redacted/.local/bin:/usr/local/bin:/usr/bin:/bin
     unenv = UWSGI_EMPEROR_FD
     unenv = SHLVL
     unenv = SSH_TTY
     unenv = PWD
     unenv = UWSGI_RELOADS
     unenv = SSH_CLIENT
     unenv = LOGNAME
     unenv = UWSGI_ORIGINAL_PROC_NAME
     unenv = MAIL
     unenv = SSH_CONNECTION
     unenv = _

     file = /bin/user_wsgi_wrapper.py


I've checked the stats server, and there aren't any vassals in the blacklist.
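For anyone wanting to script this kind of check, here is a rough sketch. It assumes the emperor stats server is reachable over TCP and sends a JSON document on connect, with a `vassals` list whose entries carry `id` and `pid` fields. Those field names should be confirmed against your uWSGI version. The helper names are made up for illustration, and no host or port is implied by this thread. It needs to run on the emperor's host, in the same pid namespace, for the pid check to mean anything.

```python
import errno
import json
import os
import socket


def read_emperor_stats(host, port):
    """Read the JSON document the stats server sends on connect (assumed TCP)."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


def pid_alive(pid):
    """Return True if a process with this pid exists (Unix only)."""
    if pid <= 0:
        # kill(0, ...) and negative pids address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except OSError as exc:
        # ESRCH is "No such process"; EPERM means it exists but belongs to someone else.
        return exc.errno != errno.ESRCH
    return True


def stale_vassals(stats, is_alive=pid_alive):
    """List ids of vassals the emperor still tracks although their pid is gone."""
    return [v["id"] for v in stats.get("vassals", []) if not is_alive(v["pid"])]
```

A vassal that shows up in `stale_vassals(...)` but not in the blacklist would match the symptom described here: the emperor keeps trying to kill a pid that no longer exists.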


Bouncing uWSGI fixes the problem, but obviously it involves downtime, so
we'd rather avoid it if possible.



Hi Harry, don't do that. Basically, your vassal is not removed from the
linked list because the process mapped to it is no longer available (it is
hard to say why). Removing the file from the vassal dir (or rather,
renaming it to .off) should be enough.
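The rename can be scripted. This is a minimal sketch with hypothetical helper names, assuming the emperor only picks up files with a config extension such as .ini, so that "name.ini.off" is ignored. The demo runs against a throwaway directory rather than a real vassal dir.

```python
import os
import tempfile


def disable_vassal(config_path):
    """Rename 'name.ini' to 'name.ini.off' so the emperor no longer sees it."""
    off_path = config_path + ".off"
    os.rename(config_path, off_path)
    return off_path


def enable_vassal(off_path):
    """Undo disable_vassal() by stripping the trailing '.off'."""
    if not off_path.endswith(".off"):
        raise ValueError("expected a path ending in .off: %r" % off_path)
    config_path = off_path[: -len(".off")]
    os.rename(off_path, config_path)
    return config_path


# Demo in a temporary directory standing in for the vassal dir.
with tempfile.TemporaryDirectory() as vassal_dir:
    ini = os.path.join(vassal_dir, "example.ini")
    open(ini, "w").close()

    off = disable_vassal(ini)
    was_disabled = os.path.exists(off) and not os.path.exists(ini)

    restored = enable_vassal(off)
    was_restored = os.path.exists(restored) and not os.path.exists(off)
```

In practice you would probably wait for the emperor's next scan between the two renames, so that it has a chance to notice the file has gone.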

By the way, the latest code improves that corner case too:

https://github.com/unbit/uwsgi/commit/c118c75bfe5ed6b26668aa48ae076dddcf31a5b9


Basically, if killing the process is not possible, the memory area is
removed from the list (so the vassal can be restarted). If for some reason
the pid has changed, you will get a zombie, but the master will clear it
sooner or later.

If you use the pid namespace (this is very easy: just add
emperor-use-clone = pid to your emperor config), you can be sure that once
the vassal master is dead, no user processes are left (not even daemons
spawned by your customers), as the master is the new init for the vassal.


Let me know



_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
