OK, we've lowered the worker-reload-mercy setting to 5 seconds to alleviate
the most immediate concern (a worker hanging for a minute before being
killed).
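
For reference, the change amounts to one extra line in the app's ini file.
This is only a sketch of what we added: the option name and the 5 second
value are the ones mentioned above, and the rest of the config is unchanged.

```ini
[uwsgi]
; seconds the master waits for a worker to exit during a reload
; before killing it (default is 60)
worker-reload-mercy = 5
```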

Do you have any other suggestions for debugging this? We aren't sure
whether the hang is in our code or in uwsgi.
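
One thing we may try in the meantime is uwsgi's Python tracebacker, to see
what the Python threads in a stuck worker are doing. This is an untested
sketch: the socket path is a placeholder, and we have not confirmed that the
tracebacker still responds once a worker is in this state.

```ini
[uwsgi]
; hypothetical path; uwsgi appends the worker number to it
py-tracebacker = /tmp/tbsocket
```

With that in place, running "uwsgi --connect-and-read /tmp/tbsocket1" should
print the current Python stack for worker 1.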

Thanks,
Andy


On Tue, Jul 22, 2014 at 6:58 AM, Roberto De Ioris <[email protected]> wrote:

>
> > We are having a problem when we are restarting our app that runs under
> > emperor mode. Sometimes, when we reload the config (an ini file) one or
> > two workers will not die and will start to consume 100% of a cpu, and then
> > die off ~60 seconds later. This will sporadically happen no matter how
> > many workers we spawn.
> >
> > We are running Django under uwsgi (version 2.0.5) on Ubuntu 14.04 on
> > Amazon EC2.
> >
> > Configs, logs and strace output (for one of the workers that hung) are
> > below. Has anyone seen/experienced this problem before? My assumption for
> > the 60 second time is the harakiri time, though I'm not 100% sure on that.
> >
> > Here's the emperor log when a worker was hung:
> > Mon Jul 21 22:39:41 2014 - [emperor] reload the uwsgi instance <app>
> > Mon Jul 21 22:40:44 2014 - [emperor] vassal <app> is ready to accept requests
> >
> > Here's our app ini config (some info removed, though all commands that
> > are in the config are here):
> > [uwsgi]
> > uid = <uid>
> > gid = <gid>
> > socket = 127.0.0.1:<port>
> > listen = 16384
> > workers = 4
> > threads = 2
> > thunder-lock = true
> > max-requests = 20000
> > harakiri = 60
> > harakiri-verbose = true
> > master = true
> > single-interpreter = true
> > virtualenv = <virtualenv>
> > pythonpath = <pythonpath>
> > env = DJANGO_SETTINGS_MODULE=<module>
> > module = <wsgi_file>
> > pidfile2 = <pidfile>
> > logto2 = <logfile>
> > logfile-chmod = 644
> > stats = 127.0.0.1:<stats_port>
> > post-buffering = 65536
> > buffer-size = 32768
> > disable-logging = true
> > chdir = <dir>
> >
> > I was able to get an strace off of one of the hung workers, and this is
> > what I got (starting from when they get the signal to reload):
> > close(4)                                = 0
> > futex(0x7f3a15c37000, FUTEX_LOCK_PI, 1) = ? ERESTARTNOINTR (To be restarted)
> > --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=1660, si_uid=601} ---
> > write(2, "Gracefully killing worker 6 (pid"..., 44) = -1 EPIPE (Broken pipe)
> > open("/usr/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> > open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
> > fstat(4, {st_mode=S_IFREG|0644, st_size=46184, ...}) = 0
> > mmap(NULL, 46184, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f3a15b3d000
> > close(4)                                = 0
> > access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> > open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 4
> > read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260*\0\0\0\0\0\0"..., 832) = 832
> > fstat(4, {st_mode=S_IFREG|0644, st_size=90080, ...}) = 0
> > mmap(NULL, 2185952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7f3a09947000
> > mprotect(0x7f3a0995d000, 2093056, PROT_NONE) = 0
> > mmap(0x7f3a09b5c000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x15000) = 0x7f3a09b5c000
> > close(4)                                = 0
> > munmap(0x7f3a15b3d000, 46184)           = 0
> > tgkill(16665, 16668, SIGRTMIN)          = 0
> > rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3a15826340}, {0x460790, [], SA_RESTORER, 0x7f3a15826340}, 8) = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a09907000
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > munmap(0x7f3a098c7000, 262144)          = 0
> > mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3a098c7000
> > +++ killed by SIGKILL +++
> >
> > Any help would be appreciated. If anyone wants any other info, just let me
> > know and I'll supply it.
> >
> > Thanks,
> > Andy
> > _______________________________________________
> > uWSGI mailing list
> > [email protected]
> > http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
> >
>
> The 60 second timeout is the --worker-reload-mercy value (default 60). It is
> the maximum amount of time the master will wait for a worker to die (after
> which the master will send it a kill -9). As a worker is free to ignore
> signals from the master, this timeout is a safety measure to avoid the
> master hanging forever.
>
> Unfortunately your strace does not show anything useful that could explain
> why your worker hung, but if 60 seconds is too long, just tune it to a
> lower value.
>
>
> --
> Roberto De Ioris
> http://unbit.it
>
