Re: [Mailman-Users] Stuck OutgoingRunner

2018-03-16 Thread Sebastian Hagedorn
It happened again yesterday. Details below.
--On 7 February 2018 at 12:43:18 +0900 Yasuhito FUTATSUKI wrote:
In fact,
On 02/02/18 19:26, Sebastian Hagedorn wrote:
[root@mailman3/usr/lib/mailman/bin]$ strace -p 1677
Process 1677 attached
recvfrom(10, ^CProcess 1677

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Yasuhito FUTATSUKI
On 02/07/18 01:01, Mark Sapiro wrote:
On 02/06/2018 03:51 AM, Sebastian Hagedorn wrote:
--On 4 February 2018 at 12:54:43 +0900 Yasuhito FUTATSUKI wrote:
As far as I read the code, if OutgoingRunner catches SIGINT while waiting for a response from the MTA, the signal

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Mark Sapiro
On 02/06/2018 03:48 AM, Sebastian Hagedorn wrote:
> Is it possible that the OutgoingRunner was done with transmitting the
> message and had already removed the queue file, but that the connection
> hadn't yet been closed?
Only if something went very wrong in SMTPDirect.process() which would

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Sebastian Hagedorn
--On 6 February 2018 at 08:01:18 -0800 Mark Sapiro wrote:
On 02/06/2018 03:51 AM, Sebastian Hagedorn wrote:
--On 4 February 2018 at 12:54:43 +0900 Yasuhito FUTATSUKI wrote:
As far as I read the code, if OutgoingRunner catches SIGINT while waiting for

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Mark Sapiro
On 02/06/2018 03:51 AM, Sebastian Hagedorn wrote:
> --On 4 February 2018 at 12:54:43 +0900 Yasuhito FUTATSUKI wrote:
>> As far as I read the code, if OutgoingRunner catches SIGINT while waiting
>> for a response from the MTA, the signal handler for SIGINT in qrunner set

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Sebastian Hagedorn
--On 4 February 2018 at 12:54:43 +0900 Yasuhito FUTATSUKI wrote:
On 02/04/18 12:13, Mark Sapiro wrote:
The status of 'S' for OutgoingRunner is "uninterruptable sleep". This means it's either called time.sleep for QRUNNER_SLEEP_TIME (default = 1 second) which is

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-06 Thread Sebastian Hagedorn
--On 3 February 2018 at 19:13:33 -0800 Mark Sapiro wrote:
On 02/03/2018 01:03 AM, Sebastian Hagedorn wrote:
Did you look at the out queue, and if so, was there a .bak file there? This would be the entry currently being processed.
I looked at the out queue, and there was no

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-03 Thread Yasuhito FUTATSUKI
On 02/04/18 12:13, Mark Sapiro wrote:
The status of 'S' for OutgoingRunner is "uninterruptable sleep". This means it's either called time.sleep for QRUNNER_SLEEP_TIME (default = 1 second) which is unlikely as it should wake up, or it's waiting for response from something, most likely a response

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-03 Thread Mark Sapiro
On 02/03/2018 01:03 AM, Sebastian Hagedorn wrote:
>> Did you look at the out queue, and if so, was there a .bak file there?
>> This would be the entry currently being processed.
>
> I looked at the out queue, and there was no .bak file.
Interesting. That says that OutgoingRunner is not

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-03 Thread Sebastian Hagedorn
Thanks for your reply!
On 02/02/2018 02:26 AM, Sebastian Hagedorn wrote:
[root@mailman3/usr/lib/mailman/bin]$ lsof -p 1677
COMMAND    PID    USER  FD  TYPE DEVICE SIZE/OFF   NODE NAME
python2.7 1677 mailman cwd   DIR  253,0     4096 173998 /usr/lib/mailman
python2.7 1677 mailman rtd

Re: [Mailman-Users] Stuck OutgoingRunner

2018-02-02 Thread Mark Sapiro
On 02/02/2018 02:26 AM, Sebastian Hagedorn wrote:
> Hi,
>
> we've been running Mailman for many years and have never had stability
> issues, but about a month ago we moved the server from RHEL 5 to RHEL 6
> and to the current version (2.1.25), and since then it has already
> happened twice that

[Mailman-Users] Stuck OutgoingRunner

2018-02-02 Thread Sebastian Hagedorn
Hi, we've been running Mailman for many years and have never had stability issues, but about a month ago we moved the server from RHEL 5 to RHEL 6 and to the current version (2.1.25), and since then it has already happened twice that one of our four OutgoingRunners got "stuck" and stopped