Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-21 Thread Mark Sapiro
On 02/20/2014 02:04 PM, Lindsay Haisley wrote:
 
 Here's a sampling of the qrunner log from the wee hours, before I
 started poking at the problem to try to fix it:
 
 Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2445) BounceRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2445) BounceRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2446) CommandRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2446) CommandRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2451) RetryRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2443) Master watcher caught SIGINT.  Restarting.
 Feb 20 03:22:02 2014 (2444) ArchRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2444) ArchRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2448) NewsRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2450) VirginRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2449) OutgoingRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2451) RetryRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2448) NewsRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2450) VirginRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2445, sig: None, sts: 2, class: BounceRunner, slice: 1/1) [restarting]
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2446, sig: None, sts: 2, class: CommandRunner, slice: 1/1) [restarting]
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2451, sig: None, sts: 2, class: RetryRunner, slice: 1/1) [restarting]
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2448, sig: None, sts: 2, class: NewsRunner, slice: 1/1) [restarting]
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2444, sig: None, sts: 2, class: ArchRunner, slice: 1/1) [restarting]
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2447, sig: None, sts: 2, class: IncomingRunner, slice: 1/1) [restarting]


OK, from what you report here and elsewhere, it appears the issue was
with OutgoingRunner not processing Mailman's 'out' queue. If the above
log excerpt (which appears to be from a mailmanctl restart) is complete,
you will note that there are three entries for most runners, e.g.

 Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
 Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
 Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
 (pid: 2447, sig: None, sts: 2, class: IncomingRunner, slice: 1/1) [restarting]

But there is only one for OutgoingRunner:

 Feb 20 03:22:02 2014 (2449) OutgoingRunner qrunner caught SIGINT.  Stopping.

suggesting that it was hung and never terminated.
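
The comparison described above can be automated. What follows is a minimal
sketch in Python, not part of Mailman: it assumes the two qrunner log message
formats seen in the excerpt ("caught SIGINT" and "exiting") and reports any
runner that logged the first without the second. The names RUNNER_LINE,
runners_that_never_exited and SAMPLE are made up for illustration.

```python
import re

# Matches qrunner log lines such as:
#   Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
#   Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
# Lines from the master ("Master watcher ...", "Master qrunner ...") do not match.
RUNNER_LINE = re.compile(
    r"\((?P<pid>\d+)\)\s+(?P<runner>\w+Runner) qrunner "
    r"(?P<event>caught SIGINT|exiting)"
)


def runners_that_never_exited(log_lines):
    """Return {pid: runner} for runners that caught SIGINT but logged no exit."""
    pending = {}
    for line in log_lines:
        match = RUNNER_LINE.search(line)
        if not match:
            continue
        pid = int(match.group("pid"))
        if match.group("event") == "caught SIGINT":
            pending[pid] = match.group("runner")
        else:
            # An "exiting" line clears the runner from the pending set.
            pending.pop(pid, None)
    return pending


# A few lines copied from the excerpt, for demonstration only.
SAMPLE = """\
Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
Feb 20 03:22:02 2014 (2443) Master watcher caught SIGINT.  Restarting.
Feb 20 03:22:02 2014 (2449) OutgoingRunner qrunner caught SIGINT.  Stopping.
""".splitlines()

print(runners_that_never_exited(SAMPLE))
```

Run against the full excerpt quoted above, the same function should flag only
pid 2449 (OutgoingRunner), since every other runner has a matching "exiting"
line.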

Had it been me at that point, I would have stopped Mailman and made sure
it was completely stopped per the FAQ at http://wiki.list.org/x/_4A9,
and then started it to see if that fixed the problem. If the out queue
were still not being processed, I would try to trace the OutgoingRunner
process to see where it was hung.

-- 
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-21 Thread Lindsay Haisley
On Fri, 2014-02-21 at 05:56 -0800, Mark Sapiro wrote:
 OK, From what you report here and elsewhere, it appears the issue was
 with OutgoingRunner not processing Mailman's 'out' queue. If the above
 log excerpt (appears to be from a mailmanctl restart) is complete, you
 will note that there are three entries for most runners, e,g.
 
Well, I can't reproduce the Mailman list problem today.  I (re)updated
the kernel version to 3.2.0-59-generic x86_64 and everything seems to be
working OK.  The only obvious change is the aforementioned issue with
the bind9 name server daemon, which now requires an explicit IPv4 network
interface ACL instead of defaulting to listening on ALL interfaces if
the IPv4 ACL is absent in the config.  I didn't update bind, so this is
kernel-dependent.  It's possible that some features of v4 and v6 address
handling have been merged (there's an IPv6 interface ACL in the bind9
config).  It's also possible that attempts at name resolution hung up
Mailman's qrunner.  If it won't break, I can't fix it :-S

Thanks for your time and attention.
 
-- 
Lindsay Haisley       | Everything works if you let it
FMP Computer Services |
512-259-1190          |  --- The Roadie
http://www.fmp.com    |



Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-21 Thread Mark Sapiro
On 02/21/2014 01:03 PM, Lindsay Haisley wrote:

 It's also possible that attempts at name resolution hung up
 Mailman's qrunner.


That's a possibility, but the only name OutgoingRunner looks up is the
value of SMTPHOST, which in a more or less default Mailman is 'localhost',
and that should be resolved through /etc/hosts.
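
One way to check whether resolution of that name is slow or hanging is to time
the lookup directly. This is a rough sketch under assumptions, not anything
Mailman ships: it is written for Python 3 (the server in this thread ran
2.7.3), it assumes SMTPHOST is 'localhost' and the port is 25, and the
function name timed_lookup is hypothetical. The resolver argument exists only
so the lookup can be swapped out when testing.

```python
import socket
import time


def timed_lookup(host, port=25, resolver=socket.getaddrinfo):
    """Resolve host and return (seconds_taken, sorted unique addresses)."""
    start = time.monotonic()
    # Ask for stream (TCP) results only, as an SMTP client would use.
    results = resolver(host, port, 0, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in results})
    return elapsed, addresses
```

Calling timed_lookup("localhost") on the affected box under each kernel and
comparing the elapsed time and the returned addresses (IPv4 only, or IPv6 as
well) would show whether name resolution behaves differently.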

It is also possible that OutgoingRunner got wedged for some transient
reason unrelated to the kernel upgrade.


 If it won't break, I can't fix it :-S

True ...

-- 
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


[Mailman-Users] Kernel update breaks Mailman!!

2014-02-20 Thread Lindsay Haisley
I'm running Mailman 2.1.15 on an Ubuntu server, feeding into Courier MTA,
running Python 2.7.3.  I track security updates and install them
promptly when they're issued by Ubuntu.  Yesterday I updated the Linux
kernel from 3.2.0-58-generic (x86_64) to 3.2.0-59-generic and Mailman
quit working.  List posts made it through to the archives, and were
apparently queued within Mailman, but wouldn't go out.  The mail server
was working OK for non-list email. Today I backed out the kernel update
and posts to lists sent yesterday and today are going out without
problems.

I can find nothing in the Mailman logs, or in the mail server logs,
indicating a problem.  

This is the first time I've ever had a problem with a server kernel
update breaking something related to mail.  Has anyone else seen this
problem?  Does anyone have any insight into how to address it?

-- 
Lindsay Haisley       | Everything works if you let it
FMP Computer Services |
512-259-1190          |  --- The Roadie
http://www.fmp.com    |





Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-20 Thread Mark Sapiro
On 02/20/2014 10:07 AM, Lindsay Haisley wrote:
 I'm running Mailman 2.1.15 on a Ubuntu server, feeding into Courier MTA,
 running Python 2.7.3.  I track security updates and install them
 promptly when they're issued by Ubuntu.  Yesterday I updated the Linux
 kernel from 3.2.0-58-generic (x86_64) to 3.2.0-59-generic and Mailman
 quit working.  List posts made it through to the archives, and were
 apparently queued within Mailman, but wouldn't go out.  The mail server
 was working OK for non-list email. Today I backed out the kernel update
 and posts to lists sent yesterday and today are going out without
 problems.


What's in Mailman's 'post' and 'smtp' logs for these messages? Are they
timestamped before or after you backed out the update? If before, they
were queued in the MTA. If after, they were in Mailman's 'out' queue.
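
That before/after test can be expressed as a short script. This is a sketch
only, in Python 3: it assumes each log line begins with a timestamp in the
same layout as the qrunner log entries in this thread ("Feb 20 03:22:02
2014"), that the system locale uses English month abbreviations, and that the
time the kernel update was backed out is known. The function name
where_was_it_queued is hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%b %d %H:%M:%S %Y"            # e.g. "Feb 20 03:22:02 2014"
STAMP_LENGTH = len("Feb 20 03:22:02 2014")    # timestamps are fixed width


def where_was_it_queued(log_line, rollback_time):
    """Classify one 'smtp' log entry by its timestamp.

    An entry logged before the rollback means Mailman had already handed
    the message to the MTA; an entry logged after it means the message sat
    in Mailman's 'out' queue until then.
    """
    stamp = datetime.strptime(log_line[:STAMP_LENGTH], STAMP_FORMAT)
    if stamp < rollback_time:
        return "MTA queue"
    return "Mailman 'out' queue"
```

For example, with an assumed rollback time of datetime(2014, 2, 20, 12, 0, 0),
an entry stamped "Feb 20 14:10:05 2014" would be classified as having waited
in Mailman's 'out' queue.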

If the latter, what's in Mailman's 'qrunner' log related to OutgoingRunner?

-- 
Mark Sapiro m...@msapiro.net        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-20 Thread Lindsay Haisley
On Thu, 2014-02-20 at 10:37 -0800, Mark Sapiro wrote:
 On 02/20/2014 10:07 AM, Lindsay Haisley wrote:
  I'm running Mailman 2.1.15 on a Ubuntu server, feeding into Courier MTA,
  running Python 2.7.3.  I track security updates and install them
  promptly when they're issued by Ubuntu.  Yesterday I updated the Linux
  kernel from 3.2.0-58-generic (x86_64) to 3.2.0-59-generic and Mailman
  quit working.  List posts made it through to the archives, and were
  apparently queued within Mailman, but wouldn't go out.  The mail server
  was working OK for non-list email. Today I backed out the kernel update
  and posts to lists sent yesterday and today are going out without
  problems.
 
 
 What's in Mailman's 'post' and 'smtp' logs for these messages. Are they
 timestamped before or after you backed out the update. If before, they
 were queued in the MTA. If after, they were in Mailman's 'out' queue.

They weren't in the MTA's queue.  Looking at the count of messages in
the MTA queue was how I determined that list posts weren't being
delivered to the MTA by Mailman.  I restarted qrunner and it didn't make
any difference.  The mail queue had like 67 messages in it.  This would
go up to 68 or 69 at times and then fall back down again - normal
behavior.  I could send and receive mail.  All indications are that the
MTA was working normally.  Mailman lists run from several hundred to a
couple of thousand subscribers and if someone posts to a list the MTA
mail queue shoots up to hundreds of messages with VERP sender addresses
shown in the queue summary, and then works its way back down.

 If the latter, what's in Mailman's 'qrunner' log related to OutgoingRunner.

Here's a sampling of the qrunner log from the wee hours, before I
started poking at the problem to try to fix it:

Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
Feb 20 03:22:02 2014 (2445) BounceRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2445) BounceRunner qrunner exiting.
Feb 20 03:22:02 2014 (2446) CommandRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2446) CommandRunner qrunner exiting.
Feb 20 03:22:02 2014 (2451) RetryRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2443) Master watcher caught SIGINT.  Restarting.
Feb 20 03:22:02 2014 (2444) ArchRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2444) ArchRunner qrunner exiting.
Feb 20 03:22:02 2014 (2448) NewsRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2450) VirginRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2449) OutgoingRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2451) RetryRunner qrunner exiting.
Feb 20 03:22:02 2014 (2448) NewsRunner qrunner exiting.
Feb 20 03:22:02 2014 (2450) VirginRunner qrunner exiting.
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2445, sig: None, sts: 2, class: BounceRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2446, sig: None, sts: 2, class: CommandRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2451, sig: None, sts: 2, class: RetryRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2448, sig: None, sts: 2, class: NewsRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2444, sig: None, sts: 2, class: ArchRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2447, sig: None, sts: 2, class: IncomingRunner, slice: 1/1) [restarting]

FWIW, another very strange thing happened after the kernel upgrade,
totally unrelated to mail.  I run bind9 on the same server, and it
provides recursive DNS for all our in-house boxes coming from our LAN
through our VPN to our server.  This has been working fine for some
time, but after the kernel upgrade it quit working.  The bind9 config
specifies that if there's no ACL in the bind config then bind listens on
ALL interfaces.  There was an interface ACL for IPv6 but none for v4.
After the upgrade, bind no longer worked for us as our recursive server
UNLESS I provided a v4 interface ACL, which I did, and it started
working again.  Go figure.


-- 
Lindsay Haisley       | Everything works if you let it
FMP Computer Services |
512-259-1190          |  --- The Roadie
http://www.fmp.com    |



Re: [Mailman-Users] Kernel update breaks Mailman!!

2014-02-20 Thread Lindsay Haisley
On Thu, 2014-02-20 at 14:39 -0500, Barry Warsaw wrote:
 I'm really quite surprised about this.  From the kernel version numbers, I'm
 guessing you're running Ubuntu 12.04 LTS?

You are correct about the Ubuntu release on the box.

My next step would be to reinstall the new kernel version and test this
issue in a controlled way, but before I do so I'd like to have some
diagnostic code or configs in place which will track and log the
intimate details of the relationship between Mailman and the MTA.  Any
suggestions will be appreciated!

I'm also going to post to the very active courier-users list and see if
anyone else is having this or similar problems.

-- 
Lindsay Haisley       | Everything works if you let it
FMP Computer Services |
512-259-1190          |  --- The Roadie
http://www.fmp.com    |

