Re: [Mailman-Users] Mailman throughput

2011-08-15 Thread Brad Knowles

On 08/14/2011 11:24 PM, Ivan Fetch wrote:


Brad, I think we are already accomplishing a lot of this minimalism,
since the MTA on the Mailman VM is only accepting the message via SMTP,
then handing it off to Mailman via the Postfix aliases. The spam and
other checks are done before hand, by another upstream gateway MTA. That
gateway then hands mailing list messages off to the Mailman box.


You're talking about inbound, and how you have outsourced many of these 
kinds of checks to other boxes.  That's fine as far as it goes, but I 
was talking about *outbound*, from Mailman to the world of recipients.



You are likely to have a certain number of messages coming into your 
system which will require a certain amount of processing to scan them 
for viruses and spam, etc.


However, on outbound, you will presumably have this same number of 
messages multiplied by the number of recipients.


If that's an average of ten recipients per list, then you have a factor 
of ten increase in the amount of work done to scan those messages for 
viruses and spam -- and since all those messages are largely identical 
in those regards, that's all wasted work, and therefore that's all work 
that you want to avoid to the greatest degree possible.


As you scale up to thousands, tens of thousands, or hundreds of thousands
of recipients, the more work you can avoid doing on the outbound side,
the better.
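
To put rough numbers on the multiplication described above, here is a
minimal sketch. The function name and the figure of 1000 posts are
hypothetical, chosen only for illustration; only the "ten recipients"
case comes from the message itself.

```python
def outbound_scan_count(inbound_messages, recipients_per_list):
    """Scans performed if every outbound copy is re-scanned individually."""
    return inbound_messages * recipients_per_list


inbound = 1000  # hypothetical number of posts, for illustration only
for recipients in (10, 1000, 100000):
    redundant = outbound_scan_count(inbound, recipients)
    print("%6d recipients/list -> %d redundant outbound scans"
          % (recipients, redundant))
```

Every one of those outbound scans repeats a check that was already done
once on the way in, which is why the cost grows with list size rather
than with the number of posts.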



This is true for subscribers which are not part of our organization
-  the MTA which Mailman relays to accepts the messages, and then deals
with any delivery issues. However, accounts for which this MTA is the
final destination, will tempfail under certain conditions, like
mismatched attributes in an LDAP record, or an issue with the mailstore.


And those are precisely the circumstances under which the MTA should not 
be handing a tempfail condition back to Mailman.  It should go ahead and 
blindly accept those messages and accept responsibility for them, and 
then it should deal with those tempfail cases internally.


Mailman is really, really bad at handling large queues for all the same 
reasons that MTAs from twenty years ago were bad at handling large 
queues -- they're largely single threaded, disk bound, and use a single 
outbound directory for all file locking and message queueing, which 
means that they are absolutely decimated when it comes to having to scan 
a linear linked list on disk when trying to store the next file or pull 
up the next file.


Modern MTAs are fully multi-threaded, they keep their active queue in 
memory as opposed to putting them on disk, and they hash the disk queues 
for inactive messages over a large distributed set of directories so if 
one process is working on the files in a given directory then the odds 
are vanishingly small that any other process would be blocked waiting on 
the lock for that directory.
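
As an illustration of the "hash the queue over many directories" idea, a
minimal sketch follows. This is not the spool layout of any particular
MTA; the function name, the use of SHA-1, and the bucket count of 256 are
all assumptions made for the example.

```python
import hashlib
import os


def queue_subdir(queue_root, message_id, buckets=256):
    """Pick a queue subdirectory from a hash of the message ID.

    Spreading files over many directories keeps each directory small and
    makes it unlikely that two workers contend for the same one.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return os.path.join(queue_root, "%02x" % bucket)


# Hypothetical message IDs; the same ID always maps to the same directory.
for mid in ("msg-0001", "msg-0002", "msg-0003"):
    print(mid, "->", queue_subdir("/var/spool/example", mid))
```

Contrast this with a single flat directory, where every store and every
fetch has to search the same ever-growing list of files.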



You wouldn't put a Model-T Ford into a Formula-1 race today, and 
likewise you should not be depending on ancient queueing methods as your 
bottleneck for handling all your outgoing mail.


Or, if you have no choice but to depend on them at all, then you should 
minimize your dependence on them as much as you possibly can.



For better or worse, we are moving a lot of our mailboxes to mail
forwards over the next few months - this will move the rest of these
tempfails out of Mailman's SMTP / retry queue, and into the downstream
relay (where they belong).


From Mailman's perspective, your local MTA *IS* the downstream relay, 
and it should not be causing these kinds of loads to be put on Mailman.


Pull as much of the queueing as possible out of Mailman and put it into 
your local MTA.  From there, it becomes an MTA problem, and it doesn't 
matter to Mailman whether the mailboxes are local or remote.



I say all this as a specialist in designing and building large-scale 
mail systems (such as AOL), a long-term member of the Mailman project, 
and a member of the postmaster team for python.org where all the 
official Mailman mailing lists are hosted -- using Mailman.


--
Brad Knowles b...@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu
--
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Mailman throughput

2011-08-15 Thread Brad Knowles

On 08/15/2011 02:49 AM, Brad Knowles wrote:


You're talking about inbound, and how you have outsourced many of these
kinds of checks to other boxes. That's fine as far as it goes, but I was
talking about *outbound*, from Mailman to the world of recipients.


You are likely to have a certain number of messages coming into your
system which will require a certain amount of processing to scan them
for viruses and spam, etc

However, on outbound, you will presumably have this same number of
messages multiplied by the number of recipients.


I just thought of an analogy that I think will be very useful here. 
Input and output are two related, but very different processes -- both 
for computers as well as humans.  Having a pee is a different process 
from drinking a beer -- related, but still different.


Generally speaking, you want to think carefully about whether you are
mixing your inputs and your outputs -- and this gets more and more
important as you scale up.  A
single person who pees in the Colorado River is not going to materially 
impact the water quality of the downstream communities, but if an entire 
city were to dump untreated sewage into the river on an ongoing basis, 
that would be a different matter.



Likewise with e-mail, what works well for you as a small site is 
probably going to be something that you find doesn't necessarily work so 
well as you get bigger and bigger.  Mixing your inputs and outputs is 
one of those factors.


For example, when processing incoming e-mail, you want to apply one set 
of rules for handling viruses, but you want to apply a different set for 
outbound mail.  In both cases, you want to notify the internal person at 
your site about the situation and let them work on how to deal with the 
issue, but they are the recipient on inbound and they are the sender on 
outbound -- so you can't take a simple "always notify the sender" or
"always notify the recipient" policy.


If you have performance complaints, then you have to look at where your 
bottlenecks are and what those bottlenecks do to you.  Eliminate the 
biggest bottlenecks first, then work on the next one.  If cost is a 
factor, then try to find big bottlenecks that you can fix that won't 
cost as much money, and keep working on eliminating those key 
bottlenecks as you find whatever the new issue is.  Again, mixing inputs 
and outputs tends to be one of those key bottlenecks, both overall and 
with regards to return-on-investment.



In the case of Mailman, we can reasonably guarantee that we follow the 
GIGO principle -- Garbage In, Garbage Out.  If you can keep the inbound 
flow of e-mail clean, then there's nothing that Mailman does that should 
make the outbound flow dirty again, so you can safely by-pass all the 
checks that you would normally make at the MTA level for outbound mail 
from Mailman.


At least, as far as your local MTA is concerned, you can eliminate all 
those checks.  If the checks are done at your edge, then changes to your 
local MTA won't have any impact on whether or not that work is done and 
how much it costs you, but at least you can avoid causing unnecessary 
additional load on Mailman itself.



Of course, the nature of mailing lists means that Mailman will multiply 
by orders of magnitude the amount of work to be done on outbound as 
compared to inbound, so if you can eliminate any of those unnecessary 
checks then that will tend to be a huge win overall with regards to both 
performance and monetary cost -- you won't have to devote so much money 
and resources to building a larger system to handle the flow, if you can 
make sure that the Mailman part of that flow is already clean and 
therefore doesn't need to be re-checked.




So, the general rule is: don't mix the inputs and outputs, especially
as you scale up.


--
Brad Knowles b...@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu


Re: [Mailman-Users] Mailman throughput

2011-08-15 Thread Ivan Fetch
Hi Brad,

On Aug 15, 2011, at 1:49 AM, Brad Knowles wrote:

On 08/14/2011 11:24 PM, Ivan Fetch wrote:

Brad, I think we are already accomplishing a lot of this minimalism,
since the MTA on the Mailman VM is only accepting the message via SMTP,
then handing it off to Mailman via the Postfix aliases. The spam and
other checks are done before hand, by another upstream gateway MTA. That
gateway then hands mailing list messages off to the Mailman box.

You're talking about inbound, and how you have outsourced many of these
kinds of checks to other boxes.  That's fine as far as it goes, but I
was talking about *outbound*, from Mailman to the world of recipients.


You are likely to have a certain number of messages coming into your
system which will require a certain amount of processing to scan them
for viruses and spam, etc

However, on outbound, you will presumably have this same number of
messages multiplied by the number of recipients.

If that's an average of ten recipients per list, then you have a factor
of ten increase in the amount of work done to scan those messages for
viruses and spam -- and since all those messages are largely identical
in those regards, that's all wasted work, and therefore that's all work
that you want to avoid to the greatest degree possible.

As you scale up to thousands, tens of thousands, hundreds of thousands,
etc... numbers of recipients, the more work you can avoid doing on the
outbound side, the better.


OK - now we're on the same page. :) The MTA which Mailman relays to does not
repeat processes like virus / spam scanning. We are re-working our gateways and
relays over the next few months, to further separate out these roles. For
example, quarantine of spam will be handled before a message hits Mailman, not
after the message has been exploded to list subscribers.



This is true for subscribers which are not part of our organization
-  the MTA which Mailman relays to accepts the messages, and then deals
with any delivery issues. However, accounts for which this MTA is the
final destination, will tempfail under certain conditions, like
mismatched attributes in an LDAP record, or an issue with the mailstore.

And those are precisely the circumstances under which the MTA should not
be handing a tempfail condition back to Mailman.  It should go ahead and
blindly accept those messages and accept responsibility for them, and
then it should deal with those tempfail cases internally.

We are definitely moving to this (MTA will accept whatever Mailman gives it).
For the next few months, we will have some local accounts tempfailing, until we
get off of Sun IMS or JSMS or whatever the product is named today. Part of why
the relay is tempfailing is because we happen to be using a relay which is also
a mailstore.



Mailman is really, really bad at handling large queues for all the same
reasons that MTAs from twenty years ago were bad at handling large
queues -- they're largely single threaded, disk bound, and use a single
outbound directory for all file locking and message queueing, which
means that they are absolutely decimated when it comes to having to scan
a linear linked list on disk when trying to store the next file or pull
up the next file.

Modern MTAs are fully multi-threaded, they keep their active queue in
memory as opposed to putting them on disk, and they hash the disk queues
for inactive messages over a large distributed set of directories so if
one process is working on the files in a given directory then the odds
are vanishingly small that any other process would be blocked waiting on
the lock for that directory.

Ah, good to know re: Mailman queueing. So, the only reason why things should be
in qfiles/retry would be something like a relay being unavailable.


For better or worse, we are moving a lot of our mailboxes to mail
forwards over the next few months - this will move the rest of these
tempfails out of Mailman's SMTP / retry queue, and into the downstream
relay (where they belong).

From Mailman's perspective, your local MTA *IS* the downstream relay,
and it should not be causing these kinds of loads to be put on Mailman.

Pull as much of the queueing as possible out of Mailman and put it into
your local MTA.  From there, it becomes an MTA problem, and it doesn't
matter to Mailman whether the mailboxes are local or remote.

When you say local MTA, you don't mean strictly local to the Mailman box,
right? I believe you mean local as in a separate relay box.


I say all this as a specialist in designing and building large-scale
mail systems (such as AOL), a long-term member of the Mailman project,
and a member of the postmaster team for python.org where all the
official Mailman mailing lists are hosted -- using Mailman.


Thanks Brad, for your time on this, and your later analogy RE: input and output.

- Ivan

[Mailman-Users] Mailman throughput

2011-08-14 Thread Ivan Fetch
Hello,

I am trying to gauge the capability of a Mailman virtual machine, which we will
be moving our lists to. I'd like to do my best to size and tune this VM, and
its Postfix and Mailman installation, before putting it in production, and
potentially having to troubleshoot and tune in a hurry.

What is a reasonable / realistic way to benchmark a Mailman installation? Are
there details of other similarly sized installations and throughput numbers
which I can compare?

We have 1300 mailing lists, and average 68000 posts to lists per day. We have 
lists as large as 7000 subscribers, but I'd say that the average size of a list 
is 500 subscribers (this number is a rough guess, based on some crunching of 
Mailman logs).

The new VM will receive messages for mailing lists using Postfix. Mailman hands
off to a separate box to deliver to list recipients. So Postfix on the Mailman 
VM will not be busy trying to deliver to subscribers.

I am sending some test messages through the VM, but don't know whether this is 
useful or pointless, in telling me how capable the VM is.

The VM processed 1 messages in an hour and 10 minutes. The messages went to 
two lists, one with 25 recipients and the other with 500 recipients. The VM 
(Linux) peaked at a load average of 4 (2 VCPUs), and was using 800 MB of its 2 GB
of RAM for I/O caching. I could add more resources, but it doesn't look like they
would get used.

Initially (maybe for the first 15-20 minutes) the Postfix queue had to catch up 
submitting messages to Mailman. After that, qfiles/in was empty, and the work 
which remained was to process qfiles/out and SMTP the outbound messages to the 
relay. I raised SMTP_MAX_RCPTS from 500 to 1000, but this did not seem to make 
a difference.

So I may be able to tune Postfix handing off to Mailman, as well as Mailman
handing off to the SMTP relay. Any suggestions here?


The current Mailman box runs Solaris / Sparc. The Mailman processes use 1.2G of
memory, and CPU (this is a Sunfire 880, 4 Ultrasparc III procs) hovers around 
5%.


Thanks,

Ivan.


Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Mark Sapiro
Ivan Fetch wrote:

What is a reasonable / realistic way to benchmark a Mailman installation? Are
there details of other similarly sized installations and throughput numbers
which I can compare?


I don't have any data for a comparable size installation. The benchmark
test you describe seems reasonable to me.

See below for some comments.


I am sending some test messages through the VM, but don't know whether this is 
useful or pointless, in telling me how capable the VM is.


It seems this should be useful.


The VM processed 1 messages in an hour and 10 minutes. The messages went 
to two lists, one with 25 recipients and the other with 500 recipients. The VM 
(Linux) peaked at a load average of 4 (2 VCPUs), and was using 800 MB of its
2 GB of RAM for I/O caching. I could add more resources, but it doesn't look like
they would get used.

Initially (maybe for the first 15-20 minutes) the Postfix queue had to catch 
up submitting messages to Mailman. After that, qfiles/in was empty, and the 
work which remained was to process qfiles/out and SMTP the outbound messages 
to the relay. I raised SMTP_MAX_RCPTS from 500 to 1000, but this did not seem 
to make a difference.


There are several things going on.

1) Processing mail through Postfix to Mailman. This is almost entirely
Postfix. Delivery is piping the message to the mail wrapper which
involves very little Mailman processing - only making a queue entry
and storing it in qfiles/in. Any tuning would have to be in Postfix,
but I don't know what would be applicable beyond ensuring Postfix has
enough resources to do the job.

2) Mailman's IncomingRunner picking up the message from qfiles/in,
processing it through the pipeline and queuing the result in
qfiles/out and qfiles/archive. Also if the list is digestable, the
message will be added to the list's digest.mbox and possibly a digest
will be triggered on size.

3) Mailman's ArchRunner picking up the message from qfiles/archive and
adding it to the list's archive.

4) Mailman's OutgoingRunner picking up the message from qfiles/out and
delivering it to the outgoing MTA.


So I may be able to tune Postfix handing off to Mailman, as well as Mailman
handing off to the SMTP relay. Any suggestions here?


It seems the major hurdle is in processing the 'out' queue. It is
possible to slice OutgoingRunner to provide some parallelism in this
process and that may speed things up, but I suspect that a lot of the
time is in network communications between OutgoingRunner and the
remote Postfix and that slicing OutgoingRunner may not help much, but
it would be worth rerunning your benchmark with 2 or 4 outgoing runner
slices to see if it helps.

Your raising of SMTP_MAX_RCPTS from 500 to 1000 would not have any
effect because your larger list had only 500 members so no outgoing
message had more than 500 recipients. Even if this were not the case,
I don't think raising SMTP_MAX_RCPTS would make much difference.
Messages are sent via SMTP transactions which look like

MAIL FROM ...
  reply
RCPT TO ...
  reply
RCPT TO ...
  reply
... repeated for each recipient
DATA
  reply
message text
  reply

If SMTP_MAX_RCPTS = 500, and there are a total of 500 recipients, the
above is done once with 500 recipients. If SMTP_MAX_RCPTS = 50, the
above would be done 10 times with 50 recipients per transaction which
would result in 9 additional MAIL FROM, DATA and message text
interactions, but I don't think this would add significantly to the
processing time.
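
The arithmetic above can be sketched as follows. The function names are
hypothetical and this is not Mailman's delivery code; it only models how
a recipient list is split into SMTP transactions of at most
SMTP_MAX_RCPTS recipients each.

```python
def smtp_transactions(total_recipients, max_rcpts):
    """Number of SMTP transactions when each carries at most max_rcpts."""
    return -(-total_recipients // max_rcpts)  # ceiling division


def chunk_recipients(recipients, max_rcpts):
    """Split a recipient list into per-transaction chunks."""
    return [recipients[i:i + max_rcpts]
            for i in range(0, len(recipients), max_rcpts)]


for limit in (1000, 500, 50):
    count = smtp_transactions(500, limit)
    print("SMTP_MAX_RCPTS=%4d -> %2d transaction(s), %d extra"
          % (limit, count, count - 1))
```

For a 500-member list, limits of 500 and 1000 both give a single
transaction, which is why raising the value made no difference in the
benchmark.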

There are some MTA tuning tips in the FAQ
http://wiki.list.org/x/AgA3, but some are only applicable to Mailman
2.0 so be careful.

The main outgoing MTA performance killer is doing DNS verification on
recipient domains during SMTP from Mailman. This should be avoided.

-- 
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Ivan Fetch
Hello,

Thanks Mark, I appreciate this. More below:

On Aug 14, 2011, at 11:39 AM, Mark Sapiro wrote:
 
 It seems the major hurdle is in processing the 'out' queue. It is
 possible to slice OutgoingRunner to provide some parallelism in this
 process and that may speed things up, but I suspect that a lot of the
 time is in network communications between OutgoingRunner and the
 remote Postfix and that slicing OutgoingRunner may not help much, but
 it would be worth rerunning your benchmark with 2 or 4 outgoing runner
 slices to see if it helps.
 
By slicing, do you mean setting MAX_DELIVERY_THREADS in mm_cfg.py (restarting
Mailman of course)? I did this, with values of 2 and 4, and if it made a 
difference for my smaller benchmark of 5000 messages, to one list of 25 
recipients, it was only seconds of improvement.


 There are some MTA tuning tips in the FAQ
 http://wiki.list.org/x/AgA3, but some are only applicable to Mailman
 2.0 so be careful.
 
The only thing I can think of which may help, given that Mailman's Postfix is 
not delivering to subscribers, is to adjust concurrency or backoff settings for 
the local delivery agent, which is piping messages into Mailman's post script. 
I'm not sure whether Postfix still uses the backoff algorithm when using local
and pipe, though.


 The main outgoing MTA performance killer is doing DNS verification on
 recipient domains during SMTP from Mailman. This should be avoided.

Using a local DNS cache cut my 5000 messages to a 25 recipient list, from 10
minutes down to 8 1/2 minutes. Even avoiding looking up the same handful of
hosts over and over again helps.


I have to amend my earlier statement about our receiving 68000 posts per day - 
I was not careful enough when mining the post log; a lot of the posts are 
Mailman retrying delivery for tempfailed subscribers. So we do not see 68000 
distinct posts, but we are doing a lot of redelivery attempts. Apparently we 
need to tune bounce processing for lists - this can be challenging to get 
right, and seems to require individual attention per list. I suppose I could 
have Mailman retry delivery less often, and if we have something like an outage 
of our own relays, I just trigger a retry by restarting the queue runners.


Thanks,

Ivan.


Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Mark Sapiro
On 8/14/2011 1:39 PM, Ivan Fetch wrote:
 
 On Aug 14, 2011, at 11:39 AM, Mark Sapiro wrote:

 It seems the major hurdle is in processing the 'out' queue. It is
 possible to slice OutgoingRunner to provide some parallelism in this
 process and that may speed things up, but I suspect that a lot of the
 time is in network communications between OutgoingRunner and the
 remote Postfix and that slicing OutgoingRunner may not help much, but
 it would be worth rerunning your benchmark with 2 or 4 outgoing runner
 slices to see if it helps.

 By slicing, do you mean setting MAX_DELIVERY_THREADS in mm_cfg.py (restarting
 Mailman of course)? I did this, with values of 2 and 4, and if it made a 
 difference for my smaller benchmark of 5000 messages, to one list of 25 
 recipients, it was only seconds of improvement.


No. Threaded delivery in SMTPDirect.py was an experimental feature in
Mailman 2.0. It was never implemented for Mailman 2.1 although the
setting and its documentation were not removed from Defaults.py. Setting
this in mm_cfg.py has no effect. Any difference would be due to random
variation or other factors.

What I meant was to put something like

try:
    QRUNNERS.remove(('OutgoingRunner', 1))
    QRUNNERS.append(('OutgoingRunner', 2))
except ValueError:
    pass

in mm_cfg.py and restart Mailman. The above will cause Mailman to start
two copies of OutgoingRunner with each processing half of the hashed
queue space. See the

#
# Qrunner defaults
#

section in Defaults.py for more info.
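
To illustrate what "each processing half of the hashed queue space"
means, here is a minimal sketch of dividing a SHA-1 key space into equal
slices. It is an illustration of the general idea under stated
assumptions, not a copy of Mailman's queue code; the function names and
the sample keys are hypothetical.

```python
import hashlib

SHA_MAX = (1 << 160) - 1  # largest value a SHA-1 digest can take


def slice_bounds(slice_index, num_slices):
    """Inclusive (lower, upper) range of hash values owned by one slice."""
    lower = ((SHA_MAX + 1) * slice_index) // num_slices
    upper = ((SHA_MAX + 1) * (slice_index + 1)) // num_slices - 1
    return lower, upper


def owning_slice(key, num_slices):
    """Return the index of the slice whose range contains hash(key)."""
    value = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)
    for index in range(num_slices):
        lower, upper = slice_bounds(index, num_slices)
        if lower <= value <= upper:
            return index
    raise AssertionError("unreachable: slices cover the whole hash space")


# Hypothetical queue entry names, assigned to one of two runner slices.
for key in ("entry-a", "entry-b", "entry-c", "entry-d"):
    print(key, "-> slice", owning_slice(key, 2))
```

Because the ranges do not overlap, each runner copy only ever touches its
own share of the entries and the copies do not need to coordinate.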


[...]
 I have to amend my earlier statement about our receiving 68000 posts per day 
 - I was not careful enough when mining the post log; a lot of the posts are 
 Mailman retrying delivery for tempfailed subscribers. So we do not see 68000 
 distinct posts, but we are doing a lot of redelivery attempts. Apparently we 
 need to tune bounce processing for lists - this can be challenging to get 
 right, and seems to require individual attention per list. I suppose I could 
 have Mailman retry delivery less often, and if we have something like an 
 outage of our own relays, I just trigger a retry by restarting the queue 
 runners.


Just FYI, bounce processing never sees the retries until such time as
Mailman's retry processing gives up on the delivery (default after 5 days).

-- 
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Ivan Fetch
Hello,

On Aug 14, 2011, at 4:15 PM, Mark Sapiro wrote:

 No. Threaded delivery in SMTPDirect.py was an experimental feature in
 Mailman 2.0. It was never implemented for Mailman 2.1 although the
 setting and its documentation were not removed from Defaults.py. Setting
 this in mm_cfg.py has no effect. Any difference would be due to random
 variation or other factors.
 
 What I meant was to put something like
 
 try:
     QRUNNERS.remove(('OutgoingRunner', 1))
     QRUNNERS.append(('OutgoingRunner', 2))
 except ValueError:
     pass
 
 in mm_cfg.py and restart Mailman. The above will cause Mailman to start
 two copies of OutgoingRunner with each processing half of the hashed
 queue space. See the
 
 #
 # Qrunner defaults
 #
 
 section in Defaults.py for more info.
 


OK, I did this, and verified that more outgoing processes were started (in the
qrunner log, and with ps). Testing 5000 messages to a list with 25 recipients
took:
8 1/2 minutes with 1 outgoing slice
5 1/2 minutes with 2 slices
exactly 5 minutes with 4 slices.
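
For comparison, these timings can be converted into throughput and
speedup figures. This is only arithmetic on the three measurements
above; the function name is hypothetical.

```python
def messages_per_second(messages, minutes):
    """Average throughput for a benchmark run."""
    return messages / (minutes * 60.0)


# Minutes taken for 5000 messages, keyed by number of outgoing slices.
runs = {1: 8.5, 2: 5.5, 4: 5.0}
baseline = runs[1]

for slices in sorted(runs):
    rate = messages_per_second(5000, runs[slices])
    speedup = baseline / runs[slices]
    print("%d slice(s): %.1f msgs/s, %.2fx the 1-slice rate"
          % (slices, rate, speedup))
```

Going from 2 to 4 slices adds much less than going from 1 to 2, which
suggests the bottleneck has moved somewhere other than the number of
outgoing runners.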

I noticed that the incoming qrunner was using 10% CPU (according to the pcpu 
column of ps) even after qfiles/in was empty, and after all 5000 messages were 
processed. I wonder what the incoming runner is doing - any ideas there?


 Just FYI, bounce processing never sees the retries until such time as
 Mailman's retry processing gives up on the delivery (default after 5 days).

OK, this just means that a message which is tempfailing will have to get 
retried for 5 days, before normal bounce processing rules can (potentially) act 
on it. I suspect a lot of these addresses are our own accounts which are 
tempfailing because they are disabled, in some sort of transition, or have 
broken LDAP records - I will look at smtp-failure some more.

Thanks,

Ivan.


Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Mark Sapiro
On 8/14/2011 4:25 PM, Ivan Fetch wrote:
 
 I noticed that the incoming qrunner was using 10% CPU (according to the pcpu 
 column of ps) even after qfiles/in was empty, and after all 5000 messages 
 were processed. I wonder what the incoming runner is doing - any ideas there?


If I am not mistaken, pcpu is an average, not a current value. I.e., it
is total CPU time divided by elapsed time since process initiation for
the given process. What does %CPU from 'top' tell you?
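
If pcpu is indeed a lifetime average as described above, the effect can
be sketched like this. The function name and the figures are
hypothetical, chosen only to show how such an average lingers after the
work has finished.

```python
def lifetime_cpu_percent(cpu_seconds, elapsed_seconds):
    """Total CPU time divided by time since process start, as a percent."""
    return 100.0 * cpu_seconds / elapsed_seconds


# Hypothetical runner: 60 s of CPU burned during a benchmark, idle afterwards.
cpu_used = 60.0
for elapsed in (600.0, 1200.0, 6000.0):
    print("after %5.0f s: %.1f%%"
          % (elapsed, lifetime_cpu_percent(cpu_used, elapsed)))
```

A process that is completely idle now can therefore still show a
noticeable figure, which only decays as the process keeps running.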

-- 
Mark Sapiro m...@msapiro.net          The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Brad Knowles

On 08/14/2011 03:39 PM, Ivan Fetch wrote:


There are some MTA tuning tips in the FAQ
http://wiki.list.org/x/AgA3, but some are only applicable to Mailman
2.0 so be careful.


The majority of the MTA tuning tips that I know of should be applicable 
to most any mailing list manager, since they are oriented towards 
helping the MTA better deal with large amounts of outgoing mail, and 
optimizing certain types of behaviors that are common with most mailing 
lists.


But I'll have to refresh my memory of what is written there.


The main outgoing MTA performance killer is doing DNS verification on
recipient domains during SMTP from Mailman. This should be avoided.


Using a local DNS cache cut my 5000 messages to a 25 recipient list,
from 10 minutes down to 8 1/2 minutes. Even avoiding looking up the
same handful of hosts over and over again helps.


Generally speaking, if there are any real-time queries being done by 
your MTA, you want those done against the message as it comes into your 
mail system the first time -- this includes checking black lists, 
checking content, or anything else.


You want to run a separate instance of your MTA for handling your 
outbound mail and it should listen only to a special port on the 
127.0.0.1 loopback interface where Mailman can speak directly to it, 
and that special instance should have pretty much all DNS queries and 
real-time checks turned off.  After all, those things should have been 
done when the message was checked on inbound and shouldn't need to be 
checked again on outbound.
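
On the Mailman side, pointing delivery at such a dedicated instance might
look like the fragment below. SMTPHOST and SMTPPORT are the Mailman 2.1
settings for the outgoing SMTP connection; the port number 10025 is an
assumption for illustration and must match whatever port the separate
MTA instance is actually configured to listen on. The MTA-side
configuration is not shown.

```python
# mm_cfg.py fragment (sketch only).
# Send Mailman's outgoing mail to a dedicated MTA instance that listens
# on the loopback interface only.
SMTPHOST = '127.0.0.1'
SMTPPORT = 10025  # hypothetical port; use the one your outbound instance uses
```

Mailman needs to be restarted after changing mm_cfg.py for the new
values to take effect.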



I have to amend my earlier statement about our receiving 68000 posts
per day - I was not careful enough when mining the post log; a lot of
the posts are Mailman retrying delivery for tempfailed subscribers. So
we do not see 68000 distinct posts, but we are doing a lot of redelivery
attempts. Apparently we need to tune bounce processing for lists - this
can be challenging to get right, and seems to require individual
attention per list. I suppose I could have Mailman retry delivery less
often, and if we have something like an outage of our own relays, I just
trigger a retry by restarting the queue runners.


If Mailman is dealing with tempfails, then you've done something wrong.
The MTA should be blindly accepting whatever Mailman has to send, and
then the MTA should be dealing with tempfails -- it's one step closer to 
wherever the problem might be, and it's more likely to be tuned for that 
kind of behaviour.


For example, most modern MTAs give you the ability to set up separate 
queues for given outbound targets, which are kept apart from all the 
other regular mail being handled.  This way you can set up local 
queues in your MTA that may have different resource handling rules or 
different retry algorithms, as compared to queues to external sites that 
might be known for being troublesome.
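
The idea of per-destination queues can be modelled as below. This is a conceptual sketch only, not any MTA's actual configuration syntax. The domains, queue names, concurrency limits and retry intervals are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuePolicy:
    name: str
    max_concurrency: int   # parallel deliveries allowed to this target
    retry_minutes: int     # wait between attempts after a tempfail


# Hypothetical policies: a fast local queue, a gentle one for a
# destination known to be troublesome, and a default for everything else.
DEFAULT_POLICY = QueuePolicy("default", max_concurrency=20, retry_minutes=30)
POLICIES = {
    "example.edu": QueuePolicy("local", max_concurrency=50, retry_minutes=5),
    "slow.example.com": QueuePolicy("troublesome", max_concurrency=2,
                                    retry_minutes=120),
}


def policy_for(recipient):
    """Pick the queue policy from the recipient's domain."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return POLICIES.get(domain, DEFAULT_POLICY)


print(policy_for("user@slow.example.com").name)  # prints troublesome
```

Keeping slow destinations in their own queue means their retries do not hold up delivery to everyone else.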


We were doing this kind of thing at AOL back in the mid-90s, and this 
has only gotten easier since.


--
Brad Knowles b...@shub-internet.org
LinkedIn Profile: http://tinyurl.com/y8kpxu
--
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Mailman throughput

2011-08-14 Thread Ivan Fetch
Hi Brad,

On Aug 14, 2011, at 8:44 PM, Brad Knowles wrote:

 The majority of the MTA tuning tips that I know of should be applicable 
 to most any mailing list manager, since they are oriented towards 
 helping the MTA better deal with large amounts of outgoing mail, and 
 optimizing certain types of behaviors that are common with most mailing 
 lists.
 
 But I'll have to refresh my memory of what is written there.
 

 Generally speaking, if there are any real-time queries being done by 
 your MTA, you want those done against the message as it comes into your 
 mail system the first time -- this includes checking black lists, 
 checking content, or anything else.
 
 You want to run a separate instance of your MTA for handling your 
 outbound mail and it should listen only to a special port on the 
 127.0.0.1 loopback interface where Mailman can speak directly to it, 
 and that special instance should have pretty much all DNS queries and 
 real-time checks turned off.  After all, those things should have been 
 done when the message was checked on inbound and shouldn't need to be 
 checked again on outbound.
 
Brad, I think we are already accomplishing a lot of this minimalism, since the 
MTA on the Mailman VM is only accepting the message via SMTP, then handing it 
off to Mailman via the Postfix aliases. The spam and other checks are done 
beforehand by another upstream gateway MTA. That gateway then hands mailing 
list messages off to the Mailman box.

 
 If Mailman is dealing with tempfails, then you've done something wrong. 
 The MTA should be blindly accepting whatever Mailman has to send, and 
 then the MTA should be dealing with tempfails -- it's one step closer to 
 wherever the problem might be, and it's more likely to be tuned for that 
 kind of behaviour.
 
 For example, most modern MTAs give you the ability to set up separate 
 queues for given outbound targets, which are kept apart from all the 
 other regular mail being handled.  This way you can set up local 
 queues in your MTA that may have different resource handling rules or 
 different retry algorithms, as compared to queues to external sites that 
 might be known for being troublesome.
 
 We were doing this kind of thing at AOL back in the mid-90s, and this 
 has only gotten easier since.


This is true for subscribers who are not part of our organization - the MTA 
which Mailman relays to accepts the messages, and then deals with any delivery 
issues. However, accounts for which this MTA is the final destination will 
tempfail under certain conditions, like mismatched attributes in an LDAP 
record, or an issue with the mailstore.

For better or worse, we are moving a lot of our mailboxes to mail forwards over 
the next few months - this will move the rest of these tempfails out of 
Mailman's SMTP / retry queue, and into the downstream relay (where they belong).


Thanks,

Ivan.
