Re: [Mailman-Users] Mailman throughput
On 08/14/2011 11:24 PM, Ivan Fetch wrote:

> Brad, I think we are already accomplishing a lot of this minimalism, since the MTA on the Mailman VM is only accepting the message via SMTP, then handing it off to Mailman via the Postfix aliases. The spam and other checks are done beforehand, by another upstream gateway MTA. That gateway then hands mailing list messages off to the Mailman box.

You're talking about inbound, and how you have outsourced many of these kinds of checks to other boxes. That's fine as far as it goes, but I was talking about *outbound*, from Mailman to the world of recipients.

You are likely to have a certain number of messages coming into your system which will require a certain amount of processing to scan them for viruses and spam, etc. However, on outbound, you will presumably have this same number of messages multiplied by the number of recipients. If that's an average of ten recipients per list, then you have a factor of ten increase in the amount of work done to scan those messages for viruses and spam -- and since all those messages are largely identical in those regards, that's all wasted work, and therefore that's all work that you want to avoid to the greatest degree possible.

As you scale up to thousands, tens of thousands, hundreds of thousands, etc... numbers of recipients, the more work you can avoid doing on the outbound side, the better.

> This is true for subscribers which are not part of our organization - the MTA which Mailman relays to accepts the messages, and then deals with any delivery issues. However, accounts for which this MTA is the final destination, will tempfail under certain conditions, like mismatched attributes in an LDAP record, or an issue with the mailstore.

And those are precisely the circumstances under which the MTA should not be handing a tempfail condition back to Mailman. It should go ahead and blindly accept those messages and accept responsibility for them, and then it should deal with those tempfail cases internally.

Mailman is really, really bad at handling large queues for all the same reasons that MTAs from twenty years ago were bad at handling large queues -- they're largely single threaded, disk bound, and use a single outbound directory for all file locking and message queueing, which means that they are absolutely decimated when it comes to having to scan a linear linked list on disk when trying to store the next file or pull up the next file.

Modern MTAs are fully multi-threaded, they keep their active queue in memory as opposed to putting them on disk, and they hash the disk queues for inactive messages over a large distributed set of directories so if one process is working on the files in a given directory then the odds are vanishingly small that any other process would be blocked waiting on the lock for that directory.

You wouldn't put a Model-T Ford into a Formula-1 race today, and likewise you should not be depending on ancient queueing methods as your bottleneck for handling all your outgoing mail. Or, if you have no choice but to depend on them at all, then you should minimize your dependence on them as much as you possibly can.

> For better or worse, we are moving a lot of our mailboxes to mail forwards over the next few months - this will move the rest of these tempfails out of Mailman's SMTP / retry queue, and into the downstream relay (where they belong).

From Mailman's perspective, your local MTA *IS* the downstream relay, and it should not be causing these kinds of loads to be put on Mailman. Pull as much of the queueing as possible out of Mailman and put it into your local MTA. From there, it becomes an MTA problem, and it doesn't matter to Mailman whether the mailboxes are local or remote.

I say all this as a specialist in designing and building large-scale mail systems (such as AOL), a long-term member of the Mailman project, and a member of the postmaster team for python.org where all the official Mailman mailing lists are hosted -- using Mailman. -- Brad Knowles b...@shub-internet.org LinkedIn Profile: http://tinyurl.com/y8kpxu -- Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://wiki.list.org/x/AgA3 Security Policy: http://wiki.list.org/x/QIA9 Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
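
The outbound multiplication described in this message can be put into numbers. This is only a back-of-the-envelope sketch: the function name and the figure of 1000 inbound posts are made up for illustration, while the recipient counts (ten, 500, 7000) are the ones mentioned in this thread.

```python
def outbound_scans(inbound_messages: int, avg_recipients: int) -> int:
    """Number of scans performed if every outbound copy is re-checked."""
    return inbound_messages * avg_recipients


inbound = 1000  # hypothetical number of posts, chosen only for illustration

for recipients in (10, 500, 7000):
    total = outbound_scans(inbound, recipients)
    factor = total // inbound
    print(f"{recipients:>5} recipients per list: {total:>8} outbound scans "
          f"({factor}x the inbound work)")
```

Scanning once on the way in keeps the cost at the number of posts; scanning on the way out scales it by the list size.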
Re: [Mailman-Users] Mailman throughput
On 08/15/2011 02:49 AM, Brad Knowles wrote:

> You're talking about inbound, and how you have outsourced many of these kinds of checks to other boxes. That's fine as far as it goes, but I was talking about *outbound*, from Mailman to the world of recipients. You are likely to have a certain number of messages coming into your system which will require a certain amount of processing to scan them for viruses and spam, etc. However, on outbound, you will presumably have this same number of messages multiplied by the number of recipients.

I just thought of an analogy that I think will be very useful here.

Input and output are two related, but very different processes -- both for computers as well as humans. Having a pee is a different process from drinking a beer -- related, but still different.

Generally speaking, you want to think about mixing your inputs and your outputs -- and this gets more and more important as you scale up. A single person who pees in the Colorado River is not going to materially impact the water quality of the downstream communities, but if an entire city were to dump untreated sewage into the river on an ongoing basis, that would be a different matter.

Likewise with e-mail, what works well for you as a small site is probably going to be something that you find doesn't necessarily work so well as you get bigger and bigger. Mixing your inputs and outputs is one of those factors.

For example, when processing incoming e-mail, you want to apply one set of rules for handling viruses, but you want to apply a different set for outbound mail. In both cases, you want to notify the internal person at your site about the situation and let them work on how to deal with the issue, but they are the recipient on inbound and they are the sender on outbound -- so you can't take a simple "always notify the sender" or "always notify the recipient" policy.

If you have performance complaints, then you have to look at where your bottlenecks are and what those bottlenecks do to you. Eliminate the biggest bottlenecks first, then work on the next one. If cost is a factor, then try to find big bottlenecks that you can fix that won't cost as much money, and keep working on eliminating those key bottlenecks as you find whatever the new issue is.

Again, mixing inputs and outputs tends to be one of those key bottlenecks, both overall and with regards to return-on-investment.

In the case of Mailman, we can reasonably guarantee that we follow the GIGO principle -- Garbage In, Garbage Out. If you can keep the inbound flow of e-mail clean, then there's nothing that Mailman does that should make the outbound flow dirty again, so you can safely bypass all the checks that you would normally make at the MTA level for outbound mail from Mailman.

At least, as far as your local MTA is concerned, you can eliminate all those checks. If the checks are done at your edge, then changes to your local MTA won't have any impact on whether or not that work is done and how much it costs you, but at least you can avoid causing unnecessary additional load on Mailman itself.

Of course, the nature of mailing lists means that Mailman will multiply by orders of magnitude the amount of work to be done on outbound as compared to inbound, so if you can eliminate any of those unnecessary checks then that will tend to be a huge win overall with regards to both performance and monetary cost -- you won't have to devote so much money and resources to building a larger system to handle the flow, if you can make sure that the Mailman part of that flow is already clean and therefore doesn't need to be re-checked.

So, the general rule is: don't mix the inputs and outputs, especially as you scale up.

-- Brad Knowles b...@shub-internet.org LinkedIn Profile: http://tinyurl.com/y8kpxu
Re: [Mailman-Users] Mailman throughput
Hi Brad,

On Aug 15, 2011, at 1:49 AM, Brad Knowles wrote:

> On 08/14/2011 11:24 PM, Ivan Fetch wrote:
>
>> Brad, I think we are already accomplishing a lot of this minimalism, since the MTA on the Mailman VM is only accepting the message via SMTP, then handing it off to Mailman via the Postfix aliases. The spam and other checks are done beforehand, by another upstream gateway MTA. That gateway then hands mailing list messages off to the Mailman box.
>
> You're talking about inbound, and how you have outsourced many of these kinds of checks to other boxes. That's fine as far as it goes, but I was talking about *outbound*, from Mailman to the world of recipients.
>
> You are likely to have a certain number of messages coming into your system which will require a certain amount of processing to scan them for viruses and spam, etc. However, on outbound, you will presumably have this same number of messages multiplied by the number of recipients. If that's an average of ten recipients per list, then you have a factor of ten increase in the amount of work done to scan those messages for viruses and spam -- and since all those messages are largely identical in those regards, that's all wasted work, and therefore that's all work that you want to avoid to the greatest degree possible.
>
> As you scale up to thousands, tens of thousands, hundreds of thousands, etc... numbers of recipients, the more work you can avoid doing on the outbound side, the better.

OK - now we're on the same page. :) The MTA which Mailman relays to does not repeat processes like virus / spam scanning. We are re-working our gateways and relays over the next few months, to further separate out these roles. E.g., quarantine of spam will be handled before a message hits Mailman, not after the message has been exploded to list subscribers.

>> This is true for subscribers which are not part of our organization - the MTA which Mailman relays to accepts the messages, and then deals with any delivery issues. However, accounts for which this MTA is the final destination, will tempfail under certain conditions, like mismatched attributes in an LDAP record, or an issue with the mailstore.
>
> And those are precisely the circumstances under which the MTA should not be handing a tempfail condition back to Mailman. It should go ahead and blindly accept those messages and accept responsibility for them, and then it should deal with those tempfail cases internally.

We are definitely moving to this (the MTA will accept whatever Mailman gives it). For the next few months, we will have some local accounts tempfailing, until we get off of Sun IMS or JSMS or whatever the product is named today. Part of why the relay is tempfailing is because we happen to be using a relay which is also a mailstore.

> Mailman is really, really bad at handling large queues for all the same reasons that MTAs from twenty years ago were bad at handling large queues -- they're largely single threaded, disk bound, and use a single outbound directory for all file locking and message queueing, which means that they are absolutely decimated when it comes to having to scan a linear linked list on disk when trying to store the next file or pull up the next file.
>
> Modern MTAs are fully multi-threaded, they keep their active queue in memory as opposed to putting them on disk, and they hash the disk queues for inactive messages over a large distributed set of directories so if one process is working on the files in a given directory then the odds are vanishingly small that any other process would be blocked waiting on the lock for that directory.

Ah, good to know re: Mailman queueing. So, the only reason why things should be in qfiles/retry would be something like a relay being unavailable.

>> For better or worse, we are moving a lot of our mailboxes to mail forwards over the next few months - this will move the rest of these tempfails out of Mailman's SMTP / retry queue, and into the downstream relay (where they belong).
>
> From Mailman's perspective, your local MTA *IS* the downstream relay, and it should not be causing these kinds of loads to be put on Mailman. Pull as much of the queueing as possible out of Mailman and put it into your local MTA. From there, it becomes an MTA problem, and it doesn't matter to Mailman whether the mailboxes are local or remote.

When you say local MTA you don't mean strictly local to the Mailman box, right? I believe you mean local as in a separate relay box.

> I say all this as a specialist in designing and building large-scale mail systems (such as AOL), a long-term member of the Mailman project, and a member of the postmaster team for python.org where all the official Mailman mailing lists are hosted -- using Mailman.

Thanks Brad, for your time on this, and your later analogy re: input and output.

- Ivan
[Mailman-Users] Mailman throughput
Hello,

I am trying to gauge the capability of a Mailman virtual machine, which we will be moving our lists to. I'd like to do my best to size and tune this VM, and its Postfix and Mailman installation, before putting it in production, and potentially having to troubleshoot and tune in a hurry.

What is a reasonable / realistic way to benchmark a Mailman installation? Are there details of other similarly sized installations and throughput numbers which I can compare?

We have 1300 mailing lists, and average 68000 posts to lists per day. We have lists as large as 7000 subscribers, but I'd say that the average size of a list is 500 subscribers (this number is a rough guess, based on some crunching of Mailman logs).

The new VM will receive messages for mailing lists using Postfix. Mailman hands off to a separate box to deliver to list recipients. So Postfix on the Mailman VM will not be busy trying to deliver to subscribers.

I am sending some test messages through the VM, but don't know whether this is useful or pointless, in telling me how capable the VM is.

The VM processed 1 messages in an hour and 10 minutes. The messages went to two lists, one with 25 recipients and the other with 500 recipients. The VM (Linux) peaked at a load average of 4 (2 VCPUs), and was using 800 MB of its 2 GB of RAM for IO caching. I could add more resources, but it doesn't look like they would get used.

Initially (maybe for the first 15-20 minutes) the Postfix queue had to catch up submitting messages to Mailman. After that, qfiles/in was empty, and the work which remained was to process qfiles/out and SMTP the outbound messages to the relay. I raised SMTP_MAX_RCPTS from 500 to 1000, but this did not seem to make a difference.

So I may be able to tune Postfix handing off to Mailman, as well as Mailman handing off to the SMTP relay. Any suggestions here?

The current Mailman box runs Solaris / SPARC. The Mailman processes use 1.2G of memory, and CPU (this is a Sunfire 880, 4 UltraSPARC III procs) hovers around 5%.

Thanks,

Ivan.
Re: [Mailman-Users] Mailman throughput
Ivan Fetch wrote:

> What is a reasonable / realistic way to benchmark a Mailman installation? Are there details of other similarly sized installations and throughput numbers which I can compare?

I don't have any data for a comparable size installation. The benchmark test you describe seems reasonable to me. See below for some comments.

> I am sending some test messages through the VM, but don't know whether this is useful or pointless, in telling me how capable the VM is.

It seems this should be useful.

> The VM processed 1 messages in an hour and 10 minutes. The messages went to two lists, one with 25 recipients and the other with 500 recipients. The VM (Linux) peaked at a load average of 4 (2 VCPUs), and was using 800 MB of its 2 GB of RAM for IO caching. I could add more resources, but it doesn't look like they would get used.
>
> Initially (maybe for the first 15-20 minutes) the Postfix queue had to catch up submitting messages to Mailman. After that, qfiles/in was empty, and the work which remained was to process qfiles/out and SMTP the outbound messages to the relay. I raised SMTP_MAX_RCPTS from 500 to 1000, but this did not seem to make a difference.

There are several things going on.

1) Processing mail through Postfix to Mailman. This is almost entirely Postfix. Delivery is piping the message to the mail wrapper which involves very little Mailman processing - only making a queue entry and storing it in qfiles/in. Any tuning would have to be in Postfix, but I don't know what would be applicable beyond ensuring Postfix has enough resources to do the job.

2) Mailman's IncomingRunner picking up the message from qfiles/in, processing it through the pipeline and queuing the result in qfiles/out and qfiles/archive. Also if the list is digestable, the message will be added to the list's digest.mbox and possibly a digest will be triggered on size.

3) Mailman's ArchRunner picking up the message from qfiles/archive and adding it to the list's archive.

4) Mailman's OutgoingRunner picking up the message from qfiles/out and delivering it to the outgoing MTA.

> So I may be able to tune Postfix handing off to Mailman, as well as Mailman handing off to the SMTP relay. Any suggestions here?

It seems the major hurdle is in processing the 'out' queue. It is possible to slice OutgoingRunner to provide some parallelism in this process and that may speed things up. I suspect that a lot of the time is in network communications between OutgoingRunner and the remote Postfix, so slicing OutgoingRunner may not help much, but it would be worth rerunning your benchmark with 2 or 4 outgoing runner slices to see if it helps.

Your raising of SMTP_MAX_RCPTS from 500 to 1000 would not have any effect because your larger list had only 500 members so no outgoing message had more than 500 recipients. Even if this were not the case, I don't think raising SMTP_MAX_RCPTS would make much difference. Messages are sent via SMTP transactions which look like

    MAIL FROM ...
    reply
    RCPT TO ...
    reply
    RCPT TO ...
    reply
    ... repeated for each recipient
    DATA
    reply
    message text
    reply

If SMTP_MAX_RCPTS = 500, and there are a total of 500 recipients, the above is done once with 500 recipients. If SMTP_MAX_RCPTS = 50, the above would be done 10 times with 50 recipients per transaction which would result in 9 additional MAIL FROM, DATA and message text interactions, but I don't think this would add significantly to the processing time.

There are some MTA tuning tips in the FAQ http://wiki.list.org/x/AgA3, but some are only applicable to Mailman 2.0 so be careful.

The main outgoing MTA performance killer is doing DNS verification on recipient domains during SMTP from Mailman. This should be avoided.

--
Mark Sapiro m...@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
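
The SMTP_MAX_RCPTS arithmetic in the message above can be checked with a few lines. This sketch follows only the simplified description given there (recipients split evenly into transactions of at most SMTP_MAX_RCPTS); the function name is made up, and Mailman's real chunking logic may group recipients differently.

```python
import math


def smtp_transactions(recipients: int, max_rcpts: int) -> int:
    """Transactions needed when each carries at most max_rcpts recipients."""
    if recipients < 0 or max_rcpts < 1:
        raise ValueError("recipients must be >= 0 and max_rcpts >= 1")
    return math.ceil(recipients / max_rcpts)


# The cases discussed above, all for a 500-recipient list.
for limit in (1000, 500, 50):
    n = smtp_transactions(500, limit)
    print(f"SMTP_MAX_RCPTS={limit}: {n} transaction(s), "
          f"{n - 1} extra MAIL FROM/DATA round(s)")
```

For the 500-recipient test list, limits of 500 and 1000 both give a single transaction, which is consistent with the benchmark showing no difference between them.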
Re: [Mailman-Users] Mailman throughput
Hello,

Thanks Mark, I appreciate this. More below:

On Aug 14, 2011, at 11:39 AM, Mark Sapiro wrote:

> It seems the major hurdle is in processing the 'out' queue. It is possible to slice OutgoingRunner to provide some parallelism in this process and that may speed things up, but I suspect that a lot of the time is in network communications between OutgoingRunner and the remote Postfix and that slicing OutgoingRunner may not help much, but it would be worth rerunning your benchmark with 2 or 4 outgoing runner slices to see if it helps.

By slicing, do you mean setting MAX_DELIVERY_THREADS in mm_cfg.py (restarting Mailman of course)? I did this, with values of 2 and 4, and if it made a difference for my smaller benchmark of 5000 messages, to one list of 25 recipients, it was only seconds of improvement.

> There are some MTA tuning tips in the FAQ http://wiki.list.org/x/AgA3, but some are only applicable to Mailman 2.0 so be careful.

The only thing I can think of which may help, given that Mailman's Postfix is not delivering to subscribers, is to adjust concurrency or backoff settings for the local delivery agent, which is piping messages into Mailman's post script. I'm not sure whether Postfix still uses the backoff algorithm when using local and pipe, though.

> The main outgoing MTA performance killer is doing DNS verification on recipient domains during SMTP from Mailman. This should be avoided.

Using a local DNS cache cut my 5000 messages to a 25 recipient list from 10 minutes down to 8 1/2 minutes. Even avoiding looking up the same handful of hosts over and over again helps.

I have to amend my earlier statement about our receiving 68000 posts per day - I was not careful enough when mining the post log; a lot of the posts are Mailman retrying delivery for tempfailed subscribers. So we do not see 68000 distinct posts, but we are doing a lot of redelivery attempts. Apparently we need to tune bounce processing for lists - this can be challenging to get right, and seems to require individual attention per list. I suppose I could have Mailman retry delivery less often, and if we have something like an outage of our own relays, I just trigger a retry by restarting the queue runners.

Thanks,

Ivan.
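
To get a feel for how much redelivery work a retry interval implies, here is a rough sketch. It assumes the Mailman 2.1 setting names DELIVERY_RETRY_PERIOD and DELIVERY_RETRY_WAIT and a one-hour default wait, which are recalled from Defaults.py rather than stated in this thread, so check your own Defaults.py before relying on them. The hours() and days() helpers are local stand-ins so the snippet runs on its own, and the four-hour value is purely illustrative.

```python
def hours(n: int) -> int:
    """Local stand-in helper: hours expressed in seconds."""
    return n * 60 * 60


def days(n: int) -> int:
    """Local stand-in helper: days expressed in seconds."""
    return n * 24 * 60 * 60


def max_retry_attempts(retry_period: int, retry_wait: int) -> int:
    """Rough upper bound on retries for one tempfailing recipient."""
    return retry_period // retry_wait


# Assumed defaults (verify against your Defaults.py).
DELIVERY_RETRY_PERIOD = days(5)
DELIVERY_RETRY_WAIT = hours(1)

print(max_retry_attempts(DELIVERY_RETRY_PERIOD, DELIVERY_RETRY_WAIT))
# Retrying less often, e.g. every four hours (an illustrative value only).
print(max_retry_attempts(DELIVERY_RETRY_PERIOD, hours(4)))
```

Under these assumptions, each post to a persistently tempfailing address can generate on the order of a hundred retries, which would explain retries inflating the counts mined from the post log.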
Re: [Mailman-Users] Mailman throughput
On 8/14/2011 1:39 PM, Ivan Fetch wrote:

> On Aug 14, 2011, at 11:39 AM, Mark Sapiro wrote:
>
>> It seems the major hurdle is in processing the 'out' queue. It is possible to slice OutgoingRunner to provide some parallelism in this process and that may speed things up, but I suspect that a lot of the time is in network communications between OutgoingRunner and the remote Postfix and that slicing OutgoingRunner may not help much, but it would be worth rerunning your benchmark with 2 or 4 outgoing runner slices to see if it helps.
>
> By slicing, do you mean setting MAX_DELIVERY_THREADS in mm_cfg.py (restarting Mailman of course)? I did this, with values of 2 and 4, and if it made a difference for my smaller benchmark of 5000 messages, to one list of 25 recipients, it was only seconds of improvement.

No. Threaded delivery in SMTPDirect.py was an experimental feature in Mailman 2.0. It was never implemented for Mailman 2.1 although the setting and its documentation were not removed from Defaults.py. Setting this in mm_cfg.py has no effect. Any difference would be due to random variation or other factors.

What I meant was to put something like

    try:
        QRUNNERS.remove(('OutgoingRunner', 1))
        QRUNNERS.append(('OutgoingRunner', 2))
    except ValueError:
        pass

in mm_cfg.py and restart Mailman. The above will cause Mailman to start two copies of OutgoingRunner with each processing half of the hashed queue space. See the "Qrunner defaults" section in Defaults.py for more info.

[...]

> I have to amend my earlier statement about our receiving 68000 posts per day - I was not careful enough when mining the post log; a lot of the posts are Mailman retrying delivery for tempfailed subscribers. So we do not see 68000 distinct posts, but we are doing a lot of redelivery attempts. Apparently we need to tune bounce processing for lists - this can be challenging to get right, and seems to require individual attention per list. I suppose I could have Mailman retry delivery less often, and if we have something like an outage of our own relays, I just trigger a retry by restarting the queue runners.

Just FYI, bounce processing never sees the retries until such time as Mailman's retry processing gives up on the delivery (default after 5 days).

--
Mark Sapiro m...@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
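
The idea of each runner copy "processing half of the hashed queue space" can be pictured with a small conceptual sketch. This is not Mailman's actual code: the function name, the entry names, and the choice of SHA-1 over the entry name are assumptions made only to show how a hash range can be divided evenly among slices.

```python
import hashlib
from collections import Counter

SHA1_SPACE = 1 << 160  # number of possible SHA-1 digest values


def slice_for(entry_name: str, num_slices: int) -> int:
    """Map a queue entry to one of num_slices equal ranges of hash space."""
    digest = int(hashlib.sha1(entry_name.encode("utf-8")).hexdigest(), 16)
    return digest * num_slices // SHA1_SPACE


# Hypothetical queue entry names, used only to show the spread.
entries = [f"message-{i}" for i in range(1000)]
for slices in (1, 2, 4):
    counts = Counter(slice_for(name, slices) for name in entries)
    print(slices, "slice(s):", dict(sorted(counts.items())))
```

Because the assignment depends only on the hash, each runner copy can pick out its own entries without coordinating with the others.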
Re: [Mailman-Users] Mailman throughput
Hello,

On Aug 14, 2011, at 4:15 PM, Mark Sapiro wrote:

> No. Threaded delivery in SMTPDirect.py was an experimental feature in Mailman 2.0. It was never implemented for Mailman 2.1 although the setting and its documentation were not removed from Defaults.py. Setting this in mm_cfg.py has no effect. Any difference would be due to random variation or other factors.
>
> What I meant was to put something like
>
>     try:
>         QRUNNERS.remove(('OutgoingRunner', 1))
>         QRUNNERS.append(('OutgoingRunner', 2))
>     except ValueError:
>         pass
>
> in mm_cfg.py and restart Mailman. The above will cause Mailman to start two copies of OutgoingRunner with each processing half of the hashed queue space. See the "Qrunner defaults" section in Defaults.py for more info.

OK, I did this, and verified that more outgoing processes were started (in the qrunner log, and with ps).

Testing 5000 messages to a list with 25 recipients took:

    8 1/2 minutes with 1 outgoing slice
    5 1/2 minutes with 2 slices
    exactly 5 minutes with 4 slices

I noticed that the incoming qrunner was using 10% CPU (according to the pcpu column of ps) even after qfiles/in was empty, and after all 5000 messages were processed. I wonder what the incoming runner is doing - any ideas there?

> Just FYI, bounce processing never sees the retries until such time as Mailman's retry processing gives up on the delivery (default after 5 days).

OK, this just means that a message which is tempfailing will have to get retried for 5 days, before normal bounce processing rules can (potentially) act on it. I suspect a lot of these addresses are our own accounts which are tempfailing because they are disabled, in some sort of transition, or have broken LDAP records - I will look at smtp-failure some more.

Thanks,

Ivan.
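
The slice timings reported above translate into throughput as follows. The message count, recipient count, and minutes are taken from the benchmark in this message; the function and variable names are only illustrative.

```python
MESSAGES = 5000
RECIPIENTS = 25
RUNS = {1: 8.5, 2: 5.5, 4: 5.0}  # outgoing slices -> elapsed minutes


def rate(count: int, minutes: float) -> float:
    """Items processed per second over the given number of minutes."""
    return count / (minutes * 60)


baseline = RUNS[1]
for slices, minutes in RUNS.items():
    print(f"{slices} slice(s): {rate(MESSAGES, minutes):.1f} messages/s, "
          f"{rate(MESSAGES * RECIPIENTS, minutes):.0f} recipient deliveries/s, "
          f"{baseline / minutes:.2f}x the single-slice speed")
```

Most of the gain comes from the second slice; going from two slices to four saves only about half a minute more.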
Re: [Mailman-Users] Mailman throughput
On 8/14/2011 4:25 PM, Ivan Fetch wrote:

> I noticed that the incoming qrunner was using 10% CPU (according to the pcpu column of ps) even after qfiles/in was empty, and after all 5000 messages were processed. I wonder what the incoming runner is doing - any ideas there?

If I am not mistaken, pcpu is an average, not a current value. I.e., it is total CPU time divided by elapsed time since process initiation for the given process. What does %CPU from 'top' tell you?

--
Mark Sapiro m...@msapiro.net
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
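
The difference between a lifetime average and a current reading can be illustrated with made-up numbers. This sketch takes the description above at face value (pcpu as total CPU time divided by elapsed time since the process started); the function names and all figures are hypothetical.

```python
def lifetime_cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Average CPU use since the process started, as described above."""
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(cpu_start: float, cpu_end: float,
                         wall_seconds: float) -> float:
    """CPU use over a recent sampling interval, closer to a 'current' value."""
    return 100.0 * (cpu_end - cpu_start) / wall_seconds


# Hypothetical runner: 120 s of CPU during a 20-minute benchmark, then idle.
print(lifetime_cpu_percent(cpu_seconds=120, elapsed_seconds=1200))
# Sampled over the following idle minute, no CPU time is added.
print(interval_cpu_percent(cpu_start=120, cpu_end=120, wall_seconds=60))
```

A runner that was busy earlier and is now idle can therefore still show a noticeable lifetime average while its recent usage is zero.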
Re: [Mailman-Users] Mailman throughput
On 08/14/2011 03:39 PM, Ivan Fetch wrote:

>> There are some MTA tuning tips in the FAQ http://wiki.list.org/x/AgA3, but some are only applicable to Mailman 2.0 so be careful.

The majority of the MTA tuning tips that I know of should be applicable to most any mailing list manager, since they are oriented towards helping the MTA better deal with large amounts of outgoing mail, and optimizing certain types of behaviors that are common with most mailing lists. But I'll have to refresh my memory of what is written there.

>> The main outgoing MTA performance killer is doing DNS verification on recipient domains during SMTP from Mailman. This should be avoided.
>
> Using a local DNS cache cut my 5000 messages to a 25 recipient list from 10 minutes down to 8 1/2 minutes. Even avoiding looking up the same handful of hosts over and over again helps.

Generally speaking, if there are any real-time queries being done by your MTA, you want those done against the message as it comes into your mail system the first time -- this includes checking black lists, checking content, or anything else.

You want to run a separate instance of your MTA for handling your outbound mail and it should listen only to a special port on the 127.0.0.1 loopback interface where Mailman can speak directly to it, and that special instance should have pretty much all DNS queries and real-time checks turned off. After all, those things should have been done when the message was checked on inbound and shouldn't need to be checked again on outbound.

> I have to amend my earlier statement about our receiving 68000 posts per day - I was not careful enough when mining the post log; a lot of the posts are Mailman retrying delivery for tempfailed subscribers. So we do not see 68000 distinct posts, but we are doing a lot of redelivery attempts. Apparently we need to tune bounce processing for lists - this can be challenging to get right, and seems to require individual attention per list. I suppose I could have Mailman retry delivery less often, and if we have something like an outage of our own relays, I just trigger a retry by restarting the queue runners.

If Mailman is dealing with tempfails, then you've done something wrong. The MTA should be blindly accepting whatever Mailman has to send, and then the MTA should be dealing with tempfails -- it's one step closer to wherever the problem might be, and it's more likely to be tuned for that kind of behaviour.

For example, most modern MTAs give you the ability to set up separate queues for given outbound targets, which are kept apart from all the other regular mail being handled. This way you can set up local queues in your MTA that may have different resource handling rules or different retry algorithms, as compared to queues to external sites that might be known for being troublesome.

We were doing this kind of thing at AOL back in the mid-90s, and this has only gotten easier since.

-- Brad Knowles b...@shub-internet.org LinkedIn Profile: http://tinyurl.com/y8kpxu
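
On the Mailman side, pointing delivery at such a dedicated loopback instance might look like the fragment below. It assumes the Mailman 2.1 settings SMTPHOST and SMTPPORT from Defaults.py; the port number 10025 is an arbitrary placeholder, and the matching MTA instance (listening only on that address and port, with real-time checks disabled) has to be configured separately.

```python
# Fragment intended for mm_cfg.py, placed after the usual
# "from Defaults import *" line that the file already contains.

# Deliver to the dedicated outbound MTA instance on the loopback interface.
SMTPHOST = '127.0.0.1'

# Hypothetical port for the outbound-only instance; use whatever port
# that instance actually listens on.
SMTPPORT = 10025
```

Mailman needs a restart to pick up changes to mm_cfg.py, as with the QRUNNERS change discussed elsewhere in the thread.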
Re: [Mailman-Users] Mailman throughput
Hi Brad,

On Aug 14, 2011, at 8:44 PM, Brad Knowles wrote:

> The majority of the MTA tuning tips that I know of should be applicable to most any mailing list manager, since they are oriented towards helping the MTA better deal with large amounts of outgoing mail, and optimizing certain types of behaviors that are common with most mailing lists. But I'll have to refresh my memory of what is written there.
>
> Generally speaking, if there are any real-time queries being done by your MTA, you want those done against the message as it comes into your mail system the first time -- this includes checking black lists, checking content, or anything else.
>
> You want to run a separate instance of your MTA for handling your outbound mail and it should listen only to a special port on the 127.0.0.1 loopback interface where Mailman can speak directly to it, and that special instance should have pretty much all DNS queries and real-time checks turned off. After all, those things should have been done when the message was checked on inbound and shouldn't need to be checked again on outbound.

Brad, I think we are already accomplishing a lot of this minimalism, since the MTA on the Mailman VM is only accepting the message via SMTP, then handing it off to Mailman via the Postfix aliases. The spam and other checks are done beforehand, by another upstream gateway MTA. That gateway then hands mailing list messages off to the Mailman box.

> If Mailman is dealing with tempfails, then you've done something wrong. The MTA should be blindly accepting whatever Mailman has to send, and then the MTA should be dealing with tempfails -- it's one step closer to wherever the problem might be, and it's more likely to be tuned for that kind of behaviour.
>
> For example, most modern MTAs give you the ability to set up separate queues for given outbound targets, which are kept apart from all the other regular mail being handled. This way you can set up local queues in your MTA that may have different resource handling rules or different retry algorithms, as compared to queues to external sites that might be known for being troublesome.
>
> We were doing this kind of thing at AOL back in the mid-90s, and this has only gotten easier since.

This is true for subscribers which are not part of our organization - the MTA which Mailman relays to accepts the messages, and then deals with any delivery issues. However, accounts for which this MTA is the final destination, will tempfail under certain conditions, like mismatched attributes in an LDAP record, or an issue with the mailstore.

For better or worse, we are moving a lot of our mailboxes to mail forwards over the next few months - this will move the rest of these tempfails out of Mailman's SMTP / retry queue, and into the downstream relay (where they belong).

Thanks,

Ivan.