Gateway Server queues too many mails
Hello,

I am running Postfix 2.9.4 (for more than a year now) on CentOS 6.5 x86_64 as a gateway server with postscreen, amavis, spamassassin. The server receives mail from the Internet and forwards (relays) clean mail to the final internal mail server (also running postfix).

Today, I am facing the following problem, which I have not faced before: mail is received, but most of it is queued up in the active queue without being delivered. However, some mails do get through.

I tried postqueue -f but without success; in the log I see:

Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv4
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv6
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 2 IPv4 addresses
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 3 IPv6 addresses

I now have more than 1700 messages in the active queue, which is usually empty (we receive low mail volumes).

What can I do to resolve the situation? Can I safely restart postfix (e.g. service postfix restart) without risking losing the queued mails?

Please advise!

Thanks in advance,
Nick
Re: Gateway Server queues too many mails
Nikolaos Milas:

Hello, I am running Postfix 2.9.4 (for more than a year now) on CentOS 6.5 x86_64 as a gateway server with postscreen, amavis, spamassassin. The server receives mail from the Internet and forwards (relays) clean mail to the final internal mail server (also running postfix). Today, I am facing the following problem, which I have not faced again: Mail is received, but most of it is queued up on the active queue, without it being delivered. However, some mails do get through. I tried postqueue -f but without success; in the log I see:

Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv4
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv6
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 2 IPv4 addresses
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 3 IPv6 addresses

1) TURN OFF all verbose logging.
2) Run the command in http://www.postfix.org/DEBUG_README.html#logging and report what shows up.
3) If step 2 does not point to the problem, review http://www.postfix.org/QSHAPE_README.html

I have now more than 1700 messages on the active queue, which is usually empty (we receive low mail volumes). What can I do to resolve the situation? Can I safely restart postfix (e.g. service postfix restart) (without risking losing the queued mails)? Please advise!

Wietse
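Step 2 above refers to the log scan described in DEBUG_README. A minimal sketch of that scan as a shell function follows; the function name is made up for illustration, the pattern is the commonly quoted one and may differ slightly from the current README, and /var/log/maillog is assumed as the log location because that is the CentOS default (the thread does not state it).

```shell
# Hypothetical helper: print Postfix log lines that report a problem.
# Reads syslog-format text on stdin, so it works on any log file.
postfix_problems() {
    grep -E '(warning|error|fatal|panic):'
}

# Usage on the gateway (log path is an assumption):
#   postfix_problems < /var/log/maillog | less
```

Reading from stdin keeps the helper independent of where the distribution writes the mail log.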
Re: Gateway Server queues too many mails
Thanks Wietse,

I am following on this thread, replying to my own sent mail, because your reply is still in the queue... (I read it with pfqueue).

I did not see anything suspicious looking for errors/fatals/warnings/panics, except perhaps:

Feb 27 05:27:02 mailgw1 postfix/postscreen[16639]: warning: getpeername: Transport endpoint is not connected -- dropping this connection

Could this point to a problem? And how should I deal with it?

Additional info: qshape shows:

                  T  5 10 20  40  80 160 320 640 1280 1280+
         TOTAL 2217 11 31 44 105 297 484 829 416    0     0
  astro.noa.gr  813  3  2 24  39 106 187 268 184    0     0
        noa.gr  673  2 23 12  35  81 154 257 109    0     0
  meteo.noa.gr  353  3  1  4  11  60  67 140  67    0     0
  space.noa.gr  192  2  2  4   7  40  34  77  26    0     0
   gein.noa.gr  165  1  3  0  10   8  37  80  26    0     0
  admin.noa.gr   21  0  0  0   3   2   5   7   4    0     0

All destination domains are the internally hosted ones (on our final destination server) to which the gateway should relay all (clean) incoming mail.

For your information, here are the postfix processes running (excerpt from ps axjf):

    1  5698  5698  5698 ?  -1 Ss    0  6:17 /usr/libexec/postfix/master
 5698  5707  5698  5698 ?  -1 S    89  3:56  \_ qmgr -l -t fifo -u
 5698 13360  5698  5698 ?  -1 S    89  0:17  \_ anvil -l -t unix -u
 5698 16639 16639 16639 ?  -1 Ss   89  0:46  \_ postscreen -l -n smtp -t inet -u -s 2
 5698  4726  5698  5698 ?  -1 S    89  0:01  \_ scache -l -t unix -u
 5698 20848  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 22554  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 23736  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 24112  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 24197  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 24744  5698  5698 ?  -1 S    89  0:00  \_ dnsblog -z -t unix -u
 5698 24956  5698  5698 ?  -1 S    89  0:01  \_ trivial-rewrite -n rewrite -t unix -u
 5698 26604 26604 26604 ?  -1 Ss   89  0:00  \_ verify -l -t unix -u
 5698 26647  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26664  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698     2  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26726  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26734  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26741  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26754  5698  5698 ?  -1 S    89  0:00  \_ cleanup -z -t unix -u
 5698 26755  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698 26757  5698  5698 ?  -1 S    89  0:00  \_ cleanup -z -t unix -u
 5698 26760  5698  5698 ?  -1 S    89  0:00  \_ smtpd -t pass -u -o stress=
 5698  3639  5698  5698 ?  -1 S    89  0:00  \_ smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_
 5698  3713  5698  5698 ?  -1 S    89  0:00  \_ cleanup -z -t unix -u
 5698  3758  5698  5698 ?  -1 S    89  0:00  \_ pickup -l -t fifo -u
 5698  3780  5698  5698 ?  -1 S    89  0:00  \_ lmtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o
 5698  3815  5698  5698 ?  -1 S    89  0:00  \_ smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_
 5698  3816  5698  5698 ?  -1 S    89  0:00  \_ lmtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o
 5698  3861  5698  5698 ?  -1 S    89  0:00  \_ smtp -n relay -t unix -u -o smtp_fallback_relay=

Trying to release a particular mail fails with:

[root@mailgw1 ~]# postqueue -v -i 3fZY516W5NzMmGF
postqueue: name_mask: ipv4
postqueue: name_mask: ipv6
postqueue: inet_addr_local: configured 2 IPv4 addresses
postqueue: inet_addr_local: configured 3 IPv6 addresses
postqueue: flush_send_file: queue_id 3fZY516W5NzMmGF
postqueue: connect to subsystem public/flush
postqueue: send attr request = send_file
postqueue: send attr queue_id = 3fZY516W5NzMmGF
postqueue: public/flush socket: wanted attribute: status
postqueue: input attribute name: status
postqueue: input attribute value: 0
postqueue: public/flush socket: wanted attribute: (list terminator)
postqueue: input attribute
Re: Gateway Server queues too many mails
Nikolaos Milas:

Thanks Wietse, I am following on this thread, replying to my own sent mail, because your reply is still in the queue... (I read it with pfqueue). I did not see anything suspicious looking for errors/fatals/warnings/panics, except perhaps:

All MASTER daemon logging is suspect.
All ERROR logging is suspect.
All FATAL logging is suspect.
All PANIC logging is suspect.

Please show all master/error/fatal/panic logging.

Feb 27 05:27:02 mailgw1 postfix/postscreen[16639]: warning: getpeername: Transport endpoint is not connected -- dropping this connection

postscreen is for RECEIVING mail. You have a mail SENDING problem. Therefore, postscreen logging is irrelevant.

Postfix logs why it is not delivering mail. Have you looked at the status=deferred logfile records? What do they say? You can anonymize the email addresses but please keep the hostname and IP address info.

Wietse
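One way to answer the status=deferred question above is to count how often each deferral reason appears in the log. The following is a minimal sketch, not a Postfix tool: the function name is invented here, and it assumes the usual delivery-agent log format in which the reason follows "status=deferred" in parentheses.

```shell
# Hypothetical helper: summarize why deliveries were deferred.
# Reads syslog-format text on stdin and prints "count reason",
# most frequent reason first.
deferred_reasons() {
    grep 'status=deferred' |
    sed 's/.*status=deferred //' |   # keep only the "(reason)" part
    sort | uniq -c | sort -rn
}

# Usage on the gateway (log path is an assumption):
#   deferred_reasons < /var/log/maillog | head
```

A single dominant reason, such as a refused connection to one local port, usually points straight at the failing next hop.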
Re: Gateway Server queues too many mails
On 27/2/2014 4:10 μμ, Wietse Venema wrote:

All MASTER daemon logging is suspect. All ERROR logging is suspect. All FATAL logging is suspect. All PANIC logging is suspect. Please show all master/error/fatal/panic logging.

I had no such log entries (only warnings).

I found that amavisd was using too much CPU and I thought something could be wrong with it. I tried restarting amavisd and then all active messages vanished from the active queue. However, CPU issues with amavisd did not end and qshape started again showing messages piling in the active queue; I decided I should also restart clamd daemon to make things clearer and monitor the situation. After that, I am not seeing the problem.

Yet, I now have 2120 suspended messages; when running:

postqueue -p

those entries are indicated as:

(delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused)

(10024 is an amavisd port)

Now that amavis seems to be running correctly, how can I resend immediately those suspended mails?

Please advise,
Nick
Re: Gateway Server queues too many mails
Nikolaos Milas:

Yet, I now have 2120 suspended messages; when running: postqueue -p those entries are indicated as: (delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused) (10024 is an amavisd port) Now that amavis seems to be running correctly, how can I resend immediately those suspended mails?

Wait $minimal_backoff_time before postqueue -p.

Wietse
Re: Gateway Server queues too many mails
On 27/2/2014 4:40 μμ, Nikolaos Milas wrote:

Now that amavis seems to be running correctly, how can I resend immediately those suspended mails?

Unfortunately, I am afraid that after I ran postqueue -f and messages were moved to the active queue, amavisd again topped CPU at 100% and postfix started piling up messages again. So, I restarted amavisd and messages moved again to the deferred queue.

Now, I am thinking of temporarily removing the:

content_filter = smtp-amavis:[127.0.0.1]:10024

line from main.cf and *restarting* postfix (or rebooting the server), then running postqueue -f again, at least to have queued messages delivered.

Can I leave the

127.0.0.1:10025 inet n - n - - smtpd ...

line in master.cf as is? I think it won't hurt being there, even if amavisd is not running. Please confirm.

Can I restart Postfix/server safely for the queue? I need to make sure these messages in the deferred queue do NOT get lost.

Nick
Re: Gateway Server queues too many mails
Quoting Nikolaos Milas nmi...@noa.gr:

On 27/2/2014 4:40 μμ, Nikolaos Milas wrote: Now that amavis seems to be running correctly, how can I resend immediately those suspended mails? Unfortunately, I am afraid that after I run postqueue -f and messages were moved to the active queue, amavisd again topped CPU at 100% and postfix started piling up again messages. So, I restarted amavisd and messages moved again to the deferred queue. Now, I am thinking of temporarily removing the: content_filter = smtp-amavis:[127.0.0.1]:10024

You should use a process limit matching the number of amavisd processes, so that Postfix does not feed it too many concurrent smtp connections. Have a look at how smtp-amavis is set up in master.cf; if no limit is set, the default (100) applies. This *could* be your problem.

Regards
Andreas
Re: Gateway Server queues too many mails
On 27/2/2014 5:36 μμ, lst_ho...@kwsoft.de wrote:

You should use a process limit matching the number of amavisd processes to not feed it with too much concurrent smtp connections. Have a look how smtp-amavis is setup in master.cf, if there is no limit set the default (100) applies. This *could* be your problem.

Thank you,

Here is what I have in my master.cf:

127.0.0.1:10025 inet n - n - - smtpd
    -o content_filter=
    -o local_recipient_maps=
    -o relay_recipient_maps=
    -o smtpd_restriction_classes=
    -o smtpd_delay_reject=no
    -o smtpd_client_restrictions=permit_mynetworks,reject
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o smtpd_data_restrictions=reject_unauth_pipelining
    -o smtpd_end_of_data_restrictions=
    -o mynetworks=127.0.0.0/8
    -o smtpd_error_sleep_time=0
    -o smtpd_soft_error_limit=1001
    -o smtpd_hard_error_limit=1000
    -o smtpd_client_connection_count_limit=0
    -o smtpd_client_connection_rate_limit=0
    -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks

Also, in /etc/amavisd.conf:

$max_servers = 2;

...and I only see two amavisd processes (as expected); but these two take 49,9% of the CPU each.

So, any advice would be welcome.

One critical question, now, is whether I can stop/restart postfix/server without risking losing deferred mail messages!

Thanks,
Nick
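The process limit discussed above applies to the smtp-amavis delivery transport, which is a different master.cf entry from the 127.0.0.1:10025 re-injection listener shown in this message. The fragment below is only a sketch of what a limited entry might look like: the thread never shows the actual smtp-amavis line, so the column values are assumptions reconstructed from the ps output earlier in the thread (lmtp, unix socket, no chroot), and any further -o options cut off in that output would need to stay as they are on the real server.

```text
# service     type  private unpriv  chroot  wakeup  maxproc command
# maxproc (7th column) set to 2 to match $max_servers = 2 in amavisd.conf
smtp-amavis   unix  -       -       n       -       2       lmtp
    -o smtp_data_done_timeout=1200
    -o smtp_send_xforward_command=yes
```

With a "-" in the maxproc column, Postfix falls back to default_process_limit, which can be more concurrent connections than two amavisd children are able to accept.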
Re: Gateway Server queues too many mails
On 27/2/2014 5:10 μμ, Nikolaos Milas wrote: Now, I am thinking of temporarily removing the: content_filter = smtp-amavis:[127.0.0.1]:10024 line from main.cf and *restarting* postfix (or rebooting the server), then run postqueue -f again, at least to have queued messages delivered. I am worried that this may not be a solution, because the deferred mails seem to be waiting to be delivered to amavis (127.0.0.1:10024): delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused ...so, even after a reload without the content_filter, these mail messages may still be trying to be delivered to 127.0.0.1:10024. Am I thinking right? If yes, can the deferred messages be re-processed from start (as new incoming messages) so as to avoid delivery through amavisd? How? Please advise. Thanks, Nick
Re: Gateway Server queues too many mails
On 2/27/2014 11:07 AM, Nikolaos Milas wrote:

On 27/2/2014 5:10 μμ, Nikolaos Milas wrote: Now, I am thinking of temporarily removing the: content_filter = smtp-amavis:[127.0.0.1]:10024 line from main.cf and *restarting* postfix (or rebooting the server), then run postqueue -f again, at least to have queued messages delivered. I am worried that this may not be a solution, because the deferred mails seem to be waiting to be delivered to amavis (127.0.0.1:10024): delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused ...so, even after a reload without the content_filter, these mail messages may still be trying to be delivered to 127.0.0.1:10024. Am I thinking right? If yes, can the deferred messages be re-processed from start (as new incoming messages) so as to avoid delivery through amavisd? How? Please advise. Thanks, Nick

After removing the content_filter lines from postfix main.cf, you need to requeue messages that already have the content_filter set.

postsuper -r QUEUEID

or

postsuper -r ALL

(note: requeuing a large number of messages is horrible for overall postfix performance. Only requeue when required.)

Sounds as if the real problem is you're sending amavisd more mail at a time than your system can handle. This can be especially bad if it's causing the system to swap. Or maybe you've configured postfix to make more connections to amavisd than there are amavisd listeners. This also badly affects performance.

The general rule of thumb is 2~5 amavisd processes ($max_servers) per core available, but keep an eye on memory and make sure swapping doesn't occur. Authoritative sizing requires testing throughput on your server at different settings.

The postfix master.cf smtp-amavis transport must have the master.cf maxproc column 7 set equal to or less than the amavisd $max_servers.

Follow up on the amavis-users list for tuning/sizing help.
Probably 90+% of the time and memory used by amavisd-new is really used by SpamAssassin, so tuning SA can have a big impact on performance too. The low-hanging fruit for SA tuning are 1) use a current version. 2) remove third-party rules if they cause a slowdown. 3) SA does a lot of DNS lookups, so fast, local DNS is required. -- Noel Jones
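The bypass-and-requeue procedure described in this reply can be sketched as a short shell sequence. This is an illustration under assumptions, not a tested recipe for this server: the function names and the DRY_RUN switch are invented here, and the commands are only printed unless DRY_RUN is set to 0. The Postfix queue lives on disk, so a reload or restart does not discard queued mail.

```shell
# Sketch: temporarily bypass the content filter and requeue waiting mail.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bypass_filter_and_requeue() {
    # 1. Clear the global content filter in main.cf.
    run postconf -e 'content_filter ='
    # 2. Let running daemons pick up the change; queue files stay on disk.
    run postfix reload
    # 3. Requeue so messages are re-read without the stored filter setting.
    #    Requeuing many messages is expensive; do it only when required.
    run postsuper -r ALL
    # 4. Ask the queue manager to attempt delivery now.
    run postqueue -f
}

# Usage (as root on the gateway): review the dry run first, then
#   DRY_RUN=0; bypass_filter_and_requeue
```

Mail delivered this way skips virus and spam scanning, so the content_filter line should be restored once amavisd is healthy again.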
Re: Gateway Server queues too many mails
On 27/2/2014 8:45 μμ, Noel Jones wrote:

Sounds as if the real problem is you're sending amavisd more mail at a time than your system can handle.

Thank you Noel,

I just found the cause: a particular peculiar mail (long, without attachment, containing multiple languages and html character coding) which was sent to a particular user by a particular user group 750 times!

Can I isolate these mails somehow in the deferred or active queue, remove them all at once and blast them? Is there a way to tell postfix: remove from queue all mail messages whose sender is x...@example.com?

For some reason, this very mail takes too much time to be scanned by amavisd (or spam-assassin which runs under it): about 3,5 minutes each (i.e 3,5 mins x 750)! During this time, CPU tops 100% making the server suffer, causing active queue to get longer and longer.

The server is an enterprise-class VM (under KMS) on clustered hardware, with one virtual CPU (it never had a problem until today - it has to deal with relatively low mail volume):

===
# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.12.5
stepping        : 3
cpu MHz         : 2000.412
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good unfair_spinlock pni cx16 popcnt hypervisor lahf_lm
bogomips        : 4000.82
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
===

Server info during this peak load:

===
[root@mailgw1 log]# free -m
             total       used       free     shared    buffers     cached
Mem:          4840       3573       1267          0        141       2554
-/+ buffers/cache:        877       3963
Swap:         3023          4       3019

[root@mailgw1 log]# iostat
Linux 2.6.32-431.3.1.el6.x86_64 (mailgw1.noa.gr)  02/27/2014  _x86_64_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.09    0.02    0.54    0.40    0.00   90.95

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
vda               6.28        20.82       127.33   86776058  530599580
dm-0             16.35        20.58       127.31   85755530  530501728
dm-1              0.01         0.02         0.02      99048      87304

[root@mailgw1 log]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   4296 1299284 145172 2615764    0    0    10    64    4    4  8  1 91  0  0

[root@mailgw1 log]# mpstat 3
Linux 2.6.32-431.3.1.el6.x86_64 (mailgw1.noa.gr)  02/27/2014  _x86_64_  (1 CPU)

09:27:00 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:27:03 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
09:27:06 PM  all   99.67    0.00    0.33    0.00    0.00    0.00    0.00    0.00    0.00
09:27:09 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
...
===

Thanks,
Nick
Re: Gateway Server queues too many mails
On 27/2/2014 10:09 μμ, Nikolaos Milas wrote:

Can I isolate these mails somehow in the deferred or active queue, remove them all at once and blast them? Is there a way to tell postfix: remove from queue all mail messages whose sender is x...@example.com?

With a bit of googling, I found the following command worked for me:

mailq | tail -n +2 | grep -v '^ *(' | \
  gawk 'BEGIN {RS = ""} /@example\.com/ {print $1}' | \
  tr -d '*!' | sudo postsuper -d -

Ref.: http://keithscode.com/tutorials/linux/7-delete-a-group-of-messages-from-the-postfix-queue.html

So, I removed these damned messages and blacklisted the user group. (If someone wants to subscribe, they may use their gmail :-) .)

Case closed, I think. I might monitor CPU usage in the near future and see if it justifies more virtual CPU.

All the best,
Nick
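The pipeline above matches the pattern anywhere in a queue entry, including recipient addresses. A slightly stricter variant, sketched below, compares only the envelope sender and lets the list be reviewed before anything is deleted. The function name is hypothetical, and the sketch assumes the traditional mailq layout in which the sender is the seventh whitespace-separated field of each entry (queue ID, size, four date fields, sender).

```shell
# Hypothetical helper: print queue IDs of messages whose envelope
# sender equals $1. Reads mailq-style output on stdin.
queue_ids_from_sender() {
    tail -n +2 |                       # drop the column header line
    awk -v sender="$1" 'BEGIN { RS = "" }   # one blank-line-separated entry per record
        $7 == sender {
            id = $1
            gsub(/[*!]/, "", id)       # strip active (*) and hold (!) markers
            print id
        }'
}

# Usage: review first, then delete.
#   mailq | queue_ids_from_sender 'user@example.com'
#   mailq | queue_ids_from_sender 'user@example.com' | postsuper -d -
```

Matching on the sender field alone avoids deleting unrelated mail that merely has a recipient in the same domain.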
Re: Gateway Server queues too many mails
On 2/27/2014 2:09 PM, Nikolaos Milas wrote: On 27/2/2014 8:45 μμ, Noel Jones wrote: Sounds as if the real problem is you're sending amavisd more mail at a time than your system can handle. Thank you Noel, I just found the cause: a particular peculiar mail (long, without attachment, containing multiple languages and html character coding) which was sent to a particular user by a particular user group 750 times ! Can I isolate these mails somehow in the deferred or active queue, remove them all at once and blast them? Is there a way to tell postfix: remove from queue all mail messages whose sender is x...@example.com? You need this script: http://www.arschkrebs.de/postfix/scripts/delete-from-mailq Other interesting scripts in http://www.arschkrebs.de/postfix/scripts/ -- Noel Jones