Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

Hello,

I am running Postfix 2.9.4 (for more than a year now) on CentOS 6.5 
x86_64 as a gateway server with postscreen, amavis, spamassassin. The 
server receives mail from the Internet and forwards (relays) clean mail 
to the final internal mail server (also running postfix).


Today, I am facing the following problem, which I have not faced before:

Mail is received, but most of it is queued up on the active queue, 
without it being delivered. However, some mails do get through.


I tried postqueue -f but without success; in the log I see:

Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv4
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv6
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 2 IPv4 addresses
Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 3 IPv6 addresses


 I have now more than 1700 messages on the active queue, which is 
usually empty (we receive low mail volumes).


What can I do to resolve the situation? Can I safely restart postfix 
(e.g. service postfix restart) (without risking losing the queued 
mails)? Please advise!


Thanks in advance,
Nick


Re: Gateway Server queues too many mails

2014-02-27 Thread Wietse Venema
Nikolaos Milas:
> Hello,
>
> I am running Postfix 2.9.4 (for more than a year now) on CentOS 6.5
> x86_64 as a gateway server with postscreen, amavis, spamassassin. The
> server receives mail from the Internet and forwards (relays) clean mail
> to the final internal mail server (also running postfix).
>
> Today, I am facing the following problem, which I have not faced before:
>
> Mail is received, but most of it is queued up on the active queue,
> without it being delivered. However, some mails do get through.
>
> I tried postqueue -f but without success; in the log I see:
>
> Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv4
> Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: name_mask: ipv6
> Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 2 IPv4 addresses
> Feb 27 14:06:08 mailgw1 postfix/postqueue[26088]: inet_addr_local: configured 3 IPv6 addresses

1) TURN OFF all verbose logging.
2) Run the command in http://www.postfix.org/DEBUG_README.html#logging
   and report what shows up.
3) If step 2 does not point to the problem, review
   http://www.postfix.org/QSHAPE_README.html
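
The search from step 2 can be sketched roughly as follows. This is a minimal sketch, assuming the Postfix log is /var/log/maillog (the usual CentOS location; adjust for your syslog setup). The function name postfix_problems is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list log records that report trouble.
# Assumption: the log is /var/log/maillog unless a file is given.
postfix_problems() {
    grep -E '(warning|error|fatal|panic):' "${1:-/var/log/maillog}"
}
```

Something like "postfix_problems | less" then pages through the matches. Run it only after verbose logging has been turned off, otherwise the relevant records are hard to spot.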

> I have now more than 1700 messages on the active queue, which is
> usually empty (we receive low mail volumes).
>
> What can I do to resolve the situation? Can I safely restart postfix
> (e.g. service postfix restart) (without risking losing the queued
> mails)? Please advise!

Wietse


Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

Thanks Wietse,

I am following on this thread, replying to my own sent mail, because 
your reply is still in the queue... (I read it with pfqueue).


I did not see anything suspicious looking for 
errors/fatals/warnings/panics, except perhaps:


Feb 27 05:27:02 mailgw1 postfix/postscreen[16639]: warning: getpeername: Transport endpoint is not connected -- dropping this connection


Could this point to a problem? And how should I deal with it?

Additional info:

qshape shows:

                 T   5  10  20   40   80  160  320  640 1280 1280+
       TOTAL  2217  11  31  44  105  297  484  829  416    0     0
astro.noa.gr   813   3   2  24   39  106  187  268  184    0     0
      noa.gr   673   2  23  12   35   81  154  257  109    0     0
meteo.noa.gr   353   3   1   4   11   60   67  140   67    0     0
space.noa.gr   192   2   2   4    7   40   34   77   26    0     0
 gein.noa.gr   165   1   3   0   10    8   37   80   26    0     0
admin.noa.gr    21   0   0   0    3    2    5    7    4    0     0


All destination domains are the internally hosted ones (on our final 
destination server) to which the gateway should relay all (clean) 
incoming mail.


For your information, here are the postfix processes running (excerpt 
from ps axjf):


    1  5698  5698  5698 ?   -1 Ss   0   6:17 /usr/libexec/postfix/master
 5698  5707  5698  5698 ?   -1 S   89   3:56  \_ qmgr -l -t fifo -u
 5698 13360  5698  5698 ?   -1 S   89   0:17  \_ anvil -l -t unix -u
 5698 16639 16639 16639 ?   -1 Ss  89   0:46  \_ postscreen -l -n smtp -t inet -u -s 2
 5698  4726  5698  5698 ?   -1 S   89   0:01  \_ scache -l -t unix -u
 5698 20848  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 22554  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 23736  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 24112  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 24197  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 24744  5698  5698 ?   -1 S   89   0:00  \_ dnsblog -z -t unix -u
 5698 24956  5698  5698 ?   -1 S   89   0:01  \_ trivial-rewrite -n rewrite -t unix -u
 5698 26604 26604 26604 ?   -1 Ss  89   0:00  \_ verify -l -t unix -u
 5698 26647  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26664  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 2  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26726  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26734  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26741  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26754  5698  5698 ?   -1 S   89   0:00  \_ cleanup -z -t unix -u
 5698 26755  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698 26757  5698  5698 ?   -1 S   89   0:00  \_ cleanup -z -t unix -u
 5698 26760  5698  5698 ?   -1 S   89   0:00  \_ smtpd -t pass -u -o stress=
 5698  3639  5698  5698 ?   -1 S   89   0:00  \_ smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_
 5698  3713  5698  5698 ?   -1 S   89   0:00  \_ cleanup -z -t unix -u
 5698  3758  5698  5698 ?   -1 S   89   0:00  \_ pickup -l -t fifo -u
 5698  3780  5698  5698 ?   -1 S   89   0:00  \_ lmtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o
 5698  3815  5698  5698 ?   -1 S   89   0:00  \_ smtpd -n 127.0.0.1:10025 -t inet -u -o content_filter= -o local_recipient_maps= -o relay_recipient_
 5698  3816  5698  5698 ?   -1 S   89   0:00  \_ lmtp -n smtp-amavis -t unix -u -o smtp_data_done_timeout=1200 -o smtp_send_xforward_command=yes -o
 5698  3861  5698  5698 ?   -1 S   89   0:00  \_ smtp -n relay -t unix -u -o smtp_fallback_relay=


Trying to release a particular mail fails with:

[root@mailgw1 ~]# postqueue -v -i 3fZY516W5NzMmGF
postqueue: name_mask: ipv4
postqueue: name_mask: ipv6
postqueue: inet_addr_local: configured 2 IPv4 addresses
postqueue: inet_addr_local: configured 3 IPv6 addresses
postqueue: flush_send_file: queue_id 3fZY516W5NzMmGF
postqueue: connect to subsystem public/flush
postqueue: send attr request = send_file
postqueue: send attr queue_id = 3fZY516W5NzMmGF
postqueue: public/flush socket: wanted attribute: status
postqueue: input attribute name: status
postqueue: input attribute value: 0
postqueue: public/flush socket: wanted attribute: (list terminator)
postqueue: input attribute 

Re: Gateway Server queues too many mails

2014-02-27 Thread Wietse Venema
Nikolaos Milas:
> Thanks Wietse,
>
> I am following on this thread, replying to my own sent mail, because
> your reply is still in the queue... (I read it with pfqueue).
>
> I did not see anything suspicious looking for
> errors/fatals/warnings/panics, except perhaps:

All MASTER daemon logging is suspect.

All ERROR logging is suspect.

All FATAL logging is suspect.

All PANIC logging is suspect.

Please show all master/error/fatal/panic logging.

> Feb 27 05:27:02 mailgw1 postfix/postscreen[16639]: warning: getpeername: Transport endpoint is not connected -- dropping this connection

postscreen is for RECEIVING mail. You have a mail SENDING problem.
Therefore, postscreen logging is irrelevant.

Postfix logs why it is not delivering mail. Have you looked at the
status=deferred logfile records? What do they say? You can anonymize
the email addresses but please keep the hostname and IP address info.
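
One way to get an overview of those records is to count how often each deferral reason occurs. This is only a sketch, assuming the log is /var/log/maillog and that each record ends with the reason in parentheses after "status=deferred"; the function name deferred_reasons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count deferral reasons, most frequent first.
# Assumption: the log is /var/log/maillog unless a file is given.
deferred_reasons() {
    grep 'status=deferred' "${1:-/var/log/maillog}" |
        sed 's/.*status=deferred //' |   # keep only the "(reason)" part
        sort | uniq -c | sort -rn
}
```

Because only the text after "status=deferred" is kept, recipient addresses drop out, while the relay hostname and IP address inside the reason stay visible.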

Wietse


Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 4:10 PM, Wietse Venema wrote:

> All MASTER daemon logging is suspect.
>
> All ERROR logging is suspect.
>
> All FATAL logging is suspect.
>
> All PANIC logging is suspect.
>
> Please show all master/error/fatal/panic logging.


I had no such log entries (only warnings).

I found that amavisd was using too much CPU and I thought something 
could be wrong with it. I tried restarting amavisd and then all active 
messages vanished from the active queue.


However, CPU issues with amavisd did not end and qshape started again 
showing messages piling in the active queue; I decided I should also 
restart clamd daemon to make things clearer and monitor the situation. 
After that, I am not seeing the problem.


Yet, I now have 2120 suspended messages; when running: postqueue -p 
those entries are indicated as:


(delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused)


(10024 is an amavisd port)

Now that amavis seems to be running correctly, how can I resend 
immediately those suspended mails?


Please advise,
Nick


Re: Gateway Server queues too many mails

2014-02-27 Thread Wietse Venema
Nikolaos Milas:
> Yet, I now have 2120 suspended messages; when running: postqueue -p
> those entries are indicated as:
>
> (delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused)
>
> (10024 is an amavisd port)
>
> Now that amavis seems to be running correctly, how can I resend
> immediately those suspended mails?

Wait $minimal_backoff_time before postqueue -p.

Wietse


Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 4:40 PM, Nikolaos Milas wrote:

> Now that amavis seems to be running correctly, how can I resend
> immediately those suspended mails?


Unfortunately, after I ran postqueue -f and the messages were moved to
the active queue, amavisd again topped CPU at 100% and postfix started
piling up messages again.


So, I restarted amavisd and messages moved again to the deferred queue.

Now,  I am thinking of temporarily removing the:

   content_filter = smtp-amavis:[127.0.0.1]:10024

line from main.cf and *restarting* postfix (or rebooting the server), 
then run postqueue -f again, at least to have queued messages delivered.


Can I leave the

   127.0.0.1:10025 inet n  -   n   -   - smtpd
   ...

line in master.cf as is? I think it won't hurt being there, even if 
amavisd is not running. Please confirm.


Can I restart Postfix/server safely for the queue? I need to make sure 
these messages in the deferred queue do NOT get lost.


Nick


Re: Gateway Server queues too many mails

2014-02-27 Thread lst_hoe02


Quoting Nikolaos Milas nmi...@noa.gr:

> On 27/2/2014 4:40 PM, Nikolaos Milas wrote:
>
>> Now that amavis seems to be running correctly, how can I resend
>> immediately those suspended mails?
>
> Unfortunately, after I ran postqueue -f and the messages were moved
> to the active queue, amavisd again topped CPU at 100% and postfix
> started piling up messages again.
>
> So, I restarted amavisd and messages moved again to the deferred queue.
>
> Now, I am thinking of temporarily removing the:
>
>    content_filter = smtp-amavis:[127.0.0.1]:10024



You should use a process limit matching the number of amavisd
processes, so that Postfix does not feed it too many concurrent SMTP
connections. Have a look at how smtp-amavis is set up in master.cf; if
no limit is set, the default (100) applies. This *could* be your problem.


Regards

Andreas






Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 5:36 PM, lst_ho...@kwsoft.de wrote:

> You should use a process limit matching the number of amavisd
> processes, so that Postfix does not feed it too many concurrent SMTP
> connections. Have a look at how smtp-amavis is set up in master.cf; if
> no limit is set, the default (100) applies. This *could* be your problem.


Thank you,

Here is what I have in my master.cf:

127.0.0.1:10025 inet n  -   n   -   -   smtpd
    -o content_filter=
    -o local_recipient_maps=
    -o relay_recipient_maps=
    -o smtpd_restriction_classes=
    -o smtpd_delay_reject=no
    -o smtpd_client_restrictions=permit_mynetworks,reject
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o smtpd_data_restrictions=reject_unauth_pipelining
    -o smtpd_end_of_data_restrictions=
    -o mynetworks=127.0.0.0/8
    -o smtpd_error_sleep_time=0
    -o smtpd_soft_error_limit=1001
    -o smtpd_hard_error_limit=1000
    -o smtpd_client_connection_count_limit=0
    -o smtpd_client_connection_rate_limit=0
    -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks


Also, in /etc/amavisd.conf:

   $max_servers = 2;

...and I see only two amavisd processes (as expected), but each of them 
takes 49.9% of the CPU.


So, any advice would be welcome.

One critical question, now, is whether I can stop/restart postfix/server 
without risking losing deferred mail messages!


Thanks,
Nick



Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 5:10 PM, Nikolaos Milas wrote:

> Now, I am thinking of temporarily removing the:
>
>    content_filter = smtp-amavis:[127.0.0.1]:10024
>
> line from main.cf and *restarting* postfix (or rebooting the server),
> then run postqueue -f again, at least to have queued messages
> delivered.


I am worried that this may not be a solution, because the deferred mails 
seem to be waiting to be delivered to amavis (127.0.0.1:10024):


delivery temporarily suspended: connect to 127.0.0.1[127.0.0.1]:10024: Connection refused


...so, even after a reload without the content_filter, these mail 
messages may still be trying to be delivered to 127.0.0.1:10024.


Am I thinking right?

If yes, can the deferred messages be re-processed from start (as new 
incoming messages) so as to avoid delivery through amavisd? How?


Please advise.

Thanks,
Nick


Re: Gateway Server queues too many mails

2014-02-27 Thread Noel Jones
On 2/27/2014 11:07 AM, Nikolaos Milas wrote:
> On 27/2/2014 5:10 PM, Nikolaos Milas wrote:
>
>> Now, I am thinking of temporarily removing the:
>>
>>    content_filter = smtp-amavis:[127.0.0.1]:10024
>>
>> line from main.cf and *restarting* postfix (or rebooting the
>> server), then run postqueue -f again, at least to have queued
>> messages delivered.
>
> I am worried that this may not be a solution, because the deferred
> mails seem to be waiting to be delivered to amavis (127.0.0.1:10024):
>
> delivery temporarily suspended: connect to
> 127.0.0.1[127.0.0.1]:10024: Connection refused
>
> ...so, even after a reload without the content_filter, these mail
> messages may still be trying to be delivered to 127.0.0.1:10024.
>
> Am I thinking right?
>
> If yes, can the deferred messages be re-processed from start (as new
> incoming messages) so as to avoid delivery through amavisd? How?
>
> Please advise.
>
> Thanks,
> Nick


After removing the content_filter lines from postfix main.cf, you
need to requeue messages that already have the content_filter set.

postsuper -r QUEUEID
  or
postsuper -r ALL

(note: requeuing a large number of messages is horrible for overall
postfix performance. Only requeue when required.)
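
Put together, the temporary bypass could look roughly like the sketch below. It assumes content_filter is set only in main.cf (not through -o overrides in master.cf or a FILTER action). The run wrapper, the DRY_RUN variable and the function name are made up for illustration, so that the steps can be printed before anything is changed.

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

bypass_filter_and_requeue() {
    # Empty the main.cf setting so that new mail skips amavisd.
    run postconf -e 'content_filter='
    # Make the running daemons pick up the changed main.cf.
    run postfix reload
    # Requeue mail that was queued while content_filter was still set.
    run postsuper -r ALL
}
```

"DRY_RUN=1 bypass_filter_and_requeue" prints the three commands without touching anything. Mail delivered this way is not scanned, so restore the original content_filter line once amavisd behaves again.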

Sounds as if the real problem is you're sending amavisd more mail at
a time than your system can handle. This can be especially bad if
it's causing the system to swap.

Or maybe you've configured postfix to make more connections to
amavisd than there are amavisd listeners. This also badly affects
performance.

The general rule of thumb is 2~5 amavisd processes ($max_servers)
per core available, but keep an eye on memory and make sure swapping
doesn't occur. Authoritative sizing requires testing throughput on
your server at different settings. The postfix master.cf smtp-amavis
transport must have the master.cf maxproc column 7 set equal to or
less than the amavisd $max_servers.
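
The maxproc rule can be checked with a small script along these lines. This is a sketch under assumptions: the transport is a "unix" service named smtp-amavis, the files are /etc/postfix/master.cf and /etc/amavisd.conf, and $max_servers is assigned a plain number on one line. All three function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: smtp-amavis maxproc must not exceed $max_servers.
amavis_maxproc() {
    # Field 7 of a master.cf service line is maxproc ("-" means default).
    awk '$1 == "smtp-amavis" && $2 == "unix" { print $7 }' \
        "${1:-/etc/postfix/master.cf}"
}

amavis_max_servers() {
    sed -n 's/^[[:space:]]*\$max_servers[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        "${1:-/etc/amavisd.conf}"
}

check_amavis_limits() {
    maxproc=$(amavis_maxproc "$1")
    servers=$(amavis_max_servers "$2")
    # Refuse to compare unless both values are plain numbers.
    case "$maxproc:$servers" in
        *[!0-9:]*|:*|*:)
            echo "cannot compare: maxproc='$maxproc' max_servers='$servers'"
            return 2 ;;
    esac
    if [ "$maxproc" -le "$servers" ]; then
        echo "ok: maxproc=$maxproc <= max_servers=$servers"
    else
        echo "too high: maxproc=$maxproc > max_servers=$servers"
        return 1
    fi
}
```

A maxproc of "-" is reported as not comparable. In that case the default process limit applies, which is the situation the rule above warns about.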

Follow up on the amavis-users list for tuning/sizing help.

Probably 90+% of the time and memory used by amavisd-new is really
used by SpamAssassin, so tuning SA can have a big impact on
performance too.  The low-hanging fruit for SA tuning are 1) use a
current version. 2) remove third-party rules if they cause a
slowdown. 3) SA does a lot of DNS lookups, so fast, local DNS is
required.


  -- Noel Jones


Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 8:45 PM, Noel Jones wrote:

> Sounds as if the real problem is you're sending amavisd more mail at
> a time than your system can handle.


Thank you Noel,

I just found the cause: a particular peculiar mail (long, without 
attachment, containing multiple languages and html character coding) 
which was sent to a particular user by a particular user group 750 times !


Can I isolate these mails somehow in the deferred or active queue, 
remove them all at once and blast them? Is there a way to tell postfix: 
remove from queue all mail messages whose sender is x...@example.com?


For some reason, this very mail takes too much time to be scanned by 
amavisd (or SpamAssassin, which runs under it): about 3.5 minutes each 
(i.e. 3.5 min x 750)! During this time, the CPU tops out at 100%, making 
the server suffer and causing the active queue to get longer and longer.


The server is an enterprise-class VM (under KVM) on clustered hardware, 
with one virtual CPU (it never had a problem until today; it has to 
deal with relatively low mail volume):


===

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 2
model name      : QEMU Virtual CPU version 0.12.5
stepping        : 3
cpu MHz         : 2000.412
cache size      : 4096 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 4
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm up rep_good unfair_spinlock pni cx16 popcnt hypervisor lahf_lm
bogomips        : 4000.82
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

===

Server info during this peak load:
===

[root@mailgw1 log]# free -m
             total       used       free     shared    buffers     cached
Mem:          4840       3573       1267          0        141       2554
-/+ buffers/cache:        877       3963
Swap:         3023          4       3019


[root@mailgw1 log]# iostat
Linux 2.6.32-431.3.1.el6.x86_64 (mailgw1.noa.gr)  02/27/2014  _x86_64_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.09    0.02    0.54    0.40    0.00   90.95

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
vda               6.28        20.82       127.33   86776058  530599580
dm-0             16.35        20.58       127.31   85755530  530501728
dm-1              0.01         0.02         0.02      99048      87304

[root@mailgw1 log]# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   4296 1299284 145172 2615764    0    0    10    64    4    4  8  1 91  0  0



[root@mailgw1 log]# mpstat 3
Linux 2.6.32-431.3.1.el6.x86_64 (mailgw1.noa.gr)  02/27/2014  _x86_64_  (1 CPU)

09:27:00 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:27:03 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
09:27:06 PM  all   99.67    0.00    0.33    0.00    0.00    0.00    0.00    0.00    0.00
09:27:09 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

...

===

Thanks,
Nick




Re: Gateway Server queues too many mails

2014-02-27 Thread Nikolaos Milas

On 27/2/2014 10:09 PM, Nikolaos Milas wrote:

> Can I isolate these mails somehow in the deferred or active queue,
> remove them all at once and blast them? Is there a way to tell
> postfix: remove from queue all mail messages whose sender is
> x...@example.com?


With a bit of googling, I found the following command worked for me:

   mailq | tail -n +2 | grep -v '^ *(' | \
   gawk 'BEGIN {RS = ""} /@example\.com/ {print $1}' | \
   tr -d '*!' | sudo postsuper -d -

Ref.: 
http://keithscode.com/tutorials/linux/7-delete-a-group-of-messages-from-the-postfix-queue.html
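
Since postsuper -d cannot be undone, it may be worth previewing the queue IDs first. The sketch below wraps the same pipeline in a function that only prints the IDs. The name queue_ids_matching is hypothetical, and it assumes the usual mailq layout (one header line, entries separated by blank lines, a "-- ... Requests." summary at the end).

```shell
#!/bin/sh
# Hypothetical helper: read mailq output on stdin and print the queue IDs
# of entries (sender or recipient lines) matching the given awk regex.
queue_ids_matching() {
    tail -n +2 |              # drop the column header line
        grep -v '^ *(' |      # drop "(reason ...)" lines of deferred mail
        PAT="$1" awk 'BEGIN { RS = "" }
            $1 !~ /^--/ && $0 ~ ENVIRON["PAT"] { print $1 }' |
        tr -d '*!'            # strip the active (*) and hold (!) markers
}
```

"mailq | queue_ids_matching '@example\.com'" lists the candidates; once the list looks right, the same output can be piped to "postsuper -d -". The pattern is passed through the environment so that awk does not mangle the backslash.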


So, I removed these damned messages and blacklisted the user group. (If 
someone wants to subscribe, they may use their gmail :-) .)


Case closed, I think. I might monitor CPU usage in the near future and 
see if it justifies more virtual CPU.


All the best,
Nick


Re: Gateway Server queues too many mails

2014-02-27 Thread Noel Jones
On 2/27/2014 2:09 PM, Nikolaos Milas wrote:
> On 27/2/2014 8:45 PM, Noel Jones wrote:
>
>> Sounds as if the real problem is you're sending amavisd more mail at
>> a time than your system can handle.
>
> Thank you Noel,
>
> I just found the cause: a particular peculiar mail (long, without
> attachment, containing multiple languages and html character coding)
> which was sent to a particular user by a particular user group 750
> times !
>
> Can I isolate these mails somehow in the deferred or active queue,
> remove them all at once and blast them? Is there a way to tell
> postfix: remove from queue all mail messages whose sender is
> x...@example.com?

You need this script:
http://www.arschkrebs.de/postfix/scripts/delete-from-mailq

Other interesting scripts in
http://www.arschkrebs.de/postfix/scripts/




  -- Noel Jones