Re: Postfix using all CPU after nightly mail submission

2018-02-21 Thread Wietse Venema
Zach Sheppard:
> I did as you suggested and opened a console on the VMWare host, did a tail
> of the mail log, and it sent mail for a good 5-10 minutes before finally
> becoming unresponsive. I tried to Ctrl-C out of tail, nothing. I've done

Thanks for confirming that the ENTIRE SYSTEM LOCKS UP. Evidently
something cannot handle the load of system calls that Postfix makes.
That should be enough cause to look for a better system. I do not
expect that the problem is with VMware itself, but I cannot exclude
that it is caused by some conservative configuration.

Wietse


Re: Postfix using all CPU after nightly mail submission

2018-02-21 Thread MK
Was seeing similar behaviour under high tcp load in other applications 
(unrelated to postfix).
It’s possible you’re running into this bug. Try to Set vmxnet3.rev.30=FALSE 
under the VMs properties and reboot
More info here:191201 – Randomly freezes due to VMXNET3 
  
|  
|   |  
191201 – Randomly freezes due to VMXNET3
 

  |   |

  |

 



Sent from Yahoo Mail for iPhone


On Wednesday, February 21, 2018, 8:30 AM, Zach Sheppard 
 wrote:

Hi Wietse,
I limited my postfix installation to default_process_limit 5, 4, 3, 2, and even 
1, and still saw the same effects. I am thinking it might be either my opendkim 
milter (which applies the DKIM signature for each mail) or SASL as these are 
the only other processes on the server. Are you aware of any issues with either 
related to I/O? I have not seen any configuration settings for opendkim to do 
any performance throttling.
I am still dumbfounded how this continues to occur. I am not sending mail in 
large quantity - maybe 7,000-8,000 total - just in a short amount of time. The 
I/O shouldn't be THAT high... at least not to leave the server unresponsive... 
the mail client connects to my server every evening (around midnight) and sends 
mails in a burst fashion within an hour or so.
I did as you suggested and opened a console on the VMWare host, did a tail of 
the mail log, and it sent mail for a good 5-10 minutes before finally becoming 
unresponsive. I tried to Ctrl-C out of tail, nothing. I've done the same 
monitoring with top and still see no culprit for the sudden halt. I check 
syslog and other logs on the server and see no crashes or panics.
Any other ideas what might be causing this? Further debugging I can do?
Thanks
On Fri, Jan 19, 2018 at 2:26 PM, Wietse Venema  wrote:

Zach Sheppard:
> Wietse:
>
> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.
>
> I'm not able to determine what program is consuming the CPU because I can't
> login to the console when this occurs. The only way I can recover the
> machine is by forcibly powering off.

I suspect that heavy I/O from Postfix and syslog is too much for
your VM.

To diagnose the problem, run screen(1) on a stable machine, and
then open a login session into the VM while it still responsive.
Then come back to that screen session when things go bad. You're
likely to find that when the VM is very slow, all time is spent in
the VM's kernel, and the host's VMM.

Note that VMs, while fine for CPU-bound jobs, can introduce serious
CPU overhead for things that do massive amounts of I/O like Postfix
plus syslog.

If you can't get a better VM, you can reduce the impact from a
'large' mailing by reducing the number of concurrent Postfix SMTP
server and client processes.

# postconf default_process_limit=10
# postfix reload

        Wietse



This message may contain confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and delete 
this e-mail from your system. If you are not the intended recipient you are 
notified that disclosing, copying, distributing or taking any action in 
reliance on the contents of this information is strictly prohibited.




RE: Postfix using all CPU after nightly mail submission

2018-02-21 Thread Fazzina, Angelo
Hi Zach, my postfix box is on Vmware too….Did the folks that manage VmWare see 
any oddity? No alarms ? No performance spikes ?
Also I use Zabbix for monitoring so I get email warnings when certain 
thresholds are exceeded.

Maybe the backend VMware storage is having an issue ?  shooting from the hip 
here…
Good Luck.
-ALF
P.S. not sure if list allows including pics from Win10 snipping tool.  ;-)

It doesn’t
http://tinypic.com/r/25hhh68/9



-ANGELO FAZZINA

ITS Service Manager:
Spam and Virus Prevention
Mass Mailing
G Suite/Gmail

ang...@uconn.edu
University of Connecticut,  ITS, SSG, Server Systems
860-486-9075

From: owner-postfix-us...@postfix.org [mailto:owner-postfix-us...@postfix.org] 
On Behalf Of Zach Sheppard
Sent: Wednesday, February 21, 2018 8:31 AM
To: Postfix users <postfix-users@postfix.org>
Subject: Re: Postfix using all CPU after nightly mail submission

Hi Wietse,

I limited my postfix installation to default_process_limit 5, 4, 3, 2, and even 
1, and still saw the same effects. I am thinking it might be either my opendkim 
milter (which applies the DKIM signature for each mail) or SASL as these are 
the only other processes on the server. Are you aware of any issues with either 
related to I/O? I have not seen any configuration settings for opendkim to do 
any performance throttling.

I am still dumbfounded how this continues to occur. I am not sending mail in 
large quantity - maybe 7,000-8,000 total - just in a short amount of time. The 
I/O shouldn't be THAT high... at least not to leave the server unresponsive... 
the mail client connects to my server every evening (around midnight) and sends 
mails in a burst fashion within an hour or so.

I did as you suggested and opened a console on the VMWare host, did a tail of 
the mail log, and it sent mail for a good 5-10 minutes before finally becoming 
unresponsive. I tried to Ctrl-C out of tail, nothing. I've done the same 
monitoring with top and still see no culprit for the sudden halt. I check 
syslog and other logs on the server and see no crashes or panics.

Any other ideas what might be causing this? Further debugging I can do?

Thanks

On Fri, Jan 19, 2018 at 2:26 PM, Wietse Venema 
<wie...@porcupine.org<mailto:wie...@porcupine.org>> wrote:
Zach Sheppard:
> Wietse:
>
> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.
>
> I'm not able to determine what program is consuming the CPU because I can't
> login to the console when this occurs. The only way I can recover the
> machine is by forcibly powering off.

I suspect that heavy I/O from Postfix and syslog is too much for
your VM.

To diagnose the problem, run screen(1) on a stable machine, and
then open a login session into the VM while it still responsive.
Then come back to that screen session when things go bad. You're
likely to find that when the VM is very slow, all time is spent in
the VM's kernel, and the host's VMM.

Note that VMs, while fine for CPU-bound jobs, can introduce serious
CPU overhead for things that do massive amounts of I/O like Postfix
plus syslog.

If you can't get a better VM, you can reduce the impact from a
'large' mailing by reducing the number of concurrent Postfix SMTP
server and client processes.

# postconf default_process_limit=10
# postfix reload

Wietse


This message may contain confidential information and is intended only for the 
individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and delete 
this e-mail from your system. If you are not the intended recipient you are 
notified that disclosing, copying, distributing or taking any action in 
reliance on the contents of this information is strictly prohibited.


Re: Postfix using all CPU after nightly mail submission

2018-02-21 Thread Zach Sheppard
Hi Wietse,

I limited my postfix installation to default_process_limit 5, 4, 3, 2, and
even 1, and still saw the same effects. I am thinking it might be either my
opendkim milter (which applies the DKIM signature for each mail) or SASL as
these are the only other processes on the server. Are you aware of any
issues with either related to I/O? I have not seen any configuration
settings for opendkim to do any performance throttling.

I am still dumbfounded how this continues to occur. I am not sending mail
in large quantity - maybe 7,000-8,000 total - just in a short amount of
time. The I/O shouldn't be THAT high... at least not to leave the server
unresponsive... the mail client connects to my server every evening (around
midnight) and sends mails in a burst fashion within an hour or so.

I did as you suggested and opened a console on the VMWare host, did a tail
of the mail log, and it sent mail for a good 5-10 minutes before finally
becoming unresponsive. I tried to Ctrl-C out of tail, nothing. I've done
the same monitoring with top and still see no culprit for the sudden halt.
I check syslog and other logs on the server and see no crashes or panics.

Any other ideas what might be causing this? Further debugging I can do?

Thanks

On Fri, Jan 19, 2018 at 2:26 PM, Wietse Venema  wrote:

> Zach Sheppard:
> > Wietse:
> >
> > I have not made any changes to rsyslog.conf. All it does it redirect all
> > mail log messages to one log in /var/log/mail which I rotate with a cron
> > script nightly. However, I do agree that it really could be the only
> other
> > process that could be hanging the server.
> >
> > I'm not able to determine what program is consuming the CPU because I
> can't
> > login to the console when this occurs. The only way I can recover the
> > machine is by forcibly powering off.
>
> I suspect that heavy I/O from Postfix and syslog is too much for
> your VM.
>
> To diagnose the problem, run screen(1) on a stable machine, and
> then open a login session into the VM while it still responsive.
> Then come back to that screen session when things go bad. You're
> likely to find that when the VM is very slow, all time is spent in
> the VM's kernel, and the host's VMM.
>
> Note that VMs, while fine for CPU-bound jobs, can introduce serious
> CPU overhead for things that do massive amounts of I/O like Postfix
> plus syslog.
>
> If you can't get a better VM, you can reduce the impact from a
> 'large' mailing by reducing the number of concurrent Postfix SMTP
> server and client processes.
>
> # postconf default_process_limit=10
> # postfix reload
>
> Wietse
>

-- 
This message may contain confidential information and is intended only for 
the individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. If you are not the intended recipient 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.


Re: Postfix using all CPU after nightly mail submission

2018-01-20 Thread Zach Sheppard
Hi Viktor:

I configured that internally generated mail to be filtered through our DKIM
milter. This was to allow deliveries of error messages from the server to
my inbox. However, I see above how this is not ideal since that can just be
delivered to the root user inbox. I have turned off that setting.

After making several tweaks from suggestions in this thread I saw much
better performance last night.

On Fri, Jan 19, 2018 at 2:25 PM, Viktor Dukhovni  wrote:

> On Fri, Jan 19, 2018 at 02:04:05PM -0500, Zach Sheppard wrote:
>
> > I have not made any changes to rsyslog.conf. All it does it redirect all
> > mail log messages to one log in /var/log/mail which I rotate with a cron
> > script nightly. However, I do agree that it really could be the only
> other
> > process that could be hanging the server.
>
> Your milter could be another culprit, and you've unwisely configured
> filtering of internally generated mail, including notices about
> problems.  So the first thing is to undo the filtering of internally
> generated mail.  See also my other comments about your config.
>
> As for syslog, for MTAs I prefer syslog-ng, to rsyslog, and make
> sure that /dev/log is configured as a "dgram" not "stream" socket.
> Also make sure that log writes are not synchronous.
>
> --
> Viktor.
>

-- 
This message may contain confidential information and is intended only for 
the individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. If you are not the intended recipient 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.


Re: Postfix using all CPU after nightly mail submission

2018-01-20 Thread Ralf Hildebrandt
> > Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: 
> > from=<...@oconee.k12.sc.us>, size=2408, nrcpt=1 (queue 
> > active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> > Jan 15 09:31:40 mailrelay opendkim[668]: OpenDKIM Filter v2.11.0 starting 
> > (args: -x /etc/opendkim.conf)
> 
> What you see as "^@" is how ASCII NUL (the zero byte) is displayed
> by "more", "less", "vi", ...   It appears that the log file has either
> lots of NULs written to it, or perhaps has a "hole" as a result of
> truncation of the log file while it was still being written by the
> syslog daemon.  Perhaps incorrect log rotation...
> 
> As for high CPU, that's a bit hard to explain without further
> information, which is difficult to obtain with a truncated log.
> Postfix does not normally bring systems to their knees, its
> resource limits are specifically designed to avoid that.

I've seen similar issues. Machines executing one job - which either
goes hogging the CPU or using all memory, then either the OOM killer
goes mad or the whole machine panics, leaving the filesystem in an
inconsistent state. Often leaving zeros in the log.

-- 
[*] sys4 AG

https://sys4.de, +49 (89) 30 90 46 64
Schleißheimer Straße 26/MG, 80333 München
   
Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Wietse Venema
Zach Sheppard:
> Wietse:
> 
> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.
> 
> I'm not able to determine what program is consuming the CPU because I can't
> login to the console when this occurs. The only way I can recover the
> machine is by forcibly powering off.

I suspect that heavy I/O from Postfix and syslog is too much for
your VM.

To diagnose the problem, run screen(1) on a stable machine, and
then open a login session into the VM while it still responsive.
Then come back to that screen session when things go bad. You're
likely to find that when the VM is very slow, all time is spent in
the VM's kernel, and the host's VMM.

Note that VMs, while fine for CPU-bound jobs, can introduce serious
CPU overhead for things that do massive amounts of I/O like Postfix
plus syslog.

If you can't get a better VM, you can reduce the impact from a
'large' mailing by reducing the number of concurrent Postfix SMTP
server and client processes.

# postconf default_process_limit=10 
# postfix reload

Wietse


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Viktor Dukhovni
On Fri, Jan 19, 2018 at 02:04:05PM -0500, Zach Sheppard wrote:

> I have not made any changes to rsyslog.conf. All it does it redirect all
> mail log messages to one log in /var/log/mail which I rotate with a cron
> script nightly. However, I do agree that it really could be the only other
> process that could be hanging the server.

Your milter could be another culprit, and you've unwisely configured
filtering of internally generated mail, including notices about
problems.  So the first thing is to undo the filtering of internally
generated mail.  See also my other comments about your config.

As for syslog, for MTAs I prefer syslog-ng, to rsyslog, and make
sure that /dev/log is configured as a "dgram" not "stream" socket.
Also make sure that log writes are not synchronous.

-- 
Viktor.


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Zach Sheppard
Wietse:

I have not made any changes to rsyslog.conf. All it does it redirect all
mail log messages to one log in /var/log/mail which I rotate with a cron
script nightly. However, I do agree that it really could be the only other
process that could be hanging the server.

I'm not able to determine what program is consuming the CPU because I can't
login to the console when this occurs. The only way I can recover the
machine is by forcibly powering off.

#  /etc/rsyslog.confConfiguration file for rsyslog.
#
#   For more information see
#   /usr/share/doc/rsyslog-doc/html/rsyslog_conf.html


#
 MODULES 
#

module(load="imuxsock") # provides support for local system logging
module(load="imklog")   # provides kernel logging support
#module(load="immark")  # provides --MARK-- message capability

# provides UDP syslog reception
#module(load="imudp")
#input(type="imudp" port="514")

# provides TCP syslog reception
#module(load="imtcp")
#input(type="imtcp" port="514")


###
 GLOBAL DIRECTIVES 
###

#
# Use traditional timestamp format.
# To enable high precision timestamps, comment out the following line.
#
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat

#
# Set the default permissions for all log files.
#
$FileOwner root
$FileGroup adm
$FileCreateMode 0640
$DirCreateMode 0755
$Umask 0022

#
# Where to place spool and state files
#
$WorkDirectory /var/spool/rsyslog

#
# Include all config files in /etc/rsyslog.d/
#
$IncludeConfig /etc/rsyslog.d/*.conf


###
 RULES 
###

#
# First some standard log files.  Log by facility.
#
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none  -/var/log/syslog
#cron.* /var/log/cron.log
daemon.*-/var/log/daemon.log
kern.*  -/var/log/kern.log
lpr.*   -/var/log/lpr.log
mail.*  -/var/log/mail/mail.log
user.*  -/var/log/user.log

#
# Logging for the mail system.  Split it up so that
# it is easy to write scripts to parse these files.
#
#mail.info  -/var/log/mail.info
#mail.warn  -/var/log/mail.warn
#mail.err   /var/log/mail.err

#
# Some "catch-all" log files.
#
*.=debug;\
auth,authpriv.none;\
news.none;mail.none -/var/log/debug
*.=info;*.=notice;*.=warn;\
auth,authpriv.none;\
cron,daemon.none;\
mail,news.none  -/var/log/messages

#
# Emergencies are sent to everybody logged in.
#
*.emerg :omusrmsg:*

On Fri, Jan 19, 2018 at 1:20 PM, Wietse Venema  wrote:

> Zach Sheppard:
> > However, whenever Postfix gets a large e-mail load it processes the
> e-mails
> > for around 30-45 minutes and then consistently uses around 70-80% of the
> > CPU effectively locking up the entire system so much so that I can't even
> > login and debug without either disabling the network card in VMWare or
> > rebooting the system totally. The VM has 2 CPU cores and 4GB RAM
> allocated
> > to it which I thought was plenty for this application.
>
> Did you determine out WHAT PROGRAM is consuming CPU? As Postfix is
> totally I/O limited, I suspect that there may be some other system
> component that is messing up.
>
> For example, it is well known that one badly configured syslog
> service can use up lots more CPU than all of Postfix combined.
>
> Wietse
>

-- 
This message may contain confidential information and is intended only for 
the individuals named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. If you are not the intended recipient 
you are notified that disclosing, copying, distributing or taking any 
action in reliance on the contents of this information is strictly 
prohibited.


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Wietse Venema
Zach Sheppard:
> However, whenever Postfix gets a large e-mail load it processes the e-mails
> for around 30-45 minutes and then consistently uses around 70-80% of the
> CPU effectively locking up the entire system so much so that I can't even
> login and debug without either disabling the network card in VMWare or
> rebooting the system totally. The VM has 2 CPU cores and 4GB RAM allocated
> to it which I thought was plenty for this application.

Did you determine out WHAT PROGRAM is consuming CPU? As Postfix is
totally I/O limited, I suspect that there may be some other system
component that is messing up.

For example, it is well known that one badly configured syslog
service can use up lots more CPU than all of Postfix combined.

Wietse


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Dominic Raferd
On 19 January 2018 at 16:02, Viktor Dukhovni  wrote:
>
>
>> On Jan 19, 2018, at 10:58 AM, Dominic Raferd  wrote:
>>
>>> The pipes to "sort" should not be needed.  The output of "postconf" is 
>>> pre-sorted.
>>
>> yes I thought that - but without piping through sort I see:
>> $ comm -1 -2 <(postconf -n) <(postconf -d)
>> comm: file 2 is not in sorted order
>> comm: file 1 is not in sorted order
>
> Perhaps your locale sorts "_" the output differently than the C locale.
> Does the issue go away with:
>
> export LC_COLLATE=C

Ah yes it does. On my system LC_COLLATE is normally undefined. This
avoids the unwanted messages:

LC_COLLATE=C comm -1 -2 <(postconf -n) <(postconf -d)


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Viktor Dukhovni


> On Jan 19, 2018, at 10:58 AM, Dominic Raferd  wrote:
> 
>> The pipes to "sort" should not be needed.  The output of "postconf" is 
>> pre-sorted.
> 
> yes I thought that - but without piping through sort I see:
> $ comm -1 -2 <(postconf -n) <(postconf -d)
> comm: file 2 is not in sorted order
> comm: file 1 is not in sorted order

Perhaps your locale sorts "_" the output differently than the C locale.
Does the issue go away with:

export LC_COLLATE=C

-- 
Viktor.



Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Dominic Raferd
On 19 January 2018 at 15:55, Viktor Dukhovni  wrote:
>
>
>> On Jan 19, 2018, at 10:46 AM, Dominic Raferd  wrote:
>>
>> Here's a way to check for explicit settings in main.cf that are
>> actually defaults and so could be removed (works under bash):
>>
>> comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)
>
> The pipes to "sort" should not be needed.  The output of "postconf" is 
> pre-sorted.

yes I thought that - but without piping through sort I see:
$ comm -1 -2 <(postconf -n) <(postconf -d)
comm: file 2 is not in sorted order
comm: file 1 is not in sorted order


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Viktor Dukhovni


> On Jan 19, 2018, at 10:46 AM, Dominic Raferd  wrote:
> 
> Here's a way to check for explicit settings in main.cf that are
> actually defaults and so could be removed (works under bash):
> 
> comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)

The pipes to "sort" should not be needed.  The output of "postconf" is 
pre-sorted.

-- 
Viktor.



Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Dominic Raferd
On 19 January 2018 at 15:21, Viktor Dukhovni  wrote:
>
>
>
> > default_destination_concurrency_limit = 50
>
> This is the default, remove the setting.
>
> ...


Here's a way to check for explicit settings in main.cf that are
actually defaults and so could be removed (works under bash):

comm -1 -2 <(postconf -n|sort) <(postconf -d|sort)


Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Viktor Dukhovni


> On Jan 19, 2018, at 9:55 AM, Zach Sheppard  
> wrote:
> 
> bounce_queue_lifetime = 0

This is a bad idea.  Don't do that.  Allow transient delivery failures
to be retried.

> default_destination_concurrency_limit = 50

This is the default, remove the setting.

> internal_mail_filter_classes = bounce, notify

Leave this setting at its default value by removing it from main.cf.

> notify_classes = bounce, resource, software

Consider setting this empty, when sending bulk mail, your source of
notices is the mail log, not per-message notices.

> smtp_connect_timeout = 10s

This is may be a bit short, but should be OK unless some destination is
under sustained heavy load.

> smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache

You can/should set this empty, and leave session caching to the client via
session tickets.

> smtp_helo_timeout = 30s

This too can be a bit short for overloaded peers.

> smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

This is your client session cache, correctly configured, DO NOT also
clear this when clearing the smtpd setting.

> --master.cf--
> smtp  inet  n   -   y   -   -   smtpd
> pickupunix  n   -   y   60  1   pickup
> cleanup   unix  n   -   y   -   0   cleanup
> qmgr  unix  n   -   n   300 1   qmgr
> tlsmgrunix  -   -   y   1000?   1   tlsmgr
> rewrite   unix  -   -   y   -   -   trivial-rewrite
> bounceunix  -   -   y   -   0   bounce
> defer unix  -   -   y   -   0   bounce
> trace unix  -   -   y   -   0   bounce
> verifyunix  -   -   y   -   1   verify
> flush unix  n   -   y   1000?   0   flush
> proxymap  unix  -   -   n   -   -   proxymap
> proxywrite unix -   -   n   -   1   proxymap
> smtp  unix  -   -   y   -   -   smtp
> relay unix  -   -   y   -   -   smtp
> showq unix  n   -   y   -   -   showq
> error unix  -   -   y   -   -   error
> retry unix  -   -   y   -   -   error
> discard   unix  -   -   y   -   -   discard
> local unix  -   n   n   -   -   local
> virtual   unix  -   n   n   -   -   virtual
> lmtp  unix  -   -   y   -   -   lmtp
> anvil unix  -   -   y   -   1   anvil
> scacheunix  -   -   y   -   1   scache
> [...]


The master.cf file looks fairly normal, but uses chroot for many
services.  This can make for fragile management of the system when
the chroot jail is subtly misconfigured, but is not necessarily a
problem (when done right).

-- 
Viktor.



Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Viktor Dukhovni


> On Jan 19, 2018, at 8:52 AM, Zach Sheppard  
> wrote:
> 
> Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: 
> from=<...@oconee.k12.sc.us>, size=2408, nrcpt=1 (queue 
> active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> Jan 15 09:31:40 mailrelay opendkim[668]: OpenDKIM Filter v2.11.0 starting 
> (args: -x /etc/opendkim.conf)

What you see as "^@" is how ASCII NUL (the zero byte) is displayed
by "more", "less", "vi", ...   It appears that the log file has either
lots of NULs written to it, or perhaps has a "hole" as a result of
truncation of the log file while it was still being written by the
syslog daemon.  Perhaps incorrect log rotation...

As for high CPU, that's a bit hard to explain without further
information, which is difficult to obtain with a truncated log.
Postfix does not normally bring systems to their knees, its
resource limits are specifically designed to avoid that.

-- 
Viktor.



Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Zach Sheppard
Hi Bastian,

I did read that debug readme and saw the following:

>> If you can't use a test email address, please anonymize email addresses
and host names consistently. Replace each letter by "A", each digit by "D"
so that the helpers can still recognize syntactical errors.

...and is why I replaced the e-mail address and host IP addresses. The from
address for this particular message was and is the same for all messages as
this server only sends mail from one internal application. However, the to
addresses are parents, staff, and other people I didn't want to identify.

As for the statement that is tagged to all of our outgoing mail. I
apologize for that being included.

I wasn't sure whether the other debug readme recommendations were needed
but I have the postfinger output below:

./postfinger
postfinger - postfix configuration on Fri Jan 19 09:50:34 EST 2018
version: 1.30

Warning: postfinger output may show private configuration information,
such as ip addresses and/or domain names which you do not want to show
to the public.  If this is the case it is your responsibility to modify
the output to hide this private information.  [Remove this warning with
the --nowarn option.]

--System Parameters--
mail_version = 3.1.6
hostname = 
uname = Linux 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
x86_64 GNU/Linux

--Packaging information--
looks like this postfix comes from deb package: postfix-3.1.6-0+deb9u1

--main.cf non-default parameters--
alias_maps = hash:/etc/postfix/aliases
append_dot_mydomain = no
biff = no
bounce_notice_recipient = root@localhost
bounce_queue_lifetime = 0
compatibility_level = 2
default_destination_concurrency_limit = 50
error_notice_recipient = root@localhost
inet_protocols = ipv4
in_flow_delay = 2s
internal_mail_filter_classes = bounce, notify
masquerade_domains = $mydomain
milter_default_action = reject
milter_protocol = 2
mydestination = $myhostname, localhost
mydomain = oconee.k12.sc.us
myhostname = .fqdn
mynetworks = localhost, 127.0.0.1, [::1]
myorigin = $mydomain
non_smtpd_milters = inet:localhost:8892
notify_classes = bounce, resource, software
proxy_interfaces = 
readme_directory = no
smtp_connect_timeout = 10s
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_client_restrictions = permit_mynetworks, permit_sasl_authenticated,
reject
smtpd_milters = inet:localhost:8892
smtpd_reject_unlisted_recipient = no
smtpd_sasl_auth_enable = yes
smtpd_sasl_authenticated_header = yes
smtpd_sasl_local_domain = oconee.k12.sc.us
smtpd_tls_cert_file = /etc/postfix/ssl/oconee.pem
smtpd_tls_key_file = /etc/postfix/ssl/oconee.key
smtpd_tls_security_level = may
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtpd_use_tls = yes
smtp_generic_maps = hash:/etc/postfix/generic_maps
smtp_helo_timeout = 30s
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

--master.cf--
smtp  inet  n   -   y   -   -   smtpd
pickupunix  n   -   y   60  1   pickup
cleanup   unix  n   -   y   -   0   cleanup
qmgr  unix  n   -   n   300 1   qmgr
tlsmgrunix  -   -   y   1000?   1   tlsmgr
rewrite   unix  -   -   y   -   -   trivial-rewrite
bounceunix  -   -   y   -   0   bounce
defer unix  -   -   y   -   0   bounce
trace unix  -   -   y   -   0   bounce
verifyunix  -   -   y   -   1   verify
flush unix  n   -   y   1000?   0   flush
proxymap  unix  -   -   n   -   -   proxymap
proxywrite unix -   -   n   -   1   proxymap
smtp  unix  -   -   y   -   -   smtp
relay unix  -   -   y   -   -   smtp
showq unix  n   -   y   -   -   showq
error unix  -   -   y   -   -   error
retry unix  -   -   y   -   -   error
discard   unix  -   -   y   -   -   discard
local unix  -   n   n   -   -   local
virtual   unix  -   n   n   -   -   virtual
lmtp  unix  -   -   y   -   -   lmtp
anvil unix  -   -   y   -   1   anvil
scacheunix  -   -   y   -   1   scache
maildrop  unix  -   n   n   -   -   pipe
  flags=DRhu user=vmail argv=/usr/bin/maildrop -d ${recipient}
uucp  unix  -   n   n   -   -   pipe
  flags=Fqhu user=uucp argv=uux -r -n -z -a$sender - $nexthop!rmail
($recipient)
ifmailunix  -   n   n   -   -   pipe
  flags=F user=ftn argv=/usr/lib/ifmail/ifmail -r $nexthop ($recipient)
bsmtp unix  -   n   n   -   -   pipe
  flags=Fq. user=bsmtp argv=/usr/lib/bsmtp/bsmtp -t$nexthop -f$sender
$recipient
scalemail-backend unix  -   n   n  

Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Bret Busby

On Fri, 19 Jan 2018, Bastian Blank wrote:


Date: Fri, 19 Jan 2018 22:06:15
From: Bastian Blank <bastian+postfix-users=postfix@waldi.eu.org>
To: postfix-users@postfix.org
Subject: Re: Postfix using all CPU after nightly mail submission

On Fri, Jan 19, 2018 at 08:52:39AM -0500, Zach Sheppard wrote:

I see no errors or alarms in the mail log nor syslog. The only odd message
I see is a series of ^@ symbols before the system freezes:


Please read http://www.postfix.org/DEBUG_README.html#mail about how to
report problems.


Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
...@oconee.k12.sc.us>, size=2282, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us


You have a broken system, nothing reports "DDD.DD.DD.DD" as ip address.


This message may contain confidential information and is intended only for
the individuals named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. If you are not the intended recipient
you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly
prohibited.


Please remove this statement.  This mailing-list is already public
archived.

Bastian

--
Those who hate and fight must stop themselves -- otherwise it is not stopped.
-- Spock, "Day of the Dove", stardate unknown



I do not know whether the From address characters (non-alphanumeric, 
other than the '@' character) are the cause, but the above message is 
one of those messages that endlessly loops between filters for malicious 
email messages, and is only accidentally found by switching off filters.


--
Bret Busby
Armadale
West Australia
..

"So once you do know what the question actually is,
 you'll know what the answer means."
- Deep Thought,
  Chapter 28 of Book 1 of
  "The Hitchhiker's Guide to the Galaxy:
  A Trilogy In Four Parts",
  written by Douglas Adams,
  published by Pan Books, 1992



Re: Postfix using all CPU after nightly mail submission

2018-01-19 Thread Bastian Blank
On Fri, Jan 19, 2018 at 08:52:39AM -0500, Zach Sheppard wrote:
> I see no errors or alarms in the mail log nor syslog. The only odd message
> I see is a series of ^@ symbols before the system freezes:

Please read http://www.postfix.org/DEBUG_README.html#mail about how to
report problems.

> Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
> added (s=mailrelay, d=oconee.k12.sc.us)
> Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
> ...@oconee.k12.sc.us>, size=2282, nrcpt=1 (queue active)
> Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
> client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
> oconee.k12.sc.us

You have a broken system, nothing reports "DDD.DD.DD.DD" as ip address.

> This message may contain confidential information and is intended only for 
> the individuals named. If you are not the named addressee you should not 
> disseminate, distribute or copy this e-mail. Please notify the sender 
> immediately by e-mail if you have received this e-mail by mistake and 
> delete this e-mail from your system. If you are not the intended recipient 
> you are notified that disclosing, copying, distributing or taking any 
> action in reliance on the contents of this information is strictly 
> prohibited.

Please remove this statement.  This mailing-list is already public
archived.

Bastian

-- 
Those who hate and fight must stop themselves -- otherwise it is not stopped.
-- Spock, "Day of the Dove", stardate unknown


Postfix using all CPU after nightly mail submission

2018-01-19 Thread Zach Sheppard
Hello,

We use a Postfix installation on Debian 9 in our network to serve as a
send-only SMTP server for a very large application in our school district.
This application does not have its own built-in SMTP server and requires a
relay to send notification e-mails to students, teachers, and parents
topping out at around 10,000 e-mails per day. Our hosted e-mail solution (G
Suite) has a sending cap that we hit with this application so we setup this
Postfix installation to serve as an alternative.

However, whenever Postfix gets a large e-mail load it processes the e-mails
for around 30-45 minutes and then consistently uses around 70-80% of the
CPU effectively locking up the entire system so much so that I can't even
login and debug without either disabling the network card in VMWare or
rebooting the system totally. The VM has 2 CPU cores and 4GB RAM allocated
to it which I thought was plenty for this application.

I see no errors or alarms in the mail log nor syslog. The only odd message
I see is a series of ^@ symbols before the system freezes:

Jan 15 00:42:41 mailrelay opendkim[1643]: 385B38094E: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 385B38094E: from=<
...@oconee.k12.sc.us>, size=2282, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: 8BFCC80968:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: 8BFCC80968: message-id=<
734906173.4337.1515994961560.javamail....@oconee.k12.sc.us>
Jan 15 00:42:41 mailrelay opendkim[1643]: 8BFCC80968: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: 8BFCC80968: from=<
...@oconee.k12.sc.us>, size=2370, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: B66B38096E:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: B66B38096E: message-id=<
902964141.4338.1515994961733.javamail....@oconee.k12.sc.us>
Jan 15 00:42:41 mailrelay opendkim[1643]: B66B38096E: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:41 mailrelay postfix/qmgr[5601]: B66B38096E: from=<
...@oconee.k12.sc.us>, size=3705, nrcpt=1 (queue active)
Jan 15 00:42:41 mailrelay postfix/smtpd[6128]: E39EA8096F:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:41 mailrelay postfix/cleanup[6375]: E39EA8096F: message-id=<
1233892182.4339.1515994961918.javamail....@oconee.k12.sc.us>
Jan 15 00:42:41 mailrelay opendkim[1643]: E39EA8096F: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: E39EA8096F: from=<
...@oconee.k12.sc.us>, size=2281, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 25EFC80970:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 25EFC80970: message-id=<
2097536797.4340.1515994962141.javamail....@oconee.k12.sc.us>
Jan 15 00:42:42 mailrelay opendkim[1643]: 25EFC80970: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 25EFC80970: from=<
...@oconee.k12.sc.us>, size=2486, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 5329380972:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 5329380972: message-id=<
951323567.4341.1515994962327.javamail....@oconee.k12.sc.us>
Jan 15 00:42:42 mailrelay opendkim[1643]: 5329380972: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 5329380972: from=<
...@oconee.k12.sc.us>, size=2326, nrcpt=1 (queue active)
Jan 15 00:42:42 mailrelay postfix/smtpd[6128]: 8EF0980973:
client=unknown[DDD.DD.DD.DD], sasl_method=LOGIN, sasl_username=AAA@
oconee.k12.sc.us
Jan 15 00:42:42 mailrelay postfix/cleanup[6375]: 8EF0980973: message-id=<
484411487.4342.1515994962572.javamail....@oconee.k12.sc.us>
Jan 15 00:42:42 mailrelay opendkim[1643]: 8EF0980973: DKIM-Signature field
added (s=mailrelay, d=oconee.k12.sc.us)
Jan 15 00:42:42 mailrelay postfix/qmgr[5601]: 8EF0980973: from=<
...@oconee.k12.sc.us>, size=2408, nrcpt=1 (queue
active)^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@