On 2023-01-23 at 12:06:34 UTC-0500 (Mon, 23 Jan 2023 17:06:34 +0000)
White, Daniel E. (GSFC-770.0)[AEGIS] <daniel.e.wh...@nasa.gov>
is rumored to have said:

There was no outage.
The queue filled faster than the processes could work through it.

I do not know which limit to increase to accommodate such bursts of traffic.

I did find 27 instances of this block of info in the logs:

postfix/qmgr[PID]: QUEUE_ID: from=<sender>, size=1370, nrcpt=1 (queue active)
postfix/qmgr[PID]: warning: mail for [127.0.0.1]:10024 is using up NUMBER of NUMBER active queue entries
postfix/qmgr[PID]: warning: you may need to reduce smtp-amavis connect and helo timeouts
postfix/qmgr[PID]: warning: so that Postfix quickly skips unavailable hosts
postfix/qmgr[PID]: warning: you may need to increase the main.cf minimal_backoff_time and maximal_backoff_time
postfix/qmgr[PID]: warning: so that Postfix wastes less time on undeliverable mail
postfix/qmgr[PID]: warning: you may need to increase the master.cf smtp-amavis process limit
postfix/qmgr[PID]: warning: please avoid flushing the whole queue when you have
postfix/qmgr[PID]: warning: lots of deferred mail, that is bad for performance
postfix/qmgr[PID]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0

From postconf:
minimal_backoff_time = 300s
maximal_backoff_time = 4000s
smtp_helo_timeout = 300s

But where do I find the smtp-amavis connect timeout?

Is it the milter_connect_timeout?

No, it appears that you are using Amavisd as an SMTP proxy between two Postfix smtpd processes, not as a milter. Presumably 'smtp-amavis' is the post-proxy smtpd, which uses the standard smtpd_* settings (from main.cf or defaults) unless you override those settings in master.cf. It is probably more important to make sure that the smtpd instance on the output side of the proxy has a process limit equal to that of the one handling the external connection; otherwise it will be a bottleneck and you can get those warnings from qmgr.

I don't believe that setting the process limit on the outbound side smtpd service higher than the inbound side provides anything, but Viktor or Wietse will likely correct me if I'm wrong.



From: Wietse Venema <wie...@porcupine.org>
Date: Monday, January 23, 2023 at 11:28
To: Daniel White <daniel.e.wh...@nasa.gov>
Cc: Postfix users <postfix-users@postfix.org>
Subject: [EXTERNAL] Re: Mail queue took 3 hours to recover from a flood. Suggestions ?

White, Daniel E. (GSFC-770.0)[AEGIS]:
Around 12000 messages.
The queue went from ~3000 to over 12000 in about 30 minutes and then took 3 hours to grind through all of them.

I am still trying to determine if this was an accident or not.
The source claims it was not intentionally malicious.

Some postconf values:

default_destination_concurrency_failed_cohort_limit = 1
default_destination_concurrency_limit = 20
default_process_limit = 100

I did not see anything at http://www.postfix.org/TUNING_README.html that looked like it would help, but then we are operating on a skeleton crew, and I do not have the luxury to spend time digging into the details.


When a message was not delivered for 30min because of an outage,
then it will take 30min before Postfix tries to deliver that message
again. So it will take at least an hour to clear the queue, more
depending on how much additional mail was queued in the meantime.
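
The retry behaviour described above can be sketched roughly as follows. This is a simplified model, not Postfix code: it assumes the delay before the next delivery attempt is about equal to the time the message has already spent in the queue, kept between minimal_backoff_time and maximal_backoff_time. The function name and its parameters are hypothetical; the default values are the ones from the postconf output quoted in this thread.

```python
def next_retry_delay(age_s, minimal_backoff_s=300, maximal_backoff_s=4000):
    """Approximate seconds until the next delivery attempt.

    Simplified model (assumption): the delay roughly equals the
    message's age, clamped to [minimal_backoff_s, maximal_backoff_s].
    """
    return max(minimal_backoff_s, min(age_s, maximal_backoff_s))


# A message that has been undeliverable for 30 minutes waits about
# another 30 minutes, so it is roughly an hour old at the next attempt.
age = 30 * 60
delay = next_retry_delay(age)
print(f"age={age}s, next attempt in about {delay}s, total about {age + delay}s")
```

Under this model, a message deferred for more than 4000s is retried about every 4000s with the settings shown, which is why a backlog built up over a long stall drains slowly.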

Without further details there can be no useful help.

                Wietse


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
