> netstat or lsof? Only the Postfix queue manager knows what deliveries
> are in progress, and it has never evolved a 'live status' API. None
> of the Postfix daemons has a status query API; it just isn't part of
> the architecture.
I created a way to watch the number of smtp(8) processes running for
each of our four randmap transports (c0, c1, c2, c3) by using:
ps -f -u postfix | grep smtp_helo_name=mail01-cx | wc -l
The script generates one line that looks like this when there is no load:
smtp procs: 0, 0, 0, 0 = 0
At the time we start loading the server with outgoing email it shows:
smtp procs: 29, 26, 30, 31 = 116
and increases to the following under maximum load:
smtp procs: 49, 52, 48, 58 = 207
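The monitoring loop behind that one-line summary can be sketched as
follows. This is a hypothetical reconstruction, not our exact script; it
assumes each transport's master.cf entry passes -o
smtp_helo_name=mail01-cN, so the transport is identifiable on the smtp(8)
process command line:

```shell
#!/bin/sh
# Sketch: count running smtp(8) processes per randmap transport by
# matching the smtp_helo_name override on each process command line.
# Assumption: master.cf sets -o smtp_helo_name=mail01-cN per transport.
total=0
counts=""
for n in 0 1 2 3; do
    c=$(ps -f -u postfix 2>/dev/null | grep -c "smtp_helo_name=mail01-c$n")
    counts="$counts$c, "
    total=$((total + c))
done
echo "smtp procs: ${counts%, } = $total"
```

On an idle server this prints "smtp procs: 0, 0, 0, 0 = 0", matching the
no-load line above.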
I can see the "slowly rises over time" mentioned by Wietse. I'm not
sure how this relates to maxproc in master.cf, where each of the
randmap transports is set to 128.
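For reference, a master.cf sketch of what such entries might look like
(the service names and helo values here are illustrative assumptions,
not our actual configuration):

```
# master.cf (illustrative sketch only)
# service type  private unpriv  chroot  wakeup  maxproc command + args
c0        unix  -       -       n       -       128     smtp
    -o smtp_helo_name=mail01-c0
c1        unix  -       -       n       -       128     smtp
    -o smtp_helo_name=mail01-c1
c2        unix  -       -       n       -       128     smtp
    -o smtp_helo_name=mail01-c2
c3        unix  -       -       n       -       128     smtp
    -o smtp_helo_name=mail01-c3
```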
> That state of affairs sounds fine. Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
> delays=a/b/c/d
The first thing I look at is a set of per-ISP stats: Email Sent, Ave
Delay, Max Delay and conn use=. We are seeing an Ave Delay of 1-2
seconds and conn use= of 80% for the large ISPs. If this is not the
case, I dig into why not.
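Viktor's suggestion can be approximated with a small filter over the
delivery log lines. A minimal sketch, assuming standard Postfix
delays=a/b/c/d log fields on stdin (treat this as an illustration, not
our production tooling):

```shell
# Sketch: average the b (queue manager), c (connection setup) and
# d (transmission) components of delays=a/b/c/d across delivery log
# lines read from stdin, e.g.:  avg_delays < /var/log/maillog
avg_delays() {
    awk '
        match($0, /delays=[0-9.]+\/[0-9.]+\/[0-9.]+\/[0-9.]+/) {
            # Skip past "delays=" (7 chars) and split a/b/c/d on "/"
            split(substr($0, RSTART + 7, RLENGTH - 7), f, "/")
            b += f[2]; c += f[3]; d += f[4]; n++
        }
        END { if (n) printf "avg b/c/d = %.2f/%.2f/%.2f (%d deliveries)\n", b/n, c/n, d/n, n }
    '
}
```

A plain mean over each 10-second snapshot is the simplest starting
point; an exponentially smoothed variant would weight recent deliveries
more heavily.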
If maxproc is too small for the randmap transports, the Ave Delay will
increase and our throughput will decrease. We also see a dramatic
increase (10x) in transactions per second to our I/O subsystem, which
is SSD-backed. A good run shows steady transactions per second over
time, as it did this morning. Here is the maximum-load log interval
from this morning (we get a snapshot like this once every 10 seconds
in our logs; the logging does not noticeably change server
performance):
01:03:57 up 26 days, 19:11, 0 users, load average: 0.52, 0.33, 0.13
              total        used        free      shared  buff/cache   available
Mem:          3.6Gi       642Mi       426Mi       0.0Ki       2.6Gi       2.8Gi
Swap:         3.2Gi        94Mi       3.1Gi

Device            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-0           256.00         0.00      5058.00          0       5058
incoming/active    T    5   10   20   40   80  160  320  640 1280 1280+
          TOTAL  168  168    0    0    0    0    0    0    0    0     0
      yahoo.com   65   65    0    0    0    0    0    0    0    0     0
      gmail.com   39   39    0    0    0    0    0    0    0    0     0
    comcast.net   12   12    0    0    0    0    0    0    0    0     0

       deferred    T    5   10   20   40   80  160  320  640 1280 1280+
          TOTAL   56   53    0    0    3    0    0    0    0    0     0
    comcast.net   48   48    0    0    0    0    0    0    0    0     0
       satab.mx    1    0    0    0    1    0    0    0    0    0     0
      gmail.com    1    0    0    0    1    0    0    0    0    0     0
smtp procs: 49, 52, 48, 58 = 207
Plenty of memory, no swapping, I/O tps is moderate, the active queue
size is low, processor loading across the four cores is low, smtp
procs increased to 207, and there is a bit of throttling from Comcast.
We will increase the incoming load on the mail server for Tuesday
morning's run. I expect I/O tps to remain the same while smtp
processes, processor loading and email throughput all increase -- we
will see.
Thanks for the feedback!

Blessings, Greg
www.RayStedman.org
On Sun, Jul 11, 2021 at 7:04 PM Viktor Dukhovni
<[email protected]> wrote:
>
> On Sat, Jul 10, 2021 at 07:34:15AM -0700, Greg Sims wrote:
>
> > I am tuning the performance of our mail server. We collect
> > information in our logs every 10 seconds including qshape, iostat,
> > free and mpstat. It seems that the maxproc parameter in master.cf is
> > important for us as we can see the size of the queues decrease as we
> > increase maxproc -- as expected.
>
> Running "qshape" every 10s does seem rather excessive. Two employers
> and over a decade ago I had a "qshaped" that kept state between scans
> avoiding rereading the same queue file twice, and would generate an
> alert if some age bucket exceeded a threshold occupancy. I never
> released "qshaped" to the world at large.
>
> If you are running "qshape" to measure queue size, use "qshape -s" to
> count senders, so that messages with many recipients don't distort the
> numbers.
>
> My take is that what matters is latency and so long as most messages
> leave the queue quickly the queue size is not a problem.
>
> I don't typically raise maxproc across the board, but rather only raise the
> process limits for smtpd(8) and perhaps smtp(8) (given sufficient
> network capacity). Delivery via local(8) and pipe(8) tends to be
> CPU-intensive, and I don't want high process counts there.
>
> > We are currently running with qshape showing 1,000 emails in the
> > incoming/active queue maximum -- all less than 5 minutes.
>
> That state of affairs sounds fine. Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
>
> delays=a/b/c/d
>
> --
> Viktor.