Wietse Venema:
> Mehmet Avcioglu:
> > Wietse Venema:
> > > Then I suspect that the code reaches the 10000 limit because
> > > there are ~10000 files in the queue.
> > ....
> > > Below is a patch that should fix this.
> >
> > Thank you for the patch. I had previously said that we were not
> > running 'showq' or 'postqueue -p' frequently, but upon further
> > investigation found out that the prometheus exporter was in fact
> > running 'showq' or accessing via showq socket every 15 seconds.
> > Stopping the exporter fixed the issue.
>
> Well no, it just stopped requests to run showq daemon. It did nothing
> to fix the showq daemon.
>
> > Would your suggestion be to;
> > - not run 'showq' or 'postqueue -p' (and hence a metrics exporter like
> >   this) on a busy server like this at all
> > - run it but run it less frequently on normal non-patched postfix
> > - run it less frequently and also apply the patch you had sent
>
> I'm not going to tell you how to run Postfix with a bug, but I can
> give you an insight into what would happen in different scenarios.
>
> I suspect that the problem is that showq logs false errors because
> it does not properly reset the counter for reverse jumps.
>
> The following examples assume that you have the default Postfix
> settings of max_use=100 and max_idle=100s.
>
> - If showq is run once, or once per more-than-100 seconds, the showq
>   process will run once and terminate, and it could report false
>   errors for queues with ~10000 or more messages.
>
> - If showq is run repeatedly every less-than-100 seconds, then you
>   will reuse the same showq process up to 100 times, and it could
>   report false errors for queues with ~100 or more messages.
>
> If this is the problem then I'm surprised that it has not been
> observed before in the 12 years since the code was written.

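The arithmetic in the two scenarios quoted above can be sketched
roughly as follows. This is a toy model only, not the showq code: it
assumes the suspected behavior, namely that the counter grows by about
one per queued message on every request and is never reset while the
same process is reused. The function names are hypothetical and exist
only for this illustration.

```python
import math

LIMIT = 10_000  # the limit mentioned earlier in this thread


def requests_per_process(poll_interval_s, max_use=100, max_idle_s=100):
    """Estimate how many requests one showq process serves before it exits.

    Assumption: if requests arrive further apart than max_idle, the process
    exits idle after each request; otherwise it is reused up to max_use times.
    """
    return 1 if poll_interval_s > max_idle_s else max_use


def false_error_threshold(poll_interval_s, max_use=100, max_idle_s=100):
    """Approximate queue size at which a never-reset counter could reach LIMIT."""
    served = requests_per_process(poll_interval_s, max_use, max_idle_s)
    return math.ceil(LIMIT / served)


print(false_error_threshold(300))  # occasional run: about 10000 messages
print(false_error_threshold(15))   # polled every 15 seconds: about 100 messages
```

With the default max_use=100 and max_idle=100s, polling every 15
seconds lands in the second case, which would be consistent with the
errors showing up on a queue far smaller than 10000 messages.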
Additionally, you can tweak max_use or max_idle in master.cf:

    showq unix n - y - - showq -o max_use=xxx -o max_idle=yyy

I would not set these in main.cf, as that would affect all Postfix
processes.

	Wietse