Ask Bjørn Hansen wrote:
The original logic for not doing the immediate unlink was to keep it as
a tool to track down bugs in the error handling.
And I think we've done that now (mostly, see below), so maybe it's time
to revisit that decision.
If qpsmtpd crashes or gets killed (except maybe by the OOM killer),
then it's not bad that it leaves behind a tell-tale.
The problem is that the spooled message usually isn't enough to go on to
track down _why_ it crashed/was killed, so I don't think that the slow
accretion of garbage (which is what I was seeing) is worth it.
I have had exactly one "orphan" file in my spool_dir since I last
cleaned them out, oddly enough from this morning. The file was just the
headers of a message (no body).
I looked at the log entries for that message; here are the last two
lines:
2006-02-01 10:32:41.108221500 23825 spooling message to disk
2006-02-01 10:52:41.588052500 23825 Connection Timed Out
When I first looked, it was before the "Connection Timed Out" and the
file was still in the tmp/ directory. After the 20 minutes had expired,
the code correctly cleaned up the spooled message, so qpsmtpd seems to
be doing what we want.
However, when I originally surveyed the contents of the tmp/ directory,
there were files there with a variety of dates and times, some in the
middle of the night, when I would normally not be restarting qpsmtpd.
So there is some uncommon failure path which we are not seeing, where
the spool file is not being deleted.
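For anyone who wants to watch for these orphans, here is a rough sketch
of how the tmp/ directory could be surveyed. It is Python rather than
anything from the qpsmtpd code base, the function name
find_stale_spool_files is made up for illustration, and the 20-minute
cutoff is only an assumption taken from the timeout visible in the log
lines above. It just lists regular files whose modification time is
older than the cutoff, oldest first:

```python
import os
import time


def find_stale_spool_files(spool_dir, max_age_seconds=20 * 60, now=None):
    """Return (path, age_in_seconds) for files older than max_age_seconds.

    spool_dir is whatever directory holds the spooled messages (an
    assumption: point it at your own tmp/ directory). The default cutoff
    of 20 minutes mirrors the timeout seen in the log, not a documented
    qpsmtpd setting.
    """
    if now is None:
        now = time.time()
    stale = []
    for entry in os.scandir(spool_dir):
        # Only consider regular files; skip subdirectories and symlinks.
        if not entry.is_file(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            stale.append((entry.path, age))
    # Oldest files first, since those are the most likely orphans.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Running this periodically and comparing the file timestamps against the
log might help narrow down which failure path leaves the file behind.
Anything it reports that has no live connection with a matching pid in
the log would be a candidate orphan.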
John