David Schweikert:
> Hi,
> 
> We are experiencing rather frequent mail deferrals because of a
> milter-related malfunction:
> 
> Oct 22 08:48:14 mailhost3 postfix/smtpd[723]: 23C7F8FF16: 
> client=xxx.xxx[1.2.3.4]
> Oct 22 08:48:14 mailhost3 postfix/cleanup[1415]: 23C7F8FF16: message-id=<xxxx>
> Oct 22 09:18:16 mailhost3 postfix/cleanup[1415]: warning: milter 
> inet:localhost:12337: can't read SMFIC_HEADER reply packet header: No such 
> file or directory
> Oct 22 09:18:16 mailhost3 postfix/cleanup[1415]: 23C7F8FF16: milter-reject: 
> END-OF-MESSAGE from xxx.xxx[1.2.3.4]: 4.7.1 Service unavailable - try again 
> later; from=<...> to=<...> proto=ESMTP helo=<...>
> 
> We use only the amavisd-milter milter (together with amavisd-new). Also,
> we have a few policy daemons (in particular, apparently a lot of
> policy-spf.pl processes).
> 
> What I find interesting is that the error comes after 30 minutes. The
> relevant timeouts are however as follows:
> 
>   daemon_timeout = 18000s
>   ipc_timeout = 3600s
>   milter_command_timeout = 30s
>   milter_connect_timeout = 30s
>   milter_content_timeout = 300s
>   smtpd_policy_service_timeout = 300s
>   smtpd_timeout = ${stress?30}${stress:180}
> 
> If this is a problem with the milter, shouldn't Postfix realize that the
> timeout has expired before (i.e. after 30 or 300 seconds)? Also,
> shouldn't the whole session be interrupted after 180 seconds?

Postfix does not enforce timeouts itself; instead, it depends on
the kernel to do the job. When Postfix wants to read, it waits up to
$timeout seconds for the socket to become readable. When the kernel
reports that the socket is readable but the read blocks anyway, then
bad things happen. Solaris has inspired some Postfix workarounds
in this area.
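
To illustrate the pattern described above, here is a minimal Python
sketch, not Postfix's actual C code. The function name timed_read and
its parameters are made up for this example. It waits for the socket to
become readable with a timeout, then reads. The timeout covers only the
wait, so a read that blocks after the kernel said "readable" is not
bounded by anything.

```python
import select
import socket


def timed_read(sock, nbytes, timeout):
    """Wait up to `timeout` seconds for `sock` to become readable, then read.

    Hypothetical helper for illustration; it is not a Postfix function.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise TimeoutError("socket not readable within %s seconds" % timeout)
    # The timeout above applies only to the wait. If the kernel reports the
    # socket as readable but this recv() blocks anyway, nothing limits it.
    return sock.recv(nbytes)


if __name__ == "__main__":
    ours, peer = socket.socketpair()
    peer.sendall(b"reply")
    print(timed_read(ours, 16, 1.0))
    try:
        timed_read(ours, 16, 0.1)  # nothing left to read, so this times out
    except TimeoutError as exc:
        print("timed out:", exc)
```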

Notice also that the read (or poll, or select, or ...) reports an
ENOENT error, which is not an error that one would expect to see
on an open socket. The last time I heard of ENOENT on an open
file handle was with buggy Reiserfs.
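
For readers wondering where ENOENT comes from: the log line ends in "No
such file or directory", which is the system's message text for that
error number. A small Python sketch of the lookup follows. The helper
name errno_names_for is made up for this example, and the exact message
strings depend on the platform's C library.

```python
import errno
import os


def errno_names_for(message):
    """Return the symbolic errno names whose system message equals `message`.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == message
    )


if __name__ == "__main__":
    # On common Unix systems this list is expected to contain "ENOENT".
    print(errno_names_for("No such file or directory"))
```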

> Any idea about what could be causing this? My theory is the following:
> 
> - A policy daemon blocks the whole SMTP transaction and finally times out.
> - Postfix doesn't time out and proceeds with the email, even though the
>   connection is 30 minutes old.
> - Postfix tries to speak again with the milter, which has however
>   disconnected in the meantime.

If Postfix does not get to talk to the milter socket for a long
time, then the other end is likely to time out and close the socket.
Having a kernel that reports ENOENT on an open file handle is not
particularly useful.
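
For comparison, here is a minimal Python sketch of what a reader would
normally see when the other end has closed the socket: an end-of-file
condition (an empty read), not ENOENT. The function name read_reply is
made up for this example, and a local socket pair stands in for the
milter connection.

```python
import socket


def read_reply(sock, nbytes):
    """Return the bytes read, or None if the peer has closed the connection.

    Hypothetical helper for illustration; it is not a Postfix function.
    """
    data = sock.recv(nbytes)
    if data == b"":
        # An empty read on a stream socket means the peer closed its end.
        return None
    return data


if __name__ == "__main__":
    ours, peer = socket.socketpair()
    peer.close()  # stands in for a peer that timed out and closed the socket
    print(read_reply(ours, 16))
```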

        Wietse
