Re: read timeout on cleanup socket on two different machines

Luigi Iotti Thu, 25 Sep 2008 09:22:17 -0700

> From: Wietse Venema [mailto:[EMAIL PROTECTED]
> Luigi Iotti:
> > Hi all
> > 
> > I operate two very different postfix machines. One is heavily loaded and
> > has decent hardware, the other is my home machine. Both run CentOS5 with
> > postfix-2.3.3, amavis, spamassassin and clamav. On both machines there is a
> > mail account subscribed to the same mailing list (the popular Squid web
> > proxy daemon mailing list).
> > From time to time, one or both of these accounts exhibit the same problem
> > while receiving a message from that mailing list.
> > A message is received saying (I paste a transcript of the error I receive
> > from my home machine, but the problem on the other is the same):
> > 
> > Return-Path: <[EMAIL PROTECTED]>
> > From: [EMAIL PROTECTED] (Mail Delivery System)
> > To: [EMAIL PROTECTED] (Postmaster)
> > Subject: Postfix SMTP server: errors from squid-cache.org[12.160.37.9]
> > 
> > Transcript of session follows.
> > 
> >  Out: 220 barattolo.rinnanet.it ESMTP Postfix
> >  In:  HELO squid-cache.org
> >  Out: 250 barattolo.rinnanet.it
> >  In:  MAIL FROM:<[EMAIL PROTECTED]>
> >  Out: 250 2.1.0 Ok
> >  In:  RCPT TO:<[EMAIL PROTECTED]>
> >  Out: 250 2.1.5 Ok
> >  In:  DATA
> >  Out: 354 End data with <CR><LF>.<CR><LF>
> >  Out: 451 4.3.0 Error: queue file write error
> > 
> > Session aborted, reason: lost connection
> > 
> > 
> > Having a look at the logs, I find:
> > Sep 24 06:51:13 barattolo postfix/smtpd[5832]: connect from squid-cache.org[12.160.37.9]
> > ...
> > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: NOQUEUE: filter: RCPT from squid-cache.org[12.160.37.9]: <squid-cache.org[12.160.37.9]>: Client host triggers FILTER smtp-amavis:[127.0.0.1]:10024; from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=SMTP helo=<squid-cache.org>
> > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: 2928F10000E: client=squid-cache.org[12.160.37.9]
> > ...
> > Sep 24 07:52:07 barattolo postfix/cleanup[5848]: warning: 2928F10000E: read timeout on cleanup socket
> > ...
> > Sep 24 08:01:48 barattolo postfix/smtpd[5832]: disconnect from squid-cache.org[12.160.37.9]
> > 
> > I'm tempted to think that this is the mailing list manager's problem, and
> > to forget about it, but I would like to be sure that the fault is not
> > partly or totally mine.
> > Any suggestions?
> 
> Normally, all Postfix network and inter-process I/O is subject to
> time limits. On Linux these time limits are implemented with poll().
> Network and inter-process I/O are done over TCP or UNIX-domain
> sockets.  Those sockets are in blocking mode, and Postfix relies
> on the kernel to return early when a read or write operation is
> incomplete (sockets are in blocking mode because of Solaris bugs;
> by now I should perhaps stop working around bugs from 1996).
> 
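
The time-limit mechanism described above can be illustrated with a minimal Python sketch. This is not Postfix's code (Postfix is written in C); the function name `timed_read` and the socket pair used for the demonstration are hypothetical. It only shows the general pattern: the socket stays in blocking mode, and poll() is used to bound how long the process waits before reading.

```python
import select
import socket


def timed_read(sock, nbytes, timeout_s):
    """Wait at most timeout_s seconds for data, then do a single read.

    The socket itself is left in blocking mode; the time limit comes
    entirely from poll(). Hypothetical helper, not a Postfix function.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    if not events:
        raise TimeoutError("read timeout")
    # poll() reported the socket readable, so recv() is expected to
    # return promptly with whatever data is available.
    return sock.recv(nbytes)


if __name__ == "__main__":
    # A UNIX-domain socket pair stands in for two cooperating processes.
    reader, writer = socket.socketpair()
    try:
        timed_read(reader, 1024, 0.1)
    except TimeoutError as exc:
        print("nothing sent yet:", exc)
    writer.sendall(b"hello")
    print(timed_read(reader, 1024, 0.1))
    reader.close()
    writer.close()
```

In this pattern, a process can only stay stuck past its time limit if poll() or the following read does not return when it should, which is presumably why a kernel problem is the suspected cause.
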
> In your case, the smtpd process gets stuck, the cleanup process
> gives up after waiting for one hour, and then the smtpd process
> becomes un-stuck more than 9 minutes later.  In the meantime, the
> SMTP client and the cleanup process have gone away, but of course
> the smtpd process discovers that only after it becomes un-stuck.
> 
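
The intervals mentioned above can be checked against the log timestamps with a short Python sketch. The year 2008 is an assumption taken from the date of this message (syslog lines carry no year), and the helper name `elapsed` is hypothetical.

```python
from datetime import datetime

# Syslog timestamps have no year; 2008 is assumed from the message date.
FMT = "%Y %b %d %H:%M:%S"


def elapsed(start, end, year=2008):
    """Return the time between two syslog-style timestamps."""
    t0 = datetime.strptime(f"{year} {start}", FMT)
    t1 = datetime.strptime(f"{year} {end}", FMT)
    return t1 - t0


queued = "Sep 24 06:52:08"           # smtpd logs queue ID 2928F10000E
cleanup_timeout = "Sep 24 07:52:07"  # cleanup: read timeout on cleanup socket
disconnect = "Sep 24 08:01:48"       # smtpd finally logs the disconnect

print(elapsed(queued, cleanup_timeout))      # 0:59:59
print(elapsed(cleanup_timeout, disconnect))  # 0:09:41
```

The first interval is the roughly one-hour wait before cleanup gives up; the second is the further delay of more than 9 minutes before smtpd becomes un-stuck.
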
> I have no idea why the smtpd process would get stuck except of
> course for kernel bugs.


Thank you for being so clear. Let's suppose it is a kernel bug (it's strange
that it affects only mail from that mailing list and its host, but it's
possible). Do you have any advice on how I could confirm it? I know I'm OT,
but since we've come this far...
Do you think that running smtpd with -v could help in figuring out the
situation?

Anyway, I'm going to investigate further.
