> From: Wietse Venema [mailto:[EMAIL PROTECTED]
> Luigi Iotti:
> > Hi all
> >
> > I operate two very different postfix machines. One is heavy loaded and with
> > a decent hardware, the other is my home machine. Both have CentOS5 with
> > postfix-2.3.3, amavis, spamassassin and clamav. On both machines there is a
> > mail account signed on the same mailing list (in particular, the popular
> > Squid web proxy daemon mailing list).
> > From time to time, one or both of these accounts exhibit the same problem
> > while receiving a message from the mentioned mailing list.
> > A message is received saying (I paste a transcript of the error I receive
> > from my home machine, but the problem on the other is the same):
> >
> > Return-Path: <[EMAIL PROTECTED]>
> > From: [EMAIL PROTECTED] (Mail Delivery System)
> > To: [EMAIL PROTECTED] (Postmaster)
> > Subject: Postfix SMTP server: errors from squid-cache.org[12.160.37.9]
> >
> > Transcript of session follows.
> >
> > Out: 220 barattolo.rinnanet.it ESMTP Postfix
> > In: HELO squid-cache.org
> > Out: 250 barattolo.rinnanet.it
> > In: MAIL FROM:<[EMAIL PROTECTED]>
> > Out: 250 2.1.0 Ok
> > In: RCPT TO:<[EMAIL PROTECTED]>
> > Out: 250 2.1.5 Ok
> > In: DATA
> > Out: 354 End data with <CR><LF>.<CR><LF>
> > Out: 451 4.3.0 Error: queue file write error
> >
> > Session aborted, reason: lost connection
> >
> >
> > Having a look at the logs, I find:
> > Sep 24 06:51:13 barattolo postfix/smtpd[5832]: connect from squid-cache.org[12.160.37.9]
> > ...
> > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: NOQUEUE: filter: RCPT from squid-cache.org[12.160.37.9]: <squid-cache.org[12.160.37.9]>: Client host triggers FILTER smtp-amavis:[127.0.0.1]:10024; from=<[EMAIL PROTECTED]> to=<[EMAIL PROTECTED]> proto=SMTP helo=<squid-cache.org>
> > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: 2928F10000E: client=squid-cache.org[12.160.37.9]
> > ...
> > Sep 24 07:52:07 barattolo postfix/cleanup[5848]: warning: 2928F10000E: read timeout on cleanup socket
> > ...
> > Sep 24 08:01:48 barattolo postfix/smtpd[5832]: disconnect from squid-cache.org[12.160.37.9]
> >
> > I'm tempted to think that this is a mailing list's manager problem, and to
> > forget about it, but I would like to be sure that the fault is not partly or
> > totally mine.
> > Any suggestions?
>
> Normally, all Postfix network and inter-process I/O is subject to
> time limits. On Linux these time limits are implemented with poll().
> Network and inter-process I/O are done over TCP or UNIX-domain
> sockets. Those sockets are in blocking mode, and Postfix relies
> on the kernel to return early when a read or write operation is
> incomplete (sockets are in blocking mode because of Solaris bugs;
> by now I should perhaps stop working around bugs from 1996).
>
> In your case, the smtpd process gets stuck, the cleanup process
> gives up after waiting for one hour, and then the smtpd process
> becomes un-stuck more than 9 minutes later. In the mean time, the
> SMTP client and the cleanup process have gone away, but of course
> the smtpd process discovers that only after it becomes un-stuck.
>
> I have no idea why the smtpd process would get stuck except of
> course for kernel bugs.
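
To make sure I follow the mechanism described above, here is how I picture a poll()-based time limit on a blocking socket. This is only a rough Python sketch under my own assumptions, not Postfix's actual C code: the function name timed_read and its parameters are made up for illustration.

```python
import select
import socket


def timed_read(sock, nbytes, timeout_sec):
    """Wait up to timeout_sec for data, then read from a blocking socket.

    Hypothetical helper for illustration; not a Postfix function.
    """
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    # poll() takes its timeout in milliseconds; an empty result means
    # nothing became readable before the time limit expired.
    if not poller.poll(int(timeout_sec * 1000)):
        raise TimeoutError("read timeout")
    # The socket was reported readable, so recv() is expected to return
    # whatever is available instead of blocking. The time limit depends
    # entirely on the kernel honouring that expectation.
    return sock.recv(nbytes)
```

If the kernel reports the socket as readable and the following read blocks anyway, nothing in this pattern would enforce the time limit, which seems consistent with a stuck process explained by a kernel bug.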
Thank you for being so clear. Let's suppose it is a kernel bug (it's strange that it affects only mail from that mailing list and its host, but it's possible). Do you have any advice on how I could confirm it? I know I'm OT, but since we've gone this far... Do you think that running smtpd with -v could help in figuring out the situation? Anyway, I'm going to investigate further.