Hi Viktor > "milter-greylist" would be the main suspect for doing something > non-trivial at the "RCPT" stage. The others can't do much without the > message content. But isn't this just a normal "greylist" action??? If > there's no logging about milter timeouts or failures, no other > explanation comes to mind.
milter-greylist either accepts, tempfails or rejects an email. So it's function seems quite trivial to me. But still I try to rule out that something goes specifically wrong when some very specific, not yet identified situations occurs. > > First handful of recipients show up in the MIMEDefang Log. Then there > > is a 30 seconds gap, then Postfix fails all remaining recipients and > > the complete email with 4.7.1 If one email triggers that issue, it is > > triggered every time, on all of our 3 MX with identical milter config. > > But, it is also possible that spamassassin (via MIMEDefang) is > configured to do non-trivial sender or recipient DNS lookups, and these > may be timing out. I think we can rule out opendmarc. Yes ideed. MIMEDefang was my absolute suspect as we also use it to do SRS and forward email and optionally delivering a copy locally by altering the recipient list! This is why I added logging right at the start of filter_recipient() and before sending back the result to Postfix. When the issue occurs I first see a handful of recipients being logged at the very beginning for filter_recipient() and right before the result is returned. Then there is a 30 second gap in the logs. Then postfix reports '4.7.1' for the remaining recipients and the email content. The first recipient temp failed by postfix does not hit the beginning of filter_recipient(). Conclusion: * There is a communication issue between postfix and MIMEdefang. or * The issue is caused by one of the other milter! I guess the milter are called in the order they appear in the config? smtpd_milters = inet:filterhost:3331 inet:filterhost:3332 { inet:filterhost:3333, default_action=accept } first CONNECT is sent to 3331 then 3332 then 3333 then HELO is sent to 3331 then 3332 then 3333 then MAIL is sent to 3331 then 3333 then 3333 then FOREACH RCPT is sent to 3331, 3332, 3333 And at the end the whole payload is sent in DATA to 3331, 3332, 3333 I can rule out an issue during DATA as the issue clearly occurs while processing the 'rcpt to' phase. In my case 3331 would be milter-greylist, 3332 would be MIMEDefang and 3333 opendmarc. Observe: rcpt to: alice rcpt to: bob rcpt to: charlie I see 'alice' and 'bob' passing all stages for filter_recipient() in MIMEDefang. Charlie does not show up in MIMEDefang but is then logged by Postfix with 4.7.1 milter tempfail So if I keep digging the 'blame the other milter' rabbit hole my conclusion is: * bob timed out in OpenDMARC which is called AFTER MIMEDefang. or * charlie timed out in Milter-Greylist which is called BEFORE MIMEDefang. > > I start to suspect, that one of the other milter fails. Is there a way > > to log the milter commands sent from postfix to each milter from withing > > postfix to find out which milter is 'timing out' on RCPT? > > What do ordinary greylisting actions look like? Is the milter > configured to tarpit the sender? Otherwise, your culprit is > SA, likely trying and failing DNS lookups. Perhaps SPF? All of the milters employ DNS lookups for some checks. But all recipients are in the same domain. So it is not very plausible for me for one recipient failing with some kind of DNS lookup stuff while the other does not, especially not in milter-greylist and opendmarc. Well I have removed milter-greylist and opendmarc from the config for now and I will observer 1-2 to see if this issue still occurs. If not I know for sure (never say that working with IT :-) ) it's not MIMEDefang but one of the other milters. Maybe calling MIMEDefang last could also solve the issue as it is the milter altering the recipient list and altering the envelope sender because of SRS. But stop, the recipient list and envelope sender can ONLY be altered during the DATA phase, right? This is why I store the 'new' recipient list in a cached variable during the RCPT to phase and only in the DATA phase the functions change_sender() delete_recipient() and add_recipient() can be called. So I also consider it mostly ruled out, that any of the milter tampers with the recipient list tripping the other milter in the RCPT phase. - i keep digging that hole. Mit freundlichen Grüssen -Benoît Panizzon- -- I m p r o W a r e A G - Leiter Commerce Kunden ______________________________________________________ Zurlindenstrasse 29 Tel +41 61 826 93 00 CH-4133 Pratteln Fax +41 61 826 93 01 Schweiz Web http://www.imp.ch ______________________________________________________ _______________________________________________ Postfix-devel mailing list -- postfix-devel@postfix.org To unsubscribe send an email to postfix-devel-le...@postfix.org