Hi Viktor

> "milter-greylist" would be the main suspect for doing something
> non-trivial at the "RCPT" stage. The others can't do much without the
> message content.  But isn't this just a normal "greylist" action???  If
> there's no logging about milter timeouts or failures, no other
> explanation comes to mind.

milter-greylist either accepts, tempfails or rejects an email. So it's
function seems quite trivial to me. But still I try to rule out that
something goes specifically wrong when some very specific, not yet
identified situations occurs.

> > First handful of recipients show up in the MIMEDefang Log. Then there
> > is a 30 seconds gap, then Postfix fails all remaining recipients and
> > the complete email with 4.7.1 If one email triggers that issue, it is
> > triggered every time, on all of our 3 MX with identical milter config.  
> 
> But, it is also possible that spamassassin (via MIMEDefang) is
> configured to do non-trivial sender or recipient DNS lookups, and these
> may be timing out.  I think we can rule out opendmarc.

Yes ideed. MIMEDefang was my absolute suspect as we also use it to do
SRS and forward email and optionally delivering a copy locally by
altering the recipient list!

This is why I added logging right at the start of filter_recipient()
and before sending back the result to Postfix.

When the issue occurs I first see a handful of recipients being logged
at the very beginning for filter_recipient() and right before the
result is returned.
Then there is a 30 second gap in the logs.
Then postfix reports '4.7.1' for the remaining recipients and the email
content. The first recipient temp failed by postfix does not hit the
beginning of filter_recipient().

Conclusion:
* There is a communication issue between postfix and MIMEdefang.
or
* The issue is caused by one of the other milter!

I guess the milter are called in the order they appear in the config?

smtpd_milters = inet:filterhost:3331 inet:filterhost:3332 { 
inet:filterhost:3333, default_action=accept }

first CONNECT is sent to 3331 then 3332 then 3333
then
HELO is sent to 3331 then 3332 then 3333
then
MAIL is sent to 3331 then 3333 then 3333
then
FOREACH RCPT is sent to 3331, 3332, 3333
And at the end the whole payload is sent in DATA
to 3331, 3332, 3333

I can rule out an issue during DATA as the issue clearly occurs while
processing the 'rcpt to' phase.

In my case 3331 would be milter-greylist, 3332 would be MIMEDefang and
3333 opendmarc.

Observe:

rcpt to: alice
rcpt to: bob
rcpt to: charlie

I see 'alice' and 'bob' passing all stages for filter_recipient() in
MIMEDefang. Charlie does not show up in MIMEDefang but is then logged
by Postfix with 4.7.1 milter tempfail

So if I keep digging the 'blame the other milter' rabbit hole my
conclusion is:

* bob timed out in OpenDMARC which is called AFTER MIMEDefang.
or
* charlie timed out in Milter-Greylist which is called BEFORE
  MIMEDefang.

> > I start to suspect, that one of the other milter fails. Is there a way
> > to log the milter commands sent from postfix to each milter from withing
> > postfix to find out which milter is 'timing out' on RCPT?  
> 
> What do ordinary greylisting actions look like?  Is the milter
> configured to tarpit the sender?  Otherwise, your culprit is
> SA, likely trying and failing DNS lookups.  Perhaps SPF?

All of the milters employ DNS lookups for some checks. But all
recipients are in the same domain. So it is not very plausible for me
for one recipient failing with some kind of DNS lookup stuff while the
other does not, especially not in milter-greylist and opendmarc.

Well I have removed milter-greylist and opendmarc from the config for
now and I will observer 1-2 to see if this issue still occurs. If not I
know for sure (never say that working with IT :-) ) it's not MIMEDefang
but one of the other milters.

Maybe calling MIMEDefang last could also solve the issue as it is the
milter altering the recipient list and altering the envelope sender
because of SRS.
But stop, the recipient list and envelope sender can ONLY be altered
during the DATA phase, right? This is why I store the 'new' recipient
list in a cached variable during the RCPT to phase and only in the DATA
phase the functions change_sender() delete_recipient() and
add_recipient() can be called.

So I also consider it mostly ruled out, that any of the milter tampers
with the recipient list tripping the other milter in the RCPT phase.

- i keep digging that hole.

Mit freundlichen Grüssen

-Benoît Panizzon-
-- 
I m p r o W a r e   A G    -    Leiter Commerce Kunden
______________________________________________________

Zurlindenstrasse 29             Tel  +41 61 826 93 00
CH-4133 Pratteln                Fax  +41 61 826 93 01
Schweiz                         Web  http://www.imp.ch
______________________________________________________
_______________________________________________
Postfix-devel mailing list -- postfix-devel@postfix.org
To unsubscribe send an email to postfix-devel-le...@postfix.org

Reply via email to