Noel J. Bergman wrote:
Stefano wrote:
testing 2 times 6 multihomed servers and doing this 15 times by
default. With our defaults that single mail would result in 6*2*15 total
attempts (180 attempts), each one keeping a thread busy for 10 minutes
(1800 minutes, more than a whole thread day).
That does not match the default delivery schedule. 6IP*2Server*10min is two
hours, and then there is a retry delay before we try again. After the 4th
retry, the interval would be 3 hours, and the remaining are 6 hours up to 25
attempts. So the worst case should have the thread tied up for 2 hours
(which I agree is not acceptable), and there should be large blocks of hours
where e-mail delivery for that message is not attempted.
Yes, 2 hours per attempt. 15 attempts add up to 30 hours of thread time
for each mail destined for that domain.
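
To make the arithmetic in this exchange concrete, here is a minimal sketch of the worst-case calculation. The function and parameter names are illustrative only and are not taken from James; it assumes every connection to every IP runs into the full timeout.

```python
def worst_case_thread_minutes(mx_servers, ips_per_server, timeout_minutes, attempts):
    """Thread time consumed if every connection to every IP times out."""
    # One delivery attempt walks every IP of every MX server in turn.
    per_attempt = mx_servers * ips_per_server * timeout_minutes
    # The whole schedule repeats that walk once per attempt.
    return per_attempt, per_attempt * attempts


# Figures from this thread: 2 servers, 6 IPs each, 10 minute timeout, 15 attempts.
per_attempt, total = worst_case_thread_minutes(2, 6, 10, 15)
print(per_attempt / 60, total / 60)  # 2.0 hours per attempt, 30.0 hours in total
```

The retry delays between attempts are not counted here, since the thread is free during them; only the time a thread is actually blocked is summed.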
Yeah, *only* 2 hours before the other mails are processed, but consider
this scenario:
10 mails for the "bad" host and 1 for the good host, queued in this order.
James tries the first bad mail for 2 hours, then the second for 2 hours,
then the third, and so on for 20 hours. Only then does it try the good
one and deliver it (20 hours later!).
So to increase the probability of delivering a mail soon to a bad host, we
delay delivery to the good host.
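
The scenario above can be sketched as a tiny FIFO simulation. This is a toy model, not James code: it assumes each mail for the "bad" host blocks a thread for the full 2 hours, a mail for the good host costs no time, and mails are handed to whichever thread becomes free first. All names are hypothetical.

```python
import heapq


def delivery_start_hours(queue, threads=1, bad_hours=2.0, good_hours=0.0):
    """Return the hour at which each mail in a FIFO queue starts being processed."""
    # Each heap entry is the time at which one delivery thread becomes free.
    free_at = [0.0] * threads
    heapq.heapify(free_at)
    starts = []
    for host in queue:
        start = heapq.heappop(free_at)  # earliest available thread takes the mail
        starts.append(start)
        cost = bad_hours if host == "bad" else good_hours
        heapq.heappush(free_at, start + cost)
    return starts


queue = ["bad"] * 10 + ["good"]
print(delivery_start_hours(queue)[-1])             # 20.0 hours with a single thread
print(delivery_start_hours(queue, threads=5)[-1])  # 4.0 hours with five threads
```

More threads shorten the wait but do not remove it: the good mail still sits behind every bad mail that was queued before it.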
If this is not working as described, it is a bug. Not a design flaw.
It is working as described, and IMHO that is not a good thing. I would really
prefer a better default (test all MX servers but only 1 IP per MX, much lower
timeouts of 30 seconds on connect plus 3 minutes on session, and 5 remote
delivery threads).
Btw, I also patched RemoteDelivery to make the maximum number of MX servers
to test configurable: I use 2 as my default.
A few domains use a lot of different MX servers, and without that limit
they would bring us back to the "bad" scenario we described.
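
As a rough illustration of the two limits discussed here (a cap on MX servers and a single IP per MX), the candidate list for one delivery attempt could be pruned like this. It is only a sketch: the function, its parameters and the record layout are hypothetical and do not reflect the actual RemoteDelivery patch or its configuration names.

```python
def candidate_targets(mx_records, max_mx=2, max_ips_per_mx=1):
    """Pick the (host, ip) pairs to try in one delivery attempt.

    mx_records is a list of (preference, hostname, [ip, ...]) tuples.
    """
    targets = []
    # Lower preference value means higher priority, so sort before truncating.
    for _pref, host, ips in sorted(mx_records, key=lambda r: r[0])[:max_mx]:
        for ip in ips[:max_ips_per_mx]:
            targets.append((host, ip))
    return targets


records = [
    (20, "mx2.example.com", ["192.0.2.20", "192.0.2.21"]),
    (10, "mx1.example.com", ["192.0.2.10", "192.0.2.11"]),
    (30, "mx3.example.com", ["192.0.2.30"]),
]
print(candidate_targets(records))
# [('mx1.example.com', '192.0.2.10'), ('mx2.example.com', '192.0.2.20')]
```

With these limits the number of connections per attempt is bounded by max_mx * max_ips_per_mx, whatever the remote domain publishes in DNS.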
I also think that 15 attempts to deliver a single mail is too many as a
default.
Maybe we should put 2 different RemoteDelivery instances in our config: one
"best effort" with good performance but less reliability, and one with worse
performance but better reliability.
1800 thread minutes for a single mail as worst case default is not
acceptable to me.
With a 3 minute timeout, the worst case for your scenario should be 36
minutes before we reschedule the e-mail for later delivery.
That would still be too much if you are sending 100000 mails, even if
only 1% of those mails are for the "bad" host.
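
A back-of-the-envelope sketch of that bulk case, using the 36 minute figure quoted above and the 5 delivery threads suggested earlier. It counts only the first attempt for each bad mail and assumes the threads are busy with nothing else; the function name is made up for illustration.

```python
def bulk_first_attempt_hours(total_mails, bad_percent, minutes_per_attempt, threads):
    """Return (thread hours, wall-clock hours) spent on first attempts to the bad host."""
    bad_mails = total_mails * bad_percent // 100
    thread_minutes = bad_mails * minutes_per_attempt
    thread_hours = thread_minutes / 60
    # Wall-clock time if the work is spread evenly over all threads.
    return thread_hours, thread_hours / threads


print(bulk_first_attempt_hours(100000, 1, 36, 5))  # (600.0, 120.0)
```

So 1000 bad mails cost 600 thread hours for the first attempt alone, which is 5 days of wall-clock time on 5 threads.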
Maybe we should change our accept order policy: currently we sort by
last_updated. Maybe we should change it to sort by state and then by
last_updated (taking first all the messages not in ERROR and then the
ones in ERROR). This way a first attempt would always be made with a
greater priority than retries.
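
The proposed ordering can be sketched as a two-part sort key. This is an illustration only, not the spool repository code: the SpooledMail class is hypothetical, and the state strings "error" and "root" are assumed values for a failed mail and a fresh mail.

```python
from dataclasses import dataclass


@dataclass
class SpooledMail:
    name: str
    state: str         # "error" marks a mail that has already failed (assumed value)
    last_updated: int  # e.g. epoch seconds


def accept_order(mails):
    """Fresh mails first (oldest first), then mails in ERROR (oldest first)."""
    # False sorts before True, so mails not in ERROR come ahead of retries.
    return sorted(mails, key=lambda m: (m.state == "error", m.last_updated))


spool = [
    SpooledMail("retry-old", "error", 100),
    SpooledMail("new-b", "root", 300),
    SpooledMail("new-a", "root", 200),
]
print([m.name for m in accept_order(spool)])  # ['new-a', 'new-b', 'retry-old']
```

Note that the retry is served last even though it has the oldest last_updated, which is exactly the change in priority being suggested.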
Stefano