Re: HELP!! Thousands of files stuck in spool at 'transport' state

Stefano Bagnara Wed, 21 Jun 2006 15:37:14 -0700

Noel J. Bergman wrote:

Stefano wrote:

testing 2 times 6 multihomed servers and do this things 15 times by
default. In our default that single mail would result in 6*2*15 total
attempts (180 attempts) each one keeping a thread busy for 10 minutes
(1800 minutes, more than a whole thread day).


That does not match the default delivery schedule.  6IP*2Server*10min is two
hours, and then there is a retry delay before we try again.  After the 4th
retry, the interval would be 3 hours, and the remaining are 6 hours up to 25
attempts.  So the worst case should have the thread tied up for 2 hours
(which I agree is not acceptable), and there should be large blocks of hours
where e-mail delivery for that message is not attempted.

Yes, 2 hours for each attempt. 15 attempts are 30 hours of thread time for each mail destined for that domain. Yeah, *only* 2 hours before the other mails are processed, but take this scenario then:


10 mails for the "bad" host, 1 for the good host in this order.

James tries the first bad mail for 2 hours, then the second for 2 hours, then the third, and so on, for 20 hours. Then it tries the good one and delivers it (20 hours later!)

So to increase the probability of delivering a mail soon to a bad host, we delay the delivery to the good host.
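
A minimal sketch of the arithmetic behind this scenario, in case it helps to check the numbers. It is not James code: the class and method names are made up for illustration, and it assumes a single delivery thread and that every IP of every MX server runs into the full timeout.

```java
// Back-of-the-envelope model of the scenario above; hypothetical names, not James code.
final class HeadOfLineDelay {

    /** Worst-case minutes one delivery attempt holds a thread: every IP of every MX times out. */
    static long minutesPerAttempt(int mxServers, int ipsPerServer, int timeoutMinutes) {
        return (long) mxServers * ipsPerServer * timeoutMinutes;
    }

    /** Minutes a mail waits when the given number of bad mails is tried first on one thread. */
    static long waitMinutes(int badMailsAhead, long minutesPerAttempt) {
        return badMailsAhead * minutesPerAttempt;
    }

    public static void main(String[] args) {
        long perAttempt = minutesPerAttempt(2, 6, 10); // 2 servers * 6 IPs * 10 min = 120 min
        long wait = waitMinutes(10, perAttempt);       // 10 bad mails ahead of the good one

        System.out.println("per attempt: " + perAttempt + " minutes");
        System.out.println("good mail waits: " + (wait / 60) + " hours");
        System.out.println("15 attempts: " + (15 * perAttempt) + " thread minutes");
    }
}
```

With the figures from this thread it gives 120 minutes per attempt, a 20 hour wait for the good mail, and 1800 thread minutes for 15 attempts.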

If this is not working as described, it is a bug.  Not a design flaw.

It is working as described and IMHO that is not a good thing. I would really prefer a better default (test all MX but only 1 IP per MX, much lower timeouts: 30 seconds on connect + 3 minutes on session, 5 remote delivery threads).

Btw I also patched RemoteDelivery to make the maximum number of MX servers to test configurable: I use 2 as my default.

A few domains use a lot of different MX hosts, and that would bring us back to the "bad" scenario we described.

I also think that 15 attempts to deliver a single mail is too many as a default.

Maybe we should put 2 different RemoteDelivery instances in our config: one "best effort" with good performance but less reliability, and one with worse performance but better reliability.

1800 thread minutes for a single mail as worst case default is not
acceptable to me.


With a 3 minute timeout, the worst case for your scenario should be 36
minutes before we reschedule the e-mail for later delivery.

This would be too much anyway if you are sending 100000 mails, even if only 1% of those mails are for the "bad" host.
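
To put a rough number on that, here is a small estimate. It is only a sketch under assumptions: it counts first attempts only, uses the 36 minute worst case from above, and assumes the 5 delivery threads I proposed are busy with nothing but the bad mails. The class and method names are hypothetical.

```java
// Rough load estimate for a bulk send; hypothetical names, not James code.
final class BulkLoadEstimate {

    /** Number of mails addressed to the bad host, given a whole-number percentage. */
    static long badMails(long totalMails, int badPercent) {
        return totalMails * badPercent / 100;
    }

    /** Thread minutes spent if every bad mail costs the full worst-case attempt time. */
    static long threadMinutes(long badMails, int minutesPerAttempt) {
        return badMails * minutesPerAttempt;
    }

    /** Wall-clock hours when the work is spread evenly over the given number of threads. */
    static long wallClockHours(long threadMinutes, int threads) {
        return threadMinutes / threads / 60;
    }

    public static void main(String[] args) {
        long bad = badMails(100_000, 1);          // 1% of 100000 mails
        long minutes = threadMinutes(bad, 36);    // 36 min worst case per attempt
        long hours = wallClockHours(minutes, 5);  // assumed: 5 delivery threads

        System.out.println(bad + " bad mails, " + minutes + " thread minutes, "
                + hours + " hours on 5 threads");
    }
}
```

Under these assumptions the first attempts alone cost 36000 thread minutes, which is about 120 hours of wall-clock time on 5 threads.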

Maybe we should change our accept order policy: now we sort by last_updated. Maybe we should change it to sort by state and then last_updated (getting first all the messages not in ERROR and then the ones in ERROR). This way at least a first attempt would always be done with a greater priority than retries.
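
A sketch of that ordering as a plain Java comparator, just to show the idea. The Spooled class is a hypothetical stand-in for a spooled message and is not the James spool API; the real change would have to be made wherever the spool selects the next message to accept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration of "sort by state, then last_updated"; hypothetical types, not James code.
final class AcceptOrder {

    /** Hypothetical stand-in for a spooled message. */
    static final class Spooled {
        final String name;
        final boolean inError;   // true once a delivery attempt has failed
        final long lastUpdated;  // timestamp of the last update

        Spooled(String name, boolean inError, long lastUpdated) {
            this.name = name;
            this.inError = inError;
            this.lastUpdated = lastUpdated;
        }
    }

    /** Messages not in ERROR first (false sorts before true), oldest update first within each group. */
    static final Comparator<Spooled> BY_STATE_THEN_LAST_UPDATED =
            Comparator.comparing((Spooled m) -> m.inError)
                      .thenComparingLong(m -> m.lastUpdated);

    /** Returns the message names in the order they would be accepted. */
    static List<String> acceptOrder(List<Spooled> spool) {
        List<Spooled> sorted = new ArrayList<>(spool);
        sorted.sort(BY_STATE_THEN_LAST_UPDATED);
        List<String> names = new ArrayList<>();
        for (Spooled m : sorted) {
            names.add(m.name);
        }
        return names;
    }
}
```

Sorting by last_updated alone would put an old retry ahead of a newer mail that has never been tried; with this comparator every untried mail is accepted before any retry.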


Stefano

