At 2:06 PM -0700 5/4/04, Elliot Wilen imposed structure on a stream
of electrons, yielding:
For the past few months, a user at my company has reported
difficulties receiving mail sent from a particular educational
institution. After examining the problem and getting some help from
someone on the technical staff there, I think I've found what's
going on. But I wonder if anyone has any additional insights or
suggestions.

The problem occurs when the person sending the mail has an email
address where the domain portion lacks an MX record. The domain
portion does have a valid A record, though. During the SMTP
transaction, SIMS was taking approximately 58 seconds to reply to
the MAIL FROM command. E.g.,

11:59:42 4 SMTP-300(smtp.bigschool.edu) Input Line: mail
from:<[EMAIL PROTECTED]>\r
12:00:40 4 SMTP-300(smtp.bigschool.edu) No relay exists for
'psych.bigschool.edu'
12:00:40 4 SMTP-300(smtp.bigschool.edu) Looking for psych.bigschool.edu
12:00:40 4 SMTP-300(smtp.bigschool.edu) Sending 250
<[EMAIL PROTECTED]> sender accepted\r\n

However, by the time SIMS had responded, the remote server had timed
out the connection:

12:00:40 3 SMTP-300(smtp.bigschool.edu) Abort Received, reason=54
12:00:40 4 SMTP-300(smtp.bigschool.edu) Nothing read - stream broken
12:00:40 3 SMTP-300(smtp.bigschool.edu) Reading Failed. Error
Code=-25010. Read:

At the remote end, the sender eventually gets a "warning: could not
send message" notice with a transcript that reads:

451 4.4.1 reply: read error from mprinc.com.
<[EMAIL PROTECTED]>... Deferred: Connection timed out with mprinc.com

It seems there are three things that can be done to fix this problem.

1. Get bigschool.edu to increase its timeouts if possible. (All
outbound mail at bigschool.edu passes through a centrally-managed
cluster.) This should work since I have found mail from other
sources coming through our server where the sender's address only
has an A record, provided the remote server is patient.

A timeout of less than 60 seconds for a response to MAIL is indeed absurd. I bet they fail to get mail through to a lot of places.
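
To put numbers on that: a rough sketch that takes the two timestamps from the log excerpt above and compares the delay with the 5 minute minimum that RFC 2821 (section 4.5.3.2) recommends a sending system allow for the reply to MAIL. The 45 second value is purely a hypothetical stand-in, since the real timeout at bigschool.edu is not known; all function names are mine.

```python
from datetime import datetime, timedelta

# Timestamps copied from the SIMS log excerpt quoted above.
MAIL_RECEIVED = "11:59:42"
REPLY_SENT = "12:00:40"

# RFC 2821 section 4.5.3.2: the sender SHOULD wait at least 5 minutes
# for the reply to the MAIL command.
RFC_MAIL_TIMEOUT = timedelta(minutes=5)


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def survives(delay: timedelta, client_timeout: timedelta) -> bool:
    """True if the reply arrives before the client gives up."""
    return delay < client_timeout


delay = elapsed(MAIL_RECEIVED, REPLY_SENT)
print("delay in seconds:", delay.total_seconds())
print("ok with RFC timeout:", survives(delay, RFC_MAIL_TIMEOUT))
# Hypothetical short timeout; the real value at the remote end is unknown.
print("ok with 45 s timeout:", survives(delay, timedelta(seconds=45)))
```

A sender that followed the RFC recommendation would have sat through the slow lookup without complaint.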


1a. Get bigschool.edu to create MX records for its various
departmental mailservers. Would work, but may be asking too much.

Not really. That falls into the class of "fix your DNS" because having MX records for any legitimate domain part of an email address should be standard practice. Relying on A records works for mail delivery, but it is a fallback that no modern network should depend on. However, this would not solve your problem (see below).
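
For anyone unfamiliar with the fallback: RFC 2821 section 5 says that a domain with no MX records but with an A record is treated as its own mail host (an implicit MX). A minimal sketch of that rule follows, run against a toy lookup table instead of real DNS. Apart from psych.bigschool.edu having only an A record, which is what was reported, every name and address in the table is invented for illustration, and mail_targets is a hypothetical helper.

```python
# Toy DNS table. Only the shape of the psych.bigschool.edu entry (A record,
# no MX) comes from the report; all hosts and addresses are made up.
RECORDS = {
    "example.org": {"MX": [(20, "mx2.example.org"), (10, "mx1.example.org")]},
    "psych.bigschool.edu": {"A": ["192.0.2.10"]},
    "nowhere.example": {},
}


def mail_targets(domain, records=RECORDS):
    """Return the hosts to try for mail to `domain`, best first."""
    entry = records.get(domain, {})
    mx = entry.get("MX", [])
    if mx:
        # Lower preference value means try that host first.
        return [host for _pref, host in sorted(mx)]
    if entry.get("A"):
        # Implicit MX: no MX records, so the domain itself is the target.
        return [domain]
    return []  # nothing usable: mail to this domain cannot be delivered


print(mail_targets("example.org"))
print(mail_targets("psych.bigschool.edu"))
print(mail_targets("nowhere.example"))
```

A return path check has to do roughly the same two steps, which is why a domain with only an A record costs an extra query after the MX query comes back empty.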


2. Turn off "verify return path". This would let in a little more
spam, unfortunately.

Likely a LOT more spam. Many spam senders also time out fast and never retry, so this feature sometimes blocks spam without that ever being obvious.


3. Reduce the amount of time it takes SIMS to perform the return path lookup.

That seems like a good choice. A successful resolution taking as long as yours does indicates a problem in DNS somewhere. Note that you found the A record in the same second that you failed to find the MX.


(I really wish you had not munged the log lines. It would be useful to be able to tell whether their DNS was the slow bit...)

After going into the TCP/IP control panel on the SIMS machine and
removing all the DNS servers except itself (QuickDNS 3.5.3 runs on
the same machine), I found that the return path verification was
reduced to about 14 seconds when the path contained a valid A record
but no MX record. Since there were originally 4 DNS servers in the
TCP/IP control panel, it's likely that SIMS takes 14 seconds per
server to do the verification (4 x 14 = 56 seconds).

That's probably a sign that those other 3 nameservers are not really talking to you or are acting stupid.


This brings me to the following questions:

1) Any problems with my analysis?

I think you don't quite have a handle on how the name lookups are working. The MacOS resolver asks the first server in the list and waits for a response. It tries the others in the list only if that query times out or gets a SERVFAIL or NOERROR+NOANSWER (i.e. 'go do your own recursion') response. The fact that the lookup takes so long but eventually succeeds indicates that the first nameserver in your list is probably broken, at least as far as doing full recursive resolution for you is concerned.
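
A toy model of that try-in-order behavior may make the timing clearer. This is only a sketch: the 14 second per-server wait is borrowed from the figure reported above as an assumption, not a documented MacOS resolver setting, and the model treats every unhelpful server as a plain timeout (it ignores fast SERVFAIL-style failures). lookup_time is a made-up name.

```python
from typing import Optional, Sequence, Tuple


def lookup_time(
    servers: Sequence[Optional[float]], timeout: float
) -> Tuple[Optional[int], float]:
    """Walk the server list in order, as a stub resolver would.

    Each entry is the number of seconds that server takes to give a useful
    answer, or None if it never does. Returns (index of the server that
    answered, or None if none did, total seconds spent).
    """
    total = 0.0
    for index, answer_after in enumerate(servers):
        if answer_after is not None and answer_after <= timeout:
            return index, total + answer_after
        total += timeout  # waited the full timeout, move to the next server
    return None, total


PER_SERVER_TIMEOUT = 14.0  # assumption taken from the observed delay

# A healthy first server: the rest of the list is never consulted.
print(lookup_time([0.5, None, None, None], PER_SERVER_TIMEOUT))
# A broken first server: its whole timeout is paid before the second is asked.
print(lookup_time([None, 0.5], PER_SERVER_TIMEOUT))
# Four servers that never answer usefully: 4 x 14 seconds in total.
print(lookup_time([None, None, None, None], PER_SERVER_TIMEOUT))
```

The point of the model is that the order of the list matters far more than its length: anything slow or broken ahead of the first good server is paid for on every lookup.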


2) Is bigschool.edu's timeout on their outbound SMTP sessions
unreasonably short?

Yes.

3) How many other DNS servers should I include in the TCP/IP control
panel, and which ones? I think I should have at least one besides
QuickDNS, but maybe I shouldn't have the server refer to itself at
all.

Always put the nearest caching nameserver that will do recursion for you (i.e. QuickDNS) first. If that nameserver is flaky, you may want another but you want to make sure that it really is working for you. The return on adding more after that diminishes very rapidly. DNS queries should give the same results from any machine willing to do recursive resolution for you, so having a lot of them is silly. You need enough of them so that the failure frequency of the first one doesn't become a problem, but if each of the first two is 99% reliable, it is silly to go much further. (i.e. both fail on the same query only about once in 10,000 queries)
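
The arithmetic behind the one-in-10,000 figure, as a quick sketch. It assumes each server fails 1% of the time and that failures are independent, which real nameservers sharing a network path may not be; all_fail is a hypothetical helper name.

```python
from fractions import Fraction


def all_fail(per_server_failure: Fraction, servers: int) -> Fraction:
    """Chance that every server in the list fails on the same query,
    assuming failures are independent."""
    return per_server_failure ** servers


one_percent = Fraction(1, 100)  # i.e. each server is 99% reliable
for count in range(1, 5):
    chance = all_fail(one_percent, count)
    print(count, "server(s): one failed lookup in", 1 / chance, "queries")
```

Each extra server buys another factor of 100 on paper, which is why the benefit stops mattering after the second one.
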
--
Bill Cole
[EMAIL PROTECTED]


