Dave Kitabjian <[EMAIL PROTECTED]> wrote:
> Dave Sill:
> "Say you're an MTA [ sending three messages. you could: ]
> 
> 1. ...[ send three copies through one connection, ]...then close
>    the connection.
> 2. ...start three processes, each of which...sends a copy of the
>    message...
> 3. ...send a [ single ] copy of the message addressed to all three
>    users..."
>
> Dave Kitabjian:
> Clearly, the rank of efficiency is, from best to worst: 3, 1, 2

That might be true if the situation you described were complete, but
it's not. An MTA handles hundreds (or thousands) of messages per day,
and in typical situations only a very small fraction of them are both
1) the same message and 2) bound for the same host.

For qmail to implement solution #1 or #3, it must identify any mail
traffic it can combine and then handle that traffic specially. That
takes some work, and the extra code becomes a possible source of bugs
and security holes. Does it speed qmail up noticeably? No, for several
reasons:

  1. Only a tiny percentage of email can even potentially benefit from
     this ``optimization''. Hence, even a large speedup would have a
     small overall effect.

     (The only major exception to this is mailing list traffic. If enough
     of your local users subscribe to a remote mailing list, then it's
     worth your while to set up a sublist.)

     (The other commonly cited exception, corporate email, isn't an
     exception at all. Corporate email typically runs at LAN speeds,
     where the difference in speed is negligible.)

  2. Connection caching (strategy #1) is actually terrible for
     performance, because all mails for a given destination are
     serialized.  They would arrive faster if they were delivered in
     parallel, which is what qmail does.

     Connection caching also impairs the remote mail admin's ability to
     limit throughput to levels his server can handle. If his server goes
     down temporarily, for example, then you will probably try to shove
     thousands of emails down his throat as soon as you see he's back
     up. This effect is what Dan calls ``opportunistic bombardment'';
     it's why sendmail typically clobbers recovering mail hosts. (After
     an outage, AOL typically is taken down again, several times, by
     incredible waves of sendmail bombardment.)

     Connection caching is also unfair. It means that once you have a
     connection, you exploit it to send everything you've got. Meanwhile,
     if the server is near capacity, others are completely denied
     service. qmail's parallel delivery, which at first seems greedier,
     is actually fairer: admins can easily limit per-site connections,
     causing qmail to wait its turn. Meanwhile, sendmail users hog
     connections, forcing their mail through without waiting their turn.

  3. Option #3 is harder mainly for the outgoing server: the server has
     to notice opportunities to combine emails into one message with
     several recipients.

     a. There are cases where a server will be fooled anyway.

     b. Ignoring that, its work will pay off only in a tiny minority of
        cases.

     c. This approach also has privacy implications: if I hand a message
        with multiple recipients to another SMTP server, and some of
        those are BCC recipients, then the receiving server may violate
        privacy by recording the complete envelope.

     d. This approach makes things like VERPs (variable envelope return
        paths) impossible. VERPs are why ezmlm can handle bounces so
        conveniently. They have convenient uses for individuals as well:
        they let you map bounces of important emails to the exact address
        you were sending to (which may not be the address that caused
        the bounce).
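
Point 2's trade-off can be illustrated with a toy timing model. This is a sketch under stated assumptions, not a measurement: every SMTP connection costs a fixed setup time, every message a fixed transfer time, and the remote server handles concurrent connections without slowing down. The function names and numbers are hypothetical. The `max_connections` parameter stands in for the per-site limit the remote admin controls.

```python
import math

# Toy model (an assumption, not a measurement): every SMTP connection costs
# `setup` seconds to open, and every message costs `transfer` seconds to send.


def cached_connection_time(n_messages: int, transfer: float, setup: float) -> float:
    """Strategy #1: open one connection, then send the messages one by one."""
    return setup + n_messages * transfer


def parallel_delivery_time(n_messages: int, transfer: float, setup: float,
                           max_connections: int) -> float:
    """One connection per message, at most `max_connections` at a time
    (the per-site limit the remote admin can impose)."""
    if max_connections < 1:
        raise ValueError("max_connections must be at least 1")
    rounds = math.ceil(n_messages / max_connections)  # waves of connections
    return rounds * (setup + transfer)


if __name__ == "__main__":
    # 20 messages, 4 s each to transfer, 1 s to set up a connection.
    print(cached_connection_time(20, 4.0, 1.0))       # 81.0
    print(parallel_delivery_time(20, 4.0, 1.0, 10))   # 10.0
    print(parallel_delivery_time(20, 4.0, 1.0, 1))    # 100.0
```

With ten connections allowed, parallel delivery finishes far sooner than the serialized cached connection. With the remote limit set to one, it is slower. In this model, the receiving admin's limit, not the sender's greed, decides how much parallelism is used.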
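
Point 3d depends on each copy of a message carrying its own envelope sender. The sketch below shows the general VERP idea: the recipient's address is folded into the return path as `local=domain`. A bounce sent back to that return path then identifies the recipient, no matter which address the bounce message itself complains about. The `list-return-` prefix, the host name, and both function names are hypothetical. The sketch also skips the character-escaping and case-handling rules a real implementation needs.

```python
# Hypothetical names throughout; this shows the general VERP idea only and
# omits the escaping/case rules a real MTA or list manager would apply.


def verp_sender(prefix: str, host: str, recipient: str) -> str:
    """Encode `recipient` into a per-recipient envelope sender.

    joe@example.com -> <prefix>joe=example.com@<host>
    """
    local, at, domain = recipient.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"expected local@domain, got {recipient!r}")
    return f"{prefix}{local}={domain}@{host}"


def recipient_from_bounce(prefix: str, host: str, bounce_address: str) -> str:
    """Recover the original recipient from the address a bounce was sent to."""
    suffix = "@" + host
    if not (bounce_address.startswith(prefix) and bounce_address.endswith(suffix)):
        raise ValueError(f"not a VERP address for this sender: {bounce_address!r}")
    encoded = bounce_address[len(prefix):-len(suffix)]
    # Split on the last "=": domain names do not contain "=".
    local, eq, domain = encoded.rpartition("=")
    if not eq or not local or not domain:
        raise ValueError(f"malformed VERP payload: {encoded!r}")
    return f"{local}@{domain}"


if __name__ == "__main__":
    sender = verp_sender("list-return-", "lists.example.org", "joe@example.com")
    print(sender)  # list-return-joe=example.com@lists.example.org
    print(recipient_from_bounce("list-return-", "lists.example.org", sender))  # joe@example.com
```

A single copy addressed to several recipients (option #3) has only one envelope sender, so it has nowhere to carry this per-recipient information.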

So your logic is fine by itself, but it misses the bigger picture of
what's going on with email traffic.

Len.

--
Frugal Tip #4:
Keep skipping town two days ahead of the collection agency.
