Re: [spamdyke-users] feature requests :)

Sam Clippinger Mon, 07 Apr 2008 22:39:04 -0700

Andras Korn wrote:
> On Sun, Apr 06, 2008 at 02:53:24PM -0500, Sam Clippinger wrote:
>>> * I think spamdyke would make an even more seamless replacement for rblsmtpd
>>> if it supported the RBLSMTPD environment variable in roughly the same way as
>>> rblsmtpd itself; that is, if it's set but empty, skip RBL checks; if it's
>>> set to a string, reject the mail temporarily with the given string as the
>>> error message sent to the client; and if the string begins with a hyphen,
>>> reject the message permanently with the string sans the hyphen as the error
>>> message sent to the client. If the variable is unset, just filter normally.
>> I wasn't aware rblsmtpd included this feature.  I'm a little hesitant to 
>> duplicate it because of its design -- the existence of the environment 
>> variable and its effect on rblsmtpd's behavior are very non-intuitive. 
> 
> That's certainly a matter of taste. I find it intuitive by following this
> train of thought:
> 
> 1. with no extra settings (i.e. no envvar), filter normally.
> 2. string in envvar forces filtering (even if no RBL would match), provides
> custom error message.
> 3. hyphen switches to permanent rejection.
> 4. empty string doesn't make sense as custom error message, hence it can be
> used to disable filtering.


That makes sense, when it is explained in that way.  However, I still 
don't find it intuitive, because it requires an explanation.  The flag 
I've implemented in the next version of spamdyke will look like this:
        filter-level=normal
        filter-level=allow-all
        filter-level=require-auth
        filter-level=reject-all
If any of those lines are present in a configuration file, I believe no 
explanation is required to understand their basic effect.  The same is 
not true of an environment variable; that's why I don't like it.
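
For comparison, the four RBLSMTPD rules quoted above can be written out as a short Python sketch. It follows only the description given in this thread and has not been checked against rblsmtpd's source; the function name and the returned labels are made up for illustration.

```python
from typing import Mapping, Optional, Tuple


def rblsmtpd_decision(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Interpret RBLSMTPD the way the quoted message describes it.

    Returns a (label, message) pair. The labels are hypothetical names,
    not anything rblsmtpd or spamdyke actually prints.
    """
    value = env.get("RBLSMTPD")
    if value is None:
        # Rule 1: variable unset, so filter normally.
        return ("filter-normally", None)
    if value == "":
        # Rule 4: set but empty, so skip the RBL checks.
        return ("skip-rbl-checks", None)
    if value.startswith("-"):
        # Rule 3: leading hyphen means a permanent rejection;
        # the hyphen itself is not part of the message.
        return ("reject-permanently", value[1:])
    # Rule 2: any other string is a temporary rejection with that message.
    return ("reject-temporarily", value)
```

Note that three of the four outcomes depend on details (unset versus empty, a leading hyphen) that are invisible unless the rules are already known.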

>> However, if other tools already set this variable, I can make spamdyke 
>> use it to allow better compatibility.  Since I don't (and won't) use 
>> this feature myself, whether I implement it is up to everyone here.  If 
>> people want it, I'll add it.
> 
> Would you integrate a patch with this functionality?

Only if other code already exists that sets this environment variable 
(e.g. a tcpserver replacement).  I believe using the environment this 
way is a bad interface; I would only support it reluctantly, to make 
spamdyke more compatible with existing systems.

>> I've implemented a flag in the next version of spamdyke that will 
>> function the way you describe the "WHITELIST" variable.  It doesn't use 
>> an environment variable but it is otherwise identical.  It also has a 
> 
> The idea with the environment variable is that it can be set/unset using an
> arbitrarily flexible or complex mechanism outside spamdyke, based on
> arbitrary criteria. I don't see how you can duplicate that in any other way.

If the parent daemon (e.g. tcpserver) can alter the environment for its 
children based on arbitrary criteria, why can't it alter spamdyke's 
command line instead?  I'm getting the impression you're describing 
software that hasn't been written yet anyway, so the environment doesn't 
have to be the only way to communicate with child processes.

> Much of what you're doing in spamdyke is duplicating functionality that
> could be (and is) provided by a tcpserver replacement. For example,
> blacklisting IP addresses and rdns domains could be trivially accomplished
> using tcpsvd and environment variables; no need for these kinds of
> blacklists in spamdyke.
> 
> More generally, spamdyke would only need to concern itself with SMTP-level
> issues. Filtering based on data available at the TCP or IP level (including
> client rdns) could be implemented with tcpsvd and environment variables,
> reducing the amount of code in spamdyke.
> 
> It would be undesirable to refuse connections at the TCP level; spamdyke
> should start, find out who the mail is from and who it is addressed to, and
> then just refuse delivering it based on the value of the appropriate envvar.
> Because tcpsvd does a reverse DNS lookup on the client anyway, and passes
> the result on to its child in the TCPREMOTEHOST variable, spamdyke's lookup
> is superfluous. I think tcpserver provides this variable as well.
> 
> reject-empty-rdns could simply check whether TCPREMOTEHOST is empty.
> 
> If it's not empty, the value can be used as the basis for
> reject-unresolvable-rdns.
> 
> However, I think these tests actually belong in tcpsvd, and not spamdyke.
> They have nothing to do with SMTP.

All very true.  tcpserver does indeed provide the TCPREMOTEHOST 
environment variable, which spamdyke ignores.  tcpserver also parses 
/etc/tcp.smtp.cdb but spamdyke ignores its efforts and reparses 
/etc/tcp.smtp anyway.

There are several reasons I'm implementing these features in spamdyke 
and duplicating the effort put into tcpserver (and others).  Efficiency 
is not always my top priority.

First, there are some situations where spamdyke must perform duplicate 
work in order to achieve the correct result.  SMTP AUTH is the best 
example -- authenticated users are allowed to bypass all filters.  If 
blacklisting takes place before spamdyke is invoked, authenticated users 
will be incorrectly blacklisted.  This is one of rblsmtpd's major failings.

Second, most qmail servers use DJB's tcpserver.  Many replacements may 
be available but none are in wide use.  For that reason, I must design 
spamdyke for the "lowest common denominator" of qmail configurations. 
If I make spamdyke dependent on an alternative daemon, spamdyke's 
popularity will immediately drop to (almost) zero.  A major part of 
spamdyke's appeal is its "drop-in" design -- system administrators can 
try it quickly, see if it works and remove it just as quickly.  I am a 
qmail expert yet I still use tcpserver myself; I have no interest in 
testing a replacement.  Email is just too critical for that.

Third, I want spamdyke to be as self-contained and self-sufficient as 
possible.  I want everyone who installs spamdyke to be confident that it 
will act the same way on every qmail system.  (That makes the 
documentation and mailing list archives much more valuable as resources 
for answering questions.)  A spamdyke configuration on a bleeding-edge 
Linux installation should work just as well on an ancient NetBSD server. 
That means I must implement many features myself, even when I could 
more easily use external libraries or daemons.

Fourth, I want spamdyke to be as easy to configure as possible.  Yes, 
remote servers could be blacklisted by tcpserver (or an alternative). 
They could also be blocked by a firewall or a packet filter.  Those 
alternatives, while possibly more efficient, are difficult to configure 
and nearly impossible to test.  Many administrators are not Unix experts 
and don't know how to reconfigure their servers' packet filters.  They 
know the configuration is complex and a small error can have huge 
negative consequences -- therefore the entire system is regarded as 
"black magic" and avoided whenever possible.  So I provide the blacklist 
feature in spamdyke because I can make it simple to configure.  An 
administrator can be assured that editing spamdyke's configuration file 
will affect only spamdyke and they can use spamdyke's "config-test" 
feature to see if they've made any mistakes.

There's a lot to be said here for verbose configuration files.  Which 
line in the spamdyke configuration file controls the IP blacklist? 
That's obvious -- there's no need to consult the documentation. 
Personally, I've had to repair broken sendmail servers in the middle of 
the night (with users and managers screaming at me).  If I can design 
spamdyke in a way that helps avoid that experience for even one person, 
I'll have no regrets.

Fifth, your comments about duplicated code assume spamdyke will only 
ever work for qmail servers.  That's not true, however -- I fully intend 
to turn spamdyke into a daemon that can proxy connections to other 
servers/ports so it can be used by sendmail/postfix administrators.  I 
intend to wrap spamdyke in a Windows Service so it can be used by 
Exchange administrators.  In those environments, spamdyke will not be 
able to rely on external/parent daemons to provide any information.

Lastly, spamdyke is a hobby for me.  No one is paying me for this, so I 
am free to write the code that most interests me and design it according 
to what I believe is most correct.  Does the next version of spamdyke 
_really_ need to contain its own DNS resolver routines so that the 
system resolver is not used at all?  No, probably not.  But I wanted to 
figure out how to send and parse DNS queries and I found a way spamdyke 
could benefit from my experimentation.  The same is true with the other 
features spamdyke (may have) duplicated from other packages.  I wanted 
to do it myself, so I did.

>> I don't like passing values to child processes in environment variables 
>> because they're not externally visible.  In other words, when an 
>> environment variable is set, only the child process can read it.
> 
> Both the child and the parent process are free to log it; and any process
> tracing either the child or the parent will see the envvar. On Linux
> systems, it's even available in /proc/$PID/environ.

You're speaking as a programmer here, not an administrator.  Most 
administrators know nothing about tracing processes and aren't 
comfortable looking at anything in /proc.  I agree tcpserver is free to 
log information about the environment variables it sets for its child 
processes.  But since it doesn't do that, administrators don't have that 
data to help track down an error.
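
For anyone who does want to look, the Linux file is named /proc/<pid>/environ and holds NAME=value entries separated by NUL bytes. Below is a minimal Python sketch of reading it; the function names are hypothetical, and reading another user's process normally requires root.

```python
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split the NUL-separated NAME=value records found in /proc/<pid>/environ."""
    entries = {}
    for item in raw.split(b"\0"):
        if not item:
            continue  # the data normally ends with a trailing NUL
        name, _, value = item.partition(b"=")
        entries[name.decode(errors="replace")] = value.decode(errors="replace")
    return entries


def read_process_environ(pid: int) -> Dict[str, str]:
    """Linux only: return the environment recorded for a running process."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())
```

Usage would be something like read_process_environ(1234)["TCPREMOTEHOST"], where 1234 stands in for the PID of the child started by tcpserver.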

Email is a critical service.  If an administrator is trying to 
troubleshoot a problem and there's even a /hint/ spamdyke may be at 
fault, spamdyke will be removed.  I know this because it's how I would 
respond in a crisis.  To avoid that, I want to make spamdyke's 
configuration as accessible as possible.  An administrator should be 
able to see how spamdyke was invoked by using just "ps".  A 
configuration file should be readable with "cat" and understandable 
without the documentation.  spamdyke should always be able to explain 
what it's doing (e.g. what file and line number were matched).  That 
way, the administrator is in control, not mysterious daemon processes.

This is the core of my design philosophy.  I can't change all of the 
other software on my server to behave this way, although if wishing made 
it so...  I can only make spamdyke behave this way and hope someone 
finds it useful.

>> On the other hand, when the configuration is set through command line
>> flags or configuration files, it's very easy to see what's happening. 
>> Configuration, testing and troubleshooting are much easier.
> 
> The price is duplicate configuration. Instead of relying on tcpsvd to set
> e.g. the RBLSMTPD variable, or one to turn off another specific spam filter
> for a specific client, I need to list the address or the domain of that
> client in spamdyke's configuration as well. I can do this as long as the
> configuration is static; it's a hassle, but doable.

Yes.  Efficiency is not always my top priority.  If you have to 
configure the same feature twice, you will understand that the filter is 
being run twice by two different programs.  You can choose to disable 
the filter in one of those two places.  If there are any problems, you 
will know to check both programs' configurations.

On the other hand, if the tcpserver daemon (or a replacement) sets an 
environment variable that affects a child process, you may not realize 
it also affects spamdyke.  You could easily spend many hours trying to 
understand why spamdyke is "malfunctioning", never realizing that a 
setting intended for an unrelated filter was also affecting spamdyke.

> However, suppose my tcpserver replacement can do database lookups. Imagine I
> have a database that many hosts update and it contains "reputations" of SMTP
> clients (amount of spam and ham received, number of attempted deliveries to
> fake recipients, backscatter, result of blitzed open proxy test, that kind
> of thing). Based on the contents of this database, my tcpserver replacement
> could set up $RBLSMTPD (or different variables) just so. I can't reproduce
> this with spamdyke, and wouldn't want to.

You're describing software that doesn't exist yet.  If you design your 
system to set environment variables, why can't it set spamdyke's command 
line instead?

> Another benefit I see is abstraction: the same functionality can be
> implemented, based on the same envvar, in different programs. The tcpserver
> replacement doesn't need to care whether it's going to start spamdyke or
> rblsmtpd or qmail-smtpd, because all of them can (could) use the same
> envvars to do the same thing (which is the fundamental idea in UCSPI). Also,
> the delivery chain can be arbitrarily long; as long as one process sets an
> environment variable and one child somewhere down the pipe uses it, the two
> processes don't need to know or care about how many other processes are
> between them. The same doesn't work quite as transparently with command line
> arguments.

I understand what you're saying but where you see benefits, I see only 
downsides.  This design superficially resembles Unix's small command 
line utilities that can be chained together with pipes to do interesting 
things.  However, the difference is that an environment variable 
persists even beyond a program that doesn't use it.  The potential for 
unintended side-effects far outweighs the slight benefit, IMHO.
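
The persistence is easy to demonstrate. The following Python sketch is a stand-in for a tcpserver-style chain, not any real qmail component: the outer program sets a variable (DEMO_FLAG, a made-up name), the middle program never mentions it, and the innermost program still sees it.

```python
import os
import subprocess
import sys

# Innermost program: reports whether it can see the variable.
GRANDCHILD = 'import os; print(os.environ.get("DEMO_FLAG", "<unset>"))'

# Middle program: knows nothing about DEMO_FLAG; it only runs the next program.
MIDDLE = (
    "import subprocess, sys; "
    f"subprocess.run([sys.executable, '-c', {GRANDCHILD!r}], check=True)"
)


def run_chain(flag_value: str) -> str:
    """Set DEMO_FLAG in the outer process and return what the grandchild prints."""
    env = dict(os.environ, DEMO_FLAG=flag_value)
    result = subprocess.run(
        [sys.executable, "-c", MIDDLE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Every program between the one that sets the variable and the one that reads it inherits the setting, whether or not it was meant to.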

An example: tcpserver sets an environment variable named TCPLOCALPORT. 
The value is the port number on which the incoming connection was 
received: 25 for SMTP, 110 for POP3, etc.  Most qmail-related programs 
don't use this environment variable but one significant program relies 
on it.  In the vpopmail package, "vchkpw" checks usernames and 
passwords.  It uses the TCPLOCALPORT environment variable to determine 
how to authenticate its input.  If the port is 25, 465 or 587, it will 
do SMTP AUTH.  If the port is 143 or 993, it will do IMAP.  Any other 
port is treated as POP3.

Now, suppose an administrator tries to reconfigure tcpserver to listen 
on port 2525 instead of port 25 (for whatever reason).  All of a 
sudden, SMTP AUTH will stop working because vchkpw will assume all 
incoming connections are POP3 connections, not SMTP.

This is completely non-intuitive behavior.  There's very little 
documentation that tcpserver sets TCPLOCALPORT and _no_ documentation 
that vchkpw uses it.  However, if vchkpw used a simple command line flag 
for "authenticate SMTP" instead of an environment variable, the 
situation would never arise.
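
The port-based dispatch described above amounts to something like the following Python sketch. This is an illustration of the behavior as described in this message, not vchkpw's actual code; the function name and return labels are hypothetical, and treating an unset variable like "any other port" is an assumption.

```python
from typing import Mapping


def auth_mode_for_port(env: Mapping[str, str]) -> str:
    """Pick an authentication mode from TCPLOCALPORT, as described above."""
    # Assumption: a missing variable is handled like an unrecognized port.
    port = int(env.get("TCPLOCALPORT", "0"))
    if port in (25, 465, 587):
        return "smtp-auth"
    if port in (143, 993):
        return "imap"
    # Every other port, including a relocated SMTP listener on 2525.
    return "pop3"
```

With TCPLOCALPORT=2525 the function falls through to the POP3 branch, which is exactly the surprise described above.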

I'm always in favor of less magic and more control, even if it means 
more typing in a configuration file.

>> So, as with the RBLSMTPD environment variable, I would be willing to 
>> implement environment variable-based configuration in spamdyke in order 
>> to work with existing tools.  But I don't think implementing new 
>> features this way is a good idea.
> 
> Did I succeed in changing your mind? :)

No, sorry. :)  If existing tools set environment variables to work with 
rblsmtpd (or others), I'll consider adding something to spamdyke to make 
it compatible.  But if those other tools haven't been written yet, I 
don't see a need to support them.

>>> * As for filtering invalid recipients,
[snip]
> Personally, I think the best way would be to use a simple co-daemon with a
> socket interface (fifos are too dumb). You write a recipient address into
> the socket and the co-daemon replies with a code indicating whether it's
> valid or not.

I'll consider that idea when I start working on it.  At this point, I'm 
trying to finish testing the next version so I can release it.  Once 
that's done, I want to tackle recipient validation.  I haven't yet 
determined which approach to take. :)
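
To make the suggestion concrete, here is a rough Python sketch of the co-daemon idea: one newline-terminated address per request, one reply line per answer. Nothing here is a planned spamdyke interface. The "OK"/"NO" replies, the function names and the example.com addresses are all invented, and a socketpair stands in for a real listening socket so the sketch is self-contained.

```python
import socket
import threading
from typing import List

# Hypothetical recipient table; a real co-daemon would consult the mail
# server's own user database instead.
VALID_RECIPIENTS = {"postmaster@example.com", "sam@example.com"}


def serve_one(conn: socket.socket) -> None:
    """Answer recipient queries on one connection until the peer closes it."""
    with conn, conn.makefile("r") as requests, conn.makefile("w") as replies:
        for line in requests:
            address = line.strip().lower()
            replies.write("OK\n" if address in VALID_RECIPIENTS else "NO\n")
            replies.flush()


def ask(requests, replies, address: str) -> bool:
    """Send one address and report whether the co-daemon accepted it."""
    requests.write(address + "\n")
    requests.flush()
    return replies.readline().strip() == "OK"


def demo() -> List[bool]:
    """Run a client and the co-daemon over a connected socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    with client, client.makefile("w") as out, client.makefile("r") as back:
        results = [
            ask(out, back, "sam@example.com"),
            ask(out, back, "nobody@example.com"),
        ]
    worker.join()  # the server loop ends once the client side is closed
    return results
```

Calling demo() asks about one listed and one unlisted address over the same connection.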

> Please take a look at ipsvd/tcpsvd (<http://smarden.org/ipsvd/>) before you
> do. I would hate to see you waste effort on something that's already been
> done. :)
> 
> Also, it would be a shame to have to choose between the features of spamdyke
> and the features of tcpsvd, which is already a huge improvement over
> tcpserver.

I wasn't aware of that project; I'll take a look.  Thanks for the tip.

>>> * check-rhsbl tests two separate things: whether the rdns of the client is
>>> blacklisted, and whether the envelope from domain is blacklisted. I may want
>>> the latter without the former (which would also save DNS lookups).
>> I can add this if it's something people want.  I made spamdyke check 
>> both because that was how DNS RHSBLs were described everywhere I read 
>> about them.  Personally, I couldn't envision a scenario where one check 
>> would be desirable but the other would not.  It would be easy to 
>> separate though.
> 
> I imagine it might be possible for a legitimate SMTP client to be located in
> the network of an ISP that is listed by a RHSBL. It's the envelope from I
> care more about. I don't necessarily want to block the client just because
> it happens to have chosen the wrong provider.

I understand your point but I usually feel exactly the opposite -- the 
origin matters more than the envelope, because the envelope is so easy 
to forge.  That's a policy decision that should be under the 
administrator's control, however.  I'll put this one on my TODO list.
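
As a sketch of what separating the two checks might look like: an RHSBL query name is formed by appending the list's zone to the domain being tested, so the two lookups can be enabled independently. The Python below is only an illustration. The function and parameter names are hypothetical (not spamdyke options), the zone is a placeholder, and the names are queried as given rather than reduced to a registered domain.

```python
from typing import List, Optional


def rhsbl_query_names(
    zone: str,
    rdns_name: Optional[str] = None,
    envelope_sender: Optional[str] = None,
    check_rdns: bool = True,
    check_sender: bool = True,
) -> List[str]:
    """Build the DNS names to look up for one RHSBL zone."""
    names = []
    if check_rdns and rdns_name:
        # Check the client's reverse DNS name against the list.
        names.append(f"{rdns_name.rstrip('.')}.{zone}")
    if check_sender and envelope_sender and "@" in envelope_sender:
        # Check only the domain part of the envelope sender.
        domain = envelope_sender.rsplit("@", 1)[1]
        names.append(f"{domain}.{zone}")
    return names
```

Passing check_rdns=False yields only the envelope sender lookup, which also saves one DNS query per connection.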

-- Sam Clippinger
