Re: Using the PRAXIS spam filter

Steen Jansdal Wed, 05 Nov 2003 00:40:55 -0800

Vincenzo Gianferrari Pini wrote:

> Steen,
>
> sorry for the delay in answering.


What delay? Since this is not a paid support list,
you are allowed to do other things than just wait
to answer my stupid questions. :-)

>> * Whitelist: Is it possible to use wildcards in usernames? I would
>> like to whitelist all mails from a certain domain, no matter
>> which user from that domain is sending. (Perhaps something
>> like *.<remote-domain>.)

> There is no need to insert such "wildcarded" addresses in the
> whitelist; you can achieve the desired behaviour simply by coding
> your config file appropriately, as I did too (search for
> "SenderHostIs=xxx.com yyy.org" in the snippet below):


Yes, but in our CRM database we have approx. 1000 companies that
must be on the whitelist, and mails from them must in *NO* way be
considered spam. IMO entering and maintaining 1000 companies
directly in the config file is not the correct way to do it.
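Steen wants the list of whitelisted companies to be kept in the CRM rather than in config.xml. One way to do that is a small helper that reads an exported domain list at startup. This is a minimal sketch under assumptions. The class `DomainWhitelist`, the one-domain-per-line export format and the exact, case-insensitive host comparison are all hypothetical. None of it is part of James or of the SenderHostIs matcher.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of James): whitelisted sender domains loaded
// from an external export, e.g. a text file generated from the CRM database.
final class DomainWhitelist {
    private final Set<String> domains = new HashSet<>();

    // Reads one domain per line; blank lines and '#' comments are skipped.
    static DomainWhitelist load(Reader source) throws IOException {
        DomainWhitelist list = new DomainWhitelist();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String domain = line.trim();
                if (!domain.isEmpty() && !domain.startsWith("#")) {
                    list.domains.add(domain.toLowerCase(Locale.ROOT));
                }
            }
        }
        return list;
    }

    // True if the host part of the sender address equals a listed domain
    // (exact, case-insensitive match; subdomains are not covered).
    boolean contains(String senderAddress) {
        int at = senderAddress.lastIndexOf('@');
        if (at < 0) {
            return false;
        }
        String host = senderAddress.substring(at + 1).toLowerCase(Locale.ROOT);
        return domains.contains(host);
    }

    int size() {
        return domains.size();
    }

    public static void main(String[] args) throws IOException {
        // In practice the Reader would wrap the exported file, not a literal string.
        DomainWhitelist wl = load(new StringReader("# CRM export\nxxx.com\nYYY.org\n\n"));
        System.out.println(wl.size());                       // 2
        System.out.println(wl.contains("alice@xxx.com"));    // true
        System.out.println(wl.contains("bob@mail.yyy.org")); // false: exact host match only
    }
}
```

Re-exporting the file from the CRM and reloading it keeps the 1000 entries out of config.xml. Whether subdomains should also match is a policy decision left open here.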

>> And I would like to be able to whitelist a mail address for all my
>> users in one step. (For example by using a wildcard: *.<mydomain>)

> This is a good idea, and I will add it to WhiteListManager and
> IsInWhiteList. But what if a user naively replies to a spam message
> asking to be removed? The spammer address would immediately go into
> the whitelist for many other users. Such an option should be used
> cautiously.


Yes, I see what you mean. In my case these wildcarded whitelists will
not be maintained by the users themselves, but will come directly from
the CRM database.
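Here is a rough sketch of the wildcard idea. It assumes that each entry pairs a recipient pattern with a sender pattern, and that a local part of `*` stands for any user at that domain. The `*@domain` notation is an assumed rendering of the `*.<domain>` idea above. The class `WildcardWhitelist` and its methods are hypothetical and are not the WhiteListManager or IsInWhiteList code. A single `*@mydomain.com` recipient entry affects every local user. For that reason the sketch only adds entries through `add`, which would be called from the CRM import rather than triggered by user replies, following Vincenzo's caution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the James WhiteListManager/IsInWhiteList code.
// An entry pairs a recipient pattern with a sender pattern; a local part
// of "*" matches any user at that domain, e.g. "*@mydomain.com".
final class WildcardWhitelist {
    private static final class Entry {
        final String recipient;
        final String sender;

        Entry(String recipient, String sender) {
            this.recipient = recipient.toLowerCase(Locale.ROOT);
            this.sender = sender.toLowerCase(Locale.ROOT);
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Intended to be fed from the CRM import, not from user activity.
    void add(String recipientPattern, String senderPattern) {
        entries.add(new Entry(recipientPattern, senderPattern));
    }

    // "*@host" matches any user at exactly that host; otherwise exact match.
    // The pattern is expected to be lower-cased already.
    static boolean matches(String pattern, String address) {
        String addr = address.toLowerCase(Locale.ROOT);
        if (pattern.startsWith("*@")) {
            int at = addr.lastIndexOf('@');
            return at >= 0 && addr.substring(at + 1).equals(pattern.substring(2));
        }
        return addr.equals(pattern);
    }

    boolean isWhitelisted(String recipient, String sender) {
        for (Entry e : entries) {
            if (matches(e.recipient, recipient) && matches(e.sender, sender)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WildcardWhitelist wl = new WildcardWhitelist();
        // One entry covers every local user and every sender at the customer's domain.
        wl.add("*@mydomain.com", "*@customer.com");
        System.out.println(wl.isWhitelisted("steen@mydomain.com", "anyone@customer.com")); // true
        System.out.println(wl.isWhitelisted("steen@mydomain.com", "spam@other.com"));      // false
    }
}
```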

>> * Spam manager: Is it possible to automatically add a whitelisted
>> mail directly to the corpus? This way my users wouldn't have to send
>> non-spam mails to the spam manager.

> Again, you can do that just by playing around with the config file,
> putting appropriate "not spam" Bayesian analysis feeder mailet
> entries in appropriate places with appropriate matchers.

> But beware: while JDBCBayesianAnalysis is quite fast, the
> JDBCBayesianAnalysisFeeder mailet does a lot of work
> (database activity) and takes several seconds or even
> minutes to update the statistics in the database for
> each single message fed. This is OK as long as the
> number of spam/not-spam message feedings is low compared
> to the number of messages analysed, but feeding every
> whitelisted message would kill everything. Regarding this
> performance problem, I'm thinking of using a serializable
> object (with some kind of "asynchronous intermittent
> lazy writer" and appropriate handling of write
> failures) instead of a database for storing the corpus.
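The "asynchronous intermittent lazy writer" could look roughly like the sketch below. Feeding only updates in-memory token counts. A background task periodically serializes a snapshot to a temporary file and renames it over the previous one. A failed write therefore leaves the last good copy in place and is retried on the next run. Everything here is an assumption for illustration and not an existing James component: the class `LazyCorpus`, the flat token-count maps and the flush period.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an in-memory corpus with a periodic lazy writer;
// not an existing James class.
final class LazyCorpus {
    private final HashMap<String, Integer> spamCounts = new HashMap<>();
    private final HashMap<String, Integer> hamCounts = new HashMap<>();
    private final Object ioLock = new Object(); // serializes disk writes
    private final Path file;
    private final ScheduledExecutorService writer;
    private boolean dirty; // guarded by this

    LazyCorpus(Path file, long periodSeconds) {
        this.file = file;
        this.writer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "corpus-writer");
            t.setDaemon(true); // close() performs the final flush
            return t;
        });
        writer.scheduleWithFixedDelay(this::flushQuietly, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // Feeding is a cheap in-memory update; no database work per message.
    synchronized void feed(Iterable<String> tokens, boolean spam) {
        Map<String, Integer> target = spam ? spamCounts : hamCounts;
        for (String token : tokens) {
            target.merge(token, 1, Integer::sum);
        }
        dirty = true;
    }

    synchronized int spamCount(String token) {
        return spamCounts.getOrDefault(token, 0);
    }

    // Writes a snapshot to a temp file and renames it over the old corpus,
    // so a failed write never replaces the last good copy.
    void flush() throws IOException {
        synchronized (ioLock) {
            HashMap<String, Integer> spamCopy;
            HashMap<String, Integer> hamCopy;
            synchronized (this) {
                if (!dirty) {
                    return;
                }
                spamCopy = new HashMap<>(spamCounts);
                hamCopy = new HashMap<>(hamCounts);
                dirty = false;
            }
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            try {
                try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                    out.writeObject(spamCopy);
                    out.writeObject(hamCopy);
                }
                Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                synchronized (this) {
                    dirty = true; // keep the changes pending; the next run retries
                }
                throw e;
            }
        }
    }

    private void flushQuietly() {
        try {
            flush();
        } catch (IOException e) {
            System.err.println("corpus write failed, will retry: " + e);
        }
    }

    // Stops the background writer, then writes any pending changes.
    void close() throws IOException {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush();
    }

    @SuppressWarnings("unchecked")
    static Map<String, Integer> readSpamCounts(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, Integer>) in.readObject(); // the spam map is written first
        }
    }
}
```

The trade-off is durability. Counts fed since the last flush are lost if the JVM dies. That seems tolerable for a statistical corpus, but it is weaker than what the database offers.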

I see the problem and I'll drop my idea.

>> * Spam manager: Are mails identified as spam automatically added to
>> the corpus? Again, this would save my users from sending spam mails
>> to the spam manager. (You know, my users are lazy.)

> Same as above, but moreover IMO it would be dangerous: the effect
> would be to amplify "false positives" through a feedback mechanism,
> quickly ruining the corpus. Only "true positives", determined as
> such in some other way, should be fed as spam, and you can do that
> simply by playing around with the config.xml file.
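A toy model makes the feedback effect concrete. Here a token's spam ratio is simply spam count / (spam count + ham count), which is far simpler than a real Bayesian filter. The names `FeedbackDemo`, `LabelSource` and `feedIfAllowed` are hypothetical. Each false positive fed back as spam pushes the ratio of its tokens further toward 1, which makes the next false positive more likely. A gate that rejects labels coming from the classifier itself keeps only verdicts established some other way.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not James's Bayesian code: a token's spam ratio is
// spam / (spam + ham), enough to show the feedback effect.
final class FeedbackDemo {
    // Hypothetical: where a spam/not-spam label came from
    enum LabelSource { CLASSIFIER, INDEPENDENT }

    private final Map<String, int[]> counts = new HashMap<>(); // token -> {spam, ham}

    void feed(String token, boolean spam) {
        int[] c = counts.computeIfAbsent(token, k -> new int[2]);
        c[spam ? 0 : 1]++;
    }

    // Only labels established independently of the classifier are learned;
    // the classifier's own verdicts are rejected.
    boolean feedIfAllowed(String token, boolean spam, LabelSource source) {
        if (source == LabelSource.CLASSIFIER) {
            return false;
        }
        feed(token, spam);
        return true;
    }

    double spamRatio(String token) {
        int[] c = counts.getOrDefault(token, new int[2]);
        int total = c[0] + c[1];
        return total == 0 ? 0.5 : (double) c[0] / total;
    }

    public static void main(String[] args) {
        FeedbackDemo corpus = new FeedbackDemo();
        corpus.feed("invoice", true);  // token seen once in spam
        corpus.feed("invoice", false); // and once in legitimate mail: ratio 0.5
        // Suppose legitimate mails with "invoice" are flagged (false positives)
        // and automatically fed back as spam: the ratio climbs toward 1.
        for (int i = 0; i < 3; i++) {
            corpus.feed("invoice", true);
            System.out.println(corpus.spamRatio("invoice"));
        }
        // With the gate, the classifier's own verdicts are not learned.
        System.out.println(corpus.feedIfAllowed("invoice", true, LabelSource.CLASSIFIER)); // false
    }
}
```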

I see what you mean and this idea is dropped too.

> Vincenzo

Steen


