Re: The trouble with Bayes

Paul Boven 6 May 2005 16:31:30 -0000

Hi Kevin, everyone,

Kevin Peuhkurinen wrote:

Paul Boven wrote:

but my goal is to find a way of doing this that is independent of the rest of the mail-system, and can then become an integral part of SA.
Any suggestions on how to do this? One of SA's strengths is that it is designed to be a module that can be plugged into a larger mail flow environment rather than acting as a monolithic application. I think that any attempt to create a manual training method that suits every environment is doomed to failure.

Well, the reason I bring this up is the hope that we can come up with such a thing. I'm not convinced at this moment that it is already doomed to failure: if I were, I would not have started this whole discussion.

I currently have a bit of Perl that strips message/rfc822 attachments and feeds them to the learner, which works with a number of clients and servers. It's run from the alias file, and has the advantage that end users don't need an account on the filter machine. The disadvantage is that it is susceptible to any changes those clients and servers inflict on the mail.
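For readers who want to picture the attachment-stripping step, here is a minimal sketch in Python rather than Perl. It is not the script mentioned above, which is not shown in this thread. The function name `extract_attached_messages` is made up for illustration, and it assumes the user forwarded the mail "as attachment" so that the original arrives as a message/rfc822 part. Handing the result to the learner is deliberately left out.

```python
from email import message_from_bytes, policy
from typing import List


def extract_attached_messages(raw: bytes) -> List[bytes]:
    """Return every message/rfc822 attachment found in a forwarded mail.

    Hypothetical helper: each returned item is the attached message
    serialised back to bytes, ready to be handed to a learner.
    """
    wrapper = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message itself.
            inner = part.get_payload(0)
            found.append(inner.as_bytes())
    return found
```

Note that the attached message is re-serialised by the library here, so what reaches the learner may not be byte-for-byte what the filter originally scanned, which is the same weakness described above.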

You've just pointed out yourself why it is next to impossible for SA to associate a unique identification code with a specific email such that it will always be able to recognize that email in the future. SA has neither control over nor knowledge of what happens to the email after it has scanned it, and the email can be altered in numerous ways. So again, do you have any better suggestions? As I said in my original email, any attempt to create a better means of identifying emails would need to be rigorously tested before it could be shown to actually be better than what SA already has.

Yes, you are right, this is not an easy puzzle. But it seems to me that, given the way mail clients and servers behave, looking at the content of the mail could be more robust than the current method. And we don't need to be able to uniquely identify every email forever; we only need to make sure that any (auto)learning can be undone within a reasonable time.
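To make the idea concrete, here is a rough Python sketch of content-based identification with a limited memory. It is only an illustration of the proposal, not how SA identifies messages today. The names `content_fingerprint` and `LearnJournal`, the choice of SHA-1, and the whitespace and case normalisation are all assumptions, and real mail would need sturdier normalisation (HTML parts, odd charsets, rewritten bodies).

```python
import hashlib
import re
import time
from email import message_from_bytes, policy


def content_fingerprint(raw: bytes) -> str:
    """Hash the text parts of a mail, ignoring headers, case and whitespace."""
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            chunks.append(part.get_content())
    normalized = re.sub(r"\s+", " ", " ".join(chunks)).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class LearnJournal:
    """Remember what was learned, but only for max_age_seconds (hypothetical)."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self._entries = {}  # fingerprint -> (label, time it was learned)

    def record(self, raw: bytes, label: str, now: float = None) -> str:
        now = time.time() if now is None else now
        fingerprint = content_fingerprint(raw)
        self._entries[fingerprint] = (label, now)
        return fingerprint

    def lookup(self, raw: bytes, now: float = None):
        """Return the label this mail was learned under, or None if unknown or expired."""
        now = time.time() if now is None else now
        cutoff = now - self.max_age
        # Drop entries older than the window before looking anything up.
        self._entries = {
            fp: entry for fp, entry in self._entries.items() if entry[1] >= cutoff
        }
        entry = self._entries.get(content_fingerprint(raw))
        return entry[0] if entry else None
```

In this sketch, a mail that comes back with extra Received headers, a rewritten Subject, or different line endings still maps to the same fingerprint, so an earlier (auto)learning decision can be found and reversed as long as it is still inside the window.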

It is a fine thing to demonstrate weaknesses in a product. It's a much better thing to suggest ways to improve it. Your suggestion to weight manually learned emails more heavily than autolearned ones is a good start.

Oh, please don't get me wrong. I'm not just complaining about what's been bugging me for a while; I'd like to help get this fixed. Discussing things seems like a good first step in that direction, with some of the best spam-fighters hanging out in this forum.

Regards, Paul Boven.
