Re: A better way to do bayesian filtering

Sidney Markowitz 7 Mar 2004 21:29:44 -0000

Marc Perkel wrote:

So spam has to catch your eye in the subject line

The idea of concentrating on where the spam indicators are is interesting, but this sentence caught my eye. I see a lot of spam with misleadingly innocuous subjects, such as "Hi" or "Re: Hi", that are designed to look like ordinary mail until they are opened.

To evaluate your proposal I would want to see some numbers. How much time and space would be saved, given that we still have to parse the message for tokens and access the Bayes database for the tokens that we do deal with? Given that we only look at the top 15 (or however many it is) tokens in a message, does that already focus on the "hot" items automatically? And if where a token appears in the message makes that much difference, do we get the benefits by encoding some indicator of the location with the token, e.g., storing tokens as location/value instead of just value?
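
To make the location/value idea concrete, here is a rough sketch in Python. It is only an illustration, not how any existing filter does it: the function names, the "subject"/"body" location labels, the tokenizing regex, and the probability table are all made up for the example, and n=15 is just the figure guessed at above.

```python
import re

# Hypothetical tokenizer pattern: runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$'!-]+")


def tokenize(subject, body):
    """Return location-tagged tokens such as 'subject/hi' and 'body/hi'."""
    tokens = []
    for location, text in (("subject", subject), ("body", body)):
        for word in TOKEN_RE.findall(text.lower()):
            tokens.append(f"{location}/{word}")
    return tokens


def most_significant(tokens, spam_prob, n=15, unknown=0.5):
    """Pick the n distinct tokens whose spam probability is farthest from 0.5.

    spam_prob is an assumed dict mapping token -> probability of spam;
    tokens missing from it get the neutral value `unknown`.
    """
    unique = sorted(set(tokens))  # sorted so ties come out in a stable order
    scored = [(tok, spam_prob.get(tok, unknown)) for tok in unique]
    scored.sort(key=lambda pair: abs(pair[1] - 0.5), reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Invented probabilities: "hi" is harmless in a subject, damning in a body.
    probs = {"subject/hi": 0.4, "body/hi": 0.99, "body/buy": 0.9}
    toks = tokenize("Hi", "hi buy now")
    for tok, p in most_significant(toks, probs, n=2):
        print(tok, p)
```

With tokens stored this way the same word is learned separately per location, so the usual top-N selection would pick up any location effect without a special case for the subject line. The cost is a larger database, which is one of the numbers I would want to see.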

These are what I consider open questions raised by your suggestion, not a criticism of it.

-- sidney
