Marian> When this is not the case we could do the parsing locally, hash
Marian> the tokens, build the feature vector and send it to the remote
Marian> classifier. This way the local application would not disclose
Marian> sensitive information.
I'm not sure that will work. The SpamBayes classifier works off a token
stream, which does nothing to "hide" the tokens. Sure, it creates a set() of
the tokens in the message, but if my phone number, email address and credit
card information are in the message they will be in the token stream as
well.
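
To make that concrete, here is a toy sketch. It assumes a whitespace-only
stand-in tokenizer (toy_tokenize is hypothetical, not the SpamBayes
tokenizer), and hash_token is my guess at the client-side hashing step
you describe, using plain unsalted SHA-256. It shows that the token set
keeps sensitive strings verbatim, and that with an unsalted hash anyone
who can guess a token can still test whether it was in the message.

```python
import hashlib


def toy_tokenize(text):
    # Stand-in tokenizer (assumption): split on whitespace, keep unique tokens.
    return set(text.split())


def hash_token(token):
    # Hypothetical client-side step: replace a token with its SHA-256 hex digest.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


message = "call 555-0100 about card 4111111111111111"
tokens = toy_tokenize(message)
hashed = {hash_token(t) for t in tokens}

# The plain token set still carries the sensitive strings as-is.
print("555-0100" in tokens)

# Hashing removes the literal text from what is sent...
print("555-0100" in hashed)

# ...but hashing a guessed value and checking membership still reveals it.
print(hash_token("555-0100") in hashed)
```

So hashing would hide the literal text, but short or guessable tokens such
as phone numbers would probably need something stronger (a keyed hash, say)
before I'd call them protected.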
Marian> Another important problem is related to using a single
Marian> classifier for several applications (possibly with totally
Marian> different content). IMHO the spam for an application might be
Marian> totally different than the spam of another, or to be more exact:
Marian> the ham/spam features might differ. In this case the result of
Marian> the classification might not be relevant. "My SPAM is not your
Marian> SPAM" :)
Sure. Each wiki on a single physical server might well want its own spam
classifier. Just set up the configuration bits properly (port numbers
mostly) and fire up multiple classifiers.
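
A rough sketch of the bookkeeping, assuming one classifier process per
wiki, each with its own port and its own training database. The function
name, base port, directory and wiki names are all made up for
illustration; they are not SpamBayes options.

```python
def plan_classifiers(wiki_names, base_port=8880, db_dir="/var/lib/spambayes"):
    """Give each wiki its own port and database file (all values hypothetical)."""
    plan = {}
    # Sort the names so the port assignment is stable from run to run.
    for offset, name in enumerate(sorted(wiki_names)):
        plan[name] = {
            "port": base_port + offset,
            "db": f"{db_dir}/{name}.db",
        }
    return plan


plan = plan_classifiers(["recipes", "devdocs"])
for name, cfg in plan.items():
    print(name, cfg["port"], cfg["db"])
```

Each wiki then trains and scores against its own database, so one wiki's
idea of spam never leaks into another's.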
Skip
_______________________________________________
spambayes-dev mailing list
http://mail.python.org/mailman/listinfo/spambayes-dev