Marian> When this is not the case we could do the parsing locally, hash
Marian> the tokens, build the feature vector and send it to the remote
Marian> classifier. This way the local application would not disclose
Marian> sensitive information.
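
For concreteness, the proposal quoted above might look something like the sketch below. It is only an illustration: str.split() stands in for a real tokenizer (it is not the SpamBayes tokenizer), SHA-256 is an arbitrary choice of hash, and hash_tokens and its salt parameter are hypothetical names, not part of any existing API.

```python
import hashlib


def hash_tokens(text, salt=b""):
    """Return the set of SHA-256 hex digests of the tokens in text.

    Hypothetical helper: whitespace splitting is a stand-in for a real
    tokenizer, and the optional salt is an assumption of this sketch.
    """
    # Deduplicate first, so each distinct token contributes one feature.
    tokens = set(text.lower().split())
    return {
        hashlib.sha256(salt + tok.encode("utf-8")).hexdigest()
        for tok in tokens
    }


# The local application would send `features`, not the raw text,
# to the remote classifier.
message = "Call 555-0100 now to claim your prize prize"
features = hash_tokens(message)
```

The digests are deterministic, so the same token always maps to the same feature, which is what a remote classifier would need in order to train on them.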
I'm not sure that will work. The SpamBayes classifier works off a token stream, which does nothing to "hide" the tokens. Sure, it creates a set() of the tokens in the message, but if my phone number, email address and credit card information are in the message, they will be in the token stream as well.

Marian> Another important problem is related to using a single
Marian> classifier for several applications (possibly with totally
Marian> different content). IMHO the spam for an application might be
Marian> totally different than the spam of another, or to be more exact:
Marian> the ham/spam features might differ. In this case the result of
Marian> the classification might not be relevant. "My SPAM is not your
Marian> SPAM" :)

Sure. Each wiki on a single physical server might well want its own spam classifier. Just set up the configuration bits properly (port numbers mostly) and fire up multiple classifiers.

Skip

_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev