    Marian> When this is not the case we could do the parsing locally, hash
    Marian> the tokens, build the feature vector and send it to the remote
    Marian> classifier. This way the local application would not disclose
    Marian> sensitive information.

I'm not sure that will work.  The SpamBayes classifier works off a token
stream which does nothing to "hide" the tokens.  Sure, it creates a set() of
the tokens in the message, but if my phone number, email address and credit
card information are in the message they will be in the token stream as
well.
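
To make that concrete, here is a rough sketch, not SpamBayes code.  The
tokenize() function is a hypothetical whitespace splitter standing in for
the real tokenizer, and hash_token() is an assumed helper showing what the
proposed local hashing step might look like with the standard hashlib
module.

```python
import hashlib


def tokenize(text):
    # Hypothetical stand-in for a real tokenizer: lowercase whitespace split,
    # collected into a set of distinct tokens.
    return set(text.lower().split())


def hash_token(token, salt=b""):
    # Assumed helper: one-way hash of a single token, computed on the client
    # before anything is sent to a remote classifier.
    return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()


message = "Call me at 555-0100 or mail skip@example.com"

tokens = tokenize(message)
hashed = {hash_token(t) for t in tokens}

# The plain token set still carries the sensitive strings verbatim.
print("555-0100" in tokens)
# The hashed set does not contain them verbatim.
print("555-0100" in hashed)
```

Note that the hashing is an extra step done before the classifier sees
anything; the token set on its own hides nothing.  Short, guessable tokens
such as phone numbers can also be recovered from an unsalted hash by brute
force, so a secret salt kept on the client would be needed at a minimum.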

    Marian> Another important problem is related to using a single
    Marian> classifier for several applications (possibly with totally
    Marian> different content). IMHO the spam for one application might be
    Marian> totally different than the spam of another, or to be more exact:
    Marian> the ham/spam features might differ. In this case the result of
    Marian> the classification might not be relevant. "My SPAM is not your
    Marian> SPAM" :)

Sure.  Each wiki on a single physical server might well want its own spam
classifier.  Just set up the configuration bits properly (port numbers
mostly) and fire up multiple classifiers.
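
As an illustration of the bookkeeping only, a sketch like the following
could hand each wiki its own port.  The assign_ports() helper, the wiki
names and the base port are all made up for the example; none of this is
SpamBayes configuration syntax.

```python
def assign_ports(wikis, base_port=8880):
    # Hypothetical helper: give every wiki a distinct port, counting up
    # from an arbitrary base, so each can talk to its own classifier.
    return {name: base_port + i for i, name in enumerate(sorted(wikis))}


ports = assign_ports(["userwiki", "devwiki"])

for name, port in ports.items():
    # One classifier process per wiki would be started on this port.
    print(name, port)
```

Each classifier would then be trained only on the ham and spam from its own
wiki.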

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev@python.org
http://mail.python.org/mailman/listinfo/spambayes-dev
