Rich Puhek <[EMAIL PROTECTED]> writes:

> You may be on to something there:
>
> body T_CONFIDENTIALITY1 /Confidentiality\sassured/i
> body T_CONFIDENTIALITY2 /\bConfidentiality\b/i
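
To make the difference between the two quoted rules concrete, here is a minimal sketch in Python. It is not SpamAssassin itself: Python's `re` module stands in for the Perl regex engine, the helper `matching_rules` is hypothetical, and the sample bodies are invented for illustration.

```python
import re

# Python's re module stands in for SpamAssassin's Perl regexes here;
# \s, \b and the case-insensitive flag behave the same for these patterns.
T_CONFIDENTIALITY1 = re.compile(r"Confidentiality\sassured", re.IGNORECASE)
T_CONFIDENTIALITY2 = re.compile(r"\bConfidentiality\b", re.IGNORECASE)


def matching_rules(body):
    """Return the names of the quoted rules that match a message body."""
    rules = {
        "T_CONFIDENTIALITY1": T_CONFIDENTIALITY1,
        "T_CONFIDENTIALITY2": T_CONFIDENTIALITY2,
    }
    return [name for name, pattern in rules.items() if pattern.search(body)]


# Invented sample bodies, not taken from any real corpus.
samples = [
    "Confidentiality assured, reply today",
    "See our confidentiality policy",
    "This message is confidential",
]
for text in samples:
    print(text, "->", matching_rules(text))
```

The second rule fires on the bare word in any letter case, while the first also requires "assured" to follow it. Neither matches related forms such as "confidential".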

Seems like Bayes territory to me.

  0.997     15      0  1077783028  Confidence
  0.995      9      0  1077726419  confidant
  0.989      4      0  1077930310  confide
  0.989      4      0  1077919123  confidante
  0.978      2      0  1077245378  CONFIDENCE
  0.970     51      5  1077917889  Confidentiality
  0.958      1      0  1077931164  self-confidence
  0.958      1      0  1077931160  confident!
  0.936      5      1  1077861218  Confidential
  0.928    273     70  1077937447  confidentiality
  0.889    309    127  1077939518  confidence
  0.853    120     68  1077933499  confident
  0.752     12     13  1077679452  CONFIDENTIAL
  0.686    378    572  1077935350  confidential
  0.400      1      5  1077704032  confidently
  0.026      0      2  1077731615  CONFIDENTIALITY
  0.007      0      8  1077824775  confidentially

The numbers are good for some, but not quite what you'd expect for
all. It does remind me that we should look into using 2+ token
sequences (which is on the developer wishlist).

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
