Rich Puhek <[EMAIL PROTECTED]> writes:

> You may be on to something there:
> 
> body  T_CONFIDENTIALITY1      /Confidentiality\sassured/i
> body  T_CONFIDENTIALITY2      /\bConfidentiality\b/i

Seems like Bayes territory to me.

0.997         15          0 1077783028  Confidence
0.995          9          0 1077726419  confidant
0.989          4          0 1077930310  confide
0.989          4          0 1077919123  confidante
0.978          2          0 1077245378  CONFIDENCE
0.970         51          5 1077917889  Confidentiality
0.958          1          0 1077931164  self-confidence
0.958          1          0 1077931160  confident!
0.936          5          1 1077861218  Confidential
0.928        273         70 1077937447  confidentiality
0.889        309        127 1077939518  confidence
0.853        120         68 1077933499  confident
0.752         12         13 1077679452  CONFIDENTIAL
0.686        378        572 1077935350  confidential
0.400          1          5 1077704032  confidently
0.026          0          2 1077731615  CONFIDENTIALITY
0.007          0          8 1077824775  confidentially

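The probabilities are good for some tokens, but not quite what you'd
expect for all: CONFIDENTIALITY and confidentially come out strongly
hammy, each on only a handful of ham hits and no spam hits.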
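
A rough sketch of why a case-insensitive rule and Bayes disagree here.
It takes four rows from the dump above and checks which tokens the
T_CONFIDENTIALITY2 pattern would match. The column meanings
(probability, spam count, ham count, token) assume the usual
`sa-learn --dump` layout, and the names ROWS, RULE and
matched_probabilities are made up for this illustration.

```python
import re

# Rows copied from the dump above: (probability, spam count, ham count, token).
# Column meanings are an assumption based on the usual `sa-learn --dump` layout.
ROWS = [
    (0.970, 51, 5, "Confidentiality"),
    (0.928, 273, 70, "confidentiality"),
    (0.026, 0, 2, "CONFIDENTIALITY"),
    (0.686, 378, 572, "confidential"),
]

# Same pattern as the quoted T_CONFIDENTIALITY2 rule, with the /i flag.
RULE = re.compile(r"\bConfidentiality\b", re.IGNORECASE)


def matched_probabilities(rows):
    """Map each token the rule matches to its Bayes probability."""
    return {token: prob for prob, _spam, _ham, token in rows if RULE.search(token)}


hits = matched_probabilities(ROWS)
# Lowest and highest probability among the tokens that one rule lumps together.
spread = (min(hits.values()), max(hits.values()))
print(sorted(hits), spread)
```

The single rule fires on three case variants whose learned
probabilities differ widely, which a fixed rule score cannot express.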

It does remind me that we should look into using 2+ token sequences
(which is on the developer wishlist).
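
For illustration only, a minimal sketch of what 2+ token sequences
could look like. This is not SpamAssassin's tokenizer: the function
name token_sequences, the max_len parameter and the plain whitespace
split are all assumptions made for the example.

```python
def token_sequences(text, max_len=2):
    """Return single tokens plus every run of up to max_len adjacent tokens.

    Hypothetical helper: splits on whitespace only, with no header
    handling or normalization of any kind.
    """
    words = text.split()
    sequences = []
    for n in range(1, max_len + 1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            sequences.append(" ".join(words[i:i + n]))
    return sequences


tokens = token_sequences("Confidentiality assured for all")
print(tokens)
```

With sequences like these, a phrase such as "Confidentiality assured"
would get its own learned probability, separate from the
probabilities of its individual words.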

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
