> On Aug 22, 2016, at 8:09 AM, John Hardin <jhar...@impsec.org> wrote:
>
> On Mon, 22 Aug 2016, Antony Stone wrote:
>
>> On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:
>>
>>> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
>>>>> So. What percentage of emails using your algorithm are actually
>>>>> decidable?
>>>>
>>>> Almost 100% if you look at a wide variety of tokens from multiple
>>>> attributes.
>>>
>>> I can't believe that, or I'm missing something. Almost every spam I see
>>> contains words that also appear in ham. Things like "this" or "invoice"
>>> or "regards" or "dear".
>>>
>>> What am I missing?
>>
>> I believe you're missing Marc's definition of "token".
>
> ...and it looks like we're venturing into the "SA Bayes multiple-word token
> support" realm (as a surrogate).
>
Even with multiple tokens combined into one fingerprint, you've changed little. No matter how you bound the token, the assumption that there are no SPAM emails that contain HAM content, and vice versa, is false. Regardless, that is NOT what you claimed before; you seem to be flip-flopping between definitions to suit your argument.

> --
> John Hardin KA7OHZ http://www.impsec.org/~jhardin/
> jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
> USMC Rules of Gunfighting #6: If you can choose what to bring to a
> gunfight, bring a long gun and a friend with a long gun.
> -----------------------------------------------------------------------
> 2 days until the 1937th anniversary of the destruction of Pompeii