> On Aug 22, 2016, at 8:09 AM, John Hardin <jhar...@impsec.org> wrote:
> 
> On Mon, 22 Aug 2016, Antony Stone wrote:
> 
>> On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:
>> 
>>> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
>>>>> So.  What percentage of emails using your algorithm are actually
>>>>> decidable?
>>>> 
>>>> Almost 100% if you look at a wide variety of tokens from multiple
>>>> attributes.
>>> 
>>> I can't believe that, or I'm missing something.  Almost every spam I see
>>> contains words that also appear in ham.  Things like "this" or "invoice"
>>> or "regards" or "dear".
>>> 
>>> What am I missing?
>> 
>> I believe you're missing Marc's definition of "token".
> 
> ...and it looks like we're venturing into the "SA Bayes multiple-word token 
> support" realm (as a surrogate).
> 

Even with multiple tokens combined into one fingerprint, you've changed 
little. No matter how you bound the token, the assumption that no SPAM 
email contains HAM content, and vice versa, is false. 
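
To make that concrete, here is a toy sketch. The two messages and the 
helper names are made up for illustration; this is neither Marc's 
algorithm nor SA's Bayes tokenizer. It treats every run of n consecutive 
words as a single token and lists the tokens that show up in both a spam 
and a ham message:

```python
def ngrams(text, n):
    """Return the set of n-word tokens in text (lowercased, whitespace split)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_tokens(spam, ham, n):
    """Tokens of length n that occur in both messages."""
    return ngrams(spam, n) & ngrams(ham, n)


# Hypothetical messages, invented for this example only.
spam = "Dear customer, please find the attached invoice and pay now"
ham = "Dear Bob, please find the attached invoice for last month"

for n in (1, 2, 3):
    print(n, sorted(shared_tokens(spam, ham, n)))
```

Widening the token shrinks the overlap in this example, but it does not 
remove it: "the attached invoice" is still present on both sides at n = 3.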

Regardless, that is NOT what you claimed before; you seem to be flip-flopping 
between definitions to suit your argument.


> -- 
> John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
> jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
>  USMC Rules of Gunfighting #6: If you can choose what to bring to a
>  gunfight, bring a long gun and a friend with a long gun.
> -----------------------------------------------------------------------
> 2 days until the 1937th anniversary of the destruction of Pompeii
