On Thu, 15 Nov 2018, Amir Caspi wrote:

> On Nov 15, 2018, at 2:36 PM, John Hardin <jhar...@impsec.org> wrote:
>
>> That and its resistance to FP avoidance.
>
> Despite the generality, I don't see a significant FP risk on the general Unicode version.
> I don't see ANY legitimate reason why an email would hard-encode long sequences of
> human-readable text, in any language or character set, using HTML entities like this.
> Legitimate emails can be sent with a character encoding intended for the target language,
> and then the content doesn't need to be entity-encoded; it can just be included
> "properly" in the email.
>
> My recollection is there were few to no FPs in the corpora test, right? Or am
> I misremembering?

Fairly low; I asked the corpora owner for a review and they were all apparently legit.

I'll re-enable the base rules so we can watch their performance. I don't think a subrule that isn't used gets published unless it's pushed with a tflag...
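For illustration only, here is a minimal Python sketch of the pattern being discussed, assuming "long sequences of HTML entities" means back-to-back numeric character references (decimal like &#1087; or hex like &#x43f;). The threshold MIN_RUN and the function find_entity_runs are hypothetical names chosen for this sketch; this is not the actual SpamAssassin rule or its regex.

```python
import html
import re

# Hypothetical threshold: how many consecutive entities count as a "long sequence".
MIN_RUN = 10

# One numeric character reference: decimal (&#1087;) or hex (&#x43f;).
ENTITY = r"&#(?:[0-9]{1,7}|[xX][0-9a-fA-F]{1,6});"

# MIN_RUN or more entities in a row, optionally separated by whitespace.
RUN_RE = re.compile(r"(?:%s\s*){%d,}" % (ENTITY, MIN_RUN))


def find_entity_runs(body: str) -> list[str]:
    """Return each long run of numeric entities, decoded back to plain text."""
    return [html.unescape(m.group(0)).strip() for m in RUN_RE.finditer(body)]


if __name__ == "__main__":
    # Encode ordinary Cyrillic text entirely as numeric entities, as in the spam sample.
    encoded = "".join("&#%d;" % ord(c) for c in "Привет, мир")
    print(find_entity_runs("Hi " + encoded + " bye"))  # ['Привет, мир']
    print(find_entity_runs("&#72;&#105; there"))      # [] (too short to count)
```

The decode step reflects the argument above: a long run that decodes to ordinary readable text could have been sent directly in a suitable charset, so entity-encoding it serves no obvious legitimate purpose.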

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  One unexpected benefit of time passing more quickly as you get older
  is the perceived increase in the frequency of paychecks.
-----------------------------------------------------------------------
 595 days since the first commercial re-flight of an orbital booster (SpaceX)
