Re: [Assp-test] soft hyphen fooling Bayesian analysis

2022-09-07 Thread K Post
> engines will also have learned obscured words (word combinations). > Thomas > From: "K Post" > To: "ASSP development mailing list" <assp-test@lists.sourceforge.net> > Date: 06.09.2022 21:31 > Subject:

Re: [Assp-test] soft hyphen fooling Bayesian analysis

2022-09-07 Thread Thomas Eckardt
of time, both engines will also have learned obscured words (word combinations). Thomas From: "K Post" To: "ASSP development mailing list" Date: 06.09.2022 21:31 Subject: Re: [Assp-test] soft hyphen fooling Bayesian analysis Eager to see what you come up w

Re: [Assp-test] soft hyphen fooling Bayesian analysis

2022-09-06 Thread K Post
to find, for example: <<<\P{Cyrillic}\p{Cyrillic}+\P{Cyrillic}>>> finds a sequence where Cyrillic letters (a p b) are used inside words, a trick commonly used by spammers. > Thomas > From: "K Post"

Re: [Assp-test] soft hyphen fooling Bayesian analysis

2022-09-06 Thread Thomas Eckardt
}>>> finds a sequence where Cyrillic letters (a p b) are used inside words, a trick commonly used by spammers. Thomas From: "K Post" To: "ASSP development mailing list" Date: 06.09.2022 16:16 Subject: [Assp-test] soft hyphen fooling Bayesian analysis Is there a
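Thomas's pattern targets Cyrillic letters embedded among non-Cyrillic characters, a homoglyph trick (Cyrillic "а", U+0430, looks identical to Latin "a"). Below is a minimal Python sketch of the same idea. It is not ASSP code (ASSP is written in Perl), and because Python's built-in `re` module has no `\p{Cyrillic}`, it approximates the script with the Cyrillic and Cyrillic Supplement blocks (U+0400 to U+052F), which is an assumption. It also uses a slightly different test: a word is flagged only when it contains both Latin and Cyrillic letters, so ordinary all-Cyrillic words are not reported. The function name `mixed_script_words` is hypothetical.

```python
import re

# Assumption: approximate the Cyrillic script with the Cyrillic (U+0400-U+04FF)
# and Cyrillic Supplement (U+0500-U+052F) blocks, since Python's `re`
# does not support \p{Cyrillic}.
CYRILLIC = re.compile(r"[\u0400-\u052F]")
LATIN = re.compile(r"[A-Za-z]")
# In Python 3, \w on str patterns is Unicode-aware, so it matches Cyrillic letters too.
WORD = re.compile(r"\w+")


def mixed_script_words(text: str) -> list[str]:
    """Return the words that contain both Latin and Cyrillic letters.

    Example of a spoofed word: 'Paypal' written with Cyrillic U+0430
    in place of the Latin 'a'.
    """
    return [
        word
        for word in WORD.findall(text)
        if CYRILLIC.search(word) and LATIN.search(word)
    ]


if __name__ == "__main__":
    sample = "Verify your P\u0430yp\u0430l account"
    print(mixed_script_words(sample))
```

With the sample above, only the spoofed "Paypal" token is returned; "Verify", "your" and "account" are pure Latin and are skipped. Checking whole words rather than single character transitions keeps legitimate Russian or Ukrainian text from being flagged.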

[Assp-test] soft hyphen fooling Bayesian analysis

2022-09-06 Thread K Post
Is there a way to improve how ASSP parses certain special, non-printing characters? Spam emails whose bodies are heavily obfuscated with "soft hyphens" keep slipping through. They all seem to have multipart bodies, the first being an iso-8859-1 text part with *=AD*
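The soft hyphen is U+00AD; in a quoted-printable ISO-8859-1 part it is encoded as `=AD`. Because it is normally invisible when rendered, "Cl=ADaim" displays as "Claim", but a tokenizer may see fragments or a token the Bayesian database has never learned. One way to neutralize this is to decode the part and strip invisible formatting characters before tokenizing. A minimal Python sketch, assuming the text part has already been extracted from the MIME structure; the helper names and the exact set of characters removed are illustrative choices, not ASSP's actual behavior.

```python
import quopri
import re

# Invisible characters often used to split words. The set is an assumption:
# U+00AD soft hyphen, U+200B zero-width space, U+2060 word joiner,
# U+FEFF zero-width no-break space (BOM).
INVISIBLE = re.compile(r"[\u00ad\u200b\u2060\ufeff]")


def decode_qp_part(raw: bytes, charset: str = "iso-8859-1") -> str:
    """Decode a quoted-printable body part (e.g. '=AD') into text."""
    return quopri.decodestring(raw).decode(charset, errors="replace")


def normalize_for_bayes(text: str) -> str:
    """Remove invisible characters so obfuscated words tokenize as whole words."""
    return INVISIBLE.sub("", text)


if __name__ == "__main__":
    raw = b"Cl=ADaim y=ADour pr=ADize"
    decoded = decode_qp_part(raw)           # still contains U+00AD
    print(normalize_for_bayes(decoded))     # words are whole again
```

Running it on `b"Cl=ADaim y=ADour pr=ADize"` yields "Claim your prize". Stripping happens after decoding because the soft hyphen only becomes a single recognizable character once `=AD` has been turned back into byte 0xAD and mapped through the charset.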