Re: charset=utf-16 tricks out SA

Reindl Harald Fri, 09 Oct 2015 05:49:16 -0700


On 09.10.2015 at 14:22, Mark Martinec wrote:

Reindl Harald wrote:

no custom body rules hit like they do for ISO/UTF8 :-(

What is your normalize_charsets setting?


enabled, that's what I meant by "like they do for ISO/UTF8", and
adding "dear potencial partner" to CUST_BODY_17 did not change the
score

see attached sample and rule below

body      CUST_BODY_17    /.*(1st page ranking of google|dear potencial partner).*/i
score     CUST_BODY_17    1.0
describe  CUST_BODY_17    Contains Low


The problem with this message is that it declares its encoding
as UTF-16, i.e. without explicitly stating endianness as
UTF-16BE or UTF-16LE would, and there is no BOM at the
beginning of each textual part, so endianness cannot be
determined. RFC 2781 says that big-endian encoding
should be assumed in the absence of a BOM.
See https://en.wikipedia.org/wiki/UTF-16
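
The rule described above can be sketched in a few lines of Python. This is only an illustration of the BOM check plus the big-endian fallback, not SpamAssassin's actual code; the function name pick_utf16_codec is made up for the example.

```python
import codecs

def pick_utf16_codec(raw: bytes) -> str:
    """Choose a codec for text labelled plain 'UTF-16' (hypothetical helper)."""
    # A leading BOM states the byte order explicitly.
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    # No BOM: fall back to big-endian, as the RFC says should be assumed.
    return "utf-16-be"

# Little-endian text without a BOM is therefore treated as big-endian.
sample = "dear potencial partner".encode("utf-16-le")
print(pick_utf16_codec(sample))
```

With a sample like the one in this thread (little-endian, no BOM), the fallback branch is taken and the wrong byte order is chosen.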

spammers are known to make mistakes, and usually those mistakes are what gets scored; this case is the opposite

In the provided message the actual endianness is LE, and
BOM is missing, so decoding as UTF-16BE fails and the
rule does not hit. Garbage-in, garbage-out.

If you manually edit the sample and replace UTF-16
with UTF-16LE (and normalize is enabled), your rule should
hit - at least it does so in the current trunk code.
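
To make the effect concrete, here is a small Python sketch of the same situation. It assumes a made-up stand-in for the message body and uses Python's re module in place of SpamAssassin's rule engine, so it only illustrates why the pattern cannot match after a wrong-endian decode.

```python
import re

# Stand-in for the spam body: little-endian UTF-16 without a BOM.
raw = "dear potencial partner".encode("utf-16-le")

# Same alternatives as in CUST_BODY_17, case-insensitive.
pattern = re.compile(r"1st page ranking of google|dear potencial partner", re.I)

# Decoded with the assumed big-endian order, every byte pair is swapped,
# so the ASCII letters turn into unrelated code points.
as_be = raw.decode("utf-16-be", errors="replace")

# Decoded with the actual byte order, the text is recovered.
as_le = raw.decode("utf-16-le")

print("BE decode matches:", bool(pattern.search(as_be)))
print("LE decode matches:", bool(pattern.search(as_le)))
```

Only the little-endian decode yields text the rule can see, which is consistent with the rule hitting once the charset label is changed to UTF-16LE.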

yes, but since Thunderbird displays the message and it doesn't contain special chars....

If this seems to be common in the wild, please open a
bug ticket, as Kevin suggested, and attach the sample there.

that was a message from the wild; it hit BAYES_999, but that was not enough to exceed the milter-reject score, hence the body rule, which doesn't fire

will write a bug report as soon as I find some spare time (lots of stuff going on around me currently...)
