Diego Pomatta wrote:
(again as new mail)
Hey list,
I get lots of these errors while passing an mbox file to sa-learn for
spam learning:
Malformed UTF-8 character (unexpected non-continuation byte 0x72,
immediately after start byte 0xf3) in transliteration (tr///) at
/usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1049.
Malformed UTF-8 character (unexpected non-continuation byte 0x20,
immediately after start byte 0xe1) in transliteration (tr///) at
/usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1050.
with variations in the non-continuation byte and start byte, but always at
lines 1049 and 1050 of Message.pm.
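For reference, the warning is about UTF-8 byte structure. A lead byte announces how many bytes a character spans, and every byte after it must be a continuation byte of the form 10xxxxxx (0x80-0xBF). The short Python sketch below checks the two byte pairs quoted above. Python is used only for illustration, since SpamAssassin itself is Perl, and the helper names are made up for this example.

Reading the same bytes as Latin-1 (ISO-8859-1) gives ordinary accented text. That suggests the offending messages carry 8-bit Latin-1 rather than UTF-8, although nothing in the errors confirms it.

```python
def expected_length(lead: int) -> int:
    """Hypothetical helper: how many bytes a UTF-8 sequence starting with
    `lead` should span (0 if `lead` can never start a sequence)."""
    if lead < 0x80:
        return 1  # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a never-valid lead byte


def is_continuation(b: int) -> bool:
    """Hypothetical helper: UTF-8 continuation bytes look like 10xxxxxx."""
    return 0x80 <= b <= 0xBF


# The byte pairs from the two warnings: 0xf3 then 0x72, and 0xe1 then 0x20.
for raw in (b"\xf3r", b"\xe1 "):
    lead, nxt = raw[0], raw[1]
    print(f"0x{lead:02x} wants {expected_length(lead)} bytes; "
          f"0x{nxt:02x} is continuation: {is_continuation(nxt)}; "
          f"as Latin-1: {raw.decode('latin-1')!r}")
```

In both cases the lead byte promises a multi-byte character, but the very next byte (an ASCII "r" or a space) is not a continuation byte. That is exactly what "unexpected non-continuation byte" describes.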
The process finishes fine and the tokens are learned, so I assume some of
the messages in the mbox file are somehow corrupted.
It started today after I added a bunch of new spammy msgs I collected.
What does the error mean and how can I identify the mails with the
problem?
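One way to find the offending mails is to check each message's raw bytes for invalid UTF-8. The sketch below uses Python's standard mailbox module. The function name find_non_utf8 and the 20-byte context window are illustrative choices.

It assumes, without verifying it against Message.pm, that raw 8-bit bytes in a message are what trigger the warnings. Base64- or quoted-printable-encoded parts are plain ASCII on disk, so they will not be flagged.

```python
import mailbox
import sys


def find_non_utf8(path):
    """Yield (index, offset, context) for each message in the mbox at `path`
    whose raw bytes are not valid UTF-8. `index` counts messages from 0 in
    file order, `offset` is the position of the first bad byte within the
    message (From_ line excluded), and `context` is up to 20 bytes either side."""
    box = mailbox.mbox(path)
    try:
        for index, key in enumerate(box.iterkeys()):
            raw = box.get_bytes(key)  # message bytes without the "From " separator line
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                yield index, err.start, raw[max(0, err.start - 20):err.start + 20]
    finally:
        box.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    for index, offset, context in find_non_utf8(sys.argv[1]):
        print(f"message #{index}: first non-UTF-8 byte at offset {offset}: {context!r}")
```

Saved as, say, find_non_utf8.py, you would run it as `python find_non_utf8.py spam.mbox`. The printed context bytes should be enough to grep for the message in the mbox. Note that some mail is legitimately sent as Latin-1, so a hit is not proof of corruption.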
What perl version are you running? I suspect this is related to a common
bug in perl 5.8.6.
It can be kludged by adding a "use bytes" to Message.pm, but that hurts
performance a bit.
See also:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3787
(note: that bug is actually about it cropping up in rules, but it is
likely the same root cause unless you're running perl 5.8.8)