Diego Pomatta wrote:
(again as new mail)
Hey list,

I get lots of these errors while passing an mbox file to sa-learn for spam learning:

Malformed UTF-8 character (unexpected non-continuation byte 0x72, immediately after start byte 0xf3) in transliteration (tr///) at /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1049.
Malformed UTF-8 character (unexpected non-continuation byte 0x20, immediately after start byte 0xe1) in transliteration (tr///) at /usr/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/Message.pm line 1050.

with variations in the non-continuation byte and the start byte, but all at lines 1049 and 1050 of Message.pm. The process finishes normally and tokens are learned, so I assume some of the messages within the mbox file are somehow corrupted.
It started today after I added a bunch of new spammy msgs I collected.
What does the error mean and how can I identify the mails with the problem?

What perl version are you running? I suspect this is related to a common bug in perl 5.8.6.

It can be kludged by adding a "use bytes;" pragma to Message.pm, but that hurts performance a bit.

See also:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3787

(note: that bug is actually about it cropping up in rules, but it is likely the same root cause unless you're running perl 5.8.8)
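
As for identifying which mails trigger the warnings: the start bytes in the messages (0xf3, 0xe1) followed by plain ASCII look like Latin-1 accented characters being read as UTF-8, so one rough way to find candidates is to list the messages whose raw bytes do not decode as UTF-8. The following is only a heuristic sketch in Python, not what SpamAssassin does internally; the function name find_non_utf8_messages is made up for illustration, and it relies on nothing beyond the standard mailbox module.

```python
import mailbox


def find_non_utf8_messages(mbox_path):
    """Return (index, offending_bytes_hex, reason) for every message in the
    mbox whose raw bytes are not valid UTF-8.

    Heuristic only: a message flagged here is a candidate for the Perl
    warning, not a confirmed cause.
    """
    hits = []
    # create=False: fail instead of silently creating a missing file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message without its "From " line
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte;
                # keep it and the byte that follows, as the Perl warning does
                bad = raw[err.start:err.start + 2]
                hits.append((key, bad.hex(), err.reason))
    finally:
        box.close()
    return hits
```

Calling find_non_utf8_messages("spam.mbox") (path assumed) returns the zero-based position of each suspect message in the file, together with the first offending byte pair, which can be compared against the start byte and non-continuation byte reported in the warnings.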



