>> Perhaps we could replace '\r' with ' ' in the subject before
>> tokenizing without losing much/any accuracy. I don't believe we can
>> get whitespace in body tokens.
Tony> +1.
Tony> (I presume that this is a nicer solution than having our own csv
Tony> subclass that has the problem fixed?)
Well, given that the bug is in the underlying _csv extension module, I
suspect so. ;-)
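For readers following along, a minimal sketch of the workaround being discussed: strip carriage returns out of the Subject header before tokenizing, so the buggy '\r' handling in the _csv extension is never exercised. The function name and splitting logic here are illustrative only, not the actual tokenizer.py code.

```python
import re

def tokenize_subject(subject):
    # Workaround sketch: replace '\r' with a space before splitting.
    # Header tokens shouldn't contain whitespace, so little or no
    # accuracy should be lost (hypothetical helper, not the real API).
    cleaned = subject.replace('\r', ' ')
    return [tok for tok in re.split(r'\s+', cleaned) if tok]

print(tokenize_subject("Re: csv\rbug fix"))
```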
Checked in as tokenizer.py 1.34.
Skip
_______________________________________________
spambayes-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/spambayes-dev