On Mon, Feb 02, 2004 at 10:29:35AM -0800, Dan Quinlan wrote:
> I'd promote T_MPART_ALT_DIFF_99 as it stands. I have a hunch we could
> get another 1% spam hits by working on the comparison algorithm. Maybe
> compare the T_MPART_ALT_DIFF_99 FPs with the T_MPART_ALT_DIFF_95 FPs.

It wouldn't surprise me if we could get better results with a more
complex algorithm... the current one is REALLY simple.
I looked at my FPs ... Most were valid, and then there were 4 Apple mails
which got caught. After investigating, the problem was that those mails
had text/plain, text/x-aol, and text/html parts ... The latter two were
both rendered as HTML, so the diff algorithm failed (it saw 2x as many
HTML tags as text tags ...)
I modified the renderer to only render type "text" or "text/plain",
and skip things like text/x-aol. I'm running a mass-check to see if
that breaks anything obvious.
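
For anyone following along, here is a rough sketch of the idea in Python. It is not the actual SpamAssassin renderer change. The names split_alternatives and SAMPLE are made up for illustration, and I am assuming that text/html parts still go to the HTML side while any other text/* subtype (such as text/x-aol) is skipped rather than rendered.

```python
from email import message_from_string

# Hypothetical sample shaped like the Apple mails described above:
# one multipart/alternative with text/plain, text/x-aol, and text/html.
SAMPLE = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/alternative; boundary="b1"',
    "",
    "--b1",
    "Content-Type: text/plain",
    "",
    "Hello there",
    "--b1",
    "Content-Type: text/x-aol",
    "",
    "<HTML><B>Hello there</B></HTML>",
    "--b1",
    "Content-Type: text/html",
    "",
    "<html><b>Hello there</b></html>",
    "--b1--",
    "",
])


def split_alternatives(raw):
    """Sort text parts into plain, html, and skipped (assumed policy)."""
    msg = message_from_string(raw)
    plain, html, skipped = [], [], []
    for part in msg.walk():
        # Containers and non-text parts are ignored here.
        if part.get_content_maintype() != "text":
            continue
        # A bare "text" Content-Type is reported as text/plain by
        # Python's email package, so it lands on the plain side too.
        ctype = part.get_content_type()
        charset = part.get_content_charset() or "utf-8"
        body = part.get_payload(decode=True).decode(charset, errors="replace")
        if ctype == "text/plain":
            plain.append(body)
        elif ctype == "text/html":
            html.append(body)
        else:
            # e.g. text/x-aol: record the type, do not render it.
            skipped.append(ctype)
    return plain, html, skipped


plain, html, skipped = split_alternatives(SAMPLE)
print(len(plain), len(html), skipped)
```

With the text/x-aol part skipped, the comparison sees one plain part and one HTML part instead of one plain part and two HTML-ish parts.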
--
Randomly Generated Tagline:
"We had our orders. Mister, I don't care if you had a personal message
from God, complete with stone tablets; you lied to me."
- Bester & Sinclair on Babylon 5