On Mon, Feb 02, 2004 at 10:29:35AM -0800, Dan Quinlan wrote:
> I'd promote T_MPART_ALT_DIFF_99 as it stands.  I have a hunch we could
> get another 1% spam hits by working on the comparison algorithm.  Maybe
> compare the T_MPART_ALT_DIFF_99 FPs with the T_MPART_ALT_DIFF_95 FPs.

It wouldn't surprise me if we could get better results with a more
complex algorithm...  the current one is REALLY simple.

I looked at my FPs ...  Most were valid, and then there were 4 Apple mails
which got caught.  After investigating, the problem was that those mails
had text/plain, text/x-aol, and text/html parts ...  The latter two were
both rendered as HTML, and therefore the diff algorithm failed (it saw 2x
as many HTML tags as text tags ...)
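
To illustrate the failure mode, here is a rough Python sketch.  It is NOT
the actual comparison code; diff_percent() is a hypothetical stand-in that
just measures what share of the HTML-side tokens have no counterpart on the
text side.  It only shows why rendering two parts as HTML inflates the
difference:

```python
from collections import Counter

def diff_percent(text_parts, html_parts):
    """Percent of HTML-side tokens with no match on the text side.

    Hypothetical toy metric, assumed only for illustration.
    """
    text = Counter(w for part in text_parts for w in part.lower().split())
    html = Counter(w for part in html_parts for w in part.lower().split())
    total = sum(html.values())
    if total == 0:
        return 0.0
    # Counter subtraction keeps only positive counts, i.e. the tokens
    # that appear more often on the HTML side than on the text side.
    unmatched = sum((html - text).values())
    return 100.0 * unmatched / total

body = "your order has shipped"

# One text part vs. one HTML part with the same words: no difference.
print(diff_percent([body], [body]))

# Same mail, but a second part (e.g. text/x-aol) also rendered as HTML:
# the HTML side is now twice as large, so half of it is "unmatched".
print(diff_percent([body], [body, body]))
```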

I modified the renderer to only render type "text" or "text/plain",
and skip things like text/x-aol.  I'm running a mass-check to see if
that breaks anything obvious.
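
The type check amounts to something like the following Python sketch.
This is not the real renderer code; should_render() is a made-up name, and
stripping parameters and lowercasing are my assumptions about how the
Content-Type would be normalized first.  How text/html gets routed is not
modeled here:

```python
def should_render(content_type):
    """Return True only for the bare types "text" or "text/plain".

    Hypothetical helper, assumed only for illustration.
    """
    # Drop parameters such as "; charset=us-ascii" and normalize case.
    ctype = content_type.split(";", 1)[0].strip().lower()
    return ctype in ("text", "text/plain")

for ctype in ("text/plain; charset=us-ascii", "text", "text/x-aol"):
    print(ctype, "->", "render" if should_render(ctype) else "skip")
```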

-- 
Randomly Generated Tagline:
"We had our orders.  Mister, I don't care if you had a personal message
 from God, complete with stone tablets; you lied to me."
         - Bester & Sinclair on Babylon 5
