Giving the user this capability lets them adjust the setting to minimize the number of false positives caused by short, common idioms in HTML or mail, at the possible expense of letting some spam go undetected.
I've been looking at the false positives reported by Razor. One class is due to short mail parts that Razor evaluates separately and marks as spam, although nothing about them is related to spam. In a sense, this is like treating the word "the" as an indicator of spam because more of the mail reported to Razor that contains "the" is spam than ham. I know that Razor had some issues in the past with messages that were too short.
Examples of these parts:
Most of the ones I've run across are clear (transparent) GIFs from HTML, where they are used as spacers. A description of their use can be found at http://insights.iwarp.com/advanced/clear.html . I've seen these in random bits of HTML that users have included in their mail, and in articles forwarded from cnet.com or fortune.com. They may also be implicated in other false positives where I didn't check the details of why a message was rejected, because I knew it was a mass mailing (and so I presumed some luser had reported it as spam).
Here's a typical mail part of one of these:
------=_NextPart_000_0014_01C2D73E.1871A380
Content-Type: image/gif; name="b.gif"
Content-Transfer-Encoding: base64
Content-Location: http://i.i.com.com/cnwk.1d/b.gif

R0lGODlhAQABAID/AMDAwAAAACH5BAEAAAAALAAAAAABAAEAQAICRAEAOw==
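To see how little there is in a part like this, the base64 body above can be decoded and its GIF header inspected. The following is a minimal Python sketch using only the standard library. The helper name `describe_gif` is made up for illustration and has nothing to do with Razor itself.

```python
import base64
import struct

# The base64 body of the b.gif part quoted above.
SPACER_B64 = "R0lGODlhAQABAID/AMDAwAAAACH5BAEAAAAALAAAAAABAAEAQAICRAEAOw=="


def describe_gif(b64_text):
    """Decode a base64 GIF and return (signature, width, height, size_in_bytes).

    Hypothetical helper for illustration only; it reads just the fixed
    GIF header and does no validation beyond that.
    """
    data = base64.b64decode(b64_text)
    signature = data[:6].decode("ascii")
    # Width and height are little-endian 16-bit values at bytes 6..9.
    width, height = struct.unpack("<HH", data[6:10])
    return signature, width, height, len(data)


print(describe_gif(SPACER_B64))  # ('GIF89a', 1, 1, 43)
```

So the whole image is a 1x1 GIF of 43 bytes, and every copy of such a spacer has the same content no matter who sends it.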
Here are two other examples that I ran into, both from the same message, which was a forward of a forwarded message. (I believe this MIME marks the end of the inner message.) Razor marks each of them as spam with a confidence of 100%.
--Apple-Mail-108--11777278
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
charset=ISO-8859-1;
format=flowed
> > > =A0 >
--Apple-Mail-108--11777278
Content-Transfer-Encoding: quoted-printable
Content-Type: text/enriched;
charset=ISO-8859-1
<excerpt>
=A0
</excerpt>=
--Apple-Mail-108--11777278--
The parts in the examples I've run into have been 100-300 bytes in size (after pre-processing). Given that, I'd want to set the minimum size to 500 bytes.
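The minimum-size idea can be sketched roughly as follows: walk the MIME parts of a message and only consider parts at or above a threshold. This is not how Razor is implemented. The function name `parts_worth_checking`, the 500-byte default, and the use of the decoded payload length as the "size" are all assumptions made for illustration, and Razor's own pre-processing may well measure parts differently.

```python
from email import message_from_string

# Threshold suggested in the text above; an assumption, not a Razor default.
MIN_PART_SIZE = 500


def parts_worth_checking(raw_message, min_size=MIN_PART_SIZE):
    """Return (content_type, size) for each leaf MIME part of at least min_size bytes.

    "Size" here is the length of the decoded payload with surrounding
    whitespace stripped, which only approximates whatever pre-processing
    a real filter would apply.
    """
    msg = message_from_string(raw_message)
    kept = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        size = len(payload.strip())
        if size >= min_size:
            kept.append((part.get_content_type(), size))
    return kept
```

With a 500-byte threshold, a 43-byte spacer GIF or a part containing only a non-breaking space would never be submitted for checking, at the cost of also skipping any genuinely spammy part that happens to be that short.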
Once I understood that the underlying issue is that some common mail parts are used by spammers and so get reported as spam (while legitimate uses of the same parts aren't likely to be reported as non-spam), I was surprised that I didn't find any cases of, say, eBay's logo being recognized as spam, even in a message that was truly sent by eBay. Maybe I just didn't have such mail.
_______________________________________________
Razor-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/razor-users