John Hardin wrote:
On 4/28/10 3:13 PM, Kris Deugau wrote:
> 0.0 TO_EQ_FM_HTML_ONLY To == From and HTML only
> 0.0 TO_EQ_FM_DIRECT_MX To == From and direct-to-MX
> 1.7 TO_EQ_FM_HTML_DIRECT To == From and HTML only, direct-to-MX
There was a bug in handling bare addresses in the first version of those
rules, which has since been fixed. Unfortunately sa-update
hasn't published the update yet - so I'm off to the dev list. Sorry!
Ah. These rules weren't my original concern; the
TVD_PH_SUBJ_ACCOUNTS_POST and TVD_SUBJ_ACC_NUM rules were, since they
thoroughly overbalanced Bayes (even with the more aggressive local
BAYES_00 score) and caused the original FP.
I don't see anything obviously wrong with the root From == To meta
subrules:
header __TO_EQ_FROM_1 ALL =~ /\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1/ism
header __TO_EQ_FROM_2 ALL =~ /\nTo:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*From:[^\n]+\1/ism
They assume a human-readable comment and angle brackets are present on
whichever header appears first, which was erroneous.
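
To make the bare-address problem concrete, here is a minimal sketch that runs the __TO_EQ_FROM_1 pattern quoted above through Python's re module. It assumes Perl's /ism flags map onto re.I | re.S | re.M for these constructs, and the sample headers are made up for illustration (they are not the Pastebin example). With a bare first address, the leading [^\n<]{0,80} can swallow most of the address, so the capture group may shrink to just its trailing characters. That is one plausible reason why changing a username or domain could flip the result.

```python
import re

# Python transcription of the __TO_EQ_FROM_1 pattern from the thread.
# Assumption: Perl's /ism behaves like re.I | re.S | re.M here.
TO_EQ_FROM_1 = re.compile(
    r"\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1",
    re.I | re.S | re.M,
)


def captured(headers):
    """Return the text captured by group 1, or None if the pattern misses."""
    m = TO_EQ_FROM_1.search(headers)
    return m.group(1) if m else None


# Hypothetical sample headers, each starting with "\n" as the pattern expects.
bracketed_same = "\nFrom: Alice <alice@example.com>\nTo: <alice@example.com>\n"
bracketed_diff = "\nFrom: Alice <alice@example.com>\nTo: <bob@example.net>\n"
bare_diff_hit = "\nFrom: alice@example.com\nTo: bob@example.net\n"
bare_diff_miss = "\nFrom: alice@example.com\nTo: bob@invalid.org\n"

print(captured(bracketed_same))  # alice@example.com (the intended case)
print(captured(bracketed_diff))  # None (different addresses, no hit)
print(captured(bare_diff_hit))   # m (only the last character was captured)
print(captured(bare_diff_miss))  # None (the To line contains no "m")
```

In the third case the two addresses differ, yet the pattern hits because the capture backtracks down to "m" and the To line happens to contain that letter. The fourth case differs only in the To domain and misses.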
Hmm. I'll be curious to see the updates; I'm far from a regex expert
but I don't see what's actually broken.
Well, there _is_ a size limit on what will be accepted between those two
headers, so other headers _can_ affect whether it will hit.
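
A rough illustration of that limit, again using Python's re as a stand-in for Perl (an assumption) and made-up headers: the (?:[^\n]{1,100}\n)* part only steps over lines of 1 to 100 characters, so a single longer line between From and To is enough to stop __TO_EQ_FROM_1 from hitting.

```python
import re

# Python transcription of the __TO_EQ_FROM_1 pattern from the thread.
# Assumption: Perl's /ism behaves like re.I | re.S | re.M here.
TO_EQ_FROM_1 = re.compile(
    r"\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1",
    re.I | re.S | re.M,
)


def hits(between):
    """Check the pattern with `between` placed between the From and To lines."""
    headers = (
        "\nFrom: <alice@example.com>\n"
        + between
        + "To: <alice@example.com>\n"
    )
    return TO_EQ_FROM_1.search(headers) is not None


# Hypothetical intervening headers.
short_line = "Subject: hello\n"
long_line = "X-Filler: " + "a" * 120 + "\n"  # 130 characters, over the limit

print(hits(""))                      # True: headers are adjacent
print(hits(short_line))              # True: a short line is skipped over
print(hits(long_line))               # False: the long line blocks the match
print(hits(short_line + long_line))  # False: one long line is enough
```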
*nod* So I can see in the subrules... but the From and To in the
original, and the sanitized example I posted to Pastebin, were right
next to each other. And with no pattern I could detect, removing or
altering other headers, or even the username and/or domain part of
either From or To *sometimes* caused a previously-matching header set to
not match, or vice versa. O_o
IIRC even moving a header from above to below the To/From pair altered
the behaviour at one point.
-kgd