On Thu, 29 Apr 2010, Kris Deugau wrote:
John Hardin wrote:
> > On 4/28/10 3:13 PM, Kris Deugau wrote:
> > > 0.0 TO_EQ_FM_HTML_ONLY To == From and HTML only
> > > 0.0 TO_EQ_FM_DIRECT_MX To == From and direct-to-MX
> > > 1.7 TO_EQ_FM_HTML_DIRECT To == From and HTML only, direct-to-MX
There was a bug in handling bare addresses in the first version of those
rules, which has since been fixed. Unfortunately sa-update hasn't published
the update yet - so I'm off to the dev list. Sorry!
Ah. These rules weren't my original concern; the TVD_PH_SUBJ_ACCOUNTS_POST
and TVD_SUBJ_ACC_NUM rules were, since they thoroughly overbalanced Bayes
(even with the more aggressive local BAYES_00 score) and caused the original
FP.
> I don't see anything obviously wrong with the root From == To meta
> subrules:
>
> header __TO_EQ_FROM_1 ALL =~ /\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1/ism
> header __TO_EQ_FROM_2 ALL =~ /\nTo:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*From:[^\n]+\1/ism
They assumed that a human-readable comment and angle brackets were present
on whichever header appears first, which was erroneous.
Hmm. I'll be curious to see the updates; I'm far from a regex expert but I
don't see what's actually broken.
If there were no angle brackets, it would capture only the last character
of the first address: the greedy [^\n<]{0,80} part of the RE before <?
grabs the rest.
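To illustrate that capture problem, here is a minimal sketch in Python rather than Perl. It assumes Python's re module handles this particular pattern the same way Perl does, and the sample headers and addresses are made up for the demonstration.

```python
import re

# First version of __TO_EQ_FROM_1 as quoted in this thread, with the /ism flags.
OLD_TO_EQ_FROM_1 = re.compile(
    r"\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1",
    re.I | re.S | re.M,
)

# Hypothetical headers: bare addresses, no comment and no angle brackets.
same = "Subject: test\nFrom: alice@example.com\nTo: alice@example.com\n"
different = "Subject: test\nFrom: alice@example.com\nTo: bob@example.com\n"

# What the capture group ends up holding for a bare From address.
m = OLD_TO_EQ_FROM_1.search(same)
captured = m.group(1) if m else None

# Whether the rule also fires when From and To are different addresses.
false_hit = OLD_TO_EQ_FROM_1.search(different) is not None

print(repr(captured))
print(false_hit)
```

With these sample headers the capture is just 'm', so the backreference only has to find an 'm' somewhere on the To: line, and the alice/bob pair matches as well.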
The current versions are:
header __TO_EQ_FROM_1 ALL =~ /\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:\s+(?:[^\n]+<)?\1[>,\s\n]/ism
header __TO_EQ_FROM_2 ALL =~ /\nTo:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*From:\s+(?:[^\n]+<)?\1[>,\s\n]/ism
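A quick way to sanity-check the current __TO_EQ_FROM_1 outside SpamAssassin is sketched below in Python. This assumes Python's re engine agrees with Perl on this pattern; the hits() helper and the sample headers are hypothetical, not part of the ruleset.

```python
import re

# Current __TO_EQ_FROM_1 as posted above, with the /ism flags.
TO_EQ_FROM_1 = re.compile(
    r"\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*"
    r"To:\s+(?:[^\n]+<)?\1[>,\s\n]",
    re.I | re.S | re.M,
)

def hits(headers):
    """Return the captured From address if the pattern matches, else None."""
    m = TO_EQ_FROM_1.search(headers)
    return m.group(1) if m else None

# Hypothetical sample headers for the three cases of interest.
bare = "Subject: t\nFrom: alice@example.com\nTo: alice@example.com\n"
bracketed = ("Subject: t\nFrom: Alice <alice@example.com>\n"
             "To: Alice <alice@example.com>\n")
different = "Subject: t\nFrom: alice@example.com\nTo: bob@example.com\n"

for name, sample in (("bare", bare), ("bracketed", bracketed),
                     ("different", different)):
    print(name, hits(sample))
```

On these samples the bare and bracketed forms both capture the whole address, and the differing pair does not match.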
Well, there _is_ a size limit on what will be accepted between those two
headers, so other headers _can_ affect whether it will hit.
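As I read the pattern, the limit comes from (?:[^\n]{1,100}\n)*, which only steps over intervening lines of 1 to 100 characters. The Python sketch below illustrates that reading using the current __TO_EQ_FROM_1; the fires() helper, the X-Example header and the addresses are made up, and it assumes Python's re behaves like Perl here.

```python
import re

# Current __TO_EQ_FROM_1 from this thread, with the /ism flags.
TO_EQ_FROM_1 = re.compile(
    r"\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*"
    r"To:\s+(?:[^\n]+<)?\1[>,\s\n]",
    re.I | re.S | re.M,
)

def fires(middle_header):
    """Place one header between From: and To: and report whether the pattern matches."""
    headers = ("Subject: t\nFrom: alice@example.com\n"
               + middle_header
               + "\nTo: alice@example.com\n")
    return TO_EQ_FROM_1.search(headers) is not None

# A short header between the pair is stepped over.
short_ok = fires("X-Example: short value")

# A single physical line longer than 100 characters cannot be stepped over.
long_blocks = fires("X-Example: " + "x" * 150)

# The same amount of data folded into two lines of <= 100 characters each.
folded_ok = fires("X-Example: " + "x" * 80 + "\n\t" + "x" * 70)

print(short_ok, long_blocks, folded_ok)
```

So an unrelated header can decide whether the rule hits, but only when it sits between the two headers and has an unfolded line over 100 characters.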
IIRC even moving a header from above to below the To/From pair altered the
behaviour at one point.
Yow. If you can provide me with a couple of examples of that I'll see if I
can figure out what's going on...
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
A well educated Electorate, being necessary to the liberty of a
free State, the Right of the People to Keep and Read Books,
shall not be infringed.
-----------------------------------------------------------------------
9 days until the 65th anniversary of VE day