On Thu, 29 Apr 2010, Kris Deugau wrote:

John Hardin wrote:
> >   On 4/28/10 3:13 PM, Kris Deugau wrote:
> > >    0.0 TO_EQ_FM_HTML_ONLY     To == From and HTML only
> > >    0.0 TO_EQ_FM_DIRECT_MX     To == From and direct-to-MX
> > >    1.7 TO_EQ_FM_HTML_DIRECT   To == From and HTML only, direct-to-MX

 There was a bug in handling bare addresses in the first version of those
 rules, which has since been fixed. Unfortunately sa-update hasn't published
 the update yet - so I'm off to the dev list. Sorry!

Ah. These rules weren't my original concern; the TVD_PH_SUBJ_ACCOUNTS_POST and TVD_SUBJ_ACC_NUM rules were, since they thoroughly overbalanced Bayes (even with the more aggressive local BAYES_00 score) and caused the original FP.

> I don't see anything obviously wrong with the root From == To meta
> subrules:
>
> header __TO_EQ_FROM_1 ALL =~
> /\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1/ism
> header __TO_EQ_FROM_2 ALL =~
> /\nTo:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*From:[^\n]+\1/ism

 They assume a human-readable comment and angle brackets are present on
 whichever header appears first, which was erroneous.

Hmm. I'll be curious to see the updates; I'm far from a regex expert but I don't see what's actually broken.

If there were no angle brackets it would only capture the last character of the first address. The part of the RE before <? grabs the rest.
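
To see the capture problem concretely, here is a rough sketch that runs the first-version From/To pattern through Python's re module rather than through SpamAssassin itself. The sample headers, the example.com addresses and the captured_address() helper are made up for illustration; only the pattern comes from the rule.

```python
import re

# First-version pattern from the rule; Perl's /ism maps to re.I | re.S | re.M.
OLD_FROM_TO = re.compile(
    r"\nFrom:[^\n<]{0,80}<?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:[^\n]+\1",
    re.I | re.S | re.M,
)


def captured_address(headers):
    """Return what the pattern captured as the From address, or None on no match."""
    m = OLD_FROM_TO.search(headers)
    return m.group(1) if m else None


# Hypothetical header blocks; the leading Subject line supplies the "\n" before From:.
bracketed = "Subject: test\nFrom: A User <user@example.com>\nTo: A User <user@example.com>\n"
bare = "Subject: test\nFrom: user@example.com\nTo: user@example.com\n"
mismatch = "Subject: test\nFrom: bob@example.com\nTo: alice@example.com\n"

print(captured_address(bracketed))  # user@example.com
print(captured_address(bare))       # m
print(captured_address(mismatch))   # m
```

In the last case the two addresses differ, yet the pattern still matches, because a one-character capture turns up almost anywhere in the To: line.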

The current versions are:

  header  __TO_EQ_FROM_1 ALL =~ 
/\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*To:\s+(?:[^\n]+<)?\1[>,\s\n]/ism
  header  __TO_EQ_FROM_2 ALL =~ 
/\nTo:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*From:\s+(?:[^\n]+<)?\1[>,\s\n]/ism
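
As a quick check of the current From/To version, the same kind of sketch can be run in Python's re module (again outside SpamAssassin). The header samples and the hits() helper are hypothetical; the pattern is copied from the __TO_EQ_FROM_1 rule above.

```python
import re

# Current __TO_EQ_FROM_1 pattern; Perl's /ism maps to re.I | re.S | re.M.
NEW_FROM_TO = re.compile(
    r"\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*"
    r"To:\s+(?:[^\n]+<)?\1[>,\s\n]",
    re.I | re.S | re.M,
)


def hits(headers):
    """Return the captured From address if the pattern matches, else None."""
    m = NEW_FROM_TO.search(headers)
    return m.group(1) if m else None


# Hypothetical header blocks covering bare and bracketed address forms.
bare = "Subject: test\nFrom: user@example.com\nTo: user@example.com\n"
bracketed = "Subject: test\nFrom: A User <user@example.com>\nTo: A User <user@example.com>\n"
mixed = "Subject: test\nFrom: user@example.com\nTo: A User <user@example.com>\n"
mismatch = "Subject: test\nFrom: bob@example.com\nTo: alice@example.com\n"

for name, sample in [("bare", bare), ("bracketed", bracketed),
                     ("mixed", mixed), ("mismatch", mismatch)]:
    print(name, hits(sample))
```

With the optional prefix grouped as (?:...<)?, the whole address is captured whether or not angle brackets are present, and the trailing [>,\s\n] keeps the backreference from matching only part of a longer address.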

 Well, there _is_ a size limit on what will be accepted between those two
 headers, so other headers _can_ affect whether it will hit.
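
A small sketch of that limit, using Python's re module as a stand-in for SpamAssassin: the (?:[^\n]{1,100}\n)* part only steps over lines of 1 to 100 characters, so one longer line between From: and To: stops the match. The X-Long header and the sample addresses are invented for the example.

```python
import re

# Current __TO_EQ_FROM_1 pattern; Perl's /ism maps to re.I | re.S | re.M.
NEW_FROM_TO = re.compile(
    r"\nFrom:\s+(?:[^\n<]{0,80}<)?([^\n\s>]+)>?\n(?:[^\n]{1,100}\n)*"
    r"To:\s+(?:[^\n]+<)?\1[>,\s\n]",
    re.I | re.S | re.M,
)


def rule_hits(headers):
    """True if the From/To pattern matches this header block."""
    return NEW_FROM_TO.search(headers) is not None


# A short header between From: and To: is stepped over by the {1,100} group.
short_between = (
    "X-Start: 1\nFrom: user@example.com\n"
    "Subject: hello\n"
    "To: user@example.com\n"
)

# A hypothetical 150-character header line exceeds the per-line limit.
long_between = (
    "X-Start: 1\nFrom: user@example.com\n"
    "X-Long: " + "a" * 150 + "\n"
    "To: user@example.com\n"
)

print(rule_hits(short_between))  # True
print(rule_hits(long_between))   # False
```

Both samples have identical From: and To: addresses; only the intervening header differs.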

IIRC even moving a header from above to below the To/From pair altered the behaviour at one point.

Yow. If you can provide me with a couple of examples of that I'll see if I can figure out what's going on...

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  A well educated Electorate, being necessary to the liberty of a
  free State, the Right of the People to Keep and Read Books,
  shall not be infringed.
-----------------------------------------------------------------------
 9 days until the 65th anniversary of VE day
