Charlie Watts wrote:

>The current SUBJ_ALL_CAPS is broken.
>
>(The one from CVS - this:
>header SUBJ_ALL_CAPS Subject =~ /^[^a-z]*([A-Z][^a-z]*){3,}[^a-z]*$/
>)
>
Congratulations, Charlie! You're the next winner on "The Regex Is Right!"

Heh.

The problem is that this RE has exponential backtracking time: the 
nested quantifiers force the RE engine to try a huge number of ways to 
carve up the string before it can give up. Say you have the case you 
pointed out here:

>Scanning this message takes 10 seconds:
>
>From: Charlie Watts <[EMAIL PROTECTED]>
>Subject: AAAAAAAAAAAAAAAAAAAAA foofoo
>
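Ask yourself: what matches? Well, in the end nothing does, because of 
the lowercase "foofoo", but there are a lot of *possible* matches to 
rule out first: every way of dividing that run of A's between the 
[^a-z]* pieces and the repeats of the group has to be tried and 
rejected. And that's the problem. What's really wanted here is a 
multi-pass eval test, something like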

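sub subject_is_all_caps
{
  my $subject = $_[0];
  # strip everything that is not an ASCII letter (note the /g)
  $subject =~ s/[^a-zA-Z]//g;
  # all caps if some letters remain and uppercasing changes nothing
  return length($subject) && $subject eq uc($subject);
}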

because once the non-letters are stripped, anything left over is a 
letter, so a single comparison tells us for sure whether the subject is 
all caps. The substitution is one linear pass over the string, which is 
cheap compared to the expensive RE above.
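
If you also want to keep the "at least three capitals" part of the 
original RE, one possible variation is to count letters with tr instead 
of substituting. This is only a sketch: subject_caps_by_count and its 
$min parameter are names made up for illustration, not anything in 
SpamAssassin, and it only looks at ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SpamAssassin): true when the subject
# contains no lowercase ASCII letters and at least $min uppercase ones.
sub subject_caps_by_count {
    my ($subject, $min) = @_;
    $min = 3 unless defined $min;        # mirrors the {3,} in the RE

    # tr with an empty replacement list only counts; $subject is unchanged
    my $lower = ($subject =~ tr/a-z//);
    my $upper = ($subject =~ tr/A-Z//);

    return ($lower == 0 && $upper >= $min) ? 1 : 0;
}

print subject_caps_by_count("AAAAAAAAAAAAAAAAAAAAA foofoo") ? "caps\n" : "not caps\n";
print subject_caps_by_count("BUY NOW!!!") ? "caps\n" : "not caps\n";
```

The first subject has lowercase letters, so it prints "not caps"; the 
second prints "caps". Each tr is a single pass with no backtracking, so 
the cost grows with the length of the subject and nothing worse.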

-- 
          http://www.pricegrabber.com | Dog is my co-pilot.

                                   



