On Wed, 2011-10-19 at 22:21 -0400, dar...@chaosreigns.com wrote:
> > body            __BODY_URI      m{https?://.{1,50}$}
> 
> That will match any email that ends with http:// followed by 1 to 50
> characters of anythings, including spaces and other stuff not part of the
> url.  "$" is not "I want stuff to stop matching here."  It's the end.
> Either of the line, or of the email, depending on how SA handles newlines.

Depends on the type of rule (and on the RE modifiers). For body rules,
"$" anchors at the end of a paragraph, in the obscure, old-school
definition of a paragraph SA uses here. See my previous post.
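A rough Python sketch of the difference (Python's `re` follows Perl for `.` and `$` here). The `paragraphs()` helper is a hypothetical, simplified model of what a body rule gets to see, not SA's actual code: blank lines split paragraphs, and any other whitespace collapses to a single space.

```python
import re

# Hypothetical, simplified model of body-rule input (not SA's code):
# blank lines separate paragraphs, other whitespace collapses to one space.
def paragraphs(text):
    return [re.sub(r"\s+", " ", p).strip()
            for p in re.split(r"\n\s*\n", text) if p.strip()]

BODY_URI = re.compile(r"https?://.{1,50}$")  # pattern from the quoted rule

body = "See http://example.com for details\nand more.\n\nUnrelated paragraph.\n"

# Against the raw text, "." stops at the newline and "$" is the very end,
# so nothing matches.
print(bool(BODY_URI.search(body)))  # False

# Against each paragraph, "$" is the end of that paragraph, and the match
# happily runs across spaces and words that are not part of the URL.
print([p for p in paragraphs(body) if BODY_URI.search(p)])
# ['See http://example.com for details and more.']
```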

And, again, for the URI matching case, the uri rule is the one to go for
anyway, since it ensures the RE is applied to URIs only.
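A minimal Python sketch of why that helps, assuming (as a simplification) that a uri rule tests each extracted URI as a separate string. The URI list is made up and stands in for SA's own URI extraction.

```python
import re

URI_RE = re.compile(r"https?://.{1,50}$")  # same pattern, now tested per URI

# Made-up URIs standing in for what SA extracts from a message; each one
# is tested on its own, so "$" is the end of the URI, not of a paragraph.
uris = [
    "http://example.com/short",
    "https://example.org/" + "x" * 60,
]

print([u for u in uris if URI_RE.search(u)])
# ['http://example.com/short']
```

Tested this way, the pattern reads as intended: a URI with at most 50 chars after the scheme, with no trailing prose sneaking into the match.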


> Some quick untested thoughts:
> 
> body            __LONG_BODY     /.{151}/
> describe        __LONG_BODY     Has a body of more than 150 characters
                                        ^^^^

Has a *paragraph* of more than 150 chars. Again, see my previous post.

These three very short paragraphs sum up to more than 150 chars.

However, that __LONG_BODY body rule would not match on these three
paragraphs alone, only on the longer paragraphs elsewhere in this post.
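A quick Python check of the per-paragraph behavior, assuming each paragraph is tested as a separate string. The paragraphs are synthetic.

```python
import re

LONG_BODY = re.compile(r".{151}")  # pattern from the quoted __LONG_BODY rule

# Three short synthetic paragraphs that add up to more than 150 chars.
paras = ["a" * 60, "b" * 60, "c" * 60]

print(sum(len(p) for p in paras))               # 180
# Each paragraph is tested on its own, so none is long enough to match.
print(any(LONG_BODY.search(p) for p in paras))  # False
# A single paragraph of 151 chars or more does match.
print(bool(LONG_BODY.search("d" * 151)))        # True
```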


> You might be able to do:
> body            __SHORT_BODY    /(?!.{1,150})/
> But I'm new to this "negative look-ahead assertion" thing.

See perlre. That is a *zero-width* negative look-ahead assertion. Since
there is nothing before the look-ahead, *any* place in the string would
do, as long as fewer than 1 char follows it, as per the look-ahead
assertion. Which, in practice, is the end of the string. (And in this
case, the {1,150} really is just a waste of cycles: the assertion already
fails as soon as a single char follows, so this behaves exactly like
/(?!.)/...)
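A small Python demonstration of where that zero-width assertion matches (Python's `re` agrees with Perl on this construct). `NO_CHAR` is an illustrative name for the equivalent assertion without the quantifier.

```python
import re

SHORT_BODY = re.compile(r"(?!.{1,150})")  # pattern from the quoted __SHORT_BODY
NO_CHAR = re.compile(r"(?!.)")            # same assertion without the quantifier

for text in ["", "short", "x" * 500]:
    m = SHORT_BODY.search(text)
    # Zero-width match at the first spot where no char follows, which is
    # the end of the string, no matter how long the string is.
    print(len(text), m.start(), m.start() == NO_CHAR.search(text).start())
# 0 0 True
# 5 5 True
# 500 500 True
```

At every position that fails, the {1,150} version first greedily consumes up to 150 chars before the assertion gives up, which is where the wasted cycles go.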

By definition of the body rule, the first such place is the end of the
first paragraph. And that happens to be the end of the Subject (which is
the first paragraph of the "body" for body rules), regardless of the mail
body. So that rule hits on pretty much every message.
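The same point in a rough Python model. `body_rule_hit()` is a hypothetical helper, and testing the Subject as the first paragraph is a simplification of what body rules see, not SA's actual code.

```python
import re

SHORT_BODY = re.compile(r"(?!.{1,150})")  # pattern from the quoted __SHORT_BODY

# Hypothetical helper: returns the first "paragraph" the pattern matches,
# testing the Subject first, then the body paragraphs in order.
def body_rule_hit(pattern, subject, body_paragraphs):
    for para in [subject] + body_paragraphs:
        if pattern.search(para):
            return para
    return None

# A message whose body consists only of very long paragraphs.
print(body_rule_hit(SHORT_BODY, "Cheap meds", ["y" * 1000, "z" * 400]))
# Cheap meds
```

The body paragraphs are never even reached: the assertion already succeeds at the end of the Subject.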


And yes, I verified this, using ad-hoc rules and faked, specially
crafted messages. My previous post might really be educational...

Don't forget to grab a beer, though, and take your time reading it. :)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
