Re: URI Basics

Matt Kettler Mon, 24 Apr 2006 17:57:06 -0700

Dan wrote:
>> In 3 ^ is the first character of the regex, just as it is in 1 and 2. It
>> is also inside the delimiters, just like 1 and 2. In example 3 @ is
>> being used as a delimiter,  and ^ is the first character after it.
>
> Are you saying that in URIs, any character (@ in this case) can serve
> as the delimiter, so long as it displays after the m and again at the
> end of the entry?
Well, any non-alphanumeric, non-whitespace character can be used, i.e. any punctuation.


Actually, this is true of ANY SA rule, not just URIs. The use of
m to set up a regex delimiter is just part of the perl regex syntax,
all of which SA supports. It's called the "match operator".

So these all mean the same thing:
 /foo/
m/foo/
m!foo!

Just be wary of what you use as a delimiter. Choosing something other
than / should only be done to make things easier to read. It also
overrides that character's normal use until the end of the regex, so
the character has to be escaped if you want to match it literally.
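
As a rough sketch of where a different delimiter pays off, here is the
same uri rule written both ways. The rule names and the example.com URL
are made up for illustration and are not from any real ruleset:

```
# Hypothetical rule names and domain, for illustration only.

# Default / delimiter: every / in the URL has to be escaped.
uri  LOCAL_EXAMPLE_SLASH  /^http:\/\/www\.example\.com\/promo\//

# Same pattern with ! as the delimiter: / needs no escaping,
# but a literal ! inside the pattern would have to be written as \!
uri  LOCAL_EXAMPLE_BANG   m!^http://www\.example\.com/promo/!
```

Both rules should match the same URIs. The second is just easier to
read because the pattern contains several slashes.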

You can find a lot of detail about using the match operator (m) for this
purpose in section 7.4.3 of:

http://www.unix.org.ua/orelly/perl/learn/ch07_04.htm

(Note: that page is oriented toward general perl programming, so a lot
of what is in there is not very relevant to writing SA rules.)


>
> I'm beginning to realize how many of my learning curve issues are
> attempts to understand the very structure of a system created with a
> bare minimum of structure.
Heh, it's not that bad... but there are a lot of advanced quirks you'll
see people using from their knowledge of heavy perl wizardry.
>
>
>> There is definitely a VERY significant performance penalty to using
>> rawbody over URI, for any rule.
>>
>> Consider the size of input. A rawbody regex must be run against the
>> entire text of the body after QP decoding. A uri regex must be run
>> against all the text of the URIs that SA found. There is likely to be at
>> least a 100:1 difference in size of input. There's no "penalty" for
>> using a uri rule, as SA will always extract all the URIs and build the
>> input text, even if you aren't using it.
>
> Great information Matt, thanks. 
No problem.
