Dan wrote:
>> In 3 ^ is the first character of the regex, just as it is in 1 and 2. It
>> is also inside the delimiters, just like 1 and 2. In example 3 @ is
>> being used as a delimiter, and ^ is the first character after it.
>
> Are you saying that in URIs, any character (@ in this case) can serve
> as the delimiter, so long as it displays after the m and again at the
> end of the entry?

Well, any non-alphanumeric, non-whitespace character can be used, i.e. any punctuation.
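
As a rough sketch of what that looks like in a rule file, here is one uri rule written three ways. The rule names and example.com are made up for illustration and are not from the examples discussed above. All three should match the same URIs; only the delimiter differs.

```text
# Default / delimiter: every literal / in the pattern needs a backslash.
uri  LOCAL_EX_SLASH  /^https?:\/\/example\.com\//i

# m with ! as the delimiter: the slashes can be written as-is.
uri  LOCAL_EX_BANG   m!^https?://example\.com/!i

# m with @ as the delimiter: ^ is still the first character of the regex.
uri  LOCAL_EX_AT     m@^https?://example\.com/@i
```

The second and third forms exist purely for readability. Whichever character is picked as the delimiter would itself need escaping if it showed up inside the pattern.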

Actually, this is true of ANY SA rule, not just URIs. The use of m to set up a regex delimiter is just part of the perl regex syntax, which SA supports all of. It's called the "match operator". So these all mean the same thing:

/foo/
m/foo/
m!foo!

Just be wary of what you use as a delimiter. Choosing something other than / should only be done to make things easier to read. It also overrides that character's normal use until the end of the regex.

You can find a lot of detail about using the match operator (m) for this purpose in section 7.4.3 of:

http://www.unix.org.ua/orelly/perl/learn/ch07_04.htm

(Note: that page is oriented toward general perl programming, so a lot of things in there are not so relevant.)

>
> I'm beginning to realize how many of my learning curve issues are
> attempts to understand the very structure of a system created with a
> bare minimum of structure.

Heh, it's not that bad.. but there are a lot of advanced quirks you'll see people using from their knowledge of heavy perl wizardry.

>
>
>> There is definitely a VERY significant performance penalty to using
>> rawbody over URI, for any rule.
>>
>> Consider the size of input. A rawbody regex must be run against the
>> entire text of the body after QP decoding. A uri regex must be run
>> against all the text of the URIs that SA found. There is likely to be at
>> least a 100:1 difference in size of input. There's no "penalty" for
>> using a uri rule, as SA will always extract all the URIs and build the
>> input text, even if you aren't using it.
>
> Great information Matt, thanks.

No problem.
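
For comparison, a minimal sketch of the same hypothetical pattern written as a uri rule and as a rawbody rule. The rule names and example.com are assumptions for illustration only. The uri version can anchor with ^ because it is tested against each extracted URI; the rawbody version cannot, since the URL may sit anywhere in the body text.

```text
# Tested only against the URIs SA has already extracted from the message.
uri      LOCAL_EX_URI      m!^https?://example\.com/!i

# Tested against the decoded body text, which is a much larger input.
rawbody  LOCAL_EX_RAWBODY  m!https?://example\.com/!i
```

When the thing being matched is a URL, the uri form is the one to prefer, for the input-size reason quoted above.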