[pcre-dev] [Bug 891] Support [[:<:]] and [[:>:]] patterns

Alan Lehotsky Wed, 23 Sep 2009 12:16:05 -0700

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=891

--- Comment #2 from Alan Lehotsky <[email protected]>  2009-09-23 18:34:34 ---
I had never heard of the syntax either (and I agree that it's not really 
needed for completeness).  But one of my users ran across it.
If I get some free time, I'll try to implement it and contribute the 
code.  I did find a citation (below) to a prior implementation.

Regards,
Al Lehotsky

From http://arglist.com/regex/regex7.html, purporting to be the man pages 
for Spencer's BSD 4.4 regex.

There are two special cases of bracket expressions: the bracket 
expressions `[[:<:]]' and `[[:>:]]' match the null string at the beginning 
and end of a word respectively. A word is defined as a sequence of word 
characters which is neither preceded nor followed by word characters. A 
word character is an alnum character (as defined by ctype(3)) or an 
underscore. This is an extension, compatible with but not specified by 
POSIX 1003.2, and should be used with caution in software intended to be 
portable to other systems. 
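
The definition quoted above can be sketched directly, without any regex engine. This is only an illustration: the helper names are hypothetical, and Python's str.isalnum() (which is Unicode-aware) is assumed as a stand-in for the ctype(3) alnum test.

```python
def is_word_char(ch: str) -> bool:
    # Word character per the quoted man page: alnum or underscore.
    # Assumption: str.isalnum() stands in for ctype(3) isalnum().
    return ch.isalnum() or ch == "_"


def word_starts(text: str) -> list[int]:
    # Offsets where [[:<:]] would match: a word character that is
    # not preceded by a word character.
    return [
        i
        for i in range(len(text))
        if is_word_char(text[i]) and (i == 0 or not is_word_char(text[i - 1]))
    ]


def word_ends(text: str) -> list[int]:
    # Offsets where [[:>:]] would match: just after a word character
    # that is not followed by a word character.
    return [
        i
        for i in range(1, len(text) + 1)
        if is_word_char(text[i - 1])
        and (i == len(text) or not is_word_char(text[i]))
    ]
```

Both functions return offsets of zero-width positions, which is what "match the null string" means in the text above.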

Philip Hazel <[email protected]> 
Sent by: [email protected]
09/23/2009 08:45 AM
Please respond to
[email protected]

To
[email protected]
cc

Subject
[Bug 891] Support [[:<:]] and [[:>:]] patterns

------- You are receiving this mail because: -------
You reported the bug.

http://bugs.exim.org/show_bug.cgi?id=891

--- Comment #1 from Philip Hazel <[email protected]>  2009-09-23 
13:45:47 ---
On Tue, 22 Sep 2009, Alan Lehotsky wrote:

> Apparently one or more implementations (including possibly Henry
> Spencer's UCB regex code) support these as synonyms for the beginning
> of a word and the end of a word respectively.
> 
> It would be handy for compatibility to recognize these two also in PCRE.

Are you sure about that? The patterns [[:<:]] and [[:>:]] look like a 
modification of the POSIX character class syntax - and a character class 
always matches a character. What would be the meaning of [abc[:<:]def] 
for example?

I did a google to try to find any documentation about this, and I 
couldn't. What I did find was that several engines use \< and \> for 
beginning and end of word. This is incompatible with Perl, and so could 
not be added to PCRE. (In Perl, and PCRE, backslash followed by a non-
alphanumeric character always matches a literal character. That is a 
nice, clean rule, and I would not want to violate it, even with a 
special option.)
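
The backslash rule can be illustrated with a short sketch. Python's re module is used here only as a stand-in, on the assumption that it treats a backslash before a non-alphanumeric ASCII character the same way as described for Perl and PCRE; it is not PCRE itself.

```python
import re

# \< and \> are just escaped literals here, not word-boundary assertions,
# so this pattern matches words wrapped in angle brackets.
literal_matches = re.findall(r"\<\w+\>", "<foo> bar <baz>")

# With no literal '<' in the subject, \<foo cannot match at all,
# even though "foo" starts at a word boundary.
no_match = re.search(r"\<foo", "a foo")
```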

If you can point me at some documentation that specifies what [[:<:]] 
and [[:>:]] actually mean in some other regex engine, I will think about 
it. But they are very long sequences, although doing the same thing in 
Perl and PCRE takes one or two characters more:

\b(?=\w)      start of word
\b(?<=\w)     end of word
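
As a rough check of the two equivalents above, here is a sketch using Python's re module, which is assumed to handle \b, \w and lookarounds the same way for these simple cases (it is not PCRE). The function translate_bsd_word_boundaries is a hypothetical helper doing plain text substitution; it does not parse the pattern, so it would also rewrite escaped or nested occurrences.

```python
import re

START_OF_WORD = r"\b(?=\w)"   # boundary with a word character ahead
END_OF_WORD = r"\b(?<=\w)"    # boundary with a word character behind


def translate_bsd_word_boundaries(pattern: str) -> str:
    # Hypothetical helper: naive textual replacement of the BSD
    # word-boundary bracket expressions with the lookaround forms.
    return (
        pattern.replace("[[:<:]]", START_OF_WORD)
        .replace("[[:>:]]", END_OF_WORD)
    )


def match_offsets(pattern: str, text: str) -> list[int]:
    # Start offset of every match, including zero-width ones.
    return [m.start() for m in re.finditer(pattern, text)]
```

For example, match_offsets(translate_bsd_word_boundaries("[[:<:]]ba"), "bar foobar ba") finds "ba" only where it begins a word, skipping the occurrence inside "foobar".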

Regards,
Philip

-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
