[pcre-dev] [Bug 891] Support [[:<:]] and [[:>:]] patterns

Philip Hazel Wed, 23 Sep 2009 05:46:09 -0700

------- You are receiving this mail because: -------
You are on the CC list for the bug.

http://bugs.exim.org/show_bug.cgi?id=891

--- Comment #1 from Philip Hazel <[email protected]>  2009-09-23 13:45:47 ---
On Tue, 22 Sep 2009, Alan Lehotsky wrote:

> Apparently one or more implementations (including possibly Henry Spencer's UCB
> regex code) support these as synonyms for the beginning of a word and the end
> of a word, respectively.
> 
> For compatibility, it would be handy if PCRE recognized these two as well.

Are you sure about that? The patterns [[:<:]] and [[:>:]] look like a 
modification of the POSIX character class syntax - and a character class 
always matches a character. What would be the meaning of [abc[:<:]def] 
for example?

I searched Google for documentation about this and couldn't find any.
What I did find was that several engines use \< and \> for
beginning and end of word. This is incompatible with Perl, and so could 
not be added to PCRE. (In Perl, and PCRE, backslash followed by a non-
alphanumeric character always matches a literal character. That is a 
nice, clean rule, and I would not want to violate it, even with a 
special option.)
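
To illustrate that rule, here is a small sketch that uses Python's re module as a stand-in. This is an assumption: it is not PCRE, but it applies the same rule that a backslash before a non-alphanumeric character makes that character literal. The pattern \<b\> therefore matches the literal text "<b>", not the one-letter word "b" that an engine treating \< and \> as word anchors would find:

```python
import re

# Stand-in illustration with Python's re (not PCRE): a backslash before a
# non-alphanumeric character makes that character literal.
pattern = re.compile(r"\<b\>")

print(bool(pattern.search("<b>")))  # True: \< and \> match literal < and >
print(bool(pattern.search("b")))    # False: they are not word anchors here
```

Giving \< a second meaning would silently change what existing patterns like this one match.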

If you can point me at some documentation that specifies what [[:<:]]
and [[:>:]] actually mean in some other regex engine, I will think about
it. But they are heckishly long sequences, and in Perl and PCRE the
same thing takes only one or two more characters:

\b(?=\w)      start of word
\b(?<=\w)     end of word
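
As a sketch of how these two assertions behave, the following again assumes Python's re module as a stand-in for PCRE: its \b, \w and fixed-length lookbehind follow Perl semantics for ASCII text. The helper boundary_positions is hypothetical and simply lists the offsets where a zero-width pattern matches. Plain \b fires at both ends of every word, and the lookahead or lookbehind narrows it to one side:

```python
import re

# The two assertions from above, compiled with Python's re as a stand-in.
WORD_START = re.compile(r"\b(?=\w)")   # boundary followed by a word char
WORD_END = re.compile(r"\b(?<=\w)")    # boundary preceded by a word char
ANY_BOUNDARY = re.compile(r"\b")       # plain \b, for comparison

def boundary_positions(pattern, text):
    """Hypothetical helper: offsets at which a zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(text)]

text = "foo bar-baz"
print(boundary_positions(ANY_BOUNDARY, text))  # [0, 3, 4, 7, 8, 11]
print(boundary_positions(WORD_START, text))    # [0, 4, 8]
print(boundary_positions(WORD_END, text))      # [3, 7, 11]
```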

Regards,
Philip

-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
