[pcre-dev] [Bug 1336] Universal Character Name escape code

Philip Hazel Tue, 26 Feb 2013 08:00:52 -0800

------- You are receiving this mail because: -------
You are on the CC list for the bug.


http://bugs.exim.org/show_bug.cgi?id=1336




--- Comment #9 from Philip Hazel <[email protected]>  2013-02-26 16:00:34 ---
I have now done some research on Universal Character Names. It seems that what
you are asking for is a way of matching "any character that may be encoded
using a Universal Character Name" rather than a Universal Character Name
itself, which is of the form \uxxxx (for characters whose Unicode code point is
no greater than U+FFFF) or \Uxxxxxxxx (for all others).
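
To illustrate the distinction, here is a minimal Python sketch of matching a
Universal Character Name itself, that is, the literal escape text in source
code rather than the character it names. The pattern and the name UCN_TEXT are
assumptions made for illustration; they are not part of PCRE, and accepting
both upper- and lower-case hex digits is likewise an assumption.

```python
import re

# Hypothetical pattern: a backslash, then 'u' + 4 hex digits or 'U' + 8 hex digits.
UCN_TEXT = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")

# Raw string, so the backslashes stay in the text as literal characters.
source = r"caf\u00E9 and \U0001F600 but not \u12"

found = UCN_TEXT.findall(source)
print(found)
```

This finds the two complete escapes and skips the truncated one. It says
nothing about which characters are allowed to be written that way, which is
the question the proposal below addresses.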

We do already have some "private" Unicode property names in PCRE, for example,
Xan for any Unicode alphanumeric character. They all begin with the letter X. I
propose to add Xuc ("universally-named character", keeping it down to 3
letters) which will match $ @ ` and all characters from \x{a0} upwards except
for the excluded range \x{d800} to \x{dfff}. These are the only characters that
are permitted to be specified using Universal Character Names. Most "base
characters" such as ASCII letters are not permitted. The PCRE syntax will
therefore be \p{Xuc}.
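
The set described above can be written out as a small predicate. This is only
a Python sketch of the membership rule as stated in this comment, not PCRE's
implementation; the function name is_xuc is hypothetical.

```python
def is_xuc(ch: str) -> bool:
    """Return True if ch is in the set proposed for \\p{Xuc} (sketch only)."""
    if ch in "$@`":
        # The three ASCII characters that are explicitly included.
        return True
    cp = ord(ch)
    # Everything from U+00A0 upwards, except the range U+D800 to U+DFFF.
    return cp >= 0xA0 and not (0xD800 <= cp <= 0xDFFF)


for sample in ["$", "a", "\u00a0", "\u00e9", "\ud800", "\ue000"]:
    print("U+%04X" % ord(sample), is_xuc(sample))
```

Note that an ASCII letter such as "a" is rejected, matching the remark that
most "base characters" are not permitted.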

To match the same set, but without $ @ and ` you should be able to use the
double negative trick:  [^\P{Xuc}$@`]  (compare [^\W_] which matches letters
and digits but not underscore).
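
The double negative can be tried out in Python, whose re module supports
[^\W_] but has no \p{...} property syntax. The sketch below therefore shows
the real [^\W_] class and then mirrors the logic of [^\P{Xuc}$@`] with plain
predicates. The helper names is_xuc and xuc_without_punct are hypothetical,
and is_xuc simply restates the set described in this comment.

```python
import re

# "Not (non-word or underscore)" leaves word characters minus the underscore.
word_chars = re.findall(r"[^\W_]", "a_b1-")
print(word_chars)


def is_xuc(ch: str) -> bool:
    # Assumed membership rule: $ @ ` plus U+00A0 upwards, minus U+D800..U+DFFF.
    cp = ord(ch)
    return ch in "$@`" or (cp >= 0xA0 and not (0xD800 <= cp <= 0xDFFF))


def xuc_without_punct(ch: str) -> bool:
    # Mirrors [^\P{Xuc}$@`]: reject anything that is non-Xuc or one of $ @ `.
    return not ((not is_xuc(ch)) or ch in "$@`")


print([c for c in "a$\u00e9@`\u4e2d" if xuc_without_punct(c)])
```

The negated class lists everything to exclude: characters outside Xuc, plus
the three punctuation characters. What remains is Xuc minus $ @ and `.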


-- 
Configure bugmail: http://bugs.exim.org/userprefs.cgi?tab=email

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev