Re: using regexp to search for Unicode code points and properties

Dennis Benzinger Thu, 20 Aug 2009 08:47:55 -0700

On 20.08.2009 15:47, Brian Anderson wrote:
> I'm interested in learning how to use regular expressions in Vi(m) to 
> search for Unicode code points.
> 
> In a book about regexp, it describes how to search for Unicode code 
> points by various means, and for various programming languages.
> 
> The book describes searching for a specific Unicode code point as \u2122 
> or \x{2122}.
> 
> From what I've seen in the Vim help files, \u matches uppercase
> characters, not Unicode code points, and \x matches hexadecimal digits.
> 
> The book also talks about using Unicode properties or categories in the
> search. The book indicates there are 30 Unicode categories, grouped into 
> 7 super-categories.
> For example, \p{Ll} would find any lowercase letter that has an 
> uppercase variant, and \p{Lo} any letter or ideograph that does not have 
> lowercase and uppercase variants.
> 
> Unicode blocks are written as \p{IsGreekExtended}. Blocks consist of a
> single range of code points. Example: any code point between
> U+0000...U+007F can be matched with \p{InBasicLatin}.
> 
> Unicode script is \p{Greek}. Each Unicode code point is part of only one 
> Unicode script. So if I wanted to search for any Greek letter, I'd use 
> \p{Greek}.
> 
> Unicode grapheme is \X or \P{M}. This would be either single codepoints 
> (U+00E0 Latin small letter a with grave accent) or combined codepoints 
> (U+0061 Latin small letter a + U+0300 combining grave accent).
> 
> Help on any of these, either in examples or where to look in the help 
> files, welcome.
> [...]


See :help /\%u for how to search for a character by its code point.
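
A few patterns as a rough sketch of what that help section covers. They are
typed in Normal mode; the lines starting with " are only comments. The code
points are arbitrary examples, and the ranges are my own stand-ins for the
book's block classes, since as far as I know Vim has no \p{...} property
classes. Please check them against your Vim version and 'encoding' setting.

```vim
" Match U+2122 TRADE MARK SIGN (\%u takes up to four hex digits)
/\%u2122

" Code points above U+FFFF need \%U (up to eight hex digits)
/\%U1F600

" Inside [] the escape is \u without the %, so a range can
" approximate a block. Greek and Coptic, U+0370..U+03FF:
/[\u0370-\u03ff]

" Any character outside Basic Latin (U+0000..U+007F):
/[^\x00-\x7f]
```

To find out which code point to search for, put the cursor on a character
and type ga. For combining characters, :help /\%C and :help /\Z are the
places to look.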


HTH,
Dennis Benzinger

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
