On Aug 7, 2:50 pm, Tony Mechelynck <[email protected]> wrote:
> On 07/08/10 20:41, Benjamin R. Haskell wrote:
> > On Thu, 5 Aug 2010, Tim Chase wrote:
> >
> >> On 08/05/10 00:17, Bee wrote:
> >>> Too subtle for me!
> >>>
> >>> I have looked and searched, this is the only difference I can find.
> >>>
> >>> *[:blank:]*  [:blank:]  space and tab characters
> >>> *[:space:]*  [:space:]  whitespace characters
> >>>
> >>> What other whitespace characters are there besides space and tab?
> >>>
> >>> On the Mac non-breaking space is xA0 and neither finds it.
> >>>
> >>> Search for /\%xA0 finds the Mac non-breaking space.
> >>
> >> You have the right idea. Remember that a [...] character-class can be
> >> prefixed by "\_" to include newlines, so you might do something like
> >>
> >>     /the\_[[:space:]]\+brackets
> >>
> >> (finds a match in my help on those POSIX-style character-classes)
> >> whereas it won't find a match if you use [[:blank:]]
> >>
> >> There are other Unicode whitespace characters (such as thin-space and
> >> perhaps your non-breaking space, and other similar variants) so
> >> [:blank:] is "JUST tabs and spaces" while [:space:] should find any of
> >> the more generic whitespace.
> >
> > So, Vim's [:space:] and [:blank:] don't seem to match Unicode spaces,
> > which differs from Perl's [:space:] and [:blank:].
> >
> > Here's a complete list of what matches for me in perl 5.12.1, using a
> > test program[1]:
> >
> > Unicode name             ║Hex   ║Dec  ║:space:║:blank:
> > ═════════════════════════╬══════╬═════╬═══════╬═══════
> > CHARACTER TABULATION     ║\u0009║9    ║1      ║1
> > LINE FEED (LF)           ║\u000a║10   ║1      ║0
> > LINE TABULATION          ║\u000b║11   ║1      ║0
> > FORM FEED (FF)           ║\u000c║12   ║1      ║0
> > CARRIAGE RETURN (CR)     ║\u000d║13   ║1      ║0
> > SPACE                    ║\u0020║32   ║1      ║1
> > OGHAM SPACE MARK         ║\u1680║5760 ║1      ║1
> > MONGOLIAN VOWEL SEPARATOR║\u180e║6158 ║1      ║1
> > EN QUAD                  ║\u2000║8192 ║1      ║1
> > EM QUAD                  ║\u2001║8193 ║1      ║1
> > EN SPACE                 ║\u2002║8194 ║1      ║1
> > EM SPACE                 ║\u2003║8195 ║1      ║1
> > THREE-PER-EM SPACE       ║\u2004║8196 ║1      ║1
> > FOUR-PER-EM SPACE        ║\u2005║8197 ║1      ║1
> > SIX-PER-EM SPACE         ║\u2006║8198 ║1      ║1
> > FIGURE SPACE             ║\u2007║8199 ║1      ║1
> > PUNCTUATION SPACE        ║\u2008║8200 ║1      ║1
> > THIN SPACE               ║\u2009║8201 ║1      ║1
> > HAIR SPACE               ║\u200a║8202 ║1      ║1
> > LINE SEPARATOR           ║\u2028║8232 ║1      ║0
> > PARAGRAPH SEPARATOR      ║\u2029║8233 ║1      ║0
> > NARROW NO-BREAK SPACE    ║\u202f║8239 ║1      ║1
> > MEDIUM MATHEMATICAL SPACE║\u205f║8287 ║1      ║1
> > IDEOGRAPHIC SPACE        ║\u3000║12288║1      ║1
> >
> > But, using a short test Vim script[2], I get only the non-Unicode
> > spaces:
> >
> > dec 9 space 1 blank 1
> > dec 10 space 1 blank 0
> > dec 11 space 1 blank 0
> > dec 12 space 1 blank 0
> > dec 13 space 1 blank 0
> > dec 32 space 1 blank 1
> >
> > I was surprised by that, but also surprised that NO-BREAK SPACE
> > (\u00a0 [decimal 160]) didn't show up in either list.
> >
> > Any reason the Unicode spaces in general don't match in Vim?
>
> It's documented, a few paragraphs below the [:list:]:
>
>         These items only work for 8-bit characters.
>
> Characters in the range 0x80 to 0xFF are 8-bit in 8-bit encodings, but
> in UTF-8 everything above 0x7F is multibyte (and what counts here is not
> the 'fileencoding' used to represent your data on disk, but the
> 'encoding' used to represent it in memory, which is how the data is
> represented when the search looks at it). For instance the no-break
> space U+00A0 is represented as 0xC2 0xA0 (2 bytes), the ideographic
> space U+3000 is represented as 0xE3 0x80 0x80 (3 bytes), etc.
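
The byte sequences quoted above can be checked outside Vim. Here is a minimal sketch in Python (chosen only for convenience, it is not used anywhere in this thread, and hex_bytes is a made-up helper name). It shows only how the two characters are encoded and says nothing about how Vim's regex engine treats them:

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Return the encoded form of ch as space-separated hex bytes."""
    return " ".join(f"0x{b:02X}" for b in ch.encode(encoding))


NBSP = "\u00a0"         # NO-BREAK SPACE, decimal 160
IDEOGRAPHIC = "\u3000"  # IDEOGRAPHIC SPACE, decimal 12288

# In an 8-bit encoding such as Latin-1 the no-break space is one byte.
print("U+00A0 in latin-1:", hex_bytes(NBSP, "latin-1"))

# In UTF-8 everything above 0x7F takes more than one byte.
print("U+00A0 in utf-8:  ", hex_bytes(NBSP, "utf-8"))
print("U+3000 in utf-8:  ", hex_bytes(IDEOGRAPHIC, "utf-8"))
```

The Latin-1 line gives the single byte 0xA0, while the UTF-8 lines give the 2-byte and 3-byte sequences from the explanation above.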

vim 7.2.446, Mac terminal

I searched for [:list:] and ran helpgrep for :list: (just to be sure) and
could find nothing. I then went to ftp.nluug.nl::Vim/runtime/doc to get a
(maybe) more recent pattern.txt file. Still nothing. Is that info from
vim 7.3?

I did find the phrase:

    "These items only work for 8-bit characters."

But no more info.

Search for /\%xA0 finds the Mac non-breaking space. I have not seen it
preceded by 0xC2.

I guess I need to use something like:

    [[:space:]\xA0]\+

If I copy the above commented "unicode table" from the vim_use website,
which has HTML non-breaking spaces, and paste it into vim.app (a gvim for
the Mac I like better than MacVim), it shows as " =", that is 0x20 0xA0
(2 bytes), the "=" representing 0xA0.

BUT pasted into terminal vim with:

    set pastetoggle=<F11>

then it does show as 0xC2 0xA0 (2 bytes) --AND--

    /[[:space:]]

will find the 2 byte sequence!

Thank you for the explanation.

-Bill
