On 07/08/10 20:41, Benjamin R. Haskell wrote:
On Thu, 5 Aug 2010, Tim Chase wrote:
On 08/05/10 00:17, Bee wrote:
Too subtle for me!
I have looked and searched, this is the only difference I can find.
*[:blank:]* [:blank:] space and tab characters
*[:space:]* [:space:] whitespace characters
What other whitespace characters are there besides space and tab?
On the Mac, the non-breaking space is 0xA0, and neither class finds it.
Searching for /\%xA0 does find the Mac non-breaking space.
You have the right idea. Remember that a [...] character-class can be
prefixed by "\_" to include newlines, so you might do something like
/the\_[[:space:]]\+brackets
(finds a match in my help on those POSIX-style character-classes)
whereas it won't find a match if you use [[:blank:]]
There are other Unicode whitespace characters (such as thin-space and
perhaps your non-breaking space, and other similar variants) so
[:blank:] is "JUST tabs and spaces" while [:space:] should find any of
the more generic whitespace.
So, Vim's [:space:] and [:blank:] don't seem to match Unicode spaces,
which differs from Perl's [:space:] and [:blank:].
Here's a complete list of what matches for me in perl 5.12.1, using a
test program[1]:
Unicode name             ║Hex   ║Dec  ║:space:║:blank:
═════════════════════════╬══════╬═════╬═══════╬═══════
CHARACTER TABULATION     ║\u0009║9    ║1      ║1
LINE FEED (LF)           ║\u000a║10   ║1      ║0
LINE TABULATION          ║\u000b║11   ║1      ║0
FORM FEED (FF)           ║\u000c║12   ║1      ║0
CARRIAGE RETURN (CR)     ║\u000d║13   ║1      ║0
SPACE                    ║\u0020║32   ║1      ║1
OGHAM SPACE MARK         ║\u1680║5760 ║1      ║1
MONGOLIAN VOWEL SEPARATOR║\u180e║6158 ║1      ║1
EN QUAD                  ║\u2000║8192 ║1      ║1
EM QUAD                  ║\u2001║8193 ║1      ║1
EN SPACE                 ║\u2002║8194 ║1      ║1
EM SPACE                 ║\u2003║8195 ║1      ║1
THREE-PER-EM SPACE       ║\u2004║8196 ║1      ║1
FOUR-PER-EM SPACE        ║\u2005║8197 ║1      ║1
SIX-PER-EM SPACE         ║\u2006║8198 ║1      ║1
FIGURE SPACE             ║\u2007║8199 ║1      ║1
PUNCTUATION SPACE        ║\u2008║8200 ║1      ║1
THIN SPACE               ║\u2009║8201 ║1      ║1
HAIR SPACE               ║\u200a║8202 ║1      ║1
LINE SEPARATOR           ║\u2028║8232 ║1      ║0
PARAGRAPH SEPARATOR      ║\u2029║8233 ║1      ║0
NARROW NO-BREAK SPACE    ║\u202f║8239 ║1      ║1
MEDIUM MATHEMATICAL SPACE║\u205f║8287 ║1      ║1
IDEOGRAPHIC SPACE        ║\u3000║12288║1      ║1
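To make the difference between the two columns explicit, here is a minimal Python sketch that treats the Perl table above as data and lists the code points that are :space: but not :blank:. The ROWS list is only a transcription of that table, and the variable names are made up for illustration; it does not query Perl or Vim.

```python
# Rows transcribed from the Perl 5.12.1 table above:
# (code point, matches :space:, matches :blank:)
ROWS = [
    (0x0009, 1, 1), (0x000A, 1, 0), (0x000B, 1, 0), (0x000C, 1, 0),
    (0x000D, 1, 0), (0x0020, 1, 1), (0x1680, 1, 1), (0x180E, 1, 1),
    # U+2000 through U+200A all match both classes in the table
    *[(cp, 1, 1) for cp in range(0x2000, 0x200B)],
    (0x2028, 1, 0), (0x2029, 1, 0), (0x202F, 1, 1), (0x205F, 1, 1),
    (0x3000, 1, 1),
]

# Code points that are whitespace but not "blank"
space_only = [cp for cp, space, blank in ROWS if space and not blank]

print(", ".join(f"U+{cp:04X}" for cp in space_only))
```

In that table, the characters left over are the line-breaking ones (LF, VT, FF, CR, and the line and paragraph separators), so :blank: amounts to the horizontal subset of :space:.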
But, using a short test Vim script[2], I get only the non-Unicode
spaces:
dec 9 space 1 blank 1
dec 10 space 1 blank 0
dec 11 space 1 blank 0
dec 12 space 1 blank 0
dec 13 space 1 blank 0
dec 32 space 1 blank 1
I was surprised by that, but also surprised that NO-BREAK SPACE (\u00a0
[decimal 160]) didn't show up in either list.
Any reason the Unicode spaces in general don't match in Vim?
It's documented, a few paragraphs below the list of character classes:
These items only work for 8-bit characters.
Characters in the range 0x80 to 0xFF are 8-bit in 8-bit encodings, but
in UTF-8 everything above 0x7F is multibyte (and what counts here is not
the 'fileencoding' used to represent your data on disk, but the
'encoding' used to represent it in memory, which is how the data is
represented when the search looks at it). For instance the no-break
space U+00A0 is represented as 0xC2 0xA0 (2 bytes), the ideographic
space U+3000 is represented as 0xE3 0x80 0x80 (3 bytes), etc.
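The byte counts above are easy to check outside Vim. The following is a minimal Python sketch, not anything taken from Vim itself; the helper name utf8_hex is hypothetical, and the code points are the ones mentioned in this thread.

```python
def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of a code point as space-separated hex."""
    return " ".join(f"0x{b:02X}" for b in chr(code_point).encode("utf-8"))


# TAB, SPACE, NO-BREAK SPACE, IDEOGRAPHIC SPACE
for cp in (0x09, 0x20, 0xA0, 0x3000):
    n_bytes = len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_hex(cp)} ({n_bytes} byte(s))")

# In an 8-bit encoding such as Latin-1, U+00A0 is the single byte 0xA0.
latin1_nbsp = chr(0xA0).encode("latin-1")
print(f"U+00A0 in latin-1: {len(latin1_nbsp)} byte(s)")
```

Only the code points up to 0x7F come out as a single byte in UTF-8, which lines up with the six characters the Vim test reported.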
Best regards,
Tony.
--
Dogs must have a permit signed by the mayor in order to congregate in groups
of three or more on private property.
[real standing law in Oklahoma, United States of America]