On Thu, 5 Aug 2010, Tim Chase wrote:
> On 08/05/10 00:17, Bee wrote:
> > Too subtle for me!
> >
> > I have looked and searched, this is the only difference I can find.
> >
> > *[:blank:]* [:blank:] space and tab characters
> > *[:space:]* [:space:] whitespace characters
> >
> > What other whitespace characters are there besides space and tab?
> >
> > On the Mac, non-breaking space is xA0, and neither one finds it.
> >
> > Search for /\%xA0 finds the Mac non-breaking space.
>
> You have the right idea. Remember that a [...] character-class can be
> prefixed by "\_" to include newlines, so you might do something like
>
> /the\_[[:space:]]\+brackets
>
> (finds a match in my help on those POSIX-style character-classes)
> whereas it won't find a match if you use [[:blank:]]
>
> There are other Unicode whitespace characters (such as thin-space and
> perhaps your non-breaking space, and other similar variants) so
> [:blank:] is "JUST tabs and spaces" while [:space:] should find any of
> the more generic whitespace.

So, Vim's [:space:] and [:blank:] don't seem to match Unicode spaces,
which differs from Perl's [:space:] and [:blank:].

Here's a complete list of what matches for me in Perl 5.12.1, using a
test program[1]:

Unicode name             ║Hex   ║Dec  ║:space:║:blank:
═════════════════════════╬══════╬═════╬═══════╬═══════
CHARACTER TABULATION     ║\u0009║9    ║1      ║1
LINE FEED (LF)           ║\u000a║10   ║1      ║0
LINE TABULATION          ║\u000b║11   ║1      ║0
FORM FEED (FF)           ║\u000c║12   ║1      ║0
CARRIAGE RETURN (CR)     ║\u000d║13   ║1      ║0
SPACE                    ║\u0020║32   ║1      ║1
OGHAM SPACE MARK         ║\u1680║5760 ║1      ║1
MONGOLIAN VOWEL SEPARATOR║\u180e║6158 ║1      ║1
EN QUAD                  ║\u2000║8192 ║1      ║1
EM QUAD                  ║\u2001║8193 ║1      ║1
EN SPACE                 ║\u2002║8194 ║1      ║1
EM SPACE                 ║\u2003║8195 ║1      ║1
THREE-PER-EM SPACE       ║\u2004║8196 ║1      ║1
FOUR-PER-EM SPACE        ║\u2005║8197 ║1      ║1
SIX-PER-EM SPACE         ║\u2006║8198 ║1      ║1
FIGURE SPACE             ║\u2007║8199 ║1      ║1
PUNCTUATION SPACE        ║\u2008║8200 ║1      ║1
THIN SPACE               ║\u2009║8201 ║1      ║1
HAIR SPACE               ║\u200a║8202 ║1      ║1
LINE SEPARATOR           ║\u2028║8232 ║1      ║0
PARAGRAPH SEPARATOR      ║\u2029║8233 ║1      ║0
NARROW NO-BREAK SPACE    ║\u202f║8239 ║1      ║1
MEDIUM MATHEMATICAL SPACE║\u205f║8287 ║1      ║1
IDEOGRAPHIC SPACE        ║\u3000║12288║1      ║1

But, using a short test Vim script[2], I get only the non-Unicode
spaces:

    dec 9 space 1 blank 1
    dec 10 space 1 blank 0
    dec 11 space 1 blank 0
    dec 12 space 1 blank 0
    dec 13 space 1 blank 0
    dec 32 space 1 blank 1

I was surprised by that, and also surprised that NO-BREAK SPACE (\u00a0,
decimal 160) didn't show up in either list.

Any reason the Unicode spaces in general don't match in Vim?
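
For anyone who needs this today, one possible workaround is to spell out
the Unicode spaces in the collection by hand. The sketch below is only an
illustration under stated assumptions: the name s:uspace and the sample
code points are made up for the example, the list is copied from the Perl
table above plus NO-BREAK SPACE, and it assumes 'encoding' is utf-8 so
that the \uXXXX items inside [] behave as described under :help /[].

```vim
" Hypothetical workaround: list the Unicode spaces explicitly, since
" [[:space:]] only matched the ASCII ones in the test above.
" Assumes 'encoding' is utf-8; s:uspace is a made-up name.
let s:uspace = '[[:space:]\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]'

" Try a few sample code points: SPACE, NO-BREAK SPACE, EM SPACE,
" IDEOGRAPHIC SPACE, and the letter A as a non-space control.
for ord in [0x20, 0xa0, 0x2003, 0x3000, 0x41]
  let hit = (match(nr2char(ord), s:uspace) < 0) ? 0 : 1
  echo "dec" ord "match" hit
endfor
```

The same collection could be used in a search, for example
/the\_[...]\+brackets with the bracket contents above, though I have not
checked how it interacts with every 'encoding' setting.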
--
Best,
Ben

[1] Perl 'one-liner':

    perl -CDS -Mcharnames=:full -lwe 'BEGIN{print "Unicode name\tHex\tDec\t:space:\t:blank:";}
      for (map chr, 1..0xd700) { $s = /[[:space:]]/; $b = /[[:blank:]]/;
      print join "\t", charnames::viacode(ord), sprintf("\\u%04x",ord), ord,
      $s?1:0, $b?1:0 if $s or $b }'

[2] Vim script (saved as /tmp/script.vim, then :so /tmp/script.vim)

    for n in range(0x4000)
      let ord = n + 1
      let c = nr2char(ord)
      let sp = (match(c, '[[:space:]]') < 0) ? 0 : 1
      let bl = (match(c, '[[:blank:]]') < 0) ? 0 : 1
      if !(sp || bl)
        continue
      endif
      echo "dec" ord "space" sp "blank" bl
    endfor