On Aug 7, 2:50 pm, Tony Mechelynck <[email protected]>
wrote:
> On 07/08/10 20:41, Benjamin R. Haskell wrote:
> > On Thu, 5 Aug 2010, Tim Chase wrote:
> > >> On 08/05/10 00:17, Bee wrote:
> >>> Too subtle for me!
>
> >>> I have looked and searched, this is the only difference I can find.
>
> >>> *[:blank:]*     [:blank:]     space and tab characters
> >>> *[:space:]*     [:space:]     whitespace characters
>
> >>> What other whitespace characters are there besides space and tab?
>
> >>> On the Mac, the non-breaking space is xA0 and neither finds it.
>
> >>> Search for /\%xA0 finds the Mac non-breaking space.
>
> >> You have the right idea.  Remember that a [...] character-class can be
> >> prefixed by "\_" to include newlines, so you might do something like
>
> >>    /the\_[[:space:]]\+brackets
>
> >> (finds a match in my help on those POSIX-style character-classes)
> >> whereas it won't find a match if you use [[:blank:]]
>
> >> There are other Unicode whitespace characters (such as thin-space and
> >> perhaps your non-breaking space, and other similar variants) so
> >> [:blank:] is "JUST tabs and spaces" while [:space:] should find any of
> >> the more generic whitespace.
>
> > So, Vim's [:space:] and [:blank:] don't seem to match Unicode spaces,
> > which differs from Perl's [:space:] and [:blank:].
>
> > Here's a complete list of what matches for me in perl 5.12.1, using a
> > test program[1]:
>
> > Unicode name             ║Hex   ║Dec  ║:space:║:blank:
> > ═════════════════════════╬══════╬═════╬═══════╬═══════
> > CHARACTER TABULATION     ║\u0009║9    ║1      ║1
> > LINE FEED (LF)           ║\u000a║10   ║1      ║0
> > LINE TABULATION          ║\u000b║11   ║1      ║0
> > FORM FEED (FF)           ║\u000c║12   ║1      ║0
> > CARRIAGE RETURN (CR)     ║\u000d║13   ║1      ║0
> > SPACE                    ║\u0020║32   ║1      ║1
> > OGHAM SPACE MARK         ║\u1680║5760 ║1      ║1
> > MONGOLIAN VOWEL SEPARATOR║\u180e║6158 ║1      ║1
> > EN QUAD                  ║\u2000║8192 ║1      ║1
> > EM QUAD                  ║\u2001║8193 ║1      ║1
> > EN SPACE                 ║\u2002║8194 ║1      ║1
> > EM SPACE                 ║\u2003║8195 ║1      ║1
> > THREE-PER-EM SPACE       ║\u2004║8196 ║1      ║1
> > FOUR-PER-EM SPACE        ║\u2005║8197 ║1      ║1
> > SIX-PER-EM SPACE         ║\u2006║8198 ║1      ║1
> > FIGURE SPACE             ║\u2007║8199 ║1      ║1
> > PUNCTUATION SPACE        ║\u2008║8200 ║1      ║1
> > THIN SPACE               ║\u2009║8201 ║1      ║1
> > HAIR SPACE               ║\u200a║8202 ║1      ║1
> > LINE SEPARATOR           ║\u2028║8232 ║1      ║0
> > PARAGRAPH SEPARATOR      ║\u2029║8233 ║1      ║0
> > NARROW NO-BREAK SPACE    ║\u202f║8239 ║1      ║1
> > MEDIUM MATHEMATICAL SPACE║\u205f║8287 ║1      ║1
> > IDEOGRAPHIC SPACE        ║\u3000║12288║1      ║1
>
> > But, using a short test Vim script[2], I get only the non-Unicode
> > spaces:
>
> > dec 9  space 1 blank 1
> > dec 10 space 1 blank 0
> > dec 11 space 1 blank 0
> > dec 12 space 1 blank 0
> > dec 13 space 1 blank 0
> > dec 32 space 1 blank 1
>
> > I was surprised by that, but also surprised that NO-BREAK SPACE (\u00a0
> > [decimal 160]) didn't show up in either list.
>
> > Any reason the Unicode spaces in general don't match in Vim?
>
> It's documented, a few paragraphs below the [:list:]:
>
>           These items only work for 8-bit characters.
>
> Characters in the range 0x80 to 0xFF are 8-bit in 8-bit encodings, but
> in UTF-8 everything above 0x7F is multibyte (and what counts here is not
> the 'fileencoding' used to represent your data on disk, but the
> 'encoding' used to represent it in memory, which is how the data is
> represented when the search looks at it). For instance the no-break
> space U+00A0 is represented as 0xC2 0xA0 (2 bytes), the ideographic
> space U+3000 is represented as 0xE3 0x80 0x80 (3 bytes), etc.
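
The byte sequences quoted above can be checked outside Vim. Here is a minimal sketch in Python (chosen only for convenience, it is not used anywhere in this thread; the helper name encoded_bytes is made up for illustration). It encodes the same code point under an 8-bit encoding and under UTF-8:

```python
# Compare how one code point is stored under an 8-bit encoding and under UTF-8.
def encoded_bytes(ch, encoding):
    """Return the bytes of ch in the given encoding as a hex string."""
    return " ".join(f"0x{b:02X}" for b in ch.encode(encoding))

nbsp = "\u00a0"         # NO-BREAK SPACE
ideographic = "\u3000"  # IDEOGRAPHIC SPACE

print(encoded_bytes(nbsp, "latin-1"))       # single byte in an 8-bit encoding
print(encoded_bytes(nbsp, "utf-8"))         # two bytes in UTF-8
print(encoded_bytes(ideographic, "utf-8"))  # three bytes in UTF-8
```

If Vim's 'encoding' is an 8-bit one, the no-break space is the single byte 0xA0; with 'encoding' set to utf-8 it becomes the two-byte sequence.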

vim 7.2.446 Mac terminal

I searched for [:list:], and ran helpgrep for :list: just to be sure, but
found nothing. I then went to ftp.nluug.nl::Vim/runtime/doc to get
(maybe) a more recent pattern.txt file. Still nothing.

Is that info from vim 7.3?

I did find the phrase: "These items only work for 8-bit characters."
But no more info.

Search for /\%xA0 finds the Mac non-breaking space.
I have not seen it preceded by 0xC2.

I guess I need to use something like:

[[:space:]\xA0]\+
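
The same idea can be tried outside Vim. This is only a rough analog in Python, assuming that re.ASCII makes \s behave like the six ASCII whitespace characters that Vim's [:space:] matched in the test above; the sample string is made up for illustration:

```python
import re

# With re.ASCII, \s matches only [ \t\n\r\f\v] (ASCII whitespace).
ascii_space = re.compile(r"\s+", re.ASCII)
# Widen the class by hand so it also accepts NO-BREAK SPACE (0xA0).
space_or_nbsp = re.compile(r"[\s\xa0]+", re.ASCII)

sample = "foo \xa0bar\xa0baz"  # hypothetical text containing no-break spaces

ascii_runs = ascii_space.findall(sample)      # skips the no-break spaces
widened_runs = space_or_nbsp.findall(sample)  # picks them up as well
print(ascii_runs)
print(widened_runs)
```

The ASCII-only class stops at the no-break space, while the widened class treats it as part of the whitespace run.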

If I copy the "Unicode table" quoted above from the vim_use website,
which has HTML non-breaking spaces, and paste it into vim.app (a gvim for
the Mac that I like better than MacVim), it shows as " =", that is 0x20 0xA0
(2 bytes), with the "=" representing 0xA0.

BUT pasted into terminal vim with:
set pastetoggle=<F11>
then it does show as 0xC2 0xA0 (2 bytes)
--AND--
/[[:space:]]
will find the 2 byte sequence!

Thank you for the explanation.

-Bill
