On Thu, Dec 15, 2011 at 4:42 PM, Tony Mechelynck <
[email protected]> wrote:

> On 15/12/11 22:15, Graham Lawrence wrote:
>
>> How can I find non-printing characters in a text?  I do not know which
>> specific characters I'm looking for, only that two different such
>> exist.  I have tried /Ctrl+V Ctrl+A thru Z to no avail.  Others that I
>> found visually appeared in vim as ~V ~W etc, but /~ would not go to any
>> of them so the tilde must designate tokens for something else.  As the
>> text was derived from html, I suspect what I'm looking for are those
>> curly opening and closing double-quotes.
>>
>> --
>> You received this message from the "vim_use" maillist.
>> Do not top-post! Type your reply below the text you are replying to.
>> For more information, visit http://www.vim.org/maillist.php
>>
>
> For Latin1, the nonprinting characters are 0x00 to 0x1F (Ctrl-@ to
> Ctrl-_) and 0x7F to 0x9F (Ctrl-? to Ctrl-Alt-_). The following mappings
> ought to find them (assuming 'magic' and 'nocompatible'):
>
>  :map <F4> /[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
>  :map <S-F4> ?[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
>
> Note: this considers the space (0x20), the no-break space (0xA0) and the
> soft hyphen (0xAD) as "printing", the tab (0x09), carriage return (0x0D)
> and form feed (0x0C) as "nonprinting"; it also does not regard the
> end-of-line character (0x0A under Unix, 0x0D followed by 0x0A under
> Windows, 0x0D under Mac OS 9 or earlier) as part of the line. If your
> assumptions are different, a more or less trivial modification of the above
> mappings should suit you.
>
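
The same collection can also be used without a mapping, for highlighting or counting. A minimal sketch, assuming a Latin1 buffer and taking the control ranges as 0x00-0x1F and 0x7F-0x9F; the group name NonPrinting is made up for this example and is not a built-in group:

```vim
" Hypothetical highlight group; any name and colours will do.
highlight NonPrinting ctermbg=red guibg=red

" Highlight every Latin1 control character in the current window.
match NonPrinting /[\x00-\x1f\x7f-\x9f]/

" Count the matches without changing the text (the n flag only reports;
" Vim gives E486 if there are none).
%s/[\x00-\x1f\x7f-\x9f]//gn
```

As with the mappings, this treats the tab (0x09) as nonprinting; ":match none" clears the highlighting again.
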
> For UTF-8 it's harder since there is a limit (257 or 258 I think) to the
> number of different characters that a collection can match, and OTOH there
> are non-printing characters all over the Unicode range, especially if you
> include "noncharacters", "invalid codepoints", unpaired surrogates (or any
> surrogates, even paired, if found in other than UTF-16 be or le) and
> "private-use" codepoints.
>
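
For the curly quotes suspected in the original question, a narrower search may be enough. A sketch, assuming 'encoding' is utf-8 and that the characters really are U+201C and U+201D (a guess based on the question, not something confirmed in this thread). Both lines are searches typed in Normal mode:

```vim
" Jump to the next left or right curly double quote.
/[\u201c\u201d]

" Or jump to the next character outside 7-bit ASCII, whatever it is.
/[^\x00-\x7f]
```

The second form sidesteps the problem of enumerating non-printing Unicode characters by matching anything that is not plain ASCII; ga on the match then shows which character it is.
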
> To find _only_ invalid UTF-8 bytes (in Latin1 text), use 8g8 in Normal
> mode.
>
> To find the value of the character under the cursor (as a printable
> character if it is one, and in decimal, octal and hex), use ga
>
> The representation ^A ~B |C (usually in blue) used by Vim for characters
> declared as not part of 'isprint', means Ctrl-A, Ctrl-Alt-B, Alt-C. See the
> option's help for details.
>
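
If the ^A and ~B notation is hard to read, the display can be changed. A small sketch using standard options, to be typed at the : prompt or put in a vimrc:

```vim
" Show which characters Vim currently treats as printable.
set isprint?

" Display unprintable characters as <xx> in hex instead of ^A or ~B.
set display+=uhex
```
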
> see
>        :help /[]
>        :help /\]
>        :help map_backslash
>        :help 8g8
>        :help ga
>        :help 'isprint'
>        http://www.unicode.org/charts/
> and in particular
>        http://www.unicode.org/charts/PDF/U0000.pdf
>        http://www.unicode.org/charts/PDF/U0080.pdf
>
> (about the latter two, note that Unicode codepoints U+0000 to U+00FF are
> the 256 characters of Latin1 in the same order).
>
> Best regards,
> Tony.
> --
> Conscience is a mother-in-law whose visit never ends.
>                -- H. L. Mencken
>

Many thanks, just what I needed.

Graham
