On Thu, Dec 15, 2011 at 4:42 PM, Tony Mechelynck < [email protected]> wrote:
> On 15/12/11 22:15, Graham Lawrence wrote:
>
>> How can I find non-printing characters in a text? I do not know which
>> specific characters I'm looking for, only that two different such
>> exist. I have tried /Ctrl+V Ctrl+A thru Z to no avail. Others that I
>> found visually appeared in vim as ~V ~W etc, but /~ would not go to any
>> of them so the tilde must designate tokens for something else. As the
>> text was derived from html, I suspect what I'm looking for are those
>> curly opening and closing double-quotes.
>>
>> --
>> You received this message from the "vim_use" maillist.
>> Do not top-post! Type your reply below the text you are replying to.
>> For more information, visit http://www.vim.org/maillist.php
>>
>
> For Latin1, the nonprinting characters are 0x00 to 0x1F (Ctrl-@ to
> Ctrl-_) and 0x7F to 0x9F (Ctrl-? to Ctrl-Alt-_). The following mappings
> ought to find them (assuming 'magic' and 'nocompatible'):
>
>     :map <F4> /[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
>     :map <S-F4> ?[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
>
> Note: this considers the space (0x20), the no-break space (0xA0) and the
> soft hyphen (0xAD) as "printing", the tab (0x09), carriage return (0x0D)
> and form feed (0x0C) as "nonprinting"; it also does not regard the
> end-of-line character (0x0A under Unix, 0x0D followed by 0x0A under
> Windows, 0x0D under Mac OS 9 or earlier) as part of the line. If your
> assumptions are different, a more or less trivial modification of the above
> mappings should suit you.
>
> For UTF-8 it's harder since there is a limit (257 or 258 I think) to the
> number of different characters that a collection can match, and OTOH there
> are non-printing characters all over the Unicode range, especially if you
> include "noncharacters", "invalid codepoints", unpaired surrogates (or any
> surrogates, even paired, if found in other than UTF-16BE or UTF-16LE) and
> "private-use" codepoints.
>
> To find _only_ invalid UTF-8 bytes (in Latin1 text), use 8g8 in Normal
> mode.
>
> To find the value of the character under the cursor (as a printable
> character if it is one, and in decimal, octal and hex), use ga
>
> The representation ^A ~B |C (usually in blue) used by Vim for characters
> declared as not part of 'isprint', means Ctrl-A, Ctrl-Alt-B, Alt-C. See the
> option's help for details.
>
> see
>     :help /[]
>     :help /\]
>     :help map_backslash
>     :help 8g8
>     :help ga
>     :help 'isprint'
>     http://www.unicode.org/charts/
> and in particular
>     http://www.unicode.org/charts/PDF/U0000.pdf
>     http://www.unicode.org/charts/PDF/U0080.pdf
>
> (about the latter two, note that Unicode codepoints U+0000 to U+00FF are
> the 256 characters of Latin1 in the same order).
>
> Best regards,
> Tony.
> --
> Conscience is a mother-in-law whose visit never ends.
>                 -- H. L. Mencken
>

Many thanks, just what I needed.

Graham
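
For the UTF-8 case and the curly quotes suspected in the original question, a minimal sketch follows. It assumes 'encoding' is utf-8 and that the characters in question are the typographic quotes U+2018, U+2019, U+201C and U+201D. Neither assumption is confirmed anywhere in the thread, and the substitution at the end is only one possible cleanup.

```vim
" Assumption: 'encoding' is utf-8 (not stated in the thread).

" Jump to the next character outside the 7-bit ASCII range, whatever it is.
/[^\x00-\x7F]

" Jump to the next curly single or double quote only.
/[\u2018\u2019\u201C\u201D]

" Optionally replace curly double quotes with a plain ASCII double quote.
" The 'e' flag suppresses the error message when nothing matches.
:%s/[\u201C\u201D]/"/ge
```

With the cursor on a match, ga (mentioned above) reports its code point, which confirms whether the guess about curly quotes was right before any substitution is run.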
