On 07/05/11 08:34, Dominique Pellé wrote:
Hi

Consider the Ex command ":s/ \?/ /g".

It transforms a line "foo bar" into " f o o b a r " (good so far).

Now, if the line contains multibyte characters, it no longer works.
If the current line contains "café bar", for example, the command
transforms it into "c a f<c3>  <a9>  b a r"  (bad): the two bytes of
the UTF-8 encoded "é" (0xc3 0xa9) are split apart.

Tested with Vim-7.3.177 on Linux with a utf-8 locale.

Regards
-- Dominique


Yes, apparently \? not only matches everywhere (which is expected), it even matches between the individual bytes of a multibyte UTF-8 character (which I wouldn't have expected). OTOH,

        :s/./\0 /g

correctly yields "c a f é   b a r " (every character, including the original space, is followed by a space), showing that the dot did match the two-byte e-acute as a unit. This suggests a workaround:

        :s/./ \0 /g

followed by

        :s/ \+/ /g

This pair of commands should (untested) leave the non-space characters separated by exactly one space each, with one space at the start and end of the line, without inserting anything in the middle of a multibyte character. I don't know how it would handle composing characters, though.
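
As a rough illustration of the difference between matching per character and matching per byte, here is a small Python sketch of the same two-step substitution. It is only an analogy: Python's `re` module is not Vim's regex engine, and the function names `space_out` and `space_out_bytes` are made up for this example. It also shows that Python's `.` treats a combining accent as a separate character, which may or may not be what Vim does.

```python
import re


def space_out(text: str) -> str:
    """Two-step workaround on decoded text: '.' matches one code point."""
    padded = re.sub(r".", r" \g<0> ", text)   # like :s/./ \0 /g
    return re.sub(r" +", " ", padded)         # like :s/ \+/ /g


def space_out_bytes(raw: bytes) -> bytes:
    """Same substitution on raw bytes: '.' matches one byte."""
    padded = re.sub(rb".", rb" \g<0> ", raw)
    return re.sub(rb" +", b" ", padded)


line = "caf\u00e9 bar"  # precomposed e-acute, two bytes (c3 a9) in UTF-8

# Character-wise: the e-acute stays intact.
print(repr(space_out(line)))

# Byte-wise: a space lands between 0xc3 and 0xa9, so the result is no
# longer valid UTF-8 (similar in spirit to the reported bug).
print(space_out_bytes(line.encode("utf-8")))

# Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
# Python's '.' matches the accent separately from its base letter.
print(repr(space_out("cafe\u0301 bar")))
```

The byte-wise result cannot be decoded back as UTF-8, which is the same kind of damage as the `<c3>  <a9>` in the report. The last call suggests that composing characters deserve a separate test in Vim before relying on the workaround.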


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
86. E-mail Deficiency Depression (EDD) forces you to e-mail yourself.

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
