BUG: Unicode characters in commands

2008-09-29 Thread Matt Wozniski
On Sun, Sep 28, 2008 at 4:35 PM, Tony Mechelynck wrote:

 On Sun, Sep 28, 2008 at 9:40 AM, John Hughes wrote:
 I am trying to write a command that substitutes some ASCII characters
 with a Unicode character. The following substitution works when
 entered directly:

 :%s/\.\.\./…/eg

 However, when defined as a command, it does not work:

 :com Ellipsis %s/\.\.\./…/eg

 The command :Ellipsis converts

 ...

 into

 â<80><fe>X¦

 Why is this? Is there any way of using Unicode characters in
 substitute commands?

 I'm using gvim 7.2.21, huge build with Gnome2 GUI and 'encoding' set to
 UTF-8. Just like the OP, I see the following:

 - Typing the :s command at the command-line works OK.
 - Defining that :s command as the text of a user command, then running
 that user command, replaces every set of three dots by â<80><fe>X¦ (5
 characters including two invalid UTF-8 sequences, 7 bytes viz. C3 A2 80
 FE 58 C2 A6).
 - Recalling that command definition with :command Ellipsis displays
 the ellipsis character as an ellipsis.
 - The ellipsis is U+2026, in UTF-8 0xE2 0x80 0xA6. Notice that 80 and A6
 appear (though not consecutively) as part of the replace-text actually
 used, and that E2, read as Latin1 and re-encoded as UTF-8, is C3 A2,
 which also appears. This makes me suspect
 that Vim is applying a spurious Latin1-to-UTF8 conversion to what is
 already UTF-8 (with something wrong, maybe buffer-overflow, happening in
 the middle). Another possibility would be using a character length
 instead of a byte length, or vice-versa, at some point in the
 user-command execution.
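
The byte arithmetic in the quoted report can be checked independently of Vim. The following is a minimal Python sketch (Python 3.8 or later is assumed for `bytes.hex(" ")`); the variable names are illustrative only. It encodes U+2026, applies a plain Latin1-to-UTF-8 re-encoding, and compares the result with the seven bytes reported above.

```python
# U+2026 HORIZONTAL ELLIPSIS and its UTF-8 encoding.
ellipsis_utf8 = "\u2026".encode("utf-8")

# Bytes reported in the buffer after running :Ellipsis.
observed = bytes.fromhex("C3 A2 80 FE 58 C2 A6")

# Hypothesis: the UTF-8 bytes were misread as Latin1 and re-encoded as UTF-8.
double_encoded = ellipsis_utf8.decode("latin-1").encode("utf-8")

# Decode the observed bytes; each invalid byte becomes U+FFFD.
decoded = observed.decode("utf-8", errors="replace")

print(ellipsis_utf8.hex(" "))                      # UTF-8 bytes of the ellipsis
print(double_encoded.hex(" "))                     # result of a pure double encoding
print(len(decoded), decoded.count("\ufffd"))       # characters, invalid sequences
print(double_encoded == observed)                  # does the hypothesis match?
```

A pure double encoding yields six bytes (C3 A2 C2 80 C2 A6), not the seven observed, so a Latin1-to-UTF-8 conversion alone does not account for the FE 58 in the middle.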

I can confirm this.  It looks to me like it's not a spurious
Latin1-to-UTF-8 conversion, but an internally-escaped string that's not
un-escaped before being used.  Source-diving, it seems that
mb_unescape() is called to un-escape any multibyte characters when
displaying the command, but that mb_unescape() is never called before
the command is passed to do_cmdline() to be executed.  That seems to
explain why it's displayed properly but executed incorrectly.  I don't
completely follow all of the string escaping being done here, though,
so Bram would know for sure.  I've cross-posted to the vim-dev list
accordingly.
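
The escaping hypothesis can be sketched outside Vim. This is only an illustration under stated assumptions: the constants K_SPECIAL = 0x80, KS_SPECIAL = 0xFE and KE_FILLER = 'X' are what I recall from Vim's keymap.h, and the two helper functions are hypothetical stand-ins, not Vim's actual code. The sketch escapes the 0x80 byte inside the UTF-8 ellipsis and checks whether the resulting three-byte sequence shows up in the bytes reported in the buffer.

```python
K_SPECIAL = 0x80   # assumed: lead byte Vim uses internally for special keys
KS_SPECIAL = 0xFE  # assumed: second byte marking a literal K_SPECIAL
KE_FILLER = 0x58   # assumed: filler third byte, ASCII 'X'


def escape_k_special(data: bytes) -> bytes:
    """Hypothetical helper: follow every 0x80 byte with KS_SPECIAL, KE_FILLER."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == K_SPECIAL:
            out += bytes([KS_SPECIAL, KE_FILLER])
    return bytes(out)


def unescape_k_special(data: bytes) -> bytes:
    """Hypothetical helper: collapse each escaped triple back to a single 0x80."""
    triple = bytes([K_SPECIAL, KS_SPECIAL, KE_FILLER])
    return data.replace(triple, bytes([K_SPECIAL]))


ellipsis_utf8 = "\u2026".encode("utf-8")
escaped = escape_k_special(ellipsis_utf8)
observed = bytes.fromhex("C3 A2 80 FE 58 C2 A6")

print(escaped.hex(" "))                                   # escaped form of the ellipsis
print(bytes([K_SPECIAL, KS_SPECIAL, KE_FILLER]) in observed)  # triple present in buffer?
print(unescape_k_special(escaped) == ellipsis_utf8)       # round trip restores UTF-8
```

If those constants are right, the 80 FE 58 run in the reported bytes is exactly what an escaped-but-never-un-escaped 0x80 would look like, which fits the behaviour described above.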

~Matt

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: BUG: Unicode characters in commands

2008-09-29 Thread Bram Moolenaar


Matt Wozniski wrote:

 I can confirm this.  It looks to me like it's not a spurious
 Latin1-to-UTF-8 conversion, but an internally-escaped string that's not
 un-escaped before being used.  Source-diving, it seems that
 mb_unescape() is called to un-escape any multibyte characters when
 displaying the command, but that mb_unescape() is never called before
 the command is passed to do_cmdline() to be executed.  That seems to
 explain why it's displayed properly but executed incorrectly.  I don't
 completely follow all of the string escaping being done here, though,
 so Bram would know for sure.  I've cross-posted to the vim-dev list
 accordingly.

I'll add it to the todo list.  Don't expect a solution soon...

-- 
hundred-and-one symptoms of being an internet addict:
116. You are living with your boyfriend who networks your respective
 computers so you can sit in separate rooms and email each other

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///
