Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-14 Tony Mechelynck

On 14/01/09 07:16, Yasuhiro MATSUMOTO wrote:
> Oops, sorry.
>
> However, the problem happens with the script, as you said. :-)
>
> Thanks.

E486: Pattern not found means that there was no match. Are you sure 
you ran that script while the current file contained one or more 。 
characters? When I do (manually)

:%s/。/./g

on the UTF-8 script I sent you, the result is 2 substitutions on 2 
lines and the fullwidth fullstops are replaced by ASCII dots.

I'm using gvim 7.2.84 (Huge version, with GTK2/Gnome GUI).


Best regards,
Tony.
-- 
It's the opinion of some that crops could be grown on the moon.  Which
raises the fear that it may not be long before we're paying somebody
not to.
-- Franklin P. Jones

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-14 Dominique Pelle

2009/1/14 Tony Mechelynck antoine.mechely...@gmail.com:

> On 14/01/09 07:16, Yasuhiro MATSUMOTO wrote:
>> Oops, sorry.
>>
>> However, the problem happens with the script, as you said. :-)
>>
>> Thanks.
>
> E486: Pattern not found means that there was no match. Are you sure
> you ran that script while the current file contained one or more 。
> characters? When I do (manually)
>
> :%s/。/./g
>
> on the UTF-8 script I sent you, the result is 2 substitutions on 2
> lines and the fullwidth fullstops are replaced by ASCII dots.
>
> I'm using gvim 7.2.84 (Huge version, with GTK2/Gnome GUI).
>
> Best regards,
> Tony.

I confirm the bug.

- Doing :%s/。/./g works (no error, and substitution happens).

- But doing...

  :command! SubJapanesePeriodToDot %s/。/./g
  :SubJapanesePeriodToDot

... then I get the error message:

E486: Pattern not found: <e3><80><fe>X<82>
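
The bytes in this message are what byte-wise escaping would produce if, as the patch comment in this thread describes, every K_SPECIAL byte is replaced by K_SPECIAL KS_SPECIAL KE_FILLER. A minimal sketch, assuming the values 0x80, 0xfe and 'X' for those three constants (recalled from Vim's keymap.h, not quoted anywhere in this thread); escape_bytewise is a hypothetical name, not a Vim function:

```c
#include <stddef.h>

/* Assumed values, recalled from Vim's keymap.h; not quoted in this thread. */
#define K_SPECIAL  0x80
#define KS_SPECIAL 0xFE
#define KE_FILLER  'X'

/*
 * Escape src byte by byte, without regard to character boundaries:
 * every 0x80 byte becomes K_SPECIAL KS_SPECIAL KE_FILLER.
 * dst must have room for 3 * n bytes. Returns the number of bytes written.
 */
size_t escape_bytewise(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t dlen = 0;

    for (size_t i = 0; i < n; ++i) {
        if (src[i] == K_SPECIAL) {
            dst[dlen++] = K_SPECIAL;
            dst[dlen++] = KS_SPECIAL;
            dst[dlen++] = KE_FILLER;
        } else {
            dst[dlen++] = src[i];
        }
    }
    return dlen;
}
```

Under those assumptions, the three bytes e3 80 82 come out as e3 80 fe 58 82, and 0x58 is ASCII 'X'.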

I'm using Vim-7.2.84 on Linux, with a utf-8 locale.

。 is Unicode character U+3002 (i.e. UTF-8 sequence 0xe3 0x80 0x82).
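
For reference, the byte sequence can be derived from the code point. This is a small sketch of the standard UTF-8 encoding rules; utf8_encode is an illustrative helper, not part of Vim:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode the code point cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * or lies above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+3002 this writes the three bytes 0xe3 0x80 0x82, so the 0x80 is the second byte of the character.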

-- Dominique




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-14 Tony Mechelynck

On 14/01/09 09:54, Dominique Pelle wrote:
> I confirm the bug.
>
> - Doing :%s/。/./g works (no error, and substitution happens).
>
> - But doing...
>
>   :command! SubJapanesePeriodToDot %s/。/./g
>   :SubJapanesePeriodToDot
>
> ... then I get the error message:
>
> E486: Pattern not found: <e3><80><fe>X<82>
>
> I'm using Vim-7.2.84 on Linux, with a utf-8 locale.
>
> 。 is Unicode character U+3002 (i.e. UTF-8 sequence 0xe3 0x80 0x82).
>
> -- Dominique

Ah, yes, I get the same.


Best regards,
Tony.
-- 
All flesh is grass
-- Isiah
Smoke a friend today.




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-14 Yasuhiro MATSUMOTO

Bram, please check my third patch. :-)

See the description below.

http://groups.google.co.jp/group/vim_dev/browse_thread/thread/e9945dbdd6ab388f?hl=ja#455ac73ba4bb0e47

- Yasuhiro Matsumoto

On Wed, Jan 14, 2009 at 6:00 PM, Tony Mechelynck
antoine.mechely...@gmail.com wrote:

> Ah, yes, I get the same.
>
> Best regards,
> Tony.
> --
> All flesh is grass
> -- Isiah
> Smoke a friend today.

 





Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-14 Bram Moolenaar


Yasuhiro Matsumoto wrote:

> Bram, please check my third patch. :-)
>
> See the description below.
>
> http://groups.google.co.jp/group/vim_dev/browse_thread/thread/e9945dbdd6ab388f?hl=ja#455ac73ba4bb0e47

Thanks, I'll add it to the todo list.

-- 
Engineers understand that their appearance only bothers other people and
therefore it is not worth optimizing.
(Scott Adams - The Dilbert principle)

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Yasuhiro MATSUMOTO

Oops, the patch has a bug. Please check the following.

Index: src/term.c
===================================================================
--- src/term.c  (revision 1318)
+++ src/term.c  (working copy)
@@ -5152,7 +5152,7 @@

 #ifdef FEAT_MBYTE
/* skip multibyte char correctly */
-   for (i = (*mb_ptr2len)(src); i > 0; --i)
+   if ((i = (*mb_ptr2len)(src)) == 1)
 #endif
{
/*
@@ -5172,12 +5172,17 @@
result[dlen++] = K_SPECIAL;
result[dlen++] = KS_EXTRA;
result[dlen++] = (int)KE_CSI;
-   }
+   } else
+   result[dlen++] = *src;
 # endif
-   else
-   result[dlen++] = *src;
++src;
+#ifdef FEAT_MBYTE
+   } else {
+   mch_memmove(result + dlen, src, i);
+   dlen += i;
+   src += i;
}
+#endif
 }
 result[dlen] = NUL;
 --

On Tue, Jan 13, 2009 at 8:01 PM, Yasuhiro MATSUMOTO mattn...@gmail.com wrote:
> Hi, Bram and all.
>
> I found a bug in how multi-byte and special characters are handled on
> the command line.
> For example:
>   :set enc=utf-8
>   :command! SubJapanesePeriodToDot %s/。/./g
>
> 。 is the Japanese full stop in UTF-8, and it has 0x80 in its leading byte.
> But replace_termcodes treats 0x80 as K_SPECIAL and breaks some
> multi-byte characters in the command line above.
> Below is a patch for this problem. Please check and include it.
>
> Thanks.

> Index: src/term.c
> ===================================================================
> --- src/term.c  (revision 1318)
> +++ src/term.c  (working copy)
> @@ -5155,28 +5155,33 @@
>     for (i = (*mb_ptr2len)(src); i > 0; --i)
>  #endif
>     {
> -   /*
> -* If the character is K_SPECIAL, replace it with K_SPECIAL
> -* KS_SPECIAL KE_FILLER.
> -* If compiled with the GUI replace CSI with K_CSI.
> -*/
> -   if (*src == K_SPECIAL)
> -   {
> -   result[dlen++] = K_SPECIAL;
> -   result[dlen++] = KS_SPECIAL;
> -   result[dlen++] = KE_FILLER;
> -   }
> +   if (i == 1) {
> +   /*
> +* If the character is K_SPECIAL, replace it with K_SPECIAL
> +* KS_SPECIAL KE_FILLER.
> +* If compiled with the GUI replace CSI with K_CSI.
> +*/
> +   if (*src == K_SPECIAL)
> +   {
> +   result[dlen++] = K_SPECIAL;
> +   result[dlen++] = KS_SPECIAL;
> +   result[dlen++] = KE_FILLER;
> +   }
>  # ifdef FEAT_GUI
> -   else if (*src == CSI)
> -   {
> -   result[dlen++] = K_SPECIAL;
> -   result[dlen++] = KS_EXTRA;
> -   result[dlen++] = (int)KE_CSI;
> +   else if (*src == CSI)
> +   {
> +   result[dlen++] = K_SPECIAL;
> +   result[dlen++] = KS_EXTRA;
> +   result[dlen++] = (int)KE_CSI;
> +   } else
> +   result[dlen++] = *src;
> +# endif
> +   ++src;
> +   } else {
> +   mch_memmove(result + dlen, src, i);
> +   dlen += i;
> +   src += i;
>     }
> -# endif
> -   else
> -   result[dlen++] = *src;
> -   ++src;
>     }
>  }
>  result[dlen] = NUL;




 --
 - Yasuhiro Matsumoto




-- 
- Yasuhiro Matsumoto




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Yasuhiro MATSUMOTO

Hmm. I broke CUI mode.

Index: src/term.c
===================================================================
--- src/term.c  (revision 1318)
+++ src/term.c  (working copy)
@@ -5152,7 +5152,7 @@

 #ifdef FEAT_MBYTE
/* skip multibyte char correctly */
-   for (i = (*mb_ptr2len)(src); i > 0; --i)
+   if ((i = (*mb_ptr2len)(src)) == 1)
 #endif
{
/*
@@ -5177,7 +5177,13 @@
else
result[dlen++] = *src;
++src;
+#ifdef FEAT_MBYTE
+   } else {
+   mch_memmove(result + dlen, src, i);
+   dlen += i;
+   src += i;
}
+#endif
 }
 result[dlen] = NUL;
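
The idea behind this patch, as far as it can be read from the diff, is to escape only characters that are a single byte long and to copy multi-byte characters through untouched. The following is a standalone sketch of that idea, not the Vim code itself: utf8_char_len is a hypothetical stand-in for (*mb_ptr2len)(), memcpy stands in for mch_memmove, the FEAT_GUI/CSI branch is left out, and the constant values are assumptions recalled from Vim's keymap.h.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, recalled from Vim's keymap.h; not quoted in this thread. */
#define K_SPECIAL  0x80
#define KS_SPECIAL 0xFE
#define KE_FILLER  'X'

/*
 * Hypothetical stand-in for mb_ptr2len(): byte length of the UTF-8
 * character starting at p, judged from the lead byte alone.
 * Stray continuation bytes and invalid lead bytes count as length 1.
 */
static size_t utf8_char_len(const unsigned char *p)
{
    if (*p < 0x80)
        return 1;
    if ((*p & 0xE0) == 0xC0)
        return 2;
    if ((*p & 0xF0) == 0xE0)
        return 3;
    if ((*p & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/*
 * Escape src character by character: a single-byte K_SPECIAL is expanded
 * to three bytes, a multi-byte character is copied unchanged.
 * dst must have room for 3 * n bytes. Returns the number of bytes written.
 */
size_t escape_charwise(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t dlen = 0;
    size_t pos = 0;

    while (pos < n) {
        size_t len = utf8_char_len(src + pos);

        if (len > n - pos)
            len = n - pos;      /* truncated character: copy what is left */

        if (len == 1) {
            if (src[pos] == K_SPECIAL) {
                dst[dlen++] = K_SPECIAL;
                dst[dlen++] = KS_SPECIAL;
                dst[dlen++] = KE_FILLER;
            } else {
                dst[dlen++] = src[pos];
            }
            ++pos;
        } else {
            memcpy(dst + dlen, src + pos, len);
            dlen += len;
            pos += len;
        }
    }
    return dlen;
}
```

With this sketch the bytes e3 80 82 pass through unchanged, while a lone 0x80 byte is still expanded to three bytes. Whether leaving the 0x80 inside a multi-byte character unescaped is safe for the rest of Vim is a separate question that the sketch does not answer.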



On Tue, Jan 13, 2009 at 8:18 PM, Yasuhiro MATSUMOTO mattn...@gmail.com wrote:
> Oops, the patch has a bug. Please check the following.





-- 
- Yasuhiro Matsumoto




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Bram Moolenaar


Yasuhiro Matsumoto wrote:

> Hi, Bram and all.
>
> I found a bug in how multi-byte and special characters are handled on
> the command line.
> For example:
>   :set enc=utf-8
>   :command! SubJapanesePeriodToDot %s/。/./g
>
> 。 is the Japanese full stop in UTF-8, and it has 0x80 in its leading byte.
> But replace_termcodes treats 0x80 as K_SPECIAL and breaks some
> multi-byte characters in the command line above.
> Below is a patch for this problem. Please check and include it.

I cannot reproduce the problem.  The example you give does not contain a
0x80 byte.  Did it get mangled in the message?  I see <Esc>$B!#<Esc>(B,
where the <Esc> are one byte escape characters, 0x1b.
If I type what you sent then it works without a problem.

A valid UTF-8 character can never have 0x80 as a leading byte, this is
only for further bytes.
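
The distinction between leading and continuation bytes can be checked mechanically. A minimal sketch (the function names are illustrative, not Vim's):

```c
#include <stdbool.h>

/* Continuation bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF. */
bool utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Lead bytes of multi-byte sequences in well-formed UTF-8 lie in
 * 0xC2..0xF4; 0xC0, 0xC1 and 0xF5..0xFF never occur at all.
 */
bool utf8_is_multibyte_lead(unsigned char b)
{
    return b >= 0xC2 && b <= 0xF4;
}
```

By these rules 0x80 is a continuation byte and not a lead byte, and 0xE3 is a lead byte.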

-- 
If the Universe is constantly expanding, why can't I ever find a parking space?

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Tony Mechelynck

On 13/01/09 16:31, Bram Moolenaar wrote:
 
> Yasuhiro Matsumoto wrote:
>
>> Hi, Bram and all.
>>
>> I found a bug in how multi-byte and special characters are handled on
>> the command line.
>> For example:
>>   :set enc=utf-8
>>   :command! SubJapanesePeriodToDot %s/。/./g
>>
>> 。 is the Japanese full stop in UTF-8, and it has 0x80 in its leading byte.
>> But replace_termcodes treats 0x80 as K_SPECIAL and breaks some
>> multi-byte characters in the command line above.
>> Below is a patch for this problem. Please check and include it.
>
> I cannot reproduce the problem.  The example you give does not contain a
> 0x80 byte.  Did it get mangled in the message?  I see <Esc>$B!#<Esc>(B,
> where the <Esc> are one byte escape characters, 0x1b.
> If I type what you sent then it works without a problem.
>
> A valid UTF-8 character can never have 0x80 as a leading byte, this is
> only for further bytes.
 

The replace-from character is a fullwidth full stop, Unicode codepoint
U+3002, represented in UTF-8 as E3 80 82. (So it's its _second_ byte
which is 0x80).
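
Decoding the three bytes confirms the code point. A minimal decoder sketch (utf8_decode is an illustrative name; it checks the lead and continuation bit patterns but does not reject overlong forms or surrogates):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 character from s, of which n bytes are available.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the input is truncated or does not start with a valid lead byte.
 */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t v;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {
        len = 1; v = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; v = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; v = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; v = s[0] & 0x07;
    } else {
        return 0;               /* 0x80..0xBF cannot start a character */
    }

    if (len > n)
        return 0;

    for (size_t i = 1; i < len; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }

    *cp = v;
    return len;
}
```

Given E3 80 82 it returns 3 and stores 0x3002; given input that starts with 0x80 it returns 0, since 0x80 can only appear after a lead byte.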

Note that this mail is not in UTF-8 but (like the rest of this thread)
in ISO-2022-JP. If you want, I can send it again in UTF-8.


Best regards,
Tony.
-- 
Etymology, n.:
Some early etymological scholars came up with derivations that
were hard for the public to believe.  The term etymology was formed
from the Latin etus (eaten), the root mal (bad), and logy
(study of).  It meant the study of things that are hard to swallow.
-- Mike Kellen




Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Yasuhiro MATSUMOTO

Hi, Tony.

Yes, it's the fullwidth full stop character in UTF-8, and it includes 0x80:

  E38082

i.e.

  :command! SubJapanesePeriodToDot %s/。/./g
  :SubJapanesePeriodToDot

I get the error E486: Pattern not found. Please check the attached script.

On Wed, Jan 14, 2009 at 1:46 AM, Tony Mechelynck
antoine.mechely...@gmail.com wrote:

> The replace-from character is a fullwidth full stop, Unicode codepoint
> U+3002, represented in UTF-8 as E3 80 82. (So it's its _second_ byte
> which is 0x80).
>
> Note that this mail is not in UTF-8 but (like the rest of this thread)
> in ISO-2022-JP. If you want, I can send it again in UTF-8.
>
>
> Best regards,
> Tony.
> --
> Etymology, n.:
> Some early etymological scholars came up with derivations that
> were hard for the public to believe.  The term etymology was formed
> from the Latin etus (eaten), the root mal (bad), and logy
> (study of).  It meant the study of things that are hard to swallow.
> -- Mike Kellen




-- 
- Yasuhiro Matsumoto




test.vim
Description: Binary data


Re: some multi-byte is treated as K_SPECIAL in command line

2009-01-13 Yasuhiro MATSUMOTO

Oops, sorry.

However, the problem happens with the script, as you said. :-)

Thanks.

On Wed, Jan 14, 2009 at 2:03 PM, Tony Mechelynck
antoine.mechely...@gmail.com wrote:
> On 14/01/09 01:18, Yasuhiro MATSUMOTO wrote:
>
>> Hi, Tony.
>>
>> Yes, it's the fullwidth full stop character in UTF-8, and it includes 0x80:
>>
>>   E38082
>>
>> i.e.
>>
>>   :command! SubJapanesePeriodToDot %s/。/./g
>>   :SubJapanesePeriodToDot
>>
>> I get the error E486: Pattern not found. Please check the attached script.
>
> The script attached to your mail is not in UTF-8; I got it with its
> fullwidth fullstop encoded as 0x81 0x42. After trying to read it in gvim in
> a couple of different encodings, I conclude that it is not in ISO-2022-JP
> either but in shift-JIS or in something that represents the fullwidth
> fullstop the same way as shift-JIS does.
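
The three encodings of the fullwidth full stop mentioned in this thread can be told apart by their bytes. A small sketch that recognises only this one character, using the byte values quoted in the thread (E3 80 82 for UTF-8, 81 42 for Shift-JIS, Esc $ B ! # Esc ( B for ISO-2022-JP); fullstop_encoding is a hypothetical helper, not a general encoding detector:

```c
#include <stddef.h>
#include <string.h>

/*
 * Name the encoding in which the n bytes at s spell exactly one
 * fullwidth full stop (U+3002), or return "unknown".
 */
const char *fullstop_encoding(const unsigned char *s, size_t n)
{
    static const unsigned char utf8[] = { 0xE3, 0x80, 0x82 };
    static const unsigned char sjis[] = { 0x81, 0x42 };
    /* Esc $ B switches to JIS X 0208, Esc ( B switches back to ASCII. */
    static const unsigned char jis[]  = { 0x1B, '$', 'B', '!', '#',
                                          0x1B, '(', 'B' };

    if (n == sizeof utf8 && memcmp(s, utf8, n) == 0)
        return "UTF-8";
    if (n == sizeof sjis && memcmp(s, sjis, n) == 0)
        return "Shift-JIS";
    if (n == sizeof jis && memcmp(s, jis, n) == 0)
        return "ISO-2022-JP";
    return "unknown";
}
```

Of the three, only the UTF-8 form contains a 0x80 byte.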

> I'm attaching a UTF-8 version of the same (with BOM), your gvim ought to be
> able to read it correctly if you have 'encoding' set to utf-8 and
> 'fileencodings' (plural) starting with ucs-bom. Any Unicode-capable editor
> or browser ought to be able to read it correctly too.
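
A UTF-8 byte order mark is the three bytes EF BB BF at the start of the file, which is what lets a reader identify such a file without guessing. A minimal check, with has_utf8_bom as an illustrative name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the n bytes at buf begin with the UTF-8 BOM (EF BB BF). */
bool has_utf8_bom(const unsigned char *buf, size_t n)
{
    return n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}
```

A caller would normally skip those three bytes before handing the remaining text to a decoder.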


> Best regards,
> Tony.
> --
> X-rated movies are all alike ... the only thing they leave to the
> imagination is the plot.




-- 
- Yasuhiro Matsumoto
