Re: failure notice

2006-08-13 Thread Edward L. Fox

On 8/13/06, A.J.Mechelynck [EMAIL PROTECTED] wrote:

[...]
Edward had it on Windows. From :help encoding-values I gather that
Chinese and prc are aliases for cp936 / euc-cn. Maybe gbk and gb18030
can be added to the family?


I'm using Debian Etch. But I had a look at the Windoze system and
found that cp936 is well supported on both Linux and Windoze; however,
GBK is supported on neither of them.

euc-cn is an alias of GB2312, which is only a subset of GBK, so we
should not put them together. GB18030 is not exactly the same as GBK,
but 99% of it is the same; the remaining part is something nobody in
the world cares about, and it is very, very complicated and very, very
difficult to support. Moreover, very few X servers support this
encoding. So I suggest aliasing GB18030 to cp936 too, simply and
wrongly. :-)
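The grouping proposed here can be sketched as a small alias table in C. This is a minimal, hypothetical stand-in for the alias table in src/mbyte.c (the names `enc_alias`, `aliases` and `canonical_encoding()` are made up for the illustration): gbk and gb18030 map to cp936, while euccn and gb2312 stay with euc-cn because GB2312 is only a subset of GBK. Lookups ignore case, since names like GBK and gbk both occur.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical alias entry: an encoding name and its canonical form. */
struct enc_alias
{
    const char *name;
    const char *canonical;
};

static const struct enc_alias aliases[] =
{
    {"euccn",   "euc-cn"},
    {"gb2312",  "euc-cn"},   /* only a subset of GBK: keep it separate */
    {"gbk",     "cp936"},
    {"gb18030", "cp936"},    /* approximation: most, not all, matches GBK */
};

/* Return the canonical encoding for `name` (compared case-insensitively),
 * or `name` itself when it has no alias. */
const char *canonical_encoding(const char *name)
{
    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
        if (strcasecmp(name, aliases[i].name) == 0)
            return aliases[i].canonical;
    return name;
}
```

The gb18030 entry is deliberately approximate: text in the part of GB18030 that differs from GBK would still not be handled.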

Now that we have discussed the charset, I think it's the right time to
do some work on the malformed characters in the toolbar tooltips. I
made a patch yesterday that solved the problem (or at least it seemed
to). Can anybody review my changes and give some suggestions? Thanks.

http://groups.yahoo.com/group/vim/message/72396




Best regards,
Tony.



Re: failure notice

2006-08-13 Thread Bram Moolenaar

[removed the Vim maillist, this is development only]

Edward L. Fox wrote:

 On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
  [...]
  You may have uncovered a bug that went unnoticed so far.  Please try to
  discover what causes this problem.  I can't guess why the last character
  is messed up, looks strange.
 
 I think I have solved the problem! It was caused by iconv.
 
size_t iconv(iconv_t cd,
  char **inbuf, size_t *inbytesleft,
  char **outbuf, size_t *outbytesleft);
 
 The parameters inbytesleft and outbytesleft should both include the
 trailing '\0' byte. In the previous version of gvim, we passed the
 length of the string, excluding the trailing '\0', so the value was
 1 byte less than the correct one.

This is not quite so.  iconv() does not require the terminating NUL (it
can also be used to convert part of a string).  If it does require the
NUL then iconv() is broken.  That's unlikely though.

Your change suggests that the length that is passed should be one more.
Thus only one byte of the last double-byte character is currently
converted.  I can't quickly figure out where the wrong length is
computed or passed.  You probably already know the call stack, please
have a look at where the length comes from.  It's probably an off-by-one
error somewhere.
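Both points above (iconv() needs no terminating NUL, and a length that is one byte short leaves the last multibyte character half-converted) can be tried out with a short C sketch. The helper `to_ucs2le()` is hypothetical, and it converts UTF-8 to UCS-2LE instead of euc-cn to cp936, assuming the local iconv supports those two encodings as common implementations do; the length rules are the same for any pair. The byte count passed in never includes a NUL, a prefix of a string converts on its own, and a count that stops inside a multibyte character makes iconv() fail with EINVAL, leaving that partial character unconverted.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert exactly `inlen` bytes of `in` from UTF-8 to
 * UCS-2LE.  No terminating NUL is needed, and none is counted in `inlen`.
 * Returns the number of output bytes written; `*inleft` receives the number
 * of input bytes left unconverted and `*err` the errno value (0 if the whole
 * input was converted). */
size_t to_ucs2le(const char *in, size_t inlen, char *out, size_t outsize,
                 size_t *inleft, int *err)
{
    iconv_t cd = iconv_open("UCS-2LE", "UTF-8");
    if (cd == (iconv_t)-1)
    {
        *err = errno;
        *inleft = inlen;
        return 0;
    }

    char *inptr = (char *)in;    /* iconv() wants a non-const pointer */
    char *outptr = out;
    size_t outleft = outsize;

    /* iconv() advances both pointers and decrements both counters. */
    if (iconv(cd, &inptr, &inlen, &outptr, &outleft) == (size_t)-1)
        *err = errno;            /* e.g. EINVAL: incomplete character at end */
    else
        *err = 0;

    *inleft = inlen;
    iconv_close(cd);
    return outsize - outleft;
}
```

Opening a descriptor on every call keeps the sketch short; Vim itself keeps one open in its conversion state (vc_fd) for repeated conversions.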

-- 
hundred-and-one symptoms of being an internet addict:
129. You cancel your newspaper subscription.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///


Re: failure notice

2006-08-13 Thread Edward L. Fox

On 8/13/06, Bram Moolenaar [EMAIL PROTECTED] wrote:


[removed the Vim maillist, this is development only]

Edward L. Fox wrote:

 On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
  [...]
  You may have uncovered a bug that went unnoticed so far.  Please try to
  discover what causes this problem.  I can't guess why the last character
  is messed up, looks strange.

 I think I have solved the problem! It was caused by iconv.

size_t iconv(iconv_t cd,
  char **inbuf, size_t *inbytesleft,
  char **outbuf, size_t *outbytesleft);

 The parameters inbytesleft and outbytesleft should both include the
 trailing '\0' byte. In the previous version of gvim, we passed the
 length of the string, excluding the trailing '\0', so the value was
 1 byte less than the correct one.

This is not quite so.  iconv() does not require the terminating NUL (it
can also be used to convert part of a string).  If it does require the
NUL then iconv() is broken.  That's unlikely though.


I wrote a short test program to check iconv with Chinese characters.
If the last character is a Chinese character, it is always malformed
after conversion. So I think it is necessary to pass iconv a length
that includes the trailing '\0'.

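8<------------------------------------------------------------
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char inbuffer[256];
    char outbuffer[256];
    iconv_t cd;

    /* Convert from euc-cn (GB2312) to cp936 (GBK). */
    cd = iconv_open("cp936", "euc-cn");
    if (cd == (iconv_t)-1)
    {
        perror("iconv_open");
        return 1;
    }

    for (;;)
    {
        size_t inlength, outlength;
        char *inptr, *outptr;

        /* Read one line; fgets() replaces the unsafe gets(). */
        if (fgets(inbuffer, sizeof(inbuffer), stdin) == NULL)
            break;
        inbuffer[strcspn(inbuffer, "\n")] = '\0';   /* strip the newline */
        inlength = strlen(inbuffer);                /* excludes the NUL */
        if (inlength == 0)
            break;
        outlength = sizeof(outbuffer) - 1;          /* room for a NUL */
        inptr = inbuffer;
        outptr = outbuffer;
        if (iconv(cd, &inptr, &inlength, &outptr, &outlength) == (size_t)-1)
            perror("iconv");
        *outptr = '\0';     /* iconv() does not terminate the output */
        printf("%s\n", outbuffer);
    }

    iconv_close(cd);

    return 0;
}
8<------------------------------------------------------------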


Your change suggests that the length that is passed should be one more.
Thus only one byte of the last double-byte character is currently
converted.  I can't quickly figure out where the wrong length is
computed or passed.  You probably already know the call stack, please
have a look at where the length comes from.  It's probably an off-by-one
error somewhere.


I traced the code again and again but found nothing unusual. You call
string_convert and pass 0 as the length of the string, so
string_convert_ext calculates the length of the string with STRLEN,
then calls iconv_string and finally iconv. There is nothing wrong with
the length anywhere. So... maybe it is iconv's fault after all.


[...]



Regards,

Edward L. Fox


Re: failure notice

2006-08-13 Thread Edward L. Fox

[...]
I traced the code again and again but found nothing unusual. You call
string_convert and pass 0 as the length of the string, so
string_convert_ext calculates the length of the string with STRLEN,
then calls iconv_string and finally iconv. There is nothing wrong with
the length anywhere. So... maybe it is iconv's fault after all.


Sorry, all. I did more tests, searched more documentation, and found
that it was a bug in libiconv, not in gvim. The problem occurs only
when converting gb2312 to gbk. I'm trying to debug libiconv...


[...]



Ashamed

Edward L. Fox


Re: failure notice

2006-08-12 Thread Bram Moolenaar

Edward L. Fox wrote:

 On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
  [...]
   But gvim doesn't support an encoding named 'gbk'. If the system
   encoding is 'gbk', the menu and toolbar get malformed.
 
  What do you mean by system encoding?  How does Vim see this?
 
 I meant the $LANG variable.
 
 GBK is simply an alias of cp936, so it is proper to add this alias
 entry to mbyte.c. But the situation with GB18030 is much more
 complicated, and the current version of gvim is not able to handle it
 correctly. There is another long, not-so-funny and rather ridiculous
 story about GB18030. If you like, I can tell you the detailed GOSSIP
 later...
 
 Because over 99% of GB18030 and GBK is the same, and the
 remaining part is too difficult to handle, I want to set GB18030 as
 another alias of cp936. Do you think that is OK?

I can alias gbk and gb18030 to cp936.  But does setting 'encoding' to
cp936 really work on a non-Windows system?  If not then the alias won't
help.
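Whether a name like cp936 or gbk is usable on a particular Unix system depends in part on its iconv library. One way to probe this from C is to try opening conversion descriptors in both directions; the helper `iconv_supports()` below is hypothetical. A successful open only means that iconv recognizes the name: it says nothing about how Vim itself treats the encoding, or whether every character converts correctly.

```c
#include <iconv.h>
#include <stdbool.h>

/* Hypothetical helper: true if iconv can open conversions between `name`
 * and UTF-8 in both directions. */
bool iconv_supports(const char *name)
{
    iconv_t to_utf8 = iconv_open("UTF-8", name);
    if (to_utf8 == (iconv_t)-1)
        return false;
    iconv_close(to_utf8);

    iconv_t from_utf8 = iconv_open(name, "UTF-8");
    if (from_utf8 == (iconv_t)-1)
        return false;
    iconv_close(from_utf8);
    return true;
}
```

Calling `iconv_supports("cp936")` and `iconv_supports("gbk")` on the machine in question shows whether its iconv recognizes those names.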

  [...]
 
  You may have uncovered a bug that went unnoticed so far.  Please try to
  discover what causes this problem.  I can't guess why the last character
  is messed up, looks strange.
 
 In fact, this bug was noticed years ago, but most Chinese users
 decided to tolerate it. Anyway, I'm going to work on it~

Yeah, I have noticed that Asian people don't often report problems.
Fixing them would still be nice!

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?



Re: failure notice

2006-08-12 Thread Bram Moolenaar

Tony Mechelynck wrote:

 Bram Moolenaar wrote:
  Edward L. Fox wrote:
 [...]
  The menu.vim file should never change 'encoding'.  It should load menus
  that are appropriate for the current 'encoding' and language.
  But gvim doesn't support an encoding named 'gbk'. If the system
  encoding is 'gbk', the menu and toolbar get malformed.
  
  What do you mean by system encoding?  How does Vim see this?
 [...]
 
 I think he means the charset part of the system locale, as used to 
 set 'encoding' before sourcing the [._]vimrc. $LC_CTYPE maybe? On my 
 Windows system gvim -u NONE shows all strings preset to 
 'French_Belgium.1252' and gvim starts up in French and Latin1; on Linux 
 I have $LC_CTYPE='en_US.UTF-8', the rest empty in bash, set to C in 
 gvim, and gvim starts up in English and in UTF-8. IIRC, Edward had 
 zh_CN.gbk and his gvim started in Chinese with unreadable menus and 
 tooltips. Making gbk an alias for cp936 solved the menu problem, but 
 only partially the tooltip problem. I suspect a byte-counting bug in one 
 or more of the routines responsible for the tooltips' storage and 
 display, manifesting on multibyte locales like CP936.
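Taking "the charset part of the system locale" out of a locale name such as zh_CN.gbk can be sketched as follows. This is a simplified, hypothetical illustration, not Vim's enc_locale(): `locale_charset()` copies the text after the first '.' up to an optional '@' modifier and lowercases it, and `ctype_locale_name()` picks the first non-empty variable in the POSIX precedence order LC_ALL, LC_CTYPE, LANG.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: copy the charset part of a locale name, lowercased, into
 * `buf` ("zh_CN.gbk" -> "gbk", "de_DE.ISO-8859-15@euro" -> "iso-8859-15").
 * Returns 0 if there is no charset part or it does not fit in `buf`. */
int locale_charset(const char *locale, char *buf, size_t bufsize)
{
    const char *dot = strchr(locale, '.');
    if (dot == NULL)
        return 0;

    const char *start = dot + 1;
    size_t len = strcspn(start, "@");   /* stop at an optional modifier */
    if (len == 0 || len >= bufsize)
        return 0;

    for (size_t i = 0; i < len; i++)
        buf[i] = (char)tolower((unsigned char)start[i]);
    buf[len] = '\0';
    return 1;
}

/* Hypothetical: the locale name governing character handling, using the
 * POSIX precedence LC_ALL > LC_CTYPE > LANG, defaulting to "C". */
const char *ctype_locale_name(void)
{
    const char *vars[] = {"LC_ALL", "LC_CTYPE", "LANG"};

    for (size_t i = 0; i < sizeof(vars) / sizeof(vars[0]); i++)
    {
        const char *value = getenv(vars[i]);
        if (value != NULL && *value != '\0')
            return value;
    }
    return "C";
}
```

With $LANG set to zh_CN.gbk and no LC_ALL or LC_CTYPE, this yields "gbk", the name that then has to be recognized as an alias.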

If this is on Unix, I don't think that cp936 is completely supported.  I
can make gbk an alias for cp936, but I don't think it will help much.

-- 
hundred-and-one symptoms of being an internet addict:
125. You begin to wonder how often it REALLY is necessary to get up
 and shower or bathe.



Re: failure notice

2006-08-12 Thread Edward L. Fox

On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:

[...]
 But gvim doesn't support an encoding named 'gbk'. If the system
 encoding is 'gbk', the menu and toolbar get malformed.

What do you mean by system encoding?  How does Vim see this?


I meant the $LANG variable.

GBK is simply an alias of cp936, so it is proper to add this alias
entry to mbyte.c. But the situation with GB18030 is much more
complicated, and the current version of gvim is not able to handle it
correctly. There is another long, not-so-funny and rather ridiculous
story about GB18030. If you like, I can tell you the detailed GOSSIP
later...

Because over 99% of GB18030 and GBK is the same, and the
remaining part is too difficult to handle, I want to set GB18030 as
another alias of cp936. Do you think that is OK?


[...]

You may have uncovered a bug that went unnoticed so far.  Please try to
discover what causes this problem.  I can't guess why the last character
is messed up, looks strange.


In fact, this bug was noticed years ago, but most Chinese users
decided to tolerate it. Anyway, I'm going to work on it~


[...]



Regards,

Edward L. Fox


Re: failure notice

2006-08-12 Thread Edward L. Fox

Hi Bram,

On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:

[...]
You may have uncovered a bug that went unnoticed so far.  Please try to
discover what causes this problem.  I can't guess why the last character
is messed up, looks strange.


I think I have solved the problem! It was caused by iconv.

  size_t iconv(iconv_t cd,
char **inbuf, size_t *inbytesleft,
char **outbuf, size_t *outbytesleft);

The parameters inbytesleft and outbytesleft should both include the
trailing '\0' byte. In the previous version of gvim, we passed the
length of the string, excluding the trailing '\0', so the value was
1 byte less than the correct one.

As we know, every Chinese character is represented by 2 bytes in the
GBK encoding:

 AABBCCDD
 \------/
 4 characters

Because we passed the length of the string (8 in this example), iconv
treated the input string as 1 byte shorter (7 in this example). The
second-to-last byte then could not be converted because it is only
half of a character, so gvim replaced it with a question mark:

 AABBCC?D

After that, gvim tried to convert the remaining byte to the target
encoding, which also failed, so it was replaced with a question mark
too:

 AABBCC??

That is why the last character of every tooltip became 2 question
marks. The menus are mostly not malformed because most menu items do
not end with a Chinese character; in fact, the menu items that do end
with a Chinese character are malformed too.
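The question-mark substitution described above can be modelled with a small conversion loop. This is a hypothetical sketch, not gvim's own conversion code: `convert_lossy()` writes one '?' for each input byte that iconv() cannot convert and then skips that byte. It reads UCS-2LE, where every character takes two bytes (standing in for GBK's double-byte Chinese characters), and writes UTF-8, assuming the local iconv supports both. When the length passed in is one byte short, the first half of the last character reaches iconv() as an incomplete sequence (EINVAL) and comes out as '?'.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical sketch: convert `inlen` bytes of UCS-2LE to UTF-8, writing a
 * single '?' for every input byte that cannot be converted (EILSEQ) or that
 * belongs to an incomplete character at the end (EINVAL).  Returns the
 * number of output bytes, or (size_t)-1 if the descriptor cannot be opened
 * or the output buffer is too small. */
size_t convert_lossy(const char *in, size_t inlen, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("UTF-8", "UCS-2LE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inptr = (char *)in;
    char *outptr = out;
    size_t outleft = outsize;

    while (inlen > 0)
    {
        if (iconv(cd, &inptr, &inlen, &outptr, &outleft) != (size_t)-1)
            break;                      /* everything converted */
        if ((errno != EILSEQ && errno != EINVAL) || outleft == 0)
        {
            iconv_close(cd);            /* E2BIG or another error */
            return (size_t)-1;
        }
        *outptr++ = '?';                /* substitute, then skip one byte */
        outleft--;
        inptr++;
        inlen--;
    }

    iconv_close(cd);
    return outsize - outleft;
}
```

Skipping a single byte is the simplest recovery; skipping a whole character instead would require knowing the structure of the source encoding.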


[...]


It's quite simple once the problem is found. Here is the patch, in
which I also alias GBK and GB18030 to cp936. That solves the problem I
reported earlier:

*** src/mbyte.c	2006-05-14 20:32:49.0 +0800
--- ../vim7.build/vim7/src/mbyte.c	2006-08-12 19:22:06.0 +0800
***************
*** 372,377 ****
--- 372,379 ----
      {"5601",		IDX_EUC_KR},	/* Sun: KS C 5601 */
      {"euccn",		IDX_EUC_CN},
      {"gb2312",		IDX_EUC_CN},
+     {"gbk",		IDX_CP936},
+     {"gb18030",		IDX_CP936},
      {"euctw",		IDX_EUC_TW},
  #if defined(WIN3264) || defined(WIN32UNIX) || defined(MACOS)
      {"japan",		IDX_CP932},
***************
*** 3250,3256 ****
  	}
  
  	to = (char *)result + done;
! 	tolen = len - done - 2;
  	/* Avoid a warning for systems with a wrong iconv() prototype by
  	 * casting the second argument to void *. */
  	if (iconv(vcp->vc_fd, (void *)&from, &fromlen, &to, &tolen)
--- 3252,3259 ----
  	}
  
  	to = (char *)result + done;
! 	tolen = len - done - 1;
! 	++fromlen;
  	/* Avoid a warning for systems with a wrong iconv() prototype by
  	 * casting the second argument to void *. */
  	if (iconv(vcp->vc_fd, (void *)&from, &fromlen, &to, &tolen)



Best Regards,

Edward L. Fox


Re: failure notice

2006-08-11 Thread Bram Moolenaar

Edward L. Fox wrote:

 Sorry for sending this mail a second time; my previous mail with an
 attachment was rejected by the mail daemon. :-(
 
 On 8/11/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
 
  [...]
 
  The menu.vim file should never change 'encoding'.  It should load menus
  that are appropriate for the current 'encoding' and language.
 
 But gvim doesn't support an encoding named 'gbk'. If the system
 encoding is 'gbk', the menu and toolbar get malformed.

What do you mean by system encoding?  How does Vim see this?

 For the past two years (or more), every gvim user in mainland China
 who uses Linux with GBK encoding has had to add the following three
 lines to their .vimrc:
 
 set enc=cp936
 so $VIMRUNTIME/delmenu.vim
 so $VIMRUNTIME/menu.vim
 
 That's why I wanted to change the encoding value in menu.vim. :-)
 
  If the intention is to have the default for 'encoding' use something
  specific in $LANG then this must be done in enc_locale() in src/mbyte.c
 
 I modified mbyte.c to add gbk as an alias of cp936, and then the
 menu bar was displayed properly with the unmodified menu.vim. But there
 is still a problem with the toolbar: the last character of every
 tooltip becomes two question marks. I'm trying to debug the code and
 will send you another patch as soon as I solve the problem. I hope you
 can offer me some hints, too. :-)

You may have uncovered a bug that went unnoticed so far.  Please try to
discover what causes this problem.  I can't guess why the last character
is messed up, looks strange.

-- 
hundred-and-one symptoms of being an internet addict:
115. You are late picking up your kid from school and try to explain
 to the teacher you were stuck in Web traffic.



Re: failure notice

2006-08-10 Thread Edward L. Fox

Hi vimmers,

Sorry for sending this mail a second time; my previous mail with an
attachment was rejected by the mail daemon. :-(

On 8/11/06, Bram Moolenaar [EMAIL PROTECTED] wrote:


[...]

The menu.vim file should never change 'encoding'.  It should load menus
that are appropriate for the current 'encoding' and language.


But gvim doesn't support an encoding named 'gbk'. If the system
encoding is 'gbk', the menu and toolbar get malformed. For the past two
years (or more), every gvim user in mainland China who uses Linux with
GBK encoding has had to add the following three lines to their .vimrc:

set enc=cp936
so $VIMRUNTIME/delmenu.vim
so $VIMRUNTIME/menu.vim

That's why I wanted to change the encoding value in menu.vim. :-)


If the intention is to have the default for 'encoding' use something
specific in $LANG then this must be done in enc_locale() in src/mbyte.c


I modified mbyte.c to add gbk as an alias of cp936, and then the
menu bar was displayed properly with the unmodified menu.vim. But there
is still a problem with the toolbar: the last character of every
tooltip becomes two question marks. I'm trying to debug the code and
will send you another patch as soon as I solve the problem. I hope you
can offer me some hints, too. :-)


[...]



Regards,

Edward L. Fox