Re: failure notice
On 8/13/06, A.J.Mechelynck [EMAIL PROTECTED] wrote:
> [...]
> Edward had it on Windows. From :help encoding-values I gather that
> Chinese and prc are alias to cp936 / euc-cn. Maybe gbk and gb18030 can
> be added to the family?

I'm using Debian Etch. But I had a look at the Windoze system and found
that cp936 is well supported on both Linux and Windoze, whereas GBK is
supported on neither. euc-cn is an alias of GB2312, which is only a
subset of GBK, so we should not put them together.

GB18030 is not exactly the same as GBK, but 99% of the two is the same.
Nobody in the world cares about the part that differs, and it is very,
very complicated and difficult to support. Moreover, very few X servers
support this encoding. So I suggest aliasing GB18030 to cp936 too,
simply and wrongly. :-)

Now that the charset has been discussed, I think it is the right time to
do some work on the malformed characters in the toolbar tooltips. I made
a patch and solved the problem yesterday (or at least it seemed to be
solved). Can anybody review my changes and give some suggestions?
Thanks.

http://groups.yahoo.com/group/vim/message/72396

> Best regards,
> Tony.
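The subset relationship between GB2312 (as EUC-CN) and GBK described above can be sketched in C. This is only an illustration, assuming the commonly documented double-byte ranges: EUC-CN uses 0xA1..0xFE for both bytes, while GBK / cp936 allows lead bytes 0x81..0xFE and trail bytes 0x40..0xFE except 0x7F. The function names are hypothetical and are not taken from Vim or libiconv, and the checks cover the code space only, not which code points are actually assigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of Vim or libiconv.
 * Each takes a pointer to a two-byte sequence. */

/* EUC-CN (the usual byte encoding of GB2312): both bytes in 0xA1..0xFE. */
static bool is_euc_cn_pair(const unsigned char *p)
{
    assert(p != NULL);
    return p[0] >= 0xA1 && p[0] <= 0xFE && p[1] >= 0xA1 && p[1] <= 0xFE;
}

/* GBK / cp936: lead byte 0x81..0xFE, trail byte 0x40..0xFE except 0x7F. */
static bool is_gbk_pair(const unsigned char *p)
{
    assert(p != NULL);
    return p[0] >= 0x81 && p[0] <= 0xFE
        && p[1] >= 0x40 && p[1] <= 0xFE && p[1] != 0x7F;
}
```

Every pair accepted by is_euc_cn_pair() is also accepted by is_gbk_pair(), but not the other way round (0x81 0x40 is a GBK pair only), which is why treating the two names as one encoding would be lossy.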
Re: failure notice
[removed the Vim maillist, this is development only]

Edward L. Fox wrote:

> On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> > [...]
> > You may have uncovered a bug that went unnoticed so far. Please try
> > to discover what causes this problem. I can't guess why the last
> > character is messed up, looks strange.
>
> I think I solved the problem! That was caused by iconv.
>
> size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft,
>              char **outbuf, size_t *outbytesleft);
>
> The parameter inbytesleft and outbytesleft should all include the
> trailing '\0' byte. In the previous version of gvim, we passed the
> parameter as the length of the string, excluding the trailing '\0'. So
> it is 1 byte less than the correct value.

This is not quite so. iconv() does not require the terminating NUL (it
can also be used to convert part of a string). If it does require the
NUL then iconv() is broken. That's unlikely though.

Your change suggests that the length that is passed should be one more.
Thus only one byte of the last double-byte character is currently
converted. I can't quickly figure out where the wrong length is computed
or passed. You probably already know the call stack, please have a look
at where the length comes from. It's probably an off-by-one error
somewhere.

--
hundred-and-one symptoms of being an internet addict:
129. You cancel your newspaper subscription.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
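The point made above, that iconv() works on an explicit byte count and does not need the terminating NUL, can be sketched as follows. This is a minimal example under stated assumptions, not the code from Vim: convert_bytes() is a hypothetical helper, the encoding names are passed in by the caller, and which names are accepted depends on the local iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts exactly `inlen` bytes of `in` (no NUL
 * required or counted) and NUL-terminates the result itself.
 * Returns the number of bytes written, or (size_t)-1 on failure. */
static size_t convert_bytes(const char *from_enc, const char *to_enc,
                            const char *in, size_t inlen,
                            char *out, size_t outsize)
{
    iconv_t cd;
    char *inptr = (char *)in;       /* POSIX iconv() takes char ** */
    char *outptr = out;
    size_t inleft = inlen;
    size_t outleft;
    size_t rc;

    assert(in != NULL && out != NULL && outsize > 0);
    outleft = outsize - 1;          /* keep one byte for our own NUL */

    cd = iconv_open(to_enc, from_enc);
    if (cd == (iconv_t)-1)
        return (size_t)-1;          /* conversion not supported here */

    rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;          /* invalid, incomplete or too long */

    *outptr = '\0';                 /* iconv() only writes what it converts */
    return (size_t)(outptr - out);
}
```

For instance, passing the four bytes D6 D0 CE C4 with a length of 4 and the names "GBK" and "UTF-8" converts two complete characters where the local iconv supports GBK, and passing a shorter length converts only the leading part of a longer string.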
Re: failure notice
On 8/13/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> [removed the Vim maillist, this is development only]
>
> Edward L. Fox wrote:
> > On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> > > [...]
> > > You may have uncovered a bug that went unnoticed so far. Please
> > > try to discover what causes this problem. I can't guess why the
> > > last character is messed up, looks strange.
> >
> > I think I solved the problem! That was caused by iconv.
> >
> > size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft,
> >              char **outbuf, size_t *outbytesleft);
> >
> > The parameter inbytesleft and outbytesleft should all include the
> > trailing '\0' byte. In the previous version of gvim, we passed the
> > parameter as the length of the string, excluding the trailing '\0'.
> > So it is 1 byte less than the correct value.
>
> This is not quite so. iconv() does not require the terminating NUL (it
> can also be used to convert part of a string). If it does require the
> NUL then iconv() is broken. That's unlikely though.

I wrote a short piece of code to test iconv with Chinese characters. The
fact is, if the last character is a Chinese character, it is always
malformed after conversion. So I think it is necessary to pass the
length including the trailing '\0' to iconv.

8<--------
#include <stdio.h>
#include <string.h>
#include <iconv.h>

int main(void)
{
    char inbuffer[256];
    char outbuffer[256];
    iconv_t fd;

    /* iconv_open(tocode, fromcode): convert euc-cn input to cp936 */
    fd = iconv_open("cp936", "euc-cn");
    if (fd == (iconv_t)-1)
        return 1;
    for (;;) {
        size_t inlength, outlength;
        char *inptr, *outptr;

        /* read one line; stop at end of input or on an empty line */
        if (fgets(inbuffer, sizeof(inbuffer), stdin) == NULL)
            break;
        inbuffer[strcspn(inbuffer, "\n")] = '\0';
        inlength = strlen(inbuffer);    /* excludes the trailing '\0' */
        outlength = sizeof(outbuffer) - 1;
        if (inlength == 0)
            break;
        inptr = inbuffer;
        outptr = outbuffer;
        iconv(fd, &inptr, &inlength, &outptr, &outlength);
        *outptr = '\0';     /* iconv() does not terminate the output */
        printf("%s\n", outbuffer);
    }
    iconv_close(fd);
    return 0;
}
8<--------

> Your change suggests that the length that is passed should be one
> more. Thus only one byte of the last double-byte character is
> currently converted. I can't quickly figure out where the wrong length
> is computed or passed. You probably already know the call stack,
> please have a look at where the length comes from. It's probably an
> off-by-one error somewhere.

I traced the code again and again but found nothing special.
string_convert is called with 0 as the length of the string, so
string_convert_ext calculates the length with STRLEN, then calls
iconv_string, which finally calls iconv. There is nothing wrong with the
length anywhere. So... maybe it is still iconv's fault.

[...]

Regards,
Edward L. Fox
Re: failure notice
[...]
> I traced the code again and again but nothing special happened. You
> called string_convert and pass 0 as the length of the string, so in
> string_convert_ext you calculates the length of the string with
> STRLEN, then call iconv_string, last iconv. There is nothing wrong
> with the length anywhere. So... Maybe it is still iconv's fault.

Sorry all. I did more tests and read more documentation, and found that
it is a bug in libiconv, not in gvim. The problem occurs only when
converting gb2312 to gbk. I'm trying to debug libiconv...

[...]

Ashamed,
Edward L. Fox
Re: failure notice
Edward L. Fox wrote:

> On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> > [...]
> > > But gvim doesn't support an encoding named 'gbk'. If the system
> > > encoding is 'gbk', the menu and toolbar get malformed.
> >
> > What do you mean by system encoding? How does Vim see this?
>
> I meant the $LANG variable. GBK is right an alias of cp936, so it is
> proper to add this alias entry into mbyte.c. But the situation with
> GB18030 was much more complicated and the current version of gvim is
> not able to handle it correctly. About GB18030 there is another long
> and not-so-funny but ridiculous story. If you like, I can tell you the
> detailed GOSSIP later... Because over 99% part of GB18030 and GBK is
> the same, and the remaining part is too difficult to handle, I want to
> set GB18030 as another alias of cp936. Do you think it is OK?

I can alias gbk and gb18030 to cp936. But does setting 'encoding' to
cp936 really work on a non-Windows system? If not then the alias won't
help.

> > [...]
> > You may have uncovered a bug that went unnoticed so far. Please try
> > to discover what causes this problem. I can't guess why the last
> > character is messed up, looks strange.
>
> In fact this bug was noticed years before. But most of the Chinese
> people decided to tolerate this situation. Anyway, I'm going to work
> on~

Yeah, I have noticed that Asian people don't often report problems.
Fixing them would still be nice!

--
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
Re: failure notice
Tony Mechelynck wrote:

> Bram Moolenaar wrote:
> > Edward L. Fox wrote:
> > > [...]
> > > > The menu.vim file should never change 'encoding'. It should load
> > > > menus that are appropriate for the current 'encoding' and
> > > > language.
> > >
> > > But gvim doesn't support an encoding named 'gbk'. If the system
> > > encoding is 'gbk', the menu and toolbar get malformed.
> >
> > What do you mean by system encoding? How does Vim see this?
> [...]
>
> I think he means the charset part of the system locale, as used to set
> 'encoding' before sourcing the [._]vimrc. $LC_CTYPE maybe? On my
> Windows system gvim -u NONE shows all strings preset to
> 'French_Belgium.1252' and gvim starts up in French and Latin1; on
> Linux I have $LC_CTYPE='en_US.UTF-8', the rest empty in bash, set to C
> in gvim, and gvim starts up in English and in UTF-8.
>
> IIRC, Edward had zh_CN.gbk and his gvim started in Chinese with
> unreadable menus and tooltips. Making gbk an alias for cp936 solved
> the menu problem, but only partially the tooltip problem. I suspect a
> byte-counting bug in one or more of the routines responsible for the
> tooltips' storage and display, manifesting on multibyte locales like
> CP936.

If this is on Unix, I don't think that cp936 is completely supported. I
can make gbk an alias for cp936, but I don't think it will help much.

--
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
Re: failure notice
On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> [...]
> > But gvim doesn't support an encoding named 'gbk'. If the system
> > encoding is 'gbk', the menu and toolbar get malformed.
>
> What do you mean by system encoding? How does Vim see this?

I meant the $LANG variable. GBK is indeed an alias of cp936, so it is
proper to add this alias entry to mbyte.c. But the situation with
GB18030 is much more complicated, and the current version of gvim is not
able to handle it correctly. About GB18030 there is another long and
not-so-funny but ridiculous story. If you like, I can tell you the
detailed GOSSIP later... Because over 99% of GB18030 and GBK is the
same, and the remaining part is too difficult to handle, I want to set
GB18030 as another alias of cp936. Do you think that is OK?

> [...]
> You may have uncovered a bug that went unnoticed so far. Please try to
> discover what causes this problem. I can't guess why the last
> character is messed up, looks strange.

In fact, this bug was noticed years ago, but most Chinese users decided
to tolerate it. Anyway, I'm going to work on it.

[...]

Regards,
Edward L. Fox
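The alias idea proposed above can be sketched as a small lookup table. This is an illustration only: the table, the struct and resolve_alias() are hypothetical, and they map names to strings, whereas the table in Vim's mbyte.c maps names to IDX_* constants. Mapping gb18030 to cp936 is the deliberate approximation suggested in the message, not an exact equivalence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table for illustration; not Vim's actual data. */
struct enc_alias {
    const char *name;       /* alias as it may appear in the locale */
    const char *canonical;  /* encoding name it is treated as */
};

static const struct enc_alias aliases[] = {
    {"gb2312",  "euc-cn"},
    {"gbk",     "cp936"},
    {"gb18030", "cp936"},   /* approximation: not identical to cp936 */
};

/* Returns the canonical name for `name` (compared case-insensitively),
 * or NULL when the name is not a known alias. */
static const char *resolve_alias(const char *name)
{
    char lower[32];
    size_t i, n;

    assert(name != NULL);
    n = strlen(name);
    if (n >= sizeof(lower))
        return NULL;
    for (i = 0; i <= n; i++)    /* i == n copies the terminating NUL */
        lower[i] = (char)tolower((unsigned char)name[i]);
    for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
        if (strcmp(lower, aliases[i].name) == 0)
            return aliases[i].canonical;
    return NULL;
}
```

With such a table, a locale charset of "GBK" resolves to "cp936" without the user having to set 'encoding' by hand.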
Re: failure notice
Hi Bram,

On 8/12/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> [...]
> You may have uncovered a bug that went unnoticed so far. Please try to
> discover what causes this problem. I can't guess why the last
> character is messed up, looks strange.

I think I solved the problem! It was caused by iconv.

size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

The parameters inbytesleft and outbytesleft should both include the
trailing '\0' byte. In the previous version of gvim, we passed the
length of the string, excluding the trailing '\0', so the value was 1
byte less than the correct one.

As we know, every Chinese character is represented by 2 bytes in the GBK
encoding:

AABBCCDD
\------/
4 characters

Because we passed the length of the string (8 in this example), iconv
treated the input as 1 byte shorter (7 in this example). The
second-to-last byte could not be converted because it is only half of a
character, so gvim changed it to a question mark:

AABBCC?D

After that, gvim tried to convert the remaining 1 byte to the target
encoding, which also failed, so it changed that byte to a question mark
too:

AABBCC??

That is why the last character of every tooltip became 2 question marks.
The menu mostly doesn't get malformed because most menu items do not end
with a Chinese character. In fact, the menu items that do end with a
Chinese character get malformed as well.

[...]

It is quite simple once the problem is found. Here is the patch, in
which I also alias GBK and GB18030 to cp936. It solved the problem I
reported earlier:

*** src/mbyte.c 2006-05-14 20:32:49.0 +0800
--- ../vim7.build/vim7/src/mbyte.c 2006-08-12 19:22:06.0 +0800
***************
*** 372,377 ****
--- 372,379 ----
      {"5601",    IDX_EUC_KR},    /* Sun: KS C 5601 */
      {"euccn",   IDX_EUC_CN},
      {"gb2312",  IDX_EUC_CN},
+     {"gbk",     IDX_CP936},
+     {"gb18030", IDX_CP936},
      {"euctw",   IDX_EUC_TW},
  #if defined(WIN3264) || defined(WIN32UNIX) || defined(MACOS)
      {"japan",   IDX_CP932},
***************
*** 3250,3256 ****
      }

      to = (char *)result + done;
!     tolen = len - done - 2;
      /* Avoid a warning for systems with a wrong iconv() prototype by
       * casting the second argument to void *. */
      if (iconv(vcp->vc_fd, (void *)&from, &fromlen, &to, &tolen)
--- 3252,3259 ----
      }

      to = (char *)result + done;
!     tolen = len - done - 1;
!     ++fromlen;
      /* Avoid a warning for systems with a wrong iconv() prototype by
       * casting the second argument to void *. */
      if (iconv(vcp->vc_fd, (void *)&from, &fromlen, &to, &tolen)

Best Regards,
Edward L. Fox
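The AABBCCDD reasoning above can be reproduced with a few lines of C. The sketch below only shows what a byte count that is one short does to double-byte text; it does not establish that this was the actual cause, which elsewhere in the thread is attributed to libiconv. count_complete_chars() is a hypothetical helper, and it assumes cp936-style lead bytes in 0x81..0xFE without validating trail bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration. Counts the complete characters
 * in the first `len` bytes of `buf` and stores in `*leftover` the number
 * of bytes belonging to an incomplete character at the end. */
static size_t count_complete_chars(const unsigned char *buf, size_t len,
                                   size_t *leftover)
{
    size_t i = 0;
    size_t count = 0;

    assert(buf != NULL && leftover != NULL);
    *leftover = 0;
    while (i < len) {
        /* assumed cp936-style rule: 0x81..0xFE starts a 2-byte character */
        size_t width = (buf[i] >= 0x81 && buf[i] <= 0xFE) ? 2 : 1;

        if (i + width > len) {      /* lead byte without its trail byte */
            *leftover = len - i;
            break;
        }
        i += width;
        count++;
    }
    return count;
}
```

For an 8-byte string of four double-byte characters, a length of 8 gives four complete characters and nothing left over, while a length of 7 gives three characters and one orphaned lead byte, which is the situation the message describes as producing a question mark.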
Re: failure notice
Edward L. Fox wrote:

> Sorry for sending this mail for the second time because my previous
> mail with attachment was rejected by the mail daemon. :-(
>
> On 8/11/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> > [...]
> > The menu.vim file should never change 'encoding'. It should load
> > menus that are appropriate for the current 'encoding' and language.
>
> But gvim doesn't support an encoding named 'gbk'. If the system
> encoding is 'gbk', the menu and toolbar get malformed.

What do you mean by system encoding? How does Vim see this?

> In the past two years (or more), all gvim users in mainland China
> should add the following two lines in their .vimrc if they are using
> Linux with GBK encoding:
>
> set enc=cp936
> so $VIMRUNTIME/delmenu.vim
> so $VIMRUNTIME/menu.vim
>
> That's why I had wanted to change the encoding value in menu.vim. :-)
>
> > If the intention is to have the default for 'encoding' use something
> > specific in $LANG then this must be done in enc_locale() in
> > src/mbyte.c
>
> I modified mbyte.c, added gbk as an alias of cp936, then the menubar
> was displayed properly with the unmodified menu.vim. But there is
> still some problem with the toolbar - every last character of the
> tooltip becomes two question marks. I'm trying to debug the code and
> will send you another patch as soon as I solve the problem. Hope you
> can offer me some hints, too. :-)

You may have uncovered a bug that went unnoticed so far. Please try to
discover what causes this problem. I can't guess why the last character
is messed up, looks strange.

--
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\
Re: failure notice
Hi vimmers,

Sorry for sending this mail a second time; my previous mail with the
attachment was rejected by the mail daemon. :-(

On 8/11/06, Bram Moolenaar [EMAIL PROTECTED] wrote:
> [...]
> The menu.vim file should never change 'encoding'. It should load menus
> that are appropriate for the current 'encoding' and language.

But gvim doesn't support an encoding named 'gbk'. If the system encoding
is 'gbk', the menu and toolbar get malformed.

For the past two years (or more), all gvim users in mainland China have
had to add the following lines to their .vimrc if they use Linux with
the GBK encoding:

set enc=cp936
so $VIMRUNTIME/delmenu.vim
so $VIMRUNTIME/menu.vim

That's why I wanted to change the encoding value in menu.vim. :-)

> If the intention is to have the default for 'encoding' use something
> specific in $LANG then this must be done in enc_locale() in
> src/mbyte.c

I modified mbyte.c and added gbk as an alias of cp936; the menubar is
then displayed properly with the unmodified menu.vim. But there is still
a problem with the toolbar: the last character of every tooltip becomes
two question marks. I'm trying to debug the code and will send you
another patch as soon as I solve the problem. I hope you can offer me
some hints, too. :-)

[...]

Regards,
Edward L. Fox