Re: Suggest ':TOhtml' to use 'fileencoding' rather than 'encoding' as default html charset

2010-08-29 Thread JiaYanwei
Sorry, that was my oversight: I had set 'fileencoding' in my '.vimrc'...

P.S.
Sorry for replying so late. I could not access Google Groups for the
last few days.

On 2010-8-28, 03:37 Ben Fritz fritzophre...@gmail.com wrote:
 On Aug 25, 11:11 pm, JiaYanwei jia...@126.com wrote:



  E.g., suppose the system/Vim encoding is 'UTF-8' but a text file's encoding
  is 'latin-1'. If the default HTML charset is 'encoding', then after ':TOhtml'
  we must either change the HTML charset to 'iso-8859-1' or save the generated
  HTML file with ':w ++enc=utf-8'.

 Hmm... unless I misunderstand, the sequence is:

 1. Load the text file. File encoding is latin-1, Vim encoding is utf-8.
 2. Do :TOhtml to create a new html buffer. File encoding defaults to
    empty, Vim encoding is still utf-8.
 3. :TOhtml sees 'encoding' and sets the charset in the generated markup to
    UTF-8.
 4. :w the new html buffer. Vim sees the empty file encoding, so it uses
    utf-8 as the new file's encoding. Thus the file encoding matches the
    html charset.

 You claim that the new html buffer has latin-1 encoding. Am I
 missing something here?

 I still think using fileencoding might be the correct way to do it,
 but doing so would require 2html.vim to set the file encoding of the
 new html buffer explicitly to be equal to the source file.

 This also brings up another shortcoming of 2html, because using
 g:html_use_encoding may change the html charset meta tag, but it does
 NOT change the actual character encoding of the file. It looks like I
 will need to set the fileencoding of the new html buffer to whatever
 corresponds to the supplied user option as a separate fix.

 Any thoughts?
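
To make the mismatch under discussion concrete, here is a minimal Python sketch. It is not taken from 2html.vim; it only writes a string in one encoding and reads the bytes back under the charset a meta tag would declare. The helper name `read_as_declared` and the sample string are made up for illustration.

```python
def read_as_declared(text: str, file_encoding: str, declared_charset: str) -> str:
    """Encode `text` the way the file is written, then decode the bytes
    the way a reader trusting the declared charset would."""
    raw = text.encode(file_encoding)
    return raw.decode(declared_charset, errors="replace")


sample = "caf\u00e9"  # U+00E9 is one byte in latin-1 and two bytes in UTF-8

# File bytes and declared charset agree: the text survives.
matched = read_as_declared(sample, "utf-8", "utf-8")

# File written as UTF-8 but declared iso-8859-1: each UTF-8 byte of U+00E9
# is shown as a separate latin-1 character.
utf8_as_latin1 = read_as_declared(sample, "utf-8", "iso-8859-1")

# File written as latin-1 but declared UTF-8: the lone byte 0xE9 is not
# valid UTF-8 and is replaced with U+FFFD.
latin1_as_utf8 = read_as_declared(sample, "iso-8859-1", "utf-8")

print(ascii(matched), ascii(utf8_as_latin1), ascii(latin1_as_utf8))
```

Either direction of mismatch corrupts non-ASCII text, which is why the charset in the markup and the encoding used by :w have to be changed together.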

-- 
You received this message from the vim_dev maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php


Re: Suggest ':TOhtml' to use 'fileencoding' rather than 'encoding' as default html charset

2010-08-26 Thread JiaYanwei
Oh, sorry, I forgot that 'fileencoding' may be empty. This should be
handled.

My experience is the opposite: 'fileencoding' is often different from
'encoding' when editing existing files.

Ben Fritz wrote:
 On Aug 26, 9:40 am, Ben Fritz fritzophre...@gmail.com wrote:
 
  From my understanding, 'fileencoding' is the encoding Vim is supposed
  to use to read/write the file. So, it does make sense that we should
  use this instead of just 'encoding' for the charset of the generated
  html. Does anyone know why TOhtml has used 'encoding' instead?
 
 
 One problem with the supplied patch is that Vim will use 'encoding'
 for a file's encoding if 'fileencoding' is empty. In my setup, it
 looks like 'fileencoding' is nearly always empty.
 
 So, the script will need to fall back to 'encoding' if 'fileencoding'
 is empty. Probably it should also re-detect the charset using
 'encoding' when 'fileencoding' is not blank but does not resolve to a
 valid charset.
 
 Any thoughts? Like I said, I've never needed to mess with 'encoding'
 or 'fileencoding' in my daily use of Vim.
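
The fallback described above could look roughly like the following Python sketch. It is only an illustration of the selection order, not the logic of 2html.vim; the function name `pick_html_charset` and the two-entry lookup table are hypothetical.

```python
# Hypothetical, deliberately tiny table from Vim encoding names to HTML
# charset names; a real converter would need many more entries.
VIM_TO_HTML_CHARSET = {
    "utf-8": "UTF-8",
    "latin1": "ISO-8859-1",
}


def pick_html_charset(fileencoding: str, encoding: str) -> str:
    """Prefer 'fileencoding'; fall back to 'encoding' when it is empty
    or does not map to a known charset."""
    for candidate in (fileencoding, encoding):
        charset = VIM_TO_HTML_CHARSET.get(candidate.lower())
        if charset is not None:
            return charset
    raise ValueError(
        f"no known HTML charset for {fileencoding!r} or {encoding!r}"
    )


print(pick_html_charset("", "utf-8"))        # empty 'fileencoding'
print(pick_html_charset("latin1", "utf-8"))  # 'fileencoding' wins
```

An empty 'fileencoding' and an unrecognised one take the same path here, so both cases end up using 'encoding'.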



Suggest ':TOhtml' to use 'fileencoding' rather than 'encoding' as default html charset

2010-08-25 Thread JiaYanwei

I think this would be more reasonable than the current behavior.

If the encoding of the edited text file differs from the system/Vim encoding,
it is inconvenient for the default HTML charset to be 'encoding'. After
':TOhtml', we then have to modify the generated HTML file so that the file
encoding matches the HTML charset.

E.g., suppose the system/Vim encoding is 'UTF-8' but a text file's encoding
is 'latin-1'. If the default HTML charset is 'encoding', then after ':TOhtml'
we must either change the HTML charset to 'iso-8859-1' or save the generated
HTML file with ':w ++enc=utf-8'. But if the default HTML charset is
'fileencoding', nothing needs to be done after ':TOhtml'.

The changes are in the attachment.

Best regards, 
Yanwei. 


tohtml.diff
Description: Binary data


Re:Re: [Win32] common dialogs of gVim cannot input some Unicode characters from IME

2010-08-07 Thread JiaYanwei




At 2010-08-07 21:57:23,Tony Mechelynck antoine.mechely...@gmail.com wrote:
On 04/08/10 19:16, JiaYanwei wrote:
 At 2010-08-04 23:46:23, Bram Moolenaar b...@moolenaar.net wrote:
 JiaYanwei wrote:
 For example, I work with Windows XP Simplified Chinese Edition. There's a
 character 'CIRCLED NUMBER TWENTY' - U+2473, beyond the character set of the
 ACP (system active codepage) CP936. It can be copied and pasted into the
 textbox of the Find and Replace dialog, but it can't be input directly from
 the Windows IME (the input character becomes a question mark '?').

 It puzzled me for a long time. I finally found the reason: ANSI-version
 functions such as DispatchMessageA and IsDialogMessageA ignore the
 WM_WCHAR message.

 The attachment 2274_uime.patch.gz is the patch for vim 7.2.446,
 2477_uime.patch.gz is for 7.3d revision 2...@mercurial.

 Thanks.
 Can a few people verify this works OK with different compilers?

 I have just compiled it with msvc2005 express and mingw and have also
 tested it. It works OK.

 P.S.
 I got the same warning many times while compiling it with vc2005:
  warning C4819: The file contains a character that cannot be represented
  in the current code page (936). Save the file in Unicode format to
  prevent data loss
 This warning is useful in the IDE, since the source may be modified by it.
 But we don't compile Vim with the IDE, so... could we add /wd4819 to CFLAGS
 to disable it?

OTOH, instead of having the Unicode codepoint in UTF-8, maybe it should 
be represented in some sort of escape format? I'm not sure whether 
\u2473 or \xE2\x91\xB3 or something else is the right representation 
in this case though.
Of course, you can input any codepoint into Vim (with 'encoding' set to 
UTF-8) by bypassing the IME, in this case by using Ctrl-V u 2 4 7 3 
without the spaces. Or if you use it often, you can assign it to a 
mapping or make up a keymap (about the latter, see 
http://vim.wikia.com/wiki/How_to_make_a_keymap ).
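
As a quick check that the two escape forms mentioned above name the same character, here is a small Python sketch. It says nothing about which form is appropriate in Vim's C sources; the variable names are only illustrative.

```python
import unicodedata

from_escape = "\u2473"                        # Unicode escape for U+2473
from_bytes = b"\xE2\x91\xB3".decode("utf-8")  # the same character as UTF-8 bytes

print(unicodedata.name(from_escape))
print(from_escape == from_bytes)
```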
Thanks.

Maybe I haven't explained this clearly. I just wish I could input Unicode
characters beyond the ACP via an IME (e.g. a Pinyin input method, not by
directly entering a Unicode hex sequence) into the Find and Replace dialog of
gVim. Maybe the following table explains this more clearly:

                                        gVim  gVim-RP  notepad  notepad-RP
Copy and paste characters inside ACP     +       +        +         +
Input characters inside ACP by IME       +       +        +         +
Copy and paste characters beyond ACP     +       +        +         +
Input characters beyond ACP by IME       +       -        +         +

gVim: main edit window of gVim-win32
gVim-RP: the textbox of Find and Replace dialog of gVim-win32
notepad: main edit window of the notepad.exe of Windows
notepad-RP: the textbox of Find and Replace dialog of notepad.exe




 --
 hundred-and-one symptoms of being an internet addict:
 2. You kiss your girlfriend's home page.

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
 ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
 \\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///


Best regards,
Tony.
-- 
Violators can be fined, arrested or jailed for making ugly faces at a dog.
   [real standing law in Oklahoma, United States of America]

Best regards,
Yanwei.


[Win32] common dialogs of gVim cannot input some Unicode characters from IME

2010-08-04 Thread JiaYanwei
For example, I work with Windows XP Simplified Chinese Edition. There's a
character 'CIRCLED NUMBER TWENTY' - U+2473, beyond the character set of the
ACP (system active codepage) CP936. It can be copied and pasted into the
textbox of the Find and Replace dialog, but it can't be input directly from
the Windows IME (the input character becomes a question mark '?').

It puzzled me for a long time. I finally found the reason: ANSI-version
functions such as DispatchMessageA and IsDialogMessageA ignore the
WM_WCHAR message.

The attachment 2274_uime.patch.gz is the patch for vim 7.2.446, 
2477_uime.patch.gz is for 7.3d revision 2...@mercurial.

Best regards, 
Yanwei. 


2274_uime.patch.gz
Description: GNU Zip compressed data


2477_uime.patch.gz
Description: GNU Zip compressed data


Re: [Win32] common dialogs of gVim cannot input some Unicode characters from IME

2010-08-04 Thread JiaYanwei

At 2010-08-04 23:46:23, Bram Moolenaar b...@moolenaar.net wrote:


JiaYanwei  wrote:

 For example, I work with Windows XP Simplified Chinese Edition. There's a
 character 'CIRCLED NUMBER TWENTY' - U+2473, beyond the character set of the
 ACP (system active codepage) CP936. It can be copied and pasted into the
 textbox of the Find and Replace dialog, but it can't be input directly from
 the Windows IME (the input character becomes a question mark '?').
 
 It puzzled me for a long time. I finally found the reason: ANSI-version
 functions such as DispatchMessageA and IsDialogMessageA ignore the
 WM_WCHAR message.
 
 The attachment 2274_uime.patch.gz is the patch for vim 7.2.446, 
 2477_uime.patch.gz is for 7.3d revision 2...@mercurial.

Thanks.

Can a few people verify this works OK with different compilers?

I have just compiled it with msvc2005 express and mingw and have also tested
it. It works OK.

P.S.
I got the same warning many times while compiling it with vc2005:
warning C4819: The file contains a character that cannot be represented
  in the current code page (936). Save the file in Unicode format to
  prevent data loss
This warning is useful in the IDE, since the source may be modified by it. But
we don't compile Vim with the IDE, so... could we add /wd4819 to CFLAGS to
disable it?


-- 
hundred-and-one symptoms of being an internet addict:
2. You kiss your girlfriend's home page.

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///



[Win32] some dialog boxes of gVim doesn't support Unicode

2008-12-12 Thread JiaYanwei
The dialogs are popped up by the function inputdialog() and the commands
promptfind and promptrepl. gVim cannot get the correct input from these
dialogs if the input contains any Unicode character beyond the character set
of the ACP (system active codepage), even if gVim runs under Windows NT with
the setting 'enc=utf-8'. In fact, if 'encoding' is set to UTF-8 or any other
encoding that differs from the ACP, there may be more problems getting the
input from these dialogs, since no encoding conversion is done.

Here's a patch. It detects the Windows OS version when these dialogs are
used. On Windows NT, the wide versions of the Windows API are used instead of
the non-wide versions to get the input, and the wide string is then converted
to the encoding used inside gVim.
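
The difference between the two paths can be modelled with a short Python sketch. It calls no Windows API and is not the patch itself; it only imitates the conversions, assuming CP936 as the ACP and UTF-8 as 'encoding'. The function names `via_ansi` and `via_wide` are hypothetical.

```python
ACP = "cp936"       # active code page assumed for this illustration
INTERNAL = "utf-8"  # stands in for Vim's 'encoding'


def via_ansi(text: str) -> bytes:
    """Model the non-wide path: the input is squeezed through the ACP
    before it reaches the editor, so unencodable characters become '?'."""
    acp_bytes = text.encode(ACP, errors="replace")
    return acp_bytes.decode(ACP).encode(INTERNAL)


def via_wide(text: str) -> bytes:
    """Model the wide path: a UTF-16 string is converted directly to the
    internal encoding without passing through the ACP."""
    wide = text.encode("utf-16-le")
    return wide.decode("utf-16-le").encode(INTERNAL)


inside_acp = "\u4e2d"   # a character CP936 can represent
beyond_acp = "\u2473"   # CIRCLED NUMBER TWENTY, outside CP936

print(via_ansi(inside_acp) == via_wide(inside_acp))
print(via_ansi(beyond_acp), via_wide(beyond_acp))
```

Characters inside the ACP survive both paths; only the wide path preserves the ones outside it.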

Best regards,
Yanwei.



for72069.gz
Description: application/gzip-compressed


Gvim for Windows doesn't handle non-BMP characters when interchanging data with Windows OS

2008-10-22 Thread JiaYanwei
When interchanging data with Windows, such as in clipboard operations, gvim
converts the text into UCS-2 encoding, but unlike UTF-16, UCS-2 can't
encode non-BMP characters.

For example, when pasting the non-BMP character U+248BB from the Windows
clipboard, gvim inserts two separate characters, d852 and dcbb. This is caused
by the function ucs2_to_utf8() in src/os_mswin.c, which treats the two halves
of a surrogate pair as separate Unicode characters and converts them into the
bad UTF-8 sequence 0xED 0xA1 0x92 0xED 0xB2 0xBB; the correct UTF-8 sequence
is 0xF0 0xA4 0xA2 0xBB.

Similarly, when copying the non-BMP character U+248BB into the Windows
clipboard, the content of the clipboard will be U+48BB, because the function
utf8_to_ucs2() in src/os_mswin.c casts the integer 0x248BB to the short
integer 0x48BB.
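
The numbers above can be reproduced with a small Python sketch of surrogate pair arithmetic. This is not the code from the patch or from src/os_mswin.c; the helper names `to_surrogate_pair` and `from_surrogate_pair` are made up for illustration.

```python
from typing import Tuple


def to_surrogate_pair(codepoint: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = codepoint - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x248BB)

# Recombining the pair first gives the correct four-byte UTF-8 sequence.
good_utf8 = chr(from_surrogate_pair(high, low)).encode("utf-8")

# Encoding each surrogate on its own reproduces the bad six-byte sequence.
bad_utf8 = (chr(high) + chr(low)).encode("utf-8", errors="surrogatepass")

# Keeping only the low 16 bits is what a cast to a 16-bit integer does.
truncated = 0x248BB & 0xFFFF

print(hex(high), hex(low), good_utf8.hex(), bad_utf8.hex(), hex(truncated))
```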

The attachment is a patch. Surrogate pair handling has been added to the
two functions mentioned above. This lets non-BMP characters be exchanged
correctly with the Windows clipboard, as I have tested:

Non-BMP character paste from/copy into Windows clipboard
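+----------+--------------------------------+------------------------+
|          | WindowsXP with GB18030 support |       Windows 98       |
+----------+--------------------------------+------------------------+
| editing  | before patch works bad         | before patch works bad |
| UTF-* or | after patch works OK           | after patch works OK   |
| UCS-4*   |                                |                        |
| text     |                                |                        |
+----------+--------------------------------+------------------------+
| editing  | before patch works bad         | (cannot edit           |
| GB18030  | after patch works OK           |  GB18030 text)         |
| text     |                                |                        |
+----------+--------------------------------+------------------------+
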
B.T.W.: I think it would be better to rename the functions mentioned above to
utf16_to_utf8 and utf8_to_utf16.

Best regards,
Yanwei.



for72025.tgz
Description: Binary data


Re: Gvim for Windows doesn't handle non-BMP characters when interchanging data with Windows OS

2008-10-22 Thread JiaYanwei
Hello Tony,

It's really to be the similar problem, but this one only arises under the
Windows operating system, while the UTF-16le BOM problem is platform
independent. I was uncertain whether a combined patch would be convenient.

On 2008-10-22 23:21:11, Tony Mechelynck wrote:
 I expect this is related with the UTF-16le BOM problem you noticed this
 past Saturday. Maybe a combined patch would be OK, since in both cases,
 the problem involves using UCS-2 (where surrogates are undefined)
 instead of UTF-16 (where surrogate pairs encode codepoints above the BMP)? 

Best regards,
Yanwei.




Re:Re: Gvim for Windows doesn't handle non-BMP characters when interchanging data with Windows OS

2008-10-22 Thread JiaYanwei
Oh, I made a mistake: I meant to say "They're really similar problems" in
the first sentence.

On 2008-10-23 00:16:20, JiaYanwei wrote:
 Hello Tony,
 
 It's really to be the similar problem, but this one only arises under the
 Windows operating system, while the UTF-16le BOM problem is platform
 independent. I was uncertain whether a combined patch would be convenient.




Encoding recognizing problem with 2 byte BOM FF FE

2008-10-18 Thread JiaYanwei
For a 2-byte BOM FF FE, ucs-2le is used, which doesn't work for
little-endian UTF-16 text.
This is like patch 7.1.261; the only difference is the byte order.
I have also written a patch for Vim-7.2.025:

*** ../vim-7.2.025/src/fileio.c Wed Oct 15 15:09:56 2008
--- src/fileio.c        Sat Oct 18 11:42:25 2008
***************
*** 5550,5559 ****
            name = "ucs-4le";   /* FF FE 00 00 */
            len = 4;
        }
!       else if (flags == FIO_ALL || flags == (FIO_UCS2 | FIO_ENDIAN_L))
!           name = "ucs-2le";   /* FF FE */
!       else if (flags == (FIO_UTF16 | FIO_ENDIAN_L))
            name = "utf-16le";  /* FF FE */
      }
      else if (p[0] == 0xfe && p[1] == 0xff
            && (flags == FIO_ALL || flags == FIO_UCS2 || flags == FIO_UTF16))
--- 5550,5561 ----
            name = "ucs-4le";   /* FF FE 00 00 */
            len = 4;
        }
!       /* For little endian: default to utf-16, it works also for ucs-2 text. */
!       else if (flags == FIO_ALL || flags == (FIO_UTF16 | FIO_ENDIAN_L))
            name = "utf-16le";  /* FF FE */
+       else if (flags == (FIO_UCS2 | FIO_ENDIAN_L))
+           name = "ucs-2le";   /* FF FE */
+ 
      }
      else if (p[0] == 0xfe && p[1] == 0xff
            && (flags == FIO_ALL || flags == FIO_UCS2 || flags == FIO_UTF16))
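
The preference order proposed in the patch can be illustrated with a short Python sketch. It is a simplification that ignores the flags argument of the C function and handles only the little-endian BOMs; the function name `detect_le_bom` is hypothetical.

```python
from typing import Optional, Tuple


def detect_le_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length) for a little-endian BOM,
    or (None, 0) when no such BOM is present."""
    if data[:4] == b"\xff\xfe\x00\x00":
        return "ucs-4le", 4      # FF FE 00 00
    if data[:2] == b"\xff\xfe":
        return "utf-16le", 2     # FF FE: default to UTF-16
    return None, 0


# A BOM followed by one non-BMP character stored as a surrogate pair.
sample = b"\xff\xfe" + "\U00020025".encode("utf-16-le")
name, bom_len = detect_le_bom(sample)
text = sample[bom_len:].decode("utf-16-le")

print(name, bom_len, len(text))
```

Treating FF FE as UTF-16 keeps the surrogate pair together as one character, and BMP-only text decodes to the same characters it would have as UCS-2.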
--
Best regards, 
Yanwei



Re:Re: Encoding recognizing problem with 2 byte BOM FF FE

2008-10-18 Thread JiaYanwei
Hello Tony,

Thanks for your helpful suggestion.
By the way, wish Bram a wonderful holiday.

on 2008-10-18 18:18:45, Tony Mechelynck wrote: 
I confirm that Vim 7.2.25 with 'fencs' starting in ucs-bom identifies 
UTF-16le files with BOM as if they were UCS-2le, even if codepoints 
above U+FFFF are present, which is an error. For instance U+20025 is 
read back as <d840><dc25> (two surrogates shown as distinct characters) 
instead of as one double-wide character.

Bram, there's work for you when you're back from holiday :-). I'm not 
competent to check the proposed patch by eyeball but I hope it does what 
is needed.

Yanwei, in the meantime I suggest the following autocommand (untested) 
as a workaround which doesn't need recompilation:

   au BufReadPost * if (&fenc == 'ucs-2le') && &bomb
   \ | e ++enc=utf-16le | endif


Best regards,
Yanwei. 
