On 26/08/10 16:40, Ben Fritz wrote:


On Aug 25, 11:11 pm, JiaYanwei <jia...@126.com> wrote:
I think this will be more reasonable than before.

If the encoding of the edited text file differs from the system/Vim encoding,
it is inconvenient to make 'encoding' the default HTML charset: after
':TOhtml', we would have to modify the generated HTML file so that its file
encoding matches the HTML charset.

For example, suppose the system/Vim encoding is 'UTF-8' but the text file's
encoding is 'latin-1'. If the default HTML charset is 'encoding', then after
':TOhtml' we must either change the HTML charset to 'iso-8859-1' or save the
generated HTML file with ':w ++enc=utf-8'. But if the default HTML charset is
'fileencoding', nothing needs to be done after ':TOhtml'.
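To make the latin-1 case concrete, here is a minimal sketch in Python rather than Vim script. Python codec names such as "latin-1" and "iso-8859-1" stand in for Vim's option values; that mapping is an assumption of the illustration. The sketch shows what a browser displays when latin-1 bytes are labelled as UTF-8, and confirms that either fix removes the mismatch.

```python
text = "caf\u00e9"                          # "café" as the user sees it in the buffer
latin1_bytes = text.encode("latin-1")       # what the latin-1 file holds: b"caf\xe9"

# Page labelled charset=utf-8 but containing latin-1 bytes: the lone 0xE9
# byte is not valid UTF-8, so a browser shows a replacement character.
shown = latin1_bytes.decode("utf-8", errors="replace")

# Fix 1: label the page with the file's own encoding.
fixed_by_label = latin1_bytes.decode("iso-8859-1")

# Fix 2: keep charset=utf-8 but re-encode the bytes (like ':w ++enc=utf-8').
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
fixed_by_reencode = utf8_bytes.decode("utf-8")

print(repr(shown), fixed_by_label, fixed_by_reencode)
```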


Thanks, I'll take a look. I don't yet have a good handle on
'encoding', 'fileencoding', and the other related options. It looks
like I'm going to need one.

From my understanding, 'fileencoding' is the encoding Vim is supposed
to use to read and write the file. So it does make sense to use it
instead of 'encoding' for the charset of the generated HTML. Does
anyone know why TOhtml has used 'encoding' instead? I have not touched
the charset detection code yet, other than to move it from the
2html.vim file into the autoload/tohtml.vim file.

You got it right, and it does indeed make sense.
One possible reason is that anything can be represented in UTF-8, including text added in the latest edit that has not been saved yet and may not be representable in the 'fileencoding'. Such text is of course in error, and trying to save it will cause an error.
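A minimal Python sketch of that failure mode, assuming Python's codecs behave like Vim's conversion on write (an analogy, not Vim's implementation). `can_save` is a hypothetical helper introduced only for this illustration.

```python
def can_save(text: str, fileencoding: str) -> bool:
    """Hypothetical helper: True if every character fits in fileencoding."""
    try:
        text.encode(fileencoding)
        return True
    except UnicodeEncodeError:
        return False

original = "r\u00e9sum\u00e9"   # "résumé": representable in latin-1
edited = original + " \u4e00"   # an unsaved edit adds U+4E00

print(can_save(original, "latin-1"))  # True
print(can_save(edited, "latin-1"))    # False: writing would fail
print(can_save(edited, "utf-8"))      # True: UTF-8 holds anything
```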


You say nothing needs to be done to the TOhtml output if we set the
charset to the file encoding. But don't we also need to ensure that
the file encoding of the new HTML file is the same as the file
encoding of the source file? The file encoding can differ from
file to file, whereas Vim's encoding is always the same. I can picture
this causing problems if the charset says one thing but the file
encoding is another.
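One way to rule out that mismatch is to derive the declared charset and the written bytes from the same value. The Python sketch below uses a hypothetical `render_html` helper; it is not how tohtml.vim is structured, only an illustration of keeping the two in agreement. It encodes strictly, so a character the charset cannot hold raises an error instead of being silently corrupted.

```python
import html

def render_html(body: str, charset: str) -> bytes:
    """Hypothetical helper: build a page and encode it with the charset it declares."""
    page = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{charset}"></head>\n'
        f"<body><pre>{html.escape(body)}</pre></body></html>\n"
    )
    # Using the declared charset for encoding keeps the label and bytes in sync.
    return page.encode(charset)

data = render_html("caf\u00e9", "iso-8859-1")

try:
    render_html("\u4e00", "iso-8859-1")
except UnicodeEncodeError as err:
    print("cannot label this text as iso-8859-1:", err.reason)
```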

HTML metadata can be written in ASCII. Where needed, text can use numeric character references of the form &#nnnnn; (where nnnnn is the decimal Unicode code point number); recent browsers also accept &#xnnnn;, where x is the letter x as in X-Ray and nnnn is the code point in hex. URLs can use percent-escaping, where, even in a Latin1 HTML page, percent-escaping always escapes each byte of the UTF-8 representation separately, as a % sign followed by exactly two hex digits. For instance, U+00E9 (Latin small letter e with acute) would be represented as %C3%A9, and U+4E00 (the Chinese character for "one", a single horizontal stroke) as %E4%B8%80 in a URL, including in the query string, if any.
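The examples above can be checked with a short Python sketch. `char_refs` is a hypothetical helper; `urllib.parse.quote` and the `xmlcharrefreplace` error handler are standard library features, and `quote` encodes as UTF-8 by default.

```python
from urllib.parse import quote

def char_refs(ch: str):
    """Hypothetical helper: decimal and hex numeric character references for ch."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"

print(char_refs("\u00e9"))   # ('&#233;', '&#xE9;')
print(char_refs("\u4e00"))   # ('&#19968;', '&#x4E00;')

# quote() percent-escapes each byte of the character's UTF-8 encoding.
print(quote("\u00e9"))       # %C3%A9
print(quote("\u4e00"))       # %E4%B8%80

# Encoding to ASCII can fall back to decimal references automatically.
print("caf\u00e9 \u4e00".encode("ascii", errors="xmlcharrefreplace"))
# b'caf&#233; &#19968;'
```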


By the way, until this is fixed, you can use the g:html_use_encoding
option to override the normal detection mechanism, rather than
manually editing the generated HTML file.


Best regards,
Tony.
--
If you put garbage in a computer nothing comes out but garbage.  But
this garbage, having passed through a very expensive machine, is
somehow ennobled and none dare criticize it.

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
