Re: Suggest ':TOhtml' to use 'fileencoding' rather than 'encoding' as default html charset

Tony Mechelynck Sat, 28 Aug 2010 14:20:27 -0700

On 26/08/10 16:40, Ben Fritz wrote:



On Aug 25, 11:11 pm, JiaYanwei<jia...@126.com>  wrote:

I think this will be more reasonable than before.

If the encoding of the edited text file differs from the system/Vim encoding,
it's inconvenient for the default HTML charset to be 'encoding'. After
':TOhtml', we then have to modify the generated HTML file so that its file
encoding matches the HTML charset.

E.g., suppose the system/Vim encoding is 'UTF-8' but a text file's encoding is
'latin-1'. If the default HTML charset is 'encoding', then after ':TOhtml' we
have to change the HTML charset to 'iso-8859-1', or save the generated HTML
file with ':w ++enc=utf-8'. But if the default HTML charset is 'fileencoding',
we need do nothing after ':TOhtml'.
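
The mismatch described above can be reproduced outside Vim. The following is a minimal Python sketch, not part of TOhtml; the sample text and the helper name decodes_as are made up for illustration. It shows that bytes written as Latin-1 are not valid UTF-8, and that UTF-8 bytes read as Latin-1 come out garbled, which is roughly what a browser sees when the declared charset and the actual file encoding disagree.

```python
# Hypothetical sample text; any non-ASCII Latin-1 character would do.
text = "caf\u00e9"  # "café"

# Bytes as they would be written with a Latin-1 file encoding.
latin1_bytes = text.encode("iso-8859-1")
# Bytes as they would be written with a UTF-8 file encoding.
utf8_bytes = text.encode("utf-8")


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if `data` is a valid byte sequence in `charset`."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# Latin-1 bytes declared as UTF-8: the lone 0xE9 byte is invalid UTF-8.
print(decodes_as(latin1_bytes, "utf-8"))

# UTF-8 bytes declared as Latin-1: decoding succeeds but the text is garbled.
print(utf8_bytes.decode("iso-8859-1"))
```

Either way, the charset named in the HTML header and the encoding the file is saved in have to agree.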


Thanks, I'll take a look. I don't yet have a good handle on
'encoding', 'fileencoding', and any other related options. It looks
like I'm going to need to.

From my understanding, 'fileencoding' is the encoding Vim is supposed
to use to read/write the file. So, it does make sense that we should
use this instead of just 'encoding' for the charset of the generated
html. Does anyone know why TOhtml has used 'encoding' instead? I have
not touched the charset detection code yet, other than to move it from
the 2html.vim file into the autoload/tohtml.vim file.


You got it right, and it does indeed make sense.

One possibility is that anything can be represented in UTF-8, including
text not yet saved from the latest edit of the file, and possibly
incompatible with the 'fileencoding' - such text is of course in error,
and will cause an error if one tries to save it.


You say you need to do nothing to the TOhtml output if we set the
charset to the file encoding. But, don't we also need to ensure that
the file encoding of the new html file is the same as the file
encoding of the source file? The file encoding could be different from
file to file, whereas Vim's encoding is always the same. I can picture
this causing problems, if the charset says one thing, but the file
encoding is different.

HTML metadata can be written in ASCII. If needed, one can use &#nnnnn;
entities in text (where nnnnn is the decimal representation of the
Unicode codepoint number; recent browsers also accept &#xnnnn; where x
is the letter x as in X-Ray and nnnn is the hex representation) or
percent-escaping in URLs (where, even in a Latin1 HTML page,
percent-escaping always escapes each byte of the UTF-8 representation
separately, with a % sign followed by exactly two hex digits). For
instance, U+00E9 (Latin small letter e with acute) would be represented
as %C3%A9 and U+4E00 (Chinese "number one" horizontal-stroke sign) would
be represented as %E4%B8%80 in a URL, including in the query text if any.
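
As an illustration of the two escaping schemes just described, here is a small Python sketch. It is not taken from TOhtml; the helper names decimal_entity, hex_entity and percent_escape are hypothetical, and only the standard library function urllib.parse.quote is used, which percent-escapes the UTF-8 bytes of a string by default.

```python
from urllib.parse import quote


def decimal_entity(ch: str) -> str:
    """Numeric character reference using the decimal codepoint."""
    return f"&#{ord(ch)};"


def hex_entity(ch: str) -> str:
    """Numeric character reference using the hexadecimal codepoint."""
    return f"&#x{ord(ch):X};"


def percent_escape(ch: str) -> str:
    """Percent-escape each byte of the UTF-8 representation."""
    return quote(ch, safe="")


# The two characters used as examples in the text: U+00E9 and U+4E00.
for ch in ("\u00e9", "\u4e00"):
    print(decimal_entity(ch), hex_entity(ch), percent_escape(ch))
```

For U+00E9 this prints `&#233; &#xE9; %C3%A9`, and for U+4E00 it prints `&#19968; &#x4E00; %E4%B8%80`, matching the URL forms given above.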


By the way, until this is fixed...you can use the g:html_use_encoding
option to override the normal detection mechanisms, rather than
manually editing the generated HTML file.


Best regards,
Tony.
--
If you put garbage in a computer nothing comes out but garbage.  But
this garbage, having passed through a very expensive machine, is
somehow ennobled and none dare criticize it.

--
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
