On 29/08/10 04:29, Benjamin Fritz wrote:
On Sat, Aug 28, 2010 at 4:16 PM, Tony Mechelynck
<[email protected]> wrote:
From my understanding, 'fileencoding' is the encoding Vim is supposed
to use to read/write the file. So, it does make sense that we should
use this instead of just 'encoding' for the charset of the generated
html. Does anyone know why TOhtml has used 'encoding' instead? I have
not touched the charset detection code yet, other than to move it from
the 2html.vim file into the autoload/tohtml.vim file.
You got it right, and it does indeed make sense.
One possible reason is that anything can be represented in UTF-8, including
text not yet saved from the latest edit of the file, which may be incompatible
with the 'fileencoding' - such text is of course in error, and will cause an
error if one tries to save it.
Ok, I think I'll make the edit, then.
Your response gives me an idea to fix something else that's been
bothering me. Currently, if Vim cannot determine the correct charset
to use, it defaults to not including one at all. I think I'll have it
default the charset and file encoding to UTF-8 if neither the
fileencoding nor the encoding option gives a valid charset. The user
should be able to manually leave out the charset and manually set the
encoding if desired.
Here's what I'm thinking in more detail:
For one buffer:
1. If the user specified a charset, try to determine 'fileencoding' from
the charset. If this fails, warn the user they will need to manually set
the fileencoding.
2. If no charset is specified, try to determine a charset from the
'fileencoding' option. If successful, use the same 'fileencoding' and
the associated charset in the generated buffer.
3. If the charset could not be determined from 'fileencoding', try again
with 'encoding'. If successful, set 'fileencoding' to blank in the new html
buffer and use the charset from the 'encoding' option.
4. If the charset could not be determined from either 'encoding' or
'fileencoding', default to UTF-8 and warn the user.
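The four steps above could be sketched roughly as follows. This is Python rather than Vim script, purely to illustrate the fallback order; the function name, the return shape and the small lookup table are assumptions made for the example, not the table or code in autoload/tohtml.vim.

```python
# Illustrative table only (hypothetical); the real mapping would be larger.
ENC_TO_CHARSET = {
    "latin1": "ISO-8859-1",
    "utf-8": "UTF-8",
    "cp1252": "windows-1252",
    "euc-jp": "EUC-JP",
}
CHARSET_TO_ENC = {v.lower(): k for k, v in ENC_TO_CHARSET.items()}


def resolve_charset(user_charset, fileencoding, encoding):
    """Return (charset, html_fileencoding, warning) for one buffer.

    html_fileencoding is the 'fileencoding' for the generated buffer:
    "" means "fall back on 'encoding'", None means "could not determine".
    """
    # Step 1: a user-specified charset wins; try to derive 'fileencoding'.
    if user_charset:
        fenc = CHARSET_TO_ENC.get(user_charset.lower())
        if fenc is None:
            return user_charset, None, "please set 'fileencoding' manually"
        return user_charset, fenc, None

    # Step 2: try the source buffer's 'fileencoding'.
    charset = ENC_TO_CHARSET.get(fileencoding.lower())
    if charset:
        return charset, fileencoding, None

    # Step 3: try 'encoding' and leave 'fileencoding' blank.
    charset = ENC_TO_CHARSET.get(encoding.lower())
    if charset:
        return charset, "", None

    # Step 4: nothing matched, so default to UTF-8 and warn.
    return "UTF-8", "utf-8", "could not determine charset, using UTF-8"
```

For example, resolve_charset("", "", "utf-8") skips steps 1 and 2 and takes the charset from 'encoding' with a blank 'fileencoding'.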
Multiple buffers in diff mode will be done similarly, except that we
will determine the charset as above for ALL buffers. If they differ,
set 'fileencoding' to blank and use the charset from 'encoding' (or
UTF-8 if the charset cannot be determined from 'encoding').
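A rough sketch of the diff-mode rule, again in Python only for illustration. It assumes each buffer has already been resolved to a (charset, fileencoding) pair; the function name and the lookup table are hypothetical, and reusing the first buffer's pair when all charsets agree is an assumption of this sketch.

```python
# Illustrative table only (hypothetical).
ENC_TO_CHARSET = {
    "latin1": "ISO-8859-1",
    "utf-8": "UTF-8",
    "cp1252": "windows-1252",
}


def resolve_diff_charset(per_buffer, encoding):
    """per_buffer: list of (charset, fileencoding) pairs, one per diffed buffer.

    Returns (charset, html_fileencoding) for the combined output.
    """
    # Compare charsets case-insensitively across all buffers.
    distinct = {charset.lower() for charset, _ in per_buffer}
    if len(distinct) == 1:
        # All buffers agree: reuse the first buffer's result (assumption).
        return per_buffer[0]

    # Buffers disagree: blank 'fileencoding', charset from 'encoding',
    # or UTF-8 if 'encoding' gives no known charset.
    return ENC_TO_CHARSET.get(encoding.lower(), "UTF-8"), ""
```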
What do you think? Or maybe this is too complicated and I should just
use 'encoding' as done currently?
I think you're on the right track. Maybe a little too complicated but
I'm not sure. I would just use 'fileencoding', or if empty (or if it can
be ascertained that the current buffer contains characters which are
invalid for it) then fall back on 'encoding' (by leaving 'fileencoding'
empty in the tohtml output buffer). But go ahead if you think you can
refine it more or make it better.
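The simpler rule could look something like this. It is a Python sketch, so the validity check uses Python's codecs, whose names only partly overlap with Vim's encoding names; the function name is made up for the example.

```python
def pick_output_fileencoding(text, fileencoding):
    """Return 'fileencoding' for the output buffer; "" means use 'encoding'."""
    # Empty 'fileencoding': fall back on 'encoding'.
    if not fileencoding:
        return ""
    try:
        # Check that every character in the buffer is valid for it.
        text.encode(fileencoding)
    except (UnicodeEncodeError, LookupError):
        # Unrepresentable characters, or a codec name Python does not know.
        return ""
    return fileencoding
```

With this, "café" keeps "latin1", while text containing characters outside Latin1 falls back to the empty value.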
I don't know what is being done ATM, but I'd always include the line
<meta http-equiv="Content-Type" content="text/html; charset=whatever" />
(replacing "whatever" by the charset name) somewhere near the start of
the <head> element. You may want to use a synonym, e.g. iso-8859-1 for
Latin1, but that's just the finishing touch.
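As a small illustration of the synonym idea (Python, with a one-entry table that is only an example, and a hypothetical function name):

```python
from html import escape

# Example synonym table (hypothetical); extend as needed.
CHARSET_SYNONYMS = {"latin1": "iso-8859-1"}


def content_type_meta(enc):
    """Build the Content-Type meta line for the given encoding name."""
    # Replace the name by its synonym when one is known.
    charset = CHARSET_SYNONYMS.get(enc.lower(), enc)
    # Escape the value since it ends up inside an HTML attribute.
    return (
        '<meta http-equiv="Content-Type" content="text/html; charset=%s" />'
        % escape(charset, quote=True)
    )
```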
Best regards,
Tony.
--
"In defeat, unbeatable; in victory, unbearable."
-- Winston Churchill, of Montgomery