On 03/11/13 14:56, John Joche wrote:
OK. Thank you for your help...

I can put the command

    set guifont=Lucida_Console:h12:cDEFAULT

inside C:\Users\JSonderson\_gvimrc and this font family, font size
and character set are loaded each time I start gvim.

------------------------------------------------------------------------

However, a question still remains: how come UTF-8 is not on the
list of character sets?

tl;dr: see the last paragraph before your next question.

UTF-8 is one of the ways to represent Unicode in memory. Unicode is the Universal character set, a superset of all possible character sets known to computer software.

The following encodings can represent all Unicode codepoints ("characters"):
- UTF-8, with between 1 and 4 bytes per character (originally up to 6 bytes had been foreseen, but then it was decided that codepoints above U+10FFFF would never be assigned). UTF-8 has the property that the 128 US-ASCII characters are represented in UTF-8 by one byte in exactly the same way as in US-ASCII, Latin1, and most other ASCII-derived encodings. (EBCDIC is of course a world apart.) A quick way to look at the byte representation of a character from inside Vim is shown just after this list.
- UTF-16, with one or two 2-byte words per character;
- UTF-32 (aka UCS-4), with one 4-byte doubleword per character;
- GB18030, with 1, 2 or 4 bytes per character but biased in favour of Chinese (this is the current official standard encoding of the PRC). Conversion between GB18030 and the other ones is possible but not trivial, and requires bulky tables. The iconv utility can usually do it, and so can Vim if built with +iconv, or with +iconv/dyn and it can find the iconv or libiconv library.
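As announced above, here is the quick check from inside Vim (assuming 'encoding' is set to utf-8): put the cursor on a character in Normal mode and hit ga to see its codepoint, or g8 to see the bytes of its UTF-8 representation. For example, on the euro sign € (U+20AC):

    ga    ->  <€> 8364, Hex 20ac, Octal 20254
    g8    ->  e2 82 ac              (three bytes in UTF-8)

(The exact wording of the messages may vary a little between Vim versions.)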

UTF-16 and UTF-32 can be big-endian (the default) or little-endian (e.g. UTF-16le). UTF-32 even supports the rarely used 3412 and 2143 byte orderings, but I'm not sure Vim knows about them.
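In Vim you can force a particular Unicode encoding (and endianness) for a single file with the ++enc argument, independently of the global settings; for example, to read a little-endian UTF-16 file and write it back as UTF-8 (the file name is of course just a placeholder):

    " read the file as little-endian UTF-16
    :e ++enc=utf-16le somefile.txt
    " write it back, converted to UTF-8
    :w ++enc=utf-8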

Vim internally represents UTF-16 and UTF-32 as UTF-8 in memory, because a NUL codepoint is a null word in UTF-16 and a null doubleword in UTF-32, and the many other null bytes which occur in such files (e.g. in every ASCII character encoded as UTF-16) would play havoc with Vim's use of null-terminated C strings. OTOH, in UTF-8 nothing other than the NUL codepoint U+0000 may validly include a null byte in its representation.

With some filetypes, it is possible to tell user applications which Unicode encoding and endianness to use by adding the codepoint U+FEFF at the very start of the file. That codepoint is usually called the BOM (byte-order mark) but it can even identify UTF-8 which has no endianness variants. It is supported for at least HTML and CSS; it is not recognized (and should not be present) in executable scripts in UTF-8, especially those where the first line starts with #! — I've been caught by that in the past, and now I know better.
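In Vim, whether a BOM is written is governed by the per-buffer 'bomb' option; for example:

    :set bomb?     " check whether the current buffer will get a BOM
    :set bomb      " write a BOM at the start of the file on the next :w
    :set nobomb    " don't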

Note that when Windows people say "Unicode" they usually mean UTF-16le. That's e.g. how one must decode the sentence "The file is not in UTF-8, it's in Unicode" (which, taken literally, is nonsense) in the mouth of a Microsoft engineer.

You set the 'encoding' option, preferably near the top of your vimrc, to tell Vim how characters are to be represented in memory. The advantage of using ":set enc=utf8" is that it allows Vim to represent in memory any character of any charset known to computer people. OTOH, using e.g. Latin1 as your 'encoding' value only allows Vim to represent the 256 characters which are part of the Latin1 charset; those are also the first 256 codepoints (U+0000 to U+00FF) of Unicode.
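A minimal example for the top of a vimrc (the wiki page linked just below discusses more complete variants; the exact 'fileencodings' list is a matter of taste):

    if has("multi_byte")
      " how Vim represents text internally
      set encoding=utf-8
      " default 'fileencoding' for newly created files
      setglobal fileencoding=utf-8
      " heuristics for detecting the encoding of existing files
      set fileencodings=ucs-bom,utf-8,latin1
    endif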

See also http://vim.wikia.com/wiki/Working_with_Unicode

All of the above is independent of the 'guifont' setting. Why is there nothing relating to Unicode in the :cXX parameter of Windows 'guifont' settings? I'm not sure. Either :cDEFAULT means Unicode, or else it's a Windows mystery.
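For what it's worth, the :cXX values accepted by Windows gvim are the Windows GDI charset names listed under :help gui-font (cANSI, cDEFAULT, cRUSSIAN, cSHIFTJIS, and so on), and "Unicode" simply isn't one of them; e.g.:

    " explicitly ask for the Cyrillic variant of the font
    set guifont=Lucida_Console:h12:cRUSSIAN
    " let Windows pick the charset (what you are using now)
    set guifont=Lucida_Console:h12:cDEFAULT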


Isn't the character set something separate from the font anyways?

Yes, it is; but each font file has glyphs for a certain set of languages only, and usually not for all the Unicode codepoints which are defined: there are an enormous number of them.
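If your main font lacks glyphs for some scripts (CJK, typically), gvim can use a second font for double-width characters via the 'guifontwide' option; for example (MS Gothic is just one font commonly present on Windows, any double-width font will do):

    set guifont=Lucida_Console:h12:cDEFAULT
    set guifontwide=MS_Gothic:h12:cDEFAULT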


What's the difference between character set and character encoding?

Not much. In most situations they can be used as synonyms. When they are not synonymous, the character set is the array of characters, and the character encoding is the exact manner in which those characters are represented (by how many bytes, and which ones) in memory, on disk, on tape, etc. Sometimes one word is used for the other: e.g. in HTTP or mail headers, the Content-Type line uses "charset=" to tell the receiving application which encoding is used in the document.
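For instance, a typical header line reads

    Content-Type: text/html; charset=UTF-8

even though UTF-8 is, strictly speaking, an encoding of the Unicode character set rather than a character set of its own.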

Unicode can be regarded as one abstract character set with room for more than a million characters (originally two thousand million, but then the number was reduced), which ATM can be represented in at least 8 different encodings if all byte-ordering variants are considered. Not all the Unicode "slots" have already received an assignment; some are reserved "for private use" and others have been blocked as "noncharacters". For details, see http://www.unicode.org/ and in particular http://www.unicode.org/charts/


How can I display the actual character set which is being used when I
use the DEFAULT setting?

You don't. The font either has a glyph for the character you're trying to display (and you should see that glyph), or it doesn't (and you should see some placeholder glyph instead, e.g. an empty frame or a reverse-video question mark).
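If you want to test whether your font has a glyph for a given codepoint, you can enter the character directly: in Insert mode, type CTRL-V u followed by four hex digits (or CTRL-V U and eight hex digits for codepoints above U+FFFF), e.g.:

    CTRL-V u20ac        inserts € (U+20AC EURO SIGN)
    CTRL-V U0001F600    inserts 😀 (U+1F600), or a placeholder if the font has no glyph for it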


Thanks.



Best regards,
Tony.
--
Love in your heart wasn't put there to stay.
Love isn't love 'til you give it away.
                -- Oscar Hammerstein II
