Wow.  I never realized that it could be so involved.  I tested

   :echo has('multi_byte')

and both platforms returned 1.  However, I need to spend much more
time educating myself on the other aspects of representation.  This is
something I'm doing with snippets of time on the weekend.  Thanks, and
I will keep this nugget of insight at my fingertips for follow-up.

On Jan 11, 7:03 pm, Tony Mechelynck <[email protected]>
wrote:
> On 11/01/09 17:35, AndyHancock wrote:
>
> > Sorry for the repost, but the first time submitted through Google
> > Groups yielded a blank submission form.  So I have recomposed and
> > reposted (20 minutes of time).
>
> > I am using:
> > 1. Vim 6.2 on Windows 2000, Lucida Console font, and
> > 2. Vim 7.1.2 on Cygwin's Xwin[dows], Lucida Typewriter font, on top of
> >     Windows 2000
>
> > After some surfing, I found that I can get a real bullet character
> > (not asterisk or dash) in Windows using ASCII code 149.
>
> > A. On windows applications, press Alt, enter 0149 on number pad.
> > B. On #1 above in insert mode, enter Ctrl-V followed by 149.
>
> > Neither of these works for #2 above.  Even if I create a bullet
> > character using #1 and #B, it shows up as "~U" (minus quotes) in #2.
>
> > Is there a way to create bullets in #2?
>
> > Is there a way to have those bullets maintain their appearance across
> > Vim platforms?
>
> > Thanks.
>
> It depends on your 'encoding', which is how Vim represents data in memory.
>
> It also depends on each file's 'fileencoding', which is how that file's
> data is represented on disk.
>
> Of course, to be able to use any given character in a file edited by
> Vim, that character must be representable (not necessarily the same way)
> in both Vim's 'encoding' and the file's 'fileencoding'.
>
> In the Latin1 aka ISO-8859-1 encoding, the character decimal 149, hex
> 0x95 is a control character, corresponding to Unicode U+0095 <control> =
> MESSAGE WAITING. That character is not printable.
>
> In the Windows-1252 encoding, that same decimal 149 hex 0x95 value is
> used to represent a different character, namely the unicode codepoint
> U+2022 BULLET. That character is not representable in Latin1.
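
The difference between the two readings of byte 0x95 can be checked from
inside Vim. This is only a minimal sketch: it assumes a Vim built with
+iconv and with 'encoding' already set to utf-8, and the variable name
"bullet" is purely illustrative.

```vim
" Sketch only: interpret the single byte 0x95 as Windows-1252
" and convert it to UTF-8 (assumes +iconv and encoding=utf-8).
if has('iconv') && &encoding ==? 'utf-8'
  " "\x95" is the raw byte 0x95, not the character U+0095
  let bullet = iconv("\x95", 'cp1252', 'utf-8')
  " char2nr() returns the code point of the first character;
  " it should be 8226 (hex 2022, BULLET) if the conversion worked
  echo char2nr(bullet)
else
  echomsg 'iconv() or utf-8 encoding not available in this Vim'
endif
```

Converting the same byte from latin1 instead would give code point 149
(hex 95), the non-printable control character described above.
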
>
> Now you have several possibilities.
>
> First, I recommend using utf-8 for Vim's internal representation of the
> data in memory, because that 'encoding' can represent any Unicode
> codepoint, which means that regardless of the file's 'fileencoding', Vim
> will be able to represent it in memory. This requires a binary compiled
> with +multi_byte -- such a binary will answer with the number 1 (one)
> when you ask ":echo has('multi_byte')".
>
> Then you will have to decide how to represent the data on disk. For
> portability between various computers, Latin1 is recommended; however
> this means that anything between 0x80 and 0x9F included is reserved for
> non-printable control characters.
>
> If you prefer having an additional 32 characters at your disposal in an
> 8-bit encoding, you can use Windows-1252 everywhere, and decide that
> you'll represent any 8-bit disk file in that 'fileencoding'. You could
> make Vim (with 'encoding' set to utf-8) recognize these files by means
> of the command ":set fileencodings=ucs-bom,utf-8,Windows-1252" in your
> vimrc (see where in the snippet at the bottom of this email, and notice
> the difference between 'fileencoding' [singular] and 'fileencodings'
> [plural]). The problem with this approach is that if you publish such
> documents, anyone with a Unix or Linux or Mac operating system will
> probably not display those 32 additional characters correctly.
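
To see what the 'fileencodings' heuristic decided for a particular file,
and to override it when the guess is wrong, something like the following
should work. It is a sketch using standard Ex commands; forcing cp1252
(the usual short name for Windows-1252) is just an assumption for the
example.

```vim
" Show which encoding Vim settled on when it read the current file
setlocal fileencoding?
" Show the list of candidate encodings, in the order they are tried
set fileencodings?
" If the guess was wrong, re-read the current file with a forced
" encoding (this fails if the buffer has unsaved changes)
edit ++enc=cp1252
```
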
>
> Or else, you can choose the Unicode UTF-8 encoding as your preferred
> 'fileencoding', which doesn't forbid using Latin1, Windows-1252, or
> indeed anything else for occasional files. In that case I recommend
> using a BOM on Unicode files in order to let them be recognized
> unambiguously even by programs other than Vim and by computers other
> than your own.
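
As a rough illustration of that last option, an existing file that is
already loaded correctly can be converted to UTF-8 with a BOM as shown
here. This is a sketch, not a recommendation for every file; it assumes
'encoding' is utf-8 so that nothing is lost in memory.

```vim
" Sketch: convert the file in the current buffer to UTF-8 with a BOM.
" The text in memory is unchanged; only the on-disk form differs.
setlocal fileencoding=utf-8
setlocal bomb
write
```
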
>
> Now here's the promised snippet of code; place it near the top of your
> vimrc, after setting ":language" if you use that command but before
> defining any mappings. I have added comments to make it as
> understandable as I can.
>
> " Unicode can only be used if Vim is compiled with +multi_byte
> if has('multi_byte')
>         " if Vim is already using Unicode, no need to change it
>         if &encoding !~? '^u'
>                 " avoid clobbering the keyboard/display encoding
>                 if &termencoding == ''
>                         let &termencoding = &encoding
>                 endif
>                 " use UTF-8 internally in Vim memory
>                 set encoding=utf-8
>         endif
>         " set up the heuristics to recognize
>         " how existing files are coded
>         set fileencodings=ucs-bom,utf-8,Windows-1252
>         " define defaults for new files
>         " use Windows-1252 (8 bit) by default
>         setglobal fileencoding=Windows-1252
>         " use a BOM on Unicode files
>         setglobal bomb
> " if Vim has no +multi_byte capability, warn the user
> else
>         echomsg "No +multi_byte in this Vim version"
> endif
>
> You can vary the details of the above once you understand the general
> idea. If you don't change anything, your new files will be created in
> Windows-1252, and existing files will be assumed to be Windows-1252
> unless they either start with a Unicode BOM, or contain only codes which
> are valid for UTF-8 (anything above 0x7F is represented in UTF-8 by at
> least two bytes with the high bit set, so this will still allow
> recognizing your existing bullets). To write one new file in UTF-8
> instead, use either
>
>         :e ++enc=utf-8 newfile
> or
>         :e newfile
>         :setlocal fenc=utf-8
>
> (where 'fenc' is of course the short name for the 'fileencoding' option).
>
> See
>         :help Unicode
>         :help +multi_byte
>         :help 'encoding'
>         :help 'fileencoding'
>         :help 'fileencodings'
>         :help 'termencoding'
>         :help 'bomb'
>         :help ++opt
>        http://vim.wikia.org/wiki/Working_with_Unicode
>
> Oh, and one more thing: For a bullet-like character which looks the same
> in both Latin1 and Windows-1252, you could use the character 0xB7,
> corresponding in both of these encodings to the Unicode codepoint U+00B7
> MIDDLE DOT. This is a thinner bullet than U+2022 but it is more
> portable. This "middle dot" is used in Catalan to separate two letters l
> which must be pronounced as a "geminated hard l", as in col·lega (a
> colleague) rather than as a single "palatalized l" intermediary between
> l and y, as in collar (a collar).
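
For actually entering either character once 'encoding' is utf-8, the
following sketch may help. The digraph names assume Vim's default
(RFC 1345) digraph table and a build with +digraphs; the sample text in
the append() calls is made up for illustration.

```vim
" Keystrokes in Insert mode (these are not Ex commands):
"   CTRL-V u2022   insert U+2022 BULLET by its hex code point
"   CTRL-V u00b7   insert U+00B7 MIDDLE DOT by its hex code point
"   CTRL-K Sb      digraph for BULLET (verify with :digraphs)
"   CTRL-K .M      digraph for MIDDLE DOT (verify with :digraphs)
" The same from a script, adding lines below the cursor line:
call append(line('.'), nr2char(0x2022) . ' item with a bullet')
call append(line('.'), nr2char(0xB7) . ' item with a middle dot')
```

Whether the bullet survives a write still depends on the buffer's
'fileencoding': it can be saved as Windows-1252 or UTF-8, but not as
Latin1.
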
>
> Best regards,
> Tony.
> --
> Cleanliness is next to impossible.
--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
