Re: Bullet character across Vim platforms

Tony Mechelynck Sun, 17 Jan 2010 15:47:12 -0800

On 17/01/10 23:55, AndyHancock wrote:

I am finally following up on the solution below for using bullets
corresponding to Windows-1252 code 149 (0x95).  I put the script in
vimrc, issued "gvim Temp.txt" from the bash command line (Temp.txt is
nonexistent), and got no warning about missing multi_byte capability
[ not surprising since ":echo has('multi_byte')" yields 1 ].  However,
I'm still not getting the bullet.  I tried a few ways.


First, I created a bullet in one of a couple of Windows apps (Firefox,
Palm Desktop) using the usual method: Alt-0149 on the number pad.
(Actually, it's a laptop, so I had to use "num lk", which locks some
of the qwerty keys into a number pad function).  Then I copied and
pasted the bullet into gvim (mouse middle-button to paste, since it's
X-windows).  It pastes as a question-mark character.

I then tried Alt-0149 directly in gvim while in insert mode.  No joy,
as this translates into four characters, corresponding to Alt-0 Alt-1
Alt-4 Alt-9.

Finally, in insert mode, I did Ctrl-V 149, which simply inserts a
character with hex code 0x95.  It literally shows up as a blue-coloured
"<95>" without quotes.  The "ga" command shows this to be a character
with hex code 0x95.  I thought that the last method above might be
fine, and perhaps it just wasn't showing up in the gvim window due
to some reason related to X-windows fonts.  So I copied and pasted the
text into a Windows app...the result was a space in place of the
bullet.  Note that the bullet 0x95 copies successfully between windows
apps, and between windows apps and windows-based gvim (as opposed to
cygwin gvim).

I also tried the alternative bullet 0xB7, but that's almost invisible
in Palm Desktop.  Better to use an asterisk (which is highly
nonideal).

To troubleshoot the script, I queried the following options,
with the results shown:

    Encoding
    --------
    set encoding
    Ans: encoding=utf-8

    setlocal encoding
    Ans: encoding=utf-8

    setglobal encoding
    Ans: encoding=utf-8

    This makes sense, since encoding is global.

    Fileencoding
    ------------
    set fileencoding
    Ans: fileencoding=

    setlocal fileencoding
    Ans: fileencoding=

'fileencoding' empty means that the file will be recorded in the same
charset as 'encoding', i.e., UTF-8. In that charset, 0x95 (or 149
decimal) is a non-printable control character. Since this control
character has no representation in Windows-1252, you cannot convert a
UTF-8 buffer which contains it to Windows-1252 (and remember,
'encoding', not 'fileencoding', defines how Vim represents the data in
memory).

As I said in my previous post, to generate a file encoded on disk in
Windows-1252 with a Windows-1252 bullet (0x95 on disk) in it, you must:


1) keep 'encoding' to UTF-8 as above

2) :setlocal fileencoding=cp1252   " (or: :setlocal fenc=windows-1252 )

3) Enter the character into Vim as the Unicode representation of the
equivalent character, i.e. U+2022. To do that, in Insert mode, type
(with no intervening spaces, I add them here only for legibility):
Ctrl-V u 2 0 2 2 (but if your Ctrl-V has been remapped to the Paste
operation, use Ctrl-Q instead).


See
        :help i_CTRL-V_digit
        :help i_CTRL-Q
        :help CTRL-V-alternative
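
The three steps above can also be sketched as Ex commands, for example
sourced from a small script. This is only an illustration: it assumes
'encoding' is already utf-8, that the current buffer is the new file,
and the text "first item" is made up for the example.

```vim
" Sketch only: assumes 'encoding' is already utf-8 (step 1) and that
" the current buffer is the file to be saved as Windows-1252.
" Step 2: store this file on disk as Windows-1252.
setlocal fileencoding=cp1252
" Step 3: add a line starting with U+2022 BULLET. "\u2022" is the same
" character that Ctrl-V u 2 0 2 2 inserts in Insert mode.
" ("first item" is hypothetical sample text.)
call append(line('$'), "\u2022 first item")
" When the buffer is written, Vim converts from 'encoding' to
" 'fileencoding', so the bullet is stored as one Windows-1252 byte.
write
```

Afterwards ":setlocal fileencoding?" should still report cp1252, and
"ga" on the bullet should report the Unicode value rather than 0x95,
since "ga" shows the character as held in memory.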


    setglobal fileencoding
    Ans: fileencoding=windows-1252

This is the global default, but it won't be applied to the file because
the local value is different.



    Fileencodings
    -------------
    set fileencodings
    Ans: fileencodings=ucs-bom,utf-8,Windows-1252

    setlocal fileencodings
    Ans: fileencodings=ucs-bom,utf-8,Windows-1252

    setglobal fileencodings
    Ans: fileencodings=ucs-bom,utf-8,Windows-1252

    This makes sense, since fileencodings is global.

    Bomb
    ----
    set bomb?
    Ans: nobomb

    setlocal bomb?
    Ans: nobomb

    setglobal bomb?
    Ans: bomb

Rather than create a new Temp.txt from the bash command line, I also
tried creating new unnamed files using <Ctrl-w><Ctrl-n> and ":new".
The only difference for these two buffers was that the local value of
fileencoding was Windows-1252, and local boolean bomb option was set.
However, all the above attempts to create a bullet yielded the
same results.

I also issued "setlocal fileencoding=Windows-1252" in Temp.txt (using
both capital and small "w"), and "setlocal bomb".  That prevented me
from saving the file:

    E513:
    write error, conversion failed
    (make 'fenc' empty to override)

This was due to the 0x95 character that I inserted using Ctrl-V.  It
turns out that this also affected the two unnamed buffers -- that is,
if I tried to issue "w! Temp3.txt", I get the same error if the 0x95
character is present in the buffer.  The only way to be able to write
the file is to setlocal fileencoding to null, or remove the 0x95
character.
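
A third possibility, sketched here only as an illustration, is to
replace the stray control character with the Unicode bullet before
writing. The sketch assumes 'encoding' is utf-8 and that the offending
characters are U+0095, as reported by "ga".

```vim
" Sketch only: assumes 'encoding' is utf-8 and that the buffer holds
" stray U+0095 control characters entered with Ctrl-V 149.
" Replace each U+0095 with U+2022 BULLET; the 'e' flag suppresses the
" error message when nothing matches.
%substitute/\%u0095/\=nr2char(0x2022)/ge
" U+2022 has a Windows-1252 equivalent, so the conversion on write
" no longer has an unconvertible character to trip over.
setlocal fileencoding=cp1252
```

Searching for /\%u0095 beforehand shows where the characters are
without changing anything.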

I admit that I am far from experienced with character encodings.  Is
there anything I'm missing from the solution below?

---------- Forwarded message ----------
From: Tony Mechelynck<[email protected]>
Date: Jan 11 2009, 7:03 pm
Subject: Bullet character across Vim platforms
To: vim_use

On 11/01/09 17:35, AndyHancock wrote:

Sorry for the repost, but the first time submitted through Google
Groups yielded a blank submission form.  So I have recomposed and
reposted (20 minutes of time).

I am using:
1. Vim6.2 on Windows 2000, Lucida Console font, and
2. Vim7.1.2 on Cygwin's Xwin[dows], Lucida Typewriter font, on top of
     Windows 2000

After some surfing, I found that I can get a real bullet character
(not asterisk or dash) in Windows using ASCII code 149.

A. On windows applications, press Alt, enter 0149 on number pad.
B. On #1 above in insert mode, enter Ctrl-V followed by 149.

Neither of these works for #2 above.  Even if I create a bullet
character using #1 and #B, it shows up as "~U" (minus quotes) in #2.

Is there a way to create bullets in #2?

Is there a way to have those bullets maintain their appearance across
Vim platforms?


It depends on your 'encoding', which is how Vim represents data in
memory.

It also depends on each file's 'fileencoding', which is how that
file's data is represented on disk.

Of course, to be able to use any given character in a file edited by
Vim, that character must be representable (not necessarily the same
way) in both Vim 's'encoding' and the file's 'fileencoding'.

In the Latin1 aka ISO-8859-1 encoding, the character decimal 149, hex
0x95 is a control character, corresponding to Unicode U+0095 <control>
= MESSAGE WAITING. That character is not printable.

In the Windows-1252 encoding, that same decimal 149 hex 0x95 value is
used to represent a different character, namely the Unicode codepoint
U+2022 BULLET. That character is not representable in Latin1.
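
To check this mapping from inside Vim, something like the following
could be used. It is a sketch, not part of the recipe: it assumes
'encoding' is utf-8 and a Vim built with +iconv, since without that
feature iconv() handles very few conversions. The variable name
"converted" is made up for the example.

```vim
" Sketch only: assumes 'encoding' is utf-8 and a Vim with +iconv.
if has('iconv')
        " "\x95" is the raw byte 0x95; read it as Windows-1252 and
        " convert it to UTF-8. ("converted" is a hypothetical name.)
        let converted = iconv("\x95", 'cp1252', 'utf-8')
        " Show the Unicode codepoint of the converted character in hex.
        echo printf('U+%04X', char2nr(converted))
else
        echomsg 'No +iconv in this Vim version, cannot convert cp1252'
endif
```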

Now you have several possibilities.

First, I recommend using utf-8 for Vim's internal representation of the
data in memory, because that 'encoding' can represent any Unicode
codepoint, which means that regardless of the file's
'fileencoding', Vim will be able to represent it in memory. This
requires a binary compiled with +multi_byte -- such a binary will
answer with the number 1 (one) when you ask ":echo has('multi_byte')".

Then you will have to decide how to represent the data on disk. For
portability between various computers, Latin1 is recommended; however
this means that anything between 0x80 and 0x9F included is reserved
for non-printable control characters.

If you prefer having an additional 32 characters at your disposal in
an 8-bit encoding, you can use Windows-1252 everywhere, and decide
that you'll represent any 8-bit disk file in that 'fileencoding'. You
could make Vim (with 'encoding' set to utf-8) recognize these files by
means of the command ":set fileencodings=ucs-bom,utf-8,Windows-1252"
in your vimrc (see where in the snippet at the bottom of this email,
and notice the difference between 'fileencoding' [singular] and
'fileencodings' [plural]). The problem with this approach is that if
you publish such documents, anyone with a Unix or Linux or Mac
operating system will probably not display those 32 additional
characters correctly.

Or else, you can choose the Unicode UTF-8 encoding as your preferred
'fileencoding', which doesn't forbid using Latin1, Windows-1252, or
indeed anything else for occasional files. In that case I recommend
using a BOM on Unicode files in order to let them be recognized
unambiguously even by programs other than Vim and by computers other
than your own.
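
For this second choice, the defaults could look roughly like this. It
is a sketch only, assuming 'encoding' is already utf-8; it uses the
same options as the Windows-1252 setup and changes only the default
for new files.

```vim
" Sketch only: defaults for someone who prefers UTF-8 on disk;
" assumes 'encoding' is already utf-8.
" Existing files: try BOM, then UTF-8, then fall back to Windows-1252.
set fileencodings=ucs-bom,utf-8,Windows-1252
" New files are written as UTF-8 by default ...
setglobal fileencoding=utf-8
" ... with a BOM, so that other programs can recognize them as Unicode.
setglobal bomb
```

An occasional Windows-1252 file can still be created with
":setlocal fenc=cp1252" in that buffer.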

Now here's the promised snippet of code; place it near the top of your
vimrc, after setting ":language" if you use that command but before
defining any mappings. I have added comments to make it as
understandable as I can.

" Unicode can only be used if Vim is compiled with +multi_byte
if has('multi_byte')
        " if Vim is already using Unicode, no need to change it
        if &encoding !~? '^u'
                " avoid clobbering the keyboard/display encoding
                if &termencoding == ''
                        let &termencoding = &encoding
                endif
                " use UTF-8 internally in Vim memory
                set encoding=utf-8
        endif
        " setup the heuristics to recognize
        " how existing files are coded
        set fileencodings=ucs-bom,utf-8,Windows-1252
        " define defaults for new files
        " use Windows-1252 (8 bit) by default
        setglobal fileencoding=Windows-1252
        " use a BOM on Unicode files
        setglobal bomb
" if Vim has no +multi_byte capability, warn the user
else
        echomsg "No +multi_byte in this Vim version"
endif

You can vary the details of the above once you understand the general
idea. If you don't change anything, your new files will be created in
Windows-1252, and existing files will be assumed to be Windows-1252
unless they either start with a Unicode BOM, or contain only codes
which are valid for UTF-8 (anything above 0x7F is represented in UTF-8
by at least two bytes with the high bit set, so this will still allow
recognizing your existing bullets). To write one new file in UTF-8
instead, use either

         :e ++enc=utf-8 newfile
or
         :e newfile
         :setlocal fenc=utf-8

(where 'fenc' is of course the short name for the 'fileencoding'
option).

See
         :help Unicode
         :help +multi_byte
         :help 'encoding'
         :help 'fileencoding'
         :help 'fileencodings'
         :help 'termencoding'
         :help 'bomb'
         :help ++opt
        http://vim.wikia.org/wiki/Working_with_Unicode

Oh, and one more thing: For a bullet-like character which looks the
same in both Latin1 and Windows-1252, you could use the character
0xB7, corresponding in both of these encodings to the Unicode
codepoint U+00B7 MIDDLE DOT. This is a thinner bullet than U+2022 but it
is more portable. This "middle dot" is used in Catalan to separate two
letters l which must be pronounced as a "geminated hard l", as in
col·lega (a colleague) rather than as a single "palatalized l"
intermediary between l and y, as in collar (a collar).

Best regards,
Tony.
--
Cleanliness is next to impossible.


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
242. You turn down a better-paying job because it doesn't come with
     a free e-mail account.

-- 
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
