On 17/01/10 23:55, AndyHancock wrote:
I am finally following up on the solution below for using bullets
corresponding to Windows-1252 code 149 (0x95). I put the script in
vimrc, issued "gvim Temp.txt" from the bash command line (Temp.txt is
nonexistent), and got no warning about missing multi_byte capability
[ not surprising since ":echo has('multi_byte')" yields 1 ]. However,
I'm still not getting the bullet. I tried a few ways.
First, I created a bullet in a couple of Windows apps (Firefox,
Palm Desktop) using the usual method: Alt-0149 on the number pad.
(Actually, it's a laptop, so I had to use "num lk", which locks some
of the qwerty keys into a number pad function). Then I copied and
pasted the bullet into gvim (mouse middle-button to paste, since it's
X-windows). It pastes as a question-mark character.
I then tried Alt-0149 directly in gvim while in insert mode. No joy,
as this translates into four characters, corresponding to Alt-0 Alt-1
Alt-4 Alt-9.
Finally, in insert mode, I did Ctrl-V 149, which simply inserts a
character with hex code 0x95. It literally shows up as a blue-coloured
"<95>" without quotes. The "ga" command shows this to be a character
with hex code 0x95. I thought that the last method above might be
fine, and perhaps it just wasn't showing up in the gvim window due to
some reason related to X-windows fonts. So I copied and pasted the
text into a Windows app...the result was a space in place of the
bullet. Note that the bullet 0x95 copies successfully between windows
apps, and between windows apps and windows-based gvim (as opposed to
cygwin gvim).
I also tried the alternative bullet 0xB7, but that's almost invisible
in Palm Desktop. Better to use an asterisk (which is highly
nonideal).
To troubleshoot the script, I queried the following options,
with the shown results:
Encoding
--------
set encoding
Ans: encoding=utf-8
setlocal encoding
Ans: encoding=utf-8
setglobal encoding
Ans: encoding=utf-8
This makes sense, since encoding is global.
Fileencoding
------------
set fileencoding
Ans: fileencoding=
setlocal fileencoding
Ans: fileencoding=
'fileencoding' empty means that the file will be written in the same
charset as 'encoding', i.e., UTF-8. In Unicode, codepoint U+0095 (149
decimal) is a non-printable control character. Since this control
character has no representation in Windows-1252, you cannot convert a
UTF-8 buffer which contains it to Windows-1252 (and remember,
'encoding', not 'fileencoding', defines how Vim represents the data in
memory).
As I said in my previous post, to generate a file encoded on disk in
Windows-1252 with a Windows-1252 bullet (0x95 on disk) in it, you must:
1) keep 'encoding' to UTF-8 as above
2) :setlocal fileencoding=cp1252 " (or: :setlocal fenc=windows-1252 )
3) Enter the character into Vim as the Unicode representation of the
equivalent character, i.e. U+2022. To do that, in Insert mode, type
(with no intervening spaces, I add them here only for legibility):
Ctrl-V u 2 0 2 2 (but if your Ctrl-V has been remapped to the Paste
operation, use Ctrl-Q instead).
See
:help i_CTRL-V_digit
:help i_CTRL-Q
:help CTRL-V-alternative
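A minimal sketch of steps 2 and 3 as Ex commands, in case typing the
character interactively is awkward. It assumes 'encoding' is already
utf-8 and that the current buffer has a file name; the text "first
item" is only a placeholder.

```vim
" Sketch only: assumes 'encoding' is utf-8 and the buffer has a name.
" Step 2: store this file on disk as Windows-1252.
setlocal fileencoding=cp1252
" Step 3: add a line containing U+2022 BULLET at the end of the buffer.
" "\u2022" is stored according to 'encoding', so this is the Unicode
" bullet and not the U+0095 control character.
call append(line('$'), "\u2022 first item")
" On write, Vim converts the buffer from 'encoding' to 'fileencoding'.
write
```

After this, "ga" on the new character should report hex 2022 rather
than 95.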
setglobal fileencoding
Ans: fileencoding=windows-1252
This is the global default, but it won't be applied to the file because
the local value is different.
Fileencodings
-------------
set fileencodings
Ans: fileencodings=ucs-bom,utf-8,Windows-1252
setlocal fileencodings
Ans: fileencodings=ucs-bom,utf-8,Windows-1252
setglobal fileencodings
Ans: fileencodings=ucs-bom,utf-8,Windows-1252
This makes sense, since fileencodings is global.
Bomb
----
set bomb?
Ans: nobomb
setlocal bomb?
Ans: nobomb
setglobal bomb?
Ans: bomb
Rather than create a new Temp.txt from the bash command line, I also
tried creating new unnamed files using <Ctrl-w><Ctrl-n> and ":new".
The only difference for these two buffers was that the local value of
fileencoding was Windows-1252, and local boolean bomb option was set.
However, all the above attempts to create a bullet yielded the
same results.
I also issued "setlocal fileencoding=Windows-1252" in Temp.txt (using
both capital and small "w"), and "setlocal bomb". That prevented me
from saving the file:
E513: write error, conversion failed (make 'fenc' empty to override)
This was due to the 0x95 character that I inserted using Ctrl-V. It
turns out that this also affected the two unnamed buffers -- that is,
if I tried to issue "w! Temp3.txt", I get the same error if the 0x95
character is present in the buffer. The only way to be able to write
the file is to set the local fileencoding to empty, or remove the 0x95
character.
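A possible way to repair a buffer that already contains such
characters, sketched under the assumption that 'encoding' is utf-8 and
that every stray character was entered with Ctrl-V 149 (so it is
U+0095 in memory):

```vim
" Sketch only: assumes 'encoding' is utf-8.
" Count the U+0095 characters without changing anything (n flag);
" the e flag suppresses the error when there is no match.
%s/\%u0095//gne
" Replace each U+0095 control character with U+2022 BULLET.
%s/\%u0095/\=nr2char(0x2022)/ge
" With the control characters replaced, store the file as Windows-1252.
setlocal fileencoding=cp1252
write
```

The write can still fail with E513 if the buffer holds other
characters that Windows-1252 cannot represent.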
I admit that I am far from experienced with character encodings. Is
there anything I'm missing from the solution below?
---------- Forwarded message ----------
From: Tony Mechelynck
Date: Jan 11 2009, 7:03 pm
Subject: Bullet character across Vim platforms
To: vim_use
On 11/01/09 17:35, AndyHancock wrote:
Sorry for the repost, but the first time submitted through Google
Groups yielded a blank submission form. So I have recomposed and
reposted (20 minutes of time).
I am using:
1. Vim6.2 on Windows 2000, Lucida Console font, and
2. Vim7.1.2 on Cygwin's Xwin[dows], Lucida Typewriter font, on top of
Windows 2000
After some surfing, I found that I can get a real bullet character
(not asterisk or dash) in Windows using ASCII code 149.
A. On windows applications, press Alt, enter 0149 on number pad.
B. On #1 above in insert mode, enter Ctrl-V followed by 149.
Neither of these works for #2 above. Even if I create a bullet
character using #1 and #B, it shows up as "~U" (minus quotes) in #2.
Is there a way to create bullets in #2?
Is there a way to have those bullets maintain their appearance across
Vim platforms?
It depends on your 'encoding', which is how Vim represents data in
memory.
It also depends on each file's 'fileencoding', which is how that
file's data is represented on disk.
Of course, to be able to use any given character in a file edited by
Vim, that character must be representable (not necessarily the same
way) in both Vim 's'encoding' and the file's 'fileencoding'.
In the Latin1 aka ISO-8859-1 encoding, the character decimal 149, hex
0x95 is a control character, corresponding to Unicode U+0095 <control>
= MESSAGE WAITING. That character is not printable.
In the Windows-1252 encoding, that same decimal 149 hex 0x95 value is
used to represent a different character, namely the Unicode codepoint
U+2022 BULLET. That character is not representable in Latin1.
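To see this mapping from inside Vim, something like the following
could be tried. It is only a sketch: it assumes 'encoding' is utf-8, a
Vim built with +iconv, and an iconv library that recognizes the name
"cp1252". The variable name is made up for the example.

```vim
" Sketch only: needs +iconv and an iconv that knows "cp1252".
if has('iconv')
  " "\x95" is the raw byte 0x95; read it as Windows-1252 and
  " convert it to UTF-8.
  let bullet_char = iconv("\x95", 'cp1252', 'utf-8')
  " With 'encoding' set to utf-8, char2nr() returns the codepoint.
  echo printf('U+%04X', char2nr(bullet_char))
else
  echomsg "No +iconv in this Vim version"
endif
```

If the conversion is available, the codepoint shown is the one that
Windows-1252 assigns to byte 0x95.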
Now you have several possibilities.
First, I recommend using utf-8 for Vim's internal representation of the
data in memory, because that 'encoding' can represent any Unicode
codepoint, which means that regardless of the file's
'fileencoding', Vim will be able to represent it in memory. This
requires a binary compiled with +multi_byte -- such a binary will
answer with the number 1 (one) when you ask ":echo has('multi_byte')".
Then you will have to decide how to represent the data on disk. For
portability between various computers, Latin1 is recommended; however
this means that anything between 0x80 and 0x9F included is reserved
for non-printable control characters.
If you prefer having an additional 32 characters at your disposal in
an 8-bit encoding, you can use Windows-1252 everywhere, and decide
that you'll represent any 8-bit disk file in that 'fileencoding'. You
could make Vim (with 'encoding' set to utf-8) recognize these files by
means of the command ":set fileencodings=ucs-bom,utf-8,Windows-1252"
in your vimrc (see where in the snippet at the bottom of this email,
and notice the difference between 'fileencoding' [singular] and
'fileencodings' [plural]). The problem with this approach is that if
you publish such documents, anyone with a Unix or Linux or Mac
operating system will probably not display those 32 additional
characters correctly.
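To check what the 'fileencodings' heuristic decided for a file that
is already open, and to override it if it guessed wrong, a sketch
along these lines may help. It assumes the buffer is unmodified, since
":edit" refuses to reload a modified buffer.

```vim
" Sketch only: run after opening an existing, unmodified file.
" Show which encoding Vim detected for this buffer.
setlocal fileencoding?
" If an 8-bit file was misdetected, reload it, forcing Windows-1252.
edit ++enc=cp1252
```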
Or else, you can choose the Unicode UTF-8 encoding as your preferred
'fileencoding', which doesn't forbid using Latin1, Windows-1252, or
indeed anything else for occasional files. In that case I recommend
using a BOM on Unicode files in order to let them be recognized
unambiguously even by programs other than Vim and by computers other
than your own.
Now here's the promised snippet of code; place it near the top of your
vimrc, after setting ":language" if you use that command but before
defining any mappings. I have added comments to make it as
understandable as I can.
" Unicode can only be used if Vim is compiled with +multi_byte
if has('multi_byte')
  " if Vim is already using Unicode, no need to change it
  if &encoding !~? '^u'
    " avoid clobbering the keyboard/display encoding
    if &termencoding == ''
      let &termencoding = &encoding
    endif
    " use UTF-8 internally in Vim memory
    set encoding=utf-8
  endif
  " setup the heuristics to recognize
  " how existing files are coded
  set fileencodings=ucs-bom,utf-8,Windows-1252
  " define defaults for new files
  " use Windows-1252 (8 bit) by default
  setglobal fileencoding=Windows-1252
  " use a BOM on Unicode files
  setglobal bomb
" if Vim has no +multi_byte capability, warn the user
else
  echomsg "No +multi_byte in this Vim version"
endif
You can vary the details of the above once you understand the general
idea. If you don't change anything, your new files will be created in
Windows-1252, and existing files will be assumed to be Windows-1252
unless they either start with a Unicode BOM, or contain only codes
which are valid for UTF-8 (anything above 0x7F is represented in UTF-8
by at least two bytes with the high bit set, so this will still allow
recognizing your existing bullets). To write one new file in UTF-8
instead, use either
:e ++enc=utf-8 newfile
or
:e newfile
:setlocal fenc=utf-8
(where 'fenc' is of course the short name for the 'fileencoding'
option).
See
:help Unicode
:help +multi_byte
:help 'encoding'
:help 'fileencoding'
:help 'fileencodings'
:help 'termencoding'
:help 'bomb'
:help ++opt
http://vim.wikia.org/wiki/Working_with_Unicode
Oh, and one more thing: For a bullet-like character which looks the
same in both Latin1 and Windows-1252, you could use the character
0xB7, corresponding in both of these encodings to the Unicode
codepoint U+00B7 MIDDLE DOT. This is a thinner bullet than U+2022 but it
is more portable. This "middle dot" is used in Catalan to separate two
letters l which must be pronounced as a "geminated hard l", as in
col·lega (a colleague) rather than as a single "palatalized l"
intermediary between l and y, as in collar (a collar).
Best regards,
Tony.
--
Cleanliness is next to impossible.
Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
242. You turn down a better-paying job because it doesn't come with
a free e-mail account.