Unicode chars NEL, FF, LS, PS

2006-09-29 Thread Steve Hall

Does anyone here know if Vim respects the following Unicode characters
(represents them rather than just indicating literals):

  http://en.wikipedia.org/wiki/Newline#Unicode

I'm not on a Unicode platform at the moment, but I'm wondering if Vim
could ever have the listchars to do it like mined:

  http://towo.net/mined/mined-uni.png


-- 
Steve Hall  [ digitect dancingpaper com ]



Re: Unicode chars NEL, FF, LS, PS

2006-09-29 Thread A.J.Mechelynck

Steve Hall wrote:

> Does anyone here know if Vim respects the following Unicode characters
> (represents them rather than just indicating literals):
>
>   http://en.wikipedia.org/wiki/Newline#Unicode
>
> I'm not on a Unicode platform at the moment, but I'm wondering if Vim
> could ever have the listchars to do it like mined:
>
>   http://towo.net/mined/mined-uni.png




Vim is a text editor, not a word processor. It does not necessarily show 
control characters as a word processor or a printer would. Even on a 
non-Unicode platform, you should be able to run a +multibyte version of gvim, 
set 'encoding' to UTF-8 while preserving the locale setting of 'encoding' in 
'termencoding', and enter the characters according to :help i_CTRL-V_digit 
to see what happens.


NEL (Next Line, 0x85) is an upper-ASCII control character. I expect Vim to 
represent it as <85> when 'encoding' is set to UTF-8. This, however, depends 
on the setting of the 'isprint' option. I don't know what this control 
character means.


FF (Form Feed, 0x0C) is an ASCII control character; it should be represented 
as ^L in Unicode just as in Latin1. When sent to a printer, it usually causes 
a page eject.


LS (Line Separator, L SEP, U+2028) and PS (Paragraph Separator, P SEP, U+2029) 
are Format characters according to Unicode 
http://www.unicode.org/charts/PDF/U2000.pdf . They are followed in the charts 
by Left-to-Right Embedding, Right-to-Left Embedding, Pop Directional 
Formatting etc. I don't expect Vim to handle them otherwise than any other 
character, i.e., fetch a glyph, if any (probably none) from your 'guifont'. In 
my Gnome2 gvim with 'encoding' set to UTF-8, both U+2028 and U+2029 display as 
single-width spaces.



Best regards,
Tony.
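
For reference, here is a minimal Python sketch (not part of the original
thread) that prints the code point, Unicode general category and UTF-8 bytes
of the four characters named in the subject line. The names CHARS and
describe are illustrative assumptions, and the output reflects the Unicode
database bundled with Python, not anything Vim does.

```python
import unicodedata

# The four characters named in the thread subject.
CHARS = {
    "NEL": "\u0085",
    "FF": "\u000c",
    "LS": "\u2028",
    "PS": "\u2029",
}


def describe(ch):
    """Return (code point, general category, UTF-8 bytes as hex) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        ch.encode("utf-8").hex(),
    )


for label, ch in CHARS.items():
    print(label, *describe(ch))
```

In this database NEL and FF report category Cc (control), LS reports Zl
(line separator) and PS reports Zp (paragraph separator).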


Re: Unicode chars NEL, FF, LS, PS

2006-09-29 Thread Steve Hall
On Sat, 2006-09-30 at 01:14 +0200, A.J.Mechelynck wrote:
> Steve Hall wrote:
> > Does anyone here know if Vim respects the following Unicode
> > characters (represents them rather than just indicating literals):
> >
> >   http://en.wikipedia.org/wiki/Newline#Unicode
> >
> > I'm not on a Unicode platform at the moment, but I'm wondering if
> > Vim could ever have the listchars to do it like mined:
> >
> >   http://towo.net/mined/mined-uni.png
>
> Vim is a text editor, not a word processor. It does not necessarily
> show control characters as a word processor or a printer would.

However you might alternatively say that these floodgates were opened
when list was invented. :)

> Even on a non-Unicode platform, you should be able to run a
> +multibyte version of gvim, set 'encoding' to UTF-8 while preserving
> the locale setting of 'encoding' in 'termencoding', and enter the
> characters according to :help i_CTRL-V_digit to see what happens.

Sometimes there's a font limitation, and I don't always trust what I
see.

> NEL (Next Line, 0x85) is an upper-ASCII control character. I expect
> Vim to represent it as <85> when 'encoding' is set to UTF-8. This,
> however, depends on the setting of the 'isprint' option. I don't
> know what this control character means.
>
> FF (Form Feed, 0x0C) is an ASCII control character; it should be
> represented as ^L in Unicode just as in Latin1. When sent to a
> printer, it usually causes a page eject.
>
> LS (Line Separator, L SEP, U+2028) and PS (Paragraph Separator, P
> SEP, U+2029) are Format characters according to Unicode
> http://www.unicode.org/charts/PDF/U2000.pdf . They are followed in
> the charts by Left-to-Right Embedding, Right-to-Left Embedding,
> Pop Directional Formatting etc. I don't expect Vim to handle them
> otherwise than any other character, i.e., fetch a glyph, if any
> (probably none) from your 'guifont'. In my Gnome2 gvim with
> 'encoding' set to UTF-8, both U+2028 and U+2029 display as
> single-width spaces.

It would be a lot to ask of any text editor to respect these new
Unicode formatting characters. But I do think the authors of the spec
intended these to be additions to the traditional CR and LF. I've been
involved in a "why can't Vim do X, editor Y can do it" discussion, so
my interest here is not in actually using these chars myself. But there
are likely some cases where they will be useful, more and more as
software adopts Unicode. I'd personally only care that listchars has
an option for them; on screen they act the same as any other line
ending or tab char.


-- 
Steve Hall  [ digitect dancingpaper com ]




Re: Unicode chars NEL, FF, LS, PS

2006-09-29 Thread A.J.Mechelynck

Steve Hall wrote:
[...]

> It would be a lot to ask of any text editor to respect these new
> Unicode formatting characters. But I do think the authors of the spec
> intended these to be additions to the traditional CR and LF. I've been
> involved in a "why can't Vim do X, editor Y can do it" discussion, so
> my interest here is not in actually using these chars myself. But there
> are likely some cases where they will be useful, more and more as
> software adopts Unicode. I'd personally only care that listchars has
> an option for them; on screen they act the same as any other line
> ending or tab char.




Well, they don't. The only recognised line ending in Vim is the OS-specific 
one: CR on the Mac, LF under Unix, CR+LF on Windows. IIUC, in Unicode the use 
of embedded format characters is deprecated in favour of markup, e.g. in HTML 
<span dir=rtl>...</span> rather than LRE ... PDF, <P>...</P> rather than 
P-SEP, etc.



Best regards,
Tony.
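
To make the difference between the two views concrete, here is a short
Python sketch, not part of the thread. It contrasts splitting on LF only (the
Unix convention mentioned above) with Python's str.splitlines(), which also
treats NEL, FF, LS and PS as line boundaries. It says nothing about Vim's
internals; the names SAMPLE, split_on_lf and split_on_all are illustrative.

```python
# One LF, followed by NEL, FF, LS and PS between single letters.
SAMPLE = "a\nb\u0085c\u000cd\u2028e\u2029f"


def split_on_lf(text):
    """Split on LF only; the other four characters stay inside the line."""
    return text.split("\n")


def split_on_all(text):
    """Split with str.splitlines(), which also breaks on NEL, FF, LS and PS."""
    return text.splitlines()


print(split_on_lf(SAMPLE))
print(split_on_all(SAMPLE))
```

The first call yields two lines, the second yields six, which is the kind of
disagreement between tools that the thread is about.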