Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-18 Thread Tomohiro KUBOTA

Hi,

At Tue, 17 Apr 2001 22:54:42 -0500,
David Starner [EMAIL PROTECTED] wrote:

 It's a dead horse, okay? Can we let it lie unless there's an actual
 pragmatic reason to bring it up?

Sorry, I don't understand your English.  My English-Japanese dictionary
doesn't explain what "dead horse" means.

-
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/lists/



Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-18 Thread Frank da Cruz

To introduce a new and irrelevant tangent...

On Wed, Apr 18, 2001 at 12:11:00PM +0900, Tomohiro KUBOTA wrote:
 If the lack of proportional fonts were such an annoying thing for them,
 why was the typewriter so widely used?

From the 1950s or 60s there actually were proportionally-spacing
typewriters in common use, such as the IBM Executive.  Typing was done
in the normal way, but backspacing was more interesting.

David Starner - [EMAIL PROTECTED] - wrote:
 Typewriters are a pragmatic thing. Until recently, they were the
 only way to produce decent print (i.e. not handwritten) quickly and
 cheaply. Every so often, I find a book in the library, usually in
 the linguistics section, where it was typed up on a typewriter and
 all the accents were added in by hand.

Typewriters in some countries had accents on dead keys, like the Czech
typewriter I bought at a secondhand shop for five dollars and wrote
most of my undergraduate papers on.

By the way, the sentiment that Unicode is only for GUI font rendering
and typesetting is a misconception, albeit one that is held by many of
its proponents.  In fact it is a plain-text standard, period, that is
only slightly more hostile to terminal emulation than to any other form
of expression.  The fact that combining diacritics come after the base
character rather than before is the main stumbling block for terminal
emulators, which operate in realtime and don't have the luxury of
lookahead.  However, the "deprecated" status of line and box-drawing
characters can no longer be claimed because Unicode 3.1 adds tons of
them, mostly for math, but some expressly for terminal emulation.
See:

  ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt

- Frank
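
For readers who want to see the ordering issue concretely, here is a minimal
sketch, assuming Python's standard unicodedata module (the helper name
decompose is made up for illustration). It shows that in decomposed form the
base letter arrives first and the combining accent only afterwards, which is
why a terminal without lookahead cannot know the final glyph when the base
character is received.

```python
import unicodedata

def decompose(text):
    """Return (code point, name, combining class) for each character of the NFD form."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in unicodedata.normalize("NFD", text)
    ]

# U+00E9 (e with acute) decomposes into the base letter followed by the
# combining accent, so the "e" is seen before the accent that modifies it.
for entry in decompose("\u00e9"):
    print(entry)
```

A combining class of 0 marks the base character; a non-zero class marks a
combining mark that attaches to the character before it.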




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-17 Thread Marcin 'Qrczak' Kowalczyk

Tue, 10 Apr 2001 09:44:58 +0100, Markus Kuhn [EMAIL PROTECTED] writes:

 I understand, that there are also the block graphic characters. If
 you live in a world where you use mostly double-width glyphs in
 terminal emulators, it might be convenient to also have double-width
 block graphics characters.

I use single-width glyphs exclusively, including block graphics.
The Linux console doesn't support double-width anyway (at least in
2.2.x kernels).

 The answer of the Unicode consortium is very simple here: Nobody
 should be using the block graphics characters anyway. Their use is
 deprecated, and they are only in Unicode to guarantee round-trip
 compatibility with legacy sets.

This advice is nonsense. What should be used instead in an API for
terminal drawing, if it uses Unicode otherwise anyway? Why should it be
made more complicated by treating block drawing characters specially?

 In modern display systems (such as HTML), you have appropriate
 alternative means such as table constructs to do what you used to
 use block graphics for.

HTML is not appropriate for a terminal drawing API.

 If you want to draw a line in a text, use proper graphical primitives
 for that, not block graphics symbols.

There are no graphical primitives in terminals.

-- 
 __("  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^  SYGNATURA ZASTĘPCZA
QRCZAK




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-17 Thread Markus Kuhn

Marcin 'Qrczak' Kowalczyk wrote on 2001-04-17 21:37 UTC:
 Tue, 10 Apr 2001 09:44:58 +0100, Markus Kuhn [EMAIL PROTECTED] writes:
  I understand, that there are also the block graphic characters. If
  you live in a world where you use mostly double-width glyphs in
  terminal emulators, it might be convenient to also have double-width
  block graphics characters.
 
 I use single-width glyphs exclusively, including block graphics.
 The Linux console doesn't support double-width anyway (at least in
 2.2.x kernels).
 
  The answer of the Unicode consortium is very simple here: Nobody
  should be using the block graphics characters anyway. Their use is
  deprecated, and they are only in Unicode to guarantee round-trip
  compatibility with legacy sets.
 
 This advice is nonsense. What should be used instead in an API for
 terminal drawing, if it uses Unicode otherwise anyway?

I know. However, if you read the Unicode standard, it becomes very
apparent that its authors clearly did not think in terms of terminal
emulators. They thought in terms of word processing documents and web
pages, in terms of typesetting, paragraph formatting and proportional
fonts. The UCS authors, on the other hand, very clearly also had
applications such as terminal emulators in mind (which is why they talk
about ISO 6429, though they clearly didn't specify solutions to all
terminal-emulator-related problems). Take that as a general warning to
people who might think that the Unicode bidi algorithm is directly
suitable for implementation in a terminal emulator, etc.

The reason why

  http://www.unicode.org/unicode/reports/tr11/

exists is primarily because of encoding conversion concerns, not because
of terminal emulation. Microsoft for instance is quietly phasing out
anything that looks remotely like a terminal emulator from its platform
concept (the command prompt is already mostly gone from the Windows 2000
Workstation Edition for instance).

A few more background thoughts on this:

Japanese typography doesn't use proportional fonts at all, so the
distinction between narrow and wide character versions shows up in Japan
even in typesetting. There is one deep reason behind some of the flame
wars and misunderstandings that we have here from time to time: In
Europe, formatted plaintext is naturally considered to be a far
more primitive representation of text than proper typography. In Europe,
plaintext can only represent the output of antique typewriters
and teletypes with very restricted glyph repertoires and no style
variation, not even proportional fonts. In Japan, on the other hand, what
formatted plaintext provides is very close to what proper typography
provides. Style variations like bold or italic are not commonly used,
wide characters are monospaced anyway and the traditional 16-bit
character sets contain almost the full repertoire of typographic
characters. So while in Europe we see plaintext as a primitive
compromise anyway, in Japan it is all that most people need to write
text, unless they want to have fancy graphic design with colors and
different font sizes.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-17 Thread David Starner

On Wed, Apr 18, 2001 at 12:11:00PM +0900, Tomohiro KUBOTA wrote:
 I have no idea what impression your words "very restricted" are meant to give.
 I imagine people who use the Latin alphabet have a long tradition of typewriters.
 If the lack of proportional fonts were such an annoying thing for them,
 why was the typewriter so widely used?

Typewriters are a pragmatic thing. Until recently, they were the
only way to produce decent print (i.e. not handwritten) quickly and
cheaply. Every so often, I find a book in the library, usually in
the linguistics section, where it was typed up on a typewriter and
all the accents were added in by hand. If they had had an option,
almost any option, they would have used it, because the end result
looks awful. Good typesetting is the top of my list on things I
appreciate computers for.

 (I have even read a journal which
 uses typewriter (courier?) for typesetting.)

Find, if you can, Donald Knuth's "TeX and Metafont: New Directions
in Typesetting". In it, Knuth shows the typographic history of "The
Transactions of the American Mathematical Society" through 1977, and
shows where, at one point to save costs, they went to a typewriter
style printing. "At this point I regretfully stopped submitting
papers to the American Mathematical Society, as the finished product
was just too painful for me to look at. Similar fluctuations of
typographical quality have appeared recently in all technical fields,
especially in physics where the situation has gotten even worse."
(page 7). Later he explains how the decreasing quality of The Art of
Computer Programming's typesetting got him to design TeX.

-- 
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg



Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-11 Thread Florian Weimer

Markus Kuhn [EMAIL PROTECTED] writes:

 The only characters for which double-width (square) is appropriate are
 
   - Han ideographs
   - Hiragana/Katakana
   - Hangul
   - CJK punctuation
   - fullwidth forms

There are a few other characters which simply can't be displayed
properly using single-width glyphs, for example:

U+222D TRIPLE INTEGRAL
U+24A8 PARENTHESIZED LATIN SMALL LETTER M
U+FB03 LATIN SMALL LIGATURE FFI
U+FB04 LATIN SMALL LIGATURE FFL
U+2473 CIRCLED NUMBER TWENTY
U+2487 PARENTHESIZED NUMBER TWENTY
U+24DC CIRCLED LATIN SMALL LETTER M



Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-11 Thread Martin Norbäck

2001-04-11T10:33:14+0200, Florian Weimer -
   Markus Kuhn [EMAIL PROTECTED] writes:
 
  The only characters for which double-width (square) is appropriate are
  
    - Han ideographs
    - Hiragana/Katakana
    - Hangul
    - CJK punctuation
    - fullwidth forms
 
 There are a few other characters which simply can't be displayed
 properly using single-width glyphs, for example:
 
   U+222D TRIPLE INTEGRAL
   U+24A8 PARENTHESIZED LATIN SMALL LETTER M
   U+FB03 LATIN SMALL LIGATURE FFI
   U+FB04 LATIN SMALL LIGATURE FFL
   U+2473 CIRCLED NUMBER TWENTY
   U+2487 PARENTHESIZED NUMBER TWENTY
   U+24DC CIRCLED LATIN SMALL LETTER M

I think this is a simple issue of counting the vertical lines in the
glyph. Most people use 6x13 or something, which means there is room for
three vertical lines. If there are more than that in the glyph, then it
really should be double width.

As for your examples, triple integral is a borderline case, but if you add
circled then it surely should be double width.

All circled or parenthesized letters and numbers should clearly be double
width.

The Latin ligatures should be double width as well, but who uses them in
plain text?

The line drawing characters should not be double width, it would defeat
their purpose completely. Will this break many CJK texts?

As for the EM DASH, typographically it should perhaps be double width,
but we aren't dealing with typography. As long as it's readable, I would
rather see as few double width characters as possible.

n.

-- 
[ http://www.dtek.chalmers.se/~d95mback/ ] [ PGP: 0x453504F1 ] [ UIN: 4439498 ]
Opinions expressed above are mine, and not those of my future employees.
SIGBORE: Signature boring error, core dumped



Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-11 Thread Florian Weimer

Martin Norbäck [EMAIL PROTECTED] writes:

 I think this is a simple issue of counting the vertical lines in the
 glyph.

I think that's too coarse.  There might be some cases in which existing
monospace fonts treat characters as single-width because systems with
9x16 or 8x8 glyph cells are much more commonly used than 6x13 cells.
In such cases, compatibility should be preserved.

 The Latin ligatures should be double width as well, but who uses them in
 plain text?

I guess people who play with Unicode to upset other people. ;-)

 As for the EM DASH, typographically it should perhaps be double width,
 but we aren't dealing with typography. As long as it's readable, I would
 rather see as few double width characters as possible.

I think it has to be double-width in order to see that it's not an EN
DASH.



Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-10 Thread Jungshik Shin




On Tue, 10 Apr 2001, Thomas Chan wrote:

 Markus Kuhn wrote:
 
  [There is also the minor issue of U+300a, U+300b, U+301a, U+301b, four
  mathematical brackets that somehow ended up in the CJK section probably
  because the Unicode/UCS authors were initially not aware of the
  mathematical origin of these doublestroke parentheses and brackets. Do
  these have any non-mathematical use whatsoever in CJK typography?
  Otherwise, they clearly must be normal width.]

 U+300A LEFT DOUBLE ANGLE BRACKET and U+300B RIGHT DOUBLE ANGLE BRACKET
 are used to mark titles of books, periodicals, articles, etc.

  Some Korean books/newspaper articles do the same.  However, I don't
think this necessarily means that they should be double-width.

   BTW, for mathematics, can we just use U+226A and U+226B?

   Jungshik Shin




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-10 Thread hsaka


Hello,

 From: Markus Kuhn [EMAIL PROTECTED]
 I understand, that there are also the block graphic characters. If you
 live in a world where you use mostly double-width glyphs in terminal
 emulators, it might be convenient to also have double-width block
 graphics characters. The answer of the Unicode consortium is very simple
 here: Nobody should be using the block graphics characters anyway. Their
 use is deprecated, and they are only in Unicode to guarantee round-trip
 compatibility with legacy sets. In modern display systems (such as
 HTML), you have appropriate alternative means such as table constructs
 to do what you used to use block graphics for.

At least the text-based browser w3m has to use the block graphic
characters for rendering HTML. OK, if other applications don't
use them, then making them double-width is no problem, as it affects only w3m.
# BTW, the development version of w3m which supports UTF-8 treats them
# as double-width characters for display with CJK charsets and as
# normal-width characters for display with UTF-8.
# I am satisfied with this behavior on xterm-152-27 by Robert Brady.

On the other hand, some symbols (e.g. circle, triangle, star, and so on)
should be double-width characters in the Japanese locale, because Japanese
users have used them with double-width glyphs for a long time. The normal-width
glyphs are not suitable for Japanese documents.
May I ask a question? Did people who used ISO-8859 use these symbols
in plain text? I think newcomers should respect old users.

Thank you,
--- 
Hironori Sakamoto [EMAIL PROTECTED] 
 http://www2u.biglobe.ne.jp/~hsaka/




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-10 Thread Markus Kuhn

[EMAIL PROTECTED] wrote on 2001-04-10 12:11 UTC:
 On the other hand, some symbols (e.g. circle, triangle, star, and so on)
 should be double-width characters in the Japanese locale, because Japanese
 users have used them with double-width glyphs for a long time. The normal-width
 glyphs are not suitable for Japanese documents.
 May I ask a question? Did people who used ISO-8859 use these symbols
 in plain text? I think newcomers should respect old users.

The people who use ISO 8859 also use or used CP437 (the original IBM
PC character set). It is a widely used coded character set with a
significant amount of available text, which is often interspersed with
block graphic elements and other graphical symbols. It contains:

263A  WHITE SMILING FACE
263B  BLACK SMILING FACE
2665  BLACK HEART SUIT
2666  BLACK DIAMOND SUIT
2663  BLACK CLUB SUIT
2660  BLACK SPADE SUIT
2022  BULLET
25D8  INVERSE BULLET
25CB  WHITE CIRCLE
25D9  INVERSE WHITE CIRCLE
2642  MALE SIGN
2640  FEMALE SIGN
266A  EIGHTH NOTE
266B  BEAMED EIGHTH NOTES
263C  WHITE SUN WITH RAYS
25BA  BLACK RIGHT-POINTING POINTER
25C4  BLACK LEFT-POINTING POINTER
2195  UP DOWN ARROW
203C  DOUBLE EXCLAMATION MARK
00B6  PILCROW SIGN
00A7  SECTION SIGN
25AC  BLACK RECTANGLE
21A8  UP DOWN ARROW WITH BASE
2191  UPWARDS ARROW
2193  DOWNWARDS ARROW
2192  RIGHTWARDS ARROW
2190  LEFTWARDS ARROW
221F  RIGHT ANGLE
2194  LEFT RIGHT ARROW
25B2  BLACK UP-POINTING TRIANGLE
25BC  BLACK DOWN-POINTING TRIANGLE
2302  HOUSE
2591  LIGHT SHADE
2592  MEDIUM SHADE
2593  DARK SHADE
2502  BOX DRAWINGS LIGHT VERTICAL
2524  BOX DRAWINGS LIGHT VERTICAL AND LEFT
2561  BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
2562  BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
2556  BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
2555  BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
2563  BOX DRAWINGS DOUBLE VERTICAL AND LEFT
2551  BOX DRAWINGS DOUBLE VERTICAL
2557  BOX DRAWINGS DOUBLE DOWN AND LEFT
255d  BOX DRAWINGS DOUBLE UP AND LEFT
255c  BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
255b  BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
2510  BOX DRAWINGS LIGHT DOWN AND LEFT
2514  BOX DRAWINGS LIGHT UP AND RIGHT
2534  BOX DRAWINGS LIGHT UP AND HORIZONTAL
252c  BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
251c  BOX DRAWINGS LIGHT VERTICAL AND RIGHT
2500  BOX DRAWINGS LIGHT HORIZONTAL
253c  BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
255e  BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
255f  BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
255a  BOX DRAWINGS DOUBLE UP AND RIGHT
2554  BOX DRAWINGS DOUBLE DOWN AND RIGHT
2569  BOX DRAWINGS DOUBLE UP AND HORIZONTAL
2566  BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
2560  BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
2550  BOX DRAWINGS DOUBLE HORIZONTAL
256c  BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
2567  BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
2568  BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
2564  BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
2565  BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
2559  BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
2558  BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
2552  BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
2553  BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
256b  BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
256a  BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
2518  BOX DRAWINGS LIGHT UP AND LEFT
250c  BOX DRAWINGS LIGHT DOWN AND RIGHT
2588  FULL BLOCK
2584  LOWER HALF BLOCK
258c  LEFT HALF BLOCK
2590  RIGHT HALF BLOCK
2580  UPPER HALF BLOCK
25a0  BLACK SQUARE

Just yesterday, I used a very good disassembler for a microcontroller
under a DOS emulator that produced output files containing several of
the above symbols. I was glad to be able to convert this output to UTF-8
and continue using it with my normal Linux toolchain. CP437 is
far from having died out yet. There is a lot of perfectly useful
MS-DOS software around that is still in wide use and people like
myself wish to be able to send and display output files and text-mode
screenshots of these MS-DOS tools in UTF-8.
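
A conversion of this kind can be sketched in a few lines, assuming Python's
built-in cp437 codec (the function name cp437_to_utf8 and the sample bytes
are made up for illustration). One caveat: that codec maps the bytes
0x01-0x1F to control characters, so the smiley, card-suit and arrow symbols
at the top of the list above would need a separate lookup table.

```python
def cp437_to_utf8(raw: bytes) -> bytes:
    """Re-encode CP437 bytes as UTF-8 using Python's built-in cp437 codec."""
    return raw.decode("cp437").encode("utf-8")

# A hypothetical top border of a text-mode box: bytes 0xC9, 0xCD, 0xCD, 0xBB
# are the double-line corner and horizontal pieces in CP437.
sample = bytes([0xC9, 0xCD, 0xCD, 0xBB])
print(cp437_to_utf8(sample).decode("utf-8"))
```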

The ultimate solution for people who want to use formatted charcell
plaintext written for EUC-JP, etc. is to have wcwidth locale dependent
and to have two Japanese UTF-8 locales with both width conventions.

Sample implementations of both are available in

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The wcwidth_cjk version assigns to characters with the EastAsian width
property "ambiguous" in the Unicode database a width of 2, while the
normal wcwidth assigns to these a 1.
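
The difference between the two conventions can be illustrated with a rough
sketch. This is not the code in wcwidth.c; it is a simplified Python
approximation that assumes the standard unicodedata module, and the names
char_width and string_width are invented here. It ignores control characters
and other special cases that a real wcwidth has to handle.

```python
import unicodedata

def char_width(ch: str, cjk: bool = False) -> int:
    """Approximate column width of one character.

    cjk=False mimics the normal convention (ambiguous width -> 1),
    cjk=True mimics the legacy CJK convention (ambiguous width -> 2).
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # non-spacing and enclosing combining marks take no column
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # wide and fullwidth characters are always two columns
    if eaw == "A" and cjk:
        return 2  # ambiguous characters are wide only in the CJK convention
    return 1

def string_width(text: str, cjk: bool = False) -> int:
    """Sum of the approximate widths of all characters in text."""
    return sum(char_width(ch, cjk) for ch in text)

# Cyrillic ZHE (U+0416) is "ambiguous"; the Han ideograph U+4E00 is "wide".
print(string_width("\u0416\u4e00"), string_width("\u0416\u4e00", cjk=True))
```

Cyrillic letters and the box drawing characters fall into the "ambiguous"
class, which is why they change width between the two conventions while Han
ideographs do not.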

Suggestion:

Call the locale with the normal wcwidth behaviour

  ja.UTF-8

and the traditional one (EUC backwards compatibility)

  ja.UTF-8@oldwidth

As long as applications follow the wcwidth provided by the C library,
users can easily change the wcwidth behaviour by simply recompiling
the locale definition files.
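
To illustrate the naming scheme only: in a real system the width data would
live in the C library's locale definitions, not in application code. The
sketch below is a hypothetical Python helper (uses_old_width and
ambiguous_width are invented names) that reads the proposed "@oldwidth"
modifier from a locale name and picks the width for ambiguous characters.

```python
def uses_old_width(locale_name: str) -> bool:
    """True if the locale name carries the proposed '@oldwidth' modifier."""
    _, _, modifier = locale_name.partition("@")
    return modifier == "oldwidth"

def ambiguous_width(locale_name: str) -> int:
    """Column width to use for East Asian 'ambiguous' characters."""
    return 2 if uses_old_width(locale_name) else 1

for name in ("ja.UTF-8", "ja.UTF-8@oldwidth"):
    print(name, ambiguous_width(name))
```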

In the interest of simplicity and interoperability, we definitely
should avoid introducing more than two wcwidth conventions.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at 

Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-10 Thread Robert Brady

On Wed, 11 Apr 2001, Hironori Sakamoto wrote:

 This is not related to EUC-JP. This is not old or traditional.
 The normal-width glyphs are not suitable for Japanese documents.
 The current wcwidth behaviour of xterm for the Japanese locale is
 a BUG (at least for me).

Do you care about the doublewidth Cyrillic, or is it just the punctuation
and graphics and so forth?  Could you make a list of the characters you
care about being doublewidth?

-- 
Robert Brady
[EMAIL PROTECTED]




Re: Doublewidth Cyrillic for unhappy Japanese people

2001-04-10 Thread Bruno Haible

Thomas Chan writes:

 But I think the last new usages of CP437 really were in the
 mid/late 1990's, at least for US DOS/Windows users.  I think on
 i386 Linux, it might have lasted a bit longer--the linux console I
 used defaulted to CP437.

The Linux distributions I used in/around 1993 already had an
ISO-8859-1 console. But FreeBSD's console still defaults to
CP437 today.

Bruno