Re: Doublewidth Cyrillic for unhappy Japanese people
Hi,

At Tue, 17 Apr 2001 22:54:42 -0500, David Starner [EMAIL PROTECTED] wrote:

> It's a dead horse, okay? Can we let it lie unless there's an actual pragmatic reason to bring it up?

Sorry, I don't understand your English. My English-Japanese dictionary doesn't explain what a "dead horse" is.

- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/lists/
Re: Doublewidth Cyrillic for unhappy Japanese people
To introduce a new and irrelevant tangent...

On Wed, Apr 18, 2001 at 12:11:00PM +0900, Tomohiro KUBOTA wrote:

> If lack of proportional font were such very annoying thing for them, why typewriter was so widely used?

From the 1950s or 60s there actually were proportionally-spacing typewriters in common use, such as the IBM Executive. Typing was done in the normal way, but backspacing was more interesting.

David Starner - [EMAIL PROTECTED] - wrote:

> Typewriters are a pragmatic thing. Until recently, they were the only way to produce decent print (i.e. not handwritten) quickly and cheaply. Every so often, I find a book in the library, usually in the linguistics section, where it was typed up on a typewriter and all the accents were added in by hand.

Typewriters in some countries had accents on dead keys, like the Czech typewriter I bought at a secondhand shop for five dollars and wrote most of my undergraduate papers on.

By the way, the sentiment that Unicode is only for GUI font rendering and typesetting is a misconception, albeit one that is held by many of its proponents. In fact it is a plain-text standard, period, that is only slightly more hostile to terminal emulation than to any other form of expression. The fact that combining diacritics come after the base character rather than before is the main stumbling block for terminal emulators, which operate in real time and don't have the luxury of lookahead.

However, the "deprecated" status of line and box-drawing characters can no longer be claimed, because Unicode 3.1 adds tons of them, mostly for math, but some expressly for terminal emulation. See:

ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt

- Frank
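The point about combining diacritics arriving after their base character can be illustrated with a small sketch. This is not how any particular terminal emulator works; the function name `cells_from_stream` is made up for the example, and the test for "is this a combining mark" (a non-zero canonical combining class from Python's `unicodedata` module) is a simplifying assumption.

```python
import unicodedata


def cells_from_stream(stream):
    """Group characters into display cells as they arrive, one at a time.

    Hypothetical helper: a character with a non-zero combining class is
    treated as a combining mark (an assumption made for this sketch).
    """
    cells = []
    for ch in stream:
        if cells and unicodedata.combining(ch) != 0:
            # The base character has already been placed in a cell, so the
            # only option is to go back and amend that cell after the fact.
            cells[-1] += ch
        else:
            # Without lookahead there is no way to know yet whether a
            # combining mark will follow this character.
            cells.append(ch)
    return cells


# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a"
cells = cells_from_stream("e\u0301a")
```

Because the mark comes second, the cell holding "e" has to be redrawn when U+0301 arrives. Had the mark come first, as with typewriter dead keys, the cell could be composed before anything was drawn.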
Re: Doublewidth Cyrillic for unhappy Japanese people
Tue, 10 Apr 2001 09:44:58 +0100, Markus Kuhn [EMAIL PROTECTED] writes:

> I understand, that there are also the block graphic characters. If you live in a world where you use mostly double-width glyphs in terminal emulators, it might be convenient to also have double-width block graphics characters.

I use single-width glyphs exclusively, including block graphics. The Linux console doesn't support double-width anyway (at least in 2.2.x kernels).

> The answer of the Unicode consortium is very simple here: Nobody should be using the block graphics characters anyway. Their use is deprecated, and they are only in Unicode to guarantee round-trip compatibility with legacy sets.

This advice is nonsense. What should be used instead in an API for terminal drawing, if it uses Unicode otherwise anyway? Why should it be made more complicated by treating block drawing characters specially?

> In modern display systems (such as HTML), you have appropriate alternative means such as table constructs to do what you used to use block graphics for.

HTML is not appropriate for a terminal drawing API.

> If you want to draw a line in a text, use proper graphical primitives for that, not block graphics symbols.

There are no graphical primitives in terminals.

--
__(" Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
Re: Doublewidth Cyrillic for unhappy Japanese people
Marcin 'Qrczak' Kowalczyk wrote on 2001-04-17 21:37 UTC:

> Tue, 10 Apr 2001 09:44:58 +0100, Markus Kuhn [EMAIL PROTECTED] writes:
>
>> I understand, that there are also the block graphic characters. If you live in a world where you use mostly double-width glyphs in terminal emulators, it might be convenient to also have double-width block graphics characters.
>
> I use single-width glyphs exclusively, including block graphics. The Linux console doesn't support double-width anyway (at least in 2.2.x kernels).
>
>> The answer of the Unicode consortium is very simple here: Nobody should be using the block graphics characters anyway. Their use is deprecated, and they are only in Unicode to guarantee round-trip compatibility with legacy sets.
>
> This advice is nonsense. What should be used instead in an API for terminal drawing, if it uses Unicode otherwise anyway?

I know. However, if you read the Unicode standard, it will become very apparent that its authors clearly did not think in terms of terminal emulators. They thought in terms of word processing documents and web pages, in terms of typesetting, paragraph formatting and proportional fonts. The UCS authors, on the other hand, very clearly also had applications such as terminal emulators in mind (which is why they talk about ISO 6429, though they clearly didn't specify solutions to all terminal-emulator related problems).

That is just a general warning to people who might think that the Unicode bidi algorithm is directly suitable for implementation in a terminal emulator, etc. The reason why http://www.unicode.org/unicode/reports/tr11/ exists is primarily because of encoding conversion concerns, not because of terminal emulation. Microsoft, for instance, is quietly phasing out anything that looks remotely like a terminal emulator from its platform concept (the command prompt is already mostly gone from the Windows 2000 Workstation Edition, for instance).
A few more background thoughts on this:

Japanese typography doesn't use proportional fonts at all, so the distinction between narrow and wide character versions shows up in Japan even in typesetting.

There is one deep reason behind some of the flame wars and misunderstandings that we have here from time to time: In Europe, a formatted plaintext file is naturally considered to be a far more primitive representation of text than proper typography. In Europe, plaintext is just able to represent the output of antique typewriters and teletypes with very restricted glyph repertoires and no style variation, not even proportional fonts. In Japan, on the other hand, what formatted plaintext provides is very close to what proper typography provides. Style variations like bold or italic are not commonly used, wide characters are monospaced anyway, and the traditional 16-bit character sets contain almost the full repertoire of typographic characters.

So while in Europe we see plaintext as a primitive compromise anyway, in Japan it is all that most people need to write text, unless they want to have fancy graphic design with colors and different font sizes.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/
Re: Doublewidth Cyrillic for unhappy Japanese people
On Wed, Apr 18, 2001 at 12:11:00PM +0900, Tomohiro KUBOTA wrote:

> I have no idea what impression your word "very restricted" implies. I imagine Latin alphabet people have long tradition of typewriter. If lack of proportional font were such very annoying thing for them, why typewriter was so widely used?

Typewriters are a pragmatic thing. Until recently, they were the only way to produce decent print (i.e. not handwritten) quickly and cheaply. Every so often, I find a book in the library, usually in the linguistics section, where it was typed up on a typewriter and all the accents were added in by hand. If they had had an option, almost any option, they would have used it, because the end result looks awful. Good typesetting is at the top of my list of things I appreciate computers for.

> (I have even read a journal which use typewriter (courier?) for typesetting.)

Find, if you can, Donald Knuth's "TeX and Metafont: New Directions in Typesetting". In it, Knuth shows the typographic history of "The Transactions of the American Mathematical Society" through 1977, and shows where, at one point to save costs, they went to a typewriter-style printing. "At this point I regretfully stopped submitting papers to the American Mathematical Society, as the finished product was just too painful for me to look at. Similar fluctuations of typographical quality have appeared recently in all technical fields, especially in physics where the situation has gotten even worse." (page 7). Later he explains how the decreasing quality of The Art of Computer Programming's typesetting got him to design TeX.

--
David Starner - [EMAIL PROTECTED]
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and laughs at me. In fact, I'd be rather honored." - Joseph_Greg
Re: Doublewidth Cyrillic for unhappy Japanese people
Markus Kuhn [EMAIL PROTECTED] writes:

> The only characters for which double-width (square) is appropriate are
>
> - Han ideographs
> - Hiragana/Katakana
> - Hangul
> - CJK punctuation
> - fullwidth forms

There are a few other characters which simply can't be displayed properly using single-width glyphs, for example:

U+222D TRIPLE INTEGRAL
U+24A8 PARENTHESIZED LATIN SMALL LETTER M
U+FB03 LATIN SMALL LIGATURE FFI
U+FB04 LATIN SMALL LIGATURE FFL
U+2473 CIRCLED NUMBER TWENTY
U+2487 PARENTHESIZED NUMBER TWENTY
U+24DC CIRCLED LATIN SMALL LETTER M
Re: Doublewidth Cyrillic for unhappy Japanese people
2001-04-11T10:33:14+0200, Florian Weimer:

> Markus Kuhn [EMAIL PROTECTED] writes:
>
>> The only characters for which double-width (square) is appropriate are
>>
>> - Han ideographs
>> - Hiragana/Katakana
>> - Hangul
>> - CJK punctuation
>> - fullwidth forms
>
> There are a few other characters which simply can't be displayed properly using single-width glyphs, for example:
>
> U+222D TRIPLE INTEGRAL
> U+24A8 PARENTHESIZED LATIN SMALL LETTER M
> U+FB03 LATIN SMALL LIGATURE FFI
> U+FB04 LATIN SMALL LIGATURE FFL
> U+2473 CIRCLED NUMBER TWENTY
> U+2487 PARENTHESIZED NUMBER TWENTY
> U+24DC CIRCLED LATIN SMALL LETTER M

I think this is a simple issue of counting the vertical lines in the glyph. Most people use 6x13 or something, which means there is room for three vertical lines. If there are more than that in the glyph, then it really should be double width.

As for your examples, triple integral is a borderline case, but if you add circled then it surely should be double width. All circled or parenthesized letters and numbers should clearly be double width. The Latin ligatures should be double width as well, but who uses them in plain text?

The line drawing characters should not be double width; it would defeat their purpose completely. Will this break many CJK texts?

As for the EM DASH, typographically it should perhaps be double width, but we aren't dealing with typography. As long as it's readable, I would rather see as few double width characters as possible.

n.

--
[ http://www.dtek.chalmers.se/~d95mback/ ] [ PGP: 0x453504F1 ] [ UIN: 4439498 ]
Opinions expressed above are mine, and not those of my future employees.
SIGBORE: Signature boring error, core dumped
Re: Doublewidth Cyrillic for unhappy Japanese people
Martin Norbäck [EMAIL PROTECTED] writes:

> I think this is a simple issue of counting the vertical lines in the glyph.

I think that's too coarse. There might be some cases in which existing monospace fonts treat characters as single-width because systems with 9x16 or 8x8 glyph cells are much more commonly used than 6x13 cells. In such cases, compatibility should be preserved.

> The Latin ligatures should be double width as well, but who uses them in plain text?

I guess people who play with Unicode to upset other people. ;-)

> As for the EM DASH, typographically it should perhaps be double width, but we aren't dealing with typography. As long as it's readable, I would rather see as few double width characters as possible.

I think it has to be double-width in order to see that it's not an EN DASH.
Re: Doublewidth Cyrillic for unhappy Japanese people
On Tue, 10 Apr 2001, Thomas Chan wrote:

> Markus Kuhn wrote:
>
>> [There is also the minor issue of U+300a, U+300b, U+301a, U+301b, four mathematical brackets that somehow ended up in the CJK section probably because the Unicode/UCS authors were initially not aware of the mathematical origin of these doublestroke parentheses and brackets. Do these have any non-mathematical use whatsoever in CJK typography? Otherwise, they clearly must be normal width.]
>
> U+300A LEFT DOUBLE ANGLE BRACKET and U+300B RIGHT DOUBLE ANGLE BRACKET are used to mark titles of books, periodicals, articles, etc.

Some Korean books/newspaper articles do the same. However, I don't think this necessarily means that they should be double-width. BTW, for mathematics, can we just use U+226A and U+226B?

Jungshik Shin
Re: Doublewidth Cyrillic for unhappy Japanese people
Hello,

From: Markus Kuhn [EMAIL PROTECTED]

> I understand, that there are also the block graphic characters. If you live in a world where you use mostly double-width glyphs in terminal emulators, it might be convenient to also have double-width block graphics characters.
>
> The answer of the Unicode consortium is very simple here: Nobody should be using the block graphics characters anyway. Their use is deprecated, and they are only in Unicode to guarantee round-trip compatibility with legacy sets.
>
> In modern display systems (such as HTML), you have appropriate alternative means such as table constructs to do what you used to use block graphics for.

At least the text-based browser w3m has to use the block graphic characters for rendering HTML. OK, if the other applications don't use them, there is no problem with them being double-width for w3m only.

# BTW, the development version of w3m which supports UTF-8 treats them
# as double-width characters for display with CJK charsets and as
# normal-width characters for display with UTF-8.
# I am satisfied with this behavior for xterm-152-27 by Robert Brady.

On the other hand, some symbols (e.g. circle, triangle, star, and so on) should be double-width characters in the Japanese locale, because Japanese users have used them with double-width glyphs for a long time. The normal-width glyphs are not suitable for Japanese documents.

May I ask a question? Did people who used ISO-8859 use these symbols in plain text? I think newcomers should respect old users.

Thank you,

---
Hironori Sakamoto [EMAIL PROTECTED]
http://www2u.biglobe.ne.jp/~hsaka/
Re: Doublewidth Cyrillic for unhappy Japanese people
[EMAIL PROTECTED] wrote on 2001-04-10 12:11 UTC:

> On the other hand, some symbols (e.g. circle, triangle, star, and so on) should be double-width characters in Japanese locale, because Japanese use them with double-width glyphs for a long time. The normal-width glyphs are not suitable for Japanese document.
>
> May I ask a question? Did people who used ISO-8859 use these symbols in plain text? I think new comers should respect old users.

The people who use ISO 8859 also use or used CP437. CP437 (the original IBM PC character set) is a widely used coded character set with a significant amount of available text, which is often interspersed with block graphic elements and other graphical symbols. It contains:

263A WHITE SMILING FACE
263B BLACK SMILING FACE
2665 BLACK HEART SUIT
2666 BLACK DIAMOND SUIT
2663 BLACK CLUB SUIT
2660 BLACK SPADE SUIT
2022 BULLET
25D8 INVERSE BULLET
25CB WHITE CIRCLE
25D9 INVERSE WHITE CIRCLE
2642 MALE SIGN
2640 FEMALE SIGN
266A EIGHTH NOTE
266B BEAMED EIGHTH NOTES
263C WHITE SUN WITH RAYS
25BA BLACK RIGHT-POINTING POINTER
25C4 BLACK LEFT-POINTING POINTER
2195 UP DOWN ARROW
203C DOUBLE EXCLAMATION MARK
00B6 PILCROW SIGN
00A7 SECTION SIGN
25AC BLACK RECTANGLE
21A8 UP DOWN ARROW WITH BASE
2191 UPWARDS ARROW
2193 DOWNWARDS ARROW
2192 RIGHTWARDS ARROW
2190 LEFTWARDS ARROW
221F RIGHT ANGLE
2194 LEFT RIGHT ARROW
25B2 BLACK UP-POINTING TRIANGLE
25BC BLACK DOWN-POINTING TRIANGLE
2302 HOUSE
2591 LIGHT SHADE
2592 MEDIUM SHADE
2593 DARK SHADE
2502 BOX DRAWINGS LIGHT VERTICAL
2524 BOX DRAWINGS LIGHT VERTICAL AND LEFT
2561 BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
2562 BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
2556 BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
2555 BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
2563 BOX DRAWINGS DOUBLE VERTICAL AND LEFT
2551 BOX DRAWINGS DOUBLE VERTICAL
2557 BOX DRAWINGS DOUBLE DOWN AND LEFT
255D BOX DRAWINGS DOUBLE UP AND LEFT
255C BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
255B BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
2510 BOX DRAWINGS LIGHT DOWN AND LEFT
2514 BOX DRAWINGS LIGHT UP AND RIGHT
2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL
252C BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
251C BOX DRAWINGS LIGHT VERTICAL AND RIGHT
2500 BOX DRAWINGS LIGHT HORIZONTAL
253C BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
255E BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
255F BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
255A BOX DRAWINGS DOUBLE UP AND RIGHT
2554 BOX DRAWINGS DOUBLE DOWN AND RIGHT
2569 BOX DRAWINGS DOUBLE UP AND HORIZONTAL
2566 BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
2560 BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
2550 BOX DRAWINGS DOUBLE HORIZONTAL
256C BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
2567 BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
2568 BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
2564 BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
2565 BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
2559 BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
2558 BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
2552 BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
2553 BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
256B BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
256A BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
2518 BOX DRAWINGS LIGHT UP AND LEFT
250C BOX DRAWINGS LIGHT DOWN AND RIGHT
2588 FULL BLOCK
2584 LOWER HALF BLOCK
258C LEFT HALF BLOCK
2590 RIGHT HALF BLOCK
2580 UPPER HALF BLOCK
25A0 BLACK SQUARE

Just yesterday, I used a very good disassembler for a microcontroller under a DOS emulator that produced output files containing several of the above symbols. I was glad to be able to convert this to UTF-8 and continue using it with my normal Linux toolchain. CP437 is far from having died out yet. There is a lot of perfectly useful MS-DOS software around that is still in wide use, and people like myself wish to be able to send and display output files and text-mode screenshots of these MS-DOS tools in UTF-8.

The ultimate solution for people who want to use formatted character-cell plaintext written for EUC-JP, etc. is to have wcwidth locale dependent and to have two Japanese UTF-8 locales, one for each width convention. Sample implementations of both are available in

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

The wcwidth_cjk version assigns a width of 2 to characters with the East Asian width property "ambiguous" in the Unicode database, while the normal wcwidth assigns a width of 1 to these.

Suggestion: Call the locale with the normal wcwidth behaviour

ja.UTF-8

and the traditional one (EUC backwards compatibility)

ja.UTF-8@oldwidth

As long as applications follow the wcwidth provided by the C library, users can easily change the wcwidth behaviour by simply recompiling the locale definition files.

In the interest of simplicity and interoperability, we definitely should avoid introducing more than two wcwidth conventions.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at
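The difference between the two width conventions described in this message can be sketched in a few lines. This is not the code from wcwidth.c; it is a rough illustration built on Python's `unicodedata.east_asian_width`, and the names `cell_width`, `string_width` and the `ambiguous_wide` flag are invented for the example. Control characters and several zero-width categories are ignored here, which a real wcwidth would have to handle.

```python
import unicodedata


def cell_width(ch, ambiguous_wide=False):
    """Approximate column width of one character (illustrative only).

    ambiguous_wide=False mimics the "normal" convention, True the
    "old width" (CJK legacy) convention described in the message.
    """
    if unicodedata.combining(ch) != 0:
        return 0  # simplification: combining marks take no column
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # wide and fullwidth characters always take two columns
    if eaw == "A" and ambiguous_wide:
        return 2  # "ambiguous" characters are wide only in the CJK convention
    return 1


def string_width(text, ambiguous_wide=False):
    """Sum of the per-character widths of a string."""
    return sum(cell_width(ch, ambiguous_wide) for ch in text)


# U+0414 CYRILLIC CAPITAL LETTER DE has East Asian width "A" (ambiguous)
normal = cell_width("\u0414")                       # normal convention
legacy = cell_width("\u0414", ambiguous_wide=True)  # CJK legacy convention
```

Under this sketch, Cyrillic letters and the box-drawing characters change width between the two conventions, while Han ideographs are two columns wide in both, which is why the choice matters for text laid out in character cells.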
Re: Doublewidth Cyrillic for unhappy Japanese people
On Wed, 11 Apr 2001, Hironori Sakamoto wrote:

> This is not related with EUC-JP. This is not old or traditional. The normal-width glyphs are not suitable for Japanese document. The current wcwidth behaviour of xterm for Japanese locale is a BUG (at least for me).

Do you care about the doublewidth Cyrillic, or is it just the punctuation and graphics and so forth?

Could you make a list of the characters you care about being doublewidth?

--
Robert Brady [EMAIL PROTECTED]
Re: Doublewidth Cyrillic for unhappy Japanese people
Thomas Chan writes:

> But I think the last new usages of CP437 really were in the mid/late 1990's, at least for US DOS/Windows users. I think on i386 Linux, it might have lasted a bit longer--the Linux console I used defaulted to CP437.

The Linux distributions I used in/around 1993 already had an ISO-8859-1 console. But FreeBSD's console still defaults to CP437 today.

Bruno