Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
Taking this thread back to the original question...

The Line_Break property values for halfwidth katakana (lb=AL) and regular katakana (lb=ID) have been stable since they were first defined for Unicode 3.0 -- 15 years ago. Regardless of whether lb=AL is the optimal assignment for the halfwidth katakana, it seems likely to me that trying to *change* that Line_Break assignment, just for halfwidth katakana, at this late date, would be more destabilizing for existing implementations than helpful.

The citations below show *different* behavior between browsers for linebreaking around halfwidth katakana. That suggests that Firefox and IE11 have already provided tailoring to better match expectations. The correct avenue forward, it seems to me, would be to pursue bugs against browsers that do not show expected behavior, to see if improvements there are feasible, rather than to modify the base Line_Break property values that everybody has to tailor *from*.

Note that this is not *just* a Japanese problem nor a matter of not matching JIS X 4051. UAX #14 is *not* a direct implementation of JIS X 4051 rules, although it is certainly informed by them and has many Line_Break values introduced to get default behavior closer to the Japanese rules for linebreaking. And the compatibility halfwidth characters in the standard also include halfwidth jamo and symbols, so any changes would also need to be considered in the context of consistency for those and for *Korean* rules, as well as for Japanese.

--Ken

On 4/27/2015 10:57 PM, Makoto Kato wrote:

> Hi, Suzuki-san. Thank you for your reply.
>
> > At present, I have no objection to adding halfwidth katakana to the ideographic class in UAX#14, but I'm unfamiliar with the (negative) impact caused by the lack of halfwidth katakana in it. Could you tell me if you know of any?
>
> Since half-width katakana isn't ID, it doesn't line-break the way full-width katakana does.
>
> Firefox and IE11 treat half-width katakana as ID, so it breaks the same way as full-width katakana. Chrome doesn't treat it as ID, so half-width katakana doesn't break per character.
>
> I have read JIS X 4051, and it doesn't specify that half-width katakana and full-width katakana behave differently.
>
> > I guess the inclusion or exclusion in other classes, like AI, AL, CJ, JL, JV, JT, SA, might be quite important for appropriate line breaking, but the inclusion or exclusion in the ID class does not seem to be important. If the inclusion in the ID class is important, more characters (e.g. Bopomofo) should be considered for full coverage. What do you think?
>
> My question is why half-width katakana characters aren't in the same class as full-width katakana characters. In the current spec, half-width katakana is defined as AL, so moving it to ID changes the break rule significantly (non-break -> break before or after).
>
> -- Makoto
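The tailoring Ken describes can be pictured with a small sketch. It assumes a toy default table holding only the values stated in this thread (halfwidth katakana lb=AL, regular katakana lb=ID) and the code point ranges quoted by Kato-san. The names `DEFAULT_LB`, `default_line_break` and `tailored_line_break` are hypothetical and not the API of any browser or library; a real implementation would read the full LineBreak.txt data instead.

```python
# Sketch only: a toy default table, NOT real LineBreak.txt data.
# The halfwidth katakana ranges are the ones quoted in this thread:
# U+FF66 and U+FF71-U+FF9D.
HALFWIDTH_KATAKANA_RANGES = [(0xFF66, 0xFF66), (0xFF71, 0xFF9D)]

# Hypothetical sample of default Line_Break values, as stated in the thread.
DEFAULT_LB = {
    "A": "AL",       # Latin letter
    "\u30a2": "ID",  # KATAKANA LETTER A (regular katakana)
}


def is_halfwidth_katakana(ch: str) -> bool:
    """True if ch lies in the ranges cited in the thread."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HALFWIDTH_KATAKANA_RANGES)


def default_line_break(ch: str) -> str:
    """Untailored class; characters outside the toy table fall back to XX."""
    if is_halfwidth_katakana(ch):
        return "AL"
    return DEFAULT_LB.get(ch, "XX")


def tailored_line_break(ch: str) -> str:
    """Tailoring layered on top of the defaults: halfwidth katakana become ID."""
    if is_halfwidth_katakana(ch):
        return "ID"
    return default_line_break(ch)


if __name__ == "__main__":
    for ch in ("A", "\u30a2", "\uff71"):
        print(f"U+{ord(ch):04X}", default_line_break(ch), "->", tailored_line_break(ch))
```

The point of keeping the override in a separate function is that the base values stay untouched, which is the property Ken argues should be preserved for everybody who tailors from them.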
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
My feeling is that half-width kanas behave like Latin letters and do not even have to follow the ideographic composition square to line up with them (unlike standard kanas). So effectively their line breaking behavior is very different.

Those half-width letters are in fact similar to linear jamos (not composed into syllabic squares) in the Korean script, and to Bopomofo letters. And maybe we could add the CJK key letters (radicals used for example in IDS) to this list, or Yi radicals.

They are harmonized to be used along with other alphabetic scripts. In fact they may not even be really half-width but proportional. They are also used with non-ideographic punctuation (notably the ASCII punctuation) and standard SPACE (U+0020).

If rendered in vertical lines, they could be either rotated (just like Latin letters), or not (aligned horizontally like letters in columns of crosswords, but they may also have proportional height, like in Latin/Greek/Cyrillic where it is sometimes needed for example with capital letters with stacked accents, or when using sized spaces).

So IMHO, those half-width letters are in fact to be considered as another separate script, for typographic purposes. They are unified with non-halfwidth letters only for collation with minor differences (plain-text searching and sorting).

2015-04-28 4:20 GMT+02:00 Makoto Kato m_k...@ga2.so-net.ne.jp:

> Hi.
>
> http://www.unicode.org/reports/tr14/proposed.html#ID defines Ideographic (ID). Although full-width katakana is included in ID, half-width katakana (U+FF66 and U+FF71-U+FF9D) isn't. Why?
>
> Also, Conditional Japanese Starter (CJ, http://www.unicode.org/reports/tr14/proposed.html#CJ) considers half-width variants such as half-width katakana letter small a.
>
> -- Makoto
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
Note: is it really allowed to break between a Latin letter and a half-width kana?

Such sequences are frequent when there are untranslated foreign Latin (or maybe Greek/Cyrillic/Hebrew/Arabic) insertions in Japanese (toponyms, trademarks, people's names...) that are followed by a semantic kana terminator. If you allow this break, the terminator will lose its semantics.

There are probably similar exceptions between [ideographs or fullwidth Latin/Greek/Cyrillic] and [half-width or full-width kana], for those script boundaries.
Re: [Unicode] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
Dear Philippe,

Philippe Verdy wrote:

> My feeling is that half-width kanas behave like Latin letters and do not even have to follow the ideographic composition square to line up with them (unlike standard kanas). So effectively their line breaking behavior is very different.

Excuse me, do you mean that half-width kana text should have spaces between the words, although full-width (standard) kana text may not? Could you tell me more about the community that prefers such a distinction?

I think the orthography proposed for writing the Japanese language in kana without kanji has word-breaking spaces, like

http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:Kana_no_Hikari,_number_1,_page_1.png

but it is not official, and it does not distinguish full-width kana from half-width kana.

Regards,
mpsuzuki
Re: [Unicode] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
I just gave an opinion about what I have seen. I don't know if this is correct or preferred.

Half-width text is a modern invention that does not obey the traditions used in CJK composition squares (which should also be rendered vertically by default; even if this is not the case on the Internet today, it is still the case for printed texts). Half-width characters started being used at the same time that Latin letters started to be mixed into text, and when computers appeared that offered only half-width character cells in monospaced fonts (to display other ideographs, those old computers needed to allocate two cells and use separate fonts for the left side and the right side).

I don't know whether whitespace is preferred or not in halfwidth text; I have seen both...
AW: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
No. They are still in use.

One typical usage of half-width kanas is the display of short texts on small devices of embedded systems, like status messages of control units, for example a one-line display, 30 characters wide, monospace, with 8x10 pixels per character.

Albrecht

-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Werner LEMBERG
Sent: Tuesday, April 28, 2015 10:09
To: verd...@wanadoo.fr
Cc: m_k...@ga2.so-net.ne.jp; unicode@unicode.org
Subject: Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

(...)

> AFAIK, the existence of half-width kanas in Unicode is purely for backwards and round-trip compatibility.
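For a fixed-width display like the one Albrecht mentions, the number of cells a string occupies can be estimated from the East_Asian_Width property. The sketch below uses Python's standard `unicodedata.east_asian_width`. Counting two cells for Wide/Fullwidth characters and one cell for everything else is a common simplification and an assumption here, as are the helper names; the 30-cell default is taken from his example.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Estimate monospace cells: Wide/Fullwidth count as 2, everything else as 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def fits_display(text: str, cells: int = 30) -> bool:
    """True if the text fits on a one-line display with the given cell count."""
    return cell_width(text) <= cells


if __name__ == "__main__":
    halfwidth = "\uff71" * 30  # HALFWIDTH KATAKANA LETTER A, 30 times
    fullwidth = "\u30a2" * 30  # KATAKANA LETTER A, 30 times
    print(cell_width(halfwidth), fits_display(halfwidth))
    print(cell_width(fullwidth), fits_display(fullwidth))
```

Under this estimate, thirty halfwidth kana fill the line exactly, while the same text in regular katakana would need twice the room.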
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
Hi, Suzuki-san. Thank you for your reply.

> At present, I have no objection to adding halfwidth katakana to the ideographic class in UAX#14, but I'm unfamiliar with the (negative) impact caused by the lack of halfwidth katakana in it. Could you tell me if you know of any?

Since half-width katakana isn't ID, it doesn't line-break the way full-width katakana does.

This is a sample for line breaking of half-width katakana. (It is a good sample from a web browser implementation.)

http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

Firefox and IE11 treat half-width katakana as ID, so it breaks the same way as full-width katakana. Chrome doesn't treat it as ID, so half-width katakana doesn't break per character.

I have read JIS X 4051, and it doesn't specify that half-width katakana and full-width katakana behave differently.

> I guess the inclusion or exclusion in other classes, like AI, AL, CJ, JL, JV, JT, SA, might be quite important for appropriate line breaking, but the inclusion or exclusion in the ID class does not seem to be important. If the inclusion in the ID class is important, more characters (e.g. Bopomofo) should be considered for full coverage. What do you think?

My question is why half-width katakana characters aren't in the same class as full-width katakana characters. In the current spec, half-width katakana is defined as AL, so moving it to ID changes the break rule significantly (non-break -> break before or after).

-- Makoto
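The change Kato-san describes (non-break becoming break before or after) can be sketched with a deliberately reduced pair rule. The sketch models only the two classes under discussion and assumes just two behaviours from UAX #14: no break between two AL characters, and a break opportunity between any other adjacent pair of AL and ID. All other classes and rules are ignored, and `break_opportunities` is a hypothetical helper, not part of any library.

```python
from typing import List


def break_opportunities(classes: List[str]) -> List[int]:
    """Return positions i where a break is allowed between classes[i-1] and classes[i].

    Simplification: only AL and ID are modelled. AL followed by AL is kept
    together; any other adjacent pair gets a break opportunity.
    """
    for c in classes:
        if c not in ("AL", "ID"):
            raise ValueError(f"class {c!r} is outside this sketch")
    return [
        i
        for i in range(1, len(classes))
        if not (classes[i - 1] == "AL" and classes[i] == "AL")
    ]


if __name__ == "__main__":
    run = 4  # four halfwidth katakana in a row
    print(break_opportunities(["AL"] * run))        # halfwidth katakana as AL
    print(break_opportunities(["ID"] * run))        # halfwidth katakana as ID
    print(break_opportunities(["AL", "AL", "ID"]))  # two Latin letters, then a kana as ID
```

With the AL classification a run of halfwidth katakana offers no break at all, which matches the Chrome behaviour reported in the thread; with ID every position inside the run becomes a candidate, as in Firefox and IE11.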
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
# Sorry, I forgot to consider the large picture attachment. I reduced the image size and am resending to the Unicode mailing list.

Kato-san,

Thank you very much for the prompt response.

> This is a sample for line breaking of half-width katakana. (It is a good sample from a web browser implementation.)
>
> http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

I wish the sample text were longer, to show the line breaking behaviour. I attached jugem.txt and the screenshots from Firefox and Chromium.

> Firefox and IE11 treat half-width katakana as ID, so it breaks the same way as full-width katakana. Chrome doesn't treat it as ID, so half-width katakana doesn't break per character.

Oh, Google Chrome could not break half-width katakana text with per-character line breaking! It is a very good example showing that the lack of an explicit definition caused the incompatibility (and inconvenience).

I'm sorry for troubling you for the explanation. I agree with your proposal to add halfwidth katakana to the ID class, even if further discussion is needed for other scripts.

Regards,
mpsuzuki

[attachment: jugem.txt]
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
> My feeling is that half-width kanas behave like Latin letters and do not even have to follow the ideographic composition square to line up with them (unlike standard kanas).

It's exactly the half of the ideographic square.

> So effectively their line breaking behavior is very different.

Maybe. However, the most important property is to be able to start a new line after (almost) any half-width kana.

> They are harmonized to be used along with other alphabetic scripts. In fact they may not even be really half-width but proportional.

Do you have an example for that? I've *exclusively* seen fonts where half-width kanas are really half the CJK width.

> If rendered in vertical lines, they could be either rotated (just like Latin letters),

Actually, I haven't seen half-width kanas ever used in vertical context. Does this exist?

> So IMHO, those half-width letters are in fact to be considered as another separate script, for typographic purposes.

Yes, for typographic purposes. But typographic issues are not covered by Unicode. AFAIK, the existence of half-width kanas in Unicode is purely for backwards and round-trip compatibility.

Werner
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
> However, the most important property is to be able to start a new line after (almost) any half-width kana.

Bad formulation, sorry. I mean:

However, the most important property is to be able to break a line after (almost) any half-width kana.

Werner
Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?
2015-04-28 10:09 GMT+02:00 Werner LEMBERG w...@gnu.org:

> Yes, for typographic purposes. But typographic issues are not covered by Unicode. AFAIK, the existence of half-width kanas in Unicode is purely for backwards and round-trip compatibility.

Yes, compatibility with typographic conventions. And yes, I have seen half-width text rendered vertically (always rotated: so far I have not seen them aligned like in crosswords...).