Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Ken Whistler

Taking this thread back to the original question...

The Line_Break property values for halfwidth katakana (lb=AL)
and regular katakana (lb=ID) have been stable since they
were first defined for Unicode 3.0 -- 15 years ago.

Regardless of whether lb=AL is the optimal assignment for
the halfwidth katakana, it seems to me that trying to
*change* that Line_Break assignment, just for halfwidth
katakana, at this late date, would more likely destabilize
existing implementations than help them.

The citations below show *different* behavior between browsers
for linebreaking around halfwidth katakana. That suggests that
Firefox and IE11 have already provided tailoring to better match
expectations. The correct avenue forward, it seems to me, would
be to pursue bugs against browsers that do not show expected
behavior, to see if improvements there are feasible, rather than
to modify the base Line_Break property values that everybody has
to tailor *from*.
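
As a rough illustration of what tailoring *from* the base values could look like, here is a minimal Python sketch. It encodes only the two assignments stated in this thread (lb=AL for halfwidth katakana, lb=ID for regular katakana). The function names, the three-letter sample of regular katakana, and the `tailor_halfwidth` flag are all hypothetical; a real implementation would read LineBreak.txt from the Unicode Character Database rather than hard-code anything.

```python
# Three regular katakana letters (KA, TA, NA) used as a tiny stand-in sample.
REGULAR_KATAKANA_SAMPLE = "\u30AB\u30BF\u30CA"


def is_halfwidth_katakana_letter(ch):
    """True for a single character in U+FF66 or U+FF71..U+FF9D."""
    return ch == "\uFF66" or "\uFF71" <= ch <= "\uFF9D"


def base_class(ch):
    """Base Line_Break value as stated in this thread (not read from the UCD)."""
    if is_halfwidth_katakana_letter(ch):
        return "AL"
    if ch in REGULAR_KATAKANA_SAMPLE:
        return "ID"
    # "XX" is the UAX #14 class for unknown characters; here it only marks
    # characters that this toy table does not cover.
    return "XX"


def tailored_class(ch, tailor_halfwidth=False):
    """Apply a hypothetical tailoring that treats halfwidth katakana as ID."""
    if tailor_halfwidth and is_halfwidth_katakana_letter(ch):
        return "ID"
    return base_class(ch)
```

With the flag off, U+FF76 (halfwidth KA) comes back as AL; with it on, it comes back as ID, while regular katakana stays ID either way. The base table itself is never modified.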

Note that this is not *just* a Japanese problem nor a matter
of not matching JIS X 4051. UAX #14 is *not* a direct implementation
of JIS X 4051 rules, although it is certainly informed by them and
has many Line_Break values introduced to get default behavior closer to
the Japanese rules for linebreaking. And the compatibility halfwidth
characters in the standard also include halfwidth jamo and symbols,
so any changes also would need to be considered in the context
of consistency for those and for *Korean* rules, as well as for Japanese.

--Ken

On 4/27/2015 10:57 PM, Makoto Kato wrote:

Hi, Suzuki-san.  Thank you for your reply.


At present, I have no objection to add halfwidth katakana
to ideographic-class in UAX#14, but I'm unfamiliar with the
(negative) impact caused by the lack of halfwidth katakana
in it. Could you tell me if you know anything?

Since half-width katakana isn't ID, lines are not broken within
it the way they are within full-width katakana.


Firefox and IE11 treat half-width katakana as ID, so its line
breaking is the same as for full-width katakana.
Chrome doesn't treat it as ID, so half-width katakana is not broken
per character.

I have read JIS X 4051, and it does not specify that half-width
katakana and full-width katakana behave differently.



I guess, the inclusion or exclusion in other classes, like,
AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
the appropriate line breaking, but the inclusion or exclusion
in ID-class does not seem to be important. If the inclusion
in ID-class is important, more characters (e.g. Bopomofo)
should be considered for full coverage. What do you think?

My question is why half-width katakana characters aren't in the same
class as full-width katakana characters.  Half-width katakana is
defined as AL in the current spec, so moving it to ID would change the
break rule significantly (from no break to a break before or after).


-- Makoto






Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Philippe Verdy
My feeling is that half-width kanas behave like Latin letters and do not
even have to follow the ideographic composition square to line up with them
(unlike standard kanas). So effectively their line breaking behavior is
very different.

Those half-width letters are in fact similar to linear jamos (not
composed into syllabic squares) in the Korean script, and to Bopomofo
letters. And maybe we could add the CJK key letters (radicals used for
example in IDS) to this list, or Yi radicals.

They are harmonized to be used along with other alphabetic scripts. In fact
they may not even be really half-width but proportional. They are also
used with non-ideographic punctuation (notably the ASCII punctuation) and
standard SPACE (U+0020).

If rendered in vertical lines, they could be either rotated (just like
Latin letters), or not (aligned horizontally like letters in columns of
crosswords, but they may also have proportional height, like in
Latin/Greek/Cyrillic where it is sometimes needed for example with capital
letters with stacked accents, or when using sized spaces).

So IMHO, those half-width letters are in fact to be considered as another
separate script, for typographic purposes. They are unified with
non-halfwidth letters only for collation, with minor differences
(plain-text searching and sorting).


2015-04-28 4:20 GMT+02:00 Makoto Kato m_k...@ga2.so-net.ne.jp:

 Hi.

 http://www.unicode.org/reports/tr14/proposed.html#ID defines Ideographic
 (ID).  Although full-width katakana is included in ID, half-width
 katakana (U+FF66 and U+FF71-U+FF9D) isn't.  Why?

 Also, Conditional Japanese Starter (CJ,
 http://www.unicode.org/reports/tr14/proposed.html#CJ) considers
 half-width variants such as half-width katakana letter small a.


 -- Makoto
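
For readers who want to see exactly which characters the question covers, here is a small Python sketch that enumerates U+FF66 and U+FF71-U+FF9D. Python's standard `unicodedata` module does not expose the Line_Break property, so this only reports the character name, the East_Asian_Width value, and the NFKC form (the regular katakana each one is compatibility-equivalent to). The helper names are made up for this example.

```python
import unicodedata


def halfwidth_katakana_letters():
    """Return U+FF66 and U+FF71..U+FF9D as a list of one-character strings."""
    return [chr(cp) for cp in (0xFF66, *range(0xFF71, 0xFF9E))]


def describe(ch):
    """Return (code point, name, East_Asian_Width, NFKC form) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.normalize("NFKC", ch),
    )


if __name__ == "__main__":
    for ch in halfwidth_katakana_letters():
        print(*describe(ch))
```

Every entry reports East_Asian_Width "H" (halfwidth) and normalizes under NFKC to a regular katakana letter, which is the pairing the question is about.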



Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Philippe Verdy
Note: is it really allowed to break between a Latin letter and a
half-width kana? Such sequences are frequent when there are untranslated
foreign Latin (or maybe Greek/Cyrillic/Hebrew/Arabic) insertions in
Japanese (toponyms, trademarks, people's names...) that are followed by a
semantic kana terminator.
If you allow this break, the terminator will lose its meaning.
There are probably similar exceptions between [ideographs or fullwidth
Latin/Greek/Cyrillic] and [half-width or full-width kana], for those script
boundaries.






Re: [Unicode] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread suzuki toshiya
Dear Philippe,

Philippe Verdy wrote:
 My feeling is that half-width kanas behave like Latin letters and do not
 even have to follow the ideographic composition square to line up with them
 (unlike standard kanas). So effectively their line breaking behavior is
 very different.

Excuse me, do you mean that half-width kana text should
have spaces between words, although full-width
(standard) kana text may not? Could you tell me more
about the community that prefers such a distinction?

I think the orthography proposed for writing Japanese
in kana without kanji uses word-separating spaces, as in
http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:Kana_no_Hikari,_number_1,_page_1.png
but it is not official, and it does not distinguish
full-width kana from half-width kana.

Regards,
mpsuzuki



 


Re: [Unicode] Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Philippe Verdy
I just gave an opinion about what I have seen. I don't know if this is
correct or preferred.
Half-width text is a modern invention that does not obey the traditions
used in CJK composition squares (which should also be rendered vertically
by default; even if this is not the case on the Internet today, it is still
the case for printed texts).
It started being used at the same time that Latin letters started to be
mixed into text, and computers appeared that offered only half-width
character cells in monospaced fonts (to show other ideographs, those old
computers needed to allocate two cells and use separate fonts for the left
side and the right side).

I don't know whether whitespace is preferred or not in halfwidth text; I have
seen both...




Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Dreiheller, Albrecht
No. They are still in use.

One typical usage of half-width kanas is the display of short texts on small 
devices of embedded systems, like status messages of control units,
for example a one-line display, 30 characters wide, monospace,  with 8x10 
pixels per character.
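
To make the space saving on such a display concrete, here is a small Python sketch that counts monospace cells. It assumes that characters with East_Asian_Width W or F take two cells and everything else takes one, which is a simplification (combining marks, for instance, are not handled). The function names and the 30-cell default are illustrative only.

```python
import unicodedata


def cell_width(text):
    """Count display cells, assuming wide/fullwidth = 2 cells, all others = 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def fits(text, cells=30):
    """True if the text fits on a one-line display of the given cell count."""
    return cell_width(text) <= cells


regular = "\u30AB\u30BF\u30AB\u30CA"    # "katakana" in regular katakana
halfwidth = "\uFF76\uFF80\uFF76\uFF85"  # the same word in halfwidth katakana
```

Under these assumptions the regular form needs 8 cells and the halfwidth form 4, so a 30-cell line holds 30 halfwidth letters but only 15 regular ones.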

Albrecht

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Werner LEMBERG
Sent: Tuesday, 28 April 2015 10:09
To: verd...@wanadoo.fr
Cc: m_k...@ga2.so-net.ne.jp; unicode@unicode.org
Subject: Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

(...)

AFAIK, the existence of half-width kanas in Unicode is
purely for backwards and round-trip compatibility.




Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Makoto Kato
Hi, Suzuki-san.  Thank you for your reply.

 At present, I have no objection to add halfwidth katakana
 to ideographic-class in UAX#14, but I'm unfamiliar with the
 (negative) impact caused by the lack of halfwidth katakana
 in it. Could you tell me if you know anything?

Since half-width katakana isn't ID, lines are not broken within
it the way they are within full-width katakana.

Here is a sample of line breaking for half-width katakana (a good
sample from a web browser implementation):
http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

Firefox and IE11 treat half-width katakana as ID, so its line
breaking is the same as for full-width katakana.
Chrome doesn't treat it as ID, so half-width katakana is not broken
per character.

I have read JIS X 4051, and it does not specify that half-width
katakana and full-width katakana behave differently.


 I guess, the inclusion or exclusion in other classes, like,
 AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
 the appropriate line breaking, but the inclusion or exclusion
 in ID-class does not seem to be important. If the inclusion
 in ID-class is important, more characters (e.g. Bopomofo)
 should be considered for full coverage. What do you think?

My question is why half-width katakana characters aren't in the same
class as full-width katakana characters.  Half-width katakana is
defined as AL in the current spec, so moving it to ID would change the
break rule significantly (from no break to a break before or after).
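
The size of that change can be shown with a deliberately tiny Python sketch. It models only two facts from UAX #14: no break is allowed between two AL characters, and a break is otherwise allowed before or after ID. Every other rule of the algorithm (spaces, punctuation, numeric contexts, small kana, and so on) is ignored, and the function name is invented for this example.

```python
def break_positions(classes):
    """Return each index i where a break is allowed between item i-1 and item i.

    Toy model: AL followed by AL is kept together; any pair involving ID
    may be broken. Nothing else from UAX #14 is modelled.
    """
    positions = []
    for i in range(1, len(classes)):
        if classes[i - 1] == "AL" and classes[i] == "AL":
            continue  # alphabetic run: no break opportunity
        positions.append(i)
    return positions


word = "\uFF76\uFF80\uFF76\uFF85"  # "katakana" written in halfwidth katakana

as_al = break_positions(["AL"] * len(word))  # current assignment
as_id = break_positions(["ID"] * len(word))  # proposed assignment
```

Classified as AL, the four-letter word offers no break opportunity at all; classified as ID, it can break between every pair of letters.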


-- Makoto

On Tue, Apr 28, 2015 at 12:14 PM, suzuki toshiya
mpsuz...@hiroshima-u.ac.jp wrote:
 Kato-san,

 At present, I have no objection to add halfwidth katakana
 to ideographic-class in UAX#14, but I'm unfamiliar with the
 (negative) impact caused by the lack of halfwidth katakana
 in it. Could you tell me if you know anything?

 I guess, the inclusion or exclusion in other classes, like,
 AI, AL, CJ, JL, JV, JT, SA might be quite important to realize
 the appropriate line breaking, but the inclusion or exclusion
 in ID-class does not seem to be important. If the inclusion
 in ID-class is important, more characters (e.g. Bopomofo)
 should be considered for full coverage. What do you think?

 Regards,
 mpsuzuki

 Makoto Kato wrote:
 Hi.

 http://www.unicode.org/reports/tr14/proposed.html#ID defines Ideographic
 (ID).  Although full-width katakana is included in ID, half-width
 katakana (U+FF66 and U+FF71-U+FF9D) isn't.  Why?

 Also, Conditional Japanese Starter (CJ,
 http://www.unicode.org/reports/tr14/proposed.html#CJ) considers
 half-width variants such as half-width katakana letter small a.


 -- Makoto


Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread suzuki toshiya
# Sorry, I did not consider the size of the
# large picture attachment. I reduced the
# image size and am resending to the Unicode
# mailing list.

Kato-san,

Thank you very much for prompt response.

 Here is a sample of line breaking for half-width katakana (a good
 sample from a web browser implementation):
 http://mxr.mozilla.org/mozilla-central/source/layout/reftests/line-breaking/ja-3.html

I wish the sample text were longer, to show
the line breaking behaviour. I attached
jugem.txt and screenshots from Firefox and
Chromium.

 Firefox and IE11 treat half-width katakana as ID, so its line
 breaking is the same as for full-width katakana.
 Chrome doesn't treat it as ID, so half-width katakana is not broken
 per character.

Oh, Google Chrome cannot break half-width katakana
text per character! It is a very good
example showing that the lack of an explicit definition
causes incompatibility (and inconvenience). I'm
sorry for troubling you for the explanation. I agree
with your proposal to add halfwidth katakana to the ID class,
even if further discussion is needed for other scripts.

Regards,
mpsuzuki



ジュゲムジュゲムゴコウノスリキレカイジャリスイギョノスイギョウマツウンライマツフウライマツクウネルトコロニスムトコロヤブラコウジノブラコウジパイポパイポパイポノシューリンガンシューリンガンノグーリンダイグーリンダイノポンポコピーノポンポコナーノチョウキュウメイノチョウスケ

Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Werner LEMBERG

 My feeling is that half-width kanas behave like Latin letters and
 do not even have to follow the ideographic composition square to
 line up with them (unlike standard kanas).

It's exactly the half of the ideographic square.

 So effectively their line breaking behavior is very different.

Maybe.  However, the most important property is to be able to start a
new line after (almost) any half-width kana.

 They are harmonized to be used along with other alphabetic
 scripts. In fact they may even not be really half-width but
 proportional.

Do you have an example for that?  I've *exclusively* seen fonts where
half-width kanas are really half the CJK width.

 If rendered in vertical lines, they could be either rotated (just
 like Latin letters),

Actually, I haven't seen half-width kanas ever used in vertical
context.  Does this exist?

 So IMHO, those half-width letters are in fact to be considered as
 another separate script, for typographic purpose.

Yes, for typographic purposes.  But typographic issues are not covered
by Unicode.  AFAIK, the existence of half-width kanas in Unicode is
purely for backwards and round-trip compatibility.


Werner


Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Werner LEMBERG

 However, the most important property is to be able to start a new
 line after (almost) any half-width kana.

Bad formulation, sorry.  I mean:

  However, the most important property is to be able to break a line
  after (almost) any half-width kana.


Werner


Re: Why doesn't Ideographic (ID) in UAX#14 have half-width katakana?

2015-04-28 Thread Philippe Verdy
2015-04-28 10:09 GMT+02:00 Werner LEMBERG w...@gnu.org:

 Yes, for typographic purposes.  But typographic issues are not covered
 by Unicode.  AFAIK, the existence of half-width kanas in Unicode is
 purely for backwards and round-trip compatibility.

Yes, compatibility with typographic conventions.

And yes, I have seen half-width text rendered vertically (always rotated;
so far I have not seen it aligned as in crosswords...).