I suggest you submit this via the Unicode reporting form.

markus
from phone/vacation

On Jun 11, 2012 11:00 AM, "Koji Ishii" <[email protected]> wrote:
> Hello Unicoders,
>
> I suppose this is the correct list to discuss UAX#14; please correct
> me if I'm wrong.
>
> In short, as the subject says, I would like to reconsider the Line
> Break property for U+3000 IDEOGRAPHIC SPACE, so that it does not allow
> a break before it.
>
> I wasn't part of the original discussions for UAX#14, so I may be
> repeating discussions that have already taken place. I apologize if
> that's the case, but I hope I can provide some new information here.
>
> How to handle U+3000 IDEOGRAPHIC SPACE in line breaking is a little
> controversial in East Asia, and not all applications handle it the
> same way today. The primary reason, I believe, is that the best method
> varies by requirements.
>
> After several investigations and discussions, I think many of the
> people I talked to reached a consensus that prohibiting a break before
> it is the best general answer. Exactly which class to use is a bit
> unclear to me. I'd appreciate anyone's advice, but I guess that is a
> discussion for after we agree to make the change, so I'm leaving it
> for now.
>
> Here's the background of the proposal and what I discussed with
> people.
>
> Many people here might already know this, but almost every traditional
> East Asian word processor treated U+3000 as ID, and I'm guessing that
> is why UAX#14 defines it so. Many, including me, agree that this gives
> the best editing experience for East Asian scripts.
>
> East Asian versions of MS Word took a different approach though,
> primarily due to its re-flow architecture. It might not be well known,
> and it is already history, but up until Word 95, Word changed line
> breaks slightly when the printer was changed, for good reasons at the
> time, so its documents needed to look good even if line breaks changed
> after the author had sent them to someone else.
>
> And a problem arose, because people did not want U+3000 appearing at
> the beginning of a line as a result of such re-flow.
> ID gives the best editing experience, but it does not fit such
> re-flowable documents well, and keeping U+3000 from appearing at the
> beginning of a line matters more than a slightly better editing
> experience. The same issue arises today in other re-flowable documents
> such as HTML or EPUB.
>
> Ambrose told me that the same issue exists in Chinese, with what are
> known as honorific spaces[1]. We tried to find examples of line
> breaking behavior for honorific spaces without luck, and then Kenny
> pointed out that authors adjust the text so that the space does not
> appear at the beginning of a line, and therefore we will not be able
> to find such examples[2].
>
> I also had discussions with the W3C I18N WG JLTF (who authored JLREQ),
> professional printers, and people working on EPUB in Japan about the
> ideal behavior of U+3000 around line breaks. As I wrote above, there
> is more than one best method depending on context, so the discussion
> was a little long, but we tried to find the best algorithm that works
> for all cases. Two options were left: one is to mimic Word's behavior,
> and the other is to prohibit a break before U+3000. The two methods
> give almost the same level of results; in some cases one is slightly
> superior to the other and in other cases the opposite, and all agreed
> that either option is acceptable for all the cases we investigated.
> Word's behavior, however, requires slightly more logic, and does not
> support the honorific space scenario well.
>
> Given this result, and given the honorific space situation (thanks to
> Ambrose and Kenny), my conclusion is that prohibiting a break before
> U+3000 is the best option for everyone. It may be appropriate to allow
> tailoring to ID where the editing experience is more important and the
> document is known never to re-flow, but the one I propose here is more
> generic.
>
> Allow me to end my long e-mail with a couple of notes about the
> situation of browsers and the CSS WG. The actual browser
> implementation varies today.
> IE implements behavior similar to Word's. Firefox does as I propose
> here; i.e., it prohibits a break before U+3000. WebKit and Opera
> handle it as ID. So browsers are not interoperable today, and I'm
> hoping to resolve this interoperability issue with CSS Text Level
> 3[3]. CSS Text Level 3 is going to define line breaking behavior for
> CSS, and my current thinking is to define the one I'm proposing here.
>
> I appreciate UAX#14 very much and I hope UAX#14 and CSS Text Level 3
> stay in sync; therefore I'm asking here to consider a change.
>
> Any opinions, thoughts, or discussions are appreciated, and your
> support for this proposal is greatly appreciated in advance.
>
> [1] http://lists.w3.org/Archives/Public/www-style/2012Apr/0013.html
> [2] http://lists.w3.org/Archives/Public/www-style/2012May/0106.html
> [3] http://dev.w3.org/csswg/css3-text/
>
> Regards,
> Koji
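
To make the difference between the ID treatment and the proposed "no break before U+3000" treatment concrete, here is a minimal sketch. It is not the UAX#14 algorithm and not any browser's implementation. It assumes, for illustration only, that the text consists solely of CJK ideographs and U+3000, so that a break is otherwise allowed between any two characters. The function name and its boolean parameter are hypothetical. Word's behavior is not modeled, since the message does not spell it out.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def break_opportunities(text, prohibit_break_before_u3000):
    """Return the indices i where a line break is allowed before text[i].

    Simplifying assumption (not from UAX#14): every character in `text`
    is a CJK ideograph or U+3000, so a break is otherwise allowed
    between any two adjacent characters.
    """
    breaks = []
    for i in range(1, len(text)):
        # Proposed behavior: never let U+3000 start a line.
        if prohibit_break_before_u3000 and text[i] == IDEOGRAPHIC_SPACE:
            continue
        breaks.append(i)
    return breaks


sample = "漢字\u3000仮名"
# Treating U+3000 like ID: a break is allowed on both sides of the space.
print(break_opportunities(sample, False))  # [1, 2, 3, 4]
# Proposed: the break before U+3000 (index 2) disappears, the one after stays.
print(break_opportunities(sample, True))   # [1, 3, 4]
```

Under the second call, a re-flow can still wrap right after the space, but the space itself can no longer land at the beginning of a line, which is the case the message is concerned with.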

