Hello Unicoders,

I believe this is the correct list for discussing UAX#14; please correct me if 
I'm wrong.

In short, as the subject says, I would like to ask that the Line Break property 
of U+3000 IDEOGRAPHIC SPACE be reconsidered, so that it does not allow a break 
before it.

I wasn't part of the original discussions for UAX#14, so I may be repeating 
points that have already been covered. I apologize if that's the case, but I 
hope I can provide some new information here.

How to handle U+3000 IDEOGRAPHIC SPACE in line breaking is a little 
controversial in East Asia, and not all applications handle it the same way 
today. The primary reason, I believe, is that the best method varies by 
requirements.

After several investigations and discussions, I think many of the people I 
talked to reached a consensus that prohibiting a break before U+3000 is the 
best general answer. Exactly which class to use is not yet clear to me, and I'd 
appreciate anyone's advice, but I think that discussion belongs after we have 
agreed to make the change, so I'm leaving it for now.

Here's the background of the proposal and what I discussed with people.

Many people here might already know this, but almost every traditional East 
Asian word processor treated U+3000 as ID, and I'm guessing that is the reason 
why UAX#14 defines it so. Many people, including me, agree that this gives the 
best editing experience for East Asian scripts.

East Asian versions of MS Word took a different approach, though, primarily due 
to Word's re-flow architecture. It might not be well known, and it is already 
history, but up until Word 95, Word changed line breaks slightly when the 
printer was changed, for reasons that were sound at the time, so its documents 
needed to look good even if line breaks changed after the author had sent them 
to someone else.

A problem arose, because people did not want U+3000 appearing at the beginning 
of a line as a result of such re-flow. ID gives the best editing experience, 
but it does not fit such re-flowable documents well, and keeping U+3000 from 
appearing at the beginning of a line matters more than a slightly better 
editing experience. The same issue affects other re-flowable documents today, 
such as HTML or EPUB.

Ambrose told me that the same issue exists in Chinese with what are known as 
honorific spaces[1]. We tried to find examples of line breaking behavior for 
honorific spaces without luck, and then Kenny pointed out that authors adjust 
their text so that an honorific space does not appear at the beginning of a 
line, and therefore we will not be able to find such examples[2].

I also had discussions with the W3C I18N WG JLTF (who authored JLREQ), 
professional printers, and people working on EPUB in Japan about the ideal 
behavior of U+3000 around line breaks. As I wrote above, the best method 
depends on context, so the discussion was a little long, but we tried to find 
the one algorithm that works best across all cases. Two options were left: one 
is to mimic Word's behavior, and the other is to prohibit a break before 
U+3000. The two methods give almost the same quality of results; in some cases 
one is slightly superior to the other and in other cases the opposite, and all 
agreed that either option is acceptable for all the cases we investigated. 
Word's behavior, however, requires slightly more logic and does not support the 
honorific space scenario well.

Given this result, and given the honorific space situation (thanks to Ambrose 
and Kenny), my conclusion is that prohibiting a break before U+3000 is the best 
option for everyone. It may be appropriate to allow tailoring to ID where the 
editing experience is more important and the document is known never to 
re-flow, but the behavior I propose here is more generic.
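
To make the difference concrete, here is a minimal sketch in Python. It is not 
the UAX#14 algorithm: it assumes every character behaves like an ideograph 
(a break is allowed on both sides), and the names break_opportunities and 
forbid_break_before_u3000 are made up for this illustration only.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def break_opportunities(text, forbid_break_before_u3000=True):
    """Return each index i where a break is allowed between text[i-1] and text[i].

    Simplifying assumption: all characters act like ideographs, so a break
    is allowed between any two of them. The only exception is the proposed
    rule, which removes the opportunity directly before U+3000.
    """
    positions = []
    for i in range(1, len(text)):
        if forbid_break_before_u3000 and text[i] == IDEOGRAPHIC_SPACE:
            # Proposed behavior: U+3000 must not start a line.
            continue
        positions.append(i)
    return positions


sample = "日本" + IDEOGRAPHIC_SPACE + "語"
as_id = break_opportunities(sample, forbid_break_before_u3000=False)
proposed = break_opportunities(sample)
```

With the flag off, U+3000 is handled like ID and can end up at the start of a 
line after re-flow; with the flag on, the only opportunity that disappears is 
the one directly before U+3000, and a break after it is still allowed.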

Allow me to end my long e-mail with a couple of notes about the situation of 
browsers and the CSS WG. Actual browser implementations vary today. IE 
implements behavior similar to Word's. Firefox does what I propose here, i.e., 
it prohibits a break before U+3000. WebKit and Opera handle it as ID. So 
browsers are not interoperable today, and I'm hoping to resolve this 
interoperability issue with CSS Text Level 3[3]. CSS Text Level 3 is going to 
define line breaking behavior for CSS, and my current thinking is to define the 
behavior I'm proposing here.

I value UAX#14 very much, and I hope UAX#14 and CSS Text Level 3 can stay in 
sync; that is why I'm asking here for this change to be considered.

Any opinions, thoughts, or discussion are welcome, and I would be grateful for 
your support for this proposal.

[1] http://lists.w3.org/Archives/Public/www-style/2012Apr/0013.html
[2] http://lists.w3.org/Archives/Public/www-style/2012May/0106.html
[3] http://dev.w3.org/csswg/css3-text/

Regards,
Koji


