From: "Michael Everson" <[EMAIL PROTECTED]>
> Japanese is different; the users all use both scripts all the time.
And there are occurrences in Japanese of Katakana suffixes or particles added to Latin or Han words, notably to personal names and trademarks. I've seen many texts where Han and Katakana are mixed in the same "word", where it would be inappropriate to insert a word break between the runs of Han and the Katakana particles.

My first implementation allowed a line break after each Han character, but I made an exception after users asked me not to break after Han and before Katakana (even though a line break is allowed between two Han characters), or after Latin and before Katakana. So a simple approach that allows line breaks between distinct scripts is inadequate. Am I wrong, or are my users wrong and asking for what is really a presentation preference?

Also, what about line breaking in long runs of Hangul grapheme clusters? (I mean here the true L+V*T* syllables with their diacritics, not the simplified LV and LVT sub-syllables encoded as precomposed Hangul syllables.) It seems that line breaking in Korean obeys semantic constraints more than normative syllable boundaries, and I think that is quite logical when you see that such a presentation is sometimes preferred by Latin readers too.

To make this work appropriately for some long Japanese or Korean sentences, and to match users' expectations, I had to support explicit marks where line breaks should be allowed, using zero-width spaces. This complicates things when the text has not been annotated with them. So I also had to consider ideographic (full-width) punctuation, which is not directly equivalent to its half-width Latin counterpart: characters such as the full-width period, comma or colon already include the space after them, even if the glyph looks a bit larger.
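To make the rules I described concrete, here is a rough Python sketch. It is not my actual implementation and it is not the Unicode line breaking algorithm. The names `script_of` and `break_opportunities` are made up for this illustration, and classifying characters by their Unicode character name is only an approximation of the real Script property. It models just three things: a break is allowed between two Han characters, no break is allowed after Han or Latin when Katakana follows, and a zero-width space always marks an allowed break. Full-width punctuation and Hangul are left out.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, used as an explicit break mark


def script_of(ch):
    """Rough script class derived from the character name (an approximation)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def break_opportunities(text):
    """Return every index i where a line break is allowed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == ZWSP:
            # An explicit mark always allows a break after it.
            breaks.append(i)
            continue
        if script_of(prev) == "Han" and script_of(cur) == "Han":
            # Breaking between two Han characters is allowed.
            breaks.append(i)
        # Han -> Katakana and Latin -> Katakana fall through: no break,
        # which is the exception my users asked for. Other pairs are
        # simply not modeled in this sketch.
    return breaks


print(break_opportunities("東京タワー"))  # [1]
```

In the example only the position between the two Han characters is reported; the Han/Katakana boundary is kept together. Inserting a zero-width space anywhere in the string adds a break opportunity right after it.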

