[whatwg] Hyphenation
Hyphenation does not seem to have been discussed on this list so far, and I think it should be.

General discussion:
[1] http://www.w3.org/International/O-HTML-hyphenation.html

Old proposal:
[2] http://www.nada.kth.se/i18n/html/hyph.html

Babel (LaTeX i18n package) documentation:
[3] ftp://tug.ctan.org/pub/tex-archive/macros/latex/required/babel/user.pdf

Unicode Technical Report #14 -- Line Breaking Properties:
[4] http://www.unicode.org/reports/tr14/tr14-6.html

In summary, hyphenation is a hard problem. Breaking points cannot in general be established algorithmically. Hyphenation dictionaries are not always available and typically do not contain long/rare/complex words (the ones that really need to be hyphenated). Furthermore, distinct words may be spelt identically but still need to be hyphenated differently. Finally, several languages require spelling changes when words are hyphenated ([3] mentions Dutch, German (old orthography, alte Rechtschreibung), Spanish, Norwegian, Swedish and Hungarian).

The controversy surrounding the meaning of the soft hyphen (U+00AD) is probably over, although Opera currently seems not to render this character in accordance with Unicode (IE7 and Safari seem to do the right thing; Firefox does not hyphenate at all). [4] contains the following passage:

> SHY is rendered invisibly and has no width, except at a line break. The
> rendering of the soft hyphen depends on the script. For the Latin script
> it is rendered as a hyphen, however, some languages require a change
> in spelling surrounding an optional hyphen, if it occurs at a line break.
> For example in Swedish the word “tuggummi” changes to “tugg-gummi”
> when hyphenated.

It is not clear to me how this last point is supposed to be implemented in practice, however. (It is certainly *not* the case that `gg' should be hyphenated `gg- g' in *all* Swedish words.)
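A minimal Python sketch of the rendering behaviour quoted from [4], assuming a hypothetical representation in which a word is a list of plain text runs and break points. Each break point is modelled on TeX's \discretionary, which takes three pieces of text: what to print before a line break, what to print after it, and what to print if no break occurs. A soft hyphen is then the special case (hyphen, nothing, nothing), while the Swedish “tuggummi” needs a discretionary attached to that particular word rather than a rule for every `gg'. The names Discretionary, render and from_soft_hyphens are illustrative only, and the sketch does not choose where to break; that is left to a line-breaking algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Discretionary:
    # Modelled on TeX's \discretionary{pre-break}{post-break}{no-break}.
    pre: str      # text ending the first line when the word is broken here
    post: str     # text starting the next line when the word is broken here
    nobreak: str  # text used when the word is not broken here


# A soft hyphen (U+00AD): a hyphen at a line break, invisible otherwise.
SOFT_HYPHEN = Discretionary(pre="-", post="", nobreak="")

Part = Union[str, Discretionary]


def from_soft_hyphens(text: str) -> List[Part]:
    """Turn a string containing U+00AD into text runs and break points."""
    parts: List[Part] = []
    for i, chunk in enumerate(text.split("\u00ad")):
        if i > 0:
            parts.append(SOFT_HYPHEN)
        if chunk:
            parts.append(chunk)
    return parts


def render(parts: List[Part], break_at: Optional[int] = None) -> List[str]:
    """Render a word, breaking it at the break_at-th break point (0-based).

    Returns [word] if unbroken, or [end_of_line, start_of_next_line].
    """
    lines = [""]
    index = 0
    for part in parts:
        if isinstance(part, str):
            lines[-1] += part
            continue
        if index == break_at:
            lines[-1] += part.pre
            lines.append(part.post)
        else:
            lines[-1] += part.nobreak
        index += 1
    return lines


if __name__ == "__main__":
    # Swedish: "tuggummi" is hyphenated as "tugg-gummi" (example from [4]).
    tuggummi = ["tu", Discretionary(pre="gg-", post="g", nobreak="gg"), "ummi"]
    print(render(tuggummi))     # ['tuggummi']
    print(render(tuggummi, 0))  # ['tugg-', 'gummi']

    word = from_soft_hyphens("hyphen\u00adation")
    print(render(word))         # ['hyphenation']
    print(render(word, 0))      # ['hyphen-', 'ation']
```

Because the spelling change is attached to a single word rather than derived from a language-wide rule, other Swedish words containing `gg' are unaffected.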
The proposal [2] suggests the addition of a new element, modelled after TeX's \discretionary command (with a possibly superfluous addition), that makes it possible to specify which characters to render before and after a line break if the word is broken.

Currently, hyphenation and justification are scarce on the Web, and the average blogger hardly misses these features. If, however, writing books in HTML (as mentioned on this list) is to become commonplace, these issues must be dealt with somehow, and explicit markup seems to be unavoidable, at least in some cases.

I hope this can lead to a fruitful discussion.

-- 
Øistein E. Andersen
Re: [whatwg] Ruby markup - Furigana
Henri Sivonen wrote:
> For now, to provoke comments supporting or refuting my totally uninformed hypothesis, I am assuming that the example of the reading not really being the reading (は vs. わ) as explained at http://www.w3.org/TR/ruby/#non-visual is not a real problem (see below). Or at least that leaving the problem unsolved and the default aural rendering slightly wrong in some cases is better than saying that ruby cannot have a reasonable default aural rendering at all.

Yeah, I'd agree that that's not a real problem. All that paragraph is saying is that you can't just sound out ruby annotations, just like you can't just sound out base text, because kana and bopomofo are not a linguist's phonetic annotation system; they're writing systems, with all that implies. :)

~fantasai
Re: [whatwg] Ruby markup - Furigana Re: Presentational safety valves
Henri Sivonen wrote:
> On Jan 4, 2007, at 12:05, Karl Dubost wrote:
>> On 4 Jan 2007, at 18:41, Henri Sivonen wrote:
>>> It doesn't matter much. It is rather clear that the ruby markup is intended for a particular Chinese and Japanese typographical device. You'd use the markup whenever you want to use that typographical device. Bothering authors with what they profoundly mean when they use the typographical device isn't particularly helpful.
>>
>> Furigana is an annotation system. It is essential for learning the language at school, or for reading kanji that are too difficult to be known when browsing.
>
> Right, but my point is that authors will use the ruby markup when they want the furigana typographic effect. It isn't helpful to insist on a particular semantic scope, for example by requiring the ruby base to be considered "difficult kanji".

Right. I have even seen cases where ruby is used to annotate English words (base) with Japanese kanji (ruby):
http://fantasai.inkedblade.net/style/discuss/directions/scans/genji2

Ruby is a nifty annotation system if you want to mark up words in parallel, as for pronunciation, word-by-word translation, grammatical labelling, etc. The key difference from other annotation systems is that it can be word-for-word without being awkward. (Imagine doing this with footnotes.)

~fantasai
[whatwg] contenteditable, <em> and <strong>
Hi,

The contenteditable spec says:

> Insert, and wrap text in, semantic elements
>
> UAs should offer a way for the user to mark text as having stress emphasis and as being important, and may offer the user the ability to mark text and blocks with other semantics.

I think it is no surprise that most UAs will implement this as emitting <em> for CTRL+I and <strong> for CTRL+B, or similar interfaces that imply that the user actually requested italics or bold with (to the UA) unknown intended semantics. (IE and Opera emit <em> and <strong>; Safari emits <span class="Apple-style-span" style="font-weight: bold;">.)

I think <i> and <b> should be emitted instead, and the above text should reflect that.

Regards,
Simon Pieters