[whatwg] Hyphenation

2007-01-08 Thread Øistein E. Andersen
Hyphenation does not seem to have been discussed on this list so far, and I
think it should be.


General discussion:
[1] http://www.w3.org/International/O-HTML-hyphenation.html

Old proposal:
[2] http://www.nada.kth.se/i18n/html/hyph.html

Babel (LaTeX i18n package) documentation:
[3] ftp://tug.ctan.org/pub/tex-archive/macros/latex/required/babel/user.pdf

Unicode Technical Report #14 -- Line Breaking Properties:
[4] http://www.unicode.org/reports/tr14/tr14-6.html


In summary, hyphenation is a hard problem: breaking points cannot in general
be established algorithmically; hyphenation dictionaries are not always
available and typically do not contain long/rare/complex words (the ones that
really need to be hyphenated); furthermore, distinct words may be spelt
identically, but still need to be hyphenated differently; and several
languages require spelling changes when words are hyphenated ([3] mentions
Dutch, German (alte Rechtschreibung, i.e. the pre-reform orthography),
Spanish, Norwegian, Swedish and Hungarian).

The controversy surrounding the meaning of the soft hyphen, SHY (U+00AD), is
probably over, although Opera currently seems not to render this character in
accordance with Unicode (IE7 and Safari seem to do the right thing; Firefox
does not hyphenate at all).

[4] contains the following passage:
> SHY is rendered invisibly and has no width, except at a line break. The
> rendering of the soft hyphen depends on the script. For the Latin script
> it is rendered as a hyphen, however, some languages require a change
> in spelling surrounding an optional hyphen, if it occurs at a line break.
> For example in Swedish the word “tuggummi” changes to “tugg-gummi”
> when hyphenated.

It is not clear to me how this last point is supposed to be implemented in
practice, however. (It is certainly  n o t  the case that `gg' should be
hyphenated `gg-g' in  a l l  Swedish words.)
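
To make the difficulty concrete, here is a minimal Python sketch of plain SHY
rendering along the lines of the passage quoted from [4]. The names render_shy
and break_at are illustrative assumptions, not taken from any specification or
browser. With nothing but a SHY in the word, a renderer can add a hyphen at the
break, but it has no information from which to restore the dropped `g'.

```python
from typing import List, Optional

SHY = "\u00ad"  # SOFT HYPHEN


def render_shy(word: str, break_at: Optional[int] = None) -> List[str]:
    """Render a word containing soft hyphens (hypothetical helper).

    break_at is the 0-based index of the soft hyphen at which the line
    breaks, or None if the whole word fits on one line.
    """
    parts = word.split(SHY)
    if break_at is None:
        # No break taken: every SHY is invisible and has no width.
        return ["".join(parts)]
    if not 0 <= break_at < len(parts) - 1:
        raise ValueError("no soft hyphen at that position")
    # Break taken: a visible hyphen ends the first line, nothing else changes.
    first = "".join(parts[: break_at + 1]) + "-"
    second = "".join(parts[break_at + 1:])
    return [first, second]


word = "tug" + SHY + "gummi"
print(render_shy(word))     # ['tuggummi']
print(render_shy(word, 0))  # ['tug-', 'gummi']
```

The broken form comes out as "tug-" / "gummi", not the "tugg-" / "gummi" that
the quoted passage calls for.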

The proposal [2] suggests the addition of a new element, modelled after
TeX's \discretionary command (with a possibly superfluous addition), that
lets authors specify which characters to render before and after a line break
if the word is broken.
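
The idea behind a discretionary break can be sketched in a few lines of
Python. This only illustrates the TeX model (\discretionary takes pre-break,
post-break and no-break text, in that order); it is not the markup proposed in
[2], and the names Discretionary and render are assumptions made for this
example.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Discretionary:
    pre: str      # text that ends the first line if the break is taken
    post: str     # text that starts the next line if the break is taken
    nobreak: str  # text used when the word is not broken at this point


Piece = Union[str, Discretionary]


def render(pieces: List[Piece], break_at: Optional[int] = None) -> List[str]:
    """Render a word; break_at is the index in pieces of the break taken."""
    if break_at is not None and not isinstance(pieces[break_at], Discretionary):
        raise ValueError("break_at must point at a Discretionary")
    lines = [""]
    for i, piece in enumerate(pieces):
        if isinstance(piece, str):
            lines[-1] += piece
        elif i == break_at:
            lines[-1] += piece.pre
            lines.append(piece.post)
        else:
            lines[-1] += piece.nobreak
    return lines


# The Swedish example from [4]: "gg" unbroken, "gg-" / "g" when broken.
tuggummi = ["tu", Discretionary(pre="gg-", post="g", nobreak="gg"), "ummi"]
print(render(tuggummi))     # ['tuggummi']
print(render(tuggummi, 1))  # ['tugg-', 'gummi']
```

Because the author supplies all three strings, the renderer needs no
language-specific knowledge, which is the attraction of explicit markup here.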

Currently, hyphenation and justification are scarce on the Web, and the average
blogger hardly misses these features. If, however, writing books in HTML
(as mentioned on this list) is to become commonplace, these issues must be
dealt with somehow, and explicit markup seems to be unavoidable at least in some
cases.


I hope this can lead to a fruitful discussion.

-- 
Øistein E. Andersen


Re: [whatwg] Ruby markup - Furigana

2007-01-08 Thread fantasai

Henri Sivonen wrote:
> For now, to provoke comments supporting or refuting my totally
> uninformed hypothesis, I am assuming that example of the reading not
> really being the reading (は vs. わ) as explained at
> http://www.w3.org/TR/ruby/#non-visual is not a real problem (see below).
> Or at least that leaving the problem unsolved and the default aural
> rendering slightly wrong in some cases is better than saying that ruby
> cannot have a reasonable default aural rendering at all.


Yeah, I'd agree that that's not a real problem. All that paragraph
is saying is that you can't just sound out ruby annotations, just
like you can't just sound out base text, because kana and bopomofo
are not a linguist's phonetic annotation system, they're writing
systems and all that implies. :)

~fantasai


Re: [whatwg] Ruby markup - Furigana Re: Presentational safety valves

2007-01-08 Thread fantasai

Henri Sivonen wrote:
> On Jan 4, 2007, at 12:05, Karl Dubost wrote:
>> On 4 Jan 2007, at 18:41, Henri Sivonen wrote:
>>> It doesn't matter much. It is rather clear that the ruby markup is
>>> intended for a particular Chinese and Japanese typographical device.
>>> You'd use the markup whenever you want to use that typographical
>>> device. Bothering authors with what they profoundly mean when they
>>> use the typographical device isn't particularly helpful.
>>
>> Furigana is an annotation system.
>> And essential for learning the language at school.
>> Or read the kanjis that are too difficult to be known when browsing.
>
> Right, but my point is that authors will use the ruby markup when they
> want the furigana typographic effect. It isn't helpful to insist on a
> particular semantic scope like, for example, requiring the ruby base to
> be considered "difficult kanji".


Right. I have even seen cases where ruby is used to annotate English words
(base) with Japanese Kanji (ruby):

  http://fantasai.inkedblade.net/style/discuss/directions/scans/genji2

Ruby is a nifty annotation system if you want to mark up words in parallel,
as for pronunciation, or word-by-word translation, or grammatical labelling,
etc. The key difference from other annotation systems is that it can be
word-for-word without being awkward. (Imagine doing this with footnotes.)
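
As a rough illustration of such word-for-word annotation, here is a small
Python sketch that emits simple ruby markup (ruby, rb, rt, with rp fallback
parentheses) as described in the W3C Ruby Annotation document linked earlier
in this thread. The helper names ruby and annotate are made up for this
example.

```python
from html import escape
from typing import Iterable, Tuple


def ruby(base: str, annotation: str) -> str:
    """Wrap one base/annotation pair in simple ruby markup."""
    return (
        "<ruby><rb>" + escape(base) + "</rb>"
        "<rp>(</rp><rt>" + escape(annotation) + "</rt><rp>)</rp></ruby>"
    )


def annotate(pairs: Iterable[Tuple[str, str]]) -> str:
    """Annotate a run of words pair by pair, keeping them in parallel."""
    return "".join(ruby(base, text) for base, text in pairs)


print(annotate([("漢", "かん"), ("字", "じ")]))
```

The rp elements only matter to user agents without ruby support, which then
show the annotation in parentheses after the base text.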

~fantasai


[whatwg] contenteditable, <em> and <strong>

2007-01-08 Thread Simon Pieters

Hi,

The contenteditable spec says:

  Insert, and wrap text in, semantic elements

 UAs should offer a way for the user to mark text as
 having stress emphasis and as being important, and
 may offer the user the ability to mark text and blocks
 with other semantics.

I think it is no surprise that most UAs will implement this as emitting <em>
for CTRL+I and <strong> for CTRL+B, or similar interfaces that imply that the
user actually requested italics or bold with (to the UA) unknown intended
semantics. (IE and Opera emit <em> and <strong>, Safari emits <span
class="Apple-style-span" style="font-weight: bold;">.) I think <i> and <b>
should be emitted instead, and the above text should reflect that.


Regards,
Simon Pieters
