Re: BOM as WJ?

2003-11-21 Thread Asmus Freytag
At 05:52 AM 11/20/2003, Philippe Verdy wrote: We need a comprehensive new technical report that lists all the exceptions to the general category system, as these line-breaking or word-breaking or grapheme cluster breaking properties are orthogonal to the basic GC system and to the combining class

Re: BOM as WJ?

2003-11-21 Thread Asmus Freytag
At 05:44 AM 11/19/2003, Philippe Verdy wrote: However, a couple of paragraphs up, the definition for No-Break Space says: U+00A0 [No-Break Space] behaves like the following coded character sequence: U+FEFF [Zero Width No-Break Space] + U+0020 [Space] + U+FEFF [Zero Width No-Break Space].

Re: BOM as WJ?

2003-11-20 Thread Peter Kirk
On 19/11/2003 17:44, Philippe Verdy wrote: ... This trick doesn't work if any of the CC's are in combining class zero. Of course, but which combining character of combining class 0 needs to combine with NBSP in a way that affects renderers? Are you thinking of sequences like NBSP,CGJ? Or

Re: BOM as WJ?

2003-11-20 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] As for line breaking (UAX14), WJ explicitly prohibits this; ZWJ and ZWNJ are not listed, and so as Cf characters are ignored in the line breaking algorithm. I note also that the combining mark CGJ is listed as GL and so is not CM. The descriptive text of

BOM as WJ?

2003-11-19 Thread Pim Blokland
In the online 4.0 book, chapter 15 http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf the definition for Word Joiner says: Until Unicode 3.1.1, U+FEFF was the only code point with word joining semantics, but because it is more commonly used as byte order mark, the use of U+2060 [word

Re: BOM as WJ?

2003-11-19 Thread Philippe Verdy
From: Pim Blokland [EMAIL PROTECTED] However, a couple of paragraphs up, the definition for No-Break Space says: U+00A0 [No-Break Space] behaves like the following coded character sequence: U+FEFF [Zero Width No-Break Space] + U+0020 [Space] + U+FEFF [Zero Width No-Break Space]. Is this

Re: BOM as WJ?

2003-11-19 Thread Peter Kirk
On 19/11/2003 01:49, Pim Blokland wrote: In the online 4.0 book, chapter 15 http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf the definition for Word Joiner says: Until Unicode 3.1.1, U+FEFF was the only code point with word joining semantics, but because it is more commonly used as

Re: BOM as WJ?

2003-11-19 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] Does this equivalence hold when combining characters are applied to the NBSP? Is the sequence NBSP, CC (recommended for spacing diacritics, where CC is any sequence of combining characters) equivalent to ZWNBS, SP, ZWNBS, CC? Or should the equivalence be to

Re: BOM as WJ?

2003-11-19 Thread Philippe Verdy
From: Philippe Verdy [EMAIL PROTECTED] So, NBSP,CC must not be treated as if it was: WJ,SP,WJ,CC but really rather as: WJ,SP,CC,WJ Note here the inversion. The inversion here acts as if WJ was a combining character of combining class 256 (i.e. with a class higher than the combining

Re: BOM as WJ?

2003-11-19 Thread Peter Kirk
On 19/11/2003 16:26, Philippe Verdy wrote: From: Philippe Verdy [EMAIL PROTECTED] So, NBSP,CC must not be treated as if it was: WJ,SP,WJ,CC but really rather as: WJ,SP,CC,WJ Note here the inversion. The inversion here acts as if WJ was a combining character of combining class 256

Re: BOM as WJ?

2003-11-19 Thread Philippe Verdy
From: Peter Kirk [EMAIL PROTECTED] Of course this is not a standard normalization form, but using this pseudo combining class may help render the last two coded strings (in my quote above) equivalently in renderers. This works even in the case where there are multiple diacritics (noted CC1
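
The "pseudo combining class" idea discussed in this thread can be illustrated with a minimal Python sketch. It is not a standard normalization form and not code from any of the posters. The names PSEUDO_CLASS, pseudo_class and reorder are hypothetical, and the value 256 is taken from the thread only as "higher than any real combining class". The sketch stably sorts each run of non-starter characters by class, treating WJ (U+2060) as if it had class 256, so that WJ,SP,WJ,CC comes out as WJ,SP,CC,WJ.

```python
import unicodedata

WJ = "\u2060"        # WORD JOINER
PSEUDO_CLASS = 256   # hypothetical class, above every real combining class


def pseudo_class(ch: str) -> int:
    """Real canonical combining class, except WJ is treated as class 256."""
    return PSEUDO_CLASS if ch == WJ else unicodedata.combining(ch)


def reorder(text: str) -> str:
    """Stable-sort each run of non-zero (pseudo) classes, as canonical ordering does.

    Characters of class 0 act as barriers: a pending run is flushed before them.
    """
    out = []
    run = []
    for ch in text:
        if pseudo_class(ch) == 0:
            out.extend(sorted(run, key=pseudo_class))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=pseudo_class))
    return "".join(out)


# WJ, SP, WJ, COMBINING ACUTE ACCENT (class 230)
print([f"U+{ord(c):04X}" for c in reorder("\u2060 \u2060\u0301")])
```

The limitation raised earlier in the thread also shows up here: a combining character of class zero, such as CGJ (U+034F), acts as a barrier, so WJ,SP,WJ,CGJ is left unchanged by this sketch.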