Peter responded to Mark:

> On 05/08/2003 14:40, Mark Davis wrote:
>
> >Where did you get the notion that space is not a base character? And
> >base characters include those that are not control or format
> >characters. Space is neither one.
> >
> >The standard specifically states in a number of places that to exhibit
> >a combining mark in isolation you use a space (or NBSP).
> >
> >Mark
> >__________________________________
> >http://www.macchiato.com
> >► “Eppur si muove” ◄
> >
>
> I got this from the Unicode Standard 4.0, as quoted by Jim Allan:

*Mis*quoted by Jim Allan.

>
> > In http://www.unicode.org/book/preview/ch03.pdf the space characters
> > in general are given class Zs:
> >
> > << Zs, Zl, and Zp are considered format characters, but their
> > membership in the Z (separator) class takes precedence over their
> > membership in the Cf class, because the General Category assigns only
> > a single value to each character. >>

That piece of text is *NOT* a quotation from Chapter 3 of Unicode 4.0.
Go to that URL and search for it yourself. It is quoted from Chapter 4
of Unicode *3.0*, p. 88, in the discussion of General Category in
Section 4.5, "General Category -- Normative in Part".

The corresponding paragraph has been deleted from the relevant section
in Unicode 4.0, precisely because the standard now precisely defines
format control characters as {Cf, Zl, Zp} but *ex*cluding Zs. See p. 25
in:

http://www.unicode.org/book/preview/ch02.pdf

> >
> > So the various space characters (class Zs) are also classified as
> > format characters.
> >
> > From http://www.unicode.org/book/ch04.pdf:
> >
> > << _D13 Base character:_ a character that does not graphically
> > combine with preceding character, and that is neither control nor a
> > format character. >>
> >
> > Accordingly, by definition, spaces are not base characters.

This conclusion is false. As Mark indicated, SPACE (and NBSP) are
base characters, and have been treated as such in terms of diacritic
application since Unicode 1.0 was published:

"By convention, diacritical marks used by the Unicode encoding scheme
may be exhibited in (apparent) isolation by applying them to U+0020
SPACE or to U+00A0 NON-BREAKING SPACE. This might be done, for example,
when talking about the diacritical mark itself as a mark, rather than
using it in its normal way in text."
 -- Unicode 1.0, p. 19 [1991]

And that *is* an accurate quote from the standard.

In Unicode 4.0 that text survives as:

"By convention, diacritical marks used by the Unicode Standard may be
exhibited in (apparent) isolation by applying them to U+0020 SPACE or
to U+00A0 NON-BREAKING SPACE. This tactic might be employed, for
example, when talking about the diacritical mark itself as a mark,
rather than using it in its normal way in text."
 -- Unicode 4.0, p. 46 [2003]

I'd say the intent of the UTC and the Unicode Standard in this regard
has always been rather clear and has stayed unchanged for quite some
time.

--Ken
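
For anyone who wants to check the categories discussed in this message, here is a minimal sketch using Python's standard unicodedata module. The helper is_base_character_sketch is hypothetical and only approximates D13: it assumes that "graphically combines" can be stood in for by the mark categories (Mn, Mc, Me), and it takes the format set {Cf, Zl, Zp} from the description of Unicode 4.0 above. The category values reported come from whatever Unicode version the local Python build ships, not from 4.0 itself.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00A0"
COMBINING_ACUTE = "\u0301"

# Format control characters as described for Unicode 4.0 in this message:
# {Cf, Zl, Zp}, with Zs excluded.
FORMAT_CATEGORIES = {"Cf", "Zl", "Zp"}


def is_base_character_sketch(ch: str) -> bool:
    """Hypothetical approximation of D13 using General Category only."""
    category = unicodedata.category(ch)
    if category.startswith("M"):
        # Assumption: mark categories stand in for "graphically combines".
        return False
    if category == "Cc":
        # Control characters are excluded by D13.
        return False
    # Format characters are excluded by D13; Zs is not in this set.
    return category not in FORMAT_CATEGORIES


# SPACE, NBSP, LINE SEPARATOR, PARAGRAPH SEPARATOR, LEFT-TO-RIGHT MARK,
# and COMBINING ACUTE ACCENT.
for ch in (SPACE, NBSP, "\u2028", "\u2029", "\u200E", COMBINING_ACUTE):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"base={is_base_character_sketch(ch)}")

# The convention quoted above: exhibit a diacritic in apparent isolation
# by applying it to SPACE (or NBSP).
isolated_acute = SPACE + COMBINING_ACUTE
```

Under these assumptions SPACE and NBSP (both Zs) come out as base characters, while the Zl, Zp, Cf, and Mn examples do not, which matches the reading of the standard given above.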