As Ken says, the Unicode interlinear annotation characters are for internal use only. Specifically, their meanings can differ from program to program. If you have your nice marked-up text in memory and want to export it for use by some other program, you need a higher-level protocol that translates the interlinear annotation characters into a standardized external format, such as HTML. In addition to U+FFF9 through U+FFFB, there are other characters for internal use only, namely U+FDD0 through U+FDEF. The meanings of these characters also can (and do) differ from program to program. Originally it was hoped that the interlinear annotation characters might be able to describe ruby adequately, but it became clear that additional information is necessary to express ruby unambiguously. Hence the UTC restricted them to internal use only, with the associated information presumably stored elsewhere to resolve the ambiguities.
Frankly, IMHO the best thing for a program to do when reading such characters is to delete them. This isn't quite what one might infer from the Standard, since they unfortunately aren't labeled as noncharacters. But if a program uses them internally with a well-defined meaning, getting them in from an external source can violate that internal usage. Actually round-tripping these "rogue" characters would require some extra internal protocol to ignore them once they've been read in. So my edit engine (RichEdit), which uses them as table-row delimiters, simply deletes them on input and exports them only in RichEdit-specific contexts.

Murray

-----Original Message-----
From: Michael Everson [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 13, 2002 7:52 AM
To: [EMAIL PROTECTED]
Cc: Ken Whistler
Subject: Re: Furigana

At 12:11 -0700 2002-08-08, Kenneth Whistler wrote:

>Ah, but read the caveats carefully. The Unicode interlinear annotation
>characters are *not* intended for interchange, unlike the HTML4 <ruby>
>tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor
>points.

What does this mean? That if I have a text all nice and marked up with furigana in Quark I can't export it to Word and reimport it in InDesign and expect my nice marked up text to still be marked up? Surely all Unicode/10646 characters are expected to be preserved in interchange. What have I got wrong, Ken?

--
Michael Everson *** Everson Typography *** http://www.evertype.com

