I was going to make the following point earlier; in light of Phil's conclusion, maybe I should make it now.
There seems to be a tendency not to distinguish between a character in the original sense, i.e. a character of a writing system, and a computer character. The former are visible symbols on a background medium. The latter are an entirely different set of symbols, which to some extent parallel the former and to some extent do not. Spaces, control codes, etc. don't exist in the former. They exist in the latter because they were a convenient way to encode certain functions one wished to apply to the other encoded characters, the ones that correspond more or less to original writing-system characters. These encodings have developed over time and have consequently inherited all sorts of legacy issues, not all of which need supporting. Unicode provides tools; no one says one has to use them all. Specifically, the purpose of XeTeX and other such engines is to allow for the nice typographical formatting of visual representations of script characters against some defined background. From that point of view, once it does that, it has achieved its goal. Transparency of all sorts of other things, such as providing input via PDF to other software, isn't and shouldn't be a *primary* goal. That being said, no doubt it might be helpful to some to have this or that control character passed along. But that's not the essence of the exercise, and it should only be done if it can be done cheaply, i.e. without much risk to the primary objective. I guess the real question is that latter part.

K

>>> On Tue, Nov 15, 2011 at 4:45 PM, in message <[email protected]>,
Philip TAYLOR <[email protected]> wrote:
>
> Ross Moore wrote:
>>
>> On 16/11/2011, at 5:56 AM, Herbert Schulz wrote:
>>
>>> Given that TeX (and XeTeX too) deal with a non-breakable space already (where
>>> we usually use the ~ to represent that space) it seems to me that XeTeX
>>> should treat that the same way.
>>
>> No, I disagree completely.
>>
>> What if you really want the U+00A0 character to be in the PDF?
>> That is, when you copy/paste from the PDF, you want that character
>> to come along for the ride.
>
> I'm not sure I entirely go along with this argument, Ross.
> "What if you really want the \ character to be in the PDF",
> or the "^" character, or the "$" character, or any character
> that TeX currently treats specially ? Whilst I can agree
> that there is considerable merit in extending XeTeX such
> that it treats all of these "new", "special" characters
> specially (by creating new catcodes, new node types and so
> on), in the short term I can see no fundamental problem with
> treating U+00A0 in such a way that it behaves indistinguishably
> from the normal expansion of "~".
>>
>> In TeX ~ *simulates* a non-breaking space visually, but there is
>> no actual character inserted.
>
> And I don't agree that a space is a character, non-breaking or not !
>
> ** Phil.
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
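The distinction between visible writing-system characters and computer characters can be made concrete with Python's standard `unicodedata` module. This is only an illustrative sketch: the helpers `describe` and `is_visible_symbol` are hypothetical names, and treating the Unicode general categories L, M, N, P and S as "visible" is a rough assumption, not a Unicode rule. It also shows that NFKC normalization folds U+00A0 into an ordinary U+0020, which is one way a downstream consumer of copied PDF text could lose the distinction Ross wants preserved.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and general category of one character."""
    # Control codes such as TAB have no character name, so supply a default.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {name} [{unicodedata.category(ch)}]"


def is_visible_symbol(ch: str) -> bool:
    """Rough heuristic (an assumption): letters, marks, numbers, punctuation
    and symbols put ink on the page; separators (Z*) and control/format
    codes (C*) do not."""
    return unicodedata.category(ch)[0] in "LMNPS"


# A letter, TeX's tie character, two kinds of space, a control code,
# and an invisible format character.
samples = ["A", "~", " ", "\u00a0", "\t", "\u200b"]
for ch in samples:
    kind = "visible" if is_visible_symbol(ch) else "not a visible glyph"
    print(describe(ch), "->", kind)

# NFKC maps NO-BREAK SPACE to a plain SPACE, erasing the distinction.
print(unicodedata.normalize("NFKC", "\u00a0") == " ")
```

Both U+0020 and U+00A0 come out in category Zs (space separator): they are computer characters that carry a function (breaking vs. not breaking) rather than a visible shape.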
