On Sun, Sep 24, 2006 at 07:38:20PM -0400, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > I don't think that any of our SGML documentation is actually in UCS-4
> > encoding.
>
> The source files use nothing beyond plain ASCII (and should remain that
> way, IMHO) so there isn't any need to inquire very far into exactly what
> the toolchain thinks the "document encoding" is. The issue at hand here
> is what the *output* character set is, which is to say the "document
> character set" if I have the jargon right. That is the space over which
> we are permitted to use &-entities.

What you're talking about is generally referred to as the "character
repertoire": the abstract set of characters a document is considered to
be composed of. For example, HTML4 (and XML, IIRC) explicitly define the
character repertoire to be Unicode, even though the "character encoding"
of a given document may only cover a subset of it. Characters the
encoding cannot represent directly can still be written using the &xxx;
escape syntax.

I'm surprised about the difference in installations. I didn't use your
-c option because that directory does not exist on my computer, but
maybe that's all the difference...

http://www.unicode.org/unicode/reports/tr17/

Have a nice day,
-- 
Martijn van Oosterhout <email@example.com> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
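
The repertoire/encoding distinction described above can be illustrated with a minimal sketch. Python is used here only for illustration (the thread itself is about the SGML toolchain), and the helper name `to_ascii_with_references` and the sample string are hypothetical. The sketch keeps the encoding at plain ASCII while the repertoire stays Unicode, by writing every non-ASCII character as a numeric character reference:

```python
import html


def to_ascii_with_references(text: str) -> str:
    """Encode text as ASCII, replacing each character outside ASCII
    with a numeric character reference of the form &#NNN;."""
    # "xmlcharrefreplace" is a standard error handler for str.encode().
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Sample text containing U+00E9 (e with acute) and U+20AC (euro sign).
original = "caf\u00e9 \u20ac5"

escaped = to_ascii_with_references(original)
print(escaped)  # caf&#233; &#8364;5

# A consumer whose repertoire is Unicode can recover the original text.
restored = html.unescape(escaped)
print(restored == original)  # True
```

The escaped form contains only ASCII bytes, so it stays within the "plain ASCII source" constraint mentioned in the quoted message, yet it still denotes characters from the full Unicode repertoire.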