From: "Peter Kirk" <[EMAIL PROTECTED]>
>
> Software developers, or applications, are not supposed to be party to
> the agreement between *users*.

Are you saying that software developers fail to comply with Unicode rules when they refuse to build systems that let *users* make such private agreements and use the PUAs effectively, as users are legitimately entitled to ask of them? Interesting point.

This would be an argument for developing (outside Unicode) some standard technical solutions for exchanging these private conventions on PUA usage, including character properties, etc. Why not within fonts -- namely in OpenType tables for fonts built with these PUA assignments?

If so, a fully Unicode-compliant system should offer ways to interchange data between the parties to these private agreements, and ensure that the PUA encoding conventions are isolated and kept within the domain of the private agreement. This could be done by labelling documents with tags containing a URI, either out of band in rich text formats such as XML or precomposed PDF files, or in band within the encoded text using special tags similar to language tags (though Unicode currently defines no area in plane 14 for any use other than language tags).

I note however that language tags, even if discouraged by Unicode, are not deprecated, and that they could even be used, following the RFC 3066 format or one of its extensions, to carry additional attributes identifying private communities that share a common agreement. So with Unicode language tags containing a standard language code plus attributes, such an extension would become possible, provided Unicode specifies less ambiguously how to handle documents containing language tags (notably their scope of application within the encoded document).

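To make the in-band option concrete, here is a minimal sketch of how a language tag is spelled with the existing plane 14 tag characters: U+E0001 LANGUAGE TAG followed by tag characters at U+E0000 plus the ASCII code. The tag value "x-pua-demo" and the function names are hypothetical, chosen only for illustration; nothing here defines how a private agreement would actually be named.

```python
from typing import Optional, Tuple

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000      # tag character = TAG_BASE + ASCII code (U+E0020..U+E007E)


def encode_language_tag(tag: str) -> str:
    """Spell an ASCII tag with plane 14 tag characters, prefixed by U+E0001."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("tag must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in tag)


def decode_language_tag(text: str) -> Tuple[Optional[str], str]:
    """Return (tag, remaining text); tag is None if text has no leading tag."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        return None, text
    i = 1
    chars = []
    # Consume the run of tag characters that follows U+E0001.
    while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
        chars.append(chr(ord(text[i]) - TAG_BASE))
        i += 1
    return "".join(chars), text[i:]


# Hypothetical private-use tag in front of text that contains a PUA character.
tagged = encode_language_tag("x-pua-demo") + "\uE000abc"
tag, rest = decode_language_tag(tagged)
```

A receiving application that does not know the tag can still skip the tag characters and treat the PUA codepoints as unassigned, which is what keeps the convention inside the private agreement.
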
When a plain-text document is later converted to some rich text format, the language tag could be extracted and put out of band within some XML schema to describe the semantics of the encoded plain-text fragments containing PUAs, within their restricted scope. So instead of identifying PUAs only by their codepoint (which is bound to a unique namespace), they would be identified within a namespace made of the private agreement URI plus the codepoint (quite similar to the concept of namespaces in XML, where all entities are named within a well-defined scope). One way to cope with this would then be to reserve and bind all non-PUA and all invalid codepoints, in all possible namespaces, to the Unicode.org namespace.

There is a way to make those PUAs easily manageable by users:

- let each user have a registry of PUA agreements (identified in interchanges by their URI). If the user accepts an agreement, it is recorded in that user's registry.

- the registry maps each described Unicode PUA codepoint to a non-Unicode codepoint (for example in the larger 31-bit space that was originally defined for ISO 10646). These internal mappings allow local-only management of the encoded strings. For all interchanges, every non-Unicode codepoint (outside the first 17 planes) is looked up in the user database, which remaps this 31-bit codepoint to the URI plus the 21-bit Unicode PUA codepoint, so that a plain-text document can be regenerated using language-tag tagging, or XML attributes, or another rich text format...

- for local document handling, UTF-8 (the original version!) or UTF-32 could be used to easily manage all private character properties, without colliding with PUAs used in other private agreements or with other standard Unicode codepoints.

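The registry described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name PuaRegistry, its methods, and the example.org URIs are all hypothetical, local codepoints are simply handed out sequentially starting at 0x110000, and text is represented as lists of integers because a Python str cannot hold values above U+10FFFF.

```python
from typing import Dict, List, Optional, Tuple

LOCAL_BASE = 0x110000   # first value beyond the 17 Unicode planes
LOCAL_MAX = 0x7FFFFFFF  # top of the original 31-bit ISO 10646 space


def is_pua(cp: int) -> bool:
    """True for the BMP PUA and the private-use codepoints of planes 15 and 16."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


class PuaRegistry:
    """Per-user registry: (agreement URI, PUA codepoint) <-> local codepoint."""

    def __init__(self) -> None:
        self._accepted = set()
        self._to_local: Dict[Tuple[str, int], int] = {}
        self._from_local: Dict[int, Tuple[str, int]] = {}
        self._next = LOCAL_BASE

    def accept(self, uri: str) -> None:
        """Record that the user accepts the agreement identified by uri."""
        self._accepted.add(uri)

    def to_local(self, uri: str, cps: List[int]) -> List[int]:
        """Replace PUA codepoints by local ones; leave standard ones alone."""
        if uri not in self._accepted:
            raise KeyError("agreement not accepted: " + uri)
        out = []
        for cp in cps:
            if not is_pua(cp):
                out.append(cp)
                continue
            key = (uri, cp)
            if key not in self._to_local:
                if self._next > LOCAL_MAX:
                    raise OverflowError("local codepoint space exhausted")
                self._to_local[key] = self._next
                self._from_local[self._next] = key
                self._next += 1
            out.append(self._to_local[key])
        return out

    def to_interchange(self, cps: List[int]) -> List[Tuple[Optional[str], int]]:
        """Map back to (URI, PUA codepoint); standard codepoints get URI None."""
        return [self._from_local.get(cp, (None, cp)) for cp in cps]


A = "https://example.org/agreement-a"  # hypothetical agreement URIs
B = "https://example.org/agreement-b"
reg = PuaRegistry()
reg.accept(A)
reg.accept(B)
local_a = reg.to_local(A, [0x41, 0xE000])
local_b = reg.to_local(B, [0xE000])
```

Here the same PUA codepoint U+E000 under two agreements ends up as two distinct local values, so the two conventions cannot collide in local processing, while `to_interchange` recovers the URI and the original PUA codepoint for tagging on output.
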
Such a solution would have the additional effect of greatly reducing the number of PUAs needed in Unicode, and everyone could use them as they wish, with their own sets of character properties (including overriding the default combining classes and canonical decompositions!). There is no need to split the PUA space, which, with more than 135,000 codepoints, is really large enough to encode any single private agreement.

The difficulty will be in how to describe this agreement within a URI: what should that URI provide? If it is a URL, it could be that of an XML document describing the set of conventions, the property tables, and the suggested or required fonts... The problem is then to create and maintain a schema that allows these conventions to be described. Such a schema should be able to contain at least all the properties that are already described in Unicode, plus some other private data or tables.

The next complexity comes when one wants to extend an agreement to allow migrating data from one private convention to another. This looks exactly like describing a transliteration scheme working within the larger local-only 31-bit space... And it can be as complex as other stateful transliteration schemes, or as simple as mapping legacy 8-bit sets to Unicode (using simple stateless mappings).
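For the simple stateless case, migrating between two conventions amounts to a table lookup per character. The sketch below assumes text is held as (agreement URI, PUA codepoint) pairs; the URIs, the table contents, and the function name migrate are hypothetical, and a stateful scheme would need more than this.

```python
from typing import Dict, List, Tuple

Tagged = Tuple[str, int]  # (agreement URI, PUA codepoint)


def migrate(text: List[Tagged], table: Dict[Tagged, Tagged]) -> List[Tagged]:
    """Stateless migration: each item is mapped independently of its context.

    Items with no entry in the table are passed through unchanged.
    """
    return [table.get(item, item) for item in text]


OLD = "https://example.org/agreement-v1"  # hypothetical agreement URIs
NEW = "https://example.org/agreement-v2"

# Hypothetical table: the new convention moved two characters.
table = {
    (OLD, 0xE000): (NEW, 0xE100),
    (OLD, 0xE001): (NEW, 0xE101),
}

migrated = migrate([(OLD, 0xE000), (OLD, 0xE001), (OLD, 0xE002)], table)
```

The third item has no entry and keeps its old namespace, which makes unmigrated characters easy to spot afterwards.
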