Re: Questions about UAX #29

Mark Davis ☕ Wed, 06 Jul 2011 13:39:42 -0700

I wouldn't be averse to adding [:cn:][:cs:][:co:] to [:gcb:control:]. It
would make it align more with the current definition of Grapheme_Base.
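
For concreteness, here is a rough sketch of which code points that change
would touch. It only checks the General_Category via Python's standard
unicodedata module; the set name and the function would_become_control are
made up for illustration, and the result depends on the Unicode version your
Python ships with. It is not the UAX #29 algorithm itself.

```python
import unicodedata

# General categories that the suggestion in this thread would fold into
# Grapheme_Cluster_Break=Control. Assumption: this mirrors the proposal
# only, not the rules as currently published.
PROPOSED_CONTROL_CATEGORIES = {"Cn", "Cs", "Co"}


def would_become_control(cp: int) -> bool:
    """True if the code point is unassigned, a surrogate, or private use."""
    return unicodedata.category(chr(cp)) in PROPOSED_CONTROL_CATEGORIES


if __name__ == "__main__":
    # An isolated surrogate, a private-use code point, and an ordinary letter.
    for cp in (0xD800, 0xE000, 0x0041):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)), would_become_control(cp))
```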


As to how to handle private use characters, UAX #29 already allows
overriding:

"This specification defines *default* mechanisms; more sophisticated
implementations can *and should* tailor them for particular locales or
environments."
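
As a sketch of what such tailoring could look like for private use
characters: the application supplies its own break class for the private-use
code points it knows about and falls back to the default for everything else.
The names tailored_gcb, default_gcb, and private_use_overrides are
hypothetical, and the default lookup is passed in as a parameter rather than
implemented here.

```python
import unicodedata
from typing import Callable, Mapping


def tailored_gcb(
    cp: int,
    default_gcb: Callable[[int], str],
    private_use_overrides: Mapping[int, str],
) -> str:
    """Return the application's override for a private-use code point,
    otherwise whatever the default lookup reports."""
    # Only private-use (Co) code points are eligible for an override here;
    # everything else keeps its default break class.
    if unicodedata.category(chr(cp)) == "Co" and cp in private_use_overrides:
        return private_use_overrides[cp]
    return default_gcb(cp)
```

For example, an application that uses U+E000 as a combining mark could pass
{0xE000: "Extend"} as private_use_overrides and leave every other code point
on the default path.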

I'll file an agenda item for the August UTC meeting to consider this; you
can also add your feedback to the UTC using the reporting form.

Mark
*— Il meglio è l’inimico del bene —*


On Tue, Jul 5, 2011 at 16:31, Karl Williamson <[email protected]> wrote:

> On 07/05/2011 09:29 AM, Mark Davis ☕ wrote:
>
>> Ah, you're right; I wasn't looking carefully enough at what you wrote.
>>
>> Yes, an unassigned code point (Cn) is treated as a base character.
>>
>> Unassigned code points are peculiar beasts, since we don't know really
>> how they should behave until (and if) they are assigned. Their treatment
>> by the Unicode algorithms varies based on some factors:
>>
>>    * safety - don't have them behave in a way that causes problems
>>    * foresight - have them behave like the most likely candidate for
>>      future assignment
>>    * simplicity - since they shouldn't occur normally in text, don't
>>      spend too much time worrying about them.
>>
>> These are not formalized principles, just my observations on how we've
>> operated over the years.
>>
>> Mark
>> /— Il meglio è l’inimico del bene —/
>>
>
> Thanks for the answer.  It does seem weird to me to treat them as base
> characters.
>
> But, I'm wondering then about Cs, isolated surrogates.  They also are
> treated as base characters.  That seems wrong to me.  Since UTS18 is
> starting to mention the possibility of them in regexes, perhaps this should
> be addressed?
>
> Also, my understanding of UAX #44 is that private use code points may or
> may not be treated as base characters at the application's discretion. But
> this isn't mentioned in UAX #29.
>
