On 23 Apr 2014, at 19:18, Mathias Bynens <[email protected]> wrote:

> http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax defines 
> ID_Start as:
> 
>> Characters having the Unicode General_Category of uppercase letters (Lu), 
>> lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other 
>> letters (Lo), letter numbers (Nl), minus Pattern_Syntax and 
>> Pattern_White_Space code points, plus stability extensions. Note that “other 
>> letters” includes ideographs.
> 
> What are the “stability extensions” this document refers to?
> 
> I noticed that parsing `DerivedCoreProperties.txt` for `ID_Start` leads to 
> slightly different results than parsing `UnicodeData.txt` for category names 
> and then adding the categories together, minus `Pattern_Syntax` and 
> `Pattern_White_Space`, which you can get by parsing `PropList.txt`.
> 
> For example, U+2118 SCRIPT CAPITAL P is included in `ID_Start` as per 
> `DerivedCoreProperties.txt`, but it doesn’t match any of the above 
> categories. Is this an example of such a “stability extension”, or was this 
> an oversight?

Here are the code points that `DerivedCoreProperties.txt` lists for each 
property, but that are missing from the set you get by adding and removing 
the categories manually according to the property definitions in TR31.

`ID_Start`:

* U+2118
* U+212E
* U+309B
* U+309C

`ID_Continue`:

* U+00B7
* U+0387
* U+1369
* U+1370
* U+1371
* U+19DA
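
The mismatch can be reproduced without downloading any data files. The 
following is only a rough sketch: it assumes the General_Category values in 
Python's bundled `unicodedata` module are representative, it takes the 
`ID_Continue` categories (the `ID_Start` ones plus Mn, Mc, Nd, Pc) from TR31, 
and it skips the `Pattern_Syntax` / `Pattern_White_Space` subtraction, since 
that step can only remove code points and so cannot explain extra ones. The 
names `in_categories`, `ID_START_EXTRAS`, and `ID_CONTINUE_EXTRAS` are 
made up for illustration.

```python
import unicodedata

# General_Category values named in the TR31 definition of ID_Start.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# TR31 defines ID_Continue as the above plus nonspacing marks, spacing
# combining marks, decimal numbers, and connector punctuation.
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

# The code points listed above (illustrative names, not from any spec).
ID_START_EXTRAS = [0x2118, 0x212E, 0x309B, 0x309C]
ID_CONTINUE_EXTRAS = [0x00B7, 0x0387, 0x1369, 0x1370, 0x1371, 0x19DA]


def in_categories(code_point, categories):
    """Return True if the code point's General_Category is in `categories`."""
    return unicodedata.category(chr(code_point)) in categories


def report(label, code_points, categories):
    """Print each code point with its category and whether the rule covers it."""
    print(label)
    for cp in code_points:
        category = unicodedata.category(chr(cp))
        covered = in_categories(cp, categories)
        print(f"  U+{cp:04X}  {category}  covered by categories: {covered}")


report("ID_Start", ID_START_EXTRAS, ID_START_CATEGORIES)
report("ID_Continue", ID_CONTINUE_EXTRAS, ID_CONTINUE_CATEGORIES)
```

If the assumptions hold, every line reports that the code point is not 
covered, so the category-based rule alone cannot account for it.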

Why these differences?


_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
