On 23 Apr 2014, at 19:18, Mathias Bynens <[email protected]> wrote:

> http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax defines
> ID_Start as:
>
>> Characters having the Unicode General_Category of uppercase letters (Lu),
>> lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other
>> letters (Lo), letter numbers (Nl), minus Pattern_Syntax and
>> Pattern_White_Space code points, plus stability extensions. Note that “other
>> letters” includes ideographs.
>
> What are the “stability extensions” this document refers to?
>
> I noticed that parsing `DerivedCoreProperties.txt` for `ID_Start` leads to
> slightly different results, than parsing `UnicodeData.txt` for category names
> and then adding the categories together, minus `Pattern_Syntax` and
> `Pattern_White_Space` which you can get by parsing `PropList.txt`.
>
> For example, U+2118 SCRIPT CAPITAL P is included in `ID_Start` as per
> `DerivedCoreProperties.txt`, but it doesn’t match any of the above
> categories. Is this an example of such a “stability extension”, or was this
> an oversight?
Here are the code points that match the respective property according to `DerivedCoreProperties.txt`, yet don’t match these properties if you’re adding/removing the categories manually based on the property definition in TR31.

`ID_Start`:

* U+2118
* U+212E
* U+309B
* U+309C

`ID_Continue`:

* U+00B7
* U+0387
* U+1369
* U+1370
* U+1371
* U+19DA

Why these differences?
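For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to do it in Python. It is not necessarily the exact script behind the lists above. The function names (`parse_property_file`, `parse_categories`, `unexplained`) are hypothetical, and it assumes the contents of `UnicodeData.txt`, `PropList.txt`, and `DerivedCoreProperties.txt` come from the same Unicode version and are passed in as strings.

```python
# Categories named in TR31's definitions of ID_Start and ID_Continue.
ID_START_CATEGORIES = {'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {'Mn', 'Mc', 'Nd', 'Pc'}


def parse_property_file(text, prop):
    """Code points listed under `prop` in PropList.txt-style text."""
    result = set()
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(';')]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # The first field is either "XXXX" or a range "XXXX..YYYY".
        first, _, last = fields[0].partition('..')
        start = int(first, 16)
        end = int(last, 16) if last else start
        result.update(range(start, end + 1))
    return result


def parse_categories(text, categories):
    """Code points in UnicodeData.txt-style text whose General_Category
    (third field) is in `categories`."""
    result = set()
    range_start = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(';')
        cp, name, gc = int(fields[0], 16), fields[1], fields[2]
        # Large blocks appear as a "<..., First>" / "<..., Last>" line pair.
        if name.endswith(', First>'):
            range_start = cp
            continue
        if name.endswith(', Last>') and range_start is not None:
            start, range_start = range_start, None
        else:
            start = cp
        if gc in categories:
            result.update(range(start, cp + 1))
    return result


def unexplained(prop, categories, unicode_data, prop_list, derived_core):
    """Code points that have `prop` according to the derived file but are
    not produced by combining the categories by hand."""
    by_hand = parse_categories(unicode_data, categories)
    by_hand -= parse_property_file(prop_list, 'Pattern_Syntax')
    by_hand -= parse_property_file(prop_list, 'Pattern_White_Space')
    listed = parse_property_file(derived_core, prop)
    return sorted(listed - by_hand)
```

With the three files read from a local copy of the UCD, something like `unexplained('ID_Start', ID_START_CATEGORIES, unicode_data, prop_list, derived_core)` should return the unexplained code points as integers, and passing `'ID_Continue'` with `ID_CONTINUE_CATEGORIES` does the same for the second list.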

