Hello,

I have started working on adding collation support to Babel ( https://github.com/mitsuhiko/babel/issues/154 ). My goal is to support the Unicode Collation Algorithm (UCA) with most of its features (tailoring, normalization, numeric ordering, custom strength levels, backwards, case, alternate and variable), based on the Default Unicode Collation Element Table (DUCET) and the various locale-tailored tables available in the CLDR.
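To make the "strength level" parameter concrete, here is a minimal sketch of how the UCA turns collation elements into a sort key (the sort-key step of UTS #10). The table `TOY_TABLE` and its weights are invented for illustration and are not DUCET values. The input is assumed to be NFD-normalized already, and tailoring, variable weighting, backwards secondaries and the other parameters are left out.

```python
from typing import Dict, List, Tuple

# Collation element = (primary, secondary, tertiary) weights.
CE = Tuple[int, int, int]

# Toy table; weights are INVENTED for illustration and are not DUCET values.
TOY_TABLE: Dict[str, List[CE]] = {
    "a": [(0x1C47, 0x0020, 0x0002)],
    "A": [(0x1C47, 0x0020, 0x0008)],  # same letter, different case: tertiary differs
    "b": [(0x1C60, 0x0020, 0x0002)],
    "\u0301": [(0x0000, 0x0024, 0x0002)],  # combining acute: no primary weight
}


def sort_key(text: str, strength: int = 3) -> Tuple[int, ...]:
    """Form a UCA-style sort key for `text`, compared up to `strength` levels.

    Assumes `text` is already NFD-normalized and every character is in TOY_TABLE.
    For each level, the non-zero weights are appended in order; a 0 separator
    (lower than any real weight) is placed between levels.
    """
    elements = [ce for ch in text for ce in TOY_TABLE[ch]]
    key: List[int] = []
    for level in range(strength):
        if level > 0:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


words = ["b", "A", "a\u0301", "a"]
print(sorted(words, key=sort_key))  # order: a, A, a + U+0301, b
```

At `strength=1` all three "a" variants get the same key, at `strength=2` only the accent separates them, and the default `strength=3` also distinguishes case.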
Ideally, I would like to support a recent version of the UCA. One of my issues is that, in order to conform to the UCA, an implementation must specify the UCA version it conforms to ( http://unicode.org/reports/tr10/#C4 ). The UCA version is kept in sync with the version of the Unicode Standard. The UCA also depends on several Unicode Standard Annexes (UAX #15: Unicode Normalization Forms, UAX #29: Unicode Text Segmentation, UAX #44: Unicode Character Database), each of which should match the version of the Unicode Standard it was published with. The UCA needs the information in those documents, and that information changes from one version to the next.

Python already provides some of the Unicode information needed to implement the UCA in the unicodedata standard library module. Unfortunately, that information is based on Unicode versions that vary widely depending on the Python version. Here are a few examples:

- Python 3.4: unicodedata based on Unicode Character Database (UCD) 6.3
- Python 3.3: unicodedata based on Unicode Character Database (UCD) 6.1
- Python 3.2: unicodedata based on Unicode Character Database (UCD) 6.0
- Python 3.1: unicodedata based on Unicode Character Database (UCD) 5.1
- Python 2.7: unicodedata based on Unicode Character Database (UCD) 5.2
- Python 2.6: unicodedata based on Unicode Character Database (UCD) 5.1

Implementing the UCA using the unicodedata module would mean shipping a different babel release for every supported Python version that has a different UCD version. The alternative would be to bundle the UCD within babel and avoid using the unicodedata module altogether. Both of these solutions seem inelegant to me.

To make things worse, I believe each CLDR release is tied to a specific version of Unicode, or assumes that a specific version of Unicode is available. Right now, babel's master branch uses CLDR 23.1, which was released after Unicode 6.2 but before Unicode 6.3.
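One way to frame the versioning problem is as a runtime check: use the stdlib unicodedata module only when it reports the same UCD version the collation tables were generated for, and fall back to bundled data otherwise. The sketch below assumes the tables target Unicode 6.2.0 (matching the CLDR 23.1 timeline); `BUNDLED_UCD_VERSION`, `PYTHON_UCD` and `pick_ucd_source` are hypothetical names, not babel APIs. Only `unicodedata.unidata_version` is a real stdlib attribute.

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical: UCD version the bundled DUCET/CLDR tables were built for.
# "6.2.0" is an assumption based on CLDR 23.1 following Unicode 6.2.
BUNDLED_UCD_VERSION = "6.2.0"

# UCD versions reported by unicodedata.unidata_version, per the list above.
PYTHON_UCD: Dict[str, str] = {
    "3.4": "6.3.0",
    "3.3": "6.1.0",
    "3.2": "6.0.0",
    "3.1": "5.1.0",
    "2.7": "5.2.0",
    "2.6": "5.1.0",
}


def pick_ucd_source(runtime_version: Optional[str] = None,
                    required_version: str = BUNDLED_UCD_VERSION) -> str:
    """Return "stdlib" if unicodedata matches the tables' UCD version, else "bundled".

    "bundled" stands for a hypothetical copy of the UCD shipped inside babel.
    """
    if runtime_version is None:
        # Real stdlib attribute, e.g. "6.3.0" on Python 3.4.
        runtime_version = unicodedata.unidata_version
    return "stdlib" if runtime_version == required_version else "bundled"


if __name__ == "__main__":
    print("running UCD:", unicodedata.unidata_version, "->", pick_ucd_source())
    matching = [py for py, ucd in PYTHON_UCD.items() if pick_ucd_source(ucd) == "stdlib"]
    print("Python versions usable as-is:", matching)  # -> [] for 6.2.0
```

Under that assumption, none of the Python versions listed reports UCD 6.2, so every one of them would take the bundled path, which is another way of seeing why relying on unicodedata alone is awkward here.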
Do you have any suggestions or tips that would help solve this versioning problem?

Regards,

-- 
You received this message because you are subscribed to the Google Groups "pocoo-libs" group.
Visit this group at http://groups.google.com/group/pocoo-libs.
