Babel: Collation support and unicode version

Rémy Roy Sat, 04 Apr 2015 02:20:12 -0700

Hello,

I have started working on adding collation support to Babel ( 
https://github.com/mitsuhiko/babel/issues/154 ). My goal is to support 
the Unicode Collation Algorithm (UCA) with most of its features (tailoring, 
normalization, numeric, custom strength level, backwards, case, alternate 
and variable), based on the Default Unicode Collation Element Table (DUCET) 
and the various locale-tailored tables available in the CLDR.


Ideally, I would like to support a recent version of the UCA. One issue is 
that, in order to conform to the UCA, an implementation must specify the 
UCA version it conforms to ( http://unicode.org/reports/tr10/#C4 ).

The UCA version is in sync with the version of the Unicode Standard. The 
UCA depends on various Unicode Standard Annexes (UAX #15: Unicode 
Normalization Forms, UAX #29: Unicode Text Segmentation, UAX #44: Unicode 
Character Database), each of which should be the version published with 
the same release of the Unicode Standard. The UCA needs information 
contained in those documents, and that information changes from one 
version to the next.

Python already provides some of the Unicode information needed to implement 
the UCA in the unicodedata standard library module. Unfortunately, that 
information is based on Unicode versions that vary widely depending on the 
Python version. Here are a few examples:

   - Python 3.4: unicodedata based on Unicode Character Database (UCD) 6.3
   - Python 3.3: unicodedata based on Unicode Character Database (UCD) 6.1
   - Python 3.2: unicodedata based on Unicode Character Database (UCD) 6.0
   - Python 3.1: unicodedata based on Unicode Character Database (UCD) 5.1
   - Python 2.7: unicodedata based on Unicode Character Database (UCD) 5.2
   - Python 2.6: unicodedata based on Unicode Character Database (UCD) 5.1
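
For reference, the UCD version bundled with a given interpreter can be read 
at runtime from unicodedata.unidata_version. A minimal sketch (the helper 
name ucd_version_tuple is only for illustration, it is not part of Babel or 
the standard library):

```python
import sys
import unicodedata


def ucd_version_tuple(version=unicodedata.unidata_version):
    """Parse a UCD version string such as '6.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Show the running interpreter next to the UCD version it bundles.
print("Python %d.%d -> UCD %s" % (
    sys.version_info[0],
    sys.version_info[1],
    unicodedata.unidata_version,
))
```

Running this under each interpreter is how the list above can be checked.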

Implementing the UCA on top of the unicodedata library would mean 
maintaining a different Babel release for each supported Python version 
that ships a different UCD version. The alternative would be to bundle the 
UCD within Babel and avoid using the unicodedata library. Both of these 
solutions seem inelegant to me.
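
To picture the trade-off, here is a rough sketch (not existing Babel code) 
of a check that decides whether the interpreter's unicodedata could be used 
or whether bundled data would be needed. REQUIRED_UCD and 
stdlib_ucd_matches are hypothetical names, and the required version shown 
is only an assumed example:

```python
import unicodedata

# Hypothetical: the Unicode version that a given set of UCA/CLDR
# collation data is assumed to target.
REQUIRED_UCD = (6, 2, 0)


def stdlib_ucd_matches(required=REQUIRED_UCD, actual=None):
    """Return True if the given (or the interpreter's) UCD version is
    exactly the required one."""
    if actual is None:
        actual = unicodedata.unidata_version
    actual_tuple = tuple(int(part) for part in actual.split("."))
    return actual_tuple == tuple(required)


# When this is False, an implementation would have to rely on UCD
# tables shipped with the library instead of unicodedata.
use_stdlib = stdlib_ucd_matches()
```

Even with such a check, the bundled data would still have to exist for 
every interpreter whose UCD version does not match, so this does not remove 
the problem.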

To make things worse, I believe each CLDR release is tied to a specific 
version of Unicode, or assumes that a specific version of Unicode is 
provided. Right now, Babel is using CLDR version 23.1 in the master branch, 
which was released after Unicode 6.2 but before Unicode 6.3.

Do you have any suggestions or tips that would help solve this versioning 
problem?

Regards,

