Unicode CLDR 34 beta available for testing

2018-10-04 Thread Rick McGowan via Unicode
The *beta* version of Unicode CLDR 34 is available for testing. The
final release is expected on October 12.


CLDR 34 provides an update to the key building blocks for software
supporting the world’s languages. This data is used by all major
software systems for their software internationalization and
localization, adapting software to the conventions of different
languages for such common software tasks as formatting dates, times,
numbers, and currency values, and sorting text.


CLDR 34 included a full Survey Tool data collection phase. Other 
enhancements include several changes to prepare for the new Japanese 
calendar era starting 2019-05-01; updated emoji names, annotations, 
collation and grouping; and other specific fixes. The draft release page 
at http://cldr.unicode.org/index/downloads/cldr-34 lists the major 
features, and has pointers to the newest data and charts. It will be 
fleshed out over the coming weeks with more details, migration issues, 
known problems, and so on. Particularly useful for review are:


   * Delta Charts - the data that changed during the release
   * By-Type Charts - a side-by-side comparison of data from different locales
   * Annotation Charts - new emoji names and keywords

Please report any problems that you find using a CLDR ticket. We’d
also appreciate it if programmatic users of CLDR data download the XML
files and do a trial integration to see if any problems arise.
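A trial integration could start as small as loading one annotation file and checking it for gaps. The Python sketch below assumes the general layout of CLDR's `common/annotations/<locale>.xml` files: one `<annotation cp="...">` element per emoji with `|`-separated keywords, plus a `type="tts"` element for the short name. The inline sample and the helper names `load_annotations` and `missing_tts` are hypothetical. The sample is illustrative only and is not real CLDR 34 data.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only (not taken from the CLDR 34 files), shaped like the
# assumed layout of common/annotations/<locale>.xml: one <annotation> per
# code point with "|"-separated keywords, plus a type="tts" short name.
SAMPLE_XML = """<ldml>
  <identity><language type="en"/></identity>
  <annotations>
    <annotation cp="\U0001F600">face | grin | grinning face</annotation>
    <annotation cp="\U0001F600" type="tts">grinning face</annotation>
    <annotation cp="\U0001F642">face | smile</annotation>
  </annotations>
</ldml>"""


def load_annotations(xml_text: str) -> dict:
    """Map each code point string to {"keywords": [...], "tts": name or None}."""
    root = ET.fromstring(xml_text)
    result = {}
    for ann in root.iter("annotation"):
        cp = ann.get("cp")
        text = (ann.text or "").strip()
        entry = result.setdefault(cp, {"keywords": [], "tts": None})
        if ann.get("type") == "tts":
            entry["tts"] = text
        else:
            # Keywords are separated by "|"; drop surrounding spaces and empties.
            entry["keywords"] = [k.strip() for k in text.split("|") if k.strip()]
    return result


def missing_tts(annotations: dict) -> list:
    """A simple integration check: code points that lack a short (tts) name."""
    return sorted(cp for cp, entry in annotations.items() if entry["tts"] is None)


if __name__ == "__main__":
    # For a real trial, pass the text of a downloaded file instead, e.g.
    # open(path, encoding="utf-8").read() for a path of your choosing.
    anns = load_annotations(SAMPLE_XML)
    print(missing_tts(anns))
```

Checks like `missing_tts` are cheap to run across every locale file and would surface incomplete data worth reporting in a ticket.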





Re: Dealing with Georgian capitalization in programming languages

2018-10-04 Thread Martin J. Dürst via Unicode

Ken, Markus,

Many thanks for your ideas, which I noted at
https://bugs.ruby-lang.org/issues/14839.

Regards,   Martin.

On 2018/10/03 06:43, Ken Whistler wrote:

On 10/2/2018 12:45 AM, Martin J. Dürst via Unicode wrote:

My questions here are:
- Has this been considered when Georgian Mtavruli was discussed in the
  UTC?

Not explicitly, that I recall. The whole issue of titlecasing came up
very late in the preparation of case mapping tables for Mtavruli and
Mkhedruli for Unicode 11.0.


But it seems to me that the problem you are citing can be avoided if you 
simply rethink what your "capitalize" means. It really should be 
conceived of as first lowercasing the *entire* string, and then 
titlecasing the *eligible* letters -- i.e., usually the first letter. 
(Note that this allows for the concept that titlecasing might then be 
localized on a per-writing-system basis -- the issue would devolve to 
determining what the rules are for "eligible" letters.) But the simple 
default would just be to titlecase the initial letter of each "word" 
segment of a string.


Note that, conceived this way, for the Georgian mappings, where the
titlecase mapping for Mkhedruli is simply the letter itself, this
approach ends up with:


capitalize(mkhedrulistring) --> mkhedrulistring

capitalize(MTAVRULISTRING) ==> titlecase(lowercase(MTAVRULISTRING)) --> mkhedrulistring


Thus avoiding any mixed case.
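Ken's suggested definition can be sketched in Python. Python's built-in case methods use the Unicode Character Database, and Python 3.7 and later include the Unicode 11.0 Mtavruli mappings. The helper names `capitalize_first` and `capitalize_words` are hypothetical. The sketch also makes two simplifying assumptions: the "eligible" letter is the first alphabetic character, and a "word" is a whitespace-delimited run rather than a UAX #29 word segment. It illustrates the idea only and is not Ruby's actual `String#capitalize`.

```python
import re


def capitalize_first(s: str) -> str:
    """Lowercase the entire string, then titlecase its first "eligible" letter.

    Assumption: the eligible letter is the first character for which
    str.isalpha() is true; locale-specific rules could choose differently.
    """
    lowered = s.lower()
    for i, ch in enumerate(lowered):
        if ch.isalpha():
            # str.title() on a single character applies its titlecase mapping,
            # which can differ from uppercase (U+01C6 -> U+01C5) and which is
            # the letter itself for Georgian Mkhedruli.
            return lowered[:i] + ch.title() + lowered[i + 1:]
    return lowered  # no letters: nothing to titlecase


def capitalize_words(s: str) -> str:
    """Apply capitalize_first to each whitespace-delimited segment.

    Assumption: "words" are runs of non-whitespace, not UAX #29 word segments.
    """
    # The capturing group keeps the whitespace runs in the split result;
    # they contain no letters, so capitalize_first leaves them unchanged.
    return "".join(capitalize_first(part) for part in re.split(r"(\s+)", s))


if __name__ == "__main__":
    mkhedruli = "\u10e5\u10d0\u10e0\u10d7\u10e3\u10da\u10d8"  # ქართული
    mtavruli = mkhedruli.upper()  # needs Unicode 11.0+ data (Python 3.7+)
    print(capitalize_first(mkhedruli))
    print(capitalize_first(mtavruli))
    print(mtavruli.capitalize())  # built-in: first letter stays Mtavruli
```

With this definition, both Georgian inputs come out entirely in Mkhedruli. By contrast, Python's built-in `str.capitalize()` titlecases the original first character. For a Mtavruli letter that titlecase is the letter itself, so the built-in returns a mixed-case string. Swapping `str.isalpha()` or the whitespace split for per-writing-system rules is where the localization Ken mentions would come in.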