Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-07-05 Thread Markus Scherer
On Wed, Jul 4, 2012 at 6:12 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I've automated the check and have something like a 6-page list of anomalies in level 4 weights, with anomalies for DUCET and for the CLDR root locale. From what I have heard, the level-4 weights in

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-07-04 Thread Richard Wordingham
On Fri, 25 May 2012 12:34:01 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I spotted two differences flicking through the end of the differences - Nice work! Please submit your findings via

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-25 Thread Markus Scherer
On Thu, May 24, 2012 at 5:36 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I spotted two differences flicking through the end of the differences - Nice work! Please submit your findings via the Unicode reporting form http://www.unicode.org/reporting.html. As ICU does not load

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-24 Thread Richard Wordingham
On Wed, 23 May 2012 17:47:09 -0700 Markus Scherer markus@gmail.com wrote: Also, I just saw that http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.zip contains allkeys_CLDR.txt which should correspond 1:1 with the FractionalUCA*.txt in the same .zip file. One format difference:

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Markus Scherer
On Tue, May 22, 2012 at 2:22 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I can dig up the ICU code that computes the collation case bits for a string. It would be helpful. I can't see well enough how the data gets in. I found the code that computes the case bits (2

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 10:35:46 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 22, 2012 at 2:22 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: I found the code that computes the case bits (2 bits for lower/mixed/upper) for building ICU tailorings. Search for

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Markus Scherer
On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: While we're picking on that poor routine - it looks as though it could come unstuck with kana in the supplementary planes - the Kana Supplement, and possibly also the Enclosed Ideographic Supplement.

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 15:50:24 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: While we're picking on that poor routine - it looks as though it could come unstuck with kana in the supplementary

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Markus Scherer
On Wed, May 23, 2012 at 5:17 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: Is there a definition of the precise relationship between DUCET and FractionalUCA.txt, or does FractionalUCA.txt define the relationship? See

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 17:47:09 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 5:17 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: The order of code points and contractions as listed in FractionalUCA.txt and allkeys.txt should be the same, except for

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Richard Wordingham
On Wed, 23 May 2012 15:50:24 -0700 Markus Scherer markus@gmail.com wrote: On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: Is there a definition of the precise relationship between DUCET and FractionalUCA.txt, or does FractionalUCA.txt

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-23 Thread Markus Scherer
On Wed, May 23, 2012 at 7:19 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: A practical example is the four contractions I've proposed to restore the collation of Tibetan vowels following a subscript RA. If they're added to DUCET, will they automatically be included in the

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-22 Thread Richard Wordingham
On Mon, 21 May 2012 17:07:33 -0700 Markus Scherer markus@gmail.com wrote: In principle, it's straightforward: Lowercase and uppercase follow Unicode (UCD) case properties. We distinguish an intermediate mixed case for titlecase characters and mixed-case contractions. I believe we also

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-22 Thread Markus Scherer
On Tue, May 22, 2012 at 1:09 AM, Richard Wordingham richard.wording...@ntlworld.com wrote: On Mon, 21 May 2012 17:07:33 -0700 Markus Scherer markus@gmail.com wrote: In principle, it's straightforward: Lowercase and uppercase follow Unicode (UCD) case properties. We distinguish an

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-22 Thread Richard Wordingham
On Tue, 22 May 2012 08:33:43 -0700 Markus Scherer markus@gmail.com wrote: On Tue, May 22, 2012 at 1:09 AM, Richard Wordingham richard.wording...@ntlworld.com wrote: On Mon, 21 May 2012 17:07:33 -0700 Markus Scherer markus@gmail.com wrote: I can dig up the ICU code that

CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Richard Wordingham
What are the definitions of upper and lower case for the caseFirst tailoring for the UCA and for LDML? I can't find any obvious definition. My suspicion is that they are defined by assignment of the DUCET tertiary weights, UTS#10 Issue 23 (Version 6.1.0) Section 7.2. Although these largely

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Markus Scherer
On Mon, May 21, 2012 at 4:37 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: What are the definitions of upper and lower case for the caseFirst tailoring for the UCA and for LDML? I can't find any obvious definition. I am having trouble finding a published definition too. I

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Ken Whistler
On 5/21/2012 4:37 PM, Richard Wordingham wrote: Again, even the interpretation of uppercase in terms of weights is not certain, for the ISO/IEC 14651:2007 example of a tailoring for uppercase first does not adjust the collation elements with a tertiary weight of 1C, although they are listed as

Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

2012-05-21 Thread Richard Wordingham
On Mon, 21 May 2012 17:43:27 -0700 Ken Whistler k...@sybase.com wrote: For example, when caseFirst is set to uppercase, ICU orders U+1D34 MODIFIER LETTER CAPITAL H before U+0068 LATIN SMALL LETTER H, but anomalously orders U+A7F8 MODIFIER LETTER CAPITAL H WITH STROKE *after* U+0127 LATIN