Re: Internationalization: Normalization and canonical equivalence in string comparison

2012-06-21 Thread Norbert Lindenberg

I'm afraid it's not quite so simple. The Internationalization API spec defines 
localeCompare() as a wrapper around Intl.Collator.prototype.compare, so to make 
normalization mandatory for localeCompare, we'd have to make it mandatory for 
Collator as well. I'd like to get some input from implementors whether that 
makes sense, or whether they're planning to implement canonical equivalence in 
some other way.
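
To make the wrapper relationship concrete, here is a minimal sketch, assuming an engine that implements Intl.Collator. The helper name compareViaCollator is hypothetical and is not defined by the spec; it only illustrates that any normalization requirement placed on localeCompare() would in effect be a requirement on the collator it delegates to.

```javascript
// Sketch only: localeCompare expressed in terms of Intl.Collator, following
// the wrapper relationship described above. `compareViaCollator` is a
// hypothetical helper name used for illustration.
function compareViaCollator(a, b, locales, options) {
  const collator = new Intl.Collator(locales, options);
  return collator.compare(String(a), String(b));
}

const decomposed = "o\u0308";  // "o" followed by COMBINING DIAERESIS
const precomposed = "\u00f6";  // LATIN SMALL LETTER O WITH DIAERESIS

// If the implementation honours canonical equivalence, both calls should
// report 0. Whether they actually do depends on the engine.
console.log(compareViaCollator(decomposed, precomposed));
console.log(decomposed.localeCompare(precomposed));
```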

Thanks,
Norbert


On Jun 19, 2012, at 10:37 , Gillam, Richard wrote:

> Norbert--
>
> > The ECMAScript Internationalization API Specification currently has
> > normalization as an optional feature in collation. However, it requires that
> > the compare function return 0 when comparing Strings that are considered
> > canonically equivalent by the Unicode standard. Canonical equivalence, I
> > thought, is usually implemented through normalization. Does it make sense to
> > keep normalization as a separate and optional feature then? Is anybody
> > planning to implement canonical equivalence through other mechanisms, such
> > that the lack of normalization would be visible in the comparison of
> > non-equivalent strings?
>
> For what little it may be worth, I think it would make sense to just make
> normalization mandatory in localeCompare().  Of course, I don't know if that
> causes trouble for anybody (I'm pretty sure it doesn't for me).
>
> --Rich Gillam
>   Lab126
 

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Re: Internationalization: Normalization and canonical equivalence in string comparison

2012-06-19 Thread Rick Waldron

On Tue, Jun 19, 2012 at 12:36 AM, Norbert Lindenberg 
ecmascr...@norbertlindenberg.com wrote:

> The ECMAScript Internationalization API Specification currently has
> normalization as an optional feature in collation. However, it requires
> that the compare function return 0 when comparing Strings that are
> considered canonically equivalent by the Unicode standard. Canonical
> equivalence, I thought, is usually implemented through normalization. Does
> it make sense to keep normalization as a separate and optional feature
> then? Is anybody planning to implement canonical equivalence through other
> mechanisms, such that the lack of normalization would be visible in the
> comparison of non-equivalent strings?
>
> BTW, the requirement that canonically equivalent strings compare as equal
> has been part of the specification of String.prototype.localeCompare since
> ES3. When testing with a handful of string pairs pulled from chapter 3 of
> the Unicode Standard and from UTS 10, however, I found that only Opera on
> the Mac detects their equivalence correctly. Firefox on the Mac and the V8
> systems (Chrome, Node) fail to detect any equivalence; Safari, Explorer and
> the Windows versions of Opera and Firefox detect some and miss others.
> Obviously people haven't been paying much attention to localeCompare...



I don't know enough about the first part of your message to be any use; I
am, however, interested in the second part - will you be publishing your
tests and findings?

Rick





> Norbert


Re: Internationalization: Normalization and canonical equivalence in string comparison

2012-06-19 Thread Norbert Lindenberg

The test is at
http://norbertlindenberg.com/ecmascript/ESTest.html (and .js).

The strings I used are:
[o\u0308, ö],
[ä\u0323, a\u0323\u0308], // requires reordering
[a\u0308\u0323, a\u0323\u0308], // requires reordering
[ạ\u0308, a\u0323\u0308],
[ä\u0306, a\u0308\u0306],
[ă\u0308, a\u0306\u0308],
[\u1111\u1171\u11b6, 퓛], // jamo/hangul
[Å, Å]
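
For readers who want to try pairs like these themselves, the following is a minimal sketch of a normalization-based check. It is not the code from ESTest.js. It assumes String.prototype.normalize is available (it was standardized after this thread, in ECMAScript 2015), and the function name canonicallyEquivalent and the shortened pair list are illustrative choices.

```javascript
// Sketch only, not the code from ESTest.js. Assumes String.prototype.normalize
// exists in the engine (standardized later, in ECMAScript 2015).
function canonicallyEquivalent(a, b) {
  // Two strings are canonically equivalent when their NFD forms are identical.
  return a.normalize("NFD") === b.normalize("NFD");
}

// A subset of the pairs listed above, written with explicit escapes.
const pairs = [
  ["o\u0308", "\u00f6"],
  ["a\u0308\u0323", "a\u0323\u0308"],  // requires reordering
  // Leading jamo U+1111 follows from the canonical decomposition of U+D4DB.
  ["\u1111\u1171\u11b6", "\ud4db"],    // jamo/hangul
];

for (const [left, right] of pairs) {
  const viaNormalize = canonicallyEquivalent(left, right);
  // A conforming localeCompare should return 0 for every pair here.
  const viaLocaleCompare = left.localeCompare(right) === 0;
  console.log(JSON.stringify([left, right]), viaNormalize, viaLocaleCompare);
}
```

Comparing the two columns of output shows whether a given engine's localeCompare agrees with plain normalization for each pair.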

Results:

Safari on Mac, iOS: Fail for comparisons that require reordering nonspacing 
marks within strings; pass for others.
Firefox, Opera, Explorer on Windows: Fail for jamo/hangul comparison; pass for 
others.
Firefox, Node on Mac; Chrome on Mac, Windows: Fail for all.
Opera on Mac: Passes for all.

Norbert




Re: Internationalization: Normalization and canonical equivalence in string comparison

2012-06-19 Thread Gillam, Richard

Norbert--

> The ECMAScript Internationalization API Specification currently has
> normalization as an optional feature in collation. However, it requires that
> the compare function return 0 when comparing Strings that are considered
> canonically equivalent by the Unicode standard. Canonical equivalence, I
> thought, is usually implemented through normalization. Does it make sense to
> keep normalization as a separate and optional feature then? Is anybody
> planning to implement canonical equivalence through other mechanisms, such
> that the lack of normalization would be visible in the comparison of
> non-equivalent strings?

For what little it may be worth, I think it would make sense to just make 
normalization mandatory in localeCompare().  Of course, I don't know if that 
causes trouble for anybody (I'm pretty sure it doesn't for me).

--Rich Gillam
  Lab126

