Re: Internationalization: Normalization and canonical equivalence in string comparison
I'm afraid it's not quite so simple. The Internationalization API spec defines localeCompare() as a wrapper around Intl.Collator.prototype.compare, so to make normalization mandatory for localeCompare, we'd have to make it mandatory for Collator as well.

I'd like to get some input from implementors on whether that makes sense, or whether they're planning to implement canonical equivalence in some other way.

Thanks,
Norbert

On Jun 19, 2012, at 10:37, Gillam, Richard wrote:

> Norbert--
>
>> The ECMAScript Internationalization API Specification currently has normalization as an optional feature in collation. However, it requires that the compare function return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard. Canonical equivalence, I thought, is usually implemented through normalization. Does it make sense to keep normalization as a separate and optional feature then? Is anybody planning to implement canonical equivalence through other mechanisms, such that the lack of normalization would be visible in the comparison of non-equivalent strings?
>
> For what little it may be worth, I think it would make sense to just make normalization mandatory in localeCompare(). Of course, I don't know if that causes trouble for anybody (I'm pretty sure it doesn't for me).
>
> --Rich Gillam
> Lab126

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
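The relationship described in this message (localeCompare() specified as a wrapper around Intl.Collator.prototype.compare) can be illustrated with a small sketch. It assumes an engine that implements the Internationalization API and that both calls use the default locale and options; the helper name `sameOrdering` and the sample pairs are invented for illustration and do not come from the thread.

```javascript
// Sketch: with no extra arguments, localeCompare() is expected to order
// strings the same way as compare() on a default Intl.Collator.
// `sameOrdering` is a hypothetical helper, not part of any specification.
function sameOrdering(a, b) {
  const viaString = a.localeCompare(b);
  const viaCollator = new Intl.Collator().compare(a, b);
  // Only the sign (negative, zero, positive) is meaningful.
  return Math.sign(viaString) === Math.sign(viaCollator);
}

// Illustrative pairs only (an assumption, not test data from the thread).
const samplePairs = [
  ["a", "b"],
  ["b", "a"],
  ["abc", "abc"],
];

const agreement = samplePairs.map(([a, b]) => sameOrdering(a, b));
console.log(agreement);
```

If the two entry points share one implementation, any normalization requirement placed on one necessarily applies to the other, which is the concern raised above.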
Re: Internationalization: Normalization and canonical equivalence in string comparison
On Tue, Jun 19, 2012 at 12:36 AM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:

> The ECMAScript Internationalization API Specification currently has normalization as an optional feature in collation. However, it requires that the compare function return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard. Canonical equivalence, I thought, is usually implemented through normalization. Does it make sense to keep normalization as a separate and optional feature then? Is anybody planning to implement canonical equivalence through other mechanisms, such that the lack of normalization would be visible in the comparison of non-equivalent strings?
>
> BTW, the requirement that canonically equivalent strings compare as equal has been part of the specification of String.prototype.localeCompare since ES3. When testing with a handful of string pairs pulled from chapter 3 of the Unicode Standard and from UTS 10, however, I found that only Opera on the Mac detects their equivalence correctly. Firefox on the Mac and the V8 systems (Chrome, Node) fail to detect any equivalence; Safari, Explorer and the Windows versions of Opera and Firefox detect some and miss others. Obviously people haven't been paying much attention to localeCompare...

I don't know enough about the first part of your message to be any use; I am, however, interested in the second part - will you be publishing your tests and findings?

Rick

> Norbert
Re: Internationalization: Normalization and canonical equivalence in string comparison
The test is at http://norbertlindenberg.com/ecmascript/ESTest.html (and .js).

The strings I used are:

    [o\u0308, ö],
    [ä\u0323, a\u0323\u0308], // requires reordering
    [a\u0308\u0323, a\u0323\u0308], // requires reordering
    [ạ\u0308, a\u0323\u0308],
    [ä\u0306, a\u0308\u0306],
    [ă\u0308, a\u0306\u0308],
    [\u1111\u1171\u11b6, 퓛], // jamo/hangul
    [Å, Å]

Results:

Safari on Mac, iOS: Fail for comparisons that require reordering nonspacing marks within strings; pass for others.
Firefox, Opera, Explorer on Windows: Fail for jamo/hangul comparison; pass for others.
Firefox, Node on Mac; Chrome on Mac, Windows: Fail for all.
Opera on Mac: Passes for all.

Norbert

On Jun 19, 2012, at 7:30, Rick Waldron wrote:

> On Tue, Jun 19, 2012 at 12:36 AM, Norbert Lindenberg ecmascr...@norbertlindenberg.com wrote:
>
>> The ECMAScript Internationalization API Specification currently has normalization as an optional feature in collation. However, it requires that the compare function return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard. Canonical equivalence, I thought, is usually implemented through normalization. Does it make sense to keep normalization as a separate and optional feature then? Is anybody planning to implement canonical equivalence through other mechanisms, such that the lack of normalization would be visible in the comparison of non-equivalent strings?
>>
>> BTW, the requirement that canonically equivalent strings compare as equal has been part of the specification of String.prototype.localeCompare since ES3. When testing with a handful of string pairs pulled from chapter 3 of the Unicode Standard and from UTS 10, however, I found that only Opera on the Mac detects their equivalence correctly. Firefox on the Mac and the V8 systems (Chrome, Node) fail to detect any equivalence; Safari, Explorer and the Windows versions of Opera and Firefox detect some and miss others. Obviously people haven't been paying much attention to localeCompare...
>
> I don't know enough about the first part of your message to be any use; I am, however, interested in the second part - will you be publishing your tests and findings?
>
> Rick
>
>> Norbert
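For readers who want to reproduce this kind of check, here is a minimal sketch that is not the ESTest script referenced above. It assumes String.prototype.normalize (ES2015), which postdates this thread. The precomposed characters are written as escapes on the assumption that they are the usual precomposed code points (U+00F6, U+00E4, U+1EA1, U+0103, U+D4DB), and the leading jamo is assumed to be U+1111, the leading consonant in the canonical decomposition of U+D4DB. The Å pair is omitted because its exact code points cannot be told apart in the archived text. `canonicallyEquivalent` is a hypothetical helper name.

```javascript
// Sketch: canonical equivalence checked by comparing NFD forms.
// Assumes String.prototype.normalize (ES2015 or later).
function canonicallyEquivalent(a, b) {
  return a.normalize("NFD") === b.normalize("NFD");
}

// Pairs modeled on the message above, written entirely with escapes.
// The code points for the precomposed characters are assumptions.
const pairs = [
  ["o\u0308", "\u00f6"],
  ["\u00e4\u0323", "a\u0323\u0308"],  // requires reordering
  ["a\u0308\u0323", "a\u0323\u0308"], // requires reordering
  ["\u1ea1\u0308", "a\u0323\u0308"],
  ["\u00e4\u0306", "a\u0308\u0306"],
  ["\u0103\u0308", "a\u0306\u0308"],
  ["\u1111\u1171\u11b6", "\ud4db"],   // jamo/hangul
];

const report = pairs.map(([a, b]) => ({
  equivalent: canonicallyEquivalent(a, b),
  // The localeCompare() result varies by engine, so it is reported, not assumed.
  localeCompare: a.localeCompare(b),
}));
console.log(report);
```

An engine that meets the requirement discussed in this thread would report 0 in the `localeCompare` column for every pair where `equivalent` is true.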
Re: Internationalization: Normalization and canonical equivalence in string comparison
Norbert--

> The ECMAScript Internationalization API Specification currently has normalization as an optional feature in collation. However, it requires that the compare function return 0 when comparing Strings that are considered canonically equivalent by the Unicode standard. Canonical equivalence, I thought, is usually implemented through normalization. Does it make sense to keep normalization as a separate and optional feature then? Is anybody planning to implement canonical equivalence through other mechanisms, such that the lack of normalization would be visible in the comparison of non-equivalent strings?

For what little it may be worth, I think it would make sense to just make normalization mandatory in localeCompare(). Of course, I don't know if that causes trouble for anybody (I'm pretty sure it doesn't for me).

--Rich Gillam
Lab126
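One way to picture the suggestion in this message is a comparator that normalizes both operands before delegating to localeCompare(). This is only a sketch under stated assumptions, not how any engine or the specification implements it: it relies on String.prototype.normalize (ES2015, later than this thread), `normalizedCompare` is a hypothetical name, and NFC is an arbitrary choice (NFD would serve equally well for this purpose).

```javascript
// Sketch: "mandatory normalization" layered on top of localeCompare().
// Canonically equivalent strings become identical after NFC normalization,
// so the comparison of such strings must yield 0.
function normalizedCompare(a, b) {
  return a.normalize("NFC").localeCompare(b.normalize("NFC"));
}

// Decomposed and precomposed forms of the same character.
console.log(normalizedCompare("o\u0308", "\u00f6")); // 0

// Combining marks in different order on the same base letter.
console.log(normalizedCompare("a\u0308\u0323", "a\u0323\u0308")); // 0
```

Non-equivalent strings are still ordered by the underlying collation, so the wrapper only affects strings that differ in their normalization form.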