Hello,

On Fri, Jun 19, 2009 at 23:28, Brion Vibber <[email protected]> wrote:

> Jaska Zedlik wrote:
>
>> Hi!
>>
>> Several different apostrophe characters exist. Let's consider two of
>> them: U+0027 and U+2019. They have the same meaning, and both are
>> acceptable as apostrophes in English, for instance. The problem is that
>> MediaWiki's internal search distinguishes between these two apostrophes,
>> so words containing U+2019 can't be found by a query containing U+0027,
>> and vice versa.
>>
>>
> Probably what we should be doing in this area is running text through
> Unicode compatibility composition normalization as well as some other
> character folding for punctuation forms where necessary.
> (UtfNormal::toNFKC() will merge things like full-width Roman characters but
> won't merge these related-but-not-quite-the-same punctuation forms.)
>
> -- brion

As I understand it, this is not a case for Unicode compatibility
composition, because these are two different characters (U+2019 is even
defined as Right Single Quotation Mark). In some languages (certainly not
all) they can have an identical meaning. Since the characters are distinct,
I'm afraid Unicode normalization does not cover them, and we should handle
this with the functions available in the language class.
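
To illustrate the point, here is a minimal sketch in Python rather than
MediaWiki's PHP. Python's unicodedata.normalize() stands in for
UtfNormal::toNFKC(), and the names APOSTROPHE_FOLD and normalize_for_search
are hypothetical, not anything that exists in MediaWiki. It only shows that
NFKC leaves U+2019 alone and that an extra folding step would be needed:

```python
import unicodedata

# Hypothetical folding table. Which characters count as "the same apostrophe"
# is a per-language decision; this mapping is only an illustration.
APOSTROPHE_FOLD = str.maketrans({"\u2019": "\u0027"})


def normalize_for_search(text: str) -> str:
    """Apply NFKC, then fold U+2019 into U+0027 (assumed search-key step)."""
    return unicodedata.normalize("NFKC", text).translate(APOSTROPHE_FOLD)


# NFKC merges a full-width Latin letter into its ASCII counterpart...
fullwidth_folded = unicodedata.normalize("NFKC", "\uff21") == "A"

# ...but it leaves U+2019 unchanged, so the two spellings still differ.
nfkc_keeps_quote = unicodedata.normalize("NFKC", "it\u2019s") == "it\u2019s"

# With the extra fold, both spellings produce the same search key.
keys_match = normalize_for_search("it\u2019s") == normalize_for_search("it's")
```

In a real implementation the folding table would presumably live in the
language class, so that languages where the two characters are not
equivalent could opt out.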

zedlik
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
