Jaska Zedlik wrote:
> Hi!
>
> There are several different apostrophe characters. Let's consider two of
> them: U+0027 and U+2019. They have the same meaning, and both are
> acceptable as apostrophes in English, for instance. The problem is that
> MediaWiki's internal search distinguishes between these two apostrophes,
> so words containing U+2019 can't be found with a query containing
> U+0027, and vice versa.
>
Probably what we should be doing in this area is running text through 
Unicode compatibility composition normalization as well as some other 
character folding for punctuation forms where necessary. 
(UtfNormal::toNFKC() will merge things like full-width Roman characters 
but won't merge these related-but-not-quite-the-same punctuation forms.)
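
A minimal sketch of the kind of folding described above, under stated
assumptions: foldApostrophes() is a hypothetical helper, not an existing
MediaWiki function; PHP's intl Normalizer class stands in for
UtfNormal::toNFKC() and is skipped if the extension is not installed; and
the set of code points folded to U+0027 is only an illustrative guess.

```php
<?php
// Hypothetical helper (not part of MediaWiki): fold apostrophe look-alikes
// to U+0027 so that indexed text and search queries compare equal.
function foldApostrophes( string $text ): string {
	// Optional NFKC step via the intl extension, if it is available.
	// It merges forms such as full-width letters but leaves U+2019 alone.
	if ( class_exists( 'Normalizer' ) ) {
		$normalized = Normalizer::normalize( $text, Normalizer::FORM_KC );
		if ( $normalized !== false ) {
			$text = $normalized;
		}
	}

	// Explicit folding table; which characters belong here is an assumption.
	static $map = [
		"\u{2019}" => "'", // RIGHT SINGLE QUOTATION MARK
		"\u{02BC}" => "'", // MODIFIER LETTER APOSTROPHE
	];

	// strtr() replaces whole UTF-8 byte sequences, so other text is untouched.
	return strtr( $text, $map );
}

// Usage: the same word typed with either apostrophe folds to one form.
var_dump( foldApostrophes( "it\u{2019}s" ) === foldApostrophes( "it's" ) );
```

Applying the same folding to both the indexed text and the search query
should make the two apostrophe forms match, at the cost of losing the
distinction in the index.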

-- brion
> MediaWiki uses a search index for the internal search, and the index is
> updated every time an article is saved. I have found that if I
> override the function stripForSearch() in the language class with a
> new function which replaces U+2019 with U+0027 for the search index,
> the internal search begins to work properly regardless of which
> apostrophe was used in the search query, U+0027 or U+2019. Admittedly,
> the match is not highlighted in the context if the apostrophes differ
> between the query and the result, but the search returns what is
> really needed.
>
> The question is: if we override the stripForSearch() function in the
> language class in this way, will it cause any problems?
>
> The code of the override function is the following:
>
> function stripForSearch( $string ) {
>   // Fold U+2019 (UTF-8 bytes E2 80 99) to U+0027 before indexing.
>   $s = str_replace( "\xe2\x80\x99", "'", $string );
>   return parent::stripForSearch( $s );
> }
>
> We want to introduce this change for Belarusian, but I think the
> Ukrainian language may experience the same problem with the different
> apostrophes: U+0027 is not a valid apostrophe there either, yet
> only U+0027 (the typewriter apostrophe) is available on the majority
> of Belarusian and Ukrainian keyboard layouts.
>
> Thanks,
> zedlik
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

