Last question (I believe):
I've implemented something similar to Php72ToUpper in WPCleaner, and it
seems to work fine for removing false positives.
I have only one left on frwiki: ⅷ
<https://fr.wikipedia.org/w/index.php?title=%E2%85%B7&redirect=no>.
My code still converts it to uppercase, but frwiki has one page for the
lowercase letter and another for the uppercase letter, so the current
MediaWiki version does not convert this letter to uppercase.
Is an entry for it missing from Php72ToUpper to prevent it from being
converted under PHP 7.2?
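
For reference, the override approach described above can be sketched roughly
as follows. This is a minimal Python sketch, not WPCleaner's or MediaWiki's
actual code: the names LEGACY_UPPER_OVERRIDES and legacy_upper_first are
hypothetical, the 'ƀ' entry is the one quoted from Php72ToUpper.php further
down in this thread, and the 'ⅷ' entry is a purely local assumption, not
something known to be in that file.

```python
# Hypothetical override table, shaped like the 'ƀ' => 'ƀ' entry quoted in
# this thread: each key maps to the result the older runtime produced.
LEGACY_UPPER_OVERRIDES = {
    "\u0180": "\u0180",  # ƀ stays ƀ (entry quoted from Php72ToUpper.php)
    "\u2177": "\u2177",  # ⅷ stays ⅷ (ASSUMPTION: local workaround only)
}


def legacy_upper_first(title: str) -> str:
    """Uppercase the first character of a title, honouring the overrides.

    Characters listed in LEGACY_UPPER_OVERRIDES keep their legacy result;
    everything else goes through the normal Unicode uppercase mapping.
    """
    if not title:
        return title
    first = title[0]
    # str.upper() applies the current Unicode mapping (ƀ -> Ƀ, ⅷ -> Ⅷ),
    # so the table is consulted first to reproduce the old behaviour.
    upper = LEGACY_UPPER_OVERRIDES.get(first, first.upper())
    return upper + title[1:]


if __name__ == "__main__":
    for sample in ["bar", "\u0180ar", "\u2177"]:
        print(repr(sample), "->", repr(legacy_upper_first(sample)))
```

With such a table, a title starting with an overridden character is left
alone, so it no longer shows up as a false positive, while ordinary titles
are still capitalized.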

Nico

On Mon, Aug 5, 2019 at 8:45 AM Nicolas Vervelle <nverve...@gmail.com> wrote:

> Thanks Giuseppe !
>
> I've subscribed to T219279 to know when the pages are properly converted,
> and when I can remove the hack in my code.
>
> Nico
>
> On Mon, Aug 5, 2019 at 7:03 AM Giuseppe Lavagetto <
> glavage...@wikimedia.org> wrote:
>
>> On Sun, Aug 4, 2019 at 11:34 AM Nicolas Vervelle <nverve...@gmail.com>
>> wrote:
>>
>> > Thanks Brian,
>> >
>> > Thanks for the link to Php72ToUpper.php!
>> > I think I understand it now: for example, the first line says 'ƀ' => 'ƀ',
>> > which should mean that this letter shouldn't be converted to uppercase by
>> > MW?
>> > That's one of the letters I found that wasn't converted to uppercase and
>> > that was generating a false positive in my code: so it's because specific
>> > MW code is preventing the conversion :-)
>> >
>>
>> Hi!
>>
>> No, that file is a temporary measure during a transition between two
>> versions of PHP.
>>
>> In HHVM and PHP 5.x, calling mb_strtoupper("ƀ") would give the erroneous
>> result "ƀ".
>>
>> In PHP 7.x, the result is the correct capitalization.
>>
>> The issue is that the titles of wiki articles get normalized, so under
>> PHP 7 we would have
>>
>> ƀar => Ƀar
>>
>> which would prevent you from reaching the existing page, which is still
>> stored under the lowercase title.
>>
>> Once we're done with the transition and we go through the process of
>> converting the (several hundred) pages/users that have the wrong title
>> normalization, we will remove that table, and obtain the correct
>> behaviour.
>>
>> You just need to subscribe to https://phabricator.wikimedia.org/T219279 and
>> wait for its resolution, I think. Most Unicode horrors are fixed in recent
>> versions of PHP, including the one you were citing.
>>
>> Cheers,
>>
>> Giuseppe
>> --
>> Giuseppe Lavagetto
>> Principal Site Reliability Engineer, Wikimedia Foundation
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>