Thanks Yuri,

I know about the normalization available through the API, but it doesn't
work for my use case: this is a dump analysis, and I want it to be able to
work offline...
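
To illustrate the offline problem, here is a minimal sketch in Python. The helper name `naive_ucfirst` is hypothetical, and it relies on Python's built-in Unicode case tables; it is not MediaWiki's actual normalization code. It only shows that uppercasing the first character by Unicode rules treats the two frwiki titles from my first mail as the same title, while the wiki keeps them as two distinct articles:

```python
def naive_ucfirst(title: str) -> str:
    """Uppercase the first character using Python's Unicode case mapping.

    Hypothetical helper for illustration only; MediaWiki's real title
    normalization evidently does not convert every character this way.
    """
    return title[:1].upper() + title[1:]


lower_title = "\u027d"  # ɽ LATIN SMALL LETTER R WITH TAIL
upper_title = "\u2c64"  # Ɽ LATIN CAPITAL LETTER R WITH TAIL

# With the naive approach both titles collapse to the same string, so an
# offline tool would wrongly report them as the same page.
collapsed = naive_ucfirst(lower_title) == naive_ucfirst(upper_title)
print(collapsed)
```

So an offline tool needs the actual list of characters that MediaWiki converts, not just the Unicode uppercase mapping.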

Nico

On Sun, Aug 4, 2019 at 2:12 AM Yuri Astrakhan <yuriastrak...@gmail.com>
wrote:

> Hi Nico, if possible, can your tool actually use the MW API to normalize
> titles? It's a very quick API call, and you can do multiple titles at once;
> it will save you a lot of grief over incompatibilities.
> --Yuri
>
> On Sat, Aug 3, 2019 at 10:57 AM Nicolas Vervelle <nverve...@gmail.com>
> wrote:
>
> > Hello,
> >
> > On most wikis, MediaWiki is configured to convert the first letter of a
> > title to uppercase, but apparently it's not converting every Unicode
> > character: for example, on frwiki ɽ
> > <https://fr.wikipedia.org/w/index.php?title=%C9%BD&redirect=no> is a
> > different article than Ɽ <https://fr.wikipedia.org/wiki/%E2%B1%A4>, even
> > if the second character is the uppercase version of the first one in
> > Unicode.
> >
> > So, which characters are actually converted to uppercase by the title
> > normalization?
> >
> > I need this information to stop reporting some false positives in
> > WPCleaner <https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:WPCleaner>.
> >
> > Thanks, Nico
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l