Thanks Yuri. I know about the normalization available through the API, but it doesn't work for the case I'm working on: it's a dump analysis, and I want it to be able to work offline...
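
To make the offline problem concrete, here is a minimal sketch, not MediaWiki's actual logic. It assumes the wiki capitalizes the first letter of a title except for a set of characters it leaves alone. The names `NOT_CAPITALIZED` and `normalize_first_letter` are hypothetical. The exception set contains only the ɽ example from this thread, and leaving multi-character expansions untouched is also an assumption.

```python
# Hypothetical exception set: characters that have an uppercase form in
# Unicode but that the wiki is assumed NOT to capitalize. Only the example
# from this thread is listed; a real list would have to come from the wiki.
NOT_CAPITALIZED = frozenset({"\u027d"})  # ɽ (Unicode uppercase: Ɽ, U+2C64)


def normalize_first_letter(title: str) -> str:
    """Uppercase the first character of a title unless it is a known exception."""
    if not title:
        return title
    first = title[0]
    if first in NOT_CAPITALIZED:
        return title
    upper = first.upper()
    # Assumption: expansions to several characters (e.g. "ß" -> "SS")
    # are left untouched rather than applied to the title.
    if len(upper) != 1:
        return title
    return upper + title[1:]
```

A plain `str.upper()` on the first character would turn ɽ into Ɽ and so merge the two frwiki titles. That is the kind of false positive an exception list is meant to avoid, which is why the exact list of converted characters matters.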
Nico

On Sun, Aug 4, 2019 at 2:12 AM Yuri Astrakhan <yuriastrak...@gmail.com> wrote:

> Hi Nico, if possible, can your tool actually use the MW API to normalize
> titles? It's a very quick API call, and you can do multiple titles at once;
> it will save you a lot of grief over incompatibilities.
> --Yuri
>
> On Sat, Aug 3, 2019 at 10:57 AM Nicolas Vervelle <nverve...@gmail.com>
> wrote:
>
> > Hello,
> >
> > On most wikis, MediaWiki is configured to convert the first letter of a
> > title to uppercase, but apparently it's not converting every Unicode
> > character: for example, on frwiki ɽ
> > <https://fr.wikipedia.org/w/index.php?title=%C9%BD&redirect=no> is a
> > different article than Ɽ <https://fr.wikipedia.org/wiki/%E2%B1%A4>, even
> > if the second character is the uppercase version of the first one in
> > Unicode.
> >
> > So, what characters are actually converted to uppercase by the title
> > normalization?
> >
> > I need to know this information to stop reporting some false positives in
> > WPCleaner <https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:WPCleaner>.
> >
> > Thanks, Nico
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l