On 14 May 2011 04:33, Andrew Dunbar <[email protected]> wrote:
> On 14 May 2011 01:48, Aryeh Gregor <[email protected]> wrote:
>> On Fri, May 13, 2011 at 3:31 AM, M. Williamson <[email protected]> wrote:
>>> I still don't think page titles should be case sensitive. Last time I asked
>>> how useful this really was, back in 2005 or so, I got a tersely-worded
>>> response that we need it to disambiguate certain pages. OK, but how many
>>> cases does that actually apply to? I would think that the increased
>>> usability from removing case sensitivity would far outweigh the benefit of
>>> natural disambiguation that only applies to a tiny minority of pages, and
>>> which could easily be replaced with disambiguation pages.
>>
>> From a software perspective, the way to do this would be to store a
>> canonicalized version of each page's title, and require that to be
>> unique instead of the title itself. This would be nice because we
>> could allow underscores in page titles, for instance, in addition to
>> being able to do case-folding.
>>
>> Note that Unicode capitalization is locale-dependent, but case-folding
>> is not. Thus we could use the same case-folding on all projects,
>> including international projects like Commons. There's only one
>> exception -- Turkish, with its dotless and dotted i's. But that's
>> minor enough that we should be able to work around it without too much
>> pain.
>
> I'm almost positive Azeri has the same dotless i issue and perhaps
> some of the other Turkic languages of Central Asia. One solution is to
> do accent/diacritic normalization too as part of the canonicalization.

This is getting into "nirvana fallacy" territory - we can't have
case-folding until every edge case works?

Instead, I would ask first: What does it take in English? Then work
out from there.

- d.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
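
For illustration only, here is a rough Python sketch of the canonicalized-title idea quoted in the thread: derive a key from each title and require the key, not the title, to be unique. The names canonical_title and find_collisions are hypothetical and are not MediaWiki functions. Treating underscores as spaces and applying NFC normalization are assumptions added for the example. It uses Python's locale-independent str.casefold(), so it does nothing special for the Turkish/Azeri dotted and dotless i.

```python
import unicodedata


def canonical_title(title: str) -> str:
    """Return a hypothetical uniqueness key for a page title."""
    # Assumption: underscores and spaces are equivalent in the key,
    # and runs of whitespace collapse to a single space.
    collapsed = " ".join(title.replace("_", " ").split())
    # Locale-independent Unicode case folding, then NFC so that
    # composed and decomposed spellings map to the same key.
    return unicodedata.normalize("NFC", collapsed.casefold())


def find_collisions(titles):
    """List (existing, new) title pairs that share a canonical key."""
    seen = {}
    collisions = []
    for title in titles:
        key = canonical_title(title)
        if key in seen:
            collisions.append((seen[key], title))
        else:
            seen[key] = title
    return collisions
```

Run over the existing English titles, something like find_collisions would give a direct count of how many pages currently rely on case alone for disambiguation, which is the "what does it take in English?" question. Note that with default case folding, dotless "ı" and "I" get different keys, so the Turkic case would still need its own handling.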
