On 14 May 2011 01:48, Aryeh Gregor <[email protected]> wrote:
> On Fri, May 13, 2011 at 3:31 AM, M. Williamson <[email protected]> wrote:
>> I still don't think page titles should be case sensitive. Last time I asked
>> how useful this really was, back in 2005 or so, I got a tersely-worded
>> response that we need it to disambiguate certain pages. OK, but how many
>> cases does that actually apply to? I would think that the increased
>> usability from removing case sensitivity would far outweigh the benefit of
>> natural disambiguation that only applies to a tiny minority of pages, and
>> which could easily be replaced with disambiguation pages.
>
> From a software perspective, the way to do this would be to store a
> canonicalized version of each page's title, and require that to be
> unique instead of the title itself.  This would be nice because we
> could allow underscores in page titles, for instance, in addition to
> being able to do case-folding.
>
> Note that Unicode capitalization is locale-dependent, but case-folding
> is not.  Thus we could use the same case-folding on all projects,
> including international projects like Commons.  There's only one
> exception -- Turkish, with its dotless and dotted i's.  But that's
> minor enough that we should be able to work around it without too much
> pain.

I'm almost positive Azeri has the same dotless i issue, as perhaps do
some of the other Turkic languages of Central Asia. One solution is to
also do accent/diacritic normalization as part of the canonicalization.
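
To make that concrete, here is a rough Python sketch of what such a
canonicalization could look like. It is only an illustration, not
anything MediaWiki does today: the name canonical_key and the
EXTRA_FOLDS table are made up for this example. One caveat it brings
out is that capital dotted I (U+0130) case-folds to "i" plus a
combining dot, which diacritic stripping removes, but lowercase dotless
i (U+0131) has no Unicode decomposition, so it would need an explicit
mapping. Whether folding it to plain "i" is acceptable to Turkish or
Azeri readers is a separate question that I am just assuming away here.

```python
import unicodedata

# Assumption for this sketch: fold dotless i (U+0131) to plain "i" by hand,
# because it has no decomposition and mark-stripping alone leaves it as-is.
EXTRA_FOLDS = str.maketrans({"\u0131": "i"})


def canonical_key(title):
    """Return a hypothetical uniqueness key for a page title."""
    # Locale-independent Unicode case folding.
    folded = title.casefold()
    # Decompose so that accents become separate combining marks...
    decomposed = unicodedata.normalize("NFD", folded)
    # ...and drop those combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply the hand-written folds that decomposition cannot cover.
    return stripped.translate(EXTRA_FOLDS)


print(canonical_key("\u0130stanbul"))    # istanbul
print(canonical_key("Diyarbak\u0131r"))  # diyarbakir
print(canonical_key("CAF\u00c9") == canonical_key("cafe"))  # True
```

The displayed title would stay exactly as the user typed it; only the
key would be compared for uniqueness.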

Andrew Dunbar (hippietrail)

> Some projects, like probably all Wiktionaries, would doubtless not
> want case-folding at all, so we should support different
> canonicalization algorithms.  Even the ones that don't want
> case-folding could still benefit from allowing underscores in titles.
>
> But all this would require a very intrusive rewrite.  Assumptions like
> "replace spaces by underscores to get dbkey" are hardwired into
> MediaWiki all over the place, unfortunately.  It's not clear that it's
> worth it, since there are downsides to case-folding too.  It might
> make more sense to auto-generate redirects instead, which would be a
> much easier project that wouldn't have the downsides.
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
