Re: [Wikitech-l] search=steven+tyler gets Steven_tyler

David Gerard Sat, 14 May 2011 01:01:06 -0700

On 14 May 2011 04:33, Andrew Dunbar wrote:
> On 14 May 2011 01:48, Aryeh Gregor wrote:
>> On Fri, May 13, 2011 at 3:31 AM, M. Williamson wrote:


>>> I still don't think page titles should be case sensitive. Last time I asked
>>> how useful this really was, back in 2005 or so, I got a tersely-worded
>>> response that we need it to disambiguate certain pages. OK, but how many
>>> cases does that actually apply to? I would think that the increased
>>> usability from removing case sensitivity would far outweigh the benefit of
>>> natural disambiguation that only applies to a tiny minority of pages, and
>>> which could easily be replaced with disambiguation pages.

>> From a software perspective, the way to do this would be to store a
>> canonicalized version of each page's title, and require that to be
>> unique instead of the title itself.  This would be nice because we
>> could allow underscores in page titles, for instance, in addition to
>> being able to do case-folding.
>> Note that Unicode capitalization is locale-dependent, but case-folding
>> is not.  Thus we could use the same case-folding on all projects,
>> including international projects like Commons.  There's only one
>> exception -- Turkish, with its dotless and dotted i's.  But that's
>> minor enough that we should be able to work around it without too much
>> pain.

> I'm almost positive Azeri has the same dotless i issue and perhaps
> some of the other Turkic languages of Central Asia. One solution is to
> do accent/diacritic normalization too as part of the canonicalization.


This is getting into "nirvana fallacy" territory - we can't have
case-folding until every edge case works?

Instead, I would ask first: What does it take in English? Then work
out from there.
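
To make the English-only starting point concrete, here is a rough
sketch of the canonicalized-title idea quoted above, in Python. It is
not MediaWiki code: canonical_title and TitleIndex are hypothetical
names, and treating underscores as spaces and collapsing whitespace in
the key are assumptions about what the canonical form would include.
It uses only the default, locale-independent Unicode case-folding.

```python
import unicodedata


def canonical_title(title: str) -> str:
    """Hypothetical canonical key for a page title (illustration only)."""
    # Normalize to NFC so precomposed and decomposed forms compare equal.
    text = unicodedata.normalize("NFC", title)
    # Assumption: underscores and runs of whitespace count as one space.
    text = " ".join(text.replace("_", " ").split())
    # Default Unicode case-folding; no locale-specific (Turkic) tailoring.
    return text.casefold()


class TitleIndex:
    """Toy in-memory index that requires the canonical key to be unique."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def add(self, title: str) -> None:
        key = canonical_title(title)
        existing = self._by_key.get(key)
        if existing is not None and existing != title:
            # Two display titles share one key: would need a disambiguation page.
            raise ValueError(f"{title!r} collides with {existing!r}")
        self._by_key[key] = title

    def lookup(self, query: str):
        # Returns the stored display title, or None if nothing matches.
        return self._by_key.get(canonical_title(query))


index = TitleIndex()
index.add("Steven Tyler")
print(index.lookup("steven_tyler"))
```

With this, search=steven+tyler, Steven_tyler and Steven Tyler all map
to the one stored title. The Turkic case is simply left alone: default
folding maps "I" to "i" but leaves dotless "ı" as it is, so those
titles stay distinct until someone decides how to handle them.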


- d.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
