Case insensitivity shouldn't be a problem for any language, as long as
the case mapping follows each language's own rules.

Turkish and other languages using dotless i, for example, will need a
special rule - Turkish lowercase dotted i capitalizes to a capital
dotted İ while lowercase undotted ı capitalizes to regular undotted I.
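
To make the special rule concrete, here is a minimal sketch in Python.
The helper names (upper_for_lang, lower_for_lang, title_key) and the
language-code set are made up for illustration and are not MediaWiki
code; the sketch assumes the wiki's content language is known when a
title is compared.

```python
# Hypothetical helpers; names and language codes are assumptions.
TURKIC_LANGS = {"tr", "az"}  # languages that use dotted/dotless i pairs


def upper_for_lang(text: str, lang: str) -> str:
    """Uppercase text, pairing i with İ and ı with I for Turkic languages."""
    if lang in TURKIC_LANGS:
        # Handle the two special letters before the generic mapping.
        text = text.replace("i", "İ").replace("ı", "I")
    return text.upper()


def lower_for_lang(text: str, lang: str) -> str:
    """Lowercase text, pairing İ with i and I with ı for Turkic languages."""
    if lang in TURKIC_LANGS:
        text = text.replace("İ", "i").replace("I", "ı")
    return text.lower()


def title_key(text: str, lang: str) -> str:
    """Build a comparison key: language-aware lowercase, then case folding."""
    return lower_for_lang(text, lang).casefold()


print(upper_for_lang("istanbul", "tr"))  # dotted capital İ is kept
print(upper_for_lang("istanbul", "en"))  # plain ASCII I
print(title_key("Groß", "de") == title_key("GROSS", "de"))
```

With a key like this, "groß" and "GROSS" compare equal on a German
wiki, while on a Turkish wiki "ISI" matches "ısı" rather than "isi".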

skype: node.ue

On Tue, Jul 28, 2009 at 9:26 AM, Aryeh Gregor<[email protected]> wrote:
> On Tue, Jul 28, 2009 at 11:53 AM, Paul Houle<[email protected]> wrote:
>> I've been looking at the id structure of dbpedia and wikipedia and
>> finally found an example where case sensitivity issues really bite.
>
> We should keep in mind that case isn't so clear-cut if you move away
> from English, though -- is "groß" the same as "GROSS" and thus the
> same as "gross"?  How about languages that don't even have bijections
> between uppercase and lowercase if you stick to the same dialect?
> (I'm pretty sure there are some; don't some languages strip diacritics
> from uppercase letters?)  There's probably some Unicode standard on
> normalization with respect to case, but it's not actually so simple in
> an international context.
>
> That said, I think case-insensitivity would be a good thing to support
> in the long run, optionally, and that it would probably be suitable
> for all Wikipedias.  Or at least almost all, if there are languages
> out there where case insensitivity is a real headache -- hopefully
> not, since most languages don't have letter case at all.  At any rate
> it would be good on enwiki.
>
> But it would require a lot of tedious and error-prone conversion of
> old code.  Everything tends to assume that a)
> $title->getPrefixedText() is what should be displayed to the user, but
> b) two titles are equal if and only if their
> $title->getPrefixedText()s are equal.  Likewise for
> $title->getPrefixedDbKey().  Those would need to be systematically and
> thoroughly fixed.  We'd also have to add a field to the page table or
> such to store the normalized form of the title, and fiddle with the
> indexes appropriately, and update all other tables to use the
> normalized form.  A lot of work.
>
> (But at least we could get rid of the silly Text/DbKey distinction
> while we're doing this.  I've heard recent MySQL versions actually
> support storage of ASCII space characters in text fields!)
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
