On Tuesday 28 July 2009 19:16:22 Brion Vibber wrote:
> On 7/28/09 10:04 AM, Aryeh Gregor wrote:
> > On Tue, Jul 28, 2009 at 12:52 PM, Mark Williamson<[email protected]> wrote:
> >> Case insensitivity shouldn't be a problem for any language, as long as
> >> you do it properly.
> >>
> >> Turkish and other languages using dotless i, for example, will need a
> >> special rule - Turkish lowercase dotted i capitalizes to a capital
> >> dotted İ while lowercase undotted ı capitalizes to regular undotted I.
> >
> > And so what if a wiki is multilingual and you don't know what language
> > the page name is in?  What if a Turkish wiki contains some English
> > page names as loan words, for instance?
>
> Indeed, good handling of case-insensitive matchings would be a big win
> for human usability, but it's not easy to get right in all cases.
>
> The main problems are:
>
> 1) Conflicts when we really do consider something separate, but the case
> folding rules match them together
>
> 2) Language-specific case folding rules in a multilingual environment
>
> Turkish I with/without dot and German ß not always matching to SS are
> the primary examples off the top of my head. Also, some languages tend
> to drop accent markers in capital form (eg, Spanish). What can or should
> we do here?
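
To make the two examples above concrete, here is a small Python sketch. It assumes nothing beyond Python's built-in Unicode case mapping; tr_upper and tr_lower are hypothetical helpers written for this illustration, not part of MediaWiki or any library.

```python
# Default (language-neutral) Unicode case mapping versus Turkish rules.
# tr_upper/tr_lower are hypothetical helpers for illustration only.

def tr_upper(s):
    """Uppercase using Turkish rules: i -> İ, ı -> I."""
    return s.replace("i", "İ").replace("ı", "I").upper()

def tr_lower(s):
    """Lowercase using Turkish rules: I -> ı, İ -> i."""
    return s.replace("I", "ı").replace("İ", "i").lower()

print("istanbul".upper())        # ISTANBUL (default rules lose the dot)
print(tr_upper("istanbul"))      # İSTANBUL
print(tr_lower("DİYARBAKIR"))    # diyarbakır

# German ß: uppercasing expands it to SS, and lowercasing cannot undo that.
print("straße".upper())              # STRASSE
print("STRASSE".lower() == "straße") # False
```

Applying tr_upper to an English loan word on a Turkish wiki gives "WİKİ" for "wiki", which is exactly the multilingual problem raised above: the right rule depends on the language of the title, not of the wiki.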

Similar to an automatic redirect, we could build an automatic disambiguation 
page. For example, someone on srwiki going to [[Dj]] would get:

Did you mean:

* [[Đ]]
* [[DJ]]
* [[D.J.]]

> A nearer-term help would be to go ahead and implement what we talked
> about a billion years ago but never got around to -- a decent "did you
> mean X?" message to display when you go to an empty page but there's
> something similar nearby.

I have been thinking about this a lot. The best solution I have come up with 
is to add a column "page_title_canonical" to the page table. When an article 
is created or moved, this canonical title is built from the real title. When 
an article is looked up and there is no match in page_title, build the 
canonical title from the URL and look for a match in page_title_canonical. If 
there is exactly one match, display "did you mean X" or even go there 
automatically, as if from a redirect; if there are multiple matches, display 
"did you mean *X, *X1".
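
The lookup flow could look roughly like this. It is only a sketch: an in-memory dictionary stands in for the proposed page_title_canonical column, and PageIndex and simple_canonical are hypothetical names. The stand-in canonicalizer just lowercases and drops non-alphanumerics; any function from title to canonical form can be passed in instead.

```python
from collections import defaultdict

def simple_canonical(title):
    """Stand-in canonicalizer (assumption): lowercase, keep only letters/digits."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

class PageIndex:
    """In-memory stand-in for page_title plus page_title_canonical."""

    def __init__(self, canonical=simple_canonical):
        self.canonical = canonical
        self.titles = set()                   # plays the role of page_title
        self.by_canonical = defaultdict(set)  # plays the role of page_title_canonical

    def add(self, title):
        # Called when an article is created; a move would remove and re-add.
        self.titles.add(title)
        self.by_canonical[self.canonical(title)].add(title)

    def lookup(self, requested):
        # 1. An exact match on the real title always wins.
        if requested in self.titles:
            return ("exact", [requested])
        # 2. Otherwise fall back to the canonical form.
        matches = sorted(self.by_canonical.get(self.canonical(requested), ()))
        if len(matches) == 1:
            return ("redirect", matches)      # or show "did you mean X"
        if matches:
            return ("did_you_mean", matches)  # "did you mean *X, *X1"
        return ("missing", [])

index = PageIndex()
for t in ["DJ", "D.J.", "MediaWiki"]:
    index.add(t)

print(index.lookup("DJ"))         # ('exact', ['DJ'])
print(index.lookup("mediawiki"))  # ('redirect', ['MediaWiki'])
print(index.lookup("Dj"))         # ('did_you_mean', ['D.J.', 'DJ'])
print(index.lookup("Nothing"))    # ('missing', [])
```

In the database the second step would be a single indexed query on the new column, so the extra cost is only paid when the exact title is missing.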

This canonical title would be made like this:
* Remove disambiguator from the title if it exists
* Remove punctuation and the like
* Transliterate the title to Latin alphabet
* Transliterate to pure ASCII
* Lowercase
* Order the words alphabetically
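
A minimal sketch of these steps, under loud assumptions: the two mapping tables are tiny hypothetical samples (Serbian Cyrillic, plus a few letters that Unicode decomposition cannot reduce to ASCII), "disambiguator" is taken to mean a trailing parenthetical, and lowercasing is done before transliteration so the tables only need lowercase keys. Real per-language data would be much larger.

```python
import re
import unicodedata

# Hypothetical sample tables; a real wiki would need per-language data.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}
ASCII_FALLBACK = {"đ": "dj", "ß": "ss", "ı": "i"}

def canonical_title(title):
    t = title.replace("_", " ")                    # URL form uses underscores
    t = re.sub(r"\s*\([^()]*\)\s*$", "", t)        # remove trailing disambiguator
    t = "".join(ch for ch in t                     # remove punctuation
                if not unicodedata.category(ch).startswith("P"))
    t = t.lower()                                  # lowercase (done early here)
    t = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in t)  # to Latin alphabet
    t = "".join(ASCII_FALLBACK.get(ch, ch) for ch in t)     # letters NFKD cannot split
    t = unicodedata.normalize("NFKD", t)           # split base letter from accents
    t = t.encode("ascii", "ignore").decode("ascii")  # drop whatever is not ASCII
    return " ".join(sorted(t.split()))             # order the words alphabetically

for title in ["Dj", "Đ", "DJ", "D.J.", "Ђ"]:
    print(title, "->", canonical_title(title))     # all five give "dj"

print(canonical_title("Tesla, Nikola"))     # nikola tesla
print(canonical_title("Никола_Тесла"))      # nikola tesla
print(canonical_title("Mercury (planet)"))  # mercury
```

With these tables the srwiki example works out: [[Dj]], [[Đ]], [[DJ]] and [[D.J.]] all share the canonical title "dj". The sketch also shows one thing that can go wrong: anything the tables do not cover is silently dropped in the ASCII step.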

What could possibly go wrong?

Note that this would also be very helpful for non-Latin wikis - people often 
want Latin-only URLs since non-Latin URLs are far too long. I also recall a 
recent discussion about a wiki in a language with nonstandard spelling (nds?) 
where they use bots to create dozens or even hundreds of redirects to an 
article title - this would make that unnecessary as well.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
