On Tuesday 28 July 2009 19:16:22, Brion Vibber wrote:
> On 7/28/09 10:04 AM, Aryeh Gregor wrote:
> > On Tue, Jul 28, 2009 at 12:52 PM, Mark Williamson<[email protected]> wrote:
> >> Case insensitivity shouldn't be a problem for any language, as long as
> >> you do it properly.
> >>
> >> Turkish and other languages using dotless i, for example, will need a
> >> special rule - Turkish lowercase dotted i capitalizes to a capital
> >> dotted İ while lowercase undotted ı capitalizes to regular undotted I.
> >
> > And so what if a wiki is multilingual and you don't know what language
> > the page name is in? What if a Turkish wiki contains some English
> > page names as loan words, for instance?
>
> Indeed, good handling of case-insensitive matchings would be a big win
> for human usability, but it's not easy to get right in all cases.
>
> The main problems are:
>
> 1) Conflicts when we really do consider something separate, but the case
> folding rules match them together
>
> 2) Language-specific case folding rules in a multilingual environment
>
> Turkish I with/without dot and German ß not always matching to SS are
> the primary examples off the top of my head. Also, some languages tend
> to drop accent markers in capital form (eg, Spanish). What can or should
> we do here?

Similar to an automatic redirect, we could build an automatic
disambiguation page. For example, someone on srwiki going to [[Dj]]
would get:

Did you mean:
* [[Đ]]
* [[DJ]]
* [[D.J.]]

> A nearer-term help would be to go ahead and implement what we talked
> about a billion years ago but never got around to -- a decent "did you
> mean X?" message to display when you go to an empty page but there's
> something similar nearby.

I have been thinking a lot about this. The best solution I have come up
with is to add a column, "page_title_canonical", to the page table. When
an article is created or moved, this canonical title is built from the
real title. When an article is looked up and there is no match in
page_title, build the canonical title from the URL and look for a match
in page_title_canonical. If there is exactly one match, display "did you
mean X" or even go there automatically, as if from a redirect. If there
are multiple matches, display "did you mean *X, *X1".

The canonical title would be built like this:

* Remove the disambiguator from the title, if there is one
* Remove punctuation and the like
* Transliterate the title to the Latin alphabet
* Transliterate to pure ASCII
* Lowercase
* Order the words alphabetically

What could possibly go wrong?

Note that this would also be very helpful for non-Latin wikis: people
often want Latin-only URLs, since non-Latin URLs are much too long. I
also recall a recent discussion about a wiki in a language with
nonstandard spelling (nds?) where they use bots to create dozens or even
hundreds of redirects to an article title. This would make those
redirects unnecessary too.
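
To make the canonicalization steps listed above concrete, here is a rough sketch in Python. It is not MediaWiki code. The names canonical_title and build_index and both transliteration tables are hypothetical and deliberately tiny; a real wiki would need full per-script tables. The sketch assumes that a disambiguator is a trailing parenthetical, and that punctuation is deleted rather than replaced by a space, so that [[D.J.]] and [[DJ]] end up with the same key.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Cyrillic-to-Latin table (a few Serbian
# letters only); a real table would cover the whole alphabet.
_LOWER = {"а": "a", "б": "b", "г": "g", "д": "d", "е": "e",
          "о": "o", "р": "r", "ђ": "đ", "ј": "j"}
TO_LATIN = dict(_LOWER)
TO_LATIN.update({k.upper(): v.capitalize() for k, v in _LOWER.items()})

# Latin letters that Unicode decomposition cannot reduce to ASCII.
# Hypothetical choices: Đ as "Dj", ß as "ss", dotless ı as "i".
ASCII_FALLBACK = {"Đ": "Dj", "đ": "dj", "ß": "ss", "ı": "i"}


def canonical_title(title: str) -> str:
    """Build a lookup key from a page title, following the listed steps."""
    title = title.replace("_", " ")  # URLs use "_" where titles use " "
    # 1. Remove a trailing disambiguator such as " (planet)".
    title = re.sub(r"\s*\([^()]*\)\s*$", "", title)
    # 2. Delete punctuation and symbols (Unicode categories P* and S*).
    title = "".join(ch for ch in title
                    if unicodedata.category(ch)[0] not in "PS")
    # 3. Transliterate to the Latin alphabet.
    title = "".join(TO_LATIN.get(ch, ch) for ch in title)
    # 4. Transliterate to pure ASCII: special cases first, then strip
    #    accents by decomposing and dropping every non-ASCII code point.
    title = "".join(ASCII_FALLBACK.get(ch, ch) for ch in title)
    title = unicodedata.normalize("NFKD", title)
    title = title.encode("ascii", "ignore").decode("ascii")
    # 5. Lowercase.
    title = title.lower()
    # 6. Order the words alphabetically.
    return " ".join(sorted(title.split()))


def build_index(titles):
    """Map each canonical key to the real titles that share it."""
    index = {}
    for t in titles:
        index.setdefault(canonical_title(t), []).append(t)
    return index


index = build_index(["Đ", "DJ", "D.J."])
print(index.get(canonical_title("Dj"), []))
```

With these tables, a request for [[Dj]] finds all three titles from the srwiki example, which is the "multiple matches" case. Because ASCII folding runs before lowercasing, Turkish İ and plain I also fold to the same key. That is convenient for lookup, but it is exactly the kind of conflict described in problem 1 of the quoted message.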
