https://bugzilla.wikimedia.org/show_bug.cgi?id=8732
--- Comment #10 from Stefan Nowak <[email protected]> 2010-04-14 15:47:55 BST ---
Honestly, Philippe Verdy, your writing is beyond the scope of my knowledge, as I know little about databases, collation, etc. From the little I understood, sorting seems to be a really complicated issue, especially for some alphabets, and even more so if you mix them.

I therefore suggest starting simple with a short-term solution and then progressing to the more sustainable one.

SHORT TERM SOLUTION:
I guess the extended Latin collation rules could be handled client-side without slowing the script down too much. Right? I really know little; maybe the Slavic or Scandinavian special characters are already hard, but at least French accents and German umlauts could easily be fixed in a first patch.

LONG TERM SOLUTION:
1) The sorting code base (a mix of client/server-side scripts, database tables, etc.) is written CENTRALLY for all MediaWikis, simply for reasons of code sharing, as many functions/objects are likely to be used universally.
2) a) The collation rulesets shall be defined SEPARATELY, CENTRALLY PER LANGUAGE wiki (de, fr, en, he, ru, ...), as languages have their own sorting rules for native words and for foreign words.
   b) They are designed with an intelligent plug-in approach. A ruleset may define only a limited set of Unicode characters (those of its own language, plus maybe the characters of historically related cultures (pre-globalisation) for which it has developed sorting rules, e.g. an Austrian lexicographical order aware of French accents and Czechoslovakian háčeks), and it hands over responsibility for the Unicode ranges of languages it doesn't know how to handle (e.g. Hebrew) by running their ruleset plug-ins.

I guess 2a) and 2b) are already well developed in database applications; it's rather a question of how to integrate them properly to satisfy the concept described above.
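To make the plug-in idea in 2b) more concrete, here is a rough sketch of how a ruleset could handle only the characters it knows and hand everything else to another ruleset. It is an illustration only, not MediaWiki code: the `Ruleset` class, its `fallback` parameter and the tiny character mappings are all made up for this example, and real collation tables are far larger and more subtle.

```python
import unicodedata

# Hypothetical sketch: each ruleset knows a limited set of characters and
# delegates unknown ones to a fallback ruleset (the "plug-in" idea).
class Ruleset:
    def __init__(self, name, mapping, fallback=None):
        self.name = name
        self.mapping = mapping      # character -> replacement used for sorting
        self.fallback = fallback    # ruleset consulted for unknown characters

    def key_for_char(self, ch):
        if ch in self.mapping:
            return self.mapping[ch]
        if self.fallback is not None:
            return self.fallback.key_for_char(ch)
        return ch  # no ruleset knows this character: leave it unchanged

    def sort_key(self, text):
        # Normalise to precomposed form so "é" is one character, then lowercase.
        text = unicodedata.normalize("NFC", text).lower()
        return "".join(self.key_for_char(ch) for ch in text)


# Illustrative mappings only, not real collation tables.
FRENCH = Ruleset("fr", {"é": "e", "è": "e", "ê": "e", "ç": "c"})
GERMAN = Ruleset("de", {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}, fallback=FRENCH)

words = ["Zürich", "Österreich", "École", "Oslo", "Zagreb"]
sorted_words = sorted(words, key=GERMAN.sort_key)
print(sorted_words)
```

In this sketch the German ruleset does not know "É" itself, so it asks the French one. A ruleset for Hebrew could be chained in the same way without the German ruleset having to know anything about it.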
CONCERNING PERFORMANCE:
I advocate that the SortKeys be calculated server-side, so that the client-side script only needs to sort numerically. (Off-topic remark: this way we could also offer sorting tables by multiple keys with very little client processing power. My search for "multiple, many, search, keys" in the bug tracker did not show any results, but it's possible that people would like it.)

As agreed: ideally automatically, without the need for human effort, with human-added exceptions only where necessary. I imagine it as shown in this ASCII table:

Name    | SK | Einwohner | SK | Staat                    | SK
London  |  1 | 7.554.236 |  1 | Vereinigtes Königreich   |  3
München |  2 | 1.365.052 |  3 | Deutschland              |  1
Wien    |  3 | 1.697.982 |  2 | Österreich {Oesterreich} |  2

In the wiki markup we have the 3 columns "Name, Einwohner, Staat". Users just write the Unicode words as they are used to, such as "München, Österreich", knowing that MediaWiki takes care of the SortKeys. Only if they know that the default sort algorithm will conflict or make no sense do they add the SortKey attribute. In my example this is shown as {Oesterreich} (the approach where Ö becomes Oe) instead of the expected rule Ö becomes O, as defined in the German collation ruleset.

In the HTML/JavaScript served to the browser, those additional SK value columns are supplied, invisible to the user but used by the client-side script.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
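The server-side SortKey idea from the comment above could look roughly like the following sketch. It is written in Python only for illustration; the helper names `text_key` and `ranks` and the folding table `FOLD` are hypothetical, and the Einwohner ranks are assumed to run from largest to smallest, as the SK values in the example table suggest.

```python
# Illustrative default rule only: Ö sorts as O, as in the comment's German ruleset.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})

def text_key(value, override=None):
    """Sort string for a cell; a manual SortKey, if given, replaces the text."""
    return (override or value).lower().translate(FOLD)

def ranks(keys, reverse=False):
    """1-based rank of each key, returned in the original row order."""
    order = sorted(set(keys), reverse=reverse)
    position = {k: i + 1 for i, k in enumerate(order)}
    return [position[k] for k in keys]

rows = [
    # (Name, Einwohner, Staat, manual SortKey for Staat or None)
    ("London", 7554236, "Vereinigtes Königreich", None),
    ("München", 1365052, "Deutschland", None),
    ("Wien", 1697982, "Österreich", "Oesterreich"),
]

# "Server side": compute one numeric SK column per visible column.
sk_name = ranks([text_key(r[0]) for r in rows])
sk_einwohner = ranks([r[1] for r in rows], reverse=True)
sk_staat = ranks([text_key(r[2], r[3]) for r in rows])

table = [
    {"name": r[0], "sk_name": a, "sk_einwohner": b, "sk_staat": c}
    for r, a, b, c in zip(rows, sk_name, sk_einwohner, sk_staat)
]

# "Client side": only a plain numeric sort on the hidden SK column is needed.
by_staat = [row["name"] for row in sorted(table, key=lambda row: row["sk_staat"])]
print(by_staat)
```

Sorting by several columns would then just mean comparing tuples of these numbers, for example `key=lambda row: (row["sk_staat"], row["sk_name"])`, which keeps the client-side work small.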
