[Bug 8732] Sorttable table doesn't sort properly strings with German umlaut characters

bugzilla-daemon Wed, 14 Apr 2010 07:48:33 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=8732


--- Comment #10 from Stefan Nowak <[email protected]> 2010-04-14 15:47:55 BST ---
Honestly, Philippe Verdy, your explanation is beyond my knowledge, as I know
little about databases, collation, etc.

From the little I understood, sorting is a genuinely complicated issue,
especially for some writing systems, and even more so if you mix them.

I therefore suggest starting simple with a short-term solution and then
progressing to a more sustainable one.

SHORT-TERM SOLUTION:

I guess the extended Latin collation rules could be handled client-side
without slowing the script down too much, right? I know little about this;
perhaps the Slavic or Scandinavian special characters are already hard, but at
least French accents and German umlauts could easily be fixed in a first patch.
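A minimal sketch of such a client-side fix, written in Python for readability rather than as the actual sortable-table JavaScript. It decomposes each string (Unicode NFD), drops the combining accents and case-folds the rest, so that Ö sorts with O and é with e. The function names are hypothetical, and the rule itself is an assumption: a simplified "Ö as O" order, not a full French or German collation (French, for example, also uses accents as a secondary tie-breaker).

```python
import unicodedata


def fold_latin(text: str) -> str:
    """Primary key: strip diacritics and case ("Österreich" -> "osterreich")."""
    # NFD splits "Ö" into "O" + COMBINING DIAERESIS, "é" into "e" + COMBINING ACUTE
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() also expands "ß" to "ss"
    return base.casefold()


def latin_sort_key(text: str) -> tuple[str, str]:
    # Break ties on the original string so "Ol" and "Öl" always come out in the same order
    return (fold_latin(text), text)


if __name__ == "__main__":
    cities = ["Zürich", "Österreich", "Deutschland", "Ostfriesland"]
    # Code-point order puts Ö after Z:
    print(sorted(cities))  # ['Deutschland', 'Ostfriesland', 'Zürich', 'Österreich']
    # Folded order puts Ö together with O:
    print(sorted(cities, key=latin_sort_key))  # ['Deutschland', 'Österreich', 'Ostfriesland', 'Zürich']
```

Letters such as Ø, Æ or Ł have no Unicode decomposition and pass through this folding unchanged, which fits the remark that Scandinavian and Slavic characters need more work.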

LONG-TERM SOLUTION:

1) The sorting code base (a mix of client- and server-side scripts, database
tables, etc.) is written CENTRALLY for all MediaWikis, simply for code sharing,
as many functions/objects are likely to be used universally.

2)a) The collation rulesets are defined SEPARATELY, CENTRALLY FOR EACH
LANGUAGE wiki (de, fr, en, he, ru, ...), as each language has its own sorting
rules for its native words and for foreign words.

b) It is designed as an intelligent plug-in system. A ruleset may define only
a limited set of Unicode characters: those of its own language, plus perhaps
the characters of historically related cultures (pre-globalisation) for which
it has developed sorting rules, e.g. an Austrian lexicographical order that is
aware of French accents and Czechoslovak háčeks. For the Unicode ranges of
languages it does not know how to handle (e.g. Hebrew), it hands over
responsibility to, and trusts, that language's ruleset plug-in.

I guess 2a) and 2b) are already well developed in database applications; it is
rather just a question of how to integrate them properly to satisfy the concept
described above.
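A rough sketch of the plug-in idea in 2b), in Python and entirely hypothetical: the plug-in names, the `RulesetPlugin` type and the per-character delegation are assumptions for illustration, not an existing MediaWiki or database API. Each wiki lists its plug-ins in priority order, and every character is weighted by the first plug-in that claims it, so a German wiki handles Latin letters itself and hands Hebrew over to the Hebrew plug-in.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RulesetPlugin:
    """One language's ruleset: which characters it claims and how it weights them."""
    name: str
    handles: Callable[[str], bool]  # does this plug-in know how to order this character?
    weight: Callable[[str], str]    # primary weight of a character it handles


def _latin_de_weight(ch: str) -> str:
    # Assumed German-style rule: keep the base letter, drop accents/háčeks, ignore case
    return unicodedata.normalize("NFD", ch)[0].casefold()


LATIN_DE = RulesetPlugin(
    name="latin-de",
    handles=lambda ch: ch.isascii() or unicodedata.name(ch, "").startswith("LATIN"),
    weight=_latin_de_weight,
)

# Placeholder: plain code-point order inside the Hebrew block (a real plug-in would do more)
HEBREW = RulesetPlugin(
    name="hebrew",
    handles=lambda ch: "\u0590" <= ch <= "\u05ff",
    weight=lambda ch: ch,
)

# Last resort for scripts no plug-in claims: code-point order
FALLBACK = RulesetPlugin(name="fallback", handles=lambda ch: True, weight=lambda ch: ch)


def make_sort_key(plugins: list[RulesetPlugin]) -> Callable[[str], tuple]:
    """Build a wiki's sort key from an ordered plug-in list; earlier plug-ins sort first."""
    def key(text: str) -> tuple:
        parts = []
        # NFC first, so decomposed input ("O" + combining diaeresis) becomes one character
        for ch in unicodedata.normalize("NFC", text):
            # Hand the character to the first plug-in that claims it
            for rank, plugin in enumerate(plugins):
                if plugin.handles(ch):
                    parts.append((rank, plugin.weight(ch)))
                    break
        return tuple(parts)
    return key


de_key = make_sort_key([LATIN_DE, HEBREW, FALLBACK])  # German wiki: Latin first
he_key = make_sort_key([HEBREW, LATIN_DE, FALLBACK])  # Hebrew wiki: Hebrew first
```

Real collations compare strings in several passes (base letters, then accents, then case) and work on character sequences rather than single characters; this sketch only shows the delegation structure.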

CONCERNING PERFORMANCE:

I advocate that the SortKeys be calculated in advance on the server, so that
the client-side script only needs to sort numerically.
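A hypothetical Python sketch of that split (the function names are mine, and the German "Ö as O" folding is an assumed stand-in for a real collation): the server turns each cell into an integer rank once, and the client only ever compares integers.

```python
import unicodedata
from typing import Sequence


def collate_de(text: str) -> str:
    # Stand-in for the wiki's collation ruleset: "Ö as O", accents and case ignored
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def assign_sort_keys(cells: Sequence[str]) -> list[int]:
    """Server side: replace each cell by its 1-based dense rank under the collation."""
    distinct = sorted({collate_de(c) for c in cells})
    rank = {k: i + 1 for i, k in enumerate(distinct)}
    return [rank[collate_de(c)] for c in cells]  # equal keys share a rank


def client_order(sort_keys: Sequence[int]) -> list[int]:
    """Client side: row order from plain integer comparison, no collation needed."""
    return sorted(range(len(sort_keys)), key=lambda row: sort_keys[row])


if __name__ == "__main__":
    names = ["Wien", "München", "London"]
    keys = assign_sort_keys(names)  # [3, 2, 1], computed once on the server
    print([names[r] for r in client_order(keys)])  # ['London', 'München', 'Wien']
```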

(Off-topic remark: this would also let us offer sorting tables by multiple
keys with very little client processing power. My search for "multiple, many,
search, keys" in the bug tracker returned no results, but people might well
like it.)
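With precomputed integer SKs, sorting by several columns reduces to comparing tuples of integers, and a descending column is just a negated integer. A small Python sketch; the data layout (one list of SK integers per row) and the function name are hypothetical:

```python
from typing import Sequence


def multi_key_order(
    sort_keys: Sequence[Sequence[int]], columns: Sequence[tuple[int, bool]]
) -> list[int]:
    """Order rows by several SK columns; `columns` lists (column index, descending?)."""
    def row_key(row: int) -> tuple[int, ...]:
        # Integer keys make descending order trivial: just negate them
        return tuple(
            -sort_keys[row][col] if descending else sort_keys[row][col]
            for col, descending in columns
        )
    return sorted(range(len(sort_keys)), key=row_key)


if __name__ == "__main__":
    # Hypothetical rows: [country SK, city SK]
    rows = [[1, 2], [2, 1], [1, 1]]
    print(multi_key_order(rows, [(0, False), (1, False)]))  # [2, 0, 1]
    print(multi_key_order(rows, [(0, False), (1, True)]))   # [0, 2, 1]
```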

As agreed: ideally this happens automatically, without human effort, with
human-added exceptions only where necessary.

I imagine it as shown in this ASCII table, where each SK column holds the sort
key of the column to its left:

Name   |SK| Einwohner |SK| Staat                   | SK    
London | 1| 7.554.236 | 1| Vereinigtes Königreich  |  3
München| 2| 1.365.052 | 3| Deutschland             |  1
Wien   | 3| 1.697.982 | 2| Österreich {Oesterreich}|  2


In the wiki markup we have the three columns "Name", "Einwohner" (population)
and "Staat" (country). Users just write the Unicode words as they normally
would, such as "München" or "Österreich", knowing that MediaWiki takes care of
the SortKeys.

Only if they know that the default sort algorithm would produce a conflicting
or meaningless order do they add the SortKey attribute. In my example this is
shown as {Oesterreich} (treating Ö as Oe) instead of the rule expected from the
German collation ruleset, which treats Ö as O.

The HTML/JavaScript served to the browser includes these additional SK
columns; they are invisible to the user but used by the client-side script.
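A sketch of how the server might build those hidden SK columns, in Python. The `{Oesterreich}` override notation is only the one from my diagram, not existing wiki syntax, and all names here are hypothetical. Cells without an override get the automatic key (an assumed German "Ö as O" folding); cells with one are sorted by the override instead.

```python
import re
import unicodedata
from typing import Sequence

# Hypothetical cell notation from the diagram: "Österreich {Oesterreich}"
OVERRIDE = re.compile(r"^(?P<text>.*?)\s*\{(?P<key>[^{}]+)\}$")


def split_cell(cell: str) -> tuple[str, str]:
    """Return (text shown to the reader, string used for sorting)."""
    match = OVERRIDE.match(cell)
    if match:
        return match.group("text"), match.group("key")  # human-added exception
    return cell, cell  # automatic default


def collate_de(text: str) -> str:
    # Assumed default German ruleset: Ö as O, accents and case ignored
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def sk_column(cells: Sequence[str]) -> list[int]:
    """Compute the hidden SK column served next to a visible column."""
    keys = [collate_de(split_cell(c)[1]) for c in cells]
    rank = {k: i + 1 for i, k in enumerate(sorted(set(keys)))}
    return [rank[k] for k in keys]


if __name__ == "__main__":
    print(sk_column(["London", "München", "Wien"]))  # [1, 2, 3]
    print(sk_column(["Vereinigtes Königreich", "Deutschland", "Österreich {Oesterreich}"]))  # [3, 1, 2]
    print(sk_column(["Offenbach", "Österreich {Oesterreich}"]))  # [2, 1]: "oe" < "of"
    print(sk_column(["Offenbach", "Österreich"]))  # [1, 2]: "of" < "os"
```

The first two calls reproduce the Name and Staat SK columns of the table; the last two show a case where the override actually changes the order.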

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
