Category sorting in MediaWiki has always been done wrong.  
Categories are not sorted alphabetically, but in Unicode code point
order. I don't know when or why DEFAULTSORT was introduced, but today
it is used mostly for sorting people by surname, e.g.
{{DEFAULTSORT:Wales, Jimmy}}, and for sorting other topics by
a word other than the first, so that "European Commission" is
sorted under C rather than E in
http://en.wikipedia.org/wiki/Category:Institutions_of_the_European_Union
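A minimal sketch of how a sort key decides where a page lands, assuming a simplified model (not MediaWiki's actual code): the key is the page's DEFAULTSORT value if it has one, otherwise its title, and keys are compared code point by code point. The DEFAULTSORT value shown for the Commission is hypothetical, for illustration only.

```python
# Simplified model (assumption): sort key = DEFAULTSORT if present, else title;
# keys are compared code point by code point, as Python compares str values.
def sort_key(title, defaultsort=None):
    return title if defaultsort is None else defaultsort

# Hypothetical DEFAULTSORT values; None means the page has no DEFAULTSORT.
pages = {
    "European Commission": "Commission, European",
    "European Court of Auditors": None,
}

listing = sorted(pages, key=lambda t: sort_key(t, pages[t]))
for title in listing:
    # Category pages group entries under the first character of the key.
    print(sort_key(title, pages[title])[0], title)
```

This prints "C European Commission" followed by "E European Court of Auditors": only the page with a DEFAULTSORT moves away from E.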

If you look more closely at that category, you see that some items
are sorted under E, which probably means somebody forgot to use
DEFAULTSORT there:

 * European Court of Auditors (0)
 * European Union Mission (1)
 * European quarter of Brussels (1)

What's even more remarkable is that "quarter" is sorted after 
"Union".  This is because lower case letters sort after all the 
upper case letters in ASCII and Unicode.  That is how broken 
category sorting is in MediaWiki.
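The effect is easy to reproduce, assuming the category compares sort keys code point by code point (Python's default string comparison behaves the same way). The case-insensitive variant uses `str.casefold`, which is only a rough stand-in for real alphabetical collation.

```python
titles = [
    "European quarter of Brussels",
    "European Union Mission",
    "European Court of Auditors",
]

# Code point order: A-Z are U+0041..U+005A and a-z are U+0061..U+007A,
# so every upper case letter precedes every lower case letter.
by_code_point = sorted(titles)
print(by_code_point)

# Ignoring case puts "quarter" between "Court" and "Union" again.
case_insensitive = sorted(titles, key=str.casefold)
print(case_insensitive)
```

The first list ends with "European quarter of Brussels", just like the category page; the second places it between the other two.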

Another example of broken sorting is when whitespace is compared
to letters. In ASCII and Unicode, the space character (code point
32) sorts ahead of all other printable characters. This means "Moon
illusion" sorts ahead of "Moonbow" in
http://en.wikipedia.org/wiki/Category:Moon because the space before
"illusion" is compared to the b in "Moonbow". I'm not sure if this
is correct in English, but in Swedish it is wrong; bow should sort
before illusion, regardless of the whitespace.
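A small demonstration under the same assumption of code point comparison. Stripping spaces (and ignoring case) before comparing is one possible way to get the order described for Swedish; it is an illustration, not a full collation.

```python
titles = ["Moonbow", "Moon illusion"]

# The space (U+0020, 32) in "Moon illusion" is compared with the "b"
# (U+0062, 98) in "Moonbow", so "Moon illusion" comes first.
by_code_point = sorted(titles)
print(by_code_point)

# Dropping spaces before comparing puts "bow" ahead of "illusion".
no_spaces = sorted(titles, key=lambda t: t.replace(" ", "").casefold())
print(no_spaces)
```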

There is a way to avoid all such problems, namely a more
aggressive use of DEFAULTSORT that converts all upper case letters
(except the initial one) to lower case and removes all whitespace
and commas from the sort key. It would mean that almost every
article needs a DEFAULTSORT. In the examples above:

 {{DEFAULTSORT:Walesjimmy}}
 {{DEFAULTSORT:Europeancourtofauditors}}
 {{DEFAULTSORT:Europeanunionmission}}
 {{DEFAULTSORT:Europeanquarterofbrussels}}
 {{DEFAULTSORT:Moonillusion}}
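A bot would need a normalization step along these lines. This is a minimal sketch; the function name `aggressive_sortkey` is hypothetical, and it covers only the three rules proposed here (case, whitespace, commas), not other punctuation or accented letters. Note that it starts from "Wales, Jimmy" rather than the title "Jimmy Wales", since surname order cannot be derived from the title alone.

```python
import re

def aggressive_sortkey(key: str) -> str:
    """Hypothetical helper: remove whitespace and commas, keep the first
    character as is and lower-case the rest."""
    squashed = re.sub(r"[\s,]+", "", key)
    return squashed[:1] + squashed[1:].lower()

examples = [
    "Wales, Jimmy",
    "European Court of Auditors",
    "European Union Mission",
    "European quarter of Brussels",
    "Moon illusion",
]
for source in examples:
    print("{{DEFAULTSORT:" + aggressive_sortkey(source) + "}}")
```

The output matches the five directives listed above, and the three European keys then sort as Court, quarter, Union.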

This can be done with bots, for sure, if we agree that it should 
be done. Is this something we should strive for?  Has any language 
of Wikipedia (or Wikinews or...) already started to do this?


-- 
  Lars Aronsson ([email protected])
  Aronsson Datateknik - http://aronsson.se

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
