https://bugzilla.wikimedia.org/show_bug.cgi?id=164

--- Comment #134 from Philippe Verdy <[email protected]>  2009-11-19 13:51:53 
UTC ---
Collation keys can coexist with custom sort keys.
Custom sort keys are useful and will continue to be useful, to tweak the
default collation (which for now is simply a binary ordering of codepoints).

Today, basically, the categories are sorted by [sort key, full page name]
(possibly truncated to a reasonable size, using a unique identifier).

What we want is to be able to sort by: [collationkey([sort key, full page
name]), sort key, full page name] (also with possible truncation of the whole).

This will preserve the existing tweaks made in pages when they reference
categories, and will help sort all the rest.
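
As a rough sketch of that ordering (in Python rather than PHP, only to keep it
short): collation_key() below is a toy stand-in that strips accents and case,
not a real UCA/CLDR key, and all the names are made up for illustration.

```python
import unicodedata

def collation_key(text: str) -> str:
    """Toy stand-in for a real collation key: drops accents and case."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def category_sort_tuple(sort_key: str, page_name: str) -> tuple:
    # Primary: collation key computed over (sort key, full page name).
    # Tie-breakers: the raw sort key, then the raw full page name.
    combined = sort_key + "\n" + page_name
    return (collation_key(combined), sort_key, page_name)

# (custom sort key, full page name) pairs, as referenced from a category.
members = [
    ("Zebra", "Zebra"),
    ("Étoile", "Étoile"),
    ("apple", "Apple"),
    ("Eagle", "Eagle"),
]

# Today's behaviour: binary ordering of code points.
binary_order = sorted(members)

# Proposed behaviour: collation key first, existing keys as tie-breakers.
collated_order = sorted(members, key=lambda m: category_sort_tuple(*m))
```

The custom sort key still takes part in the ordering, so existing tweaks keep
their effect; only the comparison of the resulting strings changes.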

Ideally, the collation keys should be generated "on the fly" by the SQL engine
itself (because it would allow alternate sort orders, according to user locale
preferences or according to web query parameters set by GUI buttons, especially
for Chinese where several sort orders are common: sort by Pinyin, sort by
radical/strokes, sort by traditional dictionary orders), as part of its
supported "ORDER BY" clause for getting the list of article names to display in
categories.
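
To illustrate what engine-side collation chosen at query time could look like,
here is a small sketch using SQLite through Python's sqlite3 module, which
lets the client register a comparison function as a named collation. This is
only an analogy: the table, the "toy_folded" collation and the preference
names are hypothetical, and a production engine would supply real
locale-aware collations itself.

```python
import sqlite3
import unicodedata

def fold(text):
    """Toy folding: drop accents and case (not a real locale collation)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed
                   if not unicodedata.combining(c)).casefold()

def compare_folded(a, b):
    # Collation callbacks return a negative, zero or positive integer.
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("toy_folded", compare_folded)
conn.execute("CREATE TABLE category_members (category TEXT, sort_key TEXT)")
conn.executemany(
    "INSERT INTO category_members VALUES (?, ?)",
    [("Birds", name) for name in ["Zebra finch", "Émeu", "albatross", "Eagle"]],
)

# Collation names cannot be bound as SQL parameters, so the user preference
# is mapped through a fixed whitelist before being placed in the query.
ALLOWED_COLLATIONS = {"binary": "BINARY", "folded": "toy_folded"}

def list_category(category, order="folded"):
    collation = ALLOWED_COLLATIONS[order]
    sql = ("SELECT sort_key FROM category_members WHERE category = ? "
           "ORDER BY sort_key COLLATE " + collation)
    return [row[0] for row in conn.execute(sql, (category,))]
```

Calling list_category("Birds", "binary") and list_category("Birds", "folded")
returns the same rows in two different orders without storing any key.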

But if the SQL engine does not have such support, this must be implemented in
the PHP code, and the collation keys can be stored in a new data column (the
extra column can be added or filled conditionally: if the SQL engine supports
the needed collations, this column can remain NULL to save storage space).
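
A minimal sketch of that conditional column, again in Python with SQLite only
for brevity. The schema, the engine_collates flag and collation_key() are all
assumptions made for the example; collation_key() is a toy, not a UCA key.

```python
import sqlite3
import unicodedata

def collation_key(text):
    """Toy stand-in for an application-side collation key."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed
                   if not unicodedata.combining(c)).casefold()

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE category_members ("
        " page_name TEXT, sort_key TEXT, collation_key TEXT)"  # nullable
    )
    return conn

def add_member(conn, page_name, sort_key, engine_collates):
    # If the engine can collate by itself, leave the column NULL;
    # otherwise compute the key in the application and store it.
    key = None if engine_collates else collation_key(sort_key + "\n" + page_name)
    conn.execute(
        "INSERT INTO category_members VALUES (?, ?, ?)",
        (page_name, sort_key, key),
    )

def list_members(conn):
    # Fallback path: order by the stored key, then by the existing keys.
    rows = conn.execute(
        "SELECT page_name FROM category_members "
        "ORDER BY collation_key, sort_key, page_name"
    )
    return [row[0] for row in rows]
```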

If the SQL engine does not have support for dynamic collations, then the
alternate (user locale-based) sort orders will not be easy to implement,
because of the cost of sorting on the SQL client side (in PHP) for heavily
populated categories, which is exactly where true locale-based collation
orders are the most wanted. The alternative is for the database to store
multiple collation keys (for distinct specific locales). But storing multiple
collation keys for different locales can severely impact the server
performance, as it would require an extra join to a separate 1:N table storing
the collation keys indexed by (pageid, locale), instead of storing these keys
in the same SQL table used for storing the category index.
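
The 1:N layout and the extra join it implies might look like the sketch below
(Python with SQLite for brevity). The table and column names are
hypothetical, and the stored keys are hand-written toy values (a romanization
and a zero-padded stroke count), not the output of a real collator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_members (
    page_id  INTEGER PRIMARY KEY,
    category TEXT,
    sort_key TEXT
);
-- One row per (page, locale): the separate 1:N table.
CREATE TABLE collation_keys (
    page_id  INTEGER,
    locale   TEXT,
    coll_key TEXT,
    PRIMARY KEY (page_id, locale)
);
""")

conn.executemany(
    "INSERT INTO category_members VALUES (?, ?, ?)",
    [(1, "Numbers", "一"), (2, "Numbers", "二"), (3, "Numbers", "三")],
)
conn.executemany(
    "INSERT INTO collation_keys VALUES (?, ?, ?)",
    [
        (1, "zh-pinyin", "yi"),  (1, "zh-stroke", "01"),
        (2, "zh-pinyin", "er"),  (2, "zh-stroke", "02"),
        (3, "zh-pinyin", "san"), (3, "zh-stroke", "03"),
    ],
)

def list_category(category, locale):
    # The join below is the extra cost paid on every category listing.
    rows = conn.execute(
        "SELECT m.sort_key FROM category_members AS m "
        "JOIN collation_keys AS k ON k.page_id = m.page_id "
        "WHERE m.category = ? AND k.locale = ? "
        "ORDER BY k.coll_key, m.sort_key",
        (category, locale),
    )
    return [row[0] for row in rows]
```

With a single key column in category_members the join disappears, but then
only one locale can be served from the index.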

Additionally, the stored collation keys will sometimes need to be updated (when
the CLDR data for locale-tailored collations is updated, or when a new Unicode
version adds characters). Updating a large volume of stored collation keys
will require a lot of work, and this can impact the availability of the wiki
project, unless the data model includes a versioning system that allows at
least two versions for the same locale to coexist for some time. The wiki can
then switch from one version to the next once the collation keys have been
regenerated for the new tailoring, and clean up the old keys afterwards.
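
One possible shape for such a versioning scheme, sketched in Python with
SQLite. Everything here is an assumption made for illustration: the two
tables, the integer version numbers, and the two toy key functions standing
in for an old and a new tailoring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collation_keys (
    page_id  INTEGER,
    locale   TEXT,
    version  INTEGER,
    coll_key TEXT,
    PRIMARY KEY (page_id, locale, version)
);
-- Which key version readers should currently use for each locale.
CREATE TABLE active_collation (locale TEXT PRIMARY KEY, version INTEGER);
""")

def build_keys(locale, version, key_func, pages):
    # Writes keys for one version without touching any other version.
    conn.executemany(
        "INSERT INTO collation_keys VALUES (?, ?, ?, ?)",
        [(pid, locale, version, key_func(title)) for pid, title in pages],
    )

def switch_version(locale, version):
    conn.execute(
        "UPDATE active_collation SET version = ? WHERE locale = ?",
        (version, locale),
    )

def cleanup(locale):
    # Drops every key that does not belong to the active version.
    conn.execute(
        "DELETE FROM collation_keys WHERE locale = ? AND version <> "
        "(SELECT version FROM active_collation WHERE locale = ?)",
        (locale, locale),
    )

def list_pages(locale):
    rows = conn.execute(
        "SELECT k.page_id FROM collation_keys AS k "
        "JOIN active_collation AS a "
        "  ON a.locale = k.locale AND a.version = k.version "
        "WHERE k.locale = ? ORDER BY k.coll_key, k.page_id",
        (locale,),
    )
    return [row[0] for row in rows]

pages = [(1, "Zebra"), (2, "apple"), (3, "Eagle")]

conn.execute("INSERT INTO active_collation VALUES ('en', 1)")
build_keys("en", 1, lambda title: title, pages)   # old toy tailoring
order_v1 = list_pages("en")

build_keys("en", 2, str.casefold, pages)          # new keys, built alongside
order_during_rebuild = list_pages("en")           # readers still see v1

switch_version("en", 2)
order_v2 = list_pages("en")

cleanup("en")
remaining_rows = conn.execute("SELECT COUNT(*) FROM collation_keys").fetchone()[0]
```

Readers only ever see one complete version, so the rebuild can run in the
background and the switch itself is a single-row update.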


