https://bugzilla.wikimedia.org/show_bug.cgi?id=164

--- Comment #137 from Roan Kattouw <[email protected]>  2009-11-19 14:17:38 UTC ---
(In reply to comment #134)
> Today, basically, the categories are sorted by [sort key, full page name]
> (possibly truncated to a reasonable size using unique identifier).
> 
This is not true. We're sorting by (sort key, pageID) where the sortkey is
whatever the page sets as sortkey (or the page name if not set), and pageID is
a unique number identifying the page (roughly proportional to creation time).
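
As a rough illustration of that ordering (not the actual MediaWiki code), the
sketch below sorts a few made-up pages by the pair (sort key, page ID). The
`Page` type, the `category_order` helper and the sample data are all
hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Optional


class Page(NamedTuple):
    page_id: int                   # unique page number, grows roughly with creation time
    title: str                     # page name
    sortkey: Optional[str] = None  # explicit sort key, if the page sets one


def category_order(page: Page):
    # Fall back to the page name when no sort key is set;
    # the page ID only breaks ties between equal sort keys.
    key = page.sortkey if page.sortkey is not None else page.title
    return (key, page.page_id)


# Hypothetical sample data.
pages = [
    Page(30, "Zebra", sortkey="Animal"),
    Page(10, "Apple"),
    Page(20, "Ant", sortkey="Animal"),
]

ordered = [p.title for p in sorted(pages, key=category_order)]
print(ordered)
```

The two pages sharing the sort key "Animal" come out in page ID order, and the
page without a sort key is placed by its name.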

> Ideally, the collation keys should be generated "on the fly" by the SQL
> engine itself (because it would allow alternate sort orders, according to
> user locale preferences or according to web query parameters set by GUI
> buttons, especially for Chinese where several sort orders are common: sort
> by Pinyin, sort by radical/strokes, sort by traditional dictionary orders),
> as part of its supported "ORDER BY" clause for getting the list of article
> names to display in categories.
> 
SQL engines probably support this, but using it would result in an unacceptably
inefficient query. Only queries sorting by the actual values of fields are
efficient, and only if there is an index on those fields.

> But if the SQL engine does not have such support, this must be implemented in
> the PHP code and collation keys can be stored in a new data column (the extra
> data column can be added or filled conditionally: if the SQL engine supports
> the needed collations, this column can remain NULL to save storage space).
> 
If you sort this stuff in PHP, you need to grab the entire list before you can
reliably sort it. Doing that for [[Category:Living people]] has no chance of
staying within the memory limit.
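
By contrast, when the database returns rows already ordered by an indexed
(sort key, page ID) pair, the application only ever holds one page of results.
The following is a minimal sketch of that idea, again using SQLite via Python
as a stand-in; the `fetch_page` helper, the page size of 200 and the generated
sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT)")
conn.execute("CREATE INDEX cl_sortkey_idx ON categorylinks (cl_to, cl_sortkey, cl_from)")
# Hypothetical data: 1000 pages, with each sort key shared by two pages.
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [(i, "Living people", "Person %04d" % (i % 500)) for i in range(1, 1001)],
)


def fetch_page(category, after=None, limit=200):
    """Return up to `limit` (sortkey, page_id) rows following the `after` cursor."""
    if after is None:
        sql = ("SELECT cl_sortkey, cl_from FROM categorylinks WHERE cl_to = ? "
               "ORDER BY cl_sortkey, cl_from LIMIT ?")
        params = (category, limit)
    else:
        sortkey, page_id = after
        # Resume strictly after the last row seen; page_id breaks sort key ties.
        sql = ("SELECT cl_sortkey, cl_from FROM categorylinks WHERE cl_to = ? "
               "AND (cl_sortkey > ? OR (cl_sortkey = ? AND cl_from > ?)) "
               "ORDER BY cl_sortkey, cl_from LIMIT ?")
        params = (category, sortkey, sortkey, page_id, limit)
    return conn.execute(sql, params).fetchall()


first = fetch_page("Living people")
second = fetch_page("Living people", after=first[-1])
print(len(first), len(second))
```

Because the page ID makes every (sort key, page ID) pair unique, the last row
of one page is a sufficient cursor for the next, and no full in-memory sort is
needed.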

> If the SQL engine does not have support for dynamic collations, then the
> alternate (user locale-based) sort orders will not be easy to implement
> because of the cost that it would require in the SQL client-side (in PHP) for
> heavily populated categories, where the support for true locale-based
> collation orders is the most wanted, unless the database can store multiple
> collation keys (for distinct specific locales): supporting the storage of
> multiple collation keys for different locales can severely impact the server
> performance as it would require an extra join to a separate 1:N table to
> store the collation keys indexed by (pageid, locale), instead of storing
> these keys in the same SQL table used for storing the category index.
> 
For supporting multiple collations, a separate table sounds a lot more sane in
terms of read performance, as it would suffer less from the problems mentioned
above.
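
One possible shape for such a table, sketched with SQLite via Python: a 1:N
table holding one precomputed key per (page, locale), joined to the category
index at read time. Everything here is an assumption for illustration only:
the `collation_keys` table and its columns, the `toy_key` function (a crude
stand-in for a real collation library), and the two-locale sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    -- Hypothetical 1:N table: one stored collation key per (page, locale).
    CREATE TABLE collation_keys (
        ck_page   INTEGER NOT NULL,
        ck_locale TEXT    NOT NULL,
        ck_key    TEXT    NOT NULL,
        PRIMARY KEY (ck_page, ck_locale)
    );
""")


def toy_key(title, locale):
    # Toy rule only: German-style ordering files "ö" with "o", Swedish-style
    # ordering puts it after "z" ('{' is the ASCII character following 'z').
    lowered = title.lower()
    return lowered.replace("ö", "o") if locale == "de" else lowered.replace("ö", "{")


# Keys are computed once, at write time, for every supported locale.
for page_id, title in [(1, "Zebra"), (2, "Öl"), (3, "Ost")]:
    conn.execute("INSERT INTO page VALUES (?, ?)", (page_id, title))
    conn.execute("INSERT INTO categorylinks VALUES (?, ?)", (page_id, "Words"))
    for locale in ("de", "sv"):
        conn.execute("INSERT INTO collation_keys VALUES (?, ?, ?)",
                     (page_id, locale, toy_key(title, locale)))


def titles_in(locale):
    rows = conn.execute(
        "SELECT page_title FROM categorylinks "
        "JOIN collation_keys ON ck_page = cl_from "
        "JOIN page ON page_id = cl_from "
        "WHERE cl_to = ? AND ck_locale = ? "
        "ORDER BY ck_key, ck_page",
        ("Words", locale),
    ).fetchall()
    return [r[0] for r in rows]


print(titles_in("de"))
print(titles_in("sv"))
```

The read query stays a plain ORDER BY on stored values; the cost of supporting
another locale is paid in extra rows written per page.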

> Additionally, the stored collation keys will sometimes need to be updated
> (when the CLDR data for locale-tailored collations is updated or when the
> Unicode version is updated with new characters): updating a large volume of
> stored collation keys will require a lot of work, and this can impact the
> availability of the wiki project, unless the data model includes a versioning
> system that allows at least two versions for the same locale to coexist for
> some time, and then allows switching from one version to the next before
> cleaning up the old collation keys after the collation keys have been updated
> to the new tailoring.
> 
Expensive writes are better than expensive reads.

