[Bug 164] Support collation by a certain locale (sorting order of characters)

bugzilla-daemon Thu, 22 Jul 2010 03:05:34 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=164


--- Comment #176 from Philippe Verdy <[email protected]> 2010-07-22 10:04:53 UTC ---
Anyway, Aryeh's proposal is not found or documented at the location you
indicate. That page only says that he will try to work on it today, and it
still asks for solutions to questions that remain unanswered in his comment on
the wikitech-l list.

In a similar spirit, the generation of section headings for categories could
also use a separate builtin parser function such as:

: {{COLLATIONMAP:text|locale|level|clusters}}

with similar parameters, and clusters=1 by default (more below). It would
return a non-opaque string that can be displayed in category pages, and that
could even serve as a valid anchor for starting a search at some position in
the list (ordered using the specified locale). By default it should be only a
level-1 collation mapping (only level 1 will be considered for displaying
headings), although some categories could later specify another default
collation level for this mapping.

Note that collation-based mapping is a VERY different concept from
collation-based sortkeys (read the Unicode UCA specification): sortkeys are
opaque and intended to produce a full order of texts, whereas mappings are
readable but only intended to produce a partial order.
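The distinction can be illustrated with a deliberately simplified Python sketch.
It is not the actual UCA algorithm, which uses weight tables and produces opaque
byte strings; the functions `level1` and `sort_key` and their level
approximations are assumptions for illustration. Several words collapse to the
same readable level-1 mapping (a partial order: they tie under one heading),
while the multi-level key compares base letters, then accents, then case, and
so breaks those ties to give a full order.

```python
import unicodedata


def level1(text: str) -> str:
    """Readable level-1 mapping (approximation): drop accents, fold case."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()


def sort_key(text: str) -> tuple:
    """Simplified multi-level key: base letters, then accents, then case.
    Real UCA sort keys are opaque byte strings built from weight tables."""
    nfd = unicodedata.normalize("NFD", text)
    return (
        level1(text),    # level 1: base letters only
        nfd.casefold(),  # level 2: accents matter, case still ignored
        nfd,             # level 3: case matters (raw code point order)
    )


words = ["côte", "Cote", "coté", "cote"]
print({level1(w) for w in words})   # {'cote'}: one heading, all tied
print(sorted(words, key=sort_key))  # ['Cote', 'cote', 'coté', 'côte']
```

The level-3 tie-breaker here uses raw code points, so uppercase sorts before
lowercase, whereas the UCA default puts lowercase first; a real collator would
also apply locale tailorings (such as French backward accent ordering).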

Another optional parameter could be given after the collation level, to
indicate how many locale grapheme clusters should be included in the heading.
By default, headings in categories should use just 1 grapheme cluster from the
text given as the first parameter of {{COLLATIONMAP:text|locale|level|clusters}}
(more below).

In a category you could *LATER* specify additional sort orders (and collation
mappings at the same time) using a syntax like:
{{SORTORDER:locale|level}} (the default collation level will be 1 for
categories).

Example 1: in a Chinese category page, you could specify:
; {{SORTORDER:zh-hans}}
: to indicate that pages in that category will also be available using the
radical/stroke order of sinograms.
; {{SORTORDER:zh-latn}}
: to indicate that pages in that category will also be available using the
Pinyin Latin order.

Example 2: in a Korean category, where the primary order is based on the
decomposition into "jamos", it will be enough to specify:
; {{SORTORDER:ko}}
: (for the default South Korean order of jamos)
; {{SORTORDER:ko-kp}}
: (for the default North Korean order of jamos)

Specifying a collation level other than 1 could generate subheadings for
level 2, but I think it should only display headings at the specified level,
grouping together all items that share the same level-2 collation mapping.

Specifying clusters=2 (or more) in the builtin parser function {{COLLATIONMAP:}}
may be used to generate more precise headings (for example, in heavily
populated English or French categories, use 2 grapheme clusters to generate
headings from the first 2 letters, but keep the collation level at 1). By
default the builtin parser function will not limit the number of grapheme
clusters (so it will remap all the text), but a category could still specify
the maximum number of clusters to consider for headings.

By default, a category would use either:
* clusters=1: section headings are generated only from the first cluster
(this is what categories currently do), or
* clusters=0: no section headings are generated at all
(this default could be set per project).

The generation of section headings (using the same PHP function used by
{{COLLATIONMAP:}}) does not require any modification of the schema. Headings
can be computed and generated on the fly from the retrieved list of pages.
Headings should not even influence the sort order; they are just convenient
groupings for display.
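Because headings are computed only at display time, they can be derived from
the already-sorted page list in a single pass. Below is a minimal Python sketch
under stated assumptions: `heading` and `sections` are hypothetical names,
level 1 is approximated by dropping accents and folding case (no locale
tailoring), and combining marks are not counted as separate clusters. Since
`itertools.groupby` only merges adjacent items, the sketch relies on the list
already being in collation order, which matches the idea that headings group
items without changing their order.

```python
import unicodedata
from itertools import groupby
from typing import Iterable, List, Tuple


def heading(title: str, clusters: int = 1) -> str:
    """Hypothetical level-1 heading: the first `clusters` base letters of the
    title, with accents removed and case folded (no locale tailoring)."""
    nfd = unicodedata.normalize("NFD", title)
    bases = [c for c in nfd if not unicodedata.combining(c)]
    return "".join(bases[:clusters]).casefold()


def sections(sorted_titles: Iterable[str],
             clusters: int = 1) -> List[Tuple[str, List[str]]]:
    """Group an already-sorted page list into (heading, titles) sections.

    clusters=0 means no headings: everything lands in one unnamed section.
    The input order is never changed; groupby only merges adjacent titles.
    """
    titles = list(sorted_titles)
    if clusters <= 0:
        return [("", titles)]
    return [(h, list(group))
            for h, group in groupby(titles, key=lambda t: heading(t, clusters))]


pages = ["Apple", "Ätna", "Banana", "Bär", "bees"]  # already in level-1 order
for h, group in sections(pages):
    print(h, group)
# a ['Apple', 'Ätna']
# b ['Banana', 'Bär', 'bees']
```

With `clusters=2` the same list yields the finer headings "ap", "at", "ba" and
"be", as suggested above for heavily populated categories, and nothing in the
stored data or the sort order has to change.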

The {{COLLATIONMAP:}} function described here would in fact belong to the list
of builtin string functions, as it falls in the same category as other mapping
functions like {{UC:}} or {{LC:}}. It is just another mapping.

The generation of sort keys (using the same PHP function used by {{SORTKEY:}}),
however, requires active parsing to store them in the schema, so it can only be
done later. This development should happen *later* (when the SQL schema for
category indexes is adjusted to support multiple/upgradable collation orders).

So yes, a builtin parser function such as
{{COLLATIONMAP:text|locale|level|clusters}} should first be implemented and
tested on its own before being used to group items under category headings,
but it can be implemented independently of the development and deployment of
the SQL schema for indexing categories with {{SORTKEY:}}, and integrated later
into the code that presents the list of pages to users.

-- 

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
