https://bugzilla.wikimedia.org/show_bug.cgi?id=164





--- Comment #154 from Philippe Verdy <[email protected]>  2009-11-25 23:06:01 
UTC ---
That's another good reason why collation should be supported directly within
the MediaWiki software, which already depends completely on PHP, so it should
really use the best integration possible: the dedicated ICU integration module
for PHP.
This also means that it will be much better to simply store the computed
collation keys directly within the database schema, unless the database adapter
for PHP already supports ICU (and there's a commitment from the database vendor
to integrate ICU as part of its design).
The Todo:ICU in PostgreSQL will be fine once it is actually implemented and
this integration becomes fully supported.
But anyway, the first thing to do is to press the PHP developers to commit to
offering full support and integration of ICU within PHP and, if this is still
not the case, to make sure that the ICU integration module for PHP comes from a
viable project (otherwise MediaWiki will need to develop its own adaptation
layer for collation, one that supports a transparent switch to another PHP
integration module, or to a later integration within PHP core itself).
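
To illustrate the point about storing computed collation keys, here is a
minimal sketch. It assumes a toy key function (toySortKey) that only strips
accents and case; it is NOT the Unicode Collation Algorithm and not what the
ICU module produces. The row shape (PageRow) and all function names are
hypothetical. The point is only that once a key is stored next to each title,
ordering needs nothing more than a plain binary comparison of the keys.

```typescript
// Hypothetical row shape: the title plus its precomputed sort key.
interface PageRow {
  title: string;
  sortKey: string;
}

// Toy stand-in for a real collation key (assumption, not UCA/ICU):
// decompose, drop combining accents, fold case, then append the
// original text as a tie-breaker after a U+0000 separator.
function toySortKey(text: string): string {
  const primary = text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
  return primary + "\u0000" + text;
}

// Computed once, when the row is written.
function makeRow(title: string): PageRow {
  return { title: title, sortKey: toySortKey(title) };
}

// Ordering only compares the stored keys, code unit by code unit.
function orderBySortKey(rows: PageRow[]): PageRow[] {
  return [...rows].sort((a, b) =>
    a.sortKey < b.sortKey ? -1 : a.sortKey > b.sortKey ? 1 : 0
  );
}

const orderedTitles = orderBySortKey(
  ["Zebra", "Émile", "apple", "Eagle"].map(makeRow)
).map((row) => row.title);
// orderedTitles is ["apple", "Eagle", "Émile", "Zebra"]
```

A real deployment would replace toySortKey with keys produced by the collation
library, and the comparison would be done by the database on the stored column.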

The second thing to look for (and that is still missing) is support for an
ICU-like project (or port) for Javascript (for integration on the client side,
in the browser), here also with a Javascript-written adapter layer that allows
replacing the Javascript-written collator with some future API supported
natively by browsers (because it will perform much better).
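
Such an adapter layer could look roughly like the sketch below. The names
CollatorAdapter, ScriptedCollator, NativeCollator and createCollator are
hypothetical. The native side is assumed to be an Intl.Collator object as
later defined by ECMA-402; the scripted fallback is only a placeholder
(case-folded code-unit comparison), not a port of ICU.

```typescript
// The only contract the calling code depends on.
interface CollatorAdapter {
  compare(a: string, b: string): number;
}

// Script-written fallback. Placeholder logic only (assumption):
// compare case-folded text, then break ties on the raw text.
class ScriptedCollator implements CollatorAdapter {
  compare(a: string, b: string): number {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    if (x < y) return -1;
    if (x > y) return 1;
    return a < b ? -1 : a > b ? 1 : 0;
  }
}

// Thin wrapper around a natively supported collator.
class NativeCollator implements CollatorAdapter {
  private readonly inner: Intl.Collator;

  constructor(locale: string) {
    this.inner = new Intl.Collator(locale);
  }

  compare(a: string, b: string): number {
    return this.inner.compare(a, b);
  }
}

// Picks the native implementation when the runtime offers one.
function createCollator(locale: string, preferNative: boolean = true): CollatorAdapter {
  const hasNative =
    typeof Intl !== "undefined" && typeof Intl.Collator === "function";
  return preferNative && hasNative
    ? new NativeCollator(locale)
    : new ScriptedCollator();
}

// Callers never need to know which implementation they got.
function sortTitles(titles: string[], collator: CollatorAdapter): string[] {
  return [...titles].sort((a, b) => collator.compare(a, b));
}
```

For example, sortTitles(cells, createCollator("fr")) keeps working unchanged
when the scripted collator is later swapped for a native one.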

The best integration tools (for client-side collation) that I have seen, using
Javascript, fully depend on AJAX (i.e. on collaboration with server-side
scripts that can provide precomputed collation data, or that can compute the
collation keys from the client-provided texts): some interesting demos use
JSON or XML requests through AJAX, but this adds some delay and increases the
number of HTTP requests needed to sort lots of client-side data (for example
when sorting the rendered HTML table columns, which currently just uses the
Javascript "localeCompare" function, which seems to use only the DUCET or some
locale-neutral collation, without taking into account the actual locale).
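
A minimal sketch of the AJAX approach, to show where the extra requests come
from and how batching limits them. The transport is injected as a function
(KeyFetcher) so that no particular server endpoint is assumed; the names, the
batch size and the "one key per text, in order" response shape are all
hypothetical.

```typescript
// Hypothetical transport: sends texts to the server and resolves to
// one collation key per text, in the same order.
type KeyFetcher = (texts: string[]) => Promise<string[]>;

// Split the cells into batches so a large table costs only a few
// HTTP round trips instead of one per cell.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Keys are opaque strings that only need a binary comparison.
function compareKeys(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Fetch keys batch by batch, then sort the cells by their keys.
async function sortWithServerKeys(
  cells: string[],
  fetchKeys: KeyFetcher,
  batchSize: number = 200
): Promise<string[]> {
  const keyed: { text: string; key: string }[] = [];
  for (const batch of toBatches(cells, batchSize)) {
    const keys = await fetchKeys(batch);
    batch.forEach((text, i) => {
      keyed.push({ text: text, key: keys[i] });
    });
  }
  keyed.sort((x, y) => compareKeys(x.key, y.key));
  return keyed.map((entry) => entry.text);
}
```

Each batch still costs one round trip before the table can be reordered, which
is the delay mentioned above.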

It would be much better if all major browser engines (for IE, Mozilla for
Firefox, WebKit for Safari/Chrome/KHTML) decided to extend the very poor
support of Unicode and locales within Javascript/ECMAScript strings, using ICU
as a base foundation or at least for the services API that it can implement
(even if those browsers use different integration strategies): they should
still support the same collation rules, with the same syntax in a similar
language, such as the languages already documented in the Unicode standard,
including the possibility to use the collation tailoring data already coming
from the CLDR project, and the possibility for these implementations to still
support user-specified tailorings (so without hardcoding them in a way that
would completely depend on the implemented Unicode version and the limited list
of locales already supported by CLDR).

There are two standard languages defined for collation tailorings: one is
XML-based but extremely verbose (it is probably easier to use from a
DOM-based object view, and most probably more efficient at run-time); the
other, equivalent one is much more compact, more readable, and much easier to
specify by users or in scripts. Both syntaxes are automatically and easily
convertible into each other, with equivalent compilation times and
complexities, but the compact form is easier to transmit in small scripts over
HTTP (including through AJAX), and it is much faster to parse as it does not
depend on a resource-hungry XML parser (due to the complex conformance rules
that XML requires). For Javascript clients, a JSON-based syntax for collation
tailorings may be even more efficient, without requiring additional complex
code written in Javascript itself.
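
As a rough sketch of how small a parser for the compact form can be, and of
what a JSON form might look like: the code below handles only a simplified
subset of the compact rule syntax (the "&" reset and the "<", "<<", "<<<" and
"=" relations, with no quoting, options or comments). The JSON shape
(TailoringRule) is a hypothetical one chosen for illustration, not a defined
standard.

```typescript
type Strength = "primary" | "secondary" | "tertiary" | "identical";

// Hypothetical JSON shape for one tailoring rule.
interface Relation {
  strength: Strength;
  text: string;
}
interface TailoringRule {
  reset: string;
  relations: Relation[];
}

// Maps a relation operator to its strength, or undefined otherwise.
function strengthOf(token: string | undefined): Strength | undefined {
  switch (token) {
    case "<": return "primary";
    case "<<": return "secondary";
    case "<<<": return "tertiary";
    case "=": return "identical";
    default: return undefined;
  }
}

// A text token is anything that is neither "&" nor an operator.
function isText(token: string | undefined): token is string {
  return token !== undefined && token !== "&" && strengthOf(token) === undefined;
}

// Parses e.g. "&N < ñ <<< Ñ" (simplified subset only).
function parseCompactRules(source: string): TailoringRule[] {
  // Longest operators first, then runs of ordinary characters.
  const tokens: string[] = source.match(/&|<<<|<<|<|=|[^&<=\s]+/g) || [];
  const rules: TailoringRule[] = [];
  let i = 0;
  while (i < tokens.length) {
    const reset = tokens[i + 1];
    if (tokens[i] !== "&" || !isText(reset)) {
      throw new Error("expected '& <text>' at token " + i);
    }
    const rule: TailoringRule = { reset: reset, relations: [] };
    i += 2;
    while (i < tokens.length && tokens[i] !== "&") {
      const strength = strengthOf(tokens[i]);
      const text = tokens[i + 1];
      if (strength === undefined || !isText(text)) {
        throw new Error("malformed relation at token " + i);
      }
      rule.relations.push({ strength: strength, text: text });
      i += 2;
    }
    rules.push(rule);
  }
  return rules;
}

const spanishLike = JSON.stringify(parseCompactRules("&N < ñ <<< Ñ"));
```

The resulting JSON text can be read back on the client with the built-in JSON
parser, so no rule parser at all needs to be shipped to the browser.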


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
