https://bugzilla.wikimedia.org/show_bug.cgi?id=19190

Philippe Verdy <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #12 from Philippe Verdy <[email protected]> 2010-10-10 04:59:41 
UTC ---
"In the case PLURAL: we probably have to modify the way that is calculated on
the server via having a array representation ie( array('1-4':X, '5':Y,
'6-11':Z) (instead of having a php function with switch statements)"

Please note that some messages contain (and do need) several occurrences of
PLURAL: with DISTINCT numeric values. Such messages cannot necessarily be
split into distinct resources (because of language grammar, or the meaning of
the translated FULL sentences, which may require reordering some items).

If all that JavaScript has to do is parse PLURAL: items, with their $n
parameters, I think this is not complicated to implement, because such parsing
will be extremely simple, provided that there's a policy about its presence and
encoding in messages. So it will just look like this basic regexp, which all
JavaScript engines will handle correctly:

/\{\{PLURAL:\$([0-9]+)\|([^}]*)\}\}/

The only restriction is that the part between the pipe and the first closing
brace should not contain any wiki markup, or characters such as newlines,
braces, or pipes other than those separating the forms (however, these
characters may be transmitted by the server as numeric entities, if they are
really present in the source wiki or PHP code). Such a policy is enforceable in
translatable resources sent to and received from Translatewiki.net, or by
correct documentation of the messages to translate.

Then the content of $2 (the text between the first pipe and the first closing
brace) should be splittable immediately on the pipe character into a basic
array.
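
As a rough sketch of the two steps above (match, then split on the pipe), the
following could be enough. The name findPlurals and the shape of the returned
objects are only illustrative, not an existing MediaWiki API, and the "g" flag
is added so that several PLURAL: occurrences in one message are all found:

```javascript
// The regexp from the comment, with the global flag added so that every
// {{PLURAL:$n|...}} occurrence in a message is visited.
var PLURAL_RE = /\{\{PLURAL:\$([0-9]+)\|([^}]*)\}\}/g;

// Hypothetical helper: returns one entry per PLURAL: occurrence.
function findPlurals(message) {
  var found = [];
  var m;
  PLURAL_RE.lastIndex = 0; // reset state left over from a previous call
  while ((m = PLURAL_RE.exec(message)) !== null) {
    found.push({
      param: parseInt(m[1], 10), // the n of $n
      forms: m[2].split('|')     // the forms, in source order
    });
  }
  return found;
}

var plurals = findPlurals(
  '$1 {{PLURAL:$1|file|files}} in $2 {{PLURAL:$2|category|categories}}'
);
```

Here plurals holds two entries, one for $1 and one for $2, each with its own
array of forms, which covers the case of several PLURAL: items with distinct
numeric values in the same message.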

The difficulty will be to implement the plural rules according to locale (which
values are plural, and how many forms are needed: consider singular, plural,
dual, few, many, other...): how many locales does the JavaScript have to
support?

Can these rules be encoded in a way that JavaScript can handle the plural rules
correctly for all locales supported by the wiki?
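
One possible encoding, only as a sketch: the server could send each locale's
rules as plain data (so it can travel as JSON) instead of code. The format
below is invented for illustration: each rule is a list of tests, each test
checks n (or n modulo "mod") against inclusive ranges, and the index of the
first rule whose tests all pass selects the form; if none passes, the form
after the last rule is used. The three locales shown use the commonly
documented integer rules and are examples, not a complete table:

```javascript
// Hypothetical data format, one entry per locale. mod: 0 means "use n itself".
var PLURAL_DATA = {
  en: [[{ mod: 0, ranges: [[1, 1]] }]],
  fr: [[{ mod: 0, ranges: [[0, 1]] }]],
  ru: [
    [{ mod: 10, ranges: [[1, 1]] }, { mod: 100, ranges: [[11, 11]], negate: true }],
    [{ mod: 10, ranges: [[2, 4]] }, { mod: 100, ranges: [[12, 14]], negate: true }]
  ]
};

function inRanges(value, ranges) {
  for (var i = 0; i < ranges.length; i++) {
    if (value >= ranges[i][0] && value <= ranges[i][1]) { return true; }
  }
  return false;
}

// Index of the plural form to use for a non-negative integer n.
// Unknown locales have no rules, so they always get index 0.
function pluralIndex(locale, n) {
  var rules = PLURAL_DATA[locale] || [];
  for (var r = 0; r < rules.length; r++) {
    var ok = true;
    for (var t = 0; t < rules[r].length; t++) {
      var test = rules[r][t];
      var value = test.mod ? n % test.mod : n;
      // A test fails when range membership equals its "negate" flag.
      if (inRanges(value, test.ranges) === Boolean(test.negate)) {
        ok = false;
        break;
      }
    }
    if (ok) { return r; }
  }
  return rules.length;
}

// Pick a form, falling back to the last one if too few forms were supplied.
function choosePlural(locale, n, forms) {
  return forms[Math.min(pluralIndex(locale, n), forms.length - 1)];
}
```

With this layout, adding a locale means adding data only, and the same small
interpreter serves all of them. Rules that depend on fractions or on operands
other than the integer value would need a richer format than this one.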

The same could be used for GENDER. The wiki can also provide the JavaScript
with the appropriate external data, along with the message, as additional
properties, without forcing the JavaScript to perform another AJAX or JSON
query to the server each time it detects messages containing a GENDER or PLURAL
substitution function.
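
To illustrate what "additional properties" could look like, here is a sketch
in which the message arrives with the gender of each parameter already
attached, so no further request is needed. The payload shape, the message key
and the function name are all hypothetical, and the code assumes the forms are
ordered male, female, unknown, with the first form reused when a form is
missing:

```javascript
// Hypothetical payload: the text plus per-parameter gender data.
var payload = {
  key: 'example-user-edited',
  text: '{{GENDER:$1|He|She|They}} edited the page.',
  gender: { '1': 'female' }
};

var GENDER_RE = /\{\{GENDER:\$([0-9]+)\|([^}]*)\}\}/g;
var GENDER_ORDER = ['male', 'female', 'unknown']; // assumed order of forms

// Replace every {{GENDER:$n|...}} using the data shipped with the message.
function expandGender(message) {
  return message.text.replace(GENDER_RE, function (match, param, body) {
    var forms = body.split('|');
    var gender = (message.gender && message.gender[param]) || 'unknown';
    var index = GENDER_ORDER.indexOf(gender);
    if (index < 0 || index >= forms.length) {
      index = 0; // unknown value or missing form: reuse the first form
    }
    return forms[index];
  });
}

var expanded = expandGender(payload);
```

The same payload could carry the plural rules for the message's locale, so
that one response contains everything needed to format the message.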

Another difficulty comes with plural forms that cause a change of
grammatical case (notably in Slavic languages), and which also depend on how
sentences containing these conditional plural forms are built: other parts of
the sentence may need to be changed. But it's impossible to predict which part
will be affected (notably if there are several GENDER or PLURAL occurrences in
the message). Should we consider GRAMMAR? Probably not for JavaScript.

Finally there's the problem of wikis that use:
- multiple scripts (including Chinese, for converting between simplified and
traditional ideographs). This requires a complex script to correctly handle the
dynamic message formatting (or character substitutions).
- RTL scripts (Hebrew, Arabic, ...), because they are in fact using a mix of
scripts. The correct rendering of formatted messages often requires specific
BiDi control for embedding some variable items. This is really complex in the
presence of BiDi-neutral or weak characters, notably common final
punctuation: for example, in an RTL wiki, a message that starts or ends with
Latin letters (possibly in the variable part of the sentence) will cause these
characters without strong directionality to be displayed at the wrong place, or
the whole sentence may appear broken or reordered, creating confusion.

Currently, MediaWiki does not handle BiDi gracefully, and offers no easy way to
support correct BiDi embedding of variable elements in the middle of a
sentence, and no easy way to restore the default directionality after this
variable part. Unicode offers BiDi controls, but they are NOT recommended in
HTML, which should use <element dir=""> overrides, or CSS bidi properties.

The solution seems simple but it is not: before and after the variable parts of
the message in the same HTML block element, there needs to be some <span
dir=""> element to embed the static parts, but most resources are not prepared
this way: this has to be done in Translatewiki.net when translating those
resources with variable positions whose content directionality is ambiguous,
variable or unknown (for example user names, page names, native foreign
language names from {{language:}}, or resources autotranslated via {{int:}}).
This in fact affects all wikis, including those in English, not just those
wikis with a default RTL locale.
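
To make the problem concrete, here is a simplified sketch that takes the
opposite and easier route: it wraps each variable part (rather than the static
parts) in a <span dir="..."> whose direction is guessed from the first strong
character of the value. All function names are illustrative, the template is
assumed to be already safe HTML, and the character ranges are a rough
approximation covering only a few Latin, Greek, Cyrillic, Hebrew and Arabic
blocks, not the full Unicode BiDi classes:

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Rough approximations of "strong" characters; NOT the real BiDi classes.
var RTL_CHAR = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFF]/;
var LTR_CHAR = /[A-Za-z\u00C0-\u024F\u0370-\u03FF\u0400-\u04FF]/;

// Direction of the first strong character, or null if there is none.
function guessDir(text) {
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (RTL_CHAR.test(ch)) { return 'rtl'; }
    if (LTR_CHAR.test(ch)) { return 'ltr'; }
  }
  return null;
}

// Wrap one variable value in a span carrying an explicit direction.
function embedParam(value, fallbackDir) {
  var dir = guessDir(String(value)) || fallbackDir;
  return '<span dir="' + dir + '">' + escapeHtml(value) + '</span>';
}

// Substitute $1, $2, ... in a template; unknown parameters are left as-is.
function formatBidi(template, params, messageDir) {
  return template.replace(/\$([0-9]+)/g, function (match, n) {
    var value = params[parseInt(n, 10) - 1];
    return value === undefined ? match : embedParam(value, messageDir);
  });
}

var html = formatBidi('$1 edited $2.', ['Alice', '\u05E9\u05DC\u05D5\u05DD'], 'ltr');
```

Even this small example shows why the problem touches LTR wikis too: the
second parameter is a Hebrew name inside an English sentence, and without the
span the final period could end up on the wrong side of it.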

