Hi,
Aaaah! Plural, gender, etc... Right. As you know, ICU provides a
service for plural forms
(http://icu-project.org/userguide/formatMessages.html#CF). Since your
example for message formatting comes from the same page, I suppose you
already know about that. :)
gettext also covers this issue, including cases where a language has
more than 2 plural forms (e.g. some languages like Arabic have a "dual"
used for "2 things"). So it's more complicated than just "one" and
"many".
In general, for forms that depend on the cardinality of a value, the
best approach is to have a selector in the code and several strings: 2,
3 or more depending on the courage of the coder and the criticality of
the translation.
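To make the selector idea concrete, here is a minimal Python sketch. The names PLURAL_RULES, MESSAGES and format_days are made up for illustration, and the "xx" locale is a deliberately simplified stand-in for a language with a dual, not the real Arabic rule set. For real work, the rules should come from gettext or ICU rather than being hand-written like this.

```python
# Hypothetical selector: each locale maps a count to the index of the
# string form to use. These rules are illustrative, not authoritative.
PLURAL_RULES = {
    "en": lambda n: 0 if n == 1 else 1,                    # one / other
    "fr": lambda n: 0 if n < 2 else 1,                     # 0 and 1 are singular
    "xx": lambda n: 0 if n == 1 else 1 if n == 2 else 2,   # one / dual / many
}

# One complete string per plural form, per locale.
MESSAGES = {
    "en": ["%(days)d day ago", "%(days)d days ago"],
    "fr": ["il y a %(days)d jour", "il y a %(days)d jours"],
    "xx": ["<one> %(days)d", "<dual> %(days)d", "<many> %(days)d"],
}


def format_days(locale, days):
    """Pick the right form for `days` in `locale`, then substitute."""
    form = PLURAL_RULES[locale](days)
    return MESSAGES[locale][form] % {"days": days}


print(format_days("en", 1))
print(format_days("fr", 0))
print(format_days("xx", 2))
```

The point is that the code only picks an index; every form stays a complete string that the translator controls.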
For gender, it's even more complex as most languages have 2 or 3
genders (adding a "neuter" as in German). And then, you have the
combination of gender and plural in case the language has special
grammatical agreement rules (like French). For instance, a message like:
"%(number)d %(thing)s have been selected" will require no fewer than 4
variations since "selected" will have to be written "sélectionné",
"sélectionnés", "sélectionnée" or "sélectionnées" depending on
(number) and the gender of (thing). That beats your "nuevo/nueva"
example. Would you have guessed that "selected" was influenced by the
arguments (number) and (thing)? And French is easy compared to German
or Baltic or Slavic languages. Not to mention Asian languages...
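Here is a rough Python sketch of what those 4 variations could look like with a selector. The tables NOUNS_FR and SELECTED_FR and the function selected_message are hypothetical names, and the sketch assumes the gender of each localized noun is stored next to it.

```python
# Hypothetical noun table: each localized noun carries its grammatical
# gender and its own singular/plural spellings.
NOUNS_FR = {
    "object": {"gender": "m", "forms": ("objet", "objets")},
    "texture": {"gender": "f", "forms": ("texture", "textures")},
}

# One complete sentence per (gender, is_plural) pair: 4 variations in total.
SELECTED_FR = {
    ("m", False): "%(number)d %(thing)s a été sélectionné",
    ("m", True): "%(number)d %(thing)s ont été sélectionnés",
    ("f", False): "%(number)d %(thing)s a été sélectionnée",
    ("f", True): "%(number)d %(thing)s ont été sélectionnées",
}


def selected_message(thing, number):
    """Select the sentence that agrees with the noun's gender and number."""
    noun = NOUNS_FR[thing]
    is_plural = number > 1  # French treats 0 and 1 as singular
    template = SELECTED_FR[(noun["gender"], is_plural)]
    word = noun["forms"][1 if is_plural else 0]
    return template % {"number": number, "thing": word}


print(selected_message("texture", 3))
print(selected_message("object", 1))
```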
Even better, in your "nuevo/nueva" example, imagine that, in French,
you have 3 forms "nouvel/nouveau/nouvelle". "nouvel" is used if
[ITEMTYPE] is masculine and its first letter is a vowel. e.g.:
- un nouvel objet
- un nouveau programme
Aaaagh...
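For the curious, the rule can be sketched in a few lines of Python. The function name nouveau_form is made up, and the vowel test is naive on purpose: it ignores the mute "h" and other edge cases, which is exactly why this kind of logic does not belong in application code.

```python
def nouveau_form(noun, gender):
    """Pick nouvel/nouveau/nouvelle for a French noun (naive sketch)."""
    if gender == "f":
        return "nouvelle"
    # Naive check: only looks at the first letter, so it misses the
    # mute "h" and other special cases.
    if noun[0].lower() in "aeiouàâéèêëîïôùûü":
        return "nouvel"
    return "nouveau"


print("un", nouveau_form("objet", "m"), "objet")
print("un", nouveau_form("programme", "m"), "programme")
```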
Frankly, I don't think it's possible to spell out all the possibilities
for all possible languages. The best approach, I think, is to use
caution when creating the strings. Some rules of thumb:
- never concatenate strings: that's just plain evil as it assumes some
general grammar
- use as few arguments as possible in any string: because of the
combinatorial explosion mentioned above and because it's
akin to string concatenation
- when using arguments, simply "print" the argument separating it with
punctuation from the sentence (i.e. don't make it part of the
grammatical meaning) e.g. "Number of files: [NUMBER, integer]"
- use multiple strings and a selector when stuck: don't try to be too
smart creating complex argument lists
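A small Python sketch of the first three rules, with made-up names (files_bad, files_good, TEMPLATES). It contrasts concatenation with a complete string whose single argument sits outside the grammar.

```python
# Fragile: concatenation hard-codes English word order and agreement,
# and leaves the translator with meaningless fragments.
def files_bad(count):
    return "You have " + str(count) + " file" + ("s" if count != 1 else "")


# More robust: one complete string per locale, one named argument,
# separated from the sentence by punctuation.
TEMPLATES = {
    "en": "Number of files: %(number)d",
    "fr": "Nombre de fichiers : %(number)d",
}


def files_good(locale, count):
    return TEMPLATES[locale] % {"number": count}


print(files_bad(1))
print(files_good("en", 1))
print(files_good("fr", 3))
```

The second form reads a little more stiffly, but it needs no plural or gender logic at all.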
Then, it's up to the translator to be a little smart and use his/her
native language artfully to avoid the pitfalls. In your example for
instance, here's what I'd do in French:
"Vous avez reçu un '[ITEMTYPE]' de la part de '[NAME]' il y a
[DAYS,integer] jour(s)"
At least, that's how I'd translate it into French if I had to :)
Cheers,
- Merov
On Feb 20, 2009, at 2:16 PM, Steve Bennetts (Steve Linden) wrote:
Great feedback, thanks!
One other issue I've been thinking about: how to handle
pluralization and gender. For example:
"[NAME] gave you a new [ITEMTYPE] [DAYS,integer] days ago."
There are 3 potential problems here:
1. [NAME] gave - 'gave' might vary based on gender or familiarity.
This is pretty much impossible to solve since there is no practical
way to know the gender or relationship of, say, "M Linden".
2. new [ITEMTYPE] - 'new' might vary based on the gender of ITEMTYPE.
In this case we could specify the gender, since ITEMTYPE is
presumably in a localized table somewhere. We could do something
like '[ITEMTYPE] [nuevo|nueva,gender(ITEMTYPE)]'. Has anyone seen
anything like this before?
3. [DAYS,integer] days - we see this problem in English all the time:
"1 days ago". We could do something similar to the above example:
'[DAYS,integer] [day|days,plural(DAYS)]'. Again, any good references
for this sort of thing?
Thanks,
-Steve
Philippe Bossut (Merov Linden) wrote:
Hi,
As someone who did i18n/l10n in a former project (and even did
translations from English to French...), here are my comments on this
subject:
i18n (internationalization):
On Feb 17, 2009, at 11:48 AM, Steve Linden wrote:
The I18N dev team is going to be tackling date, time, number, and
currency localization issues in the next couple of quarters. We
are looking at existing standards for replacing text inside a
message and want to cover as many as possible before making a
decision. Some possibilities that we are looking at include ICU
and XSLT. If anyone on this list is familiar with any other good
options, please reply to this thread.
- ICU is great! It uses the Olson tables for date/time locale and
time-zone-sensitive formatting. Time zone support in particular can
be mind blowing. Don't underestimate this and think you can do your
own home brew "simple" version: TZ support is anything but
simple... ICU is by far the best here.
- Make sure you support primary and secondary locales as lots of
people use 2 (a primary and a fallback).
- Make sure you support the country flavors (e.g. fr_CA, fr_BE,
etc...). Beware of their influence on data formatting (use of "."
instead of "," for decimal separator for instance)
- You didn't mention "sorting" in your list. That's also a service
provided by ICU and should be used when presenting lists to users
(and we've plenty of this in SL)
- There's also a Python version of ICU (PyICU) which can prove
useful considering we've quite a bit of Python code floating around
(though none with user facing strings... yet...)
- What about providing l10n for LSL? (/me ducks...) Seriously,
that'd be really cool...
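To illustrate the primary/fallback and country flavor points, here is a minimal Python sketch of a lookup chain. The names fallback_chain, lookup and STRINGS are hypothetical, and the tables are toy data. ICU has its own resource fallback mechanism, so this only shows the idea.

```python
def fallback_chain(preferred):
    """Expand ['fr_CA', 'en_US'] into ['fr_CA', 'fr', 'en_US', 'en']."""
    chain = []
    for locale in preferred:
        language = locale.split("_")[0]
        # Try the country flavor first, then the bare language.
        for candidate in (locale, language):
            if candidate not in chain:
                chain.append(candidate)
    return chain


# Hypothetical string tables; a real project would load these from resources.
STRINGS = {
    "fr_CA": {"greeting": "Bonjour (Canada)"},
    "fr": {"greeting": "Bonjour", "bye": "Au revoir"},
    "en": {"greeting": "Hello", "bye": "Goodbye", "help": "Help"},
}


def lookup(key, preferred):
    """Return the first translation found along the fallback chain."""
    for locale in fallback_chain(preferred):
        table = STRINGS.get(locale, {})
        if key in table:
            return table[key]
    raise KeyError(key)


user_locales = ["fr_CA", "en_US"]  # primary and secondary locale
print(lookup("greeting", user_locales))
print(lookup("bye", user_locales))
print(lookup("help", user_locales))
```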
l10n (localization):
I am not particularly fond of indexed substitutions, I prefer
name/value pairs, because it gives the translator a little more
context, i.e. it is easier for a translator to look at "At [TIME]
on [DATE], there was [EVENT] on planet [PLANET]" than "At {1,time}
on {1,date}, there was {2} on planet {0,number,integer}."
Our current compromise proposal would look something like this:
std::string bar(const LLSD& sdargs)
{
    // The "bar" resource is: "At [DATE,time] on [DATE,date], there was
    // [EVENT] on planet [PLANET,integer]"
    LLUIString foo = getString("bar");
    foo.setLLSDArgs(sdargs);
    return foo.getString();
}
+1 on (name/value) pairs in the code and big -1 on indexed
substitutions. As a localizer, the less guess work I have to do
about the context of a string, the faster I can get a translation
out. I don't really care about the format that much and your
example could easily be reordered in French as:
"[EVENT] a eu lieu sur [PLANET,integer] le [DATE,date] à
[DATE,time]"
If, however, you plan to localize Python scripts as well, you may want
to use Python syntax rather than your own, i.e.:
"At %(time)s on %(date)s, there was an %(event)s on planet
%(planet)d"
But, heck, again, I've no religion here.
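For what it's worth, here is a short Python sketch showing why named arguments make reordering painless. TEMPLATES, describe and the sample values are made up for illustration, and the French string is just one possible wording.

```python
# The same named arguments feed both strings; each translation is free
# to put them in whatever order its grammar needs.
TEMPLATES = {
    "en": "At %(time)s on %(date)s, there was an %(event)s on planet %(planet)d",
    "fr": "%(event)s a eu lieu sur la planète %(planet)d le %(date)s à %(time)s",
}


def describe(locale, **args):
    """Substitute the named arguments into the template for `locale`."""
    return TEMPLATES[locale] % args


args = {"time": "14:16", "date": "2009-02-20", "event": "eclipse", "planet": 3}
print(describe("en", **args))
print(describe("fr", **args))
```

The calling code is identical for both locales; only the string changes.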
One question: which translation tool will be available to
translators? I used poedit in the past (http://www.poedit.net/) and
it's pretty handy. That also opens the door for sldev community
members to participate in the localization process. Of course, that
supposes that there's a tool to convert SL resources to the .po
format and back. Any plan for doing this?
Cheers,
- Merov
_______________________________________________
Policies and (un)subscribe information available here:
http://wiki.secondlife.com/wiki/SLDev
Please read the policies before posting to keep unmoderated posting
privileges