Re: [sldev] Looking at I18N formatting standards

Philippe Bossut (Merov Linden) Fri, 20 Feb 2009 15:51:37 -0800

Hi,

Aaaah! Plural, gender, etc... Right. As you know, ICU provides a service for plural forms (http://icu-project.org/userguide/formatMessages.html#CF). Since your example for message formatting comes from the same page, I suppose you already know about that. :)

gettext also covers the issue, even including cases where you have more than 2 forms for plurality (e.g. some languages like Arabic have a "dual" used for "2 things"). So it's more complicated than just "one" and "many".

In general, for forms that depend on the cardinality of a value, the best approach is to have a selector in the code and a plurality of strings: 2, 3 or more depending on the courage of the coder and the criticality of the translation.
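A minimal Python sketch of the "selector plus several strings" idea. The function names and the string table are hypothetical, and the plural rules are simplified; a real implementation would take them from ICU/CLDR or from a gettext catalog header rather than hand-code them.

```python
# Simplified plural-category selectors (assumption: hand-written for
# illustration; real rules come from ICU/CLDR or gettext).
def category_en(n: int) -> str:
    return "one" if n == 1 else "other"

def category_fr(n: int) -> str:
    # French treats 0 and 1 as singular.
    return "one" if n in (0, 1) else "other"

def category_ar(n: int) -> str:
    # Arabic has more than two forms, including a dual for exactly 2.
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    if 3 <= n % 100 <= 10:
        return "few"
    if 11 <= n % 100 <= 99:
        return "many"
    return "other"

SELECTORS = {"en": category_en, "fr": category_fr, "ar": category_ar}

# One complete string per category; the translator owns each whole phrase.
STRINGS = {
    "en": {"one": "%(days)d day ago", "other": "%(days)d days ago"},
    "fr": {"one": "il y a %(days)d jour", "other": "il y a %(days)d jours"},
}

def days_ago(lang: str, days: int) -> str:
    """Pick the string for the plural category of `days`, then fill it in."""
    category = SELECTORS[lang](days)
    return STRINGS[lang][category] % {"days": days}
```

The code only decides which string to use; it never assembles a sentence out of fragments.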

For gender, it's even more complex as most languages have 2 or 3 genders (adding a "neuter" like in German). And then, you have the combination of gender and plural in case the language has special grammatical agreement rules (like French). For instance, a message like: "%(number)d %(thing)s have been selected" will require no less than 4 variations since "selected" will have to be written "sélectionné", "sélectionnés", "sélectionnée" or "sélectionnées" depending on (number) and the gender of (thing). That beats your "nuevo/nueva" example. Would you have guessed that "selected" was influenced by the arguments (number) and (thing)? And French is easy compared to German or Baltic or Slavic languages. Not to mention Asian languages...
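To make the four French variations concrete, here is a rough Python sketch. The item table, its keys and the function name are hypothetical; the only assumption about the data is that each localized item carries its grammatical gender and its singular and plural noun forms.

```python
# Hypothetical localized item table: gender plus singular/plural noun forms.
ITEMS_FR = {
    "object": {"gender": "m", "one": "objet", "other": "objets"},
    "texture": {"gender": "f", "one": "texture", "other": "textures"},
}

# One full sentence per (gender, is_plural) pair: four variations in total.
# Note that the auxiliary ("a" / "ont") changes with number as well.
SELECTED_FR = {
    ("m", False): "%(number)d %(thing)s a été sélectionné",
    ("m", True): "%(number)d %(thing)s ont été sélectionnés",
    ("f", False): "%(number)d %(thing)s a été sélectionnée",
    ("f", True): "%(number)d %(thing)s ont été sélectionnées",
}

def selected_message(number: int, item: str) -> str:
    """Select the sentence matching the item's gender and the count."""
    entry = ITEMS_FR[item]
    is_plural = number >= 2  # French: 0 and 1 take the singular
    noun = entry["other"] if is_plural else entry["one"]
    template = SELECTED_FR[(entry["gender"], is_plural)]
    return template % {"number": number, "thing": noun}
```

Even this small case needs a two-dimensional selector, which is why keeping arguments out of the grammar is usually the cheaper option.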

Even better, in your "nuevo/nueva" example, imagine that, in French, you have 3 forms "nouvel/nouveau/nouvelle". "nouvel" is used if [ITEMTYPE] is masculine and its first letter is a vowel. e.g.:

- un nouvel objet
- un nouveau programme

Aaaagh...

Frankly, I don't think it's possible to spell out all the possibilities for all possible languages. The best approach, I think, is to use caution when creating the strings. Some rules of thumb:
- never concatenate strings: that's just plain evil, as it assumes some general grammar
- use as few arguments as possible in any string: because of the combinatorial explosion mentioned above, and because it amounts to string concatenation
- when using arguments, simply "print" the argument, separating it from the sentence with punctuation (i.e. don't make it part of the grammatical meaning), e.g. "Number of files: [NUMBER, integer]"
- use multiple strings and a selector when stuck: don't try to be too smart by creating complex argument lists
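As a small illustration of the first and third rules, here is a hedged Python sketch. The message table and function names are made up for the example, and Python's %(name)s syntax stands in for the [NUMBER, integer] notation.

```python
# Fragile: gluing fragments together bakes English word order and English
# plural rules into the code, so a translator cannot fix the result.
def concatenated(count: int) -> str:
    return "You have " + str(count) + " new " + ("file" if count == 1 else "files")

# Safer: one complete string per language, with the argument merely
# "printed" after punctuation so it carries no grammatical role.
MESSAGES = {
    "en": "Number of files: %(number)d",
    "fr": "Nombre de fichiers : %(number)d",
}

def file_count(lang: str, number: int) -> str:
    return MESSAGES[lang] % {"number": number}
```

The second form reads a little less naturally, but it stays correct for any value and in any language.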

Then, it's up to the translator to be a little smart and use his/her native language artfully to avoid the pitfalls. In your example for instance, here's what I'd do in French:

"Vous avez reçu un '[ITEMTYPE]' de la part de '[NAME]' il y a [DAYS,integer] jour(s)"


At least, that's how I'd translate that into French if I had to :)

Cheers,
- Merov


On Feb 20, 2009, at 2:16 PM, Steve Bennetts (Steve Linden) wrote:

Great feedback, thanks!
One other issue I've been thinking about: how to handle pluralization and gender. For example:
"[NAME] gave you a new [ITEMTYPE] [DAYS,integer] days ago."

There are 3 potential problems here:
1. [NAME] gave - 'gave' might vary based on gender or familiarity. This is pretty much impossible to solve since there is no practical way to know the gender or relationship of, say, "M Linden".
2. new [ITEMTYPE] - 'new' might vary based on the gender of ITEMTYPE. In this case we could specify the gender, since ITEMTYPE is presumably in a localized table somewhere. We could do something like '[ITEMTYPE] [nuevo|nueva,gender(ITEMTYPE)]'. Has anyone seen anything like this before?
3. [DAYS,integer] days - we see this problem in English all the time: "1 days ago". We could do something similar to the above example: '[DAYS,integer] [day|days,plural(DAYS)]'. Again, any good references for this sort of thing?
Thanks,
-Steve



Philippe Bossut (Merov Linden) wrote:
Hi,
As someone who did i18n/l10n in a former project (and even did translations from English to French...), here are my comments on this subject:
i18n (internationalization):
On Feb 17, 2009, at 11:48 AM, Steve Linden wrote:
The I18N dev team is going to be tackling date, time, number, and currency localization issues in the next couple of quarters. We are looking at existing standards for replacing text inside a message and want to cover as many as possible before making a decision. Some possibilities that we are looking at include ICU and XSLT. If anyone on this list is familiar with any other good options, please reply to this thread.
- ICU is great! It uses the Olson tables for date/time locale and time zone sensitive formatting. Time zone support in particular can be mind blowing. Don't underestimate this and think you can do your own home brew "simple" version: TZ support is anything but simple... ICU is by far the best here.
- Make sure you support primary and secondary locales as lots of people use 2 (a primary and a fallback).
- Make sure you support the country flavors (e.g. fr_CA, fr_BE, etc...). Beware of their influence on data formatting (use of "." instead of "," for decimal separator for instance)
- You didn't mention "sorting" in your list. That's also a service provided by ICU and should be used when presenting lists to users (and we've plenty of this in SL)
- There's also a Python version of ICU (PyICU) which can prove useful considering we've quite a bit of Python code floating around (though none with user facing strings... yet...)
- What about providing l10n for LSL? (/me ducks...) Seriously, that'd be really cool...
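The primary/secondary locale and country-flavor points can be sketched as a lookup chain. This is only an illustration in plain Python: the catalog contents and function names are hypothetical, and a real system would lean on ICU or gettext for the resolution.

```python
# Hypothetical string catalogs keyed by locale name.
CATALOGS = {
    "fr_CA": {"email": "courriel"},
    "fr": {"email": "e-mail", "greeting": "Bonjour"},
    "en": {"email": "email", "greeting": "Hello", "farewell": "Goodbye"},
}

def fallback_chain(primary: str, secondary: str) -> list:
    """Try each locale as given, then its bare language, primary first."""
    chain = []
    for locale in (primary, secondary):
        for candidate in (locale, locale.split("_")[0]):
            if candidate not in chain:
                chain.append(candidate)
    return chain

def lookup(key: str, primary: str, secondary: str) -> str:
    """Return the first translation found along the chain, else the key."""
    for locale in fallback_chain(primary, secondary):
        catalog = CATALOGS.get(locale, {})
        if key in catalog:
            return catalog[key]
    return key
```

With a primary of fr_CA and a secondary of en, the Canadian string wins where it exists, generic French comes next, and English is the last resort.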
l10n (localization):
I am not particularly fond of indexed substitutions, I prefer name/value pairs, because it gives the translator a little more context, i.e. it is easier for a translator to look at "At [TIME] on [DATE], there was [EVENT] on planet [PLANET]" than "At {1,time} on {1,date}, there was {2} on planet {0,number,integer}."
Our current compromise proposal would look something like this:

std::string bar(const LLSD& sdargs)
{
    LLUIString foo = getString("bar"); // bar = "At [DATE,time] on [DATE,date], there was [EVENT] on planet [PLANET,integer]";
    foo.setLLSDArgs(sdargs);
    return foo.getString();
}
+1 on (name/value) pairs in the code and big -1 on indexed substitutions. As a localizer, the less guess work I have to do about the context of a string, the faster I can get a translation out. I don't really care about the format that much and your example could easily be reordered in French as: "[EVENT] a eu lieu sur [PLANET,integer] le [DATE,date] à [DATE,time]"
If you also plan to localize Python scripts, however, you may want to use Python syntax rather than your own, i.e.: "At %(time)s on %(date)s, there was an %(event)s on planet %(planet)d"
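For what it's worth, here is how the reordering plays out with Python's own named substitution. The argument values are invented for the example, and in a real catalog the event name would be localized too.

```python
# The same name/value pairs feed both strings; each translation is free
# to place the arguments in whatever order its grammar requires.
EN = "At %(time)s on %(date)s, there was an %(event)s on planet %(planet)d"
FR = "%(event)s a eu lieu sur %(planet)d le %(date)s à %(time)s"

# Hypothetical values, for illustration only.
args = {"time": "12:30", "date": "2053-07-03", "event": "eclipse", "planet": 7}

english = EN % args
french = FR % args
```

No change is needed on the code side when a translator moves the arguments around, which is the main benefit over positional or indexed placeholders.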
But, heck, again, I've no religion here.
One question: which translation tool will be available to translators? I used poedit in the past (http://www.poedit.net/) and it's pretty handy. That also opens the door for sldev community members to participate in the localization process. Of course, that supposes that there's a tool to convert SL resources to the .po format and back. Any plan for doing this?
Cheers,
- Merov



_______________________________________________
Policies and (un)subscribe information available here:
http://wiki.secondlife.com/wiki/SLDev
Please read the policies before posting to keep unmoderated posting privileges
