Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

Graydon Hoare Thu, 09 May 2013 17:42:03 -0700

On 13-05-09 04:26 AM, Matthieu Monrocq wrote:

> However, I am not too sure about the idea of string -> string mapping.
> The example you give here is actually slightly more complicated because
> there are several orthogonal axes:


Hm. I think you're missing what I mean. I mean that the interface --
literally the localization-library interface we're going to be talking
to on a given OS -- takes a string and returns a string. Translations
are stored in string->string maps, and they are edited on websites and
with tools that store each translated string as another string.
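For concreteness, here is a minimal Rust sketch of that string-in, string-out model. It assumes a hypothetical in-memory store (the `Catalogs` type and `translate` function are illustrative, not any real library's API) keyed first by a locale chosen at runtime and then by source string. When no translation exists it returns the source string, much as gettext does for untranslated messages.

```rust
use std::collections::HashMap;

/// Hypothetical store: locale -> (source string -> translated string).
type Catalogs = HashMap<&'static str, HashMap<&'static str, &'static str>>;

/// String in, string out. `locale` is only known at runtime; if the locale
/// or the message is missing, the source string is returned unchanged.
fn translate(catalogs: &Catalogs, locale: &str, msgid: &str) -> String {
    match catalogs.get(locale).and_then(|c| c.get(msgid)) {
        Some(translated) => translated.to_string(),
        None => msgid.to_string(),
    }
}

fn main() {
    let mut fr = HashMap::new();
    fr.insert("Open file", "Ouvrir le fichier");
    let mut catalogs: Catalogs = HashMap::new();
    catalogs.insert("fr", fr);

    // A real program would read the locale from the environment.
    let locale = "fr";
    assert_eq!(translate(&catalogs, locale, "Open file"), "Ouvrir le fichier");
    // Unknown locale: fall back to the source string.
    assert_eq!(translate(&catalogs, "de", "Open file"), "Open file");
}
```

The point of the shape is that nothing about the translated text is known to the program at compile time; it only sees another string.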

I'm not a translation expert by any means; I'm just trying to
reverse-engineer their requirements, and I think you're misunderstanding
them. The "translation produces a single string" model is, I think,
wired into all the tooling.

And more importantly (see below...)

> My point is, therefore, that even a seemingly innocent looking sentence
> like this one actually turns into a monster:
> 
>   "{0} {1, select, singular {{2, select, female {est allée} other {est
> allé}}}, other {{2, select, female {sont allées} other {sont allés}}}}
> {3, select, singular {{4, female {à la} other {au}}} other {aux}} {5}"
> 
>   (note: I apologize if the { and } are mismatched... I gave up)

Ok, three things to note here:

  - Any expression of that conditional logic is going to be ugly,
    but it is actually required for the translator to give an
    accurate translation.

  - The odds are that not all those values will be runtime-variable;
    the parts that aren't can be directly translated. The switching
    is _just_ to defer a decision to runtime based on the provided
    substitution value.

  - The important part: you can't ask a translator to express this
    "as rust code" because the _locale_ is also a runtime setting;
    that is, the translation string is evaluated at runtime
    based on whatever-gettext()-returns. The programmer cannot
    accommodate the translator's switch-logic because it is neither
    static (locale varies at runtime) nor the same between locales
    (logical structure varies with locale).
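To make the runtime side concrete, here is a minimal Rust sketch of evaluating such a string at runtime, loosely modeled on the ICU MessageFormat `select` syntax used in the quoted example. The function names are hypothetical; it handles only `{var}` and `{var, select, key {text} ... other {text}}`, has no escaping or plural rules, and panics on malformed input. The template stands in for whatever the translator returned for the current locale; the program supplies only values, including a person's gender.

```rust
use std::collections::HashMap;

/// Render `msg`, substituting `{var}` and resolving
/// `{var, select, key {text} ... other {text}}` against runtime `args`.
fn render(msg: &str, args: &HashMap<&str, &str>) -> String {
    let chars: Vec<char> = msg.chars().collect();
    let mut pos = 0;
    render_seq(&chars, &mut pos, args)
}

/// Render text up to an unmatched '}' (left for the caller) or end of input.
fn render_seq(s: &[char], pos: &mut usize, args: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    while *pos < s.len() {
        match s[*pos] {
            '{' => {
                *pos += 1;
                out.push_str(&render_arg(s, pos, args));
            }
            '}' => break,
            c => {
                out.push(c);
                *pos += 1;
            }
        }
    }
    out
}

/// Called just after a '{'; consumes through the matching '}'.
fn render_arg(s: &[char], pos: &mut usize, args: &HashMap<&str, &str>) -> String {
    let name = read_until(s, pos, &[',', '}']);
    let value = args.get(name.trim()).copied().unwrap_or("");
    if s[*pos] == '}' {
        // Plain `{var}` substitution.
        *pos += 1;
        return value.to_string();
    }
    *pos += 1; // skip ','
    let kind = read_until(s, pos, &[',']);
    assert_eq!(kind.trim(), "select", "this sketch only handles `select`");
    *pos += 1; // skip ','
    let (mut chosen, mut fallback) = (None, String::new());
    loop {
        skip_ws(s, pos);
        if s[*pos] == '}' {
            *pos += 1; // end of the whole `{var, select, ...}`
            break;
        }
        let key = read_until(s, pos, &['{']);
        *pos += 1; // skip the branch's '{'
        // Every branch is rendered (so nesting is parsed); only one is kept.
        let body = render_seq(s, pos, args);
        *pos += 1; // skip the branch's '}'
        if key.trim() == value && chosen.is_none() {
            chosen = Some(body);
        } else if key.trim() == "other" {
            fallback = body;
        }
    }
    chosen.unwrap_or(fallback)
}

fn read_until(s: &[char], pos: &mut usize, stops: &[char]) -> String {
    let start = *pos;
    while *pos < s.len() && !stops.contains(&s[*pos]) {
        *pos += 1;
    }
    s[start..*pos].iter().collect()
}

fn skip_ws(s: &[char], pos: &mut usize) {
    while *pos < s.len() && s[*pos].is_whitespace() {
        *pos += 1;
    }
}

fn main() {
    let fr = "{name} {gender, select, female {est allée} other {est allé}} au marché";
    let mut args = HashMap::new();
    args.insert("name", "Marie");
    args.insert("gender", "female");
    assert_eq!(render(fr, &args), "Marie est allée au marché");
    args.insert("name", "Jean");
    args.insert("gender", "male"); // no `male` branch, so `other` is used
    assert_eq!(render(fr, &args), "Jean est allé au marché");
}
```

Nothing in the Rust code knows which branches exist; a different locale's string could branch on entirely different values, which is why this logic cannot be written "as rust code" ahead of time.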

I am not trying to be obtuse, just figure out why translators have come
up with this system and what we need to preserve about it. As far as I
can tell, the "balance" between runtime and compile-time variability is
the key factor. So any example has to be very careful to reason about
which things vary and which are constant.

> However, even that example is a bit... too simple. Gender is not
> universal: English people talk about "a table" (neutral) whilst French
> people talk about "une table" (feminine) and Germans talk about "der
> Tisch" (masculine)... so the programmer cannot indicate whether the word
> is feminine or not: it depends on the target language!

That assumes you're talking about a runtime-provided noun being slotted
into a runtime-provided format string. It's of course possible this
could happen, but it's a bit of a corner case within corner cases. The
cases I think the gender selectors are designed for are those where
you're presenting a runtime-variable _person_ in a message (e.g. an email
program or such). And you can pass their gender (assuming they want to
use one of the gender-binary words for it) as a value directly to the
formatter.

A seemingly-good and short-ish slide deck on this is available here. I
recommend reading it:

https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000#slide=id.g1bc43a82_2_14

Especially the "non-goals" slide: there's a limit. They just want to
hit the majority of cases, e.g. "Handle gender - at least for people".

> It seems to me that given the extraordinary complexity that is lurking here:
> 
>  - either you end up with a complicated micro-syntax that you'll have to
> keep buffing up as you discover corner cases in various languages and
> translators keep complaining they cannot do their job.

I think you're overstating it. This is a problem people have been
struggling with for a long time, but they have worked their way towards a
_reasonable_ solution that isn't impossibly complex. There's a
simplified implementation of it here:

https://github.com/SlexAxton/messageformat.js

>  - or you just decouple formatting from translation, and provide a
> separate library for translation (outside of core, most probably)

Layering it might work. I'm not opposed to that. I just thought it worth
looking over the problem space and considering whether it's "too hard"
to support localization from the get-go, and/or whether there'd be any
advantage to combining the design of the two parts. It's pretty
important. We're going to want to localize rustc, and most other things
we write in rust.
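If the two are layered, one minimal way to express it in Rust is to keep the formatting layer ignorant of locales and let the translation layer be any string-to-string function applied first. Everything here (`format_named`, `localized`, the French string) is hypothetical, and the formatter only handles plain `{name}` placeholders.

```rust
/// Hypothetical formatting layer: replaces each `{name}` placeholder with its
/// value. It knows nothing about locales or translation.
fn format_named(template: &str, args: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for &(name, value) in args {
        // Applied one pair at a time, so a value that itself contains
        // `{other}` would be substituted by a later pair.
        let placeholder = format!("{{{}}}", name);
        out = out.replace(placeholder.as_str(), value);
    }
    out
}

/// Compose the layers: translate first, then format. `translate` can be any
/// string -> string function (a gettext binding, a map lookup, ...).
fn localized<F: Fn(&str) -> String>(translate: F, msgid: &str, args: &[(&str, &str)]) -> String {
    format_named(&translate(msgid), args)
}

fn main() {
    let msgid = "{user} has {count} new messages";
    let args = [("user", "Marie"), ("count", "3")];

    // Identity "translation": the untranslated source string.
    let english = localized(|s: &str| s.to_string(), msgid, &args);
    assert_eq!(english, "Marie has 3 new messages");

    // A stand-in for a real catalog lookup.
    let to_french = |s: &str| match s {
        "{user} has {count} new messages" => "{user} a {count} nouveaux messages".to_string(),
        other => other.to_string(),
    };
    assert_eq!(localized(to_french, msgid, &args), "Marie a 3 nouveaux messages");
}
```

The French output is only right because the count happens to be plural. A substitution-only formatter has no way to evaluate a translator's plural or gender choices, so if those are wanted, the format layer still has to interpret the translated string at runtime.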

-Graydon

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
