Fwd: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
> Well my first feeling was that U+202F should work all the time, but I
> found cases where this is not always the case. So this must be bugs in
> those renderers.
>

I think we can attribute these bugs to the fact that this character is
insufficiently known, and not even accessible in most input tools,
including the Windows "Charmap", where it is not even listed with other
spaces or punctuation unless we display the FULL list of characters
supported by a selected font that maps it (many fonts don't) with the
"Unicode" encoding. Windows Charmap is badly outdated and has many
inconsistencies in its proposed grouping: look for example at the groups
proposed for Greek, which are complete nonsense, with duplicate subranges
and completely arbitrary groups, making this basic tool really difficult
to use.

Beside that, none of the input methods offered in Windows provide it
(this is also true on other platforms). So in the end there is not enough
text using it, renderers are not fixed to render it correctly, and
developers think there is no urgency and that the bug is minor. It can
stay for years without ever being corrected (just like the old "Charmap"
on Windows), even if the bug or omission has been reported repeatedly.

This finally tends to perpetuate the old bad practices. This is what
happened with ASCII spreading everywhere, even in scopes where it should
not have been used at all and certainly not selected as the only viable
alternative. The same is seen today with the choice of languages/locales,
where everything that is not English is dismissed as unimportant for users.


Re: Numeric group separators and Bidi

2019-07-09 Thread Egmont Koblinger via Unicode
On Tue, Jul 9, 2019 at 10:43 PM Philippe Verdy wrote:
>
> Well my first feeling was that U+202F should work all the time, but I found 
> cases where this is not always the case. So this must be bugs in those 
> renderers.

Could you share some concrete examples?


Re: Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
Well my first feeling was that U+202F should work all the time, but I found
cases where this is not always the case. So this must be bugs in those
renderers.

And using Bidi controls (LRI/BDI) is absolutely not an option. These
controls are only intended to be used in pure plain-text files that have no
other way to specify the embedding, and whose content is entirely static
(not generated by templates that return data from unspecified locales to an
unspecified locale).

Localizing each item is not possible either. That's why I am looking for
a locale-neutral solution that is acceptable in all languages and does not
give a false interpretation of the actual values of numbers (which can
have different scales or precision, and may carry optional data, not
always present in all items to render but added to the list, for example
as annotations that should still be as locale-neutral as possible).

So U+202F is supposed to be the solution, but I did not find any way to
properly present the decimal separator: it is only unambiguous as a decimal
separator (and not a group separator) if there's a group separator present
in the number (and this is not always true!). And there I'm stuck with the
dot or comma, with no appropriate symbol that would not be confusable
(maybe the small vertical tick hanging from the baseline could replace
both the dot and the comma?).
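
To make the ambiguity concrete, here is a minimal Python sketch. The
helper name group_digits and the choice of a dot as decimal separator are
assumptions made for this illustration only, not a recommended format.

```python
from decimal import Decimal

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used here as the group separator


def group_digits(value):
    """Group the integer part by thousands with U+202F.

    Hypothetical helper: the decimal separator is left as a dot, which is
    exactly the character whose meaning stays ambiguous across locales.
    """
    # Decimal supports the "," format option for thousands grouping.
    return format(Decimal(value), ",").replace(",", NNBSP)


# A group separator is present, so the dot can only be the decimal mark.
long_number = group_digits("1234567.25")

# No group separator is produced here, so a reader cannot tell from the
# string alone whether the dot is a decimal mark or a thousands separator.
short_number = group_digits("123.456")

print(repr(long_number))
print(repr(short_number))
```

The second value contains no U+202F at all, which is the case described
above where the dot cannot be disambiguated by the presence of a group
separator.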





Re: Numeric group separators and Bidi

2019-07-09 Thread Egmont Koblinger via Unicode
Hi Philippe,

What do you mean U+202F doesn't work for you?

Whereas the logical string "hebrew 123 456 hebrew" indeed shows
the number incorrectly as "456 123", this is not the case with U+202F
instead of the space: then the number shows up as "123 456" as expected.

I think you need to pick a character whose BiDi class is "Common
Number Separator", see e.g.
https://www.compart.com/en/unicode/bidiclass/CS for a list of such
characters including U+00A0 no-break space and U+202F narrow no-break
space. This suggests to me that U+202F is a correct choice if you need
the look of a narrow space.
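
As a rough illustration of why the class matters, here is a minimal Python
sketch of rule W4 of UAX #9 taken in isolation (a single separator between
two numbers of the same type takes that type). It is not a bidi
implementation: the function name apply_w4 is made up for this example,
and it ignores every other rule, including those that can turn EN into AN
after Arabic letters.

```python
import unicodedata


def apply_w4(text):
    """Return per-character bidi classes after applying only rule W4.

    Simplified sketch: neighbours are compared using their raw classes
    from the Unicode Character Database, with no other rule applied.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    resolved = classes[:]
    for i in range(1, len(classes) - 1):
        before, sep, after = classes[i - 1], classes[i], classes[i + 1]
        # A single common separator between two numbers of the same type
        # changes to that type.
        if sep == "CS" and before == after and before in ("EN", "AN"):
            resolved[i] = before
        # A single European separator between two European numbers
        # changes to a European number.
        elif sep == "ES" and before == after == "EN":
            resolved[i] = "EN"
    return resolved


for sep_name, sep in [("SPACE", " "), ("THIN SPACE", "\u2009"),
                      ("NARROW NO-BREAK SPACE", "\u202f")]:
    print(sep_name, apply_w4("123" + sep + "456"))
```

With an ordinary space or U+2009 the separator stays WS, so the two digit
groups remain separate runs that a right-to-left context can reorder. With
U+202F the whole number resolves to a single EN run.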

Another possibility is to embed the number in a LRI...PDI block, as
e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%"
fragment of its default example.
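
A minimal Python sketch of that second possibility follows. The helper
name isolate_ltr is hypothetical, and the approach assumes the renderer
supports the isolate controls introduced in Unicode 6.3.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(fragment):
    """Wrap a fragment so it is laid out left-to-right as one isolated unit."""
    return LRI + fragment + PDI


# The digits and the plain space between them are now inside an LTR isolate,
# so the surrounding right-to-left text should no longer reorder the groups.
wrapped = isolate_ltr("123 456")
print([unicodedata.name(ch) for ch in (wrapped[0], wrapped[-1])])
```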

cheers,
egmont




Re: Numeric group separators and Bidi

2019-07-09 Thread Eli Zaretskii via Unicode
> Date: Tue, 9 Jul 2019 20:59:15 +0200
> From: Philippe Verdy via Unicode 
> 
> I can't find a way to use narrow spaces instead of punctuation signs
> (dot or comma) for example in Arabic/Hebrew, for example to present
> tabular numeric data in a really language-neutral way. In Arabic/Hebrew
> we need to use punctuations as group separators because spaces don't
> work (not even the narrow non-breaking space U+202F used in French and
> recommended in ISO), but then these punctuation separators are
> interpreted differently (notably between French and English where the
> interpretation dot and comma are swapped)

Please show an example and describe how you would like it to look on
display.  I don't think I understand the use case(s).


Numeric group separators and Bidi

2019-07-09 Thread Philippe Verdy via Unicode
Is there a narrow space usable as a numeric group separator, and that also
has the same bidi property as digits (i.e. neutral outside the span of
digits and separators, but inheriting the implied directionality of the
previous digit) ?

I can't find a way to use narrow spaces instead of punctuation signs (dot
or comma), for example in Arabic/Hebrew, to present tabular numeric data
in a really language-neutral way. In Arabic/Hebrew we need to use
punctuation as group separators because spaces don't work (not even the
narrow non-breaking space U+202F used in French and recommended in ISO),
but then these punctuation separators are interpreted differently
(notably between French and English, where the interpretations of dot and
comma are swapped).

Note that:
- the "figure space" is not suitable (it has the same width as digits
and is used as a "filler" in tabular data; it also does not have the
correct bidi behavior, as it does not have the same bidi properties as
digits).
- the "thin space" is not suitable (it is breakable)
- the "narrow non-breaking space" U+202F (used in French and currently in
ISO) is not suitable, or maybe I'm wrong and its presence is still neutral
between groups of digits, where it inherits the properties of the previous
digit, but it still does not enforce the bidi direction of the whole span
of digits.
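
The properties of these candidates can be checked against the Unicode
Character Database, for instance with this small Python sketch. The
helper name describe is made up for the example. Note that the <noBreak>
decomposition tag is only used here as a hint of non-breaking behavior:
line breaking is formally governed by the Line_Break property of UAX #14,
which Python's unicodedata module does not expose.

```python
import unicodedata

CANDIDATES = ["\u0020", "\u00a0", "\u2007", "\u2009", "\u202f"]


def describe(ch):
    """Return (name, bidi class, decomposition mapping) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        unicodedata.decomposition(ch),
    )


for ch in CANDIDATES:
    name, bidi_class, decomposition = describe(ch)
    print("U+%04X  %-22s  bidi=%-2s  %s" % (ord(ch), name, bidi_class,
                                           decomposition))
```

In this data the figure space and the thin space both have bidi class WS,
while U+00A0 and U+202F have class CS.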

Can you tell me whether U+202F is really suitable? I made some tests with
various text renderers, and some of them "break" the group of digits by
reordering these groups, completely changing the rendered value (units
become thousands or more, and thousands become units...). But maybe these
are bugs in the renderers.