Fwd: Numeric group separators and Bidi
> Well, my first feeling was that U+202F should work all the time, but I found cases where this is not always the case. So these must be bugs in those renderers.

I think we can attribute these bugs to the fact that this character is insufficiently known and not even accessible in most input tools. That includes the Windows "Charmap", where it is not listed with the other spaces or punctuation unless we display the full list of characters supported by a selected font that maps it (many fonts don't) with the "Unicode" encoding. Windows Charmap is badly outdated and has many inconsistencies in its proposed grouping: look, for example, at the groups proposed for Greek, which are complete nonsense, with duplicate subranges and completely arbitrary groups, making this basic tool really difficult to use. Besides that, none of the input methods offered in Windows provide it (this is also true on other platforms).

So in the end there is not enough text that uses it, renderers are not fixed to render it correctly, and developers think there is no urgency and that the bug is minor. It can stay for years without being corrected (just like the old "Charmap" on Windows), even if the bug or omission has been reported repeatedly.

This tends to perpetuate old bad practices. It is what happened with ASCII spreading everywhere, even in scopes where it should not have been used at all and certainly not selected as the only viable alternative. The same is seen today with the choice of languages/locales, where everything that is not English is downplayed as unimportant for users.
Re: Numeric group separators and Bidi
On Tue, Jul 9, 2019 at 10:43 PM Philippe Verdy wrote:
> Well my first feeling was that U+202F should work all the time, but I found cases where this is not always the case. So this must be bugs in those renderers.

Could you share some concrete examples?
Re: Numeric group separators and Bidi
Well, my first feeling was that U+202F should work all the time, but I found cases where this is not always the case. So these must be bugs in those renderers.

And using Bidi controls (LRI/PDI) is absolutely not an option. These controls are only intended for pure plain-text files that have no other way to specify the embedding, and whose content is entirely static (not generated by templates that return data from unspecified locales to an unspecified locale).

Localizing each item is not possible either. That's why I am looking for a locale-neutral solution that is acceptable in all languages and does not give a false interpretation of the actual values of numbers (which can have different scales or precision, and may come with optional data, not always present in all items to render but added to the list, for example as annotations that should still be as locale-neutral as possible).

So U+202F is supposed to be the solution, but I did not find any way to properly present the decimal separator: it is only unambiguous as a decimal separator (and not a group separator) if there's a group separator present in the number (and this is not always true!). And there I'm stuck with the dot or comma, with no appropriate symbol that would not be confusable (maybe the small vertical tick hanging from the baseline could replace both the dot and the comma?).

On Tue, Jul 9, 2019 at 22:10, Egmont Koblinger wrote:
> Hi Philippe,
>
> What do you mean U+202F doesn't work for you?
>
> Whereas the logical string "hebrew 123 456 hebrew" indeed shows the number incorrectly as "456 123", it's not the case with U+202F instead of space, then the number shows up as "123 456" as expected.
>
> I think you need to pick a character whose BiDi class is "Common Number Separator", see e.g. https://www.compart.com/en/unicode/bidiclass/CS for a list of such characters including U+00A0 no-break space and U+202F narrow no-break space. This suggests to me that U+202F is a correct choice if you need the look of a narrow space.
>
> Another possibility is to embed the number in a LRI...PDI block, as e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%" fragment of its default example.
>
> cheers,
> egmont
Re: Numeric group separators and Bidi
Hi Philippe,

What do you mean U+202F doesn't work for you?

Whereas the logical string "hebrew 123 456 hebrew" indeed shows the number incorrectly as "456 123", it's not the case with U+202F instead of space, then the number shows up as "123 456" as expected.

I think you need to pick a character whose BiDi class is "Common Number Separator", see e.g. https://www.compart.com/en/unicode/bidiclass/CS for a list of such characters including U+00A0 no-break space and U+202F narrow no-break space. This suggests to me that U+202F is a correct choice if you need the look of a narrow space.

Another possibility is to embed the number in a LRI...PDI block, as e.g. https://unicode.org/cldr/utility/bidic.jsp does with the "1–3%" fragment of its default example.

cheers,
egmont

On Tue, Jul 9, 2019 at 9:01 PM Philippe Verdy via Unicode wrote:
> Is there a narrow space usable as a numeric group separator, and that also has the same bidi property as digits (i.e. neutral outside the span of digits and separators, but inheriting the implied directionality of the previous digit) ?
>
> I can't find a way to use narrow spaces instead of punctuation signs (dot or comma) for example in Arabic/Hebrew, for example to present tabular numeric data in a really language-neutral way. In Arabic/Hebrew we need to use punctuations as group separators because spaces don't work (not even the narrow non-breaking space U+202F used in French and recommended in ISO), but then these punctuation separators are interpreted differently (notably between French and English where the interpretation dot and comma are swapped)
>
> Note that:
> - the "figure space" is not suitable (as it has the same width as digits and is used as a "filler" in tabular data; but it also does not have the correct bidi behavior, as it does not have the same bidi properties as digits).
> - the "thin space" is not suitable (it is breakable)
> - the "narrow non-breaking space" U+202F (used in French and currently in ISO) is not suitable, or may be I'm wrong and its presence is still neutral between groups of digits where it inherits the properties of the previous digit, but still does not enforces the bidi direction of the whole span of digits.
>
> Can you point me if U+202F is really suitable ? I made some tests with various text renderers, and some of them "break" the group of digits by reordering these groups, changing completely the rendered value (units become thousands or more, and thousands become units...). But may be these are bugs in renderers.
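For readers who want to try the two suggestions above (U+202F as the group separator, or wrapping the number in LRI...PDI), here is a minimal Python sketch. The helper names `group_digits` and `isolate_ltr` are hypothetical and not from any message in this thread; the code only builds the logical string, and how it is finally displayed still depends on the renderer's bidi implementation.

```python
# Hypothetical helpers (names are illustrative, not from the thread).
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE, proposed group separator
LRI = "\u2066"    # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"    # POP DIRECTIONAL ISOLATE


def group_digits(value: int, sep: str = NNBSP) -> str:
    """Group an integer in threes, using `sep` instead of a comma."""
    # Python's "," format option inserts commas every three digits;
    # we then swap each comma for the requested separator.
    return f"{value:,}".replace(",", sep)


def isolate_ltr(text: str) -> str:
    """Wrap `text` in LRI...PDI so it is laid out as an LTR isolate."""
    return f"{LRI}{text}{PDI}"


if __name__ == "__main__":
    number = group_digits(123456)
    # Show the code points so the invisible characters can be checked.
    print([f"U+{ord(ch):04X}" for ch in number])
    print([f"U+{ord(ch):04X}" for ch in isolate_ltr(number)])
```

The isolate variant adds two invisible control characters around every number, which is the cost Philippe objects to for template-generated data; the U+202F variant adds nothing beyond the separator itself.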
Re: Numeric group separators and Bidi
> Date: Tue, 9 Jul 2019 20:59:15 +0200
> From: Philippe Verdy via Unicode
>
> I can't find a way to use narrow spaces instead of punctuation signs (dot or comma) for example in Arabic/Hebrew, for example to present tabular numeric data in a really language-neutral way. In Arabic/Hebrew we need to use punctuations as group separators because spaces don't work (not even the narrow non-breaking space U+202F used in French and recommended in ISO), but then these punctuation separators are interpreted differently (notably between French and English where the interpretation dot and comma are swapped)

Please show an example and describe how you would like it to look on display. I don't think I understand the use case(s).
Numeric group separators and Bidi
Is there a narrow space usable as a numeric group separator that also has the same bidi property as digits (i.e. neutral outside the span of digits and separators, but inheriting the implied directionality of the previous digit)?

I can't find a way to use narrow spaces instead of punctuation signs (dot or comma) in Arabic/Hebrew, for example to present tabular numeric data in a really language-neutral way. In Arabic/Hebrew we need to use punctuation as group separators because spaces don't work (not even the narrow non-breaking space U+202F used in French and recommended in ISO), but then these punctuation separators are interpreted differently (notably between French and English, where the interpretations of dot and comma are swapped).

Note that:
- the "figure space" is not suitable (it has the same width as digits and is used as a "filler" in tabular data; it also does not have the correct bidi behavior, as it does not have the same bidi properties as digits).
- the "thin space" is not suitable (it is breakable).
- the "narrow non-breaking space" U+202F (used in French and currently in ISO) is not suitable, or maybe I'm wrong and its presence is still neutral between groups of digits, where it inherits the properties of the previous digit, but it still does not enforce the bidi direction of the whole span of digits.

Can you tell me whether U+202F is really suitable? I made some tests with various text renderers, and some of them "break" the group of digits by reordering these groups, completely changing the rendered value (units become thousands or more, and thousands become units...). But maybe these are bugs in the renderers.
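One way to check the bidi properties of the candidates listed above is to query the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module; the `CANDIDATES` table and the `bidi_classes` helper are illustrative names, not something from this thread, and the values reported depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Candidate separators discussed above (the selection is illustrative).
CANDIDATES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "FIGURE SPACE": "\u2007",
    "THIN SPACE": "\u2009",
    "NARROW NO-BREAK SPACE": "\u202F",
    "COMMA": "\u002C",
    "FULL STOP": "\u002E",
}


def bidi_classes(chars):
    """Map each candidate name to its Bidi_Class abbreviation."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


if __name__ == "__main__":
    for name, bidi_class in bidi_classes(CANDIDATES).items():
        print(f"U+{ord(CANDIDATES[name]):04X} {name}: {bidi_class}")
```

In the Unicode data, the figure space and thin space are class WS (whitespace), while U+00A0, U+202F, the comma and the full stop are class CS (common number separator). Under the bidi algorithm's weak-type resolution, a single CS between two European digits is treated as part of the number, which is why a conforming renderer should keep "123" and "456" in order when they are joined by U+202F but not when joined by a WS space.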