Re: Egyptian Hieroglyph Man with a Laptop

2020-02-13 Thread Frédéric Grosshans via Unicode

On 12/02/2020 at 23:30, Michel Suignard wrote:


Interesting that a single character is creating so much feedback, but 
it is not the first time.


Extrapolating from my own case, I guess it’s because hieroglyphs have a 
strong cultural significance — especially to people following Unicode 
encoding — but very few are qualified enough to pass judgement, 
except maybe on this character.



It is true that the glyph in question was not in the base 
Hieroglyphica glyph set (that is why I referenced it as an 
'extension'). Its presence though raises an interesting point 
concerning the abstraction of Egyptian hieroglyphs in general. All 
Egyptian hieroglyph proposals imply some abstraction from the 
original evidence found on stone, wood, or papyrus. At some point you 
have to decide on a level where you feel confident that you created 
enough glyphs to allow meaningful interaction among Egyptologists. 
Because the set represents an extinct system you probably have to be a 
bit liberal in allowing some visual variants (because we can never be 
completely sure two similar-looking signs are 100% equivalent in all 
their possible functions in the writing system and are never used in 
contrast).


This is clearly a difficult problem to tackle with both extinct and 
logographic scripts, and hieroglyphic is both. It is obvious to me (and 
probably to anyone following Unicode encoding) that the work you have 
been doing over the last few years is a very difficult one. By the way, 
you explain this approach very well on page 6, where you discuss the 
“disunification” of *U+14828 N-19-016 and the already encoded 
U+1321A N037A (which would be N-19-017).


These abstract collections started to appear in the first part of 
the nineteenth century (Champollion starting in 1822). Interestingly 
these collections have started to be useful on their own, even if in 
some cases the main use of parts is self-referencing, either because 
the glyph is a known mistake, or a ghost (a character for which 
attestation is now firmly disputed). For example, it would be very 
difficult to create a new set not including the full Gardiner set, 
even if some of the characters are not necessarily justified. To a 
large degree, Hieroglyphica (and its related collection JSesh) has 
obtained that status as well. The IFAO (Institut Français 
d’Archéologie Orientale) set is another one, although there is no 
modern font representing all of it (and many of the IFAO glyphs 
should not be encoded separately).


I see this as a variant of the “round-trip compatibility” principle of 
Unicode adapted to ancient scripts, where the role of “legacy standards” 
is often taken by old scholarly literature.



There is obviously no doubt that the character in question is a modern 
invention and not based on historical evidence. But interestingly 
enough it has started to be used as a pictogram with some content 
value, describing in fact an Egyptologist. It may not belong to that 
block, but it actually describes a use case and has been used as a 
symbol in some technical publications.


I think the main problem I see with this character is that it seems to 
have been sneaked into the main proposal. The text of the proposal seems 
to imply that the characters proposed were either in use in Ancient 
Egypt or correspond to abstractions used by modern (=Champollion and 
later) Egyptologists intended to reflect them.


This character does not fit in this picture, but that does not mean it 
does not belong to the hieroglyphic block: I think modern use of 
hieroglyphs (like e.g. the ones described in “Hieroglyphs For Your Eyes 
Only: Samuel K. Lothrop and His Use of Ancient Egyptian as Cipher”, by 
Pierre Meyrat, 2014, http://www.mesoweb.com/articles/meyrat/Meyrat2014.pdf) 
should use the standard Unicode encoding. There is a precedent for 
encoding modern characters in an extinct script: the encoding of the 
Tolkienian characters U+16F1 to U+16F3 in the Runic block.


But I feel the encoding of such a character needs at the very least to 
be explicitly discussed in the text of the proposal, e.g. by giving 
evidence of its modern use.


Concerning:

The question is then: was this well known to the people reading 
hieroglyphs who checked this proposal? If not, it is very difficult to 
trust the other hieroglyphs, especially if the first explanation is the 
right one: some trap characters could actually look like real ones. 
Except of course if we accept some hieroglyphs for compatibility 
purposes, but this is not mentioned as a valid reason in any proposal yet.


> In my opinion, this is an invalid character, which should not be

> included in Unicode.

I agree.

You are allowed to have your own opinion, but I can tell you I have 
spent a lot of time checking attestations from many sources for the 
proposed repertoire. It won’t be perfect, but perfection (or a closer 
approach to it) would probably cost decades of study while preventing 
current research from having a communication platform.

Re: Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Frédéric Grosshans via Unicode

Le 12/02/2020 à 20:38, Marius Spix a écrit :

That is a pretty interesting finding. This glyph was not part of
http://www.unicode.org/L2/L2018/18165-n4944-hieroglyphs.pdf


It is, as *U+1355A EGYPTIAN HIEROGLYPH A-12-051



but has been first seen in
http://www.unicode.org/L2/L2019/19220-n5063-hieroglyphs.pdf

The only "evidence" for this glyph I could find is a stock photo,
which was clearly made in the 21st century:
https://www.alamy.com/stock-photo-egyptian-hieroglyphics-with-notebook-digital-illustration-57472465.html
I don’t even think it could qualify, since I think the woman in this 
picture would correspond to another hieroglyph, from the B series 
(B-04), not an A-12.


I know that some font creators include so-called trap characters,
similar to the trap streets which are often found in maps to catch
copyright violations. But it is also possible that someone wanted to
smuggle an Easter egg into Unicode, or just test whether the quality
assurance works.


The question is then: was this well known to the people reading 
hieroglyphs who checked this proposal? If not, it is very difficult to 
trust the other hieroglyphs, especially if the first explanation is the 
right one: some trap characters could actually look like real ones. 
Except of course if we accept some hieroglyphs for compatibility 
purposes, but this is not mentioned as a valid reason in any proposal yet.



In my opinion, this is an invalid character, which should not be
included in Unicode.


I agree.

  Frédéric



On Thu, 12 Feb 2020 19:12:14 +0100
Frédéric Grosshans via Unicode  wrote:


Dear Unicode list members (CC Michel Suignard),

    the Unicode proposal L2/20-068
( https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf ),
“Revised draft for the encoding of an extended Egyptian Hieroglyphs
repertoire, Groups A to N”, by Michel Suignard contains a very
interesting hieroglyph at position *U+13579 EGYPTIAN HIEROGLYPH
A-12-054, which seems to represent a man with a laptop, as is obvious
in the attached image.

    I am curious about the source of this hieroglyph: in the table
accompanying the document, its sources are said to be “Hieroglyphica
extension (various sources)” with number A58C and “Hornung & Schenkel
(2007, last modified in 2015)”, but with no number (A;), which seems
unique in the table. This leads me to think that this glyph exists
only in some modern font, either as a joke, or for some
computer-related modern use. Can anyone refute or confirm this
intuition?

     Frédéric






Egyptian Hieroglyph Man with a Laptop

2020-02-12 Thread Frédéric Grosshans via Unicode

Dear Unicode list members (CC Michel Suignard),

  the Unicode proposal L2/20-068 
( https://www.unicode.org/L2/L2020/20068-n5128-ext-hieroglyph.pdf ), 
“Revised draft for the encoding of an extended Egyptian Hieroglyphs 
repertoire, Groups A to N”, by Michel Suignard contains a very 
interesting hieroglyph at position *U+13579 EGYPTIAN HIEROGLYPH 
A-12-054, which seems to represent a man with a laptop, as is obvious 
in the attached image.

  I am curious about the source of this hieroglyph: in the table 
accompanying the document, its sources are said to be “Hieroglyphica 
extension (various sources)” with number A58C and “Hornung & Schenkel 
(2007, last modified in 2015)”, but with no number (A;), which seems 
unique in the table. This leads me to think that this glyph exists only 
in some modern font, either as a joke, or for some computer-related 
modern use. Can anyone refute or confirm this intuition?

   Frédéric




Re: Proposal for BiDi in terminal emulators

2019-01-31 Thread Frédéric Grosshans via Unicode

On 31/01/2019 at 10:41, Egmont Koblinger wrote:

Hi,


Personally, I think we should simply assume that complex script
shaping is left to the terminal, and if the terminal cannot do that,
then that's a restriction of working on a text terminal.

I cannot read any of the Arabic, Syriac etc. scripts, but I did lots
of experimenting with picking random Arabic words, displaying in the
terminal their unshaped as well as shaped (using presentation form
characters) variants, and compared them to pango-view's (harfbuzz's)
rendering.

To my eyes the version I got in the terminal with the presentation
form characters matched the "expected" (pango-view) rendering
extremely closely. Of course there are still some tradeoffs due to
fixed-width cells (just as in English, arguably an "i" and "w" having
the same width doesn't look as nice as with proportional fonts). In
the mean time, the unshaped characters look vastly different.


OTOH a terminal emulator that wants to perform shaping needs
information from the application

And the presentation form characters are pretty much exactly that
information, aren't they (for Arabic)?


There's nothing you can do here [...] there's no way for the application to 
provide

Instead of saying that it's not possible, could we perhaps try to
solve it, or at least substantially improve the situation? I mean, for
example we can introduce control characters that specify the language.
We can introduce a flag that tells the terminal whether to do shaping
or not. There are probably plenty of more ideas to be thrown in for
discussion and improvement.


cheers,
egmont





Re: Proposal for BiDi in terminal emulators

2019-01-30 Thread Frédéric Grosshans via Unicode

On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote:

- It doesn't do Arabic shaping. In my recommendation I'm arguing that
in this mode, where shuffling the characters is the task of the text
editor and not the terminal, so should it be for Arabic shaping using
presentation form characters.


I guess Arabic shaping is doable through presentation form characters, 
because the latter are characters inherited from legacy standards that 
used them in such solutions. But if you want to support other 
“Arabic-like” scripts (like Syriac or N’Ko), or even some LTR complex 
scripts, like Myanmar or Khmer, this “solution” cannot work, because no 
equivalent of the “presentation form characters” exists for these scripts.
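To make the contrast concrete, here is a minimal Python sketch (the character choices are mine): the Arabic presentation-form code points carry exactly the positional information a shaping engine would compute, and NFKC folds them back onto the base letter, while Syriac and N’Ko simply have no such code points.

```python
import unicodedata

# The four contextual forms of ARABIC LETTER BEH (U+0628) exist as
# separate code points in Arabic Presentation Forms-B; NFKC maps each
# one back to the base letter.
beh_forms = {
    "isolated": "\uFE8F",
    "final":    "\uFE90",
    "initial":  "\uFE91",
    "medial":   "\uFE92",
}
for form, ch in beh_forms.items():
    base = unicodedata.normalize("NFKC", ch)
    print(form, hex(ord(ch)), "->", base, hex(ord(base)))

# SYRIAC LETTER BETH (U+0712) and NKO LETTER BA (U+07D3) also join
# contextually, but have no presentation-form equivalents: NFKC
# leaves them unchanged.
for ch in ("\u0712", "\u07D3"):
    assert unicodedata.normalize("NFKC", ch) == ch
```

So for Arabic a legacy-style terminal can round-trip shaped text through these code points, but for Syriac or N’Ko there is nothing equivalent to round-trip through.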





Re: wws dot org

2019-01-17 Thread Frédéric Grosshans via Unicode

  
  
Thanks for this nice website!

Some feedback:

- Given the number of scripts in this period, I think that splitting
  10c-19c in two (or even three) would be a good idea.
- A finer Unicode status would be nice.
- Coptic is listed as European, while I think it is African (even if a
  member of the LGC (Latin-Greek-Cyrillic) family), since, to my
  knowledge, it has only been used in Africa for African languages
  (Coptic and Old Nubian).
- Coptic is still used for religious purposes today. Why do you write
  that it died in the 14th century?
- Khitan Small Script: according to Wikipedia, it “was invented in
  about 924 or 925 CE”, not 920 (that is the date of the Khitan Large
  Script).
- Cyrillic: I think its birth date is the 890s, slightly more precise
  than the 10c you write.
- You include two well-known Tolkienian scripts (Cirth and Tengwar),
  but you omit the third (first?) one, Sarati (see e.g.
  http://at.mansbjorkman.net/sarati.htm and https://en.wikipedia.org

On a side note, the site considers Visible Speech a living script,
which surprised me. This information is indeed in the Wikipedia infobox
and implied by its “HMA status” on the Berkeley SEI page, but the text
of the Wikipedia page says “However, although heavily promoted [...] in
1880, after a period of a dozen years or so in which it was applied to
the education of the deaf, Visible Speech was found to be more
cumbersome [...] compared to other methods, and eventually faded from
use.” My (cursory) research failed to find a more recent date for the
system than this “dozen years or so [past 1880]”. Is there any
indication of the system being used later? (Say, any date in the 20th
century.)

All the best,

   Frédéric
  

On 15/01/2019 at 19:22, Johannes Bergerhausen via Unicode wrote:


  
Dear list,

I am happy to report that www.worldswritingsystems.org is now online.

The web site is a joint venture by

— Institut Designlabor Gutenberg (IDG), Mainz, Germany,
— Atelier National de Recherche Typographique (ANRT), Nancy, France and
— Script Encoding Initiative (SEI), Berkeley, USA.

For every known script, we researched and designed a reference glyph.

You can sort these 292 scripts by Time, Region, Name, Unicode version
and Status. Exactly half of them (146) are already encoded in Unicode.

Here you can find more about the project:
www.youtube.com/watch?v=CHh2Ww_bdyQ

And here is a link to see the poster:
https://shop.designinmainz.de/produkt/the-worlds-writing-systems-poster/
  
All the best,
Johannes

↪ Prof. Bergerhausen
Hochschule Mainz, School of Design, Germany
www.designinmainz.de
www.decodeunicode.org

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Frédéric Grosshans via Unicode

  
  
On 07/06/2018 at 18:01, Alastair Houghton wrote:


  
  


I appreciate that the upshot of the Anglicised world of
  software engineering is that native English speakers have an
  advantage, and those for whom Latin isn’t their usual script
  are at a particular disadvantage, and I’m sure that seems
  unfair to many of us — but that doesn’t mean that allowing the
  use of other scripts everywhere, desirable as it is, is
  entirely unproblematic.
  

It depends on what one means by “allowing”, and it can clearly be
problematic to use non-ASCII characters. Restriction to (a subset of)
ASCII is indeed often the most reasonable choice, but when one writes
a specification for something which can be used in many contexts (like
URLs, or a programming language), not allowing it means forbidding it,
even in contexts where it makes sense.

  [...]

  
If I understand you correctly, an Arabic
  speaker should always transliterate the function name to
  ASCII,
  



That’s one option; or they could write it in Arabic, but
  they need to be aware of the consequences of doing so (and
  those they are working for or with also need to understand
  that) [...];
  

We agree on this: they should be aware of the consequences. I think
these consequences should be essentially societal (as in the example
you give), but not technical, since the former are supposed to be well
understood by everyone.


  
[...]


  
  

  

   UAX #31 also manages (I
suspect unintentionally?) to give a good example of a
pair of Farsi identifiers that might be awkward to tell
apart in certain fonts, namely نامهای and نامه‌ای; I
think those are OK in monospaced fonts, where the join
is reasonably wide, but at small point sizes in
proportional fonts the difference in appearance is very
subtle, particularly for a non-Arabic speaker.
  
In ASCII, identifiers with I, l, and 1 can be difficult to tell
apart. And it is not an artificial problem: I once had some
difficulties with an automatically generated login which was do11y
but tried to type dolly, despite my familiarity with ASCII. So I
guess this problem is not specific to the ASCII vs non-ASCII debate.

  


  
  It isn’t, though fonts used by programmers typically
emphasise the differences between I, l and 1 as well as 0 and O,
5 and S and so on specifically to avoid this problem.

In your example, you specifically mentioned that it “might be awkward
in certain fonts” but “OK in monospaced fonts”, so nothing
ASCII-specific here.


  
  
  But please don’t misunderstand; I am not — and have not been
— arguing against non-ASCII identifiers. We
were asked whether there were any problems. These are problems
(or perhaps we might call them “trade-offs”). We can debate the
severity of them, and whether, and what, it’s worthwhile doing
anything to mitigate any of them. What we shouldn’t do is sweep
them under the carpet.


I totally agree. (And I misunderstood you in the first place, probably
because “non-ASCII is bad, whatever the context” is a common attitude
among programmers, even non-Latin-native ones.)

  
  
  Personally I think a combination of documentation to explain
that it’s worth thinking carefully about which script(s) to use,
and some steps to consider certain characters to be equivalent
even though they aren’t the same (and shouldn’t be the same even
when normalised) might be a good idea. Is that really so
controversial a position?


Not at all. I misread “for reasonably wide values of ‘everyone’, at
any rate…” as saying “it is unreasonable to think of people not
comfortable with ASCII”, but that is clearly not what you intended to
say.

We both agree that:

- using non-ASCII identifiers instantly limits the number of people
  who can work with them, so they should be used with care;
- they nevertheless have some use cases, for users not familiar with
  ASCII.

Frédéric

  



Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-07 Thread Frédéric Grosshans via Unicode

On 06/06/2018 at 11:29, Alastair Houghton via Unicode wrote:

On 4 Jun 2018, at 20:49, Manish Goregaokar via Unicode  
wrote:

The Rust community is considering adding non-ascii identifiers, which follow 
UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for 
identifiers to be treated as equivalent under NFKC.

Are there any cases where this will lead to inconsistencies? I.e. can the NFKC 
of a valid UAX 31 ident be invalid UAX 31?

(In general, are there other problems folks see with this proposal?)

IMO the major issue with non-ASCII identifiers is not a technical one, but 
rather that it runs the risk of fragmenting the developer community.  Everyone 
can *type* ASCII and everyone can read Latin characters (for reasonably wide 
values of “everyone”, at any rate… most computer users aren’t going to have a 
problem). Not everyone can type Hangul, Chinese or Arabic (for instance), and 
there is no good fix or workaround for this.
Well, your ”reasonable” value of everyone excludes many kids, and puts 
social barriers on the use of computers by non-native Latin writers. If 
the programme has no reason to be read and written by foreign 
programmers, why not use native-language and native-alphabet 
identifiers? Of course, as long as you write a function named الطول, 
you consciously restrict the developer community having access to this 
programme. But you also make your programme clearer to your 
Arabic-speaking community. If said community is e.g. school teachers 
(or students) in an Arabic-speaking country, it may be a good choice. I 
don’t see the difference with choosing to write a book in one language 
or another.

Note that this is orthogonal to issues such as which language identifiers [...] 
are written in [...];

It is indeed different, but not orthogonal


the problem is that e.g. given a function

   func الطول(s : String)

it isn’t obvious to a non-Arabic speaking user how to enter الطول in order to 
call it.
OK. Clearly, someone not knowing the Arabic alphabet will have 
difficulties with this one, but if one has good reason to think the 
targeted developer community is literate in Arabic and has a lower 
mastery of the Latin alphabet, it still may be a good idea.
If I understand you correctly, an Arabic speaker should always 
transliterate the function name to ASCII, and there are many different 
ways to do it (see e.g. 
https://en.wikipedia.org/wiki/Romanization_of_Arabic). Should they name 
their function altawil, altwl, alt.wl? And when calling it later, they 
would have to remember their ad-hoc ASCII Arabic orthography. I don’t 
doubt many, if not most, do it, but it can add an extra burden to 
programming. It’s a bit like remembering whether your name should be 
transliterated into Greek as Ηουγητον or Ουχτων, and using that for 
every identifier you come across. A mitigation strategy is to name your 
identifiers x1, x2, x3 and so on. The common knowledge is that this is 
a bad idea, and programming teachers spend some time discouraging their 
students from using such a strategy. However, many Chinese websites and 
email addresses are of this form, because it is the only one clear 
enough for a big fraction of the population.




This isn’t true of e.g.

   func pituus(s : String)

Even though “pituus” is Finnish, it’s still ASCII and everyone knows how to 
type that.


Avoiding “special characters” can be annoying in Latin-based languages, 
especially for beginners, and kids among them. Unicode’s (too slow) 
adoption has already eased the difficulty of writing a “Hello world” or 
“What’s your name?” programme, but avoiding non-ASCII characters in 
identifiers can be a bit esoteric for kids with a native language full 
of them. (And by the way, several big French companies regularly send me 
mail with my first name mojibake’d, while their software is presumably 
written by adults.)


[...]


  UAX #31 also manages (I suspect unintentionally?) to give a good example of a 
pair of Farsi identifiers that might be awkward to tell apart in certain fonts, 
namely نامهای and نامه‌ای; I think those are OK in monospaced fonts, where the 
join is reasonably wide, but at small point sizes in proportional fonts the 
difference in appearance is very subtle, particularly for a non-Arabic speaker.
In ASCII, identifiers with I, l, and 1 can be difficult to tell apart. 
And it is not an artificial problem: I once had some difficulties 
with an automatically generated login which was do11y but tried to type 
dolly, despite my familiarity with ASCII. So I guess this problem is 
not specific to the ASCII vs non-ASCII debate.
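As an aside, the NFKC question that opened the thread can be probed mechanically. Python identifiers follow UAX #31 (per PEP 3131) and names are compared after NFKC, so one can check on samples that normalization keeps identifiers valid; this is a minimal sketch with sample identifiers of my own choosing, evidence rather than proof (the actual guarantee comes from the NFKC-closure requirement UAX #31 places on XID_Start/XID_Continue):

```python
import unicodedata

def nfkc(s):
    return unicodedata.normalize("NFKC", s)

# Compatibility variants stay valid identifiers after NFKC; they just
# collapse onto their plain form.
samples = ["ſoo", "ℌello", "ｆｕｌｌｗｉｄｔｈ", "Ångström"]
for ident in samples:
    assert ident.isidentifier()        # valid before...
    assert nfkc(ident).isidentifier()  # ...and after NFKC
    print(ident, "->", nfkc(ident))
```

For instance the long s ſ (U+017F) normalizes to a plain s, so `ſoo` and `soo` name the same Python variable.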


Re: metric for block coverage

2018-03-08 Thread Frédéric Grosshans via Unicode


Hi !

   I’ll just add two points to the various points raised in the 
previous conversation about block coverage:



On 17/02/2018 at 23:18, Adam Borowski via Unicode wrote:

Hi!
As a part of Debian fonts team work, we're trying to improve fonts review:
ways to organize them, add metadata, pick which fonts are installed by
default and/or recommended to users, etc.

I'm looking for a way to determine a font's coverage of available scripts.
It's probably reasonable to do this per Unicode block.  [...]

A naïve way would be to count codepoints present in the font vs the number
of all codepoints in the block.  Alas, there's way too much chaff for such
an approach to be reasonable: þ or ą count the same as LATIN TURNED CAPITAL
LETTER SAMPI WITH HORNS AND TAIL WITH SMALL LETTER X WITH CARON.
A slightly less naïve way would be to take into account when the code 
points were added to Unicode, with the rough idea that the most widely 
used characters were added first. It also has the nice feature that this 
metric is less ambiguous for blocks which are not yet complete.
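The age-aware metric could be sketched as follows (the function name and input shapes are mine; it assumes you have already parsed the UCD's DerivedAge.txt into a codepoint-to-version map and extracted the font's covered codepoints from its cmap):

```python
# Age-weighted block coverage: only count codepoints that were already
# assigned in the Unicode version you measure against.
ARMENIAN = range(0x0530, 0x0590)  # the Armenian block

def coverage_up_to(font_cps, ages, block, max_version):
    """Fraction of the block's codepoints assigned by `max_version`
    (a (major, minor) tuple, as in DerivedAge.txt) that the font covers."""
    assigned = [cp for cp in block
                if cp in ages and ages[cp] <= max_version]
    if not assigned:
        return 0.0
    return sum(cp in font_cps for cp in assigned) / len(assigned)

# Toy data: AYB (U+0531, Unicode 1.1) and TURNED AYB (U+0560, 11.0).
ages = {0x0531: (1, 1), 0x0560: (11, 0)}
font = {0x0531}
print(coverage_up_to(font, ages, ARMENIAN, (10, 0)))  # 1.0
print(coverage_up_to(font, ages, ARMENIAN, (11, 0)))  # 0.5
```

The toy data illustrates the point below: a font that was "complete" for Armenian10.0 loses coverage the moment 11.0 assigns new codepoints.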


For example, if you have 100% coverage of Armenian for Unicode 10.0 
(which I’ll call Armenian10.0 for short), it only implies a coverage of 
89/91 = 97.8% of Armenian11.0, which will see the addition of two 
characters used in Armenian dialectology (ARMENIAN SMALL LETTER TURNED 
AYB and YI WITH STROKE).
If you look at the history of the Armenian block (e.g. here: 
https://en.wikipedia.org/wiki/Armenian_(Unicode_block)), most 
characters (84) were added in 1.0, a ligature was added in 1.0, 
ARMENIAN HYPHEN was added in 3.0, a currency symbol in 6.1, two 
decorative symbols in 7.0, and two characters used in dialectology are 
planned for 11.0. I guess this roughly corresponds to a ranking of the 
characters from most used to least used.



To take your examples, both þ and ą have been in Unicode since 1.1 
(and, I guess, 1.0), while LATIN TURNED CAPITAL LETTER SAMPI WITH HORNS 
AND TAIL WITH SMALL LETTER X WITH CARON is not encoded yet, so they are 
not the same according to this metric... To see what this means for 
other Latin examples, you can watch the Latin Extended-D block (history 
here: https://en.wikipedia.org/wiki/Latin_Extended-D ), with new 
characters in 5.0, 5.1, 6.1, 7.0, 8.0, 9.0, some accepted for 11.0 
(SMALL CAPITAL Q, CAPITAL/SMALL LETTER U WITH STROKE), and later ones 
(15, for Egyptology, Assyriology, medieval English and historical 
Pinyin).


Of course, this measure is only rough. A counter-example is in the 
monetary symbols block, where € U+20AC EURO SIGN (in Unicode since 2.1) 
is much more used than ₣ U+20A3 FRENCH FRANC SIGN, encoded since 
Unicode 1.1 (1.0?) but which I have never seen, despite living in 
France for more than four decades.

[...]



I don't think I'm the first to have this question.  Any suggestions?


For the Han (CJK) script, the IRG (Ideographic Rapporteur Group) defined 
a set of fewer than 10k essential Han characters, IICore (International 
Ideographs Core, 
https://en.wikipedia.org/wiki/International_Ideographs_Core). This is 
described in the Unihan database in the Unihan_IRGSources.txt file, 
kIICore field (https://www.unicode.org/reports/tr38/#kIICore ). This 
field also includes a letter (A, B or C) indicating a priority value and 
some regional information. For Unicode 10.0, a simple grep tells us that 
there are 9810 IICore characters, of which 7772 are priority A, 417 
priority B and 1621 priority C.


Note that IICore has been stable (as version 2.2) since 2004, but Ken 
Lunde, from Adobe, has recently proposed an update to it 
(https://www.unicode.org/L2/L2018/18066-iicore-changes.pdf), though only 
to the region tags, not to the priorities or the list of characters. 
However, reading Ken Lunde’s associated blog post, it seems a few 
characters could be added to IICore in the future.


   Cheers,

            Frédéric


Re: Internationalised Computer Science Exercises

2018-01-22 Thread Frédéric Grosshans via Unicode

On 22/01/2018 at 17:39, Andre Schappo via Unicode wrote:


By way of example, one programming challenge I set to students a 
couple of weeks ago involves diacritics. Please see 
jsfiddle.net/coas/wda45gLp 


There is huge potential for some really interesting and challenging 
Unicode exercises. If you have any suggestions for such exercises they 
would be most welcome. Email me direct or share on this list.


A simple challenge is to write a function which localizes numbers into 
a script having decimal digits, or parses them (i.e. scripts which have 
characters with the property Numeric_Type=Decimal, as explained in §4.6 
of the Unicode 10 standard). The list of these scripts is specified in 
table 22-3. There is usually at most one set of digits per script (with 
the exceptions of Arabic, Myanmar and Tai Tham).
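A minimal sketch of such a function (the names are mine; it relies on the fact that Numeric_Type=Decimal digits are encoded in contiguous runs of ten, so each digit set is characterized by its DIGIT ZERO):

```python
import unicodedata

# A few DIGIT ZERO codepoints; any Numeric_Type=Decimal script works.
DIGIT_ZERO = {
    "arabic-indic": "\u0660",
    "devanagari":   "\u0966",
    "myanmar":      "\u1040",
}

def localize_digits(text, zero):
    """Replace ASCII digits with the script's digits, offset from zero."""
    return "".join(chr(ord(zero) + int(c)) if c in "0123456789" else c
                   for c in text)

def parse_number(text):
    # int() already accepts decimal digits from any script
    return int(text)

s = localize_digits("2018", DIGIT_ZERO["devanagari"])
print(s)                # २०१८
print(parse_number(s))  # 2018
assert unicodedata.digit(s[0]) == 2
```

The parsing direction is almost free in Python, since `int()` honours the Numeric_Type=Decimal property already.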


Then, of course, one can look at other numeral systems (CJK, Ethiopic, 
Roman, to name a few in contemporaneous use). Section 22.3 of the 
Unicode standard is an interesting starting point for these.



An internationalised exercise which doesn’t (always) use Unicode is the 
localization of separators in numbers: 2¹⁰ + π = 1,027.14 in the US and 
1 027,14 in France. One should also not forget that half a million is 
5,00,000 in India. These simple things can be very surprising the first 
time you meet them.
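For illustration, the Indian 3-then-2 grouping can be contrasted with the Western 3-3-3 one in a short sketch (function names are mine; a real application would of course use a proper i18n library such as ICU rather than hand-rolled rules):

```python
def group_western(n, sep=","):
    """3-3-3 grouping: 1027 -> '1,027' (or '1 027' with sep=' ')."""
    s, parts = str(n), []
    while len(s) > 3:
        parts.append(s[-3:])
        s = s[:-3]
    if s:
        parts.append(s)
    return sep.join(reversed(parts))

def group_indian(n):
    """Indian grouping: last three digits, then pairs: 5,00,000."""
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    parts = []
    while len(head) > 2:
        parts.append(head[-2:])
        head = head[:-2]
    if head:
        parts.append(head)
    return ",".join(reversed(parts)) + "," + tail

print(group_western(1027))       # 1,027
print(group_western(1027, " "))  # 1 027
print(group_indian(500000))      # 5,00,000
```

One crore (10,000,000) comes out as 1,00,00,000, which is exactly the kind of output that surprises a reader who has only ever seen 3-3-3 grouping.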


  Frédéric