On Tuesday, 22 April 2014 at 14:21:40 UTC+2, Steven D'Aprano wrote:
> On Tue, 22 Apr 2014 02:07:58 -0700, wxjmfauth wrote:
>
> > On Tuesday, 22 April 2014 at 08:30:45 UTC+2, Rustom Mody wrote:
> >
> > @ rusy
> >
> >> "I've reworded it to make it clear that I am referring to the
> >> character-sets and not encodings."
> >
> > Very good, excellent comment. A healthy coding scheme can only work
> > properly with a unique character set, and the coding is achieved with
> > the help of a unique operator. There is no other way to do it and that's
> > the reason why we have to live today with all these coding schemes
> > (unicode or not). Note: A coding scheme can be much more complex than
> > the coding of "raw" characters (e.g. CID fonts).
> >
> >> "So instead of using λ (0x3bb) we should use 𝝀 (0x1d740) or
> >> something thereabouts like 𝜆"
>
> For those who cannot see them, they are:
>
> py> unicodedata.name('\U0001d740')
> 'MATHEMATICAL BOLD ITALIC SMALL LAMDA'
> py> unicodedata.name('\U0001d706')
> 'MATHEMATICAL ITALIC SMALL LAMDA'
>
> ("LAMDA" is the official Unicode name for Lambda.)
>
> > This is a very good understanding of unicode. The letter lambda is not
> > the mathematical symbol lambda. Another example, the micro sign is not
> > the greek letter mu which is not the mathematical mu.
>
> Depends what you mean by "is not". The micro sign is a legacy
> compatibility character, we shouldn't use it except for compatibility
> with legacy (non-Unicode) character sets. Instead, we should use the NFKC
> or NFKD normalization forms to convert it to the recommended character.
>
> py> import unicodedata
> py> a = '\N{GREEK SMALL LETTER MU}'  # Preferred
> py> b = '\N{MICRO SIGN}'  # Legacy
> py> a == b
> False
> py> unicodedata.normalize('NFKD', b) == a
> True
> py> unicodedata.normalize('NFKC', b) == a
> True
>
> As for the mathematical mu, there is no separate Unicode "maths symbol
> mu" so far as I am aware. One would simply use '\N{MICRO SIGN}' or
> '\N{GREEK SMALL LETTER MU}' to get a μ.
>
> Likewise, the λ used in mathematics is the Greek letter λ, not a separate
> symbol, just like the Latin letter x and the x used in mathematics are
> the same.
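The quoted normalization session can be extended to the mathematical alphanumeric code points. What follows is only a minimal sketch, assuming Python 3 and the Unicode database bundled with its unicodedata module; the variable names and the nfkc() helper are illustrative and not from the thread. It looks each character up by its official name, so no code point has to be typed by hand.

```python
import unicodedata

# Look the characters up by their official Unicode names.
greek_mu = unicodedata.lookup('GREEK SMALL LETTER MU')
micro = unicodedata.lookup('MICRO SIGN')
math_mu = unicodedata.lookup('MATHEMATICAL ITALIC SMALL MU')
latin_x = 'x'
math_x = unicodedata.lookup('MATHEMATICAL ITALIC SMALL X')


def nfkc(ch):
    """Return the NFKC (compatibility composed) form of ch."""
    return unicodedata.normalize('NFKC', ch)


# Print each code point, its name, and the code point NFKC maps it to.
for ch in (greek_mu, micro, math_mu, latin_x, math_x):
    print('U+%05X  %-30s  NFKC -> U+%05X'
          % (ord(ch), unicodedata.name(ch), ord(nfkc(ch))))

# The characters are distinct code points ...
print(math_x == latin_x, math_mu == greek_mu)
# ... but compatibility normalization folds them together.
print(nfkc(math_x) == latin_x, nfkc(math_mu) == greek_mu)
```

The mathematical letters are separate code points, and NFKC maps them back to the plain Latin and Greek letters. Whether that folding is wanted depends on the convention the application chooses.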

Normalization works fine, but it proves nothing; it has to follow some
convention. There are several code point ranges (Latin and Greek) that
can be used for mathematical purposes (different mu's). If you are
interested, search for "unimath-symbols.pdf" on CTAN (I have all this
stuff on my hard disk).

"Likewise, the λ used in mathematics is the Greek letter λ, not a separate
symbol, just like the Latin letter x and the x used in mathematics are
the same."

"... just like the Latin letter x and the x used in mathematics are the
same."

Oh! Definitely not. A tool with a Unicode engine able to produce "math
text" will certainly not use the same code point for a "textual x" and
for a "mathematical x", even if one enters/types/hits the same "x". To be
exaggeratedly strict, the real question is whether a given "lambda" or
"x" belongs to a "math Unicode range" or not. This is quite a different
approach. (Please do not confuse this with a "text literal variable x".)
A text processing tool will notice the difference: it will use different
fonts.

jmf
--
https://mail.python.org/mailman/listinfo/python-list