In the issue (https://github.com/JuliaLang/julia/issues/5434) we've been
specifically discussing normalization form KC (NFKC) as defined by UAX
#15 (http://unicode.org/reports/tr15/), which is a compatibility
normalization.
On Friday, January 17, 2014 11:34:44 PM UTC-6, Marcus Urban wrote:
I'm not
Leaving aside the nuclear option and proselytizing for the moment, the
Unicode consortium does helpfully (?) provide a long list of
confusable characters
http://www.unicode.org/Public/security/revision-05/confusables.txt
and a related technical standard
http://www.unicode.org/reports/tr39/
On Jan 17, 2014, at 12:28, Jiahao Chen jia...@mit.edu wrote:
Leaving aside the nuclear option and proselytizing for the moment, the
Unicode consortium does helpfully (?) provide a long list of
confusable characters
http://www.unicode.org/Public/security/revision-05/confusables.txt
If we had a script to check for this, it could be set up as part of the
default Travis thing generated for packages.
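A minimal sketch of what such a check might look like, in Python. The three-entry CONFUSABLES table, and the choice of which character in each pair counts as the preferred one, are assumptions made up for illustration; a real tool would presumably load the full confusables.txt instead. The names find_confusables and check_files are hypothetical.

```python
# Hypothetical, hand-picked sample: flagged character -> preferred character.
# A real check would be driven by the Unicode confusables.txt data instead.
CONFUSABLES = {
    "\u00b5": "\u03bc",  # MICRO SIGN -> GREEK SMALL LETTER MU
    "\u2211": "\u03a3",  # N-ARY SUMMATION -> GREEK CAPITAL LETTER SIGMA
    "\u0430": "a",       # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
}


def find_confusables(text):
    """Return (line, column, char, suggestion) tuples, all 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in CONFUSABLES:
                hits.append((lineno, col, ch, CONFUSABLES[ch]))
    return hits


def check_files(paths):
    """Print one line per hit; return 1 if anything was found, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for lineno, col, ch, want in find_confusables(f.read()):
                print(f"{path}:{lineno}:{col}: U+{ord(ch):04X} "
                      f"looks like U+{ord(want):04X}")
                failed = True
    return 1 if failed else 0
```

A CI job could pass the package's source files to check_files and use the return value as the exit status, so the build fails whenever a flagged character shows up.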
There could be a package/tool to generate helpful error messages when you
try to use a function/MathConstant by the wrong (confusable) Unicode
character. Something like You're
Ok, the joke wears thin the third time round.
I think the ideal behavior would be for Julia itself to have an opinion on
which character in each set of identical-looking characters was right, and
to warn on using a homograph that was not canonical. Combined with a tool
that would substitute any character causing a warning with the
Perhaps Julia could canonicalize symbols at parse time (besides warning for
non-canonical ones)? I think that whichever homograph is chosen as canonical,
it won't be the one that is easiest to type for everyone.
This is my secret weapon for entering unicode characters:
https://gist.github.com/JeffBezanson/8480786
After adding that to a .emacs, you can switch to symbol-input mode and
type e.g. \theta to enter a theta. The set of characters is obviously
easy to extend.
On Fri, Jan 17, 2014 at 3:17 PM,
Sublime Text (2 and 3) has a package called UnicodeMath which has similar
functionality and covers much of Unicode by default. It also lets you
add new symbols and add synonyms for existing names.
P.S.: \upSigma produces Σ and \sum produces ∑ (different).
On Friday, 17 January 2014 14:29:39
On Friday, January 17, 2014 1:32:08 PM UTC-5, Raphael Sofaer wrote:
I think the ideal behavior would be for Julia itself to have an opinion on
which character in each set of identical-looking characters was right, and
to warn on using a homograph that was not canonical. Combined with a
I opened an issue for this:
https://github.com/JuliaLang/julia/issues/5434
My preference would be for Julia to silently canonicalize all homoglyphs in
identifiers (rather than issuing a warning or whatever).
+10 for automatic canonicalization. If we could have an optional warning when
canonicalization is needed, for Travis to barf at, that would be great too.
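To make the "silent by default, optional warning" idea concrete, here is a rough Python sketch. It assumes NFKC is the canonicalization being applied, which is only one of the options under discussion, and canonicalize_identifier is a hypothetical name, not anything in Julia.

```python
import unicodedata
import warnings


def canonicalize_identifier(name, warn=False):
    """Return the NFKC form of name; optionally warn if it changed.

    Assumption: NFKC stands in for whatever canonicalization is chosen.
    """
    canonical = unicodedata.normalize("NFKC", name)
    if warn and canonical != name:
        warnings.warn(f"identifier {name!r} canonicalized to {canonical!r}")
    return canonical
```

With warn=False the mapping is silent; a CI run could turn the warning on and treat it as an error. Note that NFKC does not merge look-alikes across scripts, such as Cyrillic а and Latin a, so it would not cover every homoglyph.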
On Friday, 17 January 2014 at 22:22:02 UTC+1, Steven G. Johnson wrote:
I opened an issue for this:
I'm not sure whether people are using canonicalize in the generic sense
or if they mean canonical mappings as defined by the Unicode standard. Just
to be clear, the initial issue raised about U+00B5 MICRO SIGN versus U+03BC
GREEK SMALL LETTER MU would not be fixed by a canonical decomposition.
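This is easy to check with any Unicode library. The sketch below uses Python's standard unicodedata module purely as an illustration; it says nothing about how Julia would implement it.

```python
import unicodedata

MICRO = "\u00b5"  # U+00B5 MICRO SIGN
MU = "\u03bc"     # U+03BC GREEK SMALL LETTER MU

# Apply each of the four normalization forms to the micro sign and
# report whether the result is the Greek letter mu.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, MICRO)
    print(form, f"U+{ord(result):04X}", result == MU)
```

The canonical forms (NFC, NFD) leave U+00B5 unchanged, while the compatibility forms (NFKC, NFKD) map it to U+03BC.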
Make that last link to the FAQ http://www.unicode.org/faq/normalization.html
On Friday, January 17, 2014 11:34:44 PM UTC-6, Marcus Urban wrote:
I'm not sure whether people are using canonicalize in the generic sense
or if they mean canonical mappings as defined by the Unicode standard. Just