On Sun, Oct 23, 2005 at 10:55:34PM +0900, Dan Kogai wrote:
: To make matters worse, there is not just one "yen sign" in
: Unicode. Take a look at this.
: ¥ U+00A5 YEN SIGN
: ￥ U+FFE5 FULLWIDTH YEN SIGN
: Though they look and grok the same to humans, computers handle them
: differently. This happened when the Unicode Consortium decided to make
: the BMP round-trippable against legacy encodings. They were distinct in
: the JIS standards, so they are distinct in Unicode as well.
: Maybe we should avoid other symbols like this for sigils -- those not
: in ASCII that have 'fullwidth' variations. q($) and q(\) are okay
: (or too late) because they are already in ASCII. q(¥) should be
: avoided because you can hardly tell the difference from q(￥) in the
: But this will also outlaw the cent sign. I have attached a list of
: those affected. As you see, most are with ASCII equivalents but some
: are not.
We'd have to outlaw A..Z as well. :-)
I think a better plan might just be to say that we'll treat any fullwidth
character as equivalent to its narrow companion, at least when used as
an operator. Canonicalizing identifiers may be another matter though.
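To make the "fullwidth equals narrow" idea concrete, here is a minimal sketch in Python (not Perl, and not a proposal for how the Perl 6 lexer would actually do it). It relies on the fact that the Unicode Character Database gives each fullwidth form a compatibility decomposition tagged <wide>. The function name fold_fullwidth is made up for illustration.

```python
import unicodedata

def fold_fullwidth(text: str) -> str:
    """Replace each fullwidth character with its narrow companion.

    Hypothetical helper: it only looks at compatibility decompositions
    tagged <wide>, and leaves every other character untouched.
    """
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 00A5" for U+FFE5
        if decomp.startswith("<wide> "):
            # Everything after the tag is a list of hex code points.
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)

# U+FFE5 FULLWIDTH YEN SIGN folds to U+00A5 YEN SIGN.
print(fold_fullwidth("\uffe5") == "\u00a5")
```

Note that this folds only the <wide> forms, which is narrower than full NFKC normalization. NFKC would also rewrite ligatures, superscripts and the like, and that is probably more than we want for operators.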
On the other hand, certain of the double-width characters are likely to
be confused with two singles, such as
＝ FF1D FULLWIDTH EQUALS SIGN
＿ FF3F FULLWIDTH LOW LINE
so maybe they should be equivalent to == and __, or outlawed.
And one could (un)reasonably argue that
～ FF5E FULLWIDTH TILDE
ought to mean ~~ rather than ~. But in general we need to go slow
on such decisions. For now just sticking our toe into Latin-1
is enough, as long as we're looking ahead for visual pitfalls.
As for the ¥ pitfall, so far we've intentionally been careful to use
it only where an operator is expected, whereas \ is legal only where a
term is expected. So at least for Perl code, we can translate legacy
¥ to different codepoints. (Whether the Japanese font distinguishes
them is another issue, of course. I have a "Unicode" font on my
machine that prints backslash as ¥, which I find slightly irritating,
but doubtless will be par for the course in Japan for the foreseeable
future. Maybe that's a good reason to allow the doublewidth backslash
as an alias for normal backslash. Maybe not.)
Anyway, I think people will be able to distinguish visually between
"A ¥ B" and "¥X" as long as we keep the operator/term distinction.
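The operator/term distinction can be sketched with a toy translator, again in Python and again only as an illustration. It assumes the legacy source has already been decoded so that every ambiguous character arrives as U+00A5, and it uses a deliberately crude rule for "what is expected next": after an identifier, a number, or a closing bracket we expect an operator, otherwise a term. The name translate_legacy_yen and the token rule are hypothetical; a real Perl parser tracks this state far more carefully.

```python
import re

YEN = "\u00a5"

# Toy token rule (an assumption): whitespace runs, word runs, or any single character.
TOKEN = re.compile(r"\s+|\w+|.", re.S)

def translate_legacy_yen(src: str) -> str:
    """Keep a yen sign where an operator is expected; turn it into a backslash where a term is expected."""
    out = []
    expect_term = True  # at the start of the input a term is expected
    for tok in TOKEN.findall(src):
        if tok.isspace():
            out.append(tok)  # whitespace does not change the expectation
            continue
        if tok == YEN:
            if expect_term:
                out.append("\\")  # term position: legacy yen meant backslash
            else:
                out.append(YEN)   # operator position: keep the yen operator
                expect_term = True
            continue
        out.append(tok)
        # After a word or a closing bracket an operator is expected next.
        expect_term = not (tok[0].isalnum() or tok[0] == "_" or tok in ")]}")
    return "".join(out)

print(translate_legacy_yen("A \u00a5 B"))  # yen stays an operator
print(translate_legacy_yen("\u00a5X"))     # yen becomes a backslash
```

The sketch never has to guess, because the two characters are never legal in the same position.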