Re: can a method name contain a funny character?

Larry Wall Sat, 21 May 2016 09:58:25 -0700

On Fri, May 20, 2016 at 09:39:30AM -0400, yary wrote:
: On Tue, Apr 12, 2016 at 6:12 PM, Brandon Allbery <[email protected]>
: wrote:
: > I was explaining why some "symbols" are acceptable to the parser. Which one
: > is more appropriate is not my call,
: 
: I was thinking about what exactly are valid identifiers in Perl6/rakudo's
: implementation. The docs <http://docs.perl6.org/language/syntax#Identifiers>
: say:
: 
: An identifier is a primitive name, and must start with an alphabetic
: character (or an underscore), followed by zero or more word characters
: (alphabetic, underscore or number). You can also embed dashes - or single
: quotes ' in the middle, but not two in a row.


At this point, "number" means only characters with a GeneralCategory
of Nd.  We could talk about generalizing that, but there are potential
issues.  We can't simply extend it to No characters, because then

    pi²

would misparse as a 3-character identifier.
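
The Nd/No split can be checked outside Rakudo. A minimal sketch in Python, assuming its standard unicodedata module reports the same General_Category values; the helper name general_category is made up for illustration and is not part of any Perl 6 API:

```python
import unicodedata

def general_category(ch):
    """Return the Unicode General_Category abbreviation for one character."""
    return unicodedata.category(ch)

# ASCII 2 is a decimal digit (Nd); superscript two is "other number" (No),
# which is why it stays out of identifiers and pi² can parse as pi squared.
print(general_category("2"), general_category("²"))
```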

: Experimenting with some of the numeric codes from Wikipedia
: <https://en.wikipedia.org/wiki/Numerals_in_Unicode>, some of the numeric
: codes seem inconsistent-

Note that, even if we used this table, we could not distinguish ² from ②
and such.

: > my $_६೬𝟨 = ६೬𝟨 # "De" Devanagari, Kannada, Mathematical. "De" is all good.
: 666

That's fine; those work because they have general category Nd, so
they're equivalent to 0..9 as far as we're concerned.
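
As an illustration only (Python rather than Perl 6, and assuming its unicodedata tables match the Unicode data Rakudo uses), the three characters from the quoted example are all Nd with decimal value 6; the variable names are arbitrary:

```python
import unicodedata

# Devanagari six, Kannada six, mathematical sans-serif six.
digits = "६೬𝟨"

categories = [unicodedata.category(ch) for ch in digits]
values = [unicodedata.decimal(ch) for ch in digits]

# Python's int() also accepts any mix of Nd digits, much like the
# quoted Rakudo session reading the literal as 666.
number = int(digits)

print(categories, values, number)
```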

: > my $x六 = 6 #  "Nu" Han number 6
: 6
: >  say 六
: ===SORRY!=== ...

Note that 六 works in identifiers not by virtue of being numeric at all,
but by being in general category Lo, that is, it's a "letter other",
and so is considered alphabetic.
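
A small Python sketch of the same point, assuming the standard unicodedata module (which also carries the Han numeric values): 六 is a letter by general category, has a numeric value of 6, but is not a decimal digit. The variable names are illustrative:

```python
import unicodedata

han_six = "六"

category = unicodedata.category(han_six)          # general category
numeric_value = unicodedata.numeric(han_six)      # numeric value, as a float
decimal_value = unicodedata.decimal(han_six, None)  # None: not a decimal digit

print(category, numeric_value, decimal_value)
```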

: > say ௰  # "Nu" Tamil number 10
: 10
: > my $x௰ = 5
: ===SORRY!=== Error ...

Excluded because it's No, not Nd.

: > say ① + 3 # "Di" 1 in typographic context has value 1
: 4
: > my $b① = 44 # "Di" 1 not valid in identifier
: ===SORRY!=== Error ...

① is indistinguishable from superscripts, even by "Di", and falls into
the No general category, so excluded.
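
To make that concrete, here is a hedged Python sketch (again assuming the standard unicodedata module mirrors the relevant Unicode properties; the describe helper is hypothetical). Circled and superscript digits report the same category and the same digit value, while the Tamil ten is No with a numeric value but no digit value:

```python
import unicodedata

def describe(ch):
    """Return (general category, digit value or None, numeric value or None)."""
    return (
        unicodedata.category(ch),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

circled_two = describe("②")
superscript_two = describe("²")
tamil_ten = describe("௰")

# Nothing in these properties tells the circled digit from the superscript.
print(circled_two, superscript_two, tamil_ten)
```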

: Some numeric codepoints are recognized as such, yet Rakudo isn't allowing
: them in identifiers. Especially confounding is the treatment of the "Han
: number 6" and "Tamil number 10", both of which are unicode "Nu" numeric.
: The Tamil is recognized as a number on its own but not as an identifier;
: the Han is allowed in an identifier but isn't recognized as a number!

We currently rely only on GeneralCategory.  I don't believe we use
NumericType anywhere in parsing Perl 6.
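
A rough sketch of what a GeneralCategory-only rule looks like, written in Python and following the documentation text quoted above rather than Rakudo's actual grammar. It assumes "alphabetic" means any L* category or underscore and "number" means Nd only; the function names are hypothetical:

```python
import unicodedata

def is_alpha(ch):
    # Assumption: "alphabetic" = underscore or any letter category (Lu, Ll, Lo, ...).
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_word(ch):
    # Assumption: word characters add only Nd digits; NumericType is never consulted.
    return is_alpha(ch) or unicodedata.category(ch) == "Nd"

def is_identifier(name):
    """Approximate the documented rule: alpha, then word characters, with a
    single - or ' allowed between segments that each start with an alpha."""
    segment_start = True
    for ch in name:
        if segment_start:
            if not is_alpha(ch):
                return False
            segment_start = False
        elif ch in "-'":
            segment_start = True
        elif not is_word(ch):
            return False
    # Reject the empty string and names ending in a dash or quote.
    return not segment_start

for name in ["x六", "_६೬𝟨", "x௰", "b①", "pi²"]:
    print(name, is_identifier(name))
```

Under these assumptions the sketch accepts x六 and _६೬𝟨 and rejects x௰, b① and pi², matching the behavior reported in the quoted session.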

: Is there some deeper rule at work here- which could be added to the
: documentation? Or are these bugs?

Not a bug, but potentially negotiable.  It simply comes down to Nd vs
No at the moment.  One could argue that we could notice superscripts
as a separate category and treat them differently, but there are two
arguments against that.

The first is that we'd like to keep the basic identifier rules fairly
simple.  We're already pushing the state of the art here, and I don't
see much benefit in making the rules more arcane than they are.

The second argument is that we should probably reserve syntax for the
user here.  Once we get slangs fully hooked up, we can easily let users
define identifiers to include ①  and such.  But it's just as likely,
perhaps more likely, that the user will want to use ①  for a postfix,
just like we currently treat superscripts as powers.  We can't guess
(well, we *could* guess, but can't know) which way the user will want to
use these, so the conservative approach is to make neither of them work,
and let the user take an additive approach, rather than forcing them to
use a subtractive approach if we guessed wrong.

Larry
