[perl.git] branch blead, updated. v5.23.0-113-gc4e131a

Karl Williamson Mon, 13 Jul 2015 11:18:43 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/c4e131a911a886c1978fea41bd198d709effb11e?hp=b92342550433b215a30d5d4b9bfe55321c69f8ac>


- Log -----------------------------------------------------------------
commit c4e131a911a886c1978fea41bd198d709effb11e
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jul 13 12:08:32 2015 -0600

    toke.c: Move macro definition
    
    This moves the definition to before the function it is used in, rather
    than disrupting the flow of code within the function.

M       toke.c

commit ce4793f183b29c423cb9d2d993fb4399c8d46baa
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 12:19:59 2015 -0600

    Forbid variable names with ASCII non-graphic chars
    
    See http://nntp.perl.org/group/perl.perl5.porters/229168
    
    Also, the documentation has been updated beyond this change to clarify
    related matters, based on some experimentation.
    
    Previously, spaces couldn't be in variable names; now ASCII control
    characters can't be either.  The remaining permissible ASCII characters
    in a variable name now must be all graphic ones.

M       pod/perldata.pod
M       pod/perldelta.pod
M       pod/perlvar.pod
M       t/lib/warnings/toke
M       t/uni/variables.t
M       toke.c

commit 87518e92cecac2acea7073cceea51ca610774fb0
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 22:37:35 2015 -0600

    perldata: Change pod to reflect reality
    
    Caret variable names don't have to be limited to $^A through $^Z.  $^],
    etc. are also valid.

M       pod/perldata.pod

commit fefb073f144151139233ca435fb1fc9edf684fe4
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 22:39:54 2015 -0600

    toke.c: Comments, white-space only
    
    Add some clarifying comments, and properly indent some lines to
    prevailing level.

M       toke.c

commit 97bf8a2377185e29b65c2d10276fb50d0ad63d41
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 12:03:20 2015 -0600

    uni/variables.t: Add TODO tests
    
    These show a bug in perl parsing where utf8ness makes a difference
    in what happens.  In this case, a syntax error is accompanied by warning
    messages when in 'use utf8', and no warnings when not.  I'm not filing a
    bug report, as I don't think it is worth fixing, as it is a syntax error
    after all.  But I did make tests for it, as TODOs.

M       t/uni/variables.t

commit e68670aedff308b76d0c1076a6073146840fb322
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 12:08:42 2015 -0600

    uni/variables.t: Output unexpected warnings
    
    This helps debug when the test fails.

M       t/uni/variables.t

commit 9100b351e15341ed12d22822713635b0e4e2237d
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jul 11 12:01:54 2015 -0600

    uni/variables.t: Fix grammar in comment

M       t/uni/variables.t
-----------------------------------------------------------------------
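
The rule introduced by commit ce4793f can be sketched outside of perl.
The C below is only an illustration, not the code in toke.c: it assumes an
ASCII platform, a single-byte name, and that 'use utf8' is not in effect.
The names sketch_valid_len_one_ident, sketch_is_deprecated and
sketch_self_check are hypothetical; the byte ranges are the ones given in
the commit message and in the perldelta text of this update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of toke.c.  Models whether a single byte
 * may serve as a length-one variable name on an ASCII platform when
 * 'use utf8' is NOT in effect. */
static inline bool
sketch_valid_len_one_ident(unsigned char c)
{
    /* Graphic ASCII is '!' (0x21) through '~' (0x7E).  This excludes the
     * C0 controls, SPACE and DEL.  '{' falls in this range; per the
     * comment in toke.c that case never reaches the real check. */
    if (c >= 0x21 && c <= 0x7E)
        return true;

    /* Without 'use utf8', any byte in 0x80-0xFF is still accepted. */
    return c >= 0x80;
}

/* Hypothetical helper: would an accepted byte draw the "literal
 * non-graphic characters in variable names" deprecation warning?  Those
 * are the C1 controls (0x80-0x9F), NO-BREAK SPACE (0xA0) and SOFT HYPHEN
 * (0xAD). */
static inline bool
sketch_is_deprecated(unsigned char c)
{
    return (c >= 0x80 && c <= 0xA0) || c == 0xAD;
}

/* Self-check covering one byte from each class. */
static inline void
sketch_self_check(void)
{
    assert(sketch_valid_len_one_ident('!'));    /* $! : graphic ASCII       */
    assert(!sketch_valid_len_one_ident(0x14));  /* literal ^T : now illegal */
    assert(!sketch_valid_len_one_ident(' '));   /* SPACE : never allowed    */
    assert(!sketch_valid_len_one_ident(0x7F));  /* DEL : now illegal        */
    assert(sketch_valid_len_one_ident(0xE9));   /* upper Latin1 : accepted  */
    assert(sketch_is_deprecated(0xAD));         /* SOFT HYPHEN : deprecated */
    assert(!sketch_is_deprecated(0xE9));        /* graphic : no warning     */
}
```

Under 'use utf8' the real macro instead asks whether the character can
start an identifier, which this sketch does not attempt to model.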

Summary of changes:
 pod/perldata.pod    | 93 +++++++++++++++++++++++++++++----------------------
 pod/perldelta.pod   | 23 +++++++++----
 pod/perlvar.pod     | 38 ++++++++++-----------
 t/lib/warnings/toke | 42 -----------------------
 t/uni/variables.t   | 96 +++++++++++++++++++----------------------------------
 toke.c              | 75 +++++++++++++++++++----------------------
 6 files changed, 156 insertions(+), 211 deletions(-)

diff --git a/pod/perldata.pod b/pod/perldata.pod
index 3af3f0b..a285eb7 100644
--- a/pod/perldata.pod
+++ b/pod/perldata.pod
@@ -37,8 +37,8 @@ collide with one of your normal variables.  Strings that match
 parenthesized parts of a regular expression are saved under names
 containing only digits after the C<$> (see L<perlop> and L<perlre>).
 In addition, several special variables that provide windows into
-the inner working of Perl have names containing punctuation characters
-and control characters.  These are documented in L<perlvar>.
+the inner working of Perl have names containing punctuation characters.
+These are documented in L<perlvar>.
 X<variable, built-in>
 
 Scalar values are always named with '$', even when referring to a
@@ -99,11 +99,11 @@ that returns a reference to the appropriate type.  For a description
 of this, see L<perlref>.
 
 Names that start with a digit may contain only more digits.  Names
-that do not start with a letter, underscore, digit or a caret (i.e.
-a control character) are limited to one character, e.g.,  C<$%> or
+that do not start with a letter, underscore, digit or a caret are
+limited to one character, e.g.,  C<$%> or
 C<$$>.  (Most of these one character names have a predefined
 significance to Perl.  For instance, C<$$> is the current process
-id.)
+id.  And all such names are reserved for Perl's possible use.)
 
 =head2 Identifier parsing
 X<identifiers>
@@ -129,7 +129,7 @@ match C<\w> (this prevents some problematic cases); and Perl
 additionally accepts identfier names beginning with an underscore.
 
 If not under C<use utf8>, the source is treated as ASCII + 128 extra
-controls, and identifiers should match
+generic characters, and identifiers should match
 
     / (?aa) (?!\d) \w+ /x
 
@@ -184,53 +184,66 @@ Put together, a grammar to match a basic identifier becomes
 Meanwhile, special identifiers don't follow the above rules; For the most
 part, all of the identifiers in this category have a special meaning given
 by Perl.  Because they have special parsing rules, these generally can't be
-fully-qualified.  They come in four forms:
+fully-qualified.  They come in six forms (but don't use forms 5 and 6):
 
 =over
 
-=item *
+=item 1.
 
 A sigil, followed solely by digits matching C<\p{POSIX_Digit}>, like
 C<$0>, C<$1>, or C<$10000>.
 
-=item *
-
-A sigil, followed by either a caret and a single POSIX uppercase letter,
-like C<$^V> or C<$^W>, or a sigil followed by a literal non-space,
-non-C<NUL> control character matching the C<\p{POSIX_Cntrl}> property.
-Due to a historical oddity, if not running under C<use utf8>, the 128
-characters in the C<[0x80-0xff]> range are considered to be controls,
-and may also be used in length-one variables.  However, the use of
-non-graphical characters is deprecated as of v5.22, and support for them
-will be removed in a future version of perl.  ASCII space characters and
-C<NUL> already aren't allowed, so this means that a single-character
-variable name with that name being any other C0 control C<[0x01-0x1F]>,
-or C<DEL> will generate a deprecated warning.  Already, under C<"use
-utf8">, non-ASCII characters must match C<Perl_XIDS>.  As of v5.22, when
-not under C<"use utf8"> C1 controls C<[0x80-0x9F]>, NO BREAK SPACE, and
-SOFT HYPHEN (C<SHY>)) generate a deprecated warning.
-
-=item *
-
-Similar to the above, a sigil, followed by bareword text in brackets,
-where the first character is either a caret followed by an uppercase
-letter, like C<${^GLOBAL_PHASE}> or a non-C<NUL>, non-space literal
-control like C<${\7LOBAL_PHASE}>.  Like the above, when not under
-C<"use utf8">, the characters in C<[0x80-0xFF]> are considered controls, but as
-of v5.22, the use of any that are non-graphical are deprecated, and as
-of v5.20 the use of any ASCII-range literal control is deprecated.
-Support for these will be removed in a future version of perl.
-
-=item *
+=item 2.
 
 A sigil followed by a single character matching the C<\p{POSIX_Punct}>
 property, like C<$!> or C<%+>, except the character C<"{"> doesn't work.
 
+=item 3.
+
+A sigil, followed by a caret and any one of the characters
+C<[][A-Z^_?\]>, like C<$^V> or C<$^]>.
+
+=item 4.
+
+Similar to the above, a sigil, followed by bareword text in braces,
+where the first character is a caret.  The next character is any one of
+the characters C<[][A-Z^_?\]>, followed by ASCII word characters.  An
+example is C<${^GLOBAL_PHASE}>.
+
+=item 5.
+
+A sigil, followed by any single character in the range C<[\x80-\xFF]>
+when not under C<S<"use utf8">>.  (Under C<S<"use utf8">>, the normal
+identifier rules given earlier in this section apply.)  Use of
+non-graphic characters (the C1 controls, the NO-BREAK SPACE, and the
+SOFT HYPHEN) is deprecated and will be forbidden in a future Perl
+version.  The use of the other characters is unwise, as these are all
+reserved to have special meaning to Perl, and none of them currently
+do have special meaning, though this could change without notice.
+
+Note that an implication of this form is that there are identifiers only
+legal under C<S<"use utf8">>, and vice-versa, for example the identifier
+C<$E<233>tat> is legal under C<S<"use utf8">>, but is otherwise
+considered to be the single character variable C<$E<233>> followed by
+the bareword C<"tat">, the combination of which is a syntax error.
+
+=item 6.
+
+This is a combination of the previous two forms.  It is valid only when
+not under S<C<"use utf8">> (normal identifier rules apply when under
+S<C<"use utf8">>).  The form is a sigil, followed by text in braces,
+where the first character is any one of the characters in the range
+C<[\x80-\xFF]> followed by ASCII word characters up to the trailing
+brace.
+
+The same caveats as the previous form apply:  The non-graphic characters
+are deprecated, it is unwise to use this form at all, and utf8ness makes
+a big difference.
+
 =back
 
-Note that as of Perl 5.20, literal control characters in variable names
-are deprecated; and as of Perl 5.22, any other non-graphic characters
-are also deprecated.
+Prior to Perl v5.24, non-graphical ASCII control characters were also
+allowed in some situations; this had been deprecated since v5.20.
 
 =head2 Context
 X<context> X<scalar context> X<list context>
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index b3114a9..b6ec5df 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -69,13 +69,22 @@ L</Selected Bug Fixes> section.
 
 =head1 Incompatible Changes
 
-XXX For a release on a stable branch, this section aspires to be:
-
-    There are no changes intentionally incompatible with 5.XXX.XXX
-    If any exist, they are bugs, and we request that you submit a
-    report.  See L</Reporting Bugs> below.
-
-[ List each incompatible change as a =head2 entry ]
+=head2 ASCII characters in variable names must now be all visible
+
+It was legal until now on ASCII platforms for variable names to contain
+non-graphical ASCII control characters (ordinals 0 through 31, and 127,
+which are the C0 controls and C<DELETE>).  This usage has been
+deprecated since v5.20, and as of now causes a syntax error.  The
+variables these names referred to are special, reserved by Perl for
+whatever use it may choose, now, or in the future.  Each such variable
+has an alternative way of spelling it.  Instead of the single
+non-graphic control character, a two character sequence beginning with a
+caret is used, like C<$^]> and C<${^GLOBAL_PHASE}>.  Details are at
+L<perlvar>.   It remains legal, though unwise and deprecated (raising a
+deprecation warning), to use certain non-graphic non-ASCII characters in
+variables names when not under S<C<use utf8>>.  No code should do this,
+as all such variables are reserved by Perl, and Perl doesn't currently
+define any of them (but could at any time, without notice).
 
 =head2 The C<autoderef> feature has been removed
 
diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index cc69c3c..f825754 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -12,32 +12,30 @@ arbitrarily long (up to an internal limit of 251 characters) and
 may contain letters, digits, underscores, or the special sequence
 C<::> or C<'>.  In this case, the part before the last C<::> or
 C<'> is taken to be a I<package qualifier>; see L<perlmod>.
-
-Perl variable names may also be a sequence of digits or a single
-punctuation or control character (with the literal control character
-form deprecated).  These names are all reserved for
+A Unicode letter that is not ASCII is not considered to be a letter
+unless S<C<"use utf8">> is in effect, and somewhat more complicated
+rules apply; see L<perldata/Identifier parsing> for details.
+
+Perl variable names may also be a sequence of digits, a single
+punctuation character, or the two-character sequence: C<^> (caret or
+CIRCUMFLEX ACCENT) followed by any one of the characters C<[][A-Z^_?\]>.
+These names are all reserved for
 special uses by Perl; for example, the all-digits names are used
 to hold data captured by backreferences after a regular expression
-match.  Perl has a special syntax for the single-control-character
-names: It understands C<^X> (caret C<X>) to mean the control-C<X>
-character.  For example, the notation C<$^W> (dollar-sign caret
-C<W>) is the scalar variable whose name is the single character
-control-C<W>.  This is better than typing a literal control-C<W>
-into your program.
-
-Since Perl v5.6.0, Perl variable names may be alphanumeric strings that
-begin with a caret (or a control character, but this form is
-deprecated).
-These variables must be written in the form C<${^Foo}>; the braces
-are not optional.  C<${^Foo}> denotes the scalar variable whose
-name is a control-C<F> followed by two C<o>'s.  These variables are
+match.
+
+Since Perl v5.6.0, Perl variable names may also be alphanumeric strings
+preceded by a caret.  These must all be written in the form C<${^Foo}>;
+the braces are not optional.  C<${^Foo}> denotes the scalar variable
+whose name is considered to be a control-C<F> followed by two C<o>'s.
+These variables are
 reserved for future special uses by Perl, except for the ones that
-begin with C<^_> (control-underscore or caret-underscore).  No
-control-character name that begins with C<^_> will acquire a special
+begin with C<^_> (caret-underscore).  No
+name that begins with C<^_> will acquire a special
 meaning in any future version of Perl; such names may therefore be
 used safely in programs.  C<$^_> itself, however, I<is> reserved.
 
-Perl identifiers that begin with digits, control characters, or
+Perl identifiers that begin with digits or
 punctuation characters are exempt from the effects of the C<package>
 declaration and are always forced to be in package C<main>; they are
 also exempt from C<strict 'vars'> errors.  A few other names are also
diff --git a/t/lib/warnings/toke b/t/lib/warnings/toke
index ad0e74b..493c8a2 100644
--- a/t/lib/warnings/toke
+++ b/t/lib/warnings/toke
@@ -150,38 +150,6 @@ EXPECT
 Use of bare << to mean <<"" is deprecated at - line 2.
 ########
 # toke.c
-BEGIN {
-    if (ord('A') == 193) {
-        print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC";
-        exit 0;
-    }
-}
-eval "\$\cT";
-eval "\${\7LOBAL_PHASE}";
-eval "\${\cT}";
-eval "\${\n\cT}";
-eval "\${\cT\n}";
-my $ret = eval "\${\n\cT\n}";
-print "ok\n" if $ret == $^T;
-
-no warnings 'deprecated' ;
-eval "\$\cT";
-eval "\${\7LOBAL_PHASE}";
-eval "\${\cT}";
-eval "\${\n\cT}";
-eval "\${\cT\n}";
-eval "\${\n\cT\n}";
-
-EXPECT
-Use of literal control characters in variable names is deprecated at (eval 1) line 1.
-Use of literal control characters in variable names is deprecated at (eval 2) line 1.
-Use of literal control characters in variable names is deprecated at (eval 3) line 1.
-Use of literal control characters in variable names is deprecated at (eval 4) line 2.
-Use of literal control characters in variable names is deprecated at (eval 5) line 1.
-Use of literal control characters in variable names is deprecated at (eval 6) line 2.
-ok
-########
-# toke.c
 $a =~ m/$foo/eq;
 $a =~ s/$foo/fool/seq;
 
@@ -1497,20 +1465,10 @@ I
 ########
 # toke.c
 #[perl #119123] disallow literal control character variables
-BEGIN {
-    if (ord('A') == 193) {
-        print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC";
-        exit 0;
-    }
-}
-eval "\$\cQ = 25";
-eval "\${ \cX } = 24";
 *{
     Foo
 }; # shouldn't warn on {\n, even though \n is a control character
 EXPECT
-Use of literal control characters in variable names is deprecated at (eval 1) line 1.
-Use of literal control characters in variable names is deprecated at (eval 2) line 1.
 ########
 # toke.c
 # [perl #120288] -X at start of line gave spurious warning, where X is not
diff --git a/t/uni/variables.t b/t/uni/variables.t
index 0b73d5f..33f057a 100644
--- a/t/uni/variables.t
+++ b/t/uni/variables.t
@@ -15,7 +15,7 @@ use utf8;
 use open qw( :utf8 :std );
 no warnings qw(misc reserved);
 
-plan (tests => 66900);
+plan (tests => 66894);
 
 # ${single:colon} should not be treated as a simple variable, but as a
 # block with a label inside.
@@ -96,15 +96,8 @@ for ( 0x0 .. 0xff ) {
         $syntax_error = 1;
     }
     elsif ($chr =~ /[[:cntrl:]]/a) {
-        if ($chr eq "\N{NULL}") {
-            $name = sprintf "\\x%02x, NUL", $ord;
-            $syntax_error = 1;
-        }
-        else {
-            $name = sprintf "\\x%02x, an ASCII control", $ord;
-            $syntax_error = $::IS_EBCDIC;
-            $deprecated = ! $syntax_error;
-        }
+        $name = sprintf "\\x%02x, an ASCII control", $ord;
+        $syntax_error = 1;
     }
     elsif ($chr =~ /\pC/) {
         if ($chr eq "\N{SHY}") {
@@ -136,19 +129,20 @@ for ( 0x0 .. 0xff ) {
             like($@, qr/ syntax\ error | Unrecognized\ character /x,
                      "$name as a length-1 variable generates a syntax error");
             $tests++;
+            utf8::upgrade($chr);
+            evalbytes "no strict; use utf8; \$$chr = 4;",
+            like($@, qr/ syntax\ error | Unrecognized\ character /x,
+                     "  ... and the same under 'use utf8'");
+            $tests++;
         }
-        elsif ($ord < 32 || $chr =~ /[[:punct:][:digit:]]/a) {
+        elsif ($chr =~ /[[:punct:][:digit:]]/a) {
 
             # Unlike other variables, we dare not try setting the length-1
-            # variables that are \cX (for all valid X) nor ASCII ones that are
-            # punctuation nor digits.  This is because many of these variables
-            # have meaning to the system, and setting them could have side
-            # effects or not work as expected (And using fresh_perl() doesn't
-            # always help.) For example, setting $^D (to use a visible
-            # representation of code point 0x04) turns on tracing, and setting
-            # $^E sets an error number, but what gets printed is instead a
-            # string associated with that number.  For all these we just
-            # verify that they don't generate a syntax error.
+            # variables that are ASCII punctuation and digits.  This is
+            # because many of these variables have meaning to the system, and
+            # setting them could have side effects or not work as expected
+            # (And using fresh_perl() doesn't always help.) For all these we
+            # just verify that they don't generate a syntax error.
             local $@;
             evalbytes "\$$chr;";
             is $@, '', "$name as a length-1 variable doesn't generate a syntax error";
@@ -237,13 +231,19 @@ for ( 0x0 .. 0xff ) {
         if ($chr =~ /[#*]/) {
 
             # Length-1 variables with these two characters used to be used by
-            # Perl, but now their generates a warning that they're gone.
+            # Perl, but now it generates a warning that they're gone.
             # Ignore such warnings.
             for (my $i = @warnings - 1; $i >= 0; $i--) {
                 splice @warnings, $i, 1 if $warnings[$i] =~ /is no longer supported/;
             }
         }
-        ok(@warnings == 0, "  ... and doesn't generate any warnings");
+        my $message = "  ... and doesn't generate any warnings";
+        $message = "  TODO $message" if    $ord == 0
+                                        || $chr =~ /\s/a;
+
+        if (! ok(@warnings == 0, $message)) {
+            note join "\n", @warnings;
+        }
         $tests++;
     }
     elsif (! @warnings) {
@@ -350,21 +350,25 @@ EOP
 
     {
         no strict;
-        # Silence the deprecation warning for literal controls
-        no warnings 'deprecated';
 
-        for my $var ( '$', "\7LOBAL_PHASE", "^GLOBAL_PHASE", "^V" ) {
-          SKIP: {
-            skip("Literal control characters in variable names forbidden on EBCDIC", 3)
-                             if ($::IS_EBCDIC && ord substr($var, 0, 1) < 32);
+        for my $var ( '$', "^GLOBAL_PHASE", "^V" ) {
             eval "\${ $var}";
             is($@, '', "\${ $var} works" );
             eval "\${$var }";
             is($@, '', "\${$var } works" );
             eval "\${ $var }";
             is($@, '', "\${ $var } works" );
-          }
         }
+        my $var = "\7LOBAL_PHASE";
+        eval "\${ $var}";
+        like($@, qr/Unrecognized character \\x07/,
+             "\${ $var} generates 'Unrecognized character' error" );
+        eval "\${$var }";
+        like($@, qr/Unrecognized character \\x07/,
+             "\${$var } generates 'Unrecognized character' error" );
+        eval "\${ $var }";
+        like($@, qr/Unrecognized character \\x07/,
+             "\${ $var } generates 'Unrecognized character' error" );
     }
 }
 
@@ -386,40 +390,8 @@ EOP
         );
     }
     
-  SKIP: {
-    skip("Literal control characters in variable names forbidden on EBCDIC", 2)
-                                                                if $::IS_EBCDIC;
-    no warnings 'deprecated';
     my $ret = eval "\${\cT\n}";
-    is($@, "", 'No errors from using ${\n\cT\n}');
-    is($ret, $^T, "  ... and we got the right value");
-  }
-}
-
-SKIP: {
-    skip("Literal control characters in variable names forbidden on EBCDIC", 5)
-                                                                if $::IS_EBCDIC;
-
-    # Originally from t/base/lex.t, moved here since we can't
-    # turn deprecation warnings off in that file.
-    no strict;
-    no warnings 'deprecated';
-    
-    my $CX  = "\cX";
-    $ {$CX} = 17;
-    
-    # Does the syntax where we use the literal control character still work?
-    is(
-       eval "\$ {\cX}",
-       17,
-       "Literal control character variables work"
-    );
-
-    eval "\$\cQ = 24";                 # Literal control character
-    is($@, "", "  ... and they can be assigned to without error");
-    is(${"\cQ"}, 24, "  ... and the assignment works");
-    is($^Q, 24, "  ... even if we access the variable through the caret name");
-    is(\${"\cQ"}, \$^Q, '\${\cQ} == \$^Q');
+    like($@, qr/\QUnrecognized character/, '${\n\cT\n} gives an error message');
 }
 
 {
diff --git a/toke.c b/toke.c
index 29ebbbf..9a94f91 100644
--- a/toke.c
+++ b/toke.c
@@ -8609,6 +8609,34 @@ S_scan_word(pTHX_ char *s, char *dest, STRLEN destlen, int allow_package, STRLEN
     return s;
 }
 
+/* Is the byte 'd' a legal single character identifier name?  'u' is true
+ * iff Unicode semantics are to be used.  The legal ones are any of:
+ *  a) all ASCII characters except:
+ *          1) control and space-type ones, like NUL, SOH, \t, and SPACE;
+ *          2) '{'
+ *     The final case currently doesn't get this far in the program, so we
+ *     don't test for it.  If that were to change, it would be ok to allow it.
+ *  c) When not under Unicode rules, any upper Latin1 character
+ *  d) Otherwise, when unicode rules are used, all XIDS characters.
+ *
+ *      Because all ASCII characters have the same representation whether
+ *      encoded in UTF-8 or not, we can use the foo_A macros below and '\0' and
+ *      '{' without knowing if is UTF-8 or not.
+ * EBCDIC already uses the rules that ASCII platforms will use after the
+ * deprecation cycle; see comment below about the deprecation. */
+#ifdef EBCDIC
+#   define VALID_LEN_ONE_IDENT(s, is_utf8)                                    \
+    (isGRAPH_A(*(s)) || ((is_utf8)                                            \
+                         ? isIDFIRST_utf8((U8*) (s))                          \
+                         : (isGRAPH_L1(*s)                                    \
+                            && LIKELY((U8) *(s) != LATIN1_TO_NATIVE(0xAD)))))
+#else
+#   define VALID_LEN_ONE_IDENT(s, is_utf8)                                    \
+    (isGRAPH_A(*(s)) || ((is_utf8)                                            \
+                         ? isIDFIRST_utf8((U8*) (s))                          \
+                         : ! isASCII_utf8((U8*) (s))))
+#endif
+
 STATIC char *
 S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
 {
@@ -8631,7 +8659,7 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
            *d++ = *s++;
        }
     }
-    else {
+    else {  /* See if it is a "normal" identifier */
         parse_ident(&s, &d, e, 1, is_utf8);
     }
     *d = '\0';
@@ -8643,6 +8671,9 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
            PL_lex_state = LEX_INTERPENDMAYBE;
        return s;
     }
+
+    /* Here, it is not a run-of-the-mill identifier name */
+
     if (*s == '$' && s[1] &&
       (isIDFIRST_lazy_if(s+1,is_utf8)
          || isDIGIT_A((U8)s[1])
@@ -8664,36 +8695,6 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
             s = skipspace(s);
         }
     }
-
-/* Is the byte 'd' a legal single character identifier name?  'u' is true
- * iff Unicode semantics are to be used.  The legal ones are any of:
- *  a) all ASCII characters except:
- *          1) space-type ones, like \t and SPACE;
-            2) NUL;
- *          3) '{'
- *     The final case currently doesn't get this far in the program, so we
- *     don't test for it.  If that were to change, it would be ok to allow it.
- *  c) When not under Unicode rules, any upper Latin1 character
- *  d) Otherwise, when unicode rules are used, all XIDS characters.
- *
- *      Because all ASCII characters have the same representation whether
- *      encoded in UTF-8 or not, we can use the foo_A macros below and '\0' and
- *      '{' without knowing if is UTF-8 or not.
- * EBCDIC already uses the rules that ASCII platforms will use after the
- * deprecation cycle; see comment below about the deprecation. */
-#ifdef EBCDIC
-#   define VALID_LEN_ONE_IDENT(s, is_utf8)                                    \
-    (isGRAPH_A(*(s)) || ((is_utf8)                                            \
-                         ? isIDFIRST_utf8((U8*) (s))                          \
-                         : (isGRAPH_L1(*s)                                    \
-                            && LIKELY((U8) *(s) != LATIN1_TO_NATIVE(0xAD)))))
-#else
-#   define VALID_LEN_ONE_IDENT(s, is_utf8) (! isSPACE_A(*(s))                 \
-                                            && LIKELY(*(s) != '\0')           \
-                                            && (! is_utf8                     \
-                                                || isASCII_utf8((U8*) (s))    \
-                                                || isIDFIRST_utf8((U8*) (s))))
-#endif
     if ((s <= PL_bufend - (is_utf8)
                           ? UTF8SKIP(s)
                           : 1)
@@ -8708,13 +8709,7 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
             : (! isGRAPH_L1( (U8) *s)
                || UNLIKELY((U8) *(s) == LATIN1_TO_NATIVE(0xAD))))
         {
-            /* Split messages for back compat */
-            if (isCNTRL_A( (U8) *s)) {
-                deprecate("literal control characters in variable names");
-            }
-            else {
-                deprecate("literal non-graphic characters in variable names");
-            }
+            deprecate("literal non-graphic characters in variable names");
         }
         
         if (is_utf8) {
@@ -8745,8 +8740,8 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
             /* if it starts as a valid identifier, assume that it is one.
                (the later check for } being at the expected point will trap
                cases where this doesn't pan out.)  */
-        d += is_utf8 ? UTF8SKIP(d) : 1;
-        parse_ident(&s, &d, e, 1, is_utf8);
+            d += is_utf8 ? UTF8SKIP(d) : 1;
+            parse_ident(&s, &d, e, 1, is_utf8);
            *d = '\0';
             tmp_copline = CopLINE(PL_curcop);
             if (s < PL_bufend && isSPACE(*s)) {

--
Perl5 Master Repository
