In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/42327f062c1a45d590c71a5bceb2682cc200be1e?hp=636fe681770e29874acb69ab162bc53731425867>
- Log -----------------------------------------------------------------
commit 42327f062c1a45d590c71a5bceb2682cc200be1e
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 18 18:56:13 2014 -0600

    Document length-1 variable name rules

    This cleans up the existing documentation with added details, and
    documents the effect of recent changes.

M       pod/perldata.pod
M       pod/perldelta.pod

commit e92292576e58ce767c60c4cd8ebc1989792659ec
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 19 23:03:44 2014 -0600

    Don't allow literal control chars in var names in EBCDIC

    Currently, a length-1 variable may have as its name some of the
    possible control characters, though this usage is deprecated.  It is a
    pain to make this work properly on EBCDIC, and since the use of these
    is deprecated, the pumpking agreed with me not to bother doing so.

M       t/lib/warnings/toke
M       t/uni/variables.t
M       toke.c

commit 4475d0d23c30e1ffbe123b0f5e3b800c0be35f4c
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 19 10:35:04 2014 -0600

    Deprecate all length-1 non-graphic variable names

    v5.20 deprecated all otherwise-legal control characters as length-1
    variable names.  This extends that to include all non-graphic
    characters.  The practical effect of this occurs only when not under
    "use utf8", and affects just the C1 controls (code points 0x80 through
    0x9F), NO-BREAK SPACE, and SOFT HYPHEN.

M       pod/perldelta.pod
M       pod/perldiag.pod
M       t/uni/variables.t
M       toke.c

commit de1631081e2407e5021257de0eb1de20ad1534ca
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 19 09:42:25 2014 -0600

    toke.c: Simplify macro for length-1 variable names

    It turns out that any non-NUL, non-space ASCII character that gets
    this far in the program is a valid length-1 variable name, so the
    expression in the macro that tests for legal ones can be simplified.
M       t/uni/variables.t
M       toke.c

commit 3027a07d5f585cc9ab5e9c0c08df9f73a758e86b
Author: Karl Williamson <[email protected]>
Date:   Sun Oct 19 09:20:44 2014 -0600

    t/uni/variables.t: Add tests for the generated warnings

M       t/uni/variables.t

commit 40c554ff2a344883f8b00708a6a832d79136bf1e
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 18 19:00:01 2014 -0600

    t/uni/variables.t: White-space only

    Indent the newly-formed (by the previous commit) blocks.

M       t/uni/variables.t

commit fef6cdc52740801feb44b31268d429391c8e6103
Author: Karl Williamson <[email protected]>
Date:   Sat Oct 18 08:56:00 2014 -0600

    t/uni/variables.t: Add tests

    The length-1 variables in the ASCII range had not been systematically
    tested until this commit.

M       t/uni/variables.t

commit 2fb9f1434f6c6f7916e429be68140fc5d6b9eb31
Author: Karl Williamson <[email protected]>
Date:   Mon Oct 20 09:30:01 2014 -0600

    t/uni/variables.t: Add test skips

    A future commit will change the behavior of this test depending on
    whether it is run on an EBCDIC or an ASCII platform.  In the loop, add
    skips so the total number of tests run per iteration is the same no
    matter which branches (and tests) get run in the iteration.  This
    allows the ASCII and EBCDIC test plans to have the same number of
    tests, which in turn prevents platform-specific failures when
    adding/subtracting tests in this file.

M       t/uni/variables.t

commit 502bdc0fe4cdbb3e2e0b49893a622431a77ce409
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 17 23:27:24 2014 -0600

    t/uni/variables.t: Rename variable

    A future commit will use the current variable name for a different
    purpose.

M       t/uni/variables.t

commit 048c5953b72547aea9b36024cb01c8605a016cca
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 17 11:43:06 2014 -0600

    t/uni/variables.t: Clarify some test names

    I found it hard to read "use utf8" instead of 'use utf8', etc.  More
    importantly, this changes a bunch of tests so the character in related
    tests is named only in the first such test, with more description used.
    This is in preparation for a future commit.

M       t/uni/variables.t

commit d84210af634f1e1759ff6baaaf8c1c2fe656d6fe
Author: Karl Williamson <[email protected]>
Date:   Mon Apr 1 22:28:08 2013 -0600

    t/uni/variables.t: Fix to run under EBCDIC

M       t/uni/variables.t

commit f7d29c389f10807951c8c1a6b07cd11099e5c733
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 17 22:54:21 2014 -0600

    t/uni/variables.t: Remove duplicate test

    The exact same test was repeated.

M       t/uni/variables.t

commit 77462865e9cbba7a19b6599367e59da6dc03a7a6
Author: Karl Williamson <[email protected]>
Date:   Fri Oct 17 11:50:13 2014 -0600

    t/uni/variables.t: Remove unused param to sprintf

M       t/uni/variables.t

commit 5333b64c49d92edaebcba2a634f2cb1d4edc5920
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 22:24:13 2014 -0600

    toke.c: Fix potential buffer overflow

    Until this commit, malformed UTF-8 input could have caused a read off
    the end of the buffer here.

M       toke.c

commit aae773bbd74e39596c3ed0ed403f36b245b6d687
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 22:03:30 2014 -0600

    perlebcdic: Add clarification

M       pod/perlebcdic.pod

commit 9415f65901e09c7e3366bca00059a439e82fa804
Author: Karl Williamson <[email protected]>
Date:   Wed Jun 26 15:40:53 2013 -0600

    uvoffuni_to_utf8_flags(): die if platform can't handle

    On non-EBCDIC platforms, currently any UV is encodable as UTF-8.
    (This would change if there were 128-bit words.)  Thus, much code
    assumes that nothing can go wrong when converting to UTF-8, and hence
    does no error checking.  However, UTF-EBCDIC is only capable of
    representing code points below 2**32, so if there are 64-bit words,
    this function can fail.

    Prior to this patch, there was no real overflow check, and garbage
    was returned by this function if it was called with too large a
    number.  While not ideal, the easiest thing to do is to just die for
    such a number, as we do for division by 0.  This involves changing
    only code within this function, and not its many callers.
M       pod/perldiag.pod
M       t/lib/warnings/utf8
M       utf8.c

commit a27ed9805e6767d92a9ba275f4379cd1f342cafe
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 19:24:53 2014 -0600

    dquote_static.c: Forbid \c{ on EBCDIC

    Unlike on ASCII platforms, where \c{ evaluates to a semicolon and
    raises a fatal error because that result is not a control character,
    on EBCDIC platforms it does evaluate to a control.  But we have to
    forbid it anyway, because an unescaped "{" is now special as part of
    larger sequences, and we don't want to confuse the parser.

M       dquote_static.c
M       pod/perldiag.pod
M       t/lib/warnings/toke

commit d9759938eb6b6cd4b44f6be9200b1fdcce67382c
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 19:44:07 2014 -0600

    utf8.h: EBCDIC fix

    These macros are supposed to accommodate inputs larger than a byte.
    Therefore, under EBCDIC, we have to use a different macro which
    handles the larger values.  On ASCII platforms the called macros are
    no-ops, so it doesn't matter there.

M       utf8.h

commit 88794300ed3b4c4d9b3ec9303c1b0c910d503082
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 19:55:42 2014 -0600

    handy.h: Two EBCDIC fixes

    In EBCDIC-only macros, an argument previously failed to be
    dereferenced, and there was an extra '=='.  A few comment changes as
    well.

M       handy.h

commit 4d2de1058c946439368821b6622c0290c8e9f855
Author: Karl Williamson <[email protected]>
Date:   Thu Oct 16 20:45:17 2014 -0600

    lib/locale.t: Add some \Q

    More places in this file need \Q to quote things.  This was noticed
    in EBCDIC testing.

M       lib/locale.t

commit c46c4601920d32884d90fbc4c84b4947cb60f436
Author: Karl Williamson <[email protected]>
Date:   Wed Apr 17 21:47:41 2013 -0600

    t/re/regexp.t: Add ability to skip depending on platform

    This adds the capability to specify that a test is to be run only on
    an ASCII platform, or only on an EBCDIC one.
M       t/re/regexp.t

commit d6395ff9ae082229dda4b7ca25537f92c446bbab
Author: Karl Williamson <[email protected]>
Date:   Tue Apr 16 15:49:06 2013 -0600

    t/re/regexp.t: Generalize for non-ASCII platforms

    This adds code to the processing of the tests in t/re/re_tests to
    automatically convert most character constants from the Unicode to
    the native character set.  This allows most tests in t/re/re_tests to
    be run on both platforms without change.  A later commit will add the
    capability to skip individual tests on the wrong platform, so the few
    tests that this commit doesn't work for can be accommodated.

M       t/re/regexp.t

commit 98c62be8f6a8b6ad8f0782b33467fabcfe191465
Author: Karl Williamson <[email protected]>
Date:   Mon Jul 7 16:59:19 2014 -0600

    charset_tools.pl: Fix changed function name

    The function has changed names, but this file was not updated to
    reflect that.

M       t/charset_tools.pl

commit 92c0a900a3381af22f6d14091dbaffaa47ad592c
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:43:26 2014 -0600

    EBCDIC doesn't have real UTF-8 locales.

    At least not on the system that we have tested on.  There are locales
    that say they are UTF-8, but they're not; they're EBCDIC 1047.

M       locale.c
M       t/loc_tools.pl

commit 1ff128e487762041398e9cf94a21edf1833aa2ed
Author: Yaroslav Kuzmin <[email protected]>
Date:   Wed Jul 2 22:22:23 2014 -0600

    Fix ebcdic error in nostdio.h

M       nostdio.h

commit 3d915d2236450147aaf3ae3b16195dd74072c071
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:41:22 2014 -0600

    regcomp.c: isASCII should be isASCII_uni

    It doesn't effectively matter, but isASCII_uni is the one that is
    supposed to be used for UVs.

M       regcomp.c

commit 0852beacbdeb373adc1e26bf890db2d8698ec0fb
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:24:50 2014 -0600

    Fix isASCII for EBCDIC

    Prior to this commit, isASCII on EBCDIC platforms was defined as the
    isascii() libc function.  It turns out that this doesn't work
    properly.
    It needed to be this way back when EBCDIC was bootstrapped onto the
    target machine, but now various header files are furnished with the
    requisite definitions, so this is no longer necessary.

    The problem with isascii() is that, unlike on ASCII platforms, it is
    locale-dependent.  This means that instead of getting a standard
    ASCII definition, it returns whatever the underlying locale says,
    even if there is no 'use locale' anywhere in the program.

    Starting with this commit, the isASCII definition comes from the
    l1_char_class_tab.h file, which we know is accurate and not
    locale-dependent.

    This header can be used in compilations of utility programs where
    perl.h is not available.  For these, there are alternate, more
    complicated definitions, should they be needed in those utility
    programs.  Prior to this commit, several of those definitions also
    used locale-sensitive isfoo() functions.  The bulk of this commit
    refactors those definitions to avoid such functions as much as
    possible.  As noted in the added comments in the code, the one
    remaining use of such a function is only for the lesser-used control
    characters, which likely aren't used in the utility programs.

M       handy.h

commit caa94d3551e3f67e2b7157876e52748d055d30a4
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:21:16 2014 -0600

    handy.h: Add missing macro

    This section of code is normally not compiled, but when circumstances
    call for it to be compiled, it may be missing the macro defined in
    this commit.  The macro is trivial on ASCII platforms, so just define
    it if missing.

M       handy.h

commit 6838b41e3cd58738e5e73705c9cef5bef6e28885
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:16:59 2014 -0600

    handy.h: Need macro definition for normally non-compiled code

    This section of code is compiled only when perl.h is not available,
    i.e. for utility programs.  I periodically test that it still works,
    and this time a macro had been added to the other branch of the #if,
    but not this one.  This commit adds a trivial one to the missing
    area.

M       handy.h

commit 687c8d01ec65dc1466743a4045ad091451aedc41
Author: Karl Williamson <[email protected]>
Date:   Wed Oct 8 11:12:12 2014 -0600

    handy.h: Comments only

    Removes an obsolete comment, and adds text to make it easier to find
    the matching #else and #endif of a #if.

M       handy.h

commit ea5bc90f605c42b5ba11bf0aa5b9df6e7d9ec3e7
Author: Yaroslav Kuzmin <[email protected]>
Date:   Sun Sep 28 10:02:17 2014 -0600

    perlio: Fix to work with MVS Dataset

M       AUTHORS
M       perlio.c

commit 8a65147e419322d7c5ae5680e93886a7deb8b7c3
Author: Karl Williamson <[email protected]>
Date:   Sun Aug 24 10:12:49 2014 -0600

    mktables: Use chr() instead of pack()

    The input is the native code point, not the Unicode one, so it is a
    bug that pack("U") works.  The "W" format is what should be used, but
    I think chr() is easier to read.

M       lib/unicore/mktables

commit 60071a22d6a6c85bd3b817ebc160338978eee9bd
Author: Karl Williamson <[email protected]>
Date:   Sun Jun 22 21:29:28 2014 -0600

    utf8.c: Improve debug message

    This function was called with an empty string "" because that string
    was not actually needed in the function, except to better identify
    the source when there is an error.  So change the call to specify the
    actual source.

M       utf8.c

commit 5e2adf8597b4131da54f048b3d903feb3849164a
Author: Karl Williamson <[email protected]>
Date:   Mon Jun 9 12:28:10 2014 -0600

    regcomp.c: EBCDIC fix

M       regcomp.c

commit d11389730686d2620d5490fa96eb210ea5837773
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too-long lines; properly join

    I had thought that a continuation introduced a space.  But no, a
    continuation can happen in the middle of a token.  This commit also
    splits lines that are getting very long, to avoid preprocessor
    limitations.
M       makedepend.SH

commit 0e0580046e789498690a4323d54c9c0ba13a05c5
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only

    Align continuation backslashes.

M       makedepend.SH

commit 558bc5733bc9dde03f4161b1179a8a49b7dbc432
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space

    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted, which
    makes things somewhat less likely to exceed the limit.

    This commit also fixes several [] character classes which were meant
    to each match a tab or a blank, but editors had converted the tabs to
    blanks.

M       makedepend.SH

commit 86c3d61a7d00d9c0436b87cee87cb18a0055b6f8
Author: Karl Williamson <[email protected]>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments

    These comments may actually be necessary.

M       makedepend.SH

commit 6edd329f2057bbd709a169f99a8ea89d83e6b1ad
Author: Karl Williamson <[email protected]>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code

    This code currently causes problems on z/OS.  But since we don't know
    why it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit 636a024b8c315521b8ddde9c3a7d31cea4e8d2f2
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.

    John Goodyear <[email protected]> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same way as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just
    disable the attributes altogether.
    John Goodyear

M       hints/os390.sh

commit a4c2c75af551764534f9f8cb6afd4ef6ba514f45
Author: Andy Dougherty <[email protected]>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.

    Grep appears to be limited to 2048 characters, and truncates the
    output for cppstdin.  Fgrep apparently doesn't have that limit.

    Thanks to John Goodyear <[email protected]> for reporting this.

M       hints/os390.sh

-----------------------------------------------------------------------

Summary of changes:
 AUTHORS              |   1 +
 dquote_static.c      |  12 ++-
 handy.h              | 151 ++++++++++++++++++++----------
 hints/os390.sh       |  23 ++++-
 lib/locale.t         |  24 ++---
 lib/unicore/mktables |   9 +-
 locale.c             |   3 +
 makedepend.SH        |  64 +++++++------
 nostdio.h            |   4 +
 perlio.c             |  33 +++++++
 pod/perldata.pod     |  60 +++++++-----
 pod/perldelta.pod    |  18 +++-
 pod/perldiag.pod     |  32 ++++++-
 pod/perlebcdic.pod   |   4 +-
 regcomp.c            |   4 +-
 t/charset_tools.pl   |   2 +-
 t/lib/warnings/toke  |  33 ++++++-
 t/lib/warnings/utf8  |  18 +++-
 t/loc_tools.pl       |   4 +
 t/re/regexp.t        |  39 ++++++++
 t/uni/variables.t    | 257 +++++++++++++++++++++++++++++++++++++++------------
 toke.c               |  61 ++++++++----
 utf8.c               |  13 ++-
 utf8.h               |   4 +-
 24 files changed, 664 insertions(+), 209 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index 4b059a6..482b2de 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1229,6 +1229,7 @@ Wolfgang Laun <[email protected]>
 Wolfram Humann <[email protected]>
 Xavier Noria <[email protected]>
 YAMASHINA Hio <[email protected]>
+Yaroslav Kuzmin <[email protected]>
 Yary Hluchan
 Yasushi Nakajima <[email protected]>
 Yitzchak Scott-Thoennes <[email protected]>
diff --git a/dquote_static.c b/dquote_static.c
index 802d83b..5fe7f0b 100644
--- a/dquote_static.c
+++ b/dquote_static.c
@@ -50,10 +50,14 @@ S_grok_bslash_c(pTHX_ const char source, const bool output_warning)
                        "Character following \"\\c\" must be printable ASCII");
        }
        else if (source == '{') {
-           assert(isPRINT_A(toCTRL('{')));
-
-           /* diag_listed_as: Use "%s" instead of "%s" */
-           Perl_croak(aTHX_ "Use \"%c\" instead of \"\\c{\"", toCTRL('{'));
+           const char control = toCTRL('{');
+           if (isPRINT_A(control)) {
+               /* diag_listed_as: Use "%s" instead of "%s" */
+               Perl_croak(aTHX_ "Use \"%c\" instead of \"\\c{\"", control);
+           }
+           else {
+               Perl_croak(aTHX_ "Sequence \"\\c{\" invalid");
+           }
        }

        result = toCTRL(source);
diff --git a/handy.h b/handy.h
index 5e0c86e..445d495 100644
--- a/handy.h
+++ b/handy.h
@@ -275,6 +275,7 @@ typedef U64TYPE U64;
 #define TYPE_DIGITS(T)  BIT_DIGITS(sizeof(T) * 8)
 #define TYPE_CHARS(T)   (TYPE_DIGITS(T) + 2) /* sign, NUL */

+/* Unused by core; should be deprecated */
 #define Ctl(ch) ((ch) & 037)

 /* This is a helper macro to avoid preprocessor issues, replaced by nothing
@@ -893,27 +894,22 @@ patched there.  The file as of this writing is cpan/Devel-PPPort/parts/inc/misc

 #ifdef EBCDIC
 #   ifndef _ALL_SOURCE
-        /* This returns the wrong results on at least z/OS unless this is
-         * defined. */
+        /* The native libc isascii() et.al. functions return the wrong results
+         * on at least z/OS unless this is defined. */
 #       error _ALL_SOURCE should probably be defined
 #   endif
-
-    /* We could be called without perl.h, in which case NATIVE_TO_ASCII() is
-     * likely not defined, and so we use the native function */
-#   define isASCII(c)    cBOOL(isascii(c))
 #else
+    /* There is a simple definition of ASCII for ASCII platforms.  But the
+     * EBCDIC one isn't so simple, so is defined using table look-up like the
+     * other macros below */
 #   define isASCII(c)    ((WIDEST_UTYPE)(c) < 128)
 #endif

-#define isASCII_A(c)  isASCII(c)
-#define isASCII_L1(c) isASCII(c)
-
 /* The lower 3 bits in both the ASCII and EBCDIC representations of '0' are 0,
  * and the 8 possible permutations of those bits exactly comprise the 8 octal
  * digits */
 #define isOCTAL_A(c)  cBOOL(FITS_IN_8_BITS(c) && (0xF8 & (c)) == '0')

-/* ASCII range only */
 #ifdef H_PERL   /* If have access to perl.h, lookup in its table */

 /* Character class numbers.  For internal core Perl use only.  The ones less
@@ -1089,6 +1085,10 @@ EXTCONST U32 PL_charclass[];
 #   define isWORDCHAR_L1(c)  _generic_isCC(c, _CC_WORDCHAR)
 #   define isIDFIRST_L1(c)   _generic_isCC(c, _CC_IDFIRST)

+#   ifdef EBCDIC
+#       define isASCII(c)    _generic_isCC(c, _CC_ASCII)
+#   endif
+
 /* Participates in a single-character fold with a character above 255 */
 #   define _HAS_NONLATIN1_SIMPLE_FOLD_CLOSURE_ONLY_FOR_USE_BY_REGCOMP_DOT_C_AND_REGEXEC_DOT_C(c) ((! cBOOL(FITS_IN_8_BITS(c))) || (PL_charclass[(U8) (c)] & _CC_mask(_CC_NONLATIN1_SIMPLE_FOLD)))
@@ -1102,43 +1102,95 @@ EXTCONST U32 PL_charclass[];
                                _generic_isCC(c, _CC_IS_IN_SOME_FOLD)
 #   define _IS_MNEMONIC_CNTRL_ONLY_FOR_USE_BY_REGCOMP_DOT_C(c) \
                                _generic_isCC(c, _CC_MNEMONIC_CNTRL)
-#else   /* else we don't have perl.h */
+#else   /* else we don't have perl.h H_PERL */

 /* If we don't have perl.h, we are compiling a utility program.  Below we
  * hard-code various macro definitions that wouldn't otherwise be available
- * to it. */
-#   ifdef EBCDIC
-        /* Use the native functions.  They likely will return false for all
-         * non-ASCII values, but this makes sure */
-#       define isLOWER_A(c)  (isASCII(c) && islower(c))
-#       define isPRINT_A(c)  (isASCII(c) && isprint(c))
-#       define isUPPER_A(c)  (isASCII(c) && isupper(c))
-#   else   /* ASCII platform.  These are coded based on first principals */
+ * to it.  Most are coded based on first principals.  First some ones common
+ * to both ASCII and EBCDIC */
+#   define isDIGIT_A(c)  ((c) <= '9' && (c) >= '0')
+#   define isBLANK_A(c)  ((c) == ' ' || (c) == '\t')
+#   define isSPACE_A(c)  (isBLANK_A(c)                                     \
+                          || (c) == '\n'                                   \
+                          || (c) == '\r'                                   \
+                          || (c) == '\v'                                   \
+                          || (c) == '\f')
+#   ifdef EBCDIC    /* There are gaps between 'i' and 'j'; 'r' and 's'.  Same
+                       for uppercase.  This is ordered to exclude most things
+                       early */
+#       define isLOWER_A(c)  ((c) >= 'a' && (c) <= 'z'                     \
+                              && ((c) <= 'i'                               \
+                                  || ((c) >= 'j' && (c) <= 'r')            \
+                                  || (c) >= 's'))
+#       define isUPPER_A(c)  ((c) >= 'A' && (c) <= 'Z'                     \
+                              && ((c) <= 'I'                               \
+                                  || ((c) >= 'J' && (c) <= 'R')            \
+                                  || (c) >= 'S'))
+#   else   /* ASCII platform. */
 #       define isLOWER_A(c)  ((c) >= 'a' && (c) <= 'z')
-#       define isPRINT_A(c)  (((c) >= 32 && (c) < 127))
 #       define isUPPER_A(c)  ((c) <= 'Z' && (c) >= 'A')
-#   endif   /* Below are common definitions for ASCII and non-ASCII */
+#   endif
+
+    /* Some more ASCII, non-ASCII common definitions */
 #   define isALPHA_A(c)  (isUPPER_A(c) || isLOWER_A(c))
 #   define isALPHANUMERIC_A(c) (isALPHA_A(c) || isDIGIT_A(c))
-#   define isBLANK_A(c)  ((c) == ' ' || (c) == '\t')
-#   define isCNTRL_A(c)  (isASCII(c) && (! isPRINT_A(c)))
-#   define isDIGIT_A(c)  ((c) <= '9' && (c) >= '0')
-#   define isGRAPH_A(c)  (isPRINT_A(c) && (c) != ' ')
-#   define isIDFIRST_A(c) (isALPHA_A(c) || (c) == '_')
-#   define isPUNCT_A(c)  (isGRAPH_A(c) && (! isALPHANUMERIC_A(c)))
-#   define isSPACE_A(c)  ((c) == ' '                                       \
-                          || (c) == '\t'                                   \
-                          || (c) == '\n'                                   \
-                          || (c) == '\r'                                   \
-                          || (c) == '\v'                                   \
-                          || (c) == '\f')
 #   define isWORDCHAR_A(c) (isALPHANUMERIC_A(c) || (c) == '_')
-#   define isXDIGIT_A(c) (isDIGIT_A(c)                                     \
-                          || ((c) >= 'a' && (c) <= 'f')                    \
+#   define isIDFIRST_A(c) (isALPHA_A(c) || (c) == '_')
+#   define isXDIGIT_A(c)  (isDIGIT_A(c)                                    \
+                           || ((c) >= 'a' && (c) <= 'f')                   \
                           || ((c) <= 'F' && (c) >= 'A'))
+#   ifdef EBCDIC
+#       define isPUNCT_A(c)  ((c) == '-' || (c) == '!' || (c) == '"'       \
+                              || (c) == '#' || (c) == '$' || (c) == '%'    \
+                              || (c) == '&' || (c) == '\'' || (c) == '('   \
+                              || (c) == ')' || (c) == '*' || (c) == '+'    \
+                              || (c) == ',' || (c) == '.' || (c) == '/'    \
+                              || (c) == ':' || (c) == ';' || (c) == '<'    \
+                              || (c) == '=' || (c) == '>' || (c) == '?'    \
+                              || (c) == '@' || (c) == '[' || (c) == '\\'   \
+                              || (c) == ']' || (c) == '^' || (c) == '_'    \
+                              || (c) == '`' || (c) == '{' || (c) == '|'    \
+                              || (c) == '}' || (c) == '~')
+#       define isGRAPH_A(c)  (isALPHANUMERIC_A(c) || isPUNCT_A(c))
+#       define isPRINT_A(c)  (isGRAPH_A(c) || (c) == ' ')
+
+#       ifdef QUESTION_MARK_CTRL
+#           define _isQMC(c)  ((c) == QUESTION_MARK_CTRL)
+#       else
+#           define _isQMC(c)  0
+#       endif
+
+        /* I (khw) can't think of a way to define all the ASCII controls
+         * without resorting to a libc (locale-sensitive) call.  But we know
+         * that all controls but the question-mark one are in the range
+         * 0-0x3f.  This makes sure that all the controls that have names are
+         * included, and all controls that are also considered ASCII in the
+         * locale.  This may include more or fewer than what it actually
+         * should, but the wrong ones are less-important controls, so likely
+         * won't impact things (keep in mind that this is compiled only if
+         * perl.h isn't available).  The question mark control is included if
+         * available */
+#       define isCNTRL_A(c)  (((c) < 0x40 && isascii(c))                   \
+                              || (c) == '\0' || (c) == '\a' || (c) == '\b' \
+                              || (c) == '\f' || (c) == '\n' || (c) == '\r' \
+                              || (c) == '\t' || (c) == '\v' || _isQMC(c))
+
+#       define isASCII(c)    (isCNTRL_A(c) || isPRINT_A(c))
+#   else   /* ASCII platform; things are simpler, and isASCII has already
+              been defined */
+#       define isGRAPH_A(c)  (((c) > ' ' && (c) < 127))
+#       define isPRINT_A(c)  (isGRAPH_A(c) || (c) == ' ')
+#       define isPUNCT_A(c)  (isGRAPH_A(c) && (! isALPHANUMERIC_A(c)))
+#       define isCNTRL_A(c)  (isASCII(c) && (! isPRINT_A(c)))
+#   endif
+
 /* The _L1 macros may be unnecessary for the utilities; I (khw) added them
- * during debugging, and it seems best to keep them. */
+ * during debugging, and it seems best to keep them.  We may be called
+ * without NATIVE_TO_LATIN1 being defined.  On ASCII platforms, it doesn't
+ * do anything anyway, so make it not a problem */
+#   if ! defined(EBCDIC) && ! defined(NATIVE_TO_LATIN1)
+#       define NATIVE_TO_LATIN1(ch) (ch)
+#   endif
 #   define isPSXSPC_A(c)  isSPACE_A(c) /* XXX Assumes SPACE matches '\v' */
 #   define isALPHA_L1(c)  (isUPPER_L1(c) || isLOWER_L1(c))
 #   define isALPHANUMERIC_L1(c) (isALPHA_L1(c) || isDIGIT_A(c))
@@ -1192,6 +1244,7 @@ EXTCONST U32 PL_charclass[];
 /* And these aren't accurate at all.  They are useful only for above
  * Latin1, which utilities and bootstrapping don't deal with */
 #   define _IS_NON_FINAL_FOLD_ONLY_FOR_USE_BY_REGCOMP_DOT_C(c) 0
+#   define _HAS_NONLATIN1_SIMPLE_FOLD_CLOSURE_ONLY_FOR_USE_BY_REGCOMP_DOT_C_AND_REGEXEC_DOT_C(c) 0
 #   define _HAS_NONLATIN1_FOLD_CLOSURE_ONLY_FOR_USE_BY_REGCOMP_DOT_C_AND_REGEXEC_DOT_C(c) 0

 /* Many of the macros later in this file are defined in terms of these.  By
@@ -1205,10 +1258,12 @@ EXTCONST U32 PL_charclass[];
            (FITS_IN_8_BITS(c) && S_bootstrap_ctype((U8) (c), (classnum), TRUE))
 #   define _generic_isCC_A(c, classnum) \
            (FITS_IN_8_BITS(c) && S_bootstrap_ctype((U8) (c), (classnum), FALSE))
-#endif /* End of no perl.h */
+#endif /* End of no perl.h H_PERL */

 #define isALPHANUMERIC(c)  isALPHANUMERIC_A(c)
 #define isALPHA(c)         isALPHA_A(c)
+#define isASCII_A(c)       isASCII(c)
+#define isASCII_L1(c)      isASCII(c)
 #define isBLANK(c)         isBLANK_A(c)
 #define isCNTRL(c)         isCNTRL_A(c)
 #define isDIGIT(c)         isDIGIT_A(c)
@@ -1559,7 +1614,7 @@ EXTCONST U32 PL_charclass[];
 #ifdef EBCDIC
     /* Because all controls are UTF-8 invariants in EBCDIC, we can use this
      * more efficient macro instead of the more general one */
-#   define isCNTRL_utf8(p)  isCNTRL_L1(p)
+#   define isCNTRL_utf8(p)  isCNTRL_L1(*(p))
 #else
 #   define isCNTRL_utf8(p)  _generic_utf8(_CC_CNTRL, p, 0)
 #endif
@@ -1665,17 +1720,19 @@ EXTCONST U32 PL_charclass[];
  * the outlier from the block that contains the other controls, just like
  * toCTRL('?') on ASCII yields DEL, the control that is the outlier from the C0
  * block.  If it weren't special cased, it would yield a non-control.
- * The conversion works both ways, so CTRL('D') is 4, and CTRL(4) is D, etc. */
+ * The conversion works both ways, so toCTRL('D') is 4, and toCTRL(4) is D,
+ * etc. */
 #ifndef EBCDIC
-#  define toCTRL(c)    (toUPPER(c) ^ 64)
+#  define toCTRL(c)    (__ASSERT_(FITS_IN_8_BITS(c)) toUPPER(c) ^ 64)
 #else
-#  define toCTRL(c)    ((isPRINT_A(c))                                 \
-                        ? UNLIKELY((c) == '?')                         \
-                          ? QUESTION_MARK_CTRL                         \
-                          : (NATIVE_TO_LATIN1(toUPPER(c)) ^ 64)        \
-                        : UNLIKELY((c) == QUESTION_MARK_CTRL)          \
-                          ? ((c) == '?')                               \
-                          : (LATIN1_TO_NATIVE((c) ^ 64)))
+#  define toCTRL(c)    (__ASSERT_(FITS_IN_8_BITS(c))                   \
+                        ((isPRINT_A(c))                                \
+                         ? (UNLIKELY((c) == '?')                       \
+                            ? QUESTION_MARK_CTRL                       \
+                            : (NATIVE_TO_LATIN1(toUPPER(c)) ^ 64))     \
+                         : (UNLIKELY((c) == QUESTION_MARK_CTRL)        \
+                            ? '?'                                      \
+                            : (LATIN1_TO_NATIVE((c) ^ 64)))))
 #endif

 /* Line numbers are unsigned, 32 bits. */
diff --git a/hints/os390.sh b/hints/os390.sh
index 5aafb4e..d9b0f8a 100644
--- a/hints/os390.sh
+++ b/hints/os390.sh
@@ -158,10 +158,10 @@ esac
 # under a compiler other than c89.
 case "$usedl" in
 define)
-echo 'cat >.$$.c; '"$cc"' -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -D_SHR_ENVIRON -E -Wc,"LANGLVL(DOLLARINNAMES)",NOLOC ${1+"$@"} .$$.c | grep -v "??="; rm .$$.c' > cppstdin
+echo 'cat >.$$.c; '"$cc"' -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -D_SHR_ENVIRON -E -Wc,"LANGLVL(DOLLARINNAMES)",NOLOC ${1+"$@"} .$$.c | fgrep -v "??="; rm .$$.c' > cppstdin
 ;;
 *)
-echo 'cat >.$$.c; '"$cc"' -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -E -Wc,"LANGLVL(DOLLARINNAMES)",NOLOC ${1+"$@"} .$$.c | grep -v "??="; rm .$$.c' > cppstdin
+echo 'cat >.$$.c; '"$cc"' -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -E -Wc,"LANGLVL(DOLLARINNAMES)",NOLOC ${1+"$@"} .$$.c | fgrep -v "??="; rm .$$.c' > cppstdin
 ;;
 esac
@@ -231,3 +231,22 @@
 d_gethostbyaddr_r='undef'
 d_gethostbyname_r='undef'
 d_gethostent_r='undef'
+# The z/OS C compiler compiler supports the attribute keyword, but in a
+# limited manner.
+#
+# Ideally, Configure's tests should test the attributes as they are expected
+# to be used in perl, and, ideally, those tests would fail on z/OS.
+# Until then, just tell Configure to ignore the attributes.  Currently,
+# Configure thinks attributes are supported because it does not recognize
+# warning messages like this:
+#
+# INFORMATIONAL CCN4108 ./proto.h:4534  The use of keyword '__attribute__' is non-portable.
+
+d_attribute_deprecated='undef'
+d_attribute_format='undef'
+d_attribute_malloc='undef'
+d_attribute_nonnull='undef'
+d_attribute_noreturn='undef'
+d_attribute_pure='undef'
+d_attribute_unused='undef'
+d_attribute_warn_unused_result='undef'
diff --git a/lib/locale.t b/lib/locale.t
index f59e17b..54dbd38 100644
--- a/lib/locale.t
+++ b/lib/locale.t
@@ -2075,10 +2075,10 @@ foreach my $Locale (@Locale) {
                          "; lc=", disp_chars(($y)), "; ",
                          "; fc=", disp_chars((fc $x)), "; ",
                          disp_chars(($x)), "=~/", disp_chars(($y)), "/i=",
-                         $x =~ /$y/i ? 1 : 0,
+                         $x =~ /\Q$y/i ? 1 : 0,
                          "; ",
                          disp_chars(($y)), "=~/", disp_chars(($x)), "/i=",
-                         $y =~ /$x/i ? 1 : 0,
+                         $y =~ /\Q$x/i ? 1 : 0,
                          "\n");
            #
            # If $x and $y contain regular expression characters
@@ -2108,7 +2108,7 @@ foreach my $Locale (@Locale) {
                    print "# Regex characters in '$x' or '$y', skipping test $locales_test_number for locale '$Locale'\n";
                    next;
                }
-               push @f, $x unless $x =~ /$y/i && $y =~ /$x/i;
+               push @f, $x unless $x =~ /\Q$y/i && $y =~ /\Q$x/i;

                # fc is not a locale concept, so Perl uses lc for it.
                push @f, $x unless lc $x eq fc $x;
@@ -2121,13 +2121,13 @@ foreach my $Locale (@Locale) {
                          "; lc=", disp_chars(($y)), "; ",
                          "; fc=", disp_chars((fc $x)), "; ",
                          disp_chars(($x)), "=~/", disp_chars(($y)), "/i=",
-                         $x =~ /$y/i ? 1 : 0,
+                         $x =~ /\Q$y/i ? 1 : 0,
                          "; ",
                          disp_chars(($y)), "=~/", disp_chars(($x)), "/i=",
-                         $y =~ /$x/i ? 1 : 0,
+                         $y =~ /\Q$x/i ? 1 : 0,
                          "\n");
-               push @f, $x unless $x =~ /$y/i && $y =~ /$x/i;
+               push @f, $x unless $x =~ /\Q$y/i && $y =~ /\Q$x/i;

                # The places where Unicode's lc is different from fc are
                # skipped here by virtue of the 'next unless uc...' line above
@@ -2143,16 +2143,16 @@ foreach my $Locale (@Locale) {
                          "; uc=", disp_chars(($y)), "; ",
                          "; fc=", disp_chars((fc $x)), "; ",
                          disp_chars(($x)), "=~/", disp_chars(($y)), "/i=",
-                         $x =~ /$y/i ? 1 : 0,
+                         $x =~ /\Q$y/i ? 1 : 0,
                          "; ",
                          disp_chars(($y)), "=~/", disp_chars(($x)), "/i=",
-                         $y =~ /$x/i ? 1 : 0,
+                         $y =~ /\Q$x/i ? 1 : 0,
                          "\n");
                if ($x =~ $re || $y =~ $re) { # See above.
                    print "# Regex characters in '$x' or '$y', skipping test $locales_test_number for locale '$Locale'\n";
                    next;
                }
-               push @f, $x unless $x =~ /$y/i && $y =~ /$x/i;
+               push @f, $x unless $x =~ /\Q$y/i && $y =~ /\Q$x/i;
                push @f, $x unless lc $x eq fc $x;
            }
@@ -2164,12 +2164,12 @@ foreach my $Locale (@Locale) {
                          "; uc=", disp_chars(($y)), "; ",
                          "; fc=", disp_chars((fc $x)), "; ",
                          disp_chars(($x)), "=~/", disp_chars(($y)), "/i=",
-                         $x =~ /$y/i ? 1 : 0,
+                         $x =~ /\Q$y/i ? 1 : 0,
                          "; ",
                          disp_chars(($y)), "=~/", disp_chars(($x)), "/i=",
-                         $y =~ /$x/i ? 1 : 0,
+                         $y =~ /\Q$x/i ? 1 : 0,
                          "\n");
-               push @f, $x unless $x =~ /$y/i && $y =~ /$x/i;
+               push @f, $x unless $x =~ /\Q$y/i && $y =~ /\Q$x/i;
                push @f, $x unless lc $x eq fc $x;
            }
diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index ffbfe74..a21aa1a 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -6999,11 +6999,14 @@ END
                }
            }

-           # I (khw) have never waded through this line to
-           # understand it well enough to comment it.
+           # The unpack yields a list of the bytes that comprise the
+           # UTF-8 of $code_point, which are each placed in \xZZ format
+           # and output in the %s to map to $tostr, so the result looks
+           # like:
+           #     "\xC4\xB0" => "\x{0069}\x{0307}",
            my $utf8 = sprintf(qq["%s" => "$tostr",],
                                join("", map { sprintf "\\x%02X", $_ }
-                                unpack("U0C*", pack("U", $code_point))));
+                                unpack("U0C*", chr $code_point)));

            # Add a comment so that a human reader can more easily
            # see what's going on.
diff --git a/locale.c b/locale.c
index 2e68b23..0bf234c 100644
--- a/locale.c
+++ b/locale.c
@@ -1425,6 +1425,8 @@ Perl__is_cur_LC_category_utf8(pTHX_ int category)

 #endif /* the code that is compiled when no nl_langinfo */

+#ifndef EBCDIC /* On os390, even if the name ends with "UTF-8', it isn't a
+                  UTF-8 locale */
    /* As a last resort, look at the locale name to see if it matches
     * qr/UTF -?
     * 8 /ix, or some other common locale names.  This "name", the
     * return of setlocale(), is actually defined to be opaque, so we can't
@@ -1464,6 +1466,7 @@ Perl__is_cur_LC_category_utf8(pTHX_ int category)
                          "Locale %s doesn't end with UTF-8 in name\n",
                          save_input_locale));
    }
+#endif

 #ifdef WIN32
    /* http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756.aspx */
diff --git a/makedepend.SH b/makedepend.SH
index f992af3..9870c3f 100755
--- a/makedepend.SH
+++ b/makedepend.SH
@@ -141,34 +141,46 @@ for file in `$cat .clist`; do
    #      && defined(BAR) /* comment */ \
    #      && defined(BAZ) /* comment */ \
    # etc.
-   # This code processes these latter situations first; it assumes there is
-   # at most one straightforward comment per continued preprocessor line. (It
-   # would be easier to handle more general cases if sed had a non-greedy '*'
-   # quantifier; but typically preprocessor directive lines are rather
-   # simple.)  The continuation line is joined, and the process repeated on
-   # the enlarged line as long as there are continuations.  At the end, if
-   # there are any comments remaining, they should be like the first situation,
-   # and can just be deleted.  (Subsequent lines of the comment are irrelevant
-   # and get dropped.)
-   ( $echo "#line 2 \"$file\"";       \
-     $sed -n <$file                   \
-       -e "/^${filebase}_init(/q"     \
-       -e ': testcont'                \
-       -e '/^[ ]*#/s|/\*.*\*/||'      \
-       -e '/\\$/{'                    \
-       -e 'N'                         \
-       -e 'b testcont'                \
-       -e '}'                         \
-       -e 's/\\\n/ /g'                \
-       -e '/^#line/d'                 \
-       -e '/^[ ]*#/{'                 \
-       -e 's|/\*.*$||'                \
-       -e p                           \
+   # Also, in lines like
+   #    #defined FOO(a,b) a/**/b
+   # the comment may be important and so needs to be retained.
+   # This code processes the single-line comments first; it assumes there is
+   # at most one straightforward comment per continued preprocessor line,
+   # replacing each non-empty comment (and its surrounding white space) by a
+   # single space.  (sed only has a greedy '*' quantifier, so this doesn't
+   # work right if there are multiple comments per line, and strings can look
+   # like comments to it; both are unlikely in a preprocessor statement.)  Any
+   # continuation line is joined, and the process repeated on the enlarged
+   # line as long as there are continuations.  At the end, if there are any
+   # comments remaining, they are either completely empty or are like the
+   # first situation.  The latter are just deleted by first deleting to the
+   # end of line (including preceding white space) things that start with '/*'
+   # and the next char isn't a '*'; then things that start with '/**', but the
+   # next char isn't a '/'.  (Subsequent lines of the comment are irrelevant
+   # and get dropped.)
At the end, we unjoin very long lines to avoid + # preprocessor limitations + ( $echo "#line 2 \"$file\""; \ + $sed -n <$file \ + -e "/^${filebase}_init(/q" \ + -e ': testcont' \ + -e '/^[ ]*#/s|[ ]*/\*..*\*/[ ]*| |' \ + -e '/\\$/{' \ + -e 'N' \ + -e 'b testcont' \ + -e '}' \ + -e 's/\\\n//g' \ + -e '/^#line/d' \ + -e '/^[ ]*#/{' \ + -e 's|[ ]*/\*[^*].*$||' \ + -e 's|[ ]*/\*\*[^/].*$||' \ + -e 's/.\{255\}/&\\\n/g' \ + -e p \ -e '}' ) >UU/$file.c - if [ "$osname" = os390 -a "$file" = perly.c ]; then - $echo '#endif' >>UU/$file.c - fi + # We're not sure why this was there; the #endif is extraneous on modern z/OS + #if [ "$osname" = os390 -a "$file" = perly.c ]; then + # $echo '#endif' >>UU/$file.c + #fi if [ "$osname" = os390 ]; then $cppstdin $finc -I. $cppflags $cppminus <UU/$file.c | diff --git a/nostdio.h b/nostdio.h index ef8d652..c815fd8 100644 --- a/nostdio.h +++ b/nostdio.h @@ -25,6 +25,8 @@ struct _FILE; #define FILE struct _FILE #endif +#ifndef EBCDIC + #define _CANNOT "CANNOT" #undef clearerr @@ -125,6 +127,8 @@ struct _FILE; #define pclose(f) _CANNOT _pclose_ #endif +#endif /*not define EBCDIC */ + /* * Local variables: * c-indentation-style: bsd diff --git a/perlio.c b/perlio.c index 6c742d2..19b73ab 100644 --- a/perlio.c +++ b/perlio.c @@ -2930,11 +2930,27 @@ PerlIO_importFILE(FILE *stdio, const char *mode) { dTHX; PerlIO *f = NULL; +#ifdef EBCDIC + int rc; + char filename[FILENAME_MAX]; + fldata_t fileinfo; +#endif if (stdio) { PerlIOStdio *s; int fd0 = fileno(stdio); if (fd0 < 0) { +#ifdef EBCDIC + rc = fldata(stdio,filename,&fileinfo); + if(rc != 0){ + return NULL; + } + if(fileinfo.__dsorgHFS){ + return NULL; + } + /*This MVS dataset , OK!*/ +#else return NULL; +#endif } if (!mode || !*mode) { /* We need to probe to see how we can open the stream @@ -2966,7 +2982,24 @@ PerlIO_importFILE(FILE *stdio, const char *mode) if ((f = PerlIO_push(aTHX_(PerlIO_allocate(aTHX)), PERLIO_FUNCS_CAST(&PerlIO_stdio), mode, NULL))) { s = PerlIOSelf(f, PerlIOStdio); 
s->stdio = stdio; +#ifdef EBCDIC + fd0 = fileno(stdio); + if(fd0 != -1){ + PerlIOUnix_refcnt_inc(fd0); + } + else{ + rc = fldata(stdio,filename,&fileinfo); + if(rc != 0){ + PerlIOUnix_refcnt_inc(fd0); + } + if(fileinfo.__dsorgHFS){ + PerlIOUnix_refcnt_inc(fd0); + } + /*This MVS dataset , OK!*/ + } +#else PerlIOUnix_refcnt_inc(fileno(stdio)); +#endif } } return f; diff --git a/pod/perldata.pod b/pod/perldata.pod index c490b63..b0f5e7e 100644 --- a/pod/perldata.pod +++ b/pod/perldata.pod @@ -188,31 +188,49 @@ fully-qualified. They come in four forms: =over -=item A sigil, followed solely by digits matching \p{POSIX_Digit}, like C<$0>, -C<$1>, or C<$10000>. - -=item A sigil, followed by either a caret and a single POSIX uppercase letter, -like C<$^V> or C<$^W>, or a sigil followed by a literal control character -matching the C<\p{POSIX_Cntrl}> property. -Due to a historical oddity, if not -running under C<use utf8>, the 128 extra controls in the C<[0x80-0xff]> range -may also be used in length one variables. The use of a literal control -character is deprecated. Support for this form will be removed in a future -version of perl. - -=item Similar to the above, a sigil, followed by bareword text in brackets, -where the first character is either a caret followed by an uppercase letter, -or a literal control, like C<${^GLOBAL_PHASE}> or C<${\7LOBAL_PHASE}>. The use -of a literal control character is deprecated. Support for this form will be -removed in a future version of perl. - -=item A sigil followed by a single character matching the C<\p{POSIX_Punct}> -property, like C<$!> or C<%+>. +=item * + +A sigil, followed solely by digits matching C<\p{POSIX_Digit}>, like +C<$0>, C<$1>, or C<$10000>. + +=item * + +A sigil, followed by either a caret and a single POSIX uppercase letter, +like C<$^V> or C<$^W>, or a sigil followed by a literal non-space, +non-C<NUL> control character matching the C<\p{POSIX_Cntrl}> property. 
+Due to a historical oddity, if not running under C<use utf8>, the 128 +characters in the C<[0x80-0xff]> range are considered to be controls, +and may also be used in length-one variables. However, the use of +non-graphical characters is deprecated as of v5.22, and support for them +will be removed in a future version of perl. ASCII space characters and +C<NUL> already aren't allowed, so this means that a single-character +variable name with that name being any other C0 control C<[0x01-0x1F]>, +or C<DEL> will generate a deprecated warning. Already, under C<"use +utf8">, non-ASCII characters must match C<Perl_XIDS>. As of v5.22, when +not under C<"use utf8">, C1 controls C<[0x80-0x9F]>, NO-BREAK SPACE, and +SOFT HYPHEN (C<SHY>) generate a deprecated warning. + +=item * + +Similar to the above, a sigil, followed by bareword text in brackets, +where the first character is either a caret followed by an uppercase +letter, like C<${^GLOBAL_PHASE}> or a non-C<NUL>, non-space literal +control like C<${\7LOBAL_PHASE}>. Like the above, when not under +C<"use utf8">, the characters in C<[0x80-0xFF]> are considered controls, but as +of v5.22, the use of any that are non-graphical is deprecated, and as +of v5.20 the use of any ASCII-range literal control is deprecated. +Support for these will be removed in a future version of perl. + +=item * + +A sigil followed by a single character matching the C<\p{POSIX_Punct}> +property, like C<$!> or C<%+>, except the character C<"{"> doesn't work. =back Note that as of Perl 5.20, literal control characters in variable names -are deprecated. +are deprecated; and as of Perl 5.22, any other non-graphic characters +are also deprecated. =head2 Context X<context> X<scalar context> X<list context> diff --git a/pod/perldelta.pod b/pod/perldelta.pod index 2c611ca..8e9aa2d 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -78,6 +78,17 @@ as an updated module in the L</Modules and Pragmata> section.
[ List each other deprecation as a =head2 entry ] +=head2 Use of non-graphic characters in single-character variable names + +The syntax for single-character variable names is more lenient than +for longer variable names, allowing the one-character name to be a +punctuation character or even invisible (a non-graphic). Perl v5.20 +deprecated the ASCII-range controls as such a name. Now, all +non-graphic characters that formerly were allowed are deprecated. +The practical effect of this occurs only when not under C<S<"use +utf8">>, and affects just the C1 controls (code points 0x80 through +0xFF), NO-BREAK SPACE, and SOFT HYPHEN. + =head1 Performance Enhancements XXX Changes which enhance performance without changing behaviour go here. @@ -152,13 +163,14 @@ XXX Changes which significantly change existing files in F<pod/> go here. However, any changes to F<pod/perldiag.pod> should go in the L</Diagnostics> section. -=head3 L<XXX> +=head3 L<perldata/Identifier parsing> =over 4 =item * -XXX Description of the change here +The syntax of single-character variable names has been brought +up-to-date and more fully explained. =back @@ -192,7 +204,7 @@ XXX L<message|perldiag/"message"> =item * -XXX L<message|perldiag/"message"> +L<Use of literal non-graphic characters in variable names is deprecated|perldiag/"Use of literal non-graphic characters in variable names is deprecated"> =back diff --git a/pod/perldiag.pod b/pod/perldiag.pod index 863caf1..fbfdb93 100644 --- a/pod/perldiag.pod +++ b/pod/perldiag.pod @@ -1181,6 +1181,13 @@ probably because you don't have write permission to the directory. (P) An error peculiar to VMS. Perl thought stdin was a pipe, and tried to reopen it to accept binary data. Alas, it failed. +=item Can't represent character for Ox%X on this platform + +(F) There is a hard limit to how big a character code point can be due +to the fundamental properties of UTF-8, especially on EBCDIC +platforms. The given code point exceeds that. 
The only work-around is +to not use such a large code point. + =item Can't reset %ENV on this system (F) You called C<reset('E')> or similar, which tried to reset @@ -5056,6 +5063,14 @@ missing the final closing quote or angle bracket after the name. The S<<-- HERE> shows whereabouts in the regular expression the problem was discovered. +=item Sequence "\c{" invalid + +(F) These three characters may not appear in sequence in a +double-quotish context. This message is raised only on non-ASCII +platforms (a different error message is output on ASCII ones). If you +were intending to specify a control character with this sequence, you'll +have to use a different way to specify it. + =item Sequence \%s... not terminated in regex; marked by S<<-- HERE> in m/%s/ @@ -6517,10 +6532,19 @@ old way has bad side effects. =item Use of literal control characters in variable names is deprecated -(D deprecated) Using literal control characters in the source to refer -to the ^FOO variables, like C<$^X> and C<${^GLOBAL_PHASE}> is now -deprecated. This only affects code like C<$\cT>, where \cT is a control in -the source code: C<${"\cT"}> and C<$^T> remain valid. +=item Use of literal non-graphic characters in variable names is deprecated + +(D deprecated) Using literal non-graphic (including control) characters +in the source to refer to the ^FOO variables, like C<$^X> and +C<${^GLOBAL_PHASE}> is now deprecated. (We use C<^X> and C<^G> here for +legibility. They actually represent the non-printable control +characters, code points 0x18 and 0x07, respectively; C<^A> would mean +the control character whose code point is 0x01.) This only affects code +like +C<$\cT>, where C<\cT> is a control in the source code; C<${"\cT"}> and +C<$^T> remain valid. Things that are non-controls and also not graphic +are NO-BREAK SPACE and SOFT HYPHEN, which were previously only allowed +for historical reasons. 
=item Use of -l on filehandle%s diff --git a/pod/perlebcdic.pod b/pod/perlebcdic.pod index 45a6e54..0a99be8 100644 --- a/pod/perlebcdic.pod +++ b/pod/perlebcdic.pod @@ -799,7 +799,9 @@ or regex, as it will absorb the terminator. But C<\c\I<X>> is a C<FILE SEPARATOR> concatenated with I<X> for all I<X>. The outlier C<\c?> on ASCII, which yields a non-C0 control C<DEL>, yields the outlier control C<APC> on EBCDIC, the one that isn't in the -block of contiguous controls. +block of contiguous controls. Note that a subtlety of this is that +C<\c?> on ASCII platforms is an ASCII character, while it isn't +equivalent to any ASCII character in EBCDIC platforms. chr ord 8859-1 0037 1047 && POSIX-BC ----------------------------------------------------------------------- diff --git a/regcomp.c b/regcomp.c index 5fe3c9c..a62e3e1 100644 --- a/regcomp.c +++ b/regcomp.c @@ -11364,7 +11364,7 @@ S_alloc_maybe_populate_EXACT(pTHX_ RExC_state_t *pRExC_state, if (! len_passed_in) { if (UTF) { - if (UNI_IS_INVARIANT(code_point)) { + if (UVCHR_IS_INVARIANT(code_point)) { if (LOC || ! FOLD) { /* /l defers folding until runtime */ *character = (U8) code_point; } @@ -12473,7 +12473,7 @@ tryagain: * the simple case just below.) */ UV folded; - if (isASCII(ender)) { + if (isASCII_uni(ender)) { folded = toFOLD(ender); *(s)++ = (U8) folded; } diff --git a/t/charset_tools.pl b/t/charset_tools.pl index 6abf902..6d70a37 100644 --- a/t/charset_tools.pl +++ b/t/charset_tools.pl @@ -29,7 +29,7 @@ sub latin1_to_native($) { return $string if $::IS_ASCII; my $output = ""; for my $i (0 .. 
length($string) - 1) { - $output .= chr(ord_latin1_to_native(ord(substr($string, $i, 1)))); + $output .= chr(utf8::unicode_to_native(ord(substr($string, $i, 1)))); } # Preserve utf8ness of input onto the output, even if it didn't need to be # utf8 diff --git a/t/lib/warnings/toke b/t/lib/warnings/toke index 8c0158a..cf0d020 100644 --- a/t/lib/warnings/toke +++ b/t/lib/warnings/toke @@ -150,6 +150,12 @@ EXPECT Use of bare << to mean <<"" is deprecated at - line 2. ######## # toke.c +BEGIN { + if (ord('A') == 193) { + print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC"; + exit 0; + } +} eval "\$\cT"; eval "\${\7LOBAL_PHASE}"; eval "\${\cT}"; @@ -1360,11 +1366,30 @@ EXPECT "\c`" is more clearly written simply as "\ " at - line 4. ######## # toke.c +BEGIN { + if (ord('A') == 193) { + print "SKIPPED\n# test is ASCII-specific"; + exit 0; + } +} +use warnings; +my $a = "\c{ack}"; +EXPECT +OPTION fatal +Use ";" instead of "\c{" at - line 9. +######## +# toke.c +BEGIN { + if (ord('A') == 65) { + print "SKIPPED\n# test is EBCDIC-specific"; + exit 0; + } +} use warnings; my $a = "\c{ack}"; EXPECT OPTION fatal -Use ";" instead of "\c{" at - line 3. +Sequence "\c{" invalid at - line 9. 
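[Editor's sketch: the ASCII-platform warning above suggests ";" for "\c{" because \cX complements bit 0x40 of the (upcased) character, and '{' (0x7B) xor 0x40 is ';' (0x3B), a plain graphic rather than a control. A hypothetical C helper showing the mapping; on EBCDIC no such simple xor exists, which is why the sequence is instead rejected outright there.]

```c
/* The \cX mapping on ASCII platforms: perl upcases a lowercase X
 * first, then flips bit 0x40.  '?' is the documented outlier,
 * yielding DEL (0x7F). */
char ctrl_of(char c)
{
    if (c >= 'a' && c <= 'z')
        c = (char)(c - ('a' - 'A'));  /* \ca is the same as \cA */
    return (char)(c ^ 0x40);
}
```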
######## # toke.c my $a = "\câ"; @@ -1472,6 +1497,12 @@ I ######## # toke.c #[perl #119123] disallow literal control character variables +BEGIN { + if (ord('A') == 193) { + print "SKIPPED\n# Literal control characters in variable names forbidden on EBCDIC"; + exit 0; + } +} eval "\$\cQ = 25"; eval "\${ \cX } = 24"; *{ diff --git a/t/lib/warnings/utf8 b/t/lib/warnings/utf8 index 9004731..614d5ec 100644 --- a/t/lib/warnings/utf8 +++ b/t/lib/warnings/utf8 @@ -80,16 +80,26 @@ Operation "uc" returns its argument for non-Unicode code point 0x110000 at - lin use warnings 'utf8'; my $d800 = uc(chr(0xD800)); my $nonUnicode = uc(chr(0x110000)); -my $big_nonUnicode = uc(chr(0x8000_0000)); no warnings 'non_unicode'; my $d800 = uc(chr(0xD800)); my $nonUnicode = uc(chr(0x110000)); -my $big_nonUnicode = uc(chr(0x8000_0000)); EXPECT Operation "uc" returns its argument for UTF-16 surrogate U+D800 at - line 2. Operation "uc" returns its argument for non-Unicode code point 0x110000 at - line 3. -Operation "uc" returns its argument for non-Unicode code point 0x80000000 at - line 4. -Operation "uc" returns its argument for UTF-16 surrogate U+D800 at - line 6. +Operation "uc" returns its argument for UTF-16 surrogate U+D800 at - line 5. +######## +BEGIN { + if (ord('A') == 193) { + print "SKIPPED\n# ebcdic platforms can't handle this large a code point"; + exit 0; + } +} +use warnings 'utf8'; +my $big_nonUnicode = uc(chr(0x8000_0000)); +no warnings 'non_unicode'; +my $big_nonUnicode = uc(chr(0x8000_0000)); +EXPECT +Operation "uc" returns its argument for non-Unicode code point 0x80000000 at - line 8. 
######## use warnings 'utf8'; my $d7ff = lc pack("U", 0xD7FF); diff --git a/t/loc_tools.pl b/t/loc_tools.pl index fccbeeb..5406368 100644 --- a/t/loc_tools.pl +++ b/t/loc_tools.pl @@ -209,6 +209,10 @@ sub find_locales ($) { # Returns an array of all the locales we found on the sub is_locale_utf8 ($) { # Return a boolean as to if core Perl thinks the input # is a UTF-8 locale + + # On z/OS, even locales marked as UTF-8 aren't. + return 0 if ord "A" != 65; + my $locale = shift; use locale; diff --git a/t/re/regexp.t b/t/re/regexp.t index 7e104db..8c51ea5 100644 --- a/t/re/regexp.t +++ b/t/re/regexp.t @@ -3,6 +3,7 @@ # The tests are in a separate file 't/re/re_tests'. # Each line in that file is a separate test. # There are five columns, separated by tabs. +# An optional sixth column is used to give a reason, only when skipping tests # # Column 1 contains the pattern, optionally enclosed in C<''>. # Modifiers can be put after the closing C<'>. @@ -20,6 +21,8 @@ # t test exposes a bug with threading, TODO if qr_embed_thr # s test should only be run for regex_sets_compat.t # S test should not be run for regex_sets_compat.t +# a test should only be run on ASCII platforms +# e test should only be run on EBCDIC platforms # # Columns 4 and 5 are used only if column 3 contains C<y> or C<c>. # @@ -47,6 +50,9 @@ # # Note that columns 2,3 and 5 are all enclosed in double quotes and then # evalled; so something like a\"\x{100}$1 has length 3+length($1). +# +# \x... and \o{...} constants are automatically converted to the native +# character set if necessary. \[0-7] constants aren't my ($file, $iters); BEGIN { @@ -71,6 +77,24 @@ sub _comment { map { split /\n/ } @_; } +sub convert_from_ascii { + my $string = shift; + + #my $save = $string; + # Convert \x{...}, \o{...} + $string =~ s/ (?<! \\ ) \\x\{ ( .*? ) } / "\\x{" . sprintf("%X", utf8::unicode_to_native(hex $1)) . "}" /gex; + $string =~ s/ (?<! \\ ) \\o\{ ( .*? ) } / "\\o{" . sprintf("%o", utf8::unicode_to_native(oct $1)) . 
"}" /gex; + + # Convert \xAB + $string =~ s/ (?<! \\ ) \\x ( [A-Fa-f0-9]{2} ) / "\\x" . sprintf("%02X", utf8::unicode_to_native(hex $1)) /gex; + + # Convert \xA + $string =~ s/ (?<! \\ ) \\x ( [A-Fa-f0-9] ) (?! [A-Fa-f0-9] ) / "\\x" . sprintf("%X", utf8::unicode_to_native(hex $1)) /gex; + + #print STDERR __LINE__, ": $save\n$string\n" if $save ne $string; + return $string; +} + use strict; use warnings FATAL=>"all"; use vars qw($bang $ffff $nulnul); # used by the tests @@ -113,13 +137,20 @@ foreach (@tests) { } $reason = '' unless defined $reason; my $input = join(':',$pat,$subject,$result,$repl,$expect); + # the double '' below keeps simple syntax highlighters from going crazy $pat = "'$pat'" unless $pat =~ /^[:''\/]/; $pat =~ s/(\$\{\w+\})/$1/eeg; $pat =~ s/\\n/\n/g unless $regex_sets; + $pat = convert_from_ascii($pat) if ord("A") != 65; + + $subject = convert_from_ascii($subject) if ord("A") != 65; $subject = eval qq("$subject"); die $@ if $@; + + $expect = convert_from_ascii($expect) if ord("A") != 65; $expect = eval qq("$expect"); die $@ if $@; $expect = $repl = '-' if $skip_amp and $input =~ /\$[&\`\']/; + my $todo_qr = $qr_embed_thr && ($result =~ s/t//); my $skip = ($skip_amp ? ($result =~ s/B//i) : ($result =~ s/B//)); ++$skip if $result =~ s/M// && !defined &DynaLoader::boot_DynaLoader; @@ -129,6 +160,14 @@ foreach (@tests) { $reason = "Test not valid for $0"; } } + if ($result =~ s/a// && ord("A") != 65) { + $skip++; + $reason = "Test is only valid for ASCII platforms. $reason"; + } + if ($result =~ s/e// && ord("A") != 193) { + $skip++; + $reason = "Test is only valid for EBCDIC platforms. $reason"; + } $reason = 'skipping $&' if $reason eq '' && $skip_amp; $result =~ s/B//i unless $skip; my $todo= $result =~ s/T// ? 
" # TODO" : ""; diff --git a/t/uni/variables.t b/t/uni/variables.t index e441000..e8259e5 100644 --- a/t/uni/variables.t +++ b/t/uni/variables.t @@ -14,7 +14,7 @@ use utf8; use open qw( :utf8 :std ); no warnings qw(misc reserved); -plan (tests => 65880); +plan (tests => 66900); # ${single:colon} should not be valid syntax { @@ -65,72 +65,200 @@ for my $v (qw( ^V ; < > ( ) {^GLOBAL_PHASE} ^W _ 1 4 0 [ ] ! @ / \ = )) { local $@; eval "use utf8; \$$v;"; - is $@, '', "No syntax error for \$$v under use utf8"; + is $@, '', "No syntax error for \$$v under 'use utf8'"; } } # Checking if the Latin-1 range behaves as expected, and that the behavior is the # same whenever under strict or not. -for ( 0x80..0xff ) { +for ( 0x0 .. 0xff ) { + my @warnings; + local $SIG {__WARN__} = sub {push @warnings, @_ }; + my $ord = utf8::unicode_to_native($_); + my $chr = chr $ord; + my $syntax_error = 0; # Do we expect this code point to generate a + # syntax error? Assume not, for now + my $deprecated = 0; + my $name; + + # A different number of tests are run depending on the branches in this + # loop iteration. This allows us to add skips to make the reported total + # the same for each iteration. + my $tests = 0; + my $max_tests = 6; + + if ($chr =~ /[[:graph:]]/a) { + $name = "'$chr'"; + $syntax_error = 1 if $chr eq '{'; + } + elsif ($chr =~ /[[:space:]]/a) { + $name = sprintf "\\x%02x, an ASCII space character", $ord; + $syntax_error = 1; + } + elsif ($chr =~ /[[:cntrl:]]/a) { + if ($chr eq "\N{NULL}") { + $name = sprintf "\\x%02x, NUL", $ord; + $syntax_error = 1; + } + else { + $name = sprintf "\\x%02x, an ASCII control", $ord; + $syntax_error = $::IS_EBCDIC; + $deprecated = ! $syntax_error; + } + } + elsif ($chr =~ /\pC/) { + if ($chr eq "\N{SHY}") { + $name = sprintf "\\x%02x, SHY", $ord; + } + else { + $name = sprintf "\\x%02x, a C1 control", $ord; + } + $syntax_error = $::IS_EBCDIC; + $deprecated = ! 
$syntax_error; + } + elsif ($chr =~ /\p{XIDStart}/) { + $name = sprintf "\\x%02x, a non-ASCII XIDS character", $ord; + } + elsif ($chr =~ /\p{XPosixSpace}/) { + $name = sprintf "\\x%02x, a non-ASCII space character", $ord; + $syntax_error = $::IS_EBCDIC; + $deprecated = ! $syntax_error; + } + else { + $name = sprintf "\\x%02x, a non-ASCII, non-XIDS graphic character", $ord; + } no warnings 'closure'; - my $chr = chr; - my $esc = sprintf("%X", ord $chr); + my $esc = sprintf("%X", $ord); utf8::downgrade($chr); if ($chr !~ /\p{XIDS}/u) { - is evalbytes "no strict; \$$chr = 10", - 10, - sprintf("\\x%02x, part of the latin-1 range, is legal as a length-1 variable", $_); - - utf8::upgrade($chr); - local $@; - eval "no strict; use utf8; \$$chr = 1"; - like $@, - qr/\QUnrecognized character \x{\E\L$esc/, - sprintf("..but is illegal as a length-1 variable under use utf8", $_); + if ($syntax_error) { + evalbytes "\$$chr"; + like($@, qr/ syntax\ error | Unrecognized\ character /x, + "$name as a length-1 variable generates a syntax error"); + $tests++; + } + elsif ($ord < 32 || chr =~ /[[:punct:][:digit:]]/a) { + + # Unlike other variables, we dare not try setting the length-1 + # variables that are \cX (for all valid X) nor ASCII ones that are + # punctuation nor digits. This is because many of these variables + # have meaning to the system, and setting them could have side + # effects or not work as expected (And using fresh_perl() doesn't + # always help.) For example, setting $^D (to use a visible + # representation of code point 0x04) turns on tracing, and setting + # $^E sets an error number, but what gets printed is instead a + # string associated with that number. For all these we just + # verify that they don't generate a syntax error. + local $@; + evalbytes "\$$chr;"; + is $@, '', "$name as a length-1 variable doesn't generate a syntax error"; + $tests++; + utf8::upgrade($chr); + evalbytes "no strict; use utf8; \$$chr;", + is $@, '', " ... 
and the same under 'use utf8'"; + $tests++; + } + else { + is evalbytes "no strict; \$$chr = 10", + 10, + "$name is legal as a length-1 variable"; + $tests++; + if ($chr =~ /[[:ascii:]]/) { + utf8::upgrade($chr); + is evalbytes "no strict; use utf8; \$$chr = 1", + 1, + " ... and is legal under 'use utf8'"; + $tests++; + } + else { + utf8::upgrade($chr); + local $@; + eval "no strict; use utf8; \$$chr = 1"; + like $@, + qr/\QUnrecognized character \x{\E\L$esc/, + " ... but is illegal as a length-1 variable under 'use utf8'"; + $tests++; + } + } } else { { no utf8; local $@; evalbytes "no strict; \$$chr = 1"; - is($@, '', sprintf("\\x%02x, =~ \\p{XIDS}, latin-1, no utf8, no strict, is a valid length-1 variable", $_)); - - local $@; - evalbytes "use strict; \$$chr = 1"; - is($@, - '', - sprintf("\\x%02x under no utf8 does not have to be required under strict, even though it matches XIDS", $_) - ); - - local $@; - evalbytes "\$a$chr = 1"; - like($@, - qr/Unrecognized character /, - sprintf("...but under no utf8, it's not allowed in two-or-more character variables") - ); - - local $@; - evalbytes "\$a$chr = 1"; - like($@, - qr/Unrecognized character /, - sprintf("...but under no utf8, it's not allowed in two-or-more character variables") - ); + is($@, '', "$name under 'no utf8', 'no strict', is a valid length-1 variable"); + $tests++; + + if ($chr !~ /[[:ascii:]]/) { + local $@; + evalbytes "use strict; \$$chr = 1"; + is($@, + '', + " ... and under 'no utf8' does not have to be required under strict, even though it matches XIDS" + ); + $tests++; + + local $@; + evalbytes "\$a$chr = 1"; + like($@, + qr/Unrecognized character /, + " ... 
but under 'no utf8', it's not allowed in length-2+ variables" + ); + $tests++; + } } { use utf8; - my $u = $chr; - utf8::upgrade($u); + my $utf8 = $chr; + utf8::upgrade($utf8); local $@; - eval "no strict; \$$u = 1"; - is($@, '', sprintf("\\x%02x, =~ \\p{XIDS}, UTF-8, use utf8, no strict, is a valid length-1 variable", $_)); + eval "no strict; \$$utf8 = 1"; + is($@, '', " ... and under 'use utf8', 'no strict', is a valid length-1 variable"); + $tests++; local $@; - eval "use strict; \$$u = 1"; - like($@, - qr/Global symbol "\$$u" requires explicit package name/, - sprintf("\\x%02x under utf8 has to be required under strict", $_) - ); + eval "use strict; \$$utf8 = 1"; + if ($chr =~ /[ab]/) { # These are special, for sort() + is($@, '', " ... and under 'use utf8', 'use strict'," + . " is a valid length-1 variable (\$a and \$b are special)"); + $tests++; + } + else { + like($@, + qr/Global symbol "\$$utf8" requires explicit package name/, + " ... and under utf8 has to be required under strict" + ); + $tests++; + } + } + } + + if (! $deprecated) { + if ($chr =~ /[#*]/) { + + # Length-1 variables with these two characters used to be used by + # Perl, but now their generates a warning that they're gone. + # Ignore such warnings. + for (my $i = @warnings - 1; $i >= 0; $i--) { + splice @warnings, $i, 1 if $warnings[$i] =~ /is no longer supported/; + } } + ok(@warnings == 0, " ... and doesn't generate any warnings"); + $tests++; + } + elsif (! @warnings) { + fail(" ... and generates deprecation warnings (since is deprecated)"); + $tests++; + } + else { + ok((scalar @warnings == grep { $_ =~ /deprecated/ } @warnings), + " ... 
and generates deprecation warnings (only)");
+            $tests++;
+        }
+
+        SKIP: {
+            die "Wrong max count for tests" if $tests > $max_tests;
+            skip("untaken tests", $max_tests - $tests) if $max_tests > $tests;
+        }
     }
 }
@@ -138,7 +266,7 @@ for ( 0x80..0xff ) {
     use utf8;
     my $ret = eval "my \$c\x{327} = 100; \$c\x{327}";   # c + cedilla
     is($@, '', "ASCII character + combining character works as a variable name");
-    is($ret, 100, "...and returns the correct value");
+    is($ret, 100, " ... and returns the correct value");
 }
 
 # From Tom Christiansen's 'highly illegal variable names are now accidentally legal' mail
@@ -226,12 +354,16 @@ EOP
         no warnings 'deprecated';
         for my $var ( '$', "\7LOBAL_PHASE", "^GLOBAL_PHASE", "^V" ) {
+          SKIP: {
+            skip("Literal control characters in variable names forbidden on EBCDIC", 3)
+                                if ($::IS_EBCDIC && ord substr($var, 0, 1) < 32);
             eval "\${ $var}";
             is($@, '', "\${ $var} works" );
             eval "\${$var }";
             is($@, '', "\${$var } works" );
             eval "\${ $var }";
             is($@, '', "\${ $var } works" );
+          }
         }
     }
 }
@@ -244,19 +376,30 @@ EOP
     );
-    is(
-        "".eval "*{^JOIN}",
-        "*main::\nOIN",
-        "...but \$^J is still legal"
-    );
+    SKIP: {
+        skip('Is $^U on EBCDIC 1047, BC; nothing works on 0037', 1)
+                                                            if $::IS_EBCDIC;
+        is(
+            "".eval "*{^JOIN}",
+            "*main::\nOIN",
+            " ... but \$^J is still legal"
+        );
+    }
 
+  SKIP: {
+    skip("Literal control characters in variable names forbidden on EBCDIC", 2)
+                                                            if $::IS_EBCDIC;
     no warnings 'deprecated';
     my $ret = eval "\${\cT\n}";
     is($@, "", 'No errors from using ${\n\cT\n}');
-    is($ret, $^T, "...and we got the right value");
+    is($ret, $^T, " ... and we got the right value");
+  }
 }
 
-{
+SKIP: {
+    skip("Literal control characters in variable names forbidden on EBCDIC", 5)
+                                                            if $::IS_EBCDIC;
+
     # Originally from t/base/lex.t, moved here since we can't
     # turn deprecation warnings off in that file.
     no strict;
@@ -273,9 +416,9 @@ EOP
     );
     eval "\$\cQ = 24";   # Literal control character
-    is($@, "", "...and they can be assigned to without error");
-    is(${"\cQ"}, 24, "...and the assignment works");
-    is($^Q, 24, "...even if we access the variable through the caret name");
+    is($@, "", " ... and they can be assigned to without error");
+    is(${"\cQ"}, 24, " ... and the assignment works");
+    is($^Q, 24, " ... even if we access the variable through the caret name");
     is(\${"\cQ"}, \$^Q, '\${\cQ} == \$^Q');
 }
diff --git a/toke.c b/toke.c
index 44d0fef..b653687 100644
--- a/toke.c
+++ b/toke.c
@@ -8550,25 +8550,52 @@ S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni)
 
 /* Is the byte 'd' a legal single character identifier name?  'u' is true
  * iff Unicode semantics are to be used.  The legal ones are any of:
- *  a) ASCII digits
- *  b) ASCII punctuation
+ *  a) all ASCII characters except:
+ *          1) space-type ones, like \t and SPACE;
+ *          2) NUL;
+ *          3) '{'
+ *     The final case currently doesn't get this far in the program, so we
+ *     don't test for it.  If that were to change, it would be ok to allow it.
  *  c) When not under Unicode rules, any upper Latin1 character
- *  d) \c?, \c\, \c^, \c_, and \cA..\cZ, minus the ones that have traditionally
- *     been matched by \s on ASCII platforms.  That is: \c?, plus 1-32, minus
- *     the \s ones. */
-#define VALID_LEN_ONE_IDENT(d, u) (isPUNCT_A((U8)(d))                       \
-                                   || isDIGIT_A((U8)(d))                    \
-                                   || (!(u) && !isASCII((U8)(d)))           \
-                                   || ((((U8)(d)) < 32)                     \
-                                       && (((((U8)(d)) >= 14)               \
-                                            || (((U8)(d)) <= 8 && (d) != 0) \
-                                            || (((U8)(d)) == 13))))         \
-                                   || (((U8)(d)) == toCTRL('?')))
-    if (s < PL_bufend
-        && (isIDFIRST_lazy_if(s, is_utf8) || VALID_LEN_ONE_IDENT(*s, is_utf8)))
+ *  d) Otherwise, when unicode rules are used, all XIDS characters.
+ *
+ * Because all ASCII characters have the same representation whether
+ * encoded in UTF-8 or not, we can use the foo_A macros below and '\0' and
+ * '{' without knowing if is UTF-8 or not */
+#ifdef EBCDIC
+#   define VALID_LEN_ONE_IDENT(s, is_utf8)                                  \
+    (isGRAPH_A(*(s)) || ((is_utf8)                                          \
+                         ? isIDFIRST_utf8((U8*) (s))                        \
+                         : (isGRAPH_L1(*s)                                  \
+                            && LIKELY((U8) *(s) != LATIN1_TO_NATIVE(0xAD)))))
+#else
+#   define VALID_LEN_ONE_IDENT(s, is_utf8) (! isSPACE_A(*(s))               \
+                                            && LIKELY(*(s) != '\0')         \
+                                            && (! is_utf8                   \
+                                                || isASCII_utf8((U8*) (s))  \
+                                                || isIDFIRST_utf8((U8*) (s))))
+#endif
+    if ((s <= PL_bufend - (is_utf8)
+                          ? UTF8SKIP(s)
+                          : 1)
+        && VALID_LEN_ONE_IDENT(s, is_utf8))
     {
-        if ( isCNTRL_A((U8)*s) ) {
-            deprecate("literal control characters in variable names");
+        /* Deprecate all non-graphic characters.  Include SHY as a non-graphic,
+         * because often it has no graphic representation.  (We can't get to
+         * here with SHY when 'is_utf8' is true, so no need to include a UTF-8
+         * test for it.) */
+        if ((is_utf8)
+            ? ! isGRAPH_utf8( (U8*) s)
+            : (! isGRAPH_L1( (U8) *s)
+               || UNLIKELY((U8) *(s) == LATIN1_TO_NATIVE(0xAD))))
+        {
+            /* Split messages for back compat */
+            if (isCNTRL_A( (U8) *s)) {
+                deprecate("literal control characters in variable names");
+            }
+            else {
+                deprecate("literal non-graphic characters in variable names");
+            }
         }
 
         if (is_utf8) {
diff --git a/utf8.c b/utf8.c
index a7baed4..f42b1a2 100644
--- a/utf8.c
+++ b/utf8.c
@@ -107,6 +107,11 @@ Perl_uvoffuni_to_utf8_flags(pTHX_ U8 *d, UV uv, UV flags)
 	return d;
     }
 
+#ifdef EBCDIC
+    /* Not representable in UTF-EBCDIC */
+    flags |= UNICODE_DISALLOW_FE_FF;
+#endif
+
     /* The first problematic code point is the first surrogate */
     if (uv >= UNICODE_SURROGATE_FIRST
 	&& ckWARN3_d(WARN_SURROGATE, WARN_NON_UNICODE, WARN_NONCHAR))
@@ -130,6 +135,10 @@ Perl_uvoffuni_to_utf8_flags(pTHX_ U8 *d, UV uv, UV flags)
 	    if (flags & UNICODE_DISALLOW_SUPER
 		|| (UNICODE_IS_FE_FF(uv) && (flags & UNICODE_DISALLOW_FE_FF)))
 	    {
+#ifdef EBCDIC
+		Perl_die(aTHX_ "Can't represent character for Ox%"UVXf" on this platform", uv);
+		assert(0);
+#endif
 		return NULL;
 	    }
 	}
@@ -1688,7 +1697,7 @@ Perl__is_utf8_perl_idstart(pTHX_ const U8 *p)
     if (! PL_utf8_perl_idstart) {
 	invlist = _new_invlist_C_array(_Perl_IDStart_invlist);
     }
-    return is_utf8_common(p, &PL_utf8_perl_idstart, "", invlist);
+    return is_utf8_common(p, &PL_utf8_perl_idstart, "_Perl_IDStart", invlist);
 }
 
 bool
@@ -1711,7 +1720,7 @@ Perl__is_utf8_perl_idcont(pTHX_ const U8 *p)
     if (! PL_utf8_perl_idcont) {
 	invlist = _new_invlist_C_array(_Perl_IDCont_invlist);
     }
-    return is_utf8_common(p, &PL_utf8_perl_idcont, "", invlist);
+    return is_utf8_common(p, &PL_utf8_perl_idcont, "_Perl_IDCont", invlist);
 }
 
 bool
diff --git a/utf8.h b/utf8.h
index d3b55ee..3d29706 100644
--- a/utf8.h
+++ b/utf8.h
@@ -382,11 +382,11 @@ Perl's extended UTF-8 means we can have start bytes up to FF.
 #define UTF8_TWO_BYTE_HI(c)                                                 \
    (__ASSERT_((sizeof(c) == 1)                                              \
               || !(((WIDEST_UTYPE)(c)) & ~MAX_PORTABLE_UTF8_TWO_BYTE))      \
-    ((U8) __BASE_TWO_BYTE_HI(c, NATIVE_TO_LATIN1)))
+    ((U8) __BASE_TWO_BYTE_HI(c, NATIVE_TO_UNI)))
 #define UTF8_TWO_BYTE_LO(c)                                                 \
    (__ASSERT_((sizeof(c) == 1)                                              \
               || !(((WIDEST_UTYPE)(c)) & ~MAX_PORTABLE_UTF8_TWO_BYTE))      \
-    ((U8) __BASE_TWO_BYTE_LO(c, NATIVE_TO_LATIN1)))
+    ((U8) __BASE_TWO_BYTE_LO(c, NATIVE_TO_UNI)))
 
 /* This is illegal in any well-formed UTF-8 in both EBCDIC and ASCII
  * as it is only in overlongs. */
-- 
Perl5 Master Repository
