[perl.git] branch blead, updated. v5.26.0-RC1-56-g74ca8c9e18

Dave Mitchell Mon, 22 May 2017 08:31:36 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/74ca8c9e1838915d63d9881cdb7d5fa2adc383df?hp=8928e7dbca2b466dfcc52c61375531769f148292>


- Log -----------------------------------------------------------------
commit 74ca8c9e1838915d63d9881cdb7d5fa2adc383df
Author: David Mitchell <[email protected]>
Date:   Mon May 22 16:22:23 2017 +0100

    perldelta: tweak SV flags; eliminate hv_fetchs
    
    Expand the section on SV flags: they have been eliminated rather than just
    changed.
    
    And remove this entry:
    
        Change C<hv_fetch(…, "…", …, …)> to C<hv_fetchs(…, "…", …)>
    
        The dual-life dists all use Devel::PPPort, so they can use this
        function even though it was only added in 5.10.
    
    which appears to be just an implementation detail.

M       pod/perldelta.pod

commit c11f9206a779c0ab72982d5516580d441fb0b698
Author: David Mitchell <[email protected]>
Date:   Mon May 22 16:09:50 2017 +0100

    perldelta: expand Unicode/utf8 API changes
    
    Move all the Unicode/utf8 API changes into a separate bulleted
    sub-section, and add mentions of every macro/function added to the API.
    Previously there was just a vague "Several macros and functions have been
    added to the public API" without enumerating them.

M       pod/perldelta.pod

commit c2ef64b1f87fab29a77142b20270c85f60840918
Author: David Mitchell <[email protected]>
Date:   Mon May 22 15:01:18 2017 +0100

    perldelta: sort the "Internal Changes" section.
    
    Group entries that have similar themes such as unicode API or optree.
    
    No changes apart from cut-n-paste of whole =item entries.

M       pod/perldelta.pod

commit 5573eafe4125dff4a970b0aee2878ee16282ea3e
Author: David Mitchell <[email protected]>
Date:   Mon May 22 14:50:21 2017 +0100

    perldelta: move an entry from Internal to Bug Fixes

M       pod/perldelta.pod

commit cdb2fee1b204e9179162a4d81ea042f70491b655
Author: David Mitchell <[email protected]>
Date:   Mon May 22 14:44:05 2017 +0100

    perldelta: remove some "Internal Changes" entries
    
    In general we don't list internals changes unless it affects the API or is
    very visible (e.g. performance enhancement). So I've deleted the following
    entries:
    
        Several new internal C macros have been added that take a string
        literal as arguments, alongside existing routines that take the
        equivalent value as two ...
    
    AFAICT, this is referring to:
    
        strEQs() and strNEs()
        _memEQs() and _memNEs
    
    which aren't listed as part of the API, so we don't (yet?) want to
    advertise them.
    
        The code in F<gv.c> that determines whether a variable has a special
        meaning to Perl has been simplified.
    
    which is great, but isn't visible AFAICT.
    
        Use C<my_strlcat()> in C<locale.c>.  While C<strcat()> is safe in this
        context, some compilers were optimizing this to C<strcpy()> causing a
        porting test to fail that looks for unsafe code.  Rather than fighting
        this, we just use C<my_strlcat()> instead.
    
    Perhaps this should be reported as a bug fix instead?

M       pod/perldelta.pod

commit 22a7b456e71458ee5dd4f86817a65094297b9833
Author: David Mitchell <[email protected]>
Date:   Mon May 22 14:17:22 2017 +0100

    perldelta: re-order "Core Enhancements" entries
    
    ... based subjectively on importance. I've put new language features
    first, followed by unicodey-stuff.
    
    There have been no edits - purely cut-n-pasting an entire entry as-is to a
    new location.

M       pod/perldelta.pod

commit f09bae29768c5900bc552dee9fdf7dcb9d31d573
Author: David Mitchell <[email protected]>
Date:   Mon May 22 14:07:29 2017 +0100

    perldelta: fix some issues raised by Karl
    
    moved utf8_hop_safe() to Internals;
    
    removed
        A regression from the previous development release, 5.23.3, where
        compiling a regular expression could crash the interpreter has been
        fixed. [perl #128686].
    
    since that was a typo - it was 5.25.3, and since the bug appeared, and was
    fixed in, the same development branch, doesn't need mentioning.

M       pod/perldelta.pod
-----------------------------------------------------------------------

Summary of changes:
 pod/perldelta.pod | 307 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 164 insertions(+), 143 deletions(-)

diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 4a30a164fd..39efba65e0 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -38,24 +38,13 @@ See L</Unescaped literal C<"{"> characters in regular expression patterns are no
 
 =head1 Core Enhancements
 
-=head2 New regular expression modifier C</xx>
-
-Specifying two C<x> characters to modify a regular expression pattern
-does everything that a single one does, but additionally TAB and SPACE
-characters within a bracketed character class are generally ignored and
-can be added to improve readability, like
-S<C</[ ^ A-Z d-f p-x ]/xx>>.  Details are at
-L<perlre/E<sol>x and E<sol>xx>.
-
-=head2 New Hash Function For 64-bit Builds
-
-We have switched to a hybrid hash function to better balance
-performance for short and long keys.
+=head2 Lexical subroutines are no longer experimental
 
-For short keys, 16 bytes and under, we use an optimised variant of
-One At A Time Hard, and for longer keys we use Siphash 1-3.  For very
-long keys this is a big improvement in performance.  For shorter keys
-there is a modest improvement.
+Using the C<lexical_subs> feature introduced in v5.18 no longer emits a warning.  Existing
+code that disables the C<experimental::lexical_subs> warning category
+that the feature previously used will continue to work.  The
+C<lexical_subs> feature has no effect; all Perl code can use lexical
+subroutines, regardless of what feature declarations are in scope.
 
 =head2 Indented Here-documents
 
@@ -89,6 +78,15 @@ For example:
 
 prints "Hello there\n" with no leading whitespace.
 
+=head2 New regular expression modifier C</xx>
+
+Specifying two C<x> characters to modify a regular expression pattern
+does everything that a single one does, but additionally TAB and SPACE
+characters within a bracketed character class are generally ignored and
+can be added to improve readability, like
+S<C</[ ^ A-Z d-f p-x ]/xx>>.  Details are at
+L<perlre/E<sol>x and E<sol>xx>.
+
 =head2 @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL}
 
 C<@{^CAPTURE}> exposes the capture buffers of the last match as an
@@ -106,6 +104,20 @@ C<%{^CAPTURE_ALL}> is the equivalent to C<%-> (I<i.e.>, all named captures).
 Other than being more self documenting there is no difference between the
 two forms.
 
+=head2 Declaring a reference to a variable
+
+As an experimental feature, Perl now allows the referencing operator to come
+after L<C<my()>|perlfunc/my>, L<C<state()>|perlfunc/state>,
+L<C<our()>|perlfunc/our>, or L<C<local()>|perlfunc/local>.  This syntax must
+be enabled with C<use feature 'declared_refs'>.  It is experimental, and will
+warn by default unless C<no warnings 'experimental::refaliasing'> is in effect.
+It is intended mainly for use in assignments to references.  For example:
+
+    use experimental 'refaliasing', 'declared_refs';
+    my \$a = \$b;
+
+See L<perlref/Assigning to References> for more details.
+
 =head2 Unicode 9.0 is now supported
 
 A list of changes is at L<http://www.unicode.org/versions/Unicode9.0.0/>.
@@ -123,20 +135,6 @@ programs that very specifically needed the old behavior.  The meaning of
 compound forms, like C<\p{sc=I<script>}> are unchanged.  See
 L<perlunicode/Scripts>.
 
-=head2 Declaring a reference to a variable
-
-As an experimental feature, Perl now allows the referencing operator to come
-after L<C<my()>|perlfunc/my>, L<C<state()>|perlfunc/state>,
-L<C<our()>|perlfunc/our>, or L<C<local()>|perlfunc/local>.  This syntax must
-be enabled with C<use feature 'declared_refs'>.  It is experimental, and will
-warn by default unless C<no warnings 'experimental::refaliasing'> is in effect.
-It is intended mainly for use in assignments to references.  For example:
-
-    use experimental 'refaliasing', 'declared_refs';
-    my \$a = \$b;
-
-See L<perlref/Assigning to References> for more details.
-
 =head2 Perl can now do default collation in UTF-8 locales on platforms
 that support it
 
@@ -155,14 +153,6 @@ ignored at the higher priority ones.  There are still some gotchas in
 some strings, though.  See
 L<perllocale/Collation of strings containing embedded C<NUL> characters>.
 
-=head2 Lexical subroutines are no longer experimental
-
-Using the C<lexical_subs> feature introduced in v5.18 no longer emits a warning.  Existing
-code that disables the C<experimental::lexical_subs> warning category
-that the feature previously used will continue to work.  The
-C<lexical_subs> feature has no effect; all Perl code can use lexical
-subroutines, regardless of what feature declarations are in scope.
-
 =head2 C<CORE> subroutines for hash and array functions callable via
 reference
 
@@ -172,10 +162,15 @@ be called with ampersand syntax (C<&CORE::keys(\%hash>) and via reference
 (C<< my $k = \&CORE::keys; $k-E<gt>(\%hash) >>).  Previously they could only be
 used when inlined.
 
-=head2 for XS code, create a safer utf8_hop() called utf8_hop_safe()
+=head2 New Hash Function For 64-bit Builds
 
-Unlike utf8_hop(), utf8_hop_safe() won't navigate before the beginning or after
-the end of the supplied buffer.
+We have switched to a hybrid hash function to better balance
+performance for short and long keys.
+
+For short keys, 16 bytes and under, we use an optimised variant of
+One At A Time Hard, and for longer keys we use Siphash 1-3.  For very
+long keys this is a big improvement in performance.  For shorter keys
+there is a modest improvement.
 
 =head1 Security
 
@@ -2161,18 +2156,23 @@ t/uni/overload.t: Skip hanging test on FreeBSD.
 
 =item *
 
-The C<op_class()> API function has been added.  This is like the existing
-C<OP_CLASS()> macro, but can more accurately determine what struct an op
-has been allocated as.  For example C<OP_CLASS()> might return
-C<OA_BASEOP_OR_UNOP> indicating that ops of this type are usually
-allocated as an C<OP> or C<UNOP>; while C<op_class()> will return
-C<OPclass_BASEOP> or C<OPclass_UNOP> as appropriate.
+A new API function C<sv_setvpv_bufsize()> allows simultaneously setting the
+length and allocated size of the buffer in an C<SV>, growing the buffer if
+necessary.
 
 =item *
 
-The output format of the C<op_dump()> function (as used by C<perl -Dx>)
-has changed: it now displays an "ASCII-art" tree structure, and shows more
-low-level details about each op, such as its address and class.
+A new API macro C<SvPVCLEAR()> sets its C<SV> argument to an empty string,
+like Perl-space C<$x = ''>, but with several optimisations.
+
+=item *
+
+Several new macros and functions for dealing with Unicode and
+UTF-8-encoded strings have been added to the API, as well some changes in
+functionality of existing functions (see L<perlapi/Unicode Support> for
+more details):
+
+=over
 
 =item *
 
@@ -2193,23 +2193,70 @@ Similarly, macros like C<toLOWER_utf8> on malformed UTF-8 now die.
 
 =item *
 
-Calling the functions C<utf8n_to_uvchr> and its derivatives, while
-passing a string length of 0 is now asserted against in DEBUGGING
-builds, and otherwise returns the Unicode REPLACEMENT CHARACTER.   If
-you have nothing to decode, you shouldn't call the decode function.
+Several new macros for analysing the validity of utf8 sequences. These
+are:
+
+C<L<perlapi/UTF8_GOT_ABOVE_31_BIT>>
+C<L<perlapi/UTF8_GOT_CONTINUATION>>
+C<L<perlapi/UTF8_GOT_EMPTY>>
+C<L<perlapi/UTF8_GOT_LONG>>
+C<L<perlapi/UTF8_GOT_NONCHAR>>
+C<L<perlapi/UTF8_GOT_NON_CONTINUATION>>
+C<L<perlapi/UTF8_GOT_OVERFLOW>>
+C<L<perlapi/UTF8_GOT_SHORT>>
+C<L<perlapi/UTF8_GOT_SUPER>>
+C<L<perlapi/UTF8_GOT_SURROGATE>>
+C<L<perlapi/UTF8_IS_INVARIANT>>
+C<L<perlapi/UTF8_IS_NONCHAR>>
+C<L<perlapi/UTF8_IS_SUPER>>
+C<L<perlapi/UTF8_IS_SURROGATE>>
+C<L<perlapi/UVCHR_IS_INVARIANT>>
+C<L<perlapi/isUTF8_CHAR_flags>>
+C<L<perlapi/isSTRICT_UTF8_CHAR>>
+C<L<perlapi/isC9_STRICT_UTF8_CHAR>>
 
 =item *
 
-The functions C<utf8n_to_uvchr> and its derivatives now return the
-Unicode REPLACEMENT CHARACTER if called with UTF-8 that has the overlong
-malformation, and that malformation is allowed by the input parameters.
-This malformation is where the UTF-8 looks valid syntactically, but
-there is a shorter sequence that yields the same code point.  This has
-been forbidden since Unicode version 3.1.
+Functions that are all extensions of the C<is_utf8_string_*()> functions,
+that apply various restrictions to the UTF-8 recognized as valid:
+
+C<L<perlapi/is_strict_utf8_string>>,
+C<L<perlapi/is_strict_utf8_string_loc>>,
+C<L<perlapi/is_strict_utf8_string_loclen>>,
+
+C<L<perlapi/is_c9strict_utf8_string>>,
+C<L<perlapi/is_c9strict_utf8_string_loc>>,
+C<L<perlapi/is_c9strict_utf8_string_loclen>>,
+
+C<L<perlapi/is_utf8_string_flags>>,
+C<L<perlapi/is_utf8_string_loc_flags>>,
+C<L<perlapi/is_utf8_string_loclen_flags>>,
+
+C<L<perlapi/is_utf8_fixed_width_buf_flags>>,
+C<L<perlapi/is_utf8_fixed_width_buf_loc_flags>>,
+C<L<perlapi/is_utf8_fixed_width_buf_loclen_flags>>.
+
+C<L<perlapi/is_utf8_invariant_string>>.
+C<L<perlapi/is_utf8_valid_partial_char>>.
+C<L<perlapi/is_utf8_valid_partial_char_flags>>.
 
 =item *
 
-The functions C<utf8n_to_uvchr> and its derivatives now accept an input
+The  functions C<L<perlapi/utf8n_to_uvchr>> and its derivatives have had
+several changes of behaviour.
+
+Calling them, while passing a string length of 0 is now asserted against
+in DEBUGGING builds, and otherwise returns the Unicode REPLACEMENT
+CHARACTER.   If you have nothing to decode, you shouldn't call the decode
+function.
+
+They now return the Unicode REPLACEMENT CHARACTER if called with UTF-8
+that has the overlong malformation, and that malformation is allowed by
+the input parameters.  This malformation is where the UTF-8 looks valid
+syntactically, but there is a shorter sequence that yields the same code
+point.  This has been forbidden since Unicode version 3.1.
+
+They now accept an input
 flag to allow the overflow malformation.  This malformation is when the
 UTF-8 may be syntactically valid, but the code point it represents is
 not capable of being represented in the word length on the platform.
@@ -2218,21 +2265,19 @@ error, and advances the parse pointer to beyond the UTF-8 in question,
 but it returns the Unicode REPLACEMENT CHARACTER as the value of the
 code point (since the real value is not representable).
 
-=item *
-
-The C<PADOFFSET> type has changed from being unsigned to signed, and
-several pad-related variables such as C<PL_padix> have changed from being
-of type C<I32> to type C<PADOFFSET>.
-
-=item *
-
-The function C<L<perlapi/utf8n_to_uvchr>> has been changed to not
+C<utf8n_to_uvchr> has been changed to not
 abandon searching for other malformations when the first one is
 encountered.  A call to it thus can generate multiple diagnostics,
 instead of just one.
 
 =item *
 
+C<valid_utf8_to_uvchr()> has been added to the API (although it was
+present in core earlier). Like C<utf8_to_uvchr_buf()>, but assumes that
+the next character is well-formed.
+
+=item *
+
 A new function, C<L<perlapi/utf8n_to_uvchr_error>>, has been added for
 use by modules that need to know the details of UTF-8 malformations
 beyond pass/fail.  Previously, the only ways to know why a sequence was
@@ -2241,108 +2286,85 @@ your own analysis.
 
 =item *
 
-Several new functions for handling Unicode have been added to the API:
-C<L<perlapi/is_strict_utf8_string>>,
-C<L<perlapi/is_c9strict_utf8_string>>,
-C<L<perlapi/is_utf8_string_flags>>,
-C<L<perlapi/is_strict_utf8_string_loc>>,
-C<L<perlapi/is_strict_utf8_string_loclen>>,
-C<L<perlapi/is_c9strict_utf8_string_loc>>,
-C<L<perlapi/is_c9strict_utf8_string_loclen>>,
-C<L<perlapi/is_utf8_string_loc_flags>>,
-C<L<perlapi/is_utf8_string_loclen_flags>>,
-C<L<perlapi/is_utf8_fixed_width_buf_flags>>,
-C<L<perlapi/is_utf8_fixed_width_buf_loc_flags>>,
-C<L<perlapi/is_utf8_fixed_width_buf_loclen_flags>>.
-
-These functions are all extensions of the C<is_utf8_string_*()> functions,
-that apply various restrictions to the UTF-8 recognized as valid.
+There is now a safer version of utf8_hop(), called utf8_hop_safe().
+Unlike utf8_hop(), utf8_hop_safe() won't navigate before the beginning or
+after the end of the supplied buffer.
 
 =item *
 
-A new API function C<sv_setvpv_bufsize()> allows simultaneously setting the
-length and allocated size of the buffer in an C<SV>, growing the buffer if
-necessary.
+Two new functions, C<utf8_hop_forward()> and C<utf8_hop_back()> are
+similar to C<utf8_hop_safe()> but are for when you know which direction
+you wish to travel.
 
 =item *
 
-A new API macro C<SvPVCLEAR()> sets its C<SV> argument to an empty string,
-like Perl-space C<$x = ''>, but with several optimisations.
+Two new macros which return useful utf8 byte sequences:
 
-=item *
+C<L<perlapi/BOM_UTF8>>
+C<L<perlapi/REPLACEMENT_CHARACTER_UTF8>>
 
-All parts of the internals now agree that the C<sassign> op is a C<BINOP>;
-previously it was listed as a C<BASEOP> in F<regen/opcodes>, which meant
-that several parts of the internals had to be special-cased to accommodate
-it.  This oddity's original motivation was to handle code like C<$x ||= 1>;
-that is now handled in a simpler way.
+=back
 
 =item *
 
-Several new internal C macros have been added that take a string literal as
-arguments, alongside existing routines that take the equivalent value as two
-arguments, a character pointer and a length.  The advantage of this is that
-the length of the string is calculated automatically, rather than having to
-be done manually.  These routines are now used where appropriate across the
-entire codebase.
-
-=item *
+Perl is now built with the C<PERL_OP_PARENT> compiler define enabled by
+default.  To disable it, use the C<PERL_NO_OP_PARENT> compiler define.
+This flag alters how the C<op_sibling> field is used in C<OP> structures,
+and has been available optionally since perl 5.22.
 
-The code in F<gv.c> that determines whether a variable has a special meaning
-to Perl has been simplified.
+See L<perl5220delta/"Internal Changes"> for more details of what this
+build option does.
 
 =item *
 
-The C<DEBUGGING>-mode output for regex compilation and execution has been
-enhanced.
+Three new ops, C<OP_ARGELEM>, C<OP_ARGDEFELEM> and C<OP_ARGCHECK> have
+been added.  These are intended principally to implement the individual
+elements of a subroutine signature, plus any overall checking required.
 
 =item *
 
-Several macros and functions have been added to the public API for
-dealing with Unicode and UTF-8-encoded strings.  See
-L<perlapi/Unicode Support>.
+The C<op_class()> API function has been added.  This is like the existing
+C<OP_CLASS()> macro, but can more accurately determine what struct an op
+has been allocated as.  For example C<OP_CLASS()> might return
+C<OA_BASEOP_OR_UNOP> indicating that ops of this type are usually
+allocated as an C<OP> or C<UNOP>; while C<op_class()> will return
+C<OPclass_BASEOP> or C<OPclass_UNOP> as appropriate.
 
 =item *
 
-Use C<my_strlcat()> in C<locale.c>.  While C<strcat()> is safe in this context,
-some compilers were optimizing this to C<strcpy()> causing a porting test to
-fail that looks for unsafe code.  Rather than fighting this, we just use
-C<my_strlcat()> instead.
+All parts of the internals now agree that the C<sassign> op is a C<BINOP>;
+previously it was listed as a C<BASEOP> in F<regen/opcodes>, which meant
+that several parts of the internals had to be special-cased to accommodate
+it.  This oddity's original motivation was to handle code like C<$x ||= 1>;
+that is now handled in a simpler way.
 
 =item *
 
-Three new ops, C<OP_ARGELEM>, C<OP_ARGDEFELEM> and C<OP_ARGCHECK> have
-been added.  These are intended principally to implement the individual
-elements of a subroutine signature, plus any overall checking required.
+The output format of the C<op_dump()> function (as used by C<perl -Dx>)
+has changed: it now displays an "ASCII-art" tree structure, and shows more
+low-level details about each op, such as its address and class.
 
 =item *
 
-Perl no longer panics when switching into some locales on machines with
-buggy C<strxfrm()> implementations in their libc. [perl #121734]
+The C<PADOFFSET> type has changed from being unsigned to signed, and
+several pad-related variables such as C<PL_padix> have changed from being
+of type C<I32> to type C<PADOFFSET>.
 
 =item *
 
-Perl is now built with the C<PERL_OP_PARENT> compiler define enabled by
-default.  To disable it, use the C<PERL_NO_OP_PARENT> compiler define.
-This flag alters how the C<op_sibling> field is used in C<OP> structures,
-and has been available optionally since perl 5.22.
-
-See L<perl5220delta/"Internal Changes"> for more details of what this
-build option does.
+The C<DEBUGGING>-mode output for regex compilation and execution has been
+enhanced.
 
 =item *
 
-The meanings of some internal SV flags have been changed
-
-OPpRUNTIME, SVpbm_VALID, SVpbm_TAIL, SvTAIL_on, SvTAIL_off, SVrepl_EVAL,
-SvEVALED
+Several obscure SV flags have been eliminated, sometimes along with the
+macros which manipulate them: C<SVpbm_VALID>, C<SVpbm_TAIL>, C<SvTAIL_on>,
+C<SvTAIL_off>, C<SVrepl_EVAL>, C<SvEVALED>
 
 =item *
 
-Change C<hv_fetch(…, "…", …, …)> to C<hv_fetchs(…, "…", …)>
-
-The dual-life dists all use Devel::PPPort, so they can use this function even
-though it was only added in 5.10.
+An OP op_private flag has been eliminated: C<OPpRUNTIME>. This used to
+often get set on C<PMOP>s, but had become meaningless over time.
 
 =back
 
@@ -2352,6 +2374,11 @@ though it was only added in 5.10.
 
 =item *
 
+Perl no longer panics when switching into some locales on machines with
+buggy C<strxfrm()> implementations in their libc. [perl #121734]
+
+=item *
+
 C< $-{$name} > would leak an C<AV> on each access if the regular
 expression had no named captures.  The same applies to access to any
 hash tied with L<Tie::Hash::NamedCapture> and C<< all =E<gt> 1 >>. [perl
@@ -2689,12 +2716,6 @@ A regression in 5.24 with C<tr/\N{U+...}/foo/> when the code point was between
 
 =item *
 
-A regression from the previous development release, 5.23.3, where
-compiling a regular expression could crash the interpreter has been
-fixed. [perl #128686].
-
-=item *
-
 Use of a string delimiter whose code point is above 2**31 now works
 correctly on platforms that allow this.  Previously, certain characters,
 due to truncation, would be confused with other delimiter characters

--
Perl5 Master Repository
