In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/74ca8c9e1838915d63d9881cdb7d5fa2adc383df?hp=8928e7dbca2b466dfcc52c61375531769f148292>
- Log -----------------------------------------------------------------
commit 74ca8c9e1838915d63d9881cdb7d5fa2adc383df
Author: David Mitchell <[email protected]>
Date: Mon May 22 16:22:23 2017 +0100

    perldelta: tweak SV flags; eliminate hv_fetchs

    Expand the section on SV flags: they have been eliminated rather than
    just changed.

    And remove this entry:

        Change C<hv_fetch(…, "…", …, …)> to C<hv_fetchs(…, "…", …)>

        The dual-life dists all use Devel::PPPort, so they can use this function even
        though it was only added in 5.10.

    which appears to be just an implementation detail.

M pod/perldelta.pod

commit c11f9206a779c0ab72982d5516580d441fb0b698
Author: David Mitchell <[email protected]>
Date: Mon May 22 16:09:50 2017 +0100

    perldelta: expand Unicode/utf8 API changes

    Move all the Unicode/utf8 API changes into a separate bulleted
    sub-section, and add mentions of every macro/function added to the API.

    Previously there was just a vague "Several macros and functions have
    been added to the public API" without enumerating them.

M pod/perldelta.pod

commit c2ef64b1f87fab29a77142b20270c85f60840918
Author: David Mitchell <[email protected]>
Date: Mon May 22 15:01:18 2017 +0100

    perldelta: sort the "Internal Changes" section.

    Group entries that have similar themes such as unicode API or optree

    No changes apart from cut-n-paste of whole =item entries.

M pod/perldelta.pod

commit 5573eafe4125dff4a970b0aee2878ee16282ea3e
Author: David Mitchell <[email protected]>
Date: Mon May 22 14:50:21 2017 +0100

    perldelta: move an entry from Internal to Bug Fixes

M pod/perldelta.pod

commit cdb2fee1b204e9179162a4d81ea042f70491b655
Author: David Mitchell <[email protected]>
Date: Mon May 22 14:44:05 2017 +0100

    perldelta: remove some "Internal Changes" entries

    In general we don't list internals changes unless it affects the API or
    is very visible (e.g. performance enhancement).
    So I've deleted the following entries:

        Several new internal C macros have been added that take a string literal as
        arguments, alongside existing routines that take the equivalent value as two
        ...

    AFAICT, this is referring to:
        strEQs() and strNEs()
        _memEQs() and _memNEs
    which aren't listed as part of the API, so we don't (yet?) want to
    advertise them.

        The code in F<gv.c> that determines whether a variable has a special meaning
        to Perl has been simplified.

    which is great, but isn't visible AFAICT.

        Use C<my_strlcat()> in C<locale.c>. While C<strcat()> is safe in this context,
        some compilers were optimizing this to C<strcpy()> causing a porting test to
        fail that looks for unsafe code. Rather than fighting this, we just use
        C<my_strlcat()> instead.

    Perhaps this should be reported as a bug fix instead?

M pod/perldelta.pod

commit 22a7b456e71458ee5dd4f86817a65094297b9833
Author: David Mitchell <[email protected]>
Date: Mon May 22 14:17:22 2017 +0100

    perldelta: re-order "Core Enhancements" entries

    ... based subjectively on importance. I've put new language features
    first, followed by unicodey-stuff.

    There have been no edits - purely cut-n-pasting an entire entry as-is to
    a new location.

M pod/perldelta.pod

commit f09bae29768c5900bc552dee9fdf7dcb9d31d573
Author: David Mitchell <[email protected]>
Date: Mon May 22 14:07:29 2017 +0100

    perldelta: fix some issues raised by Karl

    moved utf8_hop_safe() to Internals;

    removed

        A regression from the previous development release, 5.23.3, where
        compiling a regular expression could crash the interpreter has been
        fixed. [perl #128686].

    since that was a typo - it was 5.25.3, and since the bug appeared, and
    was fixed in, the same development branch, doesn't need mentioning.
M pod/perldelta.pod ----------------------------------------------------------------------- Summary of changes: pod/perldelta.pod | 307 +++++++++++++++++++++++++++++------------------------- 1 file changed, 164 insertions(+), 143 deletions(-) diff --git a/pod/perldelta.pod b/pod/perldelta.pod index 4a30a164fd..39efba65e0 100644 --- a/pod/perldelta.pod +++ b/pod/perldelta.pod @@ -38,24 +38,13 @@ See L</Unescaped literal C<"{"> characters in regular expression patterns are no =head1 Core Enhancements -=head2 New regular expression modifier C</xx> - -Specifying two C<x> characters to modify a regular expression pattern -does everything that a single one does, but additionally TAB and SPACE -characters within a bracketed character class are generally ignored and -can be added to improve readability, like -S<C</[ ^ A-Z d-f p-x ]/xx>>. Details are at -L<perlre/E<sol>x and E<sol>xx>. - -=head2 New Hash Function For 64-bit Builds - -We have switched to a hybrid hash function to better balance -performance for short and long keys. +=head2 Lexical subroutines are no longer experimental -For short keys, 16 bytes and under, we use an optimised variant of -One At A Time Hard, and for longer keys we use Siphash 1-3. For very -long keys this is a big improvement in performance. For shorter keys -there is a modest improvement. +Using the C<lexical_subs> feature introduced in v5.18 no longer emits a warning. Existing +code that disables the C<experimental::lexical_subs> warning category +that the feature previously used will continue to work. The +C<lexical_subs> feature has no effect; all Perl code can use lexical +subroutines, regardless of what feature declarations are in scope. =head2 Indented Here-documents @@ -89,6 +78,15 @@ For example: prints "Hello there\n" with no leading whitespace. 
+=head2 New regular expression modifier C</xx> + +Specifying two C<x> characters to modify a regular expression pattern +does everything that a single one does, but additionally TAB and SPACE +characters within a bracketed character class are generally ignored and +can be added to improve readability, like +S<C</[ ^ A-Z d-f p-x ]/xx>>. Details are at +L<perlre/E<sol>x and E<sol>xx>. + =head2 @{^CAPTURE}, %{^CAPTURE}, and %{^CAPTURE_ALL} C<@{^CAPTURE}> exposes the capture buffers of the last match as an @@ -106,6 +104,20 @@ C<%{^CAPTURE_ALL}> is the equivalent to C<%-> (I<i.e.>, all named captures). Other than being more self documenting there is no difference between the two forms. +=head2 Declaring a reference to a variable + +As an experimental feature, Perl now allows the referencing operator to come +after L<C<my()>|perlfunc/my>, L<C<state()>|perlfunc/state>, +L<C<our()>|perlfunc/our>, or L<C<local()>|perlfunc/local>. This syntax must +be enabled with C<use feature 'declared_refs'>. It is experimental, and will +warn by default unless C<no warnings 'experimental::refaliasing'> is in effect. +It is intended mainly for use in assignments to references. For example: + + use experimental 'refaliasing', 'declared_refs'; + my \$a = \$b; + +See L<perlref/Assigning to References> for more details. + =head2 Unicode 9.0 is now supported A list of changes is at L<http://www.unicode.org/versions/Unicode9.0.0/>. @@ -123,20 +135,6 @@ programs that very specifically needed the old behavior. The meaning of compound forms, like C<\p{sc=I<script>}> are unchanged. See L<perlunicode/Scripts>. -=head2 Declaring a reference to a variable - -As an experimental feature, Perl now allows the referencing operator to come -after L<C<my()>|perlfunc/my>, L<C<state()>|perlfunc/state>, -L<C<our()>|perlfunc/our>, or L<C<local()>|perlfunc/local>. This syntax must -be enabled with C<use feature 'declared_refs'>. 
It is experimental, and will -warn by default unless C<no warnings 'experimental::refaliasing'> is in effect. -It is intended mainly for use in assignments to references. For example: - - use experimental 'refaliasing', 'declared_refs'; - my \$a = \$b; - -See L<perlref/Assigning to References> for more details. - =head2 Perl can now do default collation in UTF-8 locales on platforms that support it @@ -155,14 +153,6 @@ ignored at the higher priority ones. There are still some gotchas in some strings, though. See L<perllocale/Collation of strings containing embedded C<NUL> characters>. -=head2 Lexical subroutines are no longer experimental - -Using the C<lexical_subs> feature introduced in v5.18 no longer emits a warning. Existing -code that disables the C<experimental::lexical_subs> warning category -that the feature previously used will continue to work. The -C<lexical_subs> feature has no effect; all Perl code can use lexical -subroutines, regardless of what feature declarations are in scope. - =head2 C<CORE> subroutines for hash and array functions callable via reference @@ -172,10 +162,15 @@ be called with ampersand syntax (C<&CORE::keys(\%hash>) and via reference (C<< my $k = \&CORE::keys; $k-E<gt>(\%hash) >>). Previously they could only be used when inlined. -=head2 for XS code, create a safer utf8_hop() called utf8_hop_safe() +=head2 New Hash Function For 64-bit Builds -Unlike utf8_hop(), utf8_hop_safe() won't navigate before the beginning or after -the end of the supplied buffer. +We have switched to a hybrid hash function to better balance +performance for short and long keys. + +For short keys, 16 bytes and under, we use an optimised variant of +One At A Time Hard, and for longer keys we use Siphash 1-3. For very +long keys this is a big improvement in performance. For shorter keys +there is a modest improvement. =head1 Security @@ -2161,18 +2156,23 @@ t/uni/overload.t: Skip hanging test on FreeBSD. =item * -The C<op_class()> API function has been added. 
This is like the existing -C<OP_CLASS()> macro, but can more accurately determine what struct an op -has been allocated as. For example C<OP_CLASS()> might return -C<OA_BASEOP_OR_UNOP> indicating that ops of this type are usually -allocated as an C<OP> or C<UNOP>; while C<op_class()> will return -C<OPclass_BASEOP> or C<OPclass_UNOP> as appropriate. +A new API function C<sv_setvpv_bufsize()> allows simultaneously setting the +length and allocated size of the buffer in an C<SV>, growing the buffer if +necessary. =item * -The output format of the C<op_dump()> function (as used by C<perl -Dx>) -has changed: it now displays an "ASCII-art" tree structure, and shows more -low-level details about each op, such as its address and class. +A new API macro C<SvPVCLEAR()> sets its C<SV> argument to an empty string, +like Perl-space C<$x = ''>, but with several optimisations. + +=item * + +Several new macros and functions for dealing with Unicode and +UTF-8-encoded strings have been added to the API, as well some changes in +functionality of existing functions (see L<perlapi/Unicode Support> for +more details): + +=over =item * @@ -2193,23 +2193,70 @@ Similarly, macros like C<toLOWER_utf8> on malformed UTF-8 now die. =item * -Calling the functions C<utf8n_to_uvchr> and its derivatives, while -passing a string length of 0 is now asserted against in DEBUGGING -builds, and otherwise returns the Unicode REPLACEMENT CHARACTER. If -you have nothing to decode, you shouldn't call the decode function. +Several new macros for analysing the validity of utf8 sequences. 
These +are: + +C<L<perlapi/UTF8_GOT_ABOVE_31_BIT>> +C<L<perlapi/UTF8_GOT_CONTINUATION>> +C<L<perlapi/UTF8_GOT_EMPTY>> +C<L<perlapi/UTF8_GOT_LONG>> +C<L<perlapi/UTF8_GOT_NONCHAR>> +C<L<perlapi/UTF8_GOT_NON_CONTINUATION>> +C<L<perlapi/UTF8_GOT_OVERFLOW>> +C<L<perlapi/UTF8_GOT_SHORT>> +C<L<perlapi/UTF8_GOT_SUPER>> +C<L<perlapi/UTF8_GOT_SURROGATE>> +C<L<perlapi/UTF8_IS_INVARIANT>> +C<L<perlapi/UTF8_IS_NONCHAR>> +C<L<perlapi/UTF8_IS_SUPER>> +C<L<perlapi/UTF8_IS_SURROGATE>> +C<L<perlapi/UVCHR_IS_INVARIANT>> +C<L<perlapi/isUTF8_CHAR_flags>> +C<L<perlapi/isSTRICT_UTF8_CHAR>> +C<L<perlapi/isC9_STRICT_UTF8_CHAR>> =item * -The functions C<utf8n_to_uvchr> and its derivatives now return the -Unicode REPLACEMENT CHARACTER if called with UTF-8 that has the overlong -malformation, and that malformation is allowed by the input parameters. -This malformation is where the UTF-8 looks valid syntactically, but -there is a shorter sequence that yields the same code point. This has -been forbidden since Unicode version 3.1. +Functions that are all extensions of the C<is_utf8_string_*()> functions, +that apply various restrictions to the UTF-8 recognized as valid: + +C<L<perlapi/is_strict_utf8_string>>, +C<L<perlapi/is_strict_utf8_string_loc>>, +C<L<perlapi/is_strict_utf8_string_loclen>>, + +C<L<perlapi/is_c9strict_utf8_string>>, +C<L<perlapi/is_c9strict_utf8_string_loc>>, +C<L<perlapi/is_c9strict_utf8_string_loclen>>, + +C<L<perlapi/is_utf8_string_flags>>, +C<L<perlapi/is_utf8_string_loc_flags>>, +C<L<perlapi/is_utf8_string_loclen_flags>>, + +C<L<perlapi/is_utf8_fixed_width_buf_flags>>, +C<L<perlapi/is_utf8_fixed_width_buf_loc_flags>>, +C<L<perlapi/is_utf8_fixed_width_buf_loclen_flags>>. + +C<L<perlapi/is_utf8_invariant_string>>. +C<L<perlapi/is_utf8_valid_partial_char>>. +C<L<perlapi/is_utf8_valid_partial_char_flags>>. 
=item * -The functions C<utf8n_to_uvchr> and its derivatives now accept an input +The functions C<L<perlapi/utf8n_to_uvchr>> and its derivatives have had +several changes of behaviour. + +Calling them, while passing a string length of 0 is now asserted against +in DEBUGGING builds, and otherwise returns the Unicode REPLACEMENT +CHARACTER. If you have nothing to decode, you shouldn't call the decode +function. + +They now return the Unicode REPLACEMENT CHARACTER if called with UTF-8 +that has the overlong malformation, and that malformation is allowed by +the input parameters. This malformation is where the UTF-8 looks valid +syntactically, but there is a shorter sequence that yields the same code +point. This has been forbidden since Unicode version 3.1. + +They now accept an input flag to allow the overflow malformation. This malformation is when the UTF-8 may be syntactically valid, but the code point it represents is not capable of being represented in the word length on the platform. @@ -2218,21 +2265,19 @@ error, and advances the parse pointer to beyond the UTF-8 in question, but it returns the Unicode REPLACEMENT CHARACTER as the value of the code point (since the real value is not representable). -=item * - -The C<PADOFFSET> type has changed from being unsigned to signed, and -several pad-related variables such as C<PL_padix> have changed from being -of type C<I32> to type C<PADOFFSET>. - -=item * - -The function C<L<perlapi/utf8n_to_uvchr>> has been changed to not +C<utf8n_to_uvchr> has been changed to not abandon searching for other malformations when the first one is encountered. A call to it thus can generate multiple diagnostics, instead of just one. =item * +C<valid_utf8_to_uvchr()> has been added to the API (although it was +present in core earlier). Like C<utf8_to_uvchr_buf()>, but assumes that +the next character is well-formed. 
+ +=item * + A new function, C<L<perlapi/utf8n_to_uvchr_error>>, has been added for use by modules that need to know the details of UTF-8 malformations beyond pass/fail. Previously, the only ways to know why a sequence was @@ -2241,108 +2286,85 @@ your own analysis. =item * -Several new functions for handling Unicode have been added to the API: -C<L<perlapi/is_strict_utf8_string>>, -C<L<perlapi/is_c9strict_utf8_string>>, -C<L<perlapi/is_utf8_string_flags>>, -C<L<perlapi/is_strict_utf8_string_loc>>, -C<L<perlapi/is_strict_utf8_string_loclen>>, -C<L<perlapi/is_c9strict_utf8_string_loc>>, -C<L<perlapi/is_c9strict_utf8_string_loclen>>, -C<L<perlapi/is_utf8_string_loc_flags>>, -C<L<perlapi/is_utf8_string_loclen_flags>>, -C<L<perlapi/is_utf8_fixed_width_buf_flags>>, -C<L<perlapi/is_utf8_fixed_width_buf_loc_flags>>, -C<L<perlapi/is_utf8_fixed_width_buf_loclen_flags>>. - -These functions are all extensions of the C<is_utf8_string_*()> functions, -that apply various restrictions to the UTF-8 recognized as valid. +There is now a safer version of utf8_hop(), called utf8_hop_safe(). +Unlike utf8_hop(), utf8_hop_safe() won't navigate before the beginning or +after the end of the supplied buffer. =item * -A new API function C<sv_setvpv_bufsize()> allows simultaneously setting the -length and allocated size of the buffer in an C<SV>, growing the buffer if -necessary. +Two new functions, C<utf8_hop_forward()> and C<utf8_hop_back()> are +similar to C<utf8_hop_safe()> but are for when you know which direction +you wish to travel. =item * -A new API macro C<SvPVCLEAR()> sets its C<SV> argument to an empty string, -like Perl-space C<$x = ''>, but with several optimisations. 
+Two new macros which return useful utf8 byte sequences: -=item * +C<L<perlapi/BOM_UTF8>> +C<L<perlapi/REPLACEMENT_CHARACTER_UTF8>> -All parts of the internals now agree that the C<sassign> op is a C<BINOP>; -previously it was listed as a C<BASEOP> in F<regen/opcodes>, which meant -that several parts of the internals had to be special-cased to accommodate -it. This oddity's original motivation was to handle code like C<$x ||= 1>; -that is now handled in a simpler way. +=back =item * -Several new internal C macros have been added that take a string literal as -arguments, alongside existing routines that take the equivalent value as two -arguments, a character pointer and a length. The advantage of this is that -the length of the string is calculated automatically, rather than having to -be done manually. These routines are now used where appropriate across the -entire codebase. - -=item * +Perl is now built with the C<PERL_OP_PARENT> compiler define enabled by +default. To disable it, use the C<PERL_NO_OP_PARENT> compiler define. +This flag alters how the C<op_sibling> field is used in C<OP> structures, +and has been available optionally since perl 5.22. -The code in F<gv.c> that determines whether a variable has a special meaning -to Perl has been simplified. +See L<perl5220delta/"Internal Changes"> for more details of what this +build option does. =item * -The C<DEBUGGING>-mode output for regex compilation and execution has been -enhanced. +Three new ops, C<OP_ARGELEM>, C<OP_ARGDEFELEM> and C<OP_ARGCHECK> have +been added. These are intended principally to implement the individual +elements of a subroutine signature, plus any overall checking required. =item * -Several macros and functions have been added to the public API for -dealing with Unicode and UTF-8-encoded strings. See -L<perlapi/Unicode Support>. +The C<op_class()> API function has been added. 
This is like the existing +C<OP_CLASS()> macro, but can more accurately determine what struct an op +has been allocated as. For example C<OP_CLASS()> might return +C<OA_BASEOP_OR_UNOP> indicating that ops of this type are usually +allocated as an C<OP> or C<UNOP>; while C<op_class()> will return +C<OPclass_BASEOP> or C<OPclass_UNOP> as appropriate. =item * -Use C<my_strlcat()> in C<locale.c>. While C<strcat()> is safe in this context, -some compilers were optimizing this to C<strcpy()> causing a porting test to -fail that looks for unsafe code. Rather than fighting this, we just use -C<my_strlcat()> instead. +All parts of the internals now agree that the C<sassign> op is a C<BINOP>; +previously it was listed as a C<BASEOP> in F<regen/opcodes>, which meant +that several parts of the internals had to be special-cased to accommodate +it. This oddity's original motivation was to handle code like C<$x ||= 1>; +that is now handled in a simpler way. =item * -Three new ops, C<OP_ARGELEM>, C<OP_ARGDEFELEM> and C<OP_ARGCHECK> have -been added. These are intended principally to implement the individual -elements of a subroutine signature, plus any overall checking required. +The output format of the C<op_dump()> function (as used by C<perl -Dx>) +has changed: it now displays an "ASCII-art" tree structure, and shows more +low-level details about each op, such as its address and class. =item * -Perl no longer panics when switching into some locales on machines with -buggy C<strxfrm()> implementations in their libc. [perl #121734] +The C<PADOFFSET> type has changed from being unsigned to signed, and +several pad-related variables such as C<PL_padix> have changed from being +of type C<I32> to type C<PADOFFSET>. =item * -Perl is now built with the C<PERL_OP_PARENT> compiler define enabled by -default. To disable it, use the C<PERL_NO_OP_PARENT> compiler define. 
-This flag alters how the C<op_sibling> field is used in C<OP> structures, -and has been available optionally since perl 5.22. - -See L<perl5220delta/"Internal Changes"> for more details of what this -build option does. +The C<DEBUGGING>-mode output for regex compilation and execution has been +enhanced. =item * -The meanings of some internal SV flags have been changed - -OPpRUNTIME, SVpbm_VALID, SVpbm_TAIL, SvTAIL_on, SvTAIL_off, SVrepl_EVAL, -SvEVALED +Several obscure SV flags have been eliminated, sometimes along with the +macros which manipulate them: C<SVpbm_VALID>, C<SVpbm_TAIL>, C<SvTAIL_on>, +C<SvTAIL_off>, C<SVrepl_EVAL>, C<SvEVALED> =item * -Change C<hv_fetch(…, "…", …, …)> to C<hv_fetchs(…, "…", …)> - -The dual-life dists all use Devel::PPPort, so they can use this function even -though it was only added in 5.10. +An OP op_private flag has been eliminated: C<OPpRUNTIME>. This used to +often get set on C<PMOP>s, but had become meaningless over time. =back @@ -2352,6 +2374,11 @@ though it was only added in 5.10. =item * +Perl no longer panics when switching into some locales on machines with +buggy C<strxfrm()> implementations in their libc. [perl #121734] + +=item * + C< $-{$name} > would leak an C<AV> on each access if the regular expression had no named captures. The same applies to access to any hash tied with L<Tie::Hash::NamedCapture> and C<< all =E<gt> 1 >>. [perl @@ -2689,12 +2716,6 @@ A regression in 5.24 with C<tr/\N{U+...}/foo/> when the code point was between =item * -A regression from the previous development release, 5.23.3, where -compiling a regular expression could crash the interpreter has been -fixed. [perl #128686]. - -=item * - Use of a string delimiter whose code point is above 2**31 now works correctly on platforms that allow this. Previously, certain characters, due to truncation, would be confused with other delimiter characters -- Perl5 Master Repository
