In perl.git, the branch blead has been updated <https://perl5.git.perl.org/perl.git/commitdiff/0f3d8cd442f9b7ac5abd03743d2a372fa3d871d6?hp=2dd743bd3bdcfb2f8fd1e33369d14798c7c3a87d>
- Log ----------------------------------------------------------------- commit 0f3d8cd442f9b7ac5abd03743d2a372fa3d871d6 Author: Karl Williamson <[email protected]> Date: Thu May 23 14:23:48 2019 -0600 numeric.c: Use full name of function in definition The prefix 'Perl_' was omitted, but this did not matter because there was a macro that expanded to include the prefix. But it's customary to have the expansion at the point of definition. commit 52129632e392fa9969a1a481f0175777a4329973 Author: Karl Williamson <[email protected]> Date: Thu May 23 11:58:09 2019 -0600 PATCH: [perl #134098] no locales + debugging = no compile The wrong #define was being tested for commit 734a50c78631a0b2dcdb36e690dcad4e7f49a21e Author: Richard Leach <[email protected]> Date: Sun May 19 03:50:37 2019 +0000 Modern Androids do support locales commit a04a75f20f03aa08ce8118b3b0b3f93eb3e997c5 Author: Richard Leach <[email protected]> Date: Sun May 19 20:16:41 2019 +0000 PATCH: [perl #134117] Close DATA in loc_tools.pl This prevents unexpected text in lib/warnings.t commit 01b91050db552dbb89e4bd2eb6d089ebd0f28c1e Author: Karl Williamson <[email protected]> Date: Thu May 23 10:48:30 2019 -0600 locale.c: Fix '%s' directive argument is null This was just an oversight. THe code doesn't get executed unless it's trying to panic commit b377a43cebc7e181f6f2189b957deda3a9af6943 Author: Karl Williamson <[email protected]> Date: Fri May 3 14:30:53 2019 -0600 asan_ignore: Explicitly list ignored behavior fcns Overflow on signed integers is undefined behavior. Perl does whatever the underlying compiler does in this case. Previously this file used a wildcard which could specify more functions than we really expect to see this potential overflow from. It is safer to mention explicitly all the functions. commit 814735a391b874af8f00eaf89469e5ec7f38cd4a Author: Karl Williamson <[email protected]> Date: Fri May 3 13:57:47 2019 -0600 Remove undefined behavior from IV shifting It is undefined behavior to shift a negative integer to the left. This commit avoids that by treating the value as unsigned, then casting back to integer for return. commit bae047b68c92622bb4bb04499e36cdaa48138909 Author: Karl Williamson <[email protected]> Date: Wed May 1 10:41:38 2019 -0600 pp.c: Add two UNLIKELY()s It should be uncommon to shift beyond a full word commit 3a019afd6f6291c3249c254b5c01e244e4ec83ab Author: Karl Williamson <[email protected]> Date: Sun Apr 28 17:42:44 2019 -0600 Create fcn for lossless conversion of NV to IV Essentially the same code was being used in three places, and had undefined C behavior for some inputs. This consolidates the code into one inline function, and rewrites it to avoid undefined behavior. commit 190e86d7a39dad04ad5d024fb42c0b8fefa68ebe Author: Karl Williamson <[email protected]> Date: Sun Apr 28 17:26:38 2019 -0600 pp.c: White-space only Fix indentation of this routine to current standards, in preparation for making changes to it, and add a blank line for readability commit 19c737262739194609b3dc81b416fca7ee242315 Author: Karl Williamson <[email protected]> Date: Wed Apr 24 15:45:43 2019 -0600 perl.h: Clarify debugging text commit bc2f1ca15922645b3e432d3a3ef7142fe6baedbb Author: Karl Williamson <[email protected]> Date: Thu Apr 18 10:10:41 2019 -0600 Add some comments around tainting commit 6aba5c5eec25a6294e01899eb55a31bf5b889900 Author: Karl Williamson <[email protected]> Date: Wed Apr 3 14:11:27 2019 -0600 locale.c: Add some comments commit 72299e00ea670f308805c0327b3a5d0cfc2d4a62 Author: Jerome Duval <[email protected]> Date: Thu Apr 25 23:19:43 2019 +0200 locale.c: remove unnecessary cast This was failing in gcc 2.95. The original commit added a cast, but we figured out that removing this other one that really served no purpose causes this compiler to work. commit 79780975167f37019a43a656947fc7f7d167d29f Author: Karl Williamson <[email protected]> Date: Fri May 10 08:54:45 2019 -0600 Jereome Duval is a Perl author commit efec1f81243e3b567629efa4571266438ac98ec7 Author: Karl Williamson <[email protected]> Date: Thu May 23 00:48:49 2019 -0600 regcomp.sym: Fix typo in comment commit 02fd9d541c593812f3e152dc406cb4399353d5dc Author: Karl Williamson <[email protected]> Date: Sun May 5 10:03:01 2019 -0600 numeric.c: Add #error case It's best to have a #error case when nothing in the #if #else series is true, as it catches the problem at compile time. commit 89f69032d6a71f41b96ae6becbf3df4e2f9509a5 Author: Karl Williamson <[email protected]> Date: Sat Apr 27 13:56:39 2019 -0600 S_scan_const() Properly test if need to grow As we parse the input, creating a string constant, we may have to grow the destination if it fills up as we go along. It allocates space in an SV and populates the string, but it doesn' update the SvCUR until the end, so in single stepping the debugger through the code, the SV looks empty until the end. It turns out that as a result SvEND also doesn't get updated and still points to the beginning of the string until SvCUR is finally set. That means that the test changed by this commit was always succeeding, because it was using SvEND that didn't get updated, so it would attempt to grow each time through the loop. By moving a couple of statements earlier, and using SvLEN instead, which does always have the correct value, those extra growth attempts are avoided. commit 4d770d00eb09b8abf94260f9145e05c8901389b2 Author: Karl Williamson <[email protected]> Date: Sun Apr 28 21:26:35 2019 -0600 POSIX.xs: Add NOTREACHED to silence cygwin compiler commit 35acbeb13af0cd2b68ebf98c71bf6f3a4b7dc50b Author: Karl Williamson <[email protected]> Date: Wed Apr 24 16:22:08 2019 -0600 posix_bump commit 4e650215e450c1d08b1fa65f76bc3fbdd0d2b978 Author: Karl Williamson <[email protected]> Date: Sun Apr 28 21:17:21 2019 -0600 cygwin.c: Fix misleading indentation commit 016b720906a9b088c796d5295109cd8e9a3aecf9 Author: Karl Williamson <[email protected]> Date: Thu Mar 14 12:18:59 2019 -0600 regnodes.h: Change some regnodes' names These were misleading, as elsewhere a leading 'N' in the name means the complement. Instead move the N to the end of the name commit 9629b6dc1d6296e0d3a6a5dbfd0f672b4b2021ca Author: Karl Williamson <[email protected]> Date: Wed May 22 14:37:20 2019 -0600 PATCH: [perl #134126] -Dusemymalloc, -Dusethreads This was due to a missing declaration for thread context needed to output a message. commit cc16d262eb72677cdda2aa9395e943818b85ba38 Author: Karl Williamson <[email protected]> Date: Mon Apr 29 15:24:18 2019 -0600 PATCH: [perl #134059] panic outputting a warning This was due to a logic error on my part. We need to save and restore a value. Instead, it was getting restored to the wrong value. This particular instance of the bug was outputting a fatal error message, so that the only harm is not giving the user the correct info, and creating unnecessary work for them and us when it gets reported. But this bug could manifest itself when trying to output just a warning that the program otherwise would carry on from. commit d38c72a57261a93e56e738d286c7739608801d41 Author: Karl Williamson <[email protected]> Date: Fri Jun 1 10:51:05 2018 -0600 perllocale: Use L</Foo Bar>, not L<Foo Bar> commit 2d26cf4aed90a77ac5e93ddec29770756027b788 Author: Karl Williamson <[email protected]> Date: Fri May 24 09:15:59 2019 -0600 PATCH: [perl #134134] read beyond end of buffer This turns out to be because of a special input case in myatof3(), wherein if the input length is 0, it call strlen to find the length. The solution is to add a test and not call the function unless the length is positive. ----------------------------------------------------------------------- Summary of changes: AUTHORS | 1 + asan_ignore | 12 ++++------ cygwin/cygwin.c | 4 ++-- embed.fnc | 1 + embed.h | 3 +++ ext/POSIX/POSIX.xs | 1 + ext/POSIX/lib/POSIX.pm | 2 +- hints/linux-android.sh | 14 +++++++----- inline.h | 34 +++++++++++++++++++++++++++ intrpvar.h | 2 +- locale.c | 18 ++++++++------- malloc.c | 1 + numeric.c | 4 +++- perl.h | 22 ++++++++++++------ pod/perldebguts.pod | 12 +++++----- pod/perllocale.pod | 2 +- pp.c | 59 +++++++++++++++++++++++++++-------------------- pp_hot.c | 10 ++------ proto.h | 7 ++++++ regcomp.c | 32 +++++++++++++++++--------- regcomp.sym | 14 ++++++------ regexec.c | 12 +++++----- regnodes.h | 62 +++++++++++++++++++++++++------------------------- t/loc_tools.pl | 2 +- t/re/reg_mesg.t | 1 + taint.c | 5 +++- toke.c | 10 ++++---- 27 files changed, 213 insertions(+), 134 deletions(-) diff --git a/AUTHORS b/AUTHORS index 7cda831b9b..143464288f 100644 --- a/AUTHORS +++ b/AUTHORS @@ -589,6 +589,7 @@ Jeremy D. Zawodny <[email protected]> Jeremy H. Brown <[email protected]> Jeremy Madea <[email protected]> Jerome Abela <[email protected]> +Jerome Duval <[email protected]> Jerrad Pierce <[email protected]> Jerry D. Hedden <[email protected]> Jess Robinson <[email protected]> diff --git a/asan_ignore b/asan_ignore index e0f5685bc1..8050f3c217 100644 --- a/asan_ignore +++ b/asan_ignore @@ -16,13 +16,11 @@ # suffix with =foo for a "tool-specific category", but neither =undefined # nor =signed-integer-overflow worked. -fun:Perl_pp_i_* - -# Perl's << is defined as using the underlying C's << operator, with the -# same undefined behaviour for shifts greater than the word size. -# (UVs normally, IVs with 'use integer') - -fun:Perl_pp_left_shift +fun:Perl_pp_i_add +fun:Perl_pp_i_divide +fun:Perl_pp_i_negate +fun:Perl_pp_i_multiply +fun:Perl_pp_i_subtract # this function numifies the field width in eg printf "%10f". # It has its own overflow detection, so don't warn about it diff --git a/cygwin/cygwin.c b/cygwin/cygwin.c index fae90af000..6b11efed7f 100644 --- a/cygwin/cygwin.c +++ b/cygwin/cygwin.c @@ -112,8 +112,8 @@ do_spawn (char *cmd) if (strBEGINs (cmd,"exec") && isSPACE (cmd[4])) goto doshell; for (s=cmd; *s && isALPHA (*s); s++) ; /* catch VAR=val gizmo */ - if (*s=='=') - goto doshell; + if (*s=='=') + goto doshell; for (s=cmd; *s; s++) if (strchr (metachars,*s)) diff --git a/embed.fnc b/embed.fnc index 45597f67b6..259affded0 100644 --- a/embed.fnc +++ b/embed.fnc @@ -2272,6 +2272,7 @@ sR |SV* |refto |NN SV* sv : Used in pp_hot.c pRxo |GV* |softref2xv |NN SV *const sv|NN const char *const what \ |const svtype type|NN SV ***spp +inR |bool |lossless_NV_to_IV|const NV nv|NN IV * ivp #endif #if defined(PERL_IN_PP_PACK_C) diff --git a/embed.h b/embed.h index 75c91f77f4..9178c51e92 100644 --- a/embed.h +++ b/embed.h @@ -1924,6 +1924,9 @@ #define do_delete_local() S_do_delete_local(aTHX) #define refto(a) S_refto(aTHX_ a) # endif +# if defined(PERL_IN_PP_C) || defined(PERL_IN_PP_HOT_C) +#define lossless_NV_to_IV S_lossless_NV_to_IV +# endif # if defined(PERL_IN_PP_CTL_C) #define check_type_and_open(a) S_check_type_and_open(aTHX_ a) #define destroy_matcher(a) S_destroy_matcher(aTHX_ a) diff --git a/ext/POSIX/POSIX.xs b/ext/POSIX/POSIX.xs index 5b9bbc429b..311ed73c8b 100644 --- a/ext/POSIX/POSIX.xs +++ b/ext/POSIX/POSIX.xs @@ -1086,6 +1086,7 @@ static NV my_rint(NV x) } #endif not_here("rint"); + NOT_REACHED; /* NOTREACHED */ } #endif diff --git a/ext/POSIX/lib/POSIX.pm b/ext/POSIX/lib/POSIX.pm index 4de039410f..dc3f438a19 100644 --- a/ext/POSIX/lib/POSIX.pm +++ b/ext/POSIX/lib/POSIX.pm @@ -4,7 +4,7 @@ use warnings; our ($AUTOLOAD, %SIGRT); -our $VERSION = '1.88'; +our $VERSION = '1.89'; require XSLoader; diff --git a/hints/linux-android.sh b/hints/linux-android.sh index 6a59cb726e..626c0b4e48 100644 --- a/hints/linux-android.sh +++ b/hints/linux-android.sh @@ -30,13 +30,15 @@ esac # Make sure that we look for libm libswanted="$libswanted m" -# Down with locales! +# Older Androids lack locale support and may need the following undefs +# uncommenting. This isn't necessary from at least Android 8.1 (Oreo) # https://github.com/android/platform_bionic/blob/master/libc/CAVEATS -d_locconv='undef' -d_setlocale='undef' -d_setlocale_r='undef' -d_lc_monetary_2008='undef' -i_locale='undef' +#d_locconv='undef' +#d_setlocale='undef' +#d_setlocale_r='undef' +#d_lc_monetary_2008='undef' +#i_locale='undef' +#d_newlocale='undef' # https://code.google.com/p/android-source-browsing/source/browse/libc/netbsd/net/getservent_r.c?repo=platform--bionic&r=ca6fe7bebe3cc6ed7e2db5a3ede2de0fcddf411d#95 d_getservent_r='undef' diff --git a/inline.h b/inline.h index 654f801b75..de1e33e8ce 100644 --- a/inline.h +++ b/inline.h @@ -1913,6 +1913,40 @@ S_should_warn_nl(const char *pv) { #endif +#if defined(PERL_IN_PP_C) || defined(PERL_IN_PP_HOT_C) + +PERL_STATIC_INLINE bool +S_lossless_NV_to_IV(const NV nv, IV *ivp) +{ + /* This function determines if the input NV 'nv' may be converted without + * loss of data to an IV. If not, it returns FALSE taking no other action. + * But if it is possible, it does the conversion, returning TRUE, and + * storing the converted result in '*ivp' */ + + PERL_ARGS_ASSERT_LOSSLESS_NV_TO_IV; + +# if defined(Perl_isnan) + + if (UNLIKELY(Perl_isnan(nv))) { + return FALSE; + } + +# endif + + if (UNLIKELY(nv < IV_MIN) || UNLIKELY(nv > IV_MAX)) { + return FALSE; + } + + if ((IV) nv != nv) { + return FALSE; + } + + *ivp = (IV) nv; + return TRUE; +} + +#endif + /* ------------------ pp.c, regcomp.c, toke.c, universal.c ------------ */ #define MAX_CHARSET_NAME_LENGTH 2 diff --git a/intrpvar.h b/intrpvar.h index e33036601b..41aa364329 100644 --- a/intrpvar.h +++ b/intrpvar.h @@ -75,7 +75,7 @@ PERLVAR(I, multideref_pc, UNOP_AUX_item *) PERLVAR(I, curpm, PMOP *) /* what to do \ interps in REs from */ PERLVAR(I, curpm_under, PMOP *) /* what to do \ interps in REs from */ -PERLVAR(I, tainting, bool) /* doing taint checks */ +PERLVAR(I, tainting, bool) /* ? doing taint checks */ PERLVARI(I, tainted, bool, FALSE) /* using variables controlled by $< */ /* PL_delaymagic is currently used for two purposes: to assure simultaneous diff --git a/locale.c b/locale.c index 87e11d609d..dffedad7ff 100644 --- a/locale.c +++ b/locale.c @@ -739,7 +739,7 @@ S_emulate_setlocale(const int category, # endif - } + } /* End of this being setlocale(LC_foo, NULL) */ /* Here, we are switching locales. */ @@ -855,7 +855,7 @@ S_emulate_setlocale(const int category, } } } - } + } /* End of this being setlocale(LC_foo, "") */ else if (strchr(locale, ';')) { /* LC_ALL may actually incude a conglomeration of various categories. @@ -952,7 +952,8 @@ S_emulate_setlocale(const int category, assert(category == LC_ALL); return do_setlocale_c(LC_ALL, NULL); - } + } /* End of this being setlocale(LC_ALL, + "LC_CTYPE=foo;LC_NUMERIC=bar;...") */ ready_to_set: ; @@ -1006,7 +1007,9 @@ S_emulate_setlocale(const int category, # endif - /* If we are switching to the LC_ALL C locale, it already exists. Use + /* If this call is to switch to the LC_ALL C locale, it already exists, and + * in fact, we already have switched to it (in preparation for what + * normally is to come). But since we're already there, continue to use * it instead of trying to create a new locale */ if (mask == LC_ALL_MASK && isNAME_C_OR_POSIX(locale)) { @@ -2224,7 +2227,7 @@ Perl_setlocale(const int category, const char * locale) { /* This wraps POSIX::setlocale() */ -#ifdef NO_LOCALE +#ifndef USE_LOCALE PERL_UNUSED_ARG(category); PERL_UNUSED_ARG(locale); @@ -5052,9 +5055,7 @@ Perl__is_cur_LC_category_utf8(pTHX_ int category) Copy(delimited, utf8ness_cache, input_name_len_with_overhead - 1, char); utf8ness_cache[input_name_len_with_overhead - 1] = is_utf8 + '0'; - if ((PL_locale_utf8ness[strlen(PL_locale_utf8ness)-1] - & (PERL_UINTMAX_T) ~1) != '0') - { + if ((PL_locale_utf8ness[strlen(PL_locale_utf8ness)-1] & ~1) != '0') { Perl_croak(aTHX_ "panic: %s: %d: Corrupt utf8ness_cache=%s\nlen=%zu," " inserted_name=%s, its_len=%zu\n", @@ -5084,6 +5085,7 @@ Perl__is_cur_LC_category_utf8(pTHX_ int category) s++; e = strchr(s, UTF8NESS_PREFIX[0]); if (! e) { + e = PL_locale_utf8ness + strlen(PL_locale_utf8ness); Perl_croak(aTHX_ "panic: %s: %d: Corrupt utf8ness_cache: missing" " separator %.*s<-- HERE %s\n", diff --git a/malloc.c b/malloc.c index ed392ee5ba..0c80a0856f 100644 --- a/malloc.c +++ b/malloc.c @@ -1239,6 +1239,7 @@ Perl_malloc(size_t nbytes) * malloc so that pointer subtraction in the same structure is always * well defined */ if (nbytes > PTRDIFF_MAX) { + dTHX; MYMALLOC_WRITE2STDERR("Memory requests are limited to PTRDIFF_MAX" " bytes to prevent possible undefined" " behavior"); diff --git a/numeric.c b/numeric.c index d4e3493784..659b69e70a 100644 --- a/numeric.c +++ b/numeric.c @@ -66,6 +66,8 @@ S_strtod(pTHX_ const char * const s, char ** e) result = strtod(s, e); +# else +# error No strtod() equivalent found # endif RESTORE_LC_NUMERIC(); @@ -95,7 +97,7 @@ The synonym Strod() may be used instead. */ NV -my_strtod(const char * const s, char **e) +Perl_my_strtod(const char * const s, char **e) { dTHX; diff --git a/perl.h b/perl.h index e5a55850f1..06f62b1a4f 100644 --- a/perl.h +++ b/perl.h @@ -623,16 +623,24 @@ # define TAINT_WARN_get 0 # define TAINT_WARN_set(s) NOOP #else + /* Set to tainted if we are running under tainting mode */ # define TAINT (PL_tainted = PL_tainting) -# define TAINT_NOT (PL_tainted = FALSE) -# define TAINT_IF(c) if (UNLIKELY(c)) { PL_tainted = PL_tainting; } + +# define TAINT_NOT (PL_tainted = FALSE) /* Untaint */ +# define TAINT_IF(c) if (UNLIKELY(c)) { TAINT; } /* Conditionally taint */ # define TAINT_ENV() if (UNLIKELY(PL_tainting)) { taint_env(); } -# define TAINT_PROPER(s) if (UNLIKELY(PL_tainting)) { taint_proper(NULL, s); } + /* croak or warn if tainting */ +# define TAINT_PROPER(s) if (UNLIKELY(PL_tainting)) { \ + taint_proper(NULL, s); \ + } # define TAINT_set(s) (PL_tainted = (s)) # define TAINT_get (PL_tainted) -# define TAINTING_get (PL_tainting) +# define TAINTING_get (PL_tainting) /* Is taint checking enabled? */ # define TAINTING_set(s) (PL_tainting = (s)) -# define TAINT_WARN_get (PL_taint_warn) +# define TAINT_WARN_get (PL_taint_warn) /* FALSE => tainting violations + are fatal + TRUE => they're just + warnings */ # define TAINT_WARN_set(s) (PL_taint_warn = (s)) #endif @@ -6248,7 +6256,7 @@ typedef struct am_table_short AMTS; else { \ PL_lc_numeric_mutex_depth++; \ DEBUG_Lv(PerlIO_printf(Perl_debug_log, \ - "%s: %d: avoided lc_numeric_lock; depth=%d\n", \ + "%s: %d: avoided lc_numeric_lock; new depth=%d\n", \ __FILE__, __LINE__, PL_lc_numeric_mutex_depth)); \ if (cond_to_panic_if_already_locked) { \ Perl_croak_nocontext("panic: %s: %d: Trying to change" \ @@ -6270,7 +6278,7 @@ typedef struct am_table_short AMTS; else { \ PL_lc_numeric_mutex_depth--; \ DEBUG_Lv(PerlIO_printf(Perl_debug_log, \ - "%s: %d: avoided lc_numeric_unlock; depth=%d\n", \ + "%s: %d: avoided lc_numeric_unlock; new depth=%d\n",\ __FILE__, __LINE__, PL_lc_numeric_mutex_depth)); \ } \ } STMT_END \ diff --git a/pod/perldebguts.pod b/pod/perldebguts.pod index de413fd813..797e2c69e0 100644 --- a/pod/perldebguts.pod +++ b/pod/perldebguts.pod @@ -732,14 +732,14 @@ will be lost. # Named references. Code in regcomp.c assumes that these all are after # the numbered references - NREF no-sv 1 Match some already matched string - NREFF no-sv 1 Match already matched string, using /di + REFN no-sv 1 Match some already matched string + REFFN no-sv 1 Match already matched string, using /di rules. - NREFFL no-sv 1 Match already matched string, using /li + REFFLN no-sv 1 Match already matched string, using /li rules. - NREFFU num 1 Match already matched string, using /ui + REFFUN num 1 Match already matched string, using /ui rules. - NREFFA num 1 Match already matched string, using /aai + REFFAN num 1 Match already matched string, using /aai rules. # Support for long RE @@ -791,7 +791,7 @@ will be lost. GOSUB num/ofs 2L recurse to paren arg1 at (signed) ofs arg2 # Special conditionals - NGROUPP no-sv 1 Whether the group matched. + GROUPPN no-sv 1 Whether the group matched. INSUBP num 1 Whether we are in a specific recurse. DEFINEP none 1 Never execute directly. diff --git a/pod/perllocale.pod b/pod/perllocale.pod index df2db5c474..15498d9d5e 100644 --- a/pod/perllocale.pod +++ b/pod/perllocale.pod @@ -313,7 +313,7 @@ perform a char-by-char comparison, and only returns I<0> (equal) if the operands are char-for-char identical. If you really want to know whether two strings--which C<eq> and C<cmp> may consider different--are equal as far as collation in the locale is concerned, see the discussion in -L<Category C<LC_COLLATE>: Collation>. +L</Category C<LC_COLLATE>: Collation>. =item * diff --git a/pp.c b/pp.c index babf34843e..62a548bc0a 100644 --- a/pp.c +++ b/pp.c @@ -1268,16 +1268,10 @@ PP(pp_multiply) NV nr = SvNVX(svr); NV result; - if ( -#if defined(NAN_COMPARE_BROKEN) && defined(Perl_isnan) - !Perl_isnan(nl) && nl == (NV)(il = (IV)nl) - && !Perl_isnan(nr) && nr == (NV)(ir = (IV)nr) -#else - nl == (NV)(il = (IV)nl) && nr == (NV)(ir = (IV)nr) -#endif - ) + if (lossless_NV_to_IV(nl, &il) && lossless_NV_to_IV(nr, &ir)) { /* nothing was lost by converting to IVs */ goto do_iv; + } SP--; result = nl * nr; # if defined(__sgi) && defined(USE_LONG_DOUBLE) && LONG_DOUBLEKIND == LONG_DOUBLE_IS_DOUBLEDOUBLE_128_BIT_BE_BE && NVSIZE == 16 @@ -1849,16 +1843,10 @@ PP(pp_subtract) NV nl = SvNVX(svl); NV nr = SvNVX(svr); - if ( -#if defined(NAN_COMPARE_BROKEN) && defined(Perl_isnan) - !Perl_isnan(nl) && nl == (NV)(il = (IV)nl) - && !Perl_isnan(nr) && nr == (NV)(ir = (IV)nr) -#else - nl == (NV)(il = (IV)nl) && nr == (NV)(ir = (IV)nr) -#endif - ) + if (lossless_NV_to_IV(nl, &il) && lossless_NV_to_IV(nr, &ir)) { /* nothing was lost by converting to IVs */ goto do_iv; + } SP--; TARGn(nl - nr, 0); /* args not GMG, so can't be tainted */ SETs(TARG); @@ -1991,7 +1979,7 @@ static UV S_uv_shift(UV uv, int shift, bool left) shift = -shift; left = !left; } - if (shift >= IV_BITS) { + if (UNLIKELY(shift >= IV_BITS)) { return 0; } return left ? uv << shift : uv >> shift; @@ -1999,14 +1987,34 @@ static UV S_uv_shift(UV uv, int shift, bool left) static IV S_iv_shift(IV iv, int shift, bool left) { - if (shift < 0) { - shift = -shift; - left = !left; - } - if (shift >= IV_BITS) { - return iv < 0 && !left ? -1 : 0; - } - return left ? iv << shift : iv >> shift; + if (shift < 0) { + shift = -shift; + left = !left; + } + + if (UNLIKELY(shift >= IV_BITS)) { + return iv < 0 && !left ? -1 : 0; + } + + /* For left shifts, perl 5 has chosen to treat the value as unsigned for + * the * purposes of shifting, then cast back to signed. This is very + * different from perl 6: + * + * $ perl6 -e 'say -2 +< 5' + * -64 + * + * $ ./perl -le 'print -2 << 5' + * 18446744073709551552 + * */ + if (left) { + if (iv == IV_MIN) { /* Casting this to a UV is undefined behavior */ + return 0; + } + return (IV) (((UV) iv) << shift); + } + + /* Here is right shift */ + return iv >> shift; } #define UV_LEFT_SHIFT(uv, shift) S_uv_shift(uv, shift, TRUE) @@ -3740,6 +3748,7 @@ PP(pp_ucfirst) else if (DO_UTF8(source)) { /* Is the source utf8? */ doing_utf8 = TRUE; ulen = UTF8SKIP(s); + if (op_type == OP_UCFIRST) { #ifdef USE_LOCALE_CTYPE _toTITLE_utf8_flags(s, s +slen, tmpbuf, &tculen, IN_LC_RUNTIME(LC_CTYPE)); diff --git a/pp_hot.c b/pp_hot.c index 7d5ffc02fd..2df5df8303 100644 --- a/pp_hot.c +++ b/pp_hot.c @@ -1435,16 +1435,10 @@ PP(pp_add) NV nl = SvNVX(svl); NV nr = SvNVX(svr); - if ( -#if defined(NAN_COMPARE_BROKEN) && defined(Perl_isnan) - !Perl_isnan(nl) && nl == (NV)(il = (IV)nl) - && !Perl_isnan(nr) && nr == (NV)(ir = (IV)nr) -#else - nl == (NV)(il = (IV)nl) && nr == (NV)(ir = (IV)nr) -#endif - ) + if (lossless_NV_to_IV(nl, &il) && lossless_NV_to_IV(nr, &ir)) { /* nothing was lost by converting to IVs */ goto do_iv; + } SP--; TARGn(nl + nr, 0); /* args not GMG, so can't be tainted */ SETs(TARG); diff --git a/proto.h b/proto.h index 0f8feed187..74a8e46ab7 100644 --- a/proto.h +++ b/proto.h @@ -5224,6 +5224,13 @@ STATIC SV* S_refto(pTHX_ SV* sv) #endif #if defined(PERL_IN_PP_C) || defined(PERL_IN_PP_HOT_C) +#ifndef PERL_NO_INLINE_FUNCTIONS +PERL_STATIC_INLINE bool S_lossless_NV_to_IV(const NV nv, IV * ivp) + __attribute__warn_unused_result__; +#define PERL_ARGS_ASSERT_LOSSLESS_NV_TO_IV \ + assert(ivp) +#endif + PERL_CALLCONV GV* Perl_softref2xv(pTHX_ SV *const sv, const char *const what, const svtype type, SV ***spp) __attribute__warn_unused_result__; #define PERL_ARGS_ASSERT_SOFTREF2XV \ diff --git a/regcomp.c b/regcomp.c index 9bd6dd3739..70fd0bebf5 100644 --- a/regcomp.c +++ b/regcomp.c @@ -131,6 +131,8 @@ struct RExC_state_t { char *parse; /* Input-scan pointer. */ char *copy_start; /* start of copy of input within constructed parse string */ + char *save_copy_start; /* Provides one level of saving + and restoring 'copy_start' */ char *copy_start_in_input; /* Position in input string corresponding to copy_start */ SSize_t whilem_seen; /* number of WHILEM in this expr */ @@ -229,6 +231,7 @@ struct RExC_state_t { #define RExC_precomp (pRExC_state->precomp) #define RExC_copy_start_in_input (pRExC_state->copy_start_in_input) #define RExC_copy_start_in_constructed (pRExC_state->copy_start) +#define RExC_save_copy_start_in_constructed (pRExC_state->save_copy_start) #define RExC_precomp_end (pRExC_state->precomp_end) #define RExC_rx_sv (pRExC_state->rx_sv) #define RExC_rx (pRExC_state->rx) @@ -821,8 +824,13 @@ static const scan_data_t zero_scan_data = { } STMT_END /* Setting this to NULL is a signal to not output warnings */ -#define TURN_OFF_WARNINGS_IN_SUBSTITUTE_PARSE RExC_copy_start_in_constructed = NULL -#define RESTORE_WARNINGS RExC_copy_start_in_constructed = RExC_precomp +#define TURN_OFF_WARNINGS_IN_SUBSTITUTE_PARSE \ + STMT_START { \ + RExC_save_copy_start_in_constructed = RExC_copy_start_in_constructed;\ + RExC_copy_start_in_constructed = NULL; \ + } STMT_END +#define RESTORE_WARNINGS \ + RExC_copy_start_in_constructed = RExC_save_copy_start_in_constructed /* Since a warning can be generated multiple times as the input is reparsed, we * output it the first time we come to that point in the parse, but suppress it @@ -10992,14 +11000,14 @@ S_handle_named_backref(pTHX_ RExC_state_t *pRExC_state, RExC_sawback = 1; ret = reganode(pRExC_state, ((! FOLD) - ? NREF + ? REFN : (ASCII_FOLD_RESTRICTED) - ? NREFFA + ? REFFAN : (AT_LEAST_UNI_SEMANTICS) - ? NREFFU + ? REFFUN : (LOC) - ? NREFFL - : NREFF), + ? REFFLN + : REFFN), num); *flagp |= HASWIDTH; @@ -11840,7 +11848,7 @@ S_reg(pTHX_ RExC_state_t *pRExC_state, I32 paren, I32 *flagp, U32 depth) RExC_rxi->data->data[num]=(void*)sv_dat; SvREFCNT_inc_simple_void_NN(sv_dat); } - ret = reganode(pRExC_state, NGROUPP, num); + ret = reganode(pRExC_state, GROUPPN, num); goto insert_if_check_paren; } else if (memBEGINs(RExC_parse, @@ -20286,7 +20294,7 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const regnode *o, const regmatch_ name_list= RExC_paren_name_list; } if (name_list) { - if ( k != REF || (OP(o) < NREF)) { + if ( k != REF || (OP(o) < REFN)) { SV **name= av_fetch(name_list, parno, 0 ); if (name) Perl_sv_catpvf(aTHX_ sv, " '%" SVf "'", SVfARG(*name)); @@ -23428,10 +23436,12 @@ Perl_parse_uniprop_string(pTHX_ * NV. */ NV value; + SSize_t value_len = lookup_len - equals_pos; /* Get the value */ - if (my_atof3(lookup_name + equals_pos, &value, - lookup_len - equals_pos) + if ( value_len <= 0 + || my_atof3(lookup_name + equals_pos, &value, + value_len) != lookup_name + lookup_len) { goto failed; diff --git a/regcomp.sym b/regcomp.sym index a35beca063..51f3176bcf 100644 --- a/regcomp.sym +++ b/regcomp.sym @@ -118,7 +118,7 @@ EXACTFAA_NO_TRIE EXACT, str ; Match this string using /iaa rules (w/len) (str EXACT_ONLY8 EXACT, str ; Like EXACT, but only UTF-8 encoded targets can match EXACTFU_ONLY8 EXACT, str ; Like EXACTFU, but only UTF-8 encoded targets can match -# One could add EXACTFAA8 and and something that has the same effect for /l, +# One could add EXACTFAA8 and something that has the same effect for /l, # but these would be extremely uncommon EXACTFU_S_EDGE EXACT, str ; /di rules, but nothing in it precludes /ui, except begins and/or ends with [Ss]; (string not UTF-8; compile-time only). @@ -166,11 +166,11 @@ REFFA REF, num 1 V ; Match already matched string, using /aai rul #*Named references. Code in regcomp.c assumes that these all are after #*the numbered references -NREF REF, no-sv 1 V ; Match some already matched string -NREFF REF, no-sv 1 V ; Match already matched string, using /di rules. -NREFFL REF, no-sv 1 V ; Match already matched string, using /li rules. -NREFFU REF, num 1 V ; Match already matched string, using /ui rules. -NREFFA REF, num 1 V ; Match already matched string, using /aai rules. +REFN REF, no-sv 1 V ; Match some already matched string +REFFN REF, no-sv 1 V ; Match already matched string, using /di rules. +REFFLN REF, no-sv 1 V ; Match already matched string, using /li rules. +REFFUN REF, num 1 V ; Match already matched string, using /ui rules. +REFFAN REF, num 1 V ; Match already matched string, using /aai rules. #*Support for long RE LONGJMP LONGJMP, off 1 . 1 ; Jump far away. @@ -214,7 +214,7 @@ AHOCORASICKC TRIE,trie charclass ; Same as AHOCORASICK, but with embedded c GOSUB GOSUB, num/ofs 2L ; recurse to paren arg1 at (signed) ofs arg2 #*Special conditionals -NGROUPP NGROUPP, no-sv 1 ; Whether the group matched. +GROUPPN GROUPPN, no-sv 1 ; Whether the group matched. INSUBP INSUBP, num 1 ; Whether we are in a specific recurse. DEFINEP DEFINEP, none 1 ; Never execute directly. diff --git a/regexec.c b/regexec.c index f8fa85010f..02c2d623c1 100644 --- a/regexec.c +++ b/regexec.c @@ -7034,7 +7034,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) } break; - case NREFFL: /* /\g{name}/il */ + case REFFLN: /* /\g{name}/il */ { /* The capture buffer cases. The ones beginning with N for the named buffers just convert to the equivalent numbered and pretend they were called as the corresponding numbered buffer @@ -7054,28 +7054,28 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) utf8_fold_flags = FOLDEQ_LOCALE; goto do_nref; - case NREFFA: /* /\g{name}/iaa */ + case REFFAN: /* /\g{name}/iaa */ folder = foldEQ_latin1; fold_array = PL_fold_latin1; type = REFFA; utf8_fold_flags = FOLDEQ_UTF8_NOMIX_ASCII; goto do_nref; - case NREFFU: /* /\g{name}/iu */ + case REFFUN: /* /\g{name}/iu */ folder = foldEQ_latin1; fold_array = PL_fold_latin1; type = REFFU; utf8_fold_flags = 0; goto do_nref; - case NREFF: /* /\g{name}/i */ + case REFFN: /* /\g{name}/i */ folder = foldEQ; fold_array = PL_fold; type = REFF; utf8_fold_flags = 0; goto do_nref; - case NREF: /* /\g{name}/ */ + case REFN: /* /\g{name}/ */ type = REF; folder = NULL; fold_array = NULL; @@ -7729,7 +7729,7 @@ S_regmatch(pTHX_ regmatch_info *reginfo, char *startpos, regnode *prog) sw = cBOOL(rex->lastparen >= n && rex->offs[n].end != -1); break; - case NGROUPP: /* (?(<name>)) */ + case GROUPPN: /* (?(<name>)) */ /* reg_check_named_buff_matched returns 0 for no match */ sw = cBOOL(0 < reg_check_named_buff_matched(rex,scan)); break; diff --git a/regnodes.h b/regnodes.h index ba691a2c18..dd23029a2c 100644 --- a/regnodes.h +++ b/regnodes.h @@ -76,11 +76,11 @@ #define REFFL 62 /* 0x3e Match already matched string, using /li rules. */ #define REFFU 63 /* 0x3f Match already matched string, usng /ui. */ #define REFFA 64 /* 0x40 Match already matched string, using /aai rules. */ -#define NREF 65 /* 0x41 Match some already matched string */ -#define NREFF 66 /* 0x42 Match already matched string, using /di rules. */ -#define NREFFL 67 /* 0x43 Match already matched string, using /li rules. */ -#define NREFFU 68 /* 0x44 Match already matched string, using /ui rules. */ -#define NREFFA 69 /* 0x45 Match already matched string, using /aai rules. */ +#define REFN 65 /* 0x41 Match some already matched string */ +#define REFFN 66 /* 0x42 Match already matched string, using /di rules. */ +#define REFFLN 67 /* 0x43 Match already matched string, using /li rules. */ +#define REFFUN 68 /* 0x44 Match already matched string, using /ui rules. */ +#define REFFAN 69 /* 0x45 Match already matched string, using /aai rules. */ #define LONGJMP 70 /* 0x46 Jump far away. */ #define BRANCHJ 71 /* 0x47 BRANCH with long offset. */ #define IFMATCH 72 /* 0x48 Succeeds if the following matches; non-zero flags "f", next_off "o" means lookbehind assertion starting "f..(f-o)" characters before current */ @@ -97,7 +97,7 @@ #define AHOCORASICK 83 /* 0x53 Aho Corasick stclass. flags==type */ #define AHOCORASICKC 84 /* 0x54 Same as AHOCORASICK, but with embedded charclass data */ #define GOSUB 85 /* 0x55 recurse to paren arg1 at (signed) ofs arg2 */ -#define NGROUPP 86 /* 0x56 Whether the group matched. */ +#define GROUPPN 86 /* 0x56 Whether the group matched. */ #define INSUBP 87 /* 0x57 Whether we are in a specific recurse. */ #define DEFINEP 88 /* 0x58 Never execute directly. */ #define ENDLIKE 89 /* 0x59 Used only for the type field of verbs */ @@ -226,11 +226,11 @@ EXTCONST U8 PL_regkind[] = { REF, /* REFFL */ REF, /* REFFU */ REF, /* REFFA */ - REF, /* NREF */ - REF, /* NREFF */ - REF, /* NREFFL */ - REF, /* NREFFU */ - REF, /* NREFFA */ + REF, /* REFN */ + REF, /* REFFN */ + REF, /* REFFLN */ + REF, /* REFFUN */ + REF, /* REFFAN */ LONGJMP, /* LONGJMP */ BRANCHJ, /* BRANCHJ */ BRANCHJ, /* IFMATCH */ @@ -247,7 +247,7 @@ EXTCONST U8 PL_regkind[] = { TRIE, /* AHOCORASICK */ TRIE, /* AHOCORASICKC */ GOSUB, /* GOSUB */ - NGROUPP, /* NGROUPP */ + GROUPPN, /* GROUPPN */ INSUBP, /* INSUBP */ DEFINEP, /* DEFINEP */ ENDLIKE, /* ENDLIKE */ @@ -377,11 +377,11 @@ static const U8 regarglen[] = { EXTRA_SIZE(struct regnode_1), /* REFFL */ EXTRA_SIZE(struct regnode_1), /* REFFU */ EXTRA_SIZE(struct regnode_1), /* REFFA */ - EXTRA_SIZE(struct regnode_1), /* NREF */ - EXTRA_SIZE(struct regnode_1), /* NREFF */ - EXTRA_SIZE(struct regnode_1), /* NREFFL */ - EXTRA_SIZE(struct regnode_1), /* NREFFU */ - EXTRA_SIZE(struct regnode_1), /* NREFFA */ + EXTRA_SIZE(struct regnode_1), /* REFN */ + EXTRA_SIZE(struct regnode_1), /* REFFN */ + EXTRA_SIZE(struct regnode_1), /* REFFLN */ + EXTRA_SIZE(struct regnode_1), /* REFFUN */ + EXTRA_SIZE(struct regnode_1), /* REFFAN */ EXTRA_SIZE(struct regnode_1), /* LONGJMP */ EXTRA_SIZE(struct regnode_1), /* BRANCHJ */ EXTRA_SIZE(struct regnode_1), /* IFMATCH */ @@ -398,7 +398,7 @@ static const U8 regarglen[] = { EXTRA_SIZE(struct regnode_1), /* AHOCORASICK */ EXTRA_SIZE(struct regnode_charclass), /* AHOCORASICKC */ EXTRA_SIZE(struct regnode_2L), /* GOSUB */ - EXTRA_SIZE(struct regnode_1), /* NGROUPP */ + EXTRA_SIZE(struct regnode_1), /* GROUPPN */ EXTRA_SIZE(struct regnode_1), /* INSUBP */ EXTRA_SIZE(struct regnode_1), /* DEFINEP */ 0, /* ENDLIKE */ @@ -484,11 +484,11 @@ static const char reg_off_by_arg[] = { 0, /* REFFL */ 0, /* REFFU */ 0, /* REFFA */ - 0, /* NREF */ - 0, /* NREFF */ - 0, /* NREFFL */ - 0, /* NREFFU */ - 0, /* NREFFA */ + 0, /* REFN */ + 0, /* REFFN */ + 0, /* REFFLN */ + 0, /* REFFUN */ + 0, /* REFFAN */ 1, /* LONGJMP */ 1, /* BRANCHJ */ 1, /* IFMATCH */ @@ -505,7 +505,7 @@ static const char reg_off_by_arg[] = { 0, /* AHOCORASICK */ 0, /* AHOCORASICKC */ 0, /* GOSUB */ - 0, /* NGROUPP */ + 0, /* GROUPPN */ 0, /* INSUBP */ 0, /* DEFINEP */ 0, /* ENDLIKE */ @@ -597,11 +597,11 @@ EXTCONST char * const PL_reg_name[] = { "REFFL", /* 0x3e */ "REFFU", /* 0x3f */ "REFFA", /* 0x40 */ - "NREF", /* 0x41 */ - "NREFF", /* 0x42 */ - "NREFFL", /* 0x43 */ - "NREFFU", /* 0x44 */ - "NREFFA", /* 0x45 */ + "REFN", /* 0x41 */ + "REFFN", /* 0x42 */ + "REFFLN", /* 0x43 */ + "REFFUN", /* 0x44 */ + "REFFAN", /* 0x45 */ "LONGJMP", /* 0x46 */ "BRANCHJ", /* 0x47 */ "IFMATCH", /* 0x48 */ @@ -618,7 +618,7 @@ EXTCONST char * const PL_reg_name[] = { "AHOCORASICK", /* 0x53 */ "AHOCORASICKC", /* 0x54 */ "GOSUB", /* 0x55 */ - "NGROUPP", /* 0x56 */ + "GROUPPN", /* 0x56 */ "INSUBP", /* 0x57 */ "DEFINEP", /* 0x58 */ "ENDLIKE", /* 0x59 */ @@ -758,7 +758,7 @@ EXTCONST U8 PL_varies[] __attribute__deprecated__; #else EXTCONST U8 PL_varies[] __attribute__deprecated__ = { CLUMP, BRANCH, STAR, PLUS, CURLY, CURLYN, CURLYM, CURLYX, WHILEM, REF, - REFF, REFFL, REFFU, REFFA, NREF, NREFF, NREFFL, NREFFU, NREFFA, + REFF, REFFL, REFFU, REFFA, REFN, REFFN, REFFLN, REFFUN, REFFAN, BRANCHJ, SUSPEND, IFTHEN, 0 }; diff --git a/t/loc_tools.pl b/t/loc_tools.pl index c76e29388a..beebd98d88 100644 --- a/t/loc_tools.pl +++ b/t/loc_tools.pl @@ -421,7 +421,7 @@ sub find_locales ($;$) { } # The rest of the locales are in this file. - push @Data, <DATA>; + push @Data, <DATA>; close DATA; foreach my $line (@Data) { my ($locale_name, $language_codes, $country_codes, $encodings) = diff --git a/t/re/reg_mesg.t b/t/re/reg_mesg.t index c5c79f0323..d10fa2c09a 100644 --- a/t/re/reg_mesg.t +++ b/t/re/reg_mesg.t @@ -318,6 +318,7 @@ my @death = '/\p{Is_Other_Alphabetic=F}/ ' => 'Can\'t find Unicode property definition "Is_Other_Alphabetic=F" {#} m/\p{Is_Other_Alphabetic=F}{#}/', '/\x{100}(?(/' => 'Unknown switch condition (?(...)) {#} m/\\x{100}(?({#}/', # [perl #133896] '/(?[\N{KEYCAP DIGIT NINE}/' => '\N{} in inverted character class or as a range end-point is restricted to one character {#} m/(?[\\N{U+39.FE0F.20E3{#}}/', # [perl #133988] + '/0000000000000000[\N{U+0.00}0000/' => 'Unmatched [ {#} m/0000000000000000[{#}\N{U+0.00}0000/', # [perl #134059] ); # These are messages that are death under 'use re "strict"', and may or may diff --git a/taint.c b/taint.c index 72216f85de..871d89f48b 100644 --- a/taint.c +++ b/taint.c @@ -26,6 +26,9 @@ void Perl_taint_proper(pTHX_ const char *f, const char *const s) { + /* Output a tainting violation, croaking unless we're just to warn. + * '_proper' is just to throw you off the scent */ + #if defined(HAS_SETEUID) && defined(DEBUGGING) PERL_ARGS_ASSERT_TAINT_PROPER; @@ -60,7 +63,7 @@ Perl_taint_proper(pTHX_ const char *f, const char *const s) ug = " while running with -T switch"; /* XXX because taint_proper adds extra format args, we can't - * get the caller to check properly; o we just silence the warning + * get the caller to check properly; so we just silence the warning * and hope the callers aren't naughty */ GCC_DIAG_IGNORE_STMT(-Wformat-nonliteral); if (PL_unsafe || TAINT_WARN_get) { diff --git a/toke.c b/toke.c index 68eea0cae6..03c4f2ba26 100644 --- a/toke.c +++ b/toke.c @@ -4097,10 +4097,12 @@ S_scan_const(pTHX_ char *start) goto default_action; /* Redo, having upgraded so both are UTF-8 */ } else { /* UTF8ness matters: convert this non-UTF8 source char to - UTF-8 for output. It will occupy 2 bytes */ - if (d + 2 >= SvEND(sv)) { - const STRLEN extra = 2 + (send - s - 1) + 1; - const STRLEN off = d - SvPVX_const(sv); + UTF-8 for output. It will occupy 2 bytes, but don't include + the input byte since we haven't incremented 's' yet. See + Note on sizing above. */ + const STRLEN off = d - SvPVX(sv); + const STRLEN extra = 2 + (send - s - 1) + 1; + if (off + extra > SvLEN(sv)) { d = off + SvGROW(sv, off + extra); } *d++ = UTF8_EIGHT_BIT_HI(*s); -- Perl5 Master Repository
