[perl.git] branch blead, updated. v5.19.2-246-g36b347b

Karl Williamson Tue, 30 Jul 2013 11:08:01 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/36b347b9c8caf3680cadf3c3cb98018a41a92507?hp=c23a69e83f410b43450b9f1e9696d04795aae758>


- Log -----------------------------------------------------------------
commit 36b347b9c8caf3680cadf3c3cb98018a41a92507
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 30 11:27:20 2013 -0600

    regcomp.c: Remove extraneous debug info
    
    Prior to this commit the prhase {unicode} was emitted to mark what a
    bracketed character class matched that wasn't in that classes bitmap.
    This was oftern accompanied by another phrase that gave further
    details.  Since everything is {unicode}, the first phrase isn't very
    helpful.  Now it is changed to {utf8} for those things that won't match
    unless the target string is in utf8 (this includes some upper latin1
    code points under /d matches), or {outside bitmap} for where utf8 isn't
    necessasily required (this happens for user-defined Unicode properties
    that aren't known at compile time).

M       regcomp.c

commit b60fb2ed2680b5d8e070d120472a5888810bbd95
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 30 11:21:11 2013 -0600

    regcomp.c: White-space only
    
    Outdent code which the previous commit removed from a block.

M       regcomp.c

commit 89d3fa0ee43d5c7489581a62b3d662c316bfcb43
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 30 11:16:56 2013 -0600

    regcomp.c: Remove redundant code
    
    This code is redundant, attempting to output what isn't returned in 'lv'
    by reglcass_swash(), but that function makes sure that 'lv' contains
    everything it should, so we ended up processing (and outputting) the
    same data twice.

M       regcomp.c

commit a498e2f8b43c486d265d1afa2d1ca923f13325c2
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 30 10:51:35 2013 -0600

    regexec.c: Add, clarify comments

M       regexec.c

commit 186551534c7256e97240abd050bead0f17f28fdc
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 27 19:10:27 2013 -0600

    regcomp.c: Change Debug output of char classes
    
    This commit causes the debug output that was formerly "\x4ff", for
    example to be \x{4f}f.  It always puts braces around the hex to separate
    it from other characters.

M       regcomp.c

commit f202c207b253f1f19a37243471f35742d6bb309c
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 27 18:45:18 2013 -0600

    regcomp.c: Debug output clearer ranges
    
    It's not immediately obvious what the character class [!-~] matches.
    Better is its equivalent: [\x21-\x7e].  This commit changes the debug
    output to be the latter for character class matches, while retaining the
    current behavior where it is clear what the range matches, in, e.g.,
    [J-R].  Ranges like [A-z] include more than just alphabetics, so they
    are now output as [\x41-\x7a].  (Debug output is done, for example, when
    the command line option -Dr is specified.)

M       regcomp.c

commit 9a1ec8a9cfbf5741c8f41cdf8d1f5fe0c3600696
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 27 18:21:40 2013 -0600

    regcomp.c: White-space only
    
    This indents properly to correspond to a newly formed block

M       regcomp.c

commit e89035d52414c485845a8b299343063fa14a2253
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 27 18:19:29 2013 -0600

    regcomp.c: Change debug output to use \t, etc instead of hex
    
    It is easier to read the standard abbreviations \t, \n, etc than the hex
    equivalents, \x09, ...

M       regcomp.c

commit 3ae1b3845fd924a5615289bd6a44ed109508f93f
Author: Karl Williamson <[email protected]>
Date:   Sat Jul 27 18:14:12 2013 -0600

    regcomp.c: Extract duplicated code into single fcn
    
    This code that appears twice is nearly duplicate.

M       embed.fnc
M       embed.h
M       proto.h
M       regcomp.c

commit 5b34bd203cb394f6ec83a0cb41cf11c7b3937f77
Author: Karl Williamson <[email protected]>
Date:   Tue Jul 30 11:39:02 2013 -0600

    Regen t/porting/known_pod_issues.dat.
    
    This silences a warning that shows up under pedantic mode

M       t/porting/known_pod_issues.dat

commit 1fa1147c2c277eb9a22c643f21b08da9bc15510e
Author: Karl Williamson <[email protected]>
Date:   Fri Jul 26 14:26:27 2013 -0600

    regcomp.c: Fix potential scalar leak
    
    The lines in this code were reversed.  We need to check something before
    overwriting it, rather than the other way around.  The result would be
    that under certain circumstances a SV would not get freed.  Those
    circumstances are very limited: the first of the three parameters to
    this function is not empty, but the 2nd is, and the output (3rd
    parameter) is to overwrite the 2nd.  I found this bug by code reading; I
    have searched the code space and there are no current calls to it that
    have this parameter configuration, therefore there is no test that can
    be added to trigger it.

M       regcomp.c

commit 15c0b4aa89b8fe74b8f5e4712967bbb49292b95e
Author: Karl Williamson <[email protected]>
Date:   Wed Jul 24 20:02:40 2013 -0600

    regcomp.c: Change #ifdef
    
    This function is currently #ifdef'd out.  Change it so that enabling it
    in embed.fnc automatically enables it here as well, making a 2 step
    process into just a single step.

M       regcomp.c
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc                      |   1 +
 embed.h                        |   1 +
 proto.h                        |   6 +
 regcomp.c                      | 242 +++++++++++++++++++++--------------------
 regexec.c                      |  21 ++--
 t/porting/known_pod_issues.dat |   3 +-
 6 files changed, 145 insertions(+), 129 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index a85b8a6..e4cb24d 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -2063,6 +2063,7 @@ Es        |const regnode*|dumpuntil|NN const regexp *r|NN 
const regnode *start \
                                |NULLOK const regnode *plast \
                                |NN SV* sv|I32 indent|U32 depth
 Es     |void   |put_byte       |NN SV* sv|int c
+Es     |bool   |put_latin1_charclass_innards|NN SV* sv|NN char* bitmap
 Es     |void   |dump_trie      |NN const struct _reg_trie_data *trie\
                                |NULLOK HV* widecharmap|NN AV *revcharmap\
                                |U32 depth
diff --git a/embed.h b/embed.h
index d755269..94f4c15 100644
--- a/embed.h
+++ b/embed.h
@@ -882,6 +882,7 @@
 #define dump_trie_interim_table(a,b,c,d,e)     S_dump_trie_interim_table(aTHX_ 
a,b,c,d,e)
 #define dumpuntil(a,b,c,d,e,f,g,h)     S_dumpuntil(aTHX_ a,b,c,d,e,f,g,h)
 #define put_byte(a,b)          S_put_byte(aTHX_ a,b)
+#define put_latin1_charclass_innards(a,b)      
S_put_latin1_charclass_innards(aTHX_ a,b)
 #define regdump_extflags(a,b)  S_regdump_extflags(aTHX_ a,b)
 #define regdump_intflags(a,b)  S_regdump_intflags(aTHX_ a,b)
 #define regtail_study(a,b,c,d) S_regtail_study(aTHX_ a,b,c,d)
diff --git a/proto.h b/proto.h
index 15ec073..8599884 100644
--- a/proto.h
+++ b/proto.h
@@ -5217,6 +5217,12 @@ STATIC void      S_put_byte(pTHX_ SV* sv, int c)
 #define PERL_ARGS_ASSERT_PUT_BYTE      \
        assert(sv)
 
+STATIC bool    S_put_latin1_charclass_innards(pTHX_ SV* sv, char* bitmap)
+                       __attribute__nonnull__(pTHX_1)
+                       __attribute__nonnull__(pTHX_2);
+#define PERL_ARGS_ASSERT_PUT_LATIN1_CHARCLASS_INNARDS  \
+       assert(sv); assert(bitmap)
+
 STATIC void    S_regdump_extflags(pTHX_ const char *lead, const U32 flags);
 STATIC void    S_regdump_intflags(pTHX_ const char *lead, const U32 flags);
 STATIC U8      S_regtail_study(pTHX_ struct RExC_state_t *pRExC_state, regnode 
*p, const regnode *val, U32 depth)
diff --git a/regcomp.c b/regcomp.c
index 0af3483..8c7a6f8 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -7844,11 +7844,11 @@ Perl__invlist_intersection_maybe_complement_2nd(pTHX_ 
SV* const a, SV* const b,
              * must be every possible code point.  Thus the intersection is
              * simply 'a'. */
             if (*i != a) {
-                *i = invlist_clone(a);
-
                 if (*i == b) {
                     SvREFCNT_dec_NN(b);
                 }
+
+                *i = invlist_clone(a);
             }
             /* else *i is already 'a' */
             return;
@@ -8312,7 +8312,7 @@ Perl__invlist_dump(pTHX_ SV* const invlist, const char * 
const header)
 }
 #endif
 
-#if 0
+#ifdef PERL_ARGS_ASSERT__INVLISTEQ
 bool
 S__invlistEQ(pTHX_ SV* const a, SV* const b, const bool complement_b)
 {
@@ -14685,26 +14685,10 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const 
regnode *o)
             )
         );
         if ( IS_ANYOF_TRIE(op) || trie->bitmap ) {
-            int i;
-            int rangestart = -1;
-            U8* bitmap = IS_ANYOF_TRIE(op) ? (U8*)ANYOF_BITMAP(o) : 
(U8*)TRIE_BITMAP(trie);
             sv_catpvs(sv, "[");
-            for (i = 0; i <= 256; i++) {
-                if (i < 256 && BITMAP_TEST(bitmap,i)) {
-                    if (rangestart == -1)
-                        rangestart = i;
-                } else if (rangestart != -1) {
-                    if (i <= rangestart + 3)
-                        for (; rangestart < i; rangestart++)
-                            put_byte(sv, rangestart);
-                    else {
-                        put_byte(sv, rangestart);
-                        sv_catpvs(sv, "-");
-                        put_byte(sv, i - 1);
-                    }
-                    rangestart = -1;
-                }
-            }
+            (void) put_latin1_charclass_innards(sv, IS_ANYOF_TRIE(op)
+                                                   ? ANYOF_BITMAP(o)
+                                                   : TRIE_BITMAP(trie));
             sv_catpvs(sv, "]");
         } 
         
@@ -14748,7 +14732,6 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const 
regnode *o)
     } else if (k == LOGICAL)
        Perl_sv_catpvf(aTHX_ sv, "[%d]", o->flags);     /* 2: embedded, 
otherwise 1 */
     else if (k == ANYOF) {
-       int i, rangestart = -1;
        const U8 flags = ANYOF_FLAGS(o);
        int do_sep = 0;
 
@@ -14762,32 +14745,19 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const 
regnode *o)
            sv_catpvs(sv, "^");
 
        /* output what the standard cp 0-255 bitmap matches */
-       for (i = 0; i <= 256; i++) {
-           if (i < 256 && ANYOF_BITMAP_TEST(o,i)) {
-               if (rangestart == -1)
-                   rangestart = i;
-           } else if (rangestart != -1) {
-               if (i <= rangestart + 3)
-                   for (; rangestart < i; rangestart++)
-                       put_byte(sv, rangestart);
-               else {
-                   put_byte(sv, rangestart);
-                   sv_catpvs(sv, "-");
-                   put_byte(sv, i - 1);
-               }
-               do_sep = 1;
-               rangestart = -1;
-           }
-       }
+        do_sep = put_latin1_charclass_innards(sv, ANYOF_BITMAP(o));
         
         EMIT_ANYOF_TEST_SEPARATOR(do_sep,sv,flags);
         /* output any special charclass tests (used entirely under use locale) 
*/
-       if (ANYOF_CLASS_TEST_ANY_SET(o))
-           for (i = 0; i < (int)(sizeof(anyofs)/sizeof(char*)); i++)
+       if (ANYOF_CLASS_TEST_ANY_SET(o)) {
+            int i;
+           for (i = 0; i < (int)(sizeof(anyofs)/sizeof(char*)); i++) {
                if (ANYOF_CLASS_TEST(o,i)) {
                    sv_catpv(sv, anyofs[i]);
                    do_sep = 1;
                }
+            }
+        }
         
         EMIT_ANYOF_TEST_SEPARATOR(do_sep,sv,flags);
         
@@ -14798,91 +14768,61 @@ Perl_regprop(pTHX_ const regexp *prog, SV *sv, const 
regnode *o)
         /* output information about the unicode matching */
        if (flags & ANYOF_UNICODE_ALL)
            sv_catpvs(sv, "{unicode_all}");
-       else if (ANYOF_NONBITMAP(o))
-           sv_catpvs(sv, "{unicode}");
-       if (flags & ANYOF_NONBITMAP_NON_UTF8)
-           sv_catpvs(sv, "{outside bitmap}");
+       else if (ANYOF_NONBITMAP(o)) {
+            SV *lv; /* Set if there is something outside the bit map. */
+
+            if (flags & ANYOF_NONBITMAP_NON_UTF8) {
+                sv_catpvs(sv, "{outside bitmap}");
+            }
+            else {
+                sv_catpvs(sv, "{utf8}");
+            }
 
-       if (ANYOF_NONBITMAP(o)) {
-           SV *lv; /* Set if there is something outside the bit map */
+            /* Get the stuff that wasn't in the bitmap */
            SV * const sw = regclass_swash(prog, o, FALSE, &lv, NULL);
             bool byte_output = FALSE;   /* If something in the bitmap has been
                                            output */
-
            if (lv && lv != &PL_sv_undef) {
-               if (sw) {
-                   U8 s[UTF8_MAXBYTES_CASE+1];
-
-                   for (i = 0; i <= 256; i++) { /* Look at chars in bitmap */
-                       uvchr_to_utf8(s, i);
-
-                       if (i < 256
-                            && ! ANYOF_BITMAP_TEST(o, i)    /* Don't duplicate
-                                                               things already
-                                                               output as part
-                                                               of the bitmap */
-                            && swash_fetch(sw, s, TRUE))
-                        {
-                           if (rangestart == -1)
-                               rangestart = i;
-                       } else if (rangestart != -1) {
-                            byte_output = TRUE;
-                           if (i <= rangestart + 3)
-                               for (; rangestart < i; rangestart++) {
-                                   put_byte(sv, rangestart);
-                               }
-                           else {
-                               put_byte(sv, rangestart);
-                               sv_catpvs(sv, "-");
-                               put_byte(sv, i-1);
-                           }
-                           rangestart = -1;
-                       }
-                   }
-               }
+                char *s = savesvpv(lv);
+                char * const origs = s;
 
-               {
-                   char *s = savesvpv(lv);
-                   char * const origs = s;
-
-                   while (*s && *s != '\n')
-                       s++;
+                while (*s && *s != '\n')
+                    s++;
 
-                   if (*s == '\n') {
-                       const char * const t = ++s;
+                if (*s == '\n') {
+                    const char * const t = ++s;
 
-                        if (byte_output) {
-                            sv_catpvs(sv, " ");
-                        }
+                    if (byte_output) {
+                        sv_catpvs(sv, " ");
+                    }
 
-                       while (*s) {
-                           if (*s == '\n') {
+                    while (*s) {
+                        if (*s == '\n') {
 
-                                /* Truncate very long output */
-                               if (s - origs > 256) {
-                                   Perl_sv_catpvf(aTHX_ sv,
-                                                  "%.*s...",
-                                                  (int) (s - origs - 1),
-                                                  t);
-                                   goto out_dump;
-                               }
-                               *s = ' ';
-                           }
-                           else if (*s == '\t') {
-                               *s = '-';
-                           }
-                           s++;
-                       }
-                       if (s[-1] == ' ')
-                           s[-1] = 0;
+                            /* Truncate very long output */
+                            if (s - origs > 256) {
+                                Perl_sv_catpvf(aTHX_ sv,
+                                               "%.*s...",
+                                               (int) (s - origs - 1),
+                                               t);
+                                goto out_dump;
+                            }
+                            *s = ' ';
+                        }
+                        else if (*s == '\t') {
+                            *s = '-';
+                        }
+                        s++;
+                    }
+                    if (s[-1] == ' ')
+                        s[-1] = 0;
 
-                       sv_catpv(sv, t);
-                   }
+                    sv_catpv(sv, t);
+                }
 
-               out_dump:
+            out_dump:
 
-                   Safefree(origs);
-               }
+                Safefree(origs);
                SvREFCNT_dec_NN(lv);
            }
        }
@@ -15511,12 +15451,17 @@ S_put_byte(pTHX_ SV *sv, int c)
 
        So the old condition can be simplified to !isPRINT(c)  */
     if (!isPRINT(c)) {
-       if (c < 256) {
-           Perl_sv_catpvf(aTHX_ sv, "\\x%02x", c);
-       }
-       else {
-           Perl_sv_catpvf(aTHX_ sv, "\\x{%x}", c);
-       }
+        switch (c) {
+            case '\r': Perl_sv_catpvf(aTHX_ sv, "\\r"); break;
+            case '\n': Perl_sv_catpvf(aTHX_ sv, "\\n"); break;
+            case '\t': Perl_sv_catpvf(aTHX_ sv, "\\t"); break;
+            case '\f': Perl_sv_catpvf(aTHX_ sv, "\\f"); break;
+            case '\a': Perl_sv_catpvf(aTHX_ sv, "\\a"); break;
+
+            default:
+                Perl_sv_catpvf(aTHX_ sv, "\\x{%x}", c);
+                break;
+        }
     }
     else {
        const char string = c;
@@ -15526,6 +15471,63 @@ S_put_byte(pTHX_ SV *sv, int c)
     }
 }
 
+STATIC bool
+S_put_latin1_charclass_innards(pTHX_ SV *sv, char *bitmap)
+{
+    /* Appends to 'sv' a displayable version of the innards of the bracketed
+     * character class whose bitmap is 'bitmap';  Returns 'TRUE' if it actually
+     * output anything */
+
+    int i;
+    int rangestart = -1;
+    bool has_output_anything = FALSE;
+
+    PERL_ARGS_ASSERT_PUT_LATIN1_CHARCLASS_INNARDS;
+
+    for (i = 0; i <= 256; i++) {
+        if (i < 256 && BITMAP_TEST((U8 *) bitmap,i)) {
+            if (rangestart == -1)
+                rangestart = i;
+        } else if (rangestart != -1) {
+            int j = i - 1;
+            if (i <= rangestart + 3) {  /* Individual chars in short ranges */
+                for (; rangestart < i; rangestart++)
+                    put_byte(sv, rangestart);
+            }
+            else if (   j > 255
+                     || ! isALPHANUMERIC(rangestart)
+                     || ! isALPHANUMERIC(j)
+                     || isDIGIT(rangestart) != isDIGIT(j)
+                     || isUPPER(rangestart) != isUPPER(j)
+                     || isLOWER(rangestart) != isLOWER(j)
+
+                        /* This final test should get optimized out except
+                         * on EBCDIC platforms, where it causes ranges that
+                         * cross discontinuities like i/j to be shown as hex
+                         * instead of the misleading, e.g. H-K (since that
+                         * range includes more than H, I, J, K). */
+                     || (j - rangestart)
+                         != NATIVE_TO_ASCII(j) - NATIVE_TO_ASCII(rangestart))
+            {
+                Perl_sv_catpvf(aTHX_ sv, "\\x{%02x}-\\x{%02x}",
+                               rangestart,
+                               (j < 256) ? j : 255);
+            }
+            else { /* Here, the ends of the range are both digits, or both
+                      uppercase, or both lowercase; and there's no
+                      discontinuity in the range (which could happen on EBCDIC
+                      platforms) */
+                put_byte(sv, rangestart);
+                sv_catpvs(sv, "-");
+                put_byte(sv, j);
+            }
+            rangestart = -1;
+            has_output_anything = TRUE;
+        }
+    }
+
+    return has_output_anything;
+}
 
 #define CLEAR_OPTSTART \
     if (optstart) STMT_START { \
diff --git a/regexec.c b/regexec.c
index 04491ee..4a350fb 100644
--- a/regexec.c
+++ b/regexec.c
@@ -7271,15 +7271,18 @@ STATIC SV *
 S_core_regclass_swash(pTHX_ const regexp *prog, const regnode* node, bool 
doinit, SV** listsvp)
 {
     /* Returns the swash for the input 'node' in the regex 'prog'.
-     * If <doinit> is true, will attempt to create the swash if not already
+     * If <doinit> is 'true', will attempt to create the swash if not already
      *   done.
-     * If <listsvp> is non-null, will return the swash initialization string in
-     *   it.
+     * If <listsvp> is non-null, will return the printable contents of the
+     *    swash.  This can be used to get debugging information even before the
+     *    swash exists, by calling this function with 'doinit' set to false, in
+     *    which case the components that will be used to eventually create the
+     *    swash are returned  (in a printable form).
      * Tied intimately to how regcomp.c sets up the data structure */
 
     dVAR;
     SV *sw  = NULL;
-    SV *si  = NULL;
+    SV *si  = NULL;         /* Input swash initialization string */
     SV*  invlist = NULL;
 
     RXi_GET_DECL(prog,progi);
@@ -7332,16 +7335,18 @@ S_core_regclass_swash(pTHX_ const regexp *prog, const 
regnode* node, bool doinit
        }
     }
        
+    /* If requested, return a printable version of what this swash matches */
     if (listsvp) {
        SV* matches_string = newSVpvn("", 0);
 
-       /* Use the swash, if any, which has to have incorporated into it all
-        * possibilities */
+        /* The swash should be used, if possible, to get the data, as it
+         * contains the resolved data.  But this function can be called at
+         * compile-time, before everything gets resolved, in which case we
+         * return the currently best available information, which is the string
+         * that will eventually be used to do that resolving, 'si' */
        if ((! sw || (invlist = _get_swash_invlist(sw)) == NULL)
             && (si && si != &PL_sv_undef))
         {
-
-           /* If no swash, use the input initialization string, if available */
            sv_catsv(matches_string, si);
        }
 
diff --git a/t/porting/known_pod_issues.dat b/t/porting/known_pod_issues.dat
index 0bd64c5..12c2b15 100644
--- a/t/porting/known_pod_issues.dat
+++ b/t/porting/known_pod_issues.dat
@@ -35,8 +35,8 @@ Class::PseudoHash
 Classic::Perl
 Clone
 cpan2dist(1)
-cpanp(1)
 CPAN::Changes::Spec
+cpanp(1)
 CPANPLUS
 Crypt::Random
 curl(1)
@@ -230,6 +230,7 @@ pod/perldsc.pod     Verbatim line length including indents 
exceeds 79 by    4
 pod/perldtrace.pod     Verbatim line length including indents exceeds 79 by    
26
 pod/perlebcdic.pod     Verbatim line length including indents exceeds 79 by    
13
 pod/perlembed.pod      Verbatim line length including indents exceeds 79 by    
27
+pod/perlfunc.pod       ? Should you be using F<...> or maybe L<...> instead of 
1
 pod/perlgit.pod        Verbatim line length including indents exceeds 79 by    
12
 pod/perlgpl.pod        Verbatim line length including indents exceeds 79 by    
50
 pod/perlguts.pod       ? Should you be using F<...> or maybe L<...> instead of 
2

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.19.2-246-g36b347b

Reply via email to