[perl.git] branch blead, updated. v5.13.6-306-g9e92194

Father Chrysostomos Sun, 31 Oct 2010 16:59:25 -0700

In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/9e92194e47c35a3910a8726b88f0abc6c1c0df3c?hp=7cb18e1b020cd2e5d1de687ae046ab2d48a69301>


- Log -----------------------------------------------------------------
commit 9e92194e47c35a3910a8726b88f0abc6c1c0df3c
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 14:04:15 2010 -0600

    perlunicode.pod: Add detail on utf8/locale conflicts

M       pod/perlunicode.pod

commit 4b844e06c99d9c2e251dde1c8abc47508b801786
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 13:37:11 2010 -0600

    regcomp.h: Remove unused #define
    
    ANYOF_RUNTIME() is no longer used, so can be removed.
    
    I had long tried to figure out what the purpose of this was, and
    discovered it really had none.
    
    I think it must have had something to do with locales at one time.  But
    locales don't do well with utf8, and I don't know how to make it better.
    In any event this wasn't actually accomplishing anything.

M       regcomp.h

commit e051a21d2458b5ce83206336c2ee2d39013b8d1b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 13:31:13 2010 -0600

    reginclass: Remove redundant test
    
    The previous re-ordering of this function makes it clear that this test
    doesn't do anything.  It is testing the charclass bitmap, but that was
    already done in the re-ordered block from a previous commit, so if it
    didn't succeed there, it won't succeed here.
    
    In fact, trying to understand why this code was here was what led me to
    figure out that it wasn't, and that things could be sped up by doing the
    reordering.

M       regexec.c

commit 7cdde5444c9ad8cccf237ec340ddb54f58ce3cf0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 13:17:34 2010 -0600

    reginclass: Reorder fastest first
    
    This patch simply moves the block of code that does the bitmap tests in
    front of the block of code that deals with potential things not in the
    bit map.  The reason to do this is that it is faster to find things in
    the bitmap, than to have to create a utf8 swash.
    
    The patch also adds some comments, and the first block doesn't have to
    test if there has been a match, and the second block does, so if
    statements for those two blocks are adjusted accordingly.
    
    The proof that this doesn't break anything stems from the fact that the
    routine never stops early.  If there wasn't a match in the first block
    of code, it would execute the second block.  Thus swapping the order
    doesn't affect the outcome.  The side effects of the first block are
    reading in the swash.  These side effects won't happen if it no longer
    gets executed, because the other block matched.  And thus an error could
    be introduced if there were coding errors elsewhere that didn't
    initialize the swash before using it.  But that doesn't appear to be the
    case, as all tests pass.

M       regexec.c

commit ea6756a64e0a36eb91bcea39467a2253c31192b8
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 12:45:50 2010 -0600

    reginclass: Remove unnecessary test
    
    The previous changes have made it clear that this test never was useful,
    so remove it.

M       regexec.c

commit a5a291f55baf01a6b4b1013e2d3c722a0ad77432
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 12:36:49 2010 -0600

    reginclass: Make explicit the length assumptions
    
    reginclass assumes that can match always at least one character.  Make
    that explicit, and now that we have that length always saved, don't
    recalculate it.

M       regexec.c

commit f7ab54c630d4ff497dfc2435654b5aee46420b17
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 12:31:11 2010 -0600

    reginclass: Rename variable for clarity
    
    Several other variables in the routine have the previous name

M       regexec.c

commit bf3f2d85838ef8cae6a146bc0d731316fbdb3554
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 12:07:43 2010 -0600

    regcomp.h: Clean up some comments

M       regcomp.h

commit eb91e30040d44cf387bdcd0ad4dd10fec2d95799
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 12:05:23 2010 -0600

    ANYOF_LARGE is now the same as ANYOF_CLASS
    
    These two #defines now mean the same thing.  Free up bit used by
    ANYOF_LARGE

M       regcomp.h

commit 20ed0b260f81efafa0c8e1cd9413ad026e3f19be
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 11:50:08 2010 -0600

    regexec.c: reorder statements for speed
    
    The call to reginclass is guaranteed by constness to not change
    locinput, so if going to fail don't waste time calling it.

M       regexec.c

commit af364d036721fad8cdfe034fea03192968f09774
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 11:20:43 2010 -0600

    regexec.c: Add clarifying comment

M       regexec.c

commit f6ad78d8b847361e046c98c7811db99387508807
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 10:37:55 2010 -0600

    reginclass: add some consts to prototype

M       embed.fnc
M       proto.h
M       regexec.c

commit b32d7d3e8f81b492f99ac325dc00ad4dca683023
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 10:33:50 2010 -0600

    regexec.c: Remove redundant line.
    
    Now that reginclass is guaranteed to return the match length upon
    success, the caller need not do it again.

M       regexec.c

commit 4b3cda86f0c2a54a1ac8bbbaf4a2412f98dff6c0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 10:21:14 2010 -0600

    reginclass: Return matched length even if not utf8
    
    This also allows for less special case testing

M       regexec.c

commit 6698fab5a29c48acfd94f01ff8a587819f15be79
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 09:20:41 2010 -0600

    reginclass: Change variable name for clarity.

M       regexec.c

commit eba1359ea44dc786981104981f7f0c719920e6ba
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Oct 31 09:11:39 2010 -0600

    regexec.c: Document existing reginclass behavior

M       regexec.c

commit 8ff01ef0717d3f418692ebdcfbeff3221e20f9f8
Author: Father Chrysostomos <spr...@cpan.org>
Date:   Sun Oct 31 14:35:59 2010 -0700

    perldelta up to 7cb18e1b02

M       pod/perldelta.pod
-----------------------------------------------------------------------

Summary of changes:
 embed.fnc           |    4 +-
 pod/perldelta.pod   |   63 ++++++++++++++++++--
 pod/perlunicode.pod |    9 ++-
 proto.h             |    2 +-
 regcomp.h           |   29 ++++------
 regexec.c           |  161 +++++++++++++++++++++++++++++----------------------
 6 files changed, 170 insertions(+), 98 deletions(-)

diff --git a/embed.fnc b/embed.fnc
index f900005..edee2f4 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -1790,8 +1790,8 @@ Es        |U8     |regtail_study  |NN struct RExC_state_t 
*pRExC_state \
 ERs    |I32    |regmatch       |NN regmatch_info *reginfo|NN regnode *prog
 ERs    |I32    |regrepeat      |NN const regexp *prog|NN const regnode *p|I32 
max|int depth
 ERs    |I32    |regtry         |NN regmatch_info *reginfo|NN char **startpos
-ERs    |bool   |reginclass     |NULLOK const regexp *prog|NN const regnode 
*n|NN const U8 *p|NULLOK STRLEN *lenp\
-                               |bool do_utf8sv_is_utf8
+ERs    |bool   |reginclass     |NULLOK const regexp * const prog|NN const 
regnode * const n|NN const U8 * const p|NULLOK STRLEN *lenp\
+                               |bool const do_utf8sv_is_utf8
 Es     |CHECKPOINT|regcppush   |I32 parenfloor
 Es     |char*  |regcppop       |NN const regexp *rex
 ERsn   |U8*    |reghop3        |NN U8 *s|I32 off|NN const U8 *lim
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index 203260c..01dffec 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -1,7 +1,7 @@
 =encoding utf8
 
 =for comment
-This has been completed up to 0b6a3b5adbe83, except for:
+This has been completed up to 7cb18e1b02, except for:
 04777d295957ad270188e4debf51b523e07cc5b0
 c565ab54dc649bb62cd4d57149d7b2abb21df5f3
 1c8d11ca3d0ce8bc11562f159b94c2c7e62dea6c
@@ -202,8 +202,6 @@ C<B> has been upgraded from 1.24 to 1.25.
 
 =item *
 
-XXX What should the version be?
-
 C<B::Deparse> has been upgraded from 0.99 to 1.01.
 
 It fixes deparsing of C<our> followed by a variable with funny characters
@@ -265,6 +263,10 @@ C<Math::BigInt::FastCalc> has been upgraded from 0.22 to 
0.23.
 
 =item *
 
+C<mro> has been upgraded from 1.04 to 1.05.
+
+=item *
+
 C<NDBM_File> has been upgraded from 1.09 to 1.10.
 
 This fixes a memory leak when DBM filters are used.
@@ -286,6 +288,12 @@ C<use re "/flags"> pragma.
 
 =item *
 
+C<Safe> has been upgraded from 2.28 to 2.29.
+
+It adds C<&version::vxs::VCMP> to the default share.
+
+=item *
+
 C<SDBM_File> has been upgraded from 1.07 to 1.08.
 
 =item *
@@ -533,10 +541,12 @@ C<cophh_> in L<perlapi>.
 =item *
 
 A stash can now have a list of effective names in addition to its usual
-name. These can be added and deleted via C<hv_ename_add> and
-C<hv_ename_delete>. The first effective name can be accessed via the
-C<HvENAME*> macros defined in F<hv.h>. These new functions and macros are
-I<not> part of the API.
+name. The first effective name can be accessed via the C<HvENAME> macro,
+which is now the recommended name to use in MRO linearisations (C<HvNAME>
+being a fallback if there is no C<HvENAME>).
+
+These names are added and deleted via C<hv_ename_add> and
+C<hv_ename_delete>. These two functions are I<not> part of the API.
 
 =item *
 
@@ -646,6 +656,45 @@ now fixed for regular expressions compiled under the 
C<"u"> modifier.
 See L</C<use feature "unicode_strings"> now applies to more regex matching>.
 L<[perl #18281]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=18281>.
 
+=item *
+
+Concatenating long strings under C<use encoding> no longer causes perl to
+crash
+L<[perl #78674]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=78674>.
+
+=item *
+
+Typeglob assignments would crash if the glob's stash no longer existed, if
+the glob assigned to was named 'ISA' or the glob on either side of the
+assignment contained a subroutine.
+
+=item *
+
+Calling C<< ->import >> on a class lacking an import method could corrupt the 
stack result in strange behaviour. For instance,
+
+  push @a, "foo", $b = bar->import;
+
+would assign 'foo' to C<$b>
+L<[perl #63790]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=63790>.
+
+=item *
+
+Creating an alias to a package when that package had been detached from the
+symbol table would result in corrupted isa caches
+L<[perl #77358]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=77358>.
+
+=item *
+
+C<.=> followed by C<< <> >> or C<readline> would leak memory if C<$/>
+contained characters beyond the octet range and the scalar assigned to
+happened to be encoded as UTF8 internally
+L<[perl #72246]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=72246>.
+
+=item *
+
+The C<recv> function could crash when called with the MSG_TRUNC flag
+L<[perl #75082]|http://rt.perl.org/rt3/Public/Bug/Display.html?id=75082>.
+
 =back
 
 =head1 Known Problems
diff --git a/pod/perlunicode.pod b/pod/perlunicode.pod
index dfd6d42..b9a43c3 100644
--- a/pod/perlunicode.pod
+++ b/pod/perlunicode.pod
@@ -1371,7 +1371,7 @@ for more discussion of the issues.
 =head2 Locales
 
 Usually locale settings and Unicode do not affect each other, but
-there are a couple of exceptions:
+there are exceptions:
 
 =over 4
 
@@ -1386,7 +1386,12 @@ variable, see L<perlrun> for the documentation of the 
C<-C> switch.
 
 Perl tries really hard to work both with Unicode and the old
 byte-oriented world. Most often this is nice, but sometimes Perl's
-straddling of the proverbial fence causes problems.
+straddling of the proverbial fence causes problems.  Here's an example
+of how things can go wrong.  A locale can define a code point to be
+anything it wants.  It could make 'A' into a control character, for example.
+But strings encoded in utf8 always have Unicode semantics, so an 'A' in
+such a string is always an uppercase letter, never a control, no matter
+what the locale says it should be.
 
 =back
 
diff --git a/proto.h b/proto.h
index 0027180..8cc3281 100644
--- a/proto.h
+++ b/proto.h
@@ -6425,7 +6425,7 @@ STATIC U8*        S_reghopmaybe3(U8 *s, I32 off, const U8 
*lim)
 #define PERL_ARGS_ASSERT_REGHOPMAYBE3  \
        assert(s); assert(lim)
 
-STATIC bool    S_reginclass(pTHX_ const regexp *prog, const regnode *n, const 
U8 *p, STRLEN *lenp, bool do_utf8sv_is_utf8)
+STATIC bool    S_reginclass(pTHX_ const regexp * const prog, const regnode * 
const n, const U8 * const p, STRLEN *lenp, bool const do_utf8sv_is_utf8)
                        __attribute__warn_unused_result__
                        __attribute__nonnull__(pTHX_2)
                        __attribute__nonnull__(pTHX_3);
diff --git a/regcomp.h b/regcomp.h
index 9e1a4e3..8f0b828 100644
--- a/regcomp.h
+++ b/regcomp.h
@@ -204,15 +204,16 @@ struct regnode_charclass {
     U8 flags;
     U8  type;
     U16 next_off;
-    U32 arg1;
+    U32 arg1;                          /* used as ptr in S_regclass */
     char bitmap[ANYOF_BITMAP_SIZE];    /* only compile-time */
 };
 
-struct regnode_charclass_class {       /* has [[:blah:]] classes */
-    U8 flags;                          /* should have ANYOF_CLASS here */
+/* has runtime (locale) \d, \w, ..., [:posix:] classes */
+struct regnode_charclass_class {
+    U8 flags;                          /* ANYOF_CLASS bit must go here */
     U8  type;
     U16 next_off;
-    U32 arg1;
+    U32 arg1;                                  /* used as ptr in S_regclass */
     char bitmap[ANYOF_BITMAP_SIZE];            /* both compile-time */
     char classflags[ANYOF_CLASSBITMAP_SIZE];   /* and run-time */
 };
@@ -271,7 +272,7 @@ struct regnode_charclass_class {    /* has [[:blah:]] 
classes */
 #undef STRING
 
 #define        OP(p)           ((p)->type)
-#define FLAGS(p)       ((p)->flags)    /* Caution: Doesn't apply to all
+#define FLAGS(p)       ((p)->flags)    /* Caution: Doesn't apply to all \
                                           regnode types */
 #define        OPERAND(p)      (((struct regnode_string *)p)->string)
 #define MASK(p)                ((char*)OPERAND(p))
@@ -313,23 +314,17 @@ struct regnode_charclass_class {  /* has [[:blah:]] 
classes */
 
 /* Flags for node->flags of ANYOF */
 
-#define ANYOF_CLASS            0x08    /* has [[:blah:]] classes */
+#define ANYOF_CLASS            0x08    /* has runtime \d, \w, [:posix:], ... */
+#define ANYOF_LARGE      ANYOF_CLASS    /* Same; name retained for back compat 
*/
 #define ANYOF_INVERT           0x04
 #define ANYOF_FOLD             0x02
 #define ANYOF_LOCALE           0x01
 
-/* Used for regstclass only */
-#define ANYOF_EOS              0x10            /* Can match an empty string 
too */
-
-/* There is a character or a range past 0xff */
-#define ANYOF_UNICODE          0x20
-#define ANYOF_UNICODE_ALL      0x40    /* Can match any char past 0xff */
-
-/* size of node is large (includes class pointer) */
-#define ANYOF_LARGE            0x80
+/* EOS used for regstclass only */
+#define ANYOF_EOS              0x10    /* Can match an empty string too */
 
-/* Are there any runtime flags on in this node? */
-#define ANYOF_RUNTIME(s)       (ANYOF_FLAGS(s) & 0x0f)
+#define ANYOF_UNICODE          0x20    /* Matches >= one thing past 0xff */
+#define ANYOF_UNICODE_ALL      0x40    /* Matches 0x100 - infinity */
 
 #define ANYOF_FLAGS_ALL                0xff
 
diff --git a/regexec.c b/regexec.c
index 842afaf..2c9b852 100644
--- a/regexec.c
+++ b/regexec.c
@@ -94,7 +94,11 @@
 #define        STATIC  static
 #endif
 
-#define REGINCLASS(prog,p,c)  (ANYOF_FLAGS(p) ? reginclass(prog,p,c,0,0) : 
ANYOF_BITMAP_TEST(p,*(c)))
+/* Valid for non-utf8 strings only: avoids the reginclass call if there are no
+ * complications: i.e., if everything matchable is straight forward in the
+ * bitmap */
+#define REGINCLASS(prog,p,c)  (ANYOF_FLAGS(p) ? reginclass(prog,p,c,0,0)   \
+                                             : ANYOF_BITMAP_TEST(p,*(c)))
 
 /*
  * Forwards.
@@ -3628,22 +3632,22 @@ S_regmatch(pTHX_ regmatch_info *reginfo, regnode *prog)
        case ANYOF:
            if (utf8_target) {
                STRLEN inclasslen = PL_regeol - locinput;
+               if (locinput >= PL_regeol)
+                   sayNO;
 
                if (!reginclass(rex, scan, (U8*)locinput, &inclasslen, 
utf8_target))
                    goto anyof_fail;
-               if (locinput >= PL_regeol)
-                   sayNO;
-               locinput += inclasslen ? inclasslen : UTF8SKIP(locinput);
+               locinput += inclasslen;
                nextchr = UCHARAT(locinput);
                break;
            }
            else {
                if (nextchr < 0)
                    nextchr = UCHARAT(locinput);
-               if (!REGINCLASS(rex, scan, (U8*)locinput))
-                   goto anyof_fail;
                if (!nextchr && locinput >= PL_regeol)
                    sayNO;
+               if (!REGINCLASS(rex, scan, (U8*)locinput))
+                   goto anyof_fail;
                nextchr = UCHARAT(++locinput);
                break;
            }
@@ -6180,91 +6184,60 @@ Perl_regclass_swash(pTHX_ const regexp *prog, register 
const regnode* node, bool
 /*
  - reginclass - determine if a character falls into a character class
  
-  The n is the ANYOF regnode, the p is the target string, lenp
-  is pointer to the maximum length of how far to go in the p
-  (if the lenp is zero, UTF8SKIP(p) is used),
-  utf8_target tells whether the target string is in UTF-8.
+  n is the ANYOF regnode
+  p is the target string
+  lenp is pointer to the maximum number of bytes of how far to go in p
+    (This is assumed wthout checking to always be at least the current
+    character's size)
+  utf8_target tells whether p is in UTF-8.
+
+  Returns true if matched; false otherwise.  If lenp is not NULL, on return
+  from a successful match, the value it points to will be updated to how many
+  bytes in p were matched.  If there was no match, the value is undefined,
+  possibly changed from the input.
 
  */
 
 STATIC bool
-S_reginclass(pTHX_ const regexp *prog, register const regnode *n, register 
const U8* p, STRLEN* lenp, register bool utf8_target)
+S_reginclass(pTHX_ const regexp * const prog, register const regnode * const 
n, register const U8* const p, STRLEN* lenp, register const bool utf8_target)
 {
     dVAR;
     const char flags = ANYOF_FLAGS(n);
     bool match = FALSE;
     UV c = *p;
-    STRLEN len = 0;
-    STRLEN plen;
+    STRLEN c_len = 0;
+    STRLEN maxlen;
 
     PERL_ARGS_ASSERT_REGINCLASS;
 
+    /* If c is not already the code point, get it */
     if (utf8_target && !UTF8_IS_INVARIANT(c)) {
-       c = utf8n_to_uvchr(p, UTF8_MAXBYTES, &len,
+       c = utf8n_to_uvchr(p, UTF8_MAXBYTES, &c_len,
                (UTF8_ALLOW_DEFAULT & UTF8_ALLOW_ANYUV)
                | UTF8_ALLOW_FFFF | UTF8_CHECK_ONLY);
                /* see [perl #37836] for UTF8_ALLOW_ANYUV; [perl #38293] for
                 * UTF8_ALLOW_FFFF */
-       if (len == (STRLEN)-1) 
+       if (c_len == (STRLEN)-1)
            Perl_croak(aTHX_ "Malformed UTF-8 character (fatal)");
     }
+    else {
+       c_len = 1;
+    }
 
-    plen = lenp ? *lenp : UNISKIP(NATIVE_TO_UNI(c));
-    if (utf8_target || (flags & ANYOF_UNICODE)) {
-        if (lenp)
-           *lenp = 0;
-       if (utf8_target && !ANYOF_RUNTIME(n)) {
-           if (len != (STRLEN)-1 && c < 256 && ANYOF_BITMAP_TEST(n, c))
-               match = TRUE;
-       }
-       if (!match && utf8_target && (flags & ANYOF_UNICODE_ALL) && c >= 256)
-           match = TRUE;
-       if (!match) {
-           AV *av;
-           SV * const sw = regclass_swash(prog, n, TRUE, 0, (SV**)&av);
-       
-           if (sw) {
-               U8 * utf8_p;
-               if (utf8_target) {
-                   utf8_p = (U8 *) p;
-               } else {
-                   STRLEN len = 1;
-                   utf8_p = bytes_to_utf8(p, &len);
-               }
-               if (swash_fetch(sw, utf8_p, 1))
-                   match = TRUE;
-               else if (flags & ANYOF_FOLD) {
-                   if (!match && lenp && av) {
-                       I32 i;
-                       for (i = 0; i <= av_len(av); i++) {
-                           SV* const sv = *av_fetch(av, i, FALSE);
-                           STRLEN len;
-                           const char * const s = SvPV_const(sv, len);
-                           if (len <= plen && memEQ(s, (char*)utf8_p, len)) {
-                               *lenp = len;
-                               match = TRUE;
-                               break;
-                           }
-                       }
-                   }
-                   if (!match) {
-                       U8 tmpbuf[UTF8_MAXBYTES_CASE+1];
-
-                       STRLEN tmplen;
-                       to_utf8_fold(utf8_p, tmpbuf, &tmplen);
-                       if (swash_fetch(sw, tmpbuf, 1))
-                           match = TRUE;
-                   }
-               }
+    /* Use passed in max length, or one character if none passed in or less
+     * than one character.  And assume will match just one character.  This is
+     * overwritten later if matched more. */
+    if (lenp) {
+       maxlen = (*lenp > c_len) ? *lenp : c_len;
+       *lenp = c_len;
 
-               /* If we allocated a string above, free it */
-               if (! utf8_target) Safefree(utf8_p);
-           }
-       }
-       if (match && lenp && *lenp == 0)
-           *lenp = UNISKIP(NATIVE_TO_UNI(c));
     }
-    if (!match && c < 256) {
+    else {
+       maxlen = c_len;
+    }
+
+    /* If this character is potentially in the bitmap, check it */
+    if (c < 256) {
        if (ANYOF_BITMAP_TEST(n, c))
            match = TRUE;
        else if (flags & ANYOF_FOLD) {
@@ -6320,6 +6293,56 @@ S_reginclass(pTHX_ const regexp *prog, register const 
regnode *n, register const
        }
     }
 
+    /* If the bitmap didn't (or couldn't) match, and something outside the
+     * bitmap could match, try that */
+    if (! match && utf8_target || (flags & ANYOF_UNICODE)) {
+       if (utf8_target && (flags & ANYOF_UNICODE_ALL) && c >= 256) {
+           match = TRUE;
+       }
+       else {
+           AV *av;
+           SV * const sw = regclass_swash(prog, n, TRUE, 0, (SV**)&av);
+
+           if (sw) {
+               U8 * utf8_p;
+               if (utf8_target) {
+                   utf8_p = (U8 *) p;
+               } else {
+                   STRLEN len = 1;
+                   utf8_p = bytes_to_utf8(p, &len);
+               }
+               if (swash_fetch(sw, utf8_p, 1))
+                   match = TRUE;
+               else if (flags & ANYOF_FOLD) {
+                   if (!match && lenp && av) {
+                       I32 i;
+                       for (i = 0; i <= av_len(av); i++) {
+                           SV* const sv = *av_fetch(av, i, FALSE);
+                           STRLEN len;
+                           const char * const s = SvPV_const(sv, len);
+                           if (len <= maxlen && memEQ(s, (char*)utf8_p, len)) {
+                               *lenp = len;
+                               match = TRUE;
+                               break;
+                           }
+                       }
+                   }
+                   if (!match) {
+                       U8 tmpbuf[UTF8_MAXBYTES_CASE+1];
+
+                       STRLEN tmplen;
+                       to_utf8_fold(utf8_p, tmpbuf, &tmplen);
+                       if (swash_fetch(sw, tmpbuf, 1))
+                           match = TRUE;
+                   }
+               }
+
+               /* If we allocated a string above, free it */
+               if (! utf8_target) Safefree(utf8_p);
+           }
+       }
+    }
+
     return (flags & ANYOF_INVERT) ? !match : match;
 }
 

--
Perl5 Master Repository

[perl.git] branch blead, updated. v5.13.6-306-g9e92194

Reply via email to