In perl.git, the branch blead has been updated

<http://perl5.git.perl.org/perl.git/commitdiff/c13a5c80de2334e935eada0927b9f5f7c862a45e?hp=258a58987ee2cc2be52e4eec4f8e68af1693368b>

- Log -----------------------------------------------------------------
commit c13a5c80de2334e935eada0927b9f5f7c862a45e
Author: Nicholas Clark <[email protected]>
Date:   Tue May 24 15:11:53 2011 +0100

    Store FBMs in PVMGs, instead of GVs.
    
    This should reduce the complexity of code dealing with GVs, as they no 
longer
    try to play several different incompatible roles.
    
    (As suggested by Ben Morrow. However, it didn't turn out to be as
    straightforward as one might have hoped).

M       dump.c
M       ext/B/t/optree_misc.t
M       ext/Devel-Peek/t/Peek.t
M       pod/perldelta.pod
M       sv.c
M       sv.h
M       util.c

commit 2bda37bab5fb768caff2b228fda376b75df4815c
Author: Nicholas Clark <[email protected]>
Date:   Wed May 18 11:45:22 2011 +0100

    Store the BM table in mg_ptr instead of after SvCUR().
    
    Previously the 256 byte Boyer-Moore table was stored in the buffer of 
SvPVX()
    after the raw string by extending the buffer.
    
    Given that the scalar is alway upgraded to add PERL_MAGIC_bm magic, to clear
    the table and other flags, there's no extra memory cost in using mg_ptr in 
the
    MAGIC struct to point directly to the table.
    
    I believe that this removes the last place in the core that stores data 
beyond
    SvCUR().

M       ext/B/B.xs
M       ext/Devel-Peek/t/Peek.t
M       sv.h
M       util.c

commit 9402563ac1ee6a7649763b93342cb2940addf915
Author: Nicholas Clark <[email protected]>
Date:   Wed May 18 10:49:43 2011 +0100

    Exit early from Perl_fbm_compile() if the SV is already "compiled".
    
    I believe that this can only happen if a constant subroutine is used more 
than
    once as the second argument to index.

M       t/op/index.t
M       util.c

commit 21aeb718e9b2ffc24ed853a11c571efa7fc3555d
Author: Nicholas Clark <[email protected]>
Date:   Tue May 17 13:16:57 2011 +0100

    In Perl_fbm_instr(), use a switch() statement for the special case code.
    
    Previously the special-case code for lengths 0, 1 and 2 was in a nested set
    of if() statements, which was slightly cryptic to read.

M       util.c

commit ea725ce6fd46ebc3bd7040373b9c0721af8813f8
Author: Nicholas Clark <[email protected]>
Date:   Tue May 17 12:28:36 2011 +0100

    In Perl_fbm_compile(), use STRLEN instead of U32 to calculate BmPREVIOUS().
    
    This should fix a theoretical bug on strings longer than 2**32 bytes where 
the
    byte referenced by BmRARE() is at an offset beyond 2**32. I'm not sure how 
to
    test this, as I think to trigger it one would need to have one of
    
    a: the second argument to index as a string literal, longer than 2**32 bytes
    b: a fixed string in a regex, longer than 2**32 bytes

M       util.c

commit ced454951435adad2176a9e62dc76e463b1a6406
Author: Nicholas Clark <[email protected]>
Date:   Tue May 17 10:26:49 2011 +0100

    Abolish xbm_rare. Move BmUSEFUL() to union _xnvu and BmPREVIOUS() to the UV.
    
    This reduces the complexity of the union declarations in sv.h.
    
    As B.xs is accessing the structures/unions directly, instead of using the
    macros, it needs a patch too.

M       ext/B/B.xs
M       sv.h

commit cffe132d3937f3dc01397f5375d368ad39d53ab7
Author: Nicholas Clark <[email protected]>
Date:   Mon May 9 16:53:28 2011 +0100

    Use SvTAIL() instead of BmFLAGS(). The core no longer uses BmFLAGS().

M       dump.c
M       ext/Devel-Peek/t/Peek.t
M       regexec.c
M       sv.h
M       util.c

commit 6976cee33e524456a81f646a3fa65f279c6c190d
Author: Nicholas Clark <[email protected]>
Date:   Mon May 9 12:07:17 2011 +0100

    Emulate the value of BmFLAGS() using SvTAIL().
    
    Don't set BmFLAGS() in Perl_fbm_compile()
    
    Originally fbm_compile() had an I32 flags argument, which seems to have been
    part of case folding/locale improvements. bbce6d69784bf43b removed this.
    SvTAIL() was only used in once place until c277df42229d99fe. 
2779dcf1a3ceec16
    added the U32 flags argument to fbm_compile(), not used until 
cf93c79d660ae36c.
    That commit also added FBMcf_TAIL and FBMcf_TAIL{z,Z,DOLLAR} but didn't use 
the
    last three. Additionally, it stored the BmFLAGS as part of the compiled 
table:
    
    +       table[-1] = flags;              /* Not used yet */
    
    f722798beaa43749 added FBMcf_TAIL_DOLLARM, renumbered 
FBMcf_TAIL{z,Z,DOLLAR},
    but still didn't use anything other than FBMcf_TAIL.
    
    The core, nothing on CPAN, and nothing visible to Google codesearch, has 
ever
    used the 4 specialist flags. The only use is 0 or FBMcf_TAIL, which is in
    lockstep with SvTAIL() of 0 or non-0.

M       sv.h
M       util.c

commit e1dcbbca67be0410a61078d0abb9ff53fb4b3c0a
Author: Nicholas Clark <[email protected]>
Date:   Tue May 24 11:17:13 2011 +0100

    Use 0x40008000 in SvFLAGS() for SVpad_NAME, not 0x40000000
    
    This eliminates potential confusion between SVpad_NAME and SVpbm_VALID.

M       sv.h

commit 1979170bfeba87d234d4d34023c2ea5feeb22f48
Author: Nicholas Clark <[email protected]>
Date:   Tue May 24 10:59:01 2011 +0100

    Test that SvFLAGS() & SVpad_NAME is SVpad_NAME, not just non-zero.
    
    In Perl_find_rundefsv() and PAD_COMPNAME_FLAGS_isOUR(), replace longhand 
flags
    test with SvPAD_OUR().

M       dump.c
M       pad.c
M       pad.h

commit 4265b45d3e4359a48158b96f90aa017ade60632c
Author: Nicholas Clark <[email protected]>
Date:   Mon May 23 21:52:40 2011 +0100

    Don't fbm_compile() studied scalars, to give more flexibility in SV flag 
usage.
    
    No real-world code would ever end up using a studied scalar as a 
compile-time
    second argument to index, so this isn't a real pessimisation.

M       ext/Devel-Peek/t/Peek.t
M       util.c

commit bc9a525687b7ccee10de7bc6f6a729645b1058dd
Author: Nicholas Clark <[email protected]>
Date:   Mon May 23 21:36:06 2011 +0100

    Don't allow study on an FBM scalar, to give more flexibility in SV flag 
usage.
    
    No real-world code would ever end up studying an FBM scalar, so this isn't a
    real pessimisation.

M       ext/Devel-Peek/t/Peek.t
M       pp.c

commit ccbcbb3d6584c01ff4c18f0e39e86d6eb181eb4d
Author: Nicholas Clark <[email protected]>
Date:   Mon May 23 21:18:51 2011 +0100

    Test dumping studied scalars.

M       ext/Devel-Peek/t/Peek.t

commit 0a0c4b76b41cff6aaa3670fa19292a3e9248e69e
Author: Nicholas Clark <[email protected]>
Date:   Mon May 23 21:01:37 2011 +0100

    Perl_do_sv_dump() shouldn't show "IV" for a FBM, as it's not valid.
    
    The memory is used for part of the FBM state.
    
    Tidy the order of conditions in the if() determining whether the IV/UV 
should
    be shown.

M       dump.c
M       ext/Devel-Peek/t/Peek.t

commit a672f009fb8f223715e97dfcac7fb84e4bb2904b
Author: Nicholas Clark <[email protected]>
Date:   Mon May 23 18:14:45 2011 +0100

    In Perl_sv_2[inu]v_flags(), use the non-caching code whenever SvVALID() is 
true
    
    Previous the non-caching code was only used when SvVALID() was true on a 
PVGV.
    However, PVLVs can also perform all the roles of a PVGV, so could 
conceivably
    be acting as FBMs. As it's safe to test SvVALID() on any scalar SV, do so, 
as
    the compiler can combine the flag test for SvVALID() with that for
    SvGMAGICAL(), producing tighter object code.

M       sv.c
-----------------------------------------------------------------------

Summary of changes:
 dump.c                  |   28 +++++++------
 ext/B/B.xs              |   24 ++++++++++-
 ext/B/t/optree_misc.t   |    4 +-
 ext/Devel-Peek/t/Peek.t |  106 +++++++++++++++++++++++++++++++++++++++++++++++
 pad.c                   |    3 +-
 pad.h                   |    3 +-
 pod/perldelta.pod       |   15 +++++--
 pp.c                    |    7 ++-
 regexec.c               |   19 ++++----
 sv.c                    |    8 +--
 sv.h                    |   53 +++++++++++++----------
 t/op/index.t            |    8 +++-
 util.c                  |   82 ++++++++++++++++++++++++------------
 13 files changed, 268 insertions(+), 92 deletions(-)

diff --git a/dump.c b/dump.c
index 9624970..c3c07b2 100644
--- a/dump.c
+++ b/dump.c
@@ -1601,10 +1601,11 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
                   (int)(PL_dumpindent*level), "", (IV)SvREFCNT(sv),
                   (int)(PL_dumpindent*level), "");
 
-    if (!(flags & SVpad_NAME && (type == SVt_PVMG || type == SVt_PVNV))) {
+    if (!((flags & SVpad_NAME) == SVpad_NAME
+         && (type == SVt_PVMG || type == SVt_PVNV))) {
        if (flags & SVs_PADSTALE)       sv_catpv(d, "PADSTALE,");
     }
-    if (!(flags & SVpad_NAME && type == SVt_PVMG)) {
+    if (!((flags & SVpad_NAME) == SVpad_NAME && type == SVt_PVMG)) {
        if (flags & SVs_PADTMP) sv_catpv(d, "PADTMP,");
        if (flags & SVs_PADMY)  sv_catpv(d, "PADMY,");
     }
@@ -1646,8 +1647,6 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
                sv_catpv(d, " ),");
            }
        }
-       if (SvTAIL(sv))         sv_catpv(d, "TAIL,");
-       if (SvVALID(sv))        sv_catpv(d, "VALID,");
        /* FALL THROUGH */
     default:
     evaled_or_uv:
@@ -1655,6 +1654,8 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
        if (SvIsUV(sv) && !(flags & SVf_ROK))   sv_catpv(d, "IsUV,");
        break;
     case SVt_PVMG:
+       if (SvTAIL(sv))         sv_catpv(d, "TAIL,");
+       if (SvVALID(sv))        sv_catpv(d, "VALID,");
        if (SvPAD_TYPED(sv))    sv_catpv(d, "TYPED,");
        if (SvPAD_OUR(sv))      sv_catpv(d, "OUR,");
        /* FALL THROUGH */
@@ -1708,8 +1709,8 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
     /* Dump general SV fields */
 
     if ((type >= SVt_PVIV && type != SVt_PVAV && type != SVt_PVHV
-        && type != SVt_PVCV && !isGV_with_GP(sv) && type != SVt_PVFM
-        && type != SVt_PVIO && type != SVt_REGEXP)
+        && type != SVt_PVCV && type != SVt_PVFM && type != SVt_PVIO
+        && type != SVt_REGEXP && !isGV_with_GP(sv) && !SvVALID(sv))
        || (type == SVt_IV && !SvROK(sv))) {
        if (SvIsUV(sv)
 #ifdef PERL_OLD_COPY_ON_WRITE
@@ -1728,7 +1729,8 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
        PerlIO_putc(file, '\n');
     }
 
-    if ((type == SVt_PVNV || type == SVt_PVMG) && SvFLAGS(sv) & SVpad_NAME) {
+    if ((type == SVt_PVNV || type == SVt_PVMG)
+       && (SvFLAGS(sv) & SVpad_NAME) == SVpad_NAME) {
        Perl_dump_indent(aTHX_ level, file, "  COP_LOW = %"UVuf"\n",
                         (UV) COP_SEQ_RANGE_LOW(sv));
        Perl_dump_indent(aTHX_ level, file, "  COP_HIGH = %"UVuf"\n",
@@ -1796,6 +1798,12 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
        }
        if (SvSTASH(sv))
            do_hv_dump(level, file, "  STASH", SvSTASH(sv));
+
+       if ((type == SVt_PVMG || type == SVt_PVLV) && SvVALID(sv)) {
+           Perl_dump_indent(aTHX_ level, file, "  RARE = %u\n", 
(U8)BmRARE(sv));
+           Perl_dump_indent(aTHX_ level, file, "  PREVIOUS = %"UVuf"\n", 
(UV)BmPREVIOUS(sv));
+           Perl_dump_indent(aTHX_ level, file, "  USEFUL = %"IVdf"\n", 
(IV)BmUSEFUL(sv));
+       }
     }
 
     /* Dump type-specific SV fields */
@@ -2087,12 +2095,6 @@ Perl_do_sv_dump(pTHX_ I32 level, PerlIO *file, SV *sv, 
I32 nest, I32 maxnest, bo
                do_sv_dump(level+1, file, LvTARG(sv), nest+1, maxnest,
                    dumpops, pvlim);
        }
-       if (SvVALID(sv)) {
-           Perl_dump_indent(aTHX_ level, file, "  FLAGS = %u\n", 
(U8)BmFLAGS(sv));
-           Perl_dump_indent(aTHX_ level, file, "  RARE = %u\n", 
(U8)BmRARE(sv));
-           Perl_dump_indent(aTHX_ level, file, "  PREVIOUS = %"UVuf"\n", 
(UV)BmPREVIOUS(sv));
-           Perl_dump_indent(aTHX_ level, file, "  USEFUL = %"IVdf"\n", 
(IV)BmUSEFUL(sv));
-       }
        if (!isGV_with_GP(sv))
            break;
        Perl_dump_indent(aTHX_ level, file, "  NAME = \"%s\"\n", GvNAME(sv));
diff --git a/ext/B/B.xs b/ext/B/B.xs
index f4d5fea..44f8402 100644
--- a/ext/B/B.xs
+++ b/ext/B/B.xs
@@ -1370,8 +1370,13 @@ MODULE = B       PACKAGE = B::IV
 #define PVMG_stash_ix  sv_SVp | offsetof(struct xpvmg, xmg_stash)
 
 #if PERL_VERSION >= 10
+#  if PERL_VERSION > 14
+#    define PVBM_useful_ix     sv_I32p | offsetof(struct xpvgv, 
xnv_u.xbm_s.xbm_useful)
+#    define PVBM_previous_ix   sv_UVp | offsetof(struct xpvuv, xuv_uv)
+#  else
 #define PVBM_useful_ix sv_I32p | offsetof(struct xpvgv, xiv_u.xivu_i32)
 #define PVBM_previous_ix    sv_U32p | offsetof(struct xpvgv, 
xnv_u.xbm_s.xbm_previous)
+#  endif
 #define PVBM_rare_ix   sv_U8p | offsetof(struct xpvgv, xnv_u.xbm_s.xbm_rare)
 #else
 #define PVBM_useful_ix sv_I32p | offsetof(struct xpvbm, xbm_useful)
@@ -1614,10 +1619,19 @@ PV(sv)
        U32 utf8 = 0;
     CODE:
        if (ix == 3) {
+#ifndef PERL_FBM_TABLE_OFFSET
+           const MAGIC *const mg = mg_find(sv, PERL_MAGIC_bm);
+
+           if (!mg)
+                croak("argument to B::BM::TABLE is not a PVBM");
+           p = mg->mg_ptr;
+           len = mg->mg_len;
+#else
            p = SvPV(sv, len);
            /* Boyer-Moore table is just after string and its safety-margin \0 
*/
            p += len + PERL_FBM_TABLE_OFFSET;
            len = 256;
+#endif
        } else if (ix == 2) {
            /* This used to read 257. I think that that was buggy - should have
               been 258. (The "\0", the flags byte, and 256 for the table.)
@@ -1630,9 +1644,17 @@ PV(sv)
               first used by the compiler in 651aa52ea1faa806. It's used to
               get a "complete" dump of the buffer at SvPVX(), not just the
               PVBM table. This permits the generated bytecode to "load"
-              SvPVX in "one" hit.  */
+              SvPVX in "one" hit.
+
+              5.15 and later store the BM table via MAGIC, so the compiler
+              should handle this just fine without changes if PVBM now
+              always returns the SvPVX() buffer.  */
            p = SvPVX_const(sv);
+#ifdef PERL_FBM_TABLE_OFFSET
            len = SvCUR(sv) + (SvVALID(sv) ? 256 + PERL_FBM_TABLE_OFFSET : 0);
+#else
+           len = SvCUR(sv);
+#endif
        } else if (ix) {
            p = SvPVX(sv);
            len = strlen(p);
diff --git a/ext/B/t/optree_misc.t b/ext/B/t/optree_misc.t
index 4c3ea14..5e16b92 100644
--- a/ext/B/t/optree_misc.t
+++ b/ext/B/t/optree_misc.t
@@ -95,7 +95,7 @@ my $t = <<'EOT_EOT';
 # 5        <@> index[t2] sK/2 ->6
 # -           <0> ex-pushmark s ->3
 # 3           <$> const[PV "foo"] s ->4
-# 4           <$> const[GV "foo"] s ->5
+# 4           <$> const[PVMG "foo"] s ->5
 # -        <1> ex-rv2sv sKRM*/1 ->7
 # 6           <#> gvsv[*_] s ->7
 EOT_EOT
@@ -107,7 +107,7 @@ my $nt = <<'EONT_EONT';
 # 5        <@> index[t1] sK/2 ->6
 # -           <0> ex-pushmark s ->3
 # 3           <$> const(PV "foo") s ->4
-# 4           <$> const(GV "foo") s ->5
+# 4           <$> const(PVMG "foo") s ->5
 # -        <1> ex-rv2sv sKRM*/1 ->7
 # 6           <$> gvsv(*_) s ->7
 EONT_EONT
diff --git a/ext/Devel-Peek/t/Peek.t b/ext/Devel-Peek/t/Peek.t
index 8eedf53..7c6e985 100644
--- a/ext/Devel-Peek/t/Peek.t
+++ b/ext/Devel-Peek/t/Peek.t
@@ -778,4 +778,110 @@ SKIP: {
      or diag $@;
 }
 
+# This is more a test of fbm_compile/pp_study (non) interaction than dumping
+# prowess, but short of duplicating all the gubbins of this file, I can't see
+# a way to make a better place for it:
+
+use constant {
+    perl => 'rules',
+    beer => 'foamy',
+};
+
+unless ($Config{useithreads}) {
+    # These end up as copies in pads under ithreads, which rather defeats the
+    # the point of what we're trying to test here.
+
+    do_test('regular string constant', perl,
+'SV = PV\\($ADDR\\) at $ADDR
+  REFCNT = 5
+  FLAGS = \\(PADMY,POK,READONLY,pPOK\\)
+  PV = $ADDR "rules"\\\0
+  CUR = 5
+  LEN = \d+
+');
+
+    eval 'index "", perl';
+
+    # FIXME - really this shouldn't say EVALED. It's a false posistive on
+    # 0x40000000 being used for several things, not a flag for "I'm in a string
+    # eval"
+
+    do_test('string constant now an FBM', perl,
+'SV = PVMG\\($ADDR\\) at $ADDR
+  REFCNT = 5
+  FLAGS = \\(PADMY,SMG,POK,READONLY,pPOK,VALID,EVALED\\)
+  PV = $ADDR "rules"\\\0
+  CUR = 5
+  LEN = \d+
+  MAGIC = $ADDR
+    MG_VIRTUAL = &PL_vtbl_bm
+    MG_TYPE = PERL_MAGIC_bm\\(B\\)
+    MG_LEN = 256
+    MG_PTR = $ADDR "(?:\\\\\d){256}"
+  RARE = \d+
+  PREVIOUS = 1
+  USEFUL = 100
+');
+
+    is(study perl, '', "Not allowed to study an FBM");
+
+    do_test('string constant still an FBM', perl,
+'SV = PVMG\\($ADDR\\) at $ADDR
+  REFCNT = 5
+  FLAGS = \\(PADMY,SMG,POK,READONLY,pPOK,VALID,EVALED\\)
+  PV = $ADDR "rules"\\\0
+  CUR = 5
+  LEN = \d+
+  MAGIC = $ADDR
+    MG_VIRTUAL = &PL_vtbl_bm
+    MG_TYPE = PERL_MAGIC_bm\\(B\\)
+    MG_LEN = 256
+    MG_PTR = $ADDR "(?:\\\\\d){256}"
+  RARE = \d+
+  PREVIOUS = 1
+  USEFUL = 100
+');
+
+    do_test('regular string constant', beer,
+'SV = PV\\($ADDR\\) at $ADDR
+  REFCNT = 5
+  FLAGS = \\(PADMY,POK,READONLY,pPOK\\)
+  PV = $ADDR "foamy"\\\0
+  CUR = 5
+  LEN = \d+
+');
+
+    is(study beer, 1, "Our studies were successful");
+
+    do_test('string constant now studied', beer,
+'SV = PVMG\\($ADDR\\) at $ADDR
+  REFCNT = 6
+  FLAGS = \\(PADMY,SMG,POK,READONLY,pPOK,SCREAM\\)
+  IV = 0
+  NV = 0
+  PV = $ADDR "foamy"\\\0
+  CUR = 5
+  LEN = \d+
+  MAGIC = $ADDR
+    MG_VIRTUAL = &PL_vtbl_mglob
+    MG_TYPE = PERL_MAGIC_regex_global\\(g\\)
+');
+
+    is (eval 'index "not too foamy", beer', 8, 'correct index');
+
+    do_test('string constant still studied', beer,
+'SV = PVMG\\($ADDR\\) at $ADDR
+  REFCNT = 6
+  FLAGS = \\(PADMY,SMG,POK,READONLY,pPOK,SCREAM\\)
+  IV = 0
+  NV = 0
+  PV = $ADDR "foamy"\\\0
+  CUR = 5
+  LEN = \d+
+  MAGIC = $ADDR
+    MG_VIRTUAL = &PL_vtbl_mglob
+    MG_TYPE = PERL_MAGIC_regex_global\\(g\\)
+');
+}
+
 done_testing();
diff --git a/pad.c b/pad.c
index ff52eb8..b5ee2bf 100644
--- a/pad.c
+++ b/pad.c
@@ -795,8 +795,7 @@ Perl_find_rundefsv(pTHX)
     po = pad_findlex("$_", find_runcv(NULL), PL_curcop->cop_seq, 1,
            NULL, &namesv, &flags);
 
-    if (po == NOT_IN_PAD
-       || (SvFLAGS(namesv) & (SVpad_NAME|SVpad_OUR)) == (SVpad_NAME|SVpad_OUR))
+    if (po == NOT_IN_PAD || SvPAD_OUR(namesv))
        return DEFSV;
 
     return PAD_SVl(po);
diff --git a/pad.h b/pad.h
index dec267a..7e130d8 100644
--- a/pad.h
+++ b/pad.h
@@ -309,8 +309,7 @@ ling pad (lvalue) to C<gen>.  Note that C<SvUV_set> is 
hijacked for this purpose
 
 #define PAD_COMPNAME_SV(po) (*av_fetch(PL_comppad_name, (po), FALSE))
 #define PAD_COMPNAME_FLAGS(po) SvFLAGS(PAD_COMPNAME_SV(po))
-#define PAD_COMPNAME_FLAGS_isOUR(po) \
-  ((PAD_COMPNAME_FLAGS(po) & (SVpad_NAME|SVpad_OUR)) == (SVpad_NAME|SVpad_OUR))
+#define PAD_COMPNAME_FLAGS_isOUR(po) SvPAD_OUR(PAD_COMPNAME_SV(po))
 #define PAD_COMPNAME_PV(po) SvPV_nolen(PAD_COMPNAME_SV(po))
 
 #define PAD_COMPNAME_TYPE(po) pad_compname_type(po)
diff --git a/pod/perldelta.pod b/pod/perldelta.pod
index bea7ea4..7536cae 100644
--- a/pod/perldelta.pod
+++ b/pod/perldelta.pod
@@ -611,6 +611,13 @@ be noted as well.
 
 =item *
 
+When empting a hash of its elements (e.g. via undef(%h), or %h=()), HvARRAY
+field is no longer temporarily zeroed.  Any destructors called on the freed
+elements see the remaining elements. Thus, %h=() becomes more like C<delete
+$h{$_} for keys %h>.
+
+=item *
+
 The compiled representation of formats is now stored via the mg_ptr of
 their PERL_MAGIC_fm. Previously it was stored in the string buffer,
 beyond SvLEN(), the regular end of the string. SvCOMPILED() and
@@ -619,10 +626,10 @@ The first is always 0, the other two now no-ops.
 
 =item *
 
-When empting a hash of its elements (e.g. via undef(%h), or %h=()), HvARRAY
-field is no longer temporarily zeroed.  Any destructors called on the freed
-elements see the remaining elements. Thus, %h=() becomes more like C<delete
-$h{$_} for keys %h>.
+Boyer-Moore compiled scalars are now PVMGs, and the Boyer-Moore tables are now
+stored via the mg_ptr of their PERL_MAGIC_bm. Previously they were PVGVs, with
+the tables stored in the string buffer, beyond SvLEN(). This eliminates the
+last place where the core stores data beyond SvLEN().
 
 =back
 
diff --git a/pp.c b/pp.c
index 3673abd..385f1be 100644
--- a/pp.c
+++ b/pp.c
@@ -718,12 +718,15 @@ PP(pp_study)
            RETPUSHYES;
     }
     s = (unsigned char*)(SvPV(sv, len));
-    if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv)) {
+    if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv) || SvVALID(sv)) {
        /* No point in studying a zero length string, and not safe to study
           anything that doesn't appear to be a simple scalar (and hence might
           change between now and when the regexp engine runs without our set
           magic ever running) such as a reference to an object with overloaded
-          stringification.  */
+          stringification.  Also refuse to study an FBM scalar, as this gives
+          more flexibility in SV flag usage.  No real-world code would ever
+          end up studying an FBM scalar, so this isn't a real pessimisation.
+       */
        RETPUSHNO;
     }
     pos = len;
diff --git a/regexec.c b/regexec.c
index 6b32aa2..7587a7d 100644
--- a/regexec.c
+++ b/regexec.c
@@ -6887,16 +6887,16 @@ S_to_utf8_substr(pTHX_ register regexp *prog)
            prog->substrs->data[i].utf8_substr = sv;
            sv_utf8_upgrade(sv);
            if (SvVALID(prog->substrs->data[i].substr)) {
-               const U8 flags = BmFLAGS(prog->substrs->data[i].substr);
-               if (flags & FBMcf_TAIL) {
+               if (SvTAIL(prog->substrs->data[i].substr)) {
                    /* Trim the trailing \n that fbm_compile added last
                       time.  */
                    SvCUR_set(sv, SvCUR(sv) - 1);
                    /* Whilst this makes the SV technically "invalid" (as its
                       buffer is no longer followed by "\0") when fbm_compile()
                       adds the "\n" back, a "\0" is restored.  */
-               }
-               fbm_compile(sv, flags);
+                   fbm_compile(sv, FBMcf_TAIL);
+               } else
+                   fbm_compile(sv, 0);
            }
            if (prog->substrs->data[i].substr == prog->check_substr)
                prog->check_utf8 = sv;
@@ -6918,15 +6918,14 @@ S_to_byte_substr(pTHX_ register regexp *prog)
            SV* sv = newSVsv(prog->substrs->data[i].utf8_substr);
            if (sv_utf8_downgrade(sv, TRUE)) {
                if (SvVALID(prog->substrs->data[i].utf8_substr)) {
-                   const U8 flags
-                       = BmFLAGS(prog->substrs->data[i].utf8_substr);
-                   if (flags & FBMcf_TAIL) {
+                   if (SvTAIL(prog->substrs->data[i].utf8_substr)) {
                        /* Trim the trailing \n that fbm_compile added last
                           time.  */
                        SvCUR_set(sv, SvCUR(sv) - 1);
-                   }
-                   fbm_compile(sv, flags);
-               }           
+                       fbm_compile(sv, FBMcf_TAIL);
+                   } else
+                       fbm_compile(sv, 0);
+               }
            } else {
                SvREFCNT_dec(sv);
                sv = &PL_sv_undef;
diff --git a/sv.c b/sv.c
index d75bfbe..e0899ba 100644
--- a/sv.c
+++ b/sv.c
@@ -2267,7 +2267,7 @@ Perl_sv_2iv_flags(pTHX_ register SV *const sv, const I32 
flags)
     dVAR;
     if (!sv)
        return 0;
-    if (SvGMAGICAL(sv) || (SvTYPE(sv) == SVt_PVGV && SvVALID(sv))) {
+    if (SvGMAGICAL(sv) || SvVALID(sv)) {
        /* FBMs use the space for SvIVX and SvNVX for other purposes, and use
           the same flag bit as SVf_IVisUV, so must not let them cache IVs.
           In practice they are extremely unlikely to actually get anywhere
@@ -2356,7 +2356,7 @@ Perl_sv_2uv_flags(pTHX_ register SV *const sv, const I32 
flags)
     dVAR;
     if (!sv)
        return 0;
-    if (SvGMAGICAL(sv) || (SvTYPE(sv) == SVt_PVGV && SvVALID(sv))) {
+    if (SvGMAGICAL(sv) || SvVALID(sv)) {
        /* FBMs use the space for SvIVX and SvNVX for other purposes, and use
           the same flag bit as SVf_IVisUV, so must not let them cache IVs.  */
        if (flags & SV_GMAGIC)
@@ -2436,7 +2436,7 @@ Perl_sv_2nv_flags(pTHX_ register SV *const sv, const I32 
flags)
     dVAR;
     if (!sv)
        return 0.0;
-    if (SvGMAGICAL(sv) || (SvTYPE(sv) == SVt_PVGV && SvVALID(sv))) {
+    if (SvGMAGICAL(sv) || SvVALID(sv)) {
        /* FBMs use the space for SvIVX and SvNVX for other purposes, and use
           the same flag bit as SVf_IVisUV, so must not let them cache NVs.  */
        if (flags & SV_GMAGIC)
@@ -4088,8 +4088,6 @@ Perl_sv_setsv_flags(pTHX_ SV *dstr, register SV* sstr, 
const I32 flags)
        /* case SVt_BIND: */
     case SVt_PVLV:
     case SVt_PVGV:
-       /* SvVALID means that this PVGV is playing at being an FBM.  */
-
     case SVt_PVMG:
        if (SvGMAGICAL(sstr) && (flags & SV_GMAGIC)) {
            mg_get(sstr);
diff --git a/sv.h b/sv.h
index 71faf2b..5f58935 100644
--- a/sv.h
+++ b/sv.h
@@ -367,8 +367,21 @@ perform the upgrade if necessary.  See C<svtype>.
 
 /* PVHV */
 #define SVphv_SHAREKEYS 0x20000000  /* PVHV keys live on shared string table */
-/* PVNV, PVMG, presumably only inside pads */
-#define SVpad_NAME     0x40000000  /* This SV is a name in the PAD, so
+
+/* PVNV, PVMG only, and only used in pads. Should be safe to test on any scalar
+   SV, as the core is careful to avoid setting both.
+
+   SVf_POK, SVp_POK also set:
+   0x00004400   Normal
+   0x0000C400   Studied (SvSCREAM)
+   0x40004400   FBM compiled (SvVALID)
+   0x4000C400   pad name.
+
+   0x00008000   GV with GP
+   0x00008800   RV with PCS imported
+*/
+#define SVpad_NAME     (SVp_SCREAM|SVpbm_VALID)
+                                   /* This SV is a name in the PAD, so
                                       SVpad_TYPED, SVpad_OUR and SVpad_STATE
                                       apply */
 /* PVAV */
@@ -406,8 +419,7 @@ union _xnvu {
        U32 xhigh;
     }      xpad_cop_seq;       /* used by pad.c for cop_sequence */
     struct {
-       U32 xbm_previous;       /* how many characters in string before rare? */
-       U8  xbm_flags;
+       I32 xbm_useful;
        U8  xbm_rare;           /* rarest character in string */
     }      xbm_s;              /* fields from PVBM */
 };
@@ -415,7 +427,6 @@ union _xnvu {
 union _xivu {
     IV     xivu_iv;            /* integer value */
     UV     xivu_uv;
-    I32            xivu_i32;           /* BmUSEFUL */
     HEK *   xivu_namehek;      /* xpvlv, xpvgv: GvNAME */
 };
 
@@ -1276,42 +1287,38 @@ the scalar's value cannot change unless written to.
                 } STMT_END
 #endif
 
-#define PERL_FBM_TABLE_OFFSET 1        /* Number of bytes between EOS and 
table */
-
 /* SvPOKp not SvPOK in the assertion because the string can be tainted! eg
    perl -T -e '/$^X/'
 */
+
+#ifndef PERL_CORE
+#  define BmFLAGS(sv)          (SvTAIL(sv) ? FBMcf_TAIL : 0)
+#endif
+
 #if defined (DEBUGGING) && defined(__GNUC__) && 
!defined(PERL_GCC_BRACE_GROUPS_FORBIDDEN)
-#  define BmFLAGS(sv)                                                  \
-       (*({ SV *const _bmflags = MUTABLE_SV(sv);                       \
-               assert(SvTYPE(_bmflags) == SVt_PVGV);                   \
-               assert(SvVALID(_bmflags));                              \
-           &(((XPVGV*) SvANY(_bmflags))->xnv_u.xbm_s.xbm_flags);       \
-        }))
 #  define BmRARE(sv)                                                   \
        (*({ SV *const _bmrare = MUTABLE_SV(sv);                        \
-               assert(SvTYPE(_bmrare) == SVt_PVGV);                    \
+               assert(SvTYPE(_bmrare) == SVt_PVMG);                    \
                assert(SvVALID(_bmrare));                               \
-           &(((XPVGV*) SvANY(_bmrare))->xnv_u.xbm_s.xbm_rare);         \
+           &(((XPVMG*) SvANY(_bmrare))->xnv_u.xbm_s.xbm_rare);         \
         }))
 #  define BmUSEFUL(sv)                                                 \
        (*({ SV *const _bmuseful = MUTABLE_SV(sv);                      \
-           assert(SvTYPE(_bmuseful) == SVt_PVGV);                      \
+           assert(SvTYPE(_bmuseful) == SVt_PVMG);                      \
            assert(SvVALID(_bmuseful));                                 \
            assert(!SvIOK(_bmuseful));                                  \
-           &(((XPVGV*) SvANY(_bmuseful))->xiv_u.xivu_i32);             \
+           &(((XPVMG*) SvANY(_bmuseful))->xnv_u.xbm_s.xbm_useful);     \
         }))
 #  define BmPREVIOUS(sv)                                               \
     (*({ SV *const _bmprevious = MUTABLE_SV(sv);                       \
-               assert(SvTYPE(_bmprevious) == SVt_PVGV);                \
+               assert(SvTYPE(_bmprevious) == SVt_PVMG);                \
                assert(SvVALID(_bmprevious));                           \
-           &(((XPVGV*) SvANY(_bmprevious))->xnv_u.xbm_s.xbm_previous); \
+           &(((XPVMG*) SvANY(_bmprevious))->xiv_u.xivu_uv);            \
         }))
 #else
-#  define BmFLAGS(sv)          ((XPVGV*) SvANY(sv))->xnv_u.xbm_s.xbm_flags
-#  define BmRARE(sv)           ((XPVGV*) SvANY(sv))->xnv_u.xbm_s.xbm_rare
-#  define BmUSEFUL(sv)         ((XPVGV*) SvANY(sv))->xiv_u.xivu_i32
-#  define BmPREVIOUS(sv)       ((XPVGV*) SvANY(sv))->xnv_u.xbm_s.xbm_previous
+#  define BmRARE(sv)           ((XPVMG*) SvANY(sv))->xnv_u.xbm_s.xbm_rare
+#  define BmUSEFUL(sv)         ((XPVMG*) SvANY(sv))->xnv_u.xbm_s.xbm_useful
+#  define BmPREVIOUS(sv)       ((XPVMG*) SvANY(sv))->xiv_u.xivu_uv
 
 #endif
 
diff --git a/t/op/index.t b/t/op/index.t
index c8aafcf..b5b1005 100644
--- a/t/op/index.t
+++ b/t/op/index.t
@@ -7,7 +7,7 @@ BEGIN {
 }
 
 use strict;
-plan( tests => 120 );
+plan( tests => 122 );
 
 run_tests() unless caller;
 
@@ -224,4 +224,10 @@ formline PVBM2;
 is($^A, 'bang', "formline isn't confused by index compilation");
 is(index('bang', PVBM2), 0, "index isn't confused by format compilation");
 
+{
+    use constant perl => "rules";
+    is(index("perl rules", perl), 5, 'first index of a constant works');
+    is(index("rules 1 & 2", perl), 0, 'second index of the same constant 
works');
+}
+
 }
diff --git a/util.c b/util.c
index 1c90fb0..3428a25 100644
--- a/util.c
+++ b/util.c
@@ -545,13 +545,24 @@ Perl_fbm_compile(pTHX_ SV *sv, U32 flags)
 {
     dVAR;
     register const U8 *s;
-    register U32 i;
+    STRLEN i;
     STRLEN len;
-    U32 rarest = 0;
+    STRLEN rarest = 0;
     U32 frequency = 256;
+    MAGIC *mg;
 
     PERL_ARGS_ASSERT_FBM_COMPILE;
 
+    /* Refuse to fbm_compile a studied scalar, as this gives more flexibility 
in
+       SV flag usage.  No real-world code would ever end up using a studied
+       scalar as a compile-time second argument to index, so this isn't a real
+       pessimisation.  */
+    if (SvSCREAM(sv))
+       return;
+
+    if (SvVALID(sv))
+       return;
+
     if (flags & FBMcf_TAIL) {
        MAGIC * const mg = SvUTF8(sv) && SvMAGICAL(sv) ? mg_find(sv, 
PERL_MAGIC_utf8) : NULL;
        sv_catpvs(sv, "\n");            /* Taken into account in fbm_instr() */
@@ -561,31 +572,49 @@ Perl_fbm_compile(pTHX_ SV *sv, U32 flags)
     s = (U8*)SvPV_force_mutable(sv, len);
     if (len == 0)              /* TAIL might be on a zero-length string. */
        return;
-    SvUPGRADE(sv, SVt_PVGV);
+    SvUPGRADE(sv, SVt_PVMG);
     SvIOK_off(sv);
     SvNOK_off(sv);
     SvVALID_on(sv);
+
+    /* "deep magic", the comment used to add. The use of MAGIC itself isn't
+       really. MAGIC was originally added in 79072805bf63abe5 (perl 5.0 alpha 
2)
+       to call SvVALID_off() if the scalar was assigned to.
+
+       The comment itself (and "deeper magic" below) date back to
+       378cc40b38293ffc (perl 2.0). "deep magic" was an annotation on
+       str->str_pok |= 2;
+       where the magic (presumably) was that the scalar had a BM table hidden
+       inside itself.
+
+       As MAGIC is always present on BMs [in Perl 5 :-)], we can use it to 
store
+       the table instead of the previous (somewhat hacky) approach of co-opting
+       the string buffer and storing it after the string.  */
+
+    assert(!mg_find(sv, PERL_MAGIC_bm));
+    mg = sv_magicext(sv, NULL, PERL_MAGIC_bm, &PL_vtbl_bm, NULL, 0);
+    assert(mg);
+
     if (len > 2) {
-       const unsigned char *sb;
+       /* Shorter strings are special-cased in Perl_fbm_instr(), and don't use
+          the BM table.  */
        const U8 mlen = (len>255) ? 255 : (U8)len;
+       const unsigned char *const sb = s + len - mlen; /* first char (maybe) */
        register U8 *table;
 
-       Sv_Grow(sv, len + 256 + PERL_FBM_TABLE_OFFSET);
-       table
-           = (unsigned char*)(SvPVX_mutable(sv) + len + PERL_FBM_TABLE_OFFSET);
-       s = table - 1 - PERL_FBM_TABLE_OFFSET;  /* last char */
+       Newx(table, 256, U8);
        memset((void*)table, mlen, 256);
+       mg->mg_ptr = (char *)table;
+       mg->mg_len = 256;
+
+       s += len - 1; /* last char */
        i = 0;
-       sb = s - mlen + 1;                      /* first char (maybe) */
        while (s >= sb) {
            if (table[*s] == mlen)
                table[*s] = (U8)i;
            s--, i++;
        }
-    } else {
-       Sv_Grow(sv, len + PERL_FBM_TABLE_OFFSET);
     }
-    sv_magic(sv, NULL, PERL_MAGIC_bm, NULL, 0);        /* deep magic */
 
     s = (const unsigned char*)(SvPVX_const(sv));       /* deeper magic */
     for (i = 0; i < len; i++) {
@@ -594,14 +623,13 @@ Perl_fbm_compile(pTHX_ SV *sv, U32 flags)
            frequency = PL_freq[s[i]];
        }
     }
-    BmFLAGS(sv) = (U8)flags;
     BmRARE(sv) = s[rarest];
     BmPREVIOUS(sv) = rarest;
     BmUSEFUL(sv) = 100;                        /* Initial value */
     if (flags & FBMcf_TAIL)
        SvTAIL_on(sv);
-    DEBUG_r(PerlIO_printf(Perl_debug_log, "rarest char %c at %lu\n",
-                         BmRARE(sv),(unsigned long)BmPREVIOUS(sv)));
+    DEBUG_r(PerlIO_printf(Perl_debug_log, "rarest char %c at %"UVuf"\n",
+                         BmRARE(sv), BmPREVIOUS(sv)));
 }
 
 /* If SvTAIL(littlestr), it has a fake '\n' at end. */
@@ -641,9 +669,10 @@ Perl_fbm_instr(pTHX_ unsigned char *big, register unsigned 
char *bigend, SV *lit
        return NULL;
     }
 
-    if (littlelen <= 2) {              /* Special-cased */
-
-       if (littlelen == 1) {
+    switch (littlelen) { /* Special cases for 0, 1 and 2  */
+    case 0:
+       return (char*)big;              /* Cannot be SvTAIL! */
+    case 1:
            if (SvTAIL(littlestr) && !multiline) { /* Anchor only! */
                /* Know that bigend != big.  */
                if (bigend[-1] == '\n')
@@ -659,11 +688,7 @@ Perl_fbm_instr(pTHX_ unsigned char *big, register unsigned 
char *bigend, SV *lit
            if (SvTAIL(littlestr))
                return (char *) bigend;
            return NULL;
-       }
-       if (!littlelen)
-           return (char*)big;          /* Cannot be SvTAIL! */
-
-       /* littlelen is 2 */
+    case 2:
        if (SvTAIL(littlestr) && !multiline) {
            if (bigend[-1] == '\n' && bigend[-2] == *little)
                return (char*)bigend - 2;
@@ -723,7 +748,10 @@ Perl_fbm_instr(pTHX_ unsigned char *big, register unsigned 
char *bigend, SV *lit
        if (SvTAIL(littlestr) && (*bigend == *little))
            return (char *)bigend;      /* bigend is already decremented. */
        return NULL;
+    default:
+       break; /* Only lengths 0 1 and 2 have special-case code.  */
     }
+
     if (SvTAIL(littlestr) && !multiline) {     /* tail anchored? */
        s = bigend - littlelen;
        if (s >= big && bigend[-1] == '\n' && *s == *little
@@ -761,8 +789,8 @@ Perl_fbm_instr(pTHX_ unsigned char *big, register unsigned 
char *bigend, SV *lit
        return NULL;
 
     {
-       register const unsigned char * const table
-           = little + littlelen + PERL_FBM_TABLE_OFFSET;
+       const MAGIC *const mg = mg_find(littlestr, PERL_MAGIC_bm);
+       const unsigned char * const table = (const unsigned char *) mg->mg_ptr;
        register const unsigned char *oldlittle;
 
        --littlelen;                    /* Last char found by table lookup */
@@ -798,7 +826,7 @@ Perl_fbm_instr(pTHX_ unsigned char *big, register unsigned 
char *bigend, SV *lit
        }
       check_end:
        if ( s == bigend
-            && (BmFLAGS(littlestr) & FBMcf_TAIL)
+            && SvTAIL(littlestr)
             && memEQ((char *)(bigend - littlelen),
                      (char *)(oldlittle - littlelen), littlelen) )
            return (char*)bigend - littlelen;
@@ -836,7 +864,7 @@ Perl_screaminstr(pTHX_ SV *bigstr, SV *littlestr, I32 
start_shift, I32 end_shift
 
     PERL_ARGS_ASSERT_SCREAMINSTR;
 
-    assert(SvTYPE(littlestr) == SVt_PVGV);
+    assert(SvTYPE(littlestr) == SVt_PVMG);
     assert(SvVALID(littlestr));
 
     if (*old_posp == -1

--
Perl5 Master Repository

Reply via email to