[perl.git] branch yves/superfasthash5, created. v5.17.3-288-g6b3e208

Yves Orton Sat, 08 Sep 2012 12:05:58 -0700

In perl.git, the branch yves/superfasthash5 has been created

<http://perl5.git.perl.org/perl.git/commitdiff/6b3e2080fb0e74ed46bca0c58d75d9dd9dd9d8a2?hp=0000000000000000000000000000000000000000>


        at  6b3e2080fb0e74ed46bca0c58d75d9dd9dd9d8a2 (commit)

- Log -----------------------------------------------------------------
commit 6b3e2080fb0e74ed46bca0c58d75d9dd9dd9d8a2
Author: Yves Orton <[email protected]>
Date:   Wed Aug 29 09:47:03 2012 +0200

    Fix hash ordering dependency in DBM_Filter/t/int32.t
    
    Under the filtering rules in place undef() and "" and 0 map to a
    packed representation of 0.
    
    In the StoreData call we pass in an anonymous perl (untied) hash
    containing an "undef" key (which is actually treated as "") with a
    value of undef(), along with a key 0 with a value of 1. This hash
    will store both values as distinct key/value pairs.
    
    When this hash is used to set up the *tied* %h1 hash both the "" key
    and the 0 key will be converted into the same packed value "\0\0\0\0",
    which means that whichever is last in the each() of the input hashref
    will be the one stored in %h1.
    
    This means the test breaks if we change the PL_hash_seed or the hash
    implementation in such a way that "" comes before 0 in the keys of
    the hash.
    
    This patch changes the input test hash to verify that undef() => 1 is
    treated the same as 0 => 1, and eliminates the potential key collision.
    The reason this test was reliable in the wild is that pretty well all
    perls use a 0 hash seed and the same hash function.
    
    This test probably would have broken in other environments as well.

M       lib/DBM_Filter/t/int32.t
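
To illustrate the collision described in the commit message above, here is a
minimal C sketch. It assumes the filter amounts to a numeric conversion
followed by a 4-byte native-order pack. The helpers to_int, pack_int32 and
keys_collide are hypothetical names for this illustration, not part of
DBM_Filter.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Perl's numeric conversion: NULL (playing
 * the role of undef) and "" both become 0, just as "0" does. */
static int32_t to_int(const char *key)
{
    if (key == NULL)
        return 0;
    return (int32_t)strtol(key, NULL, 10);
}

/* Hypothetical stand-in for the int32 filter: store the value as four
 * bytes in native byte order (an assumption about the filter). */
static void pack_int32(const char *key, unsigned char out[4])
{
    int32_t v = to_int(key);
    memcpy(out, &v, sizeof v);
}

/* Returns 1 when two distinct input keys collapse to one packed key. */
static int keys_collide(const char *a, const char *b)
{
    unsigned char pa[4], pb[4];
    pack_int32(a, pa);
    pack_int32(b, pb);
    return memcmp(pa, pb, sizeof pa) == 0;
}
```

Under these assumptions keys_collide("", "0") is true, so whichever of the
two keys is stored last overwrites the other in the tied hash.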

commit d28473e10b4a3244242f5d41640129cc79ef4b0c
Author: Yves Orton <[email protected]>
Date:   Wed Aug 29 09:38:07 2012 +0200

    improve diagnostics of dbm_filter_util.pl by using Data::Dumper::qquote
    
    We are testing things like packed strings. If we output the bytes raw
    via diag we upset terminal layers expecting utf8, and generally output
    unreadable garbage regardless. So use Data::Dumper::qquote() to
    preprocess diagnostics output.

M       lib/dbm_filter_util.pl

commit 7d8893a982731c38702178157a4ec3f3da032ada
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 11:35:01 2012 +0200

    fix silly typo

M       ext/re/t/re_funcs_u.t

commit eff6df589a4996849aad8f258b2ccc004b5edd65
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 11:07:18 2012 +0200

    fix off-by-one error in uninitialized warnings subscript finding logic
    
    When we use an uninitialized var in a hash we try to find the var, so
    we can show the subscript. An off by one error prevented us from seeing
    the items in the last bucket.
    
    We expect to see this:
    
       $ ./perl -Ilib -MDevel::Peek -wle'our %foo7=("foo"=>"bar","baz"=>undef); \
       $SIG{__WARN__}=sub { print STDERR @_; Dump(\%foo7); }; print sprintf "\n%s:%s %s:%s",%foo7;'
    
       Use of uninitialized value $foo7{"baz"} in sprintf at -e line 1.
    
    But when baz is stored in the last bucket (1/8th of the time) the
    lookup fails to find it:
    
       $ ./perl -Ilib -MDevel::Peek -wle'our %foo7=("foo"=>"bar","baz"=>undef); \
       $SIG{__WARN__}=sub { print STDERR @_; Dump(\%foo7); }; print sprintf "\n%s:%s %s:%s",%foo7;'
    
       Use of uninitialized value within %foo7 in sprintf at -e line 1.
    
    This patch fixes the off by one error. It does not correct any broken
    tests it might cause. I will do that in a follow up patch.

M       sv.c
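
The kind of off-by-one error described in the commit message above can be
sketched in C as follows. This is not the actual sv.c code: struct toy_hash,
find_key_buggy and find_key_fixed are hypothetical, each bucket holds at most
one key, and the direction of the scan is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Toy bucket array: each bucket holds at most one key (NULL = empty).
 * max_index is the highest valid index, so there are max_index + 1
 * buckets in total. */
struct toy_hash {
    const char **buckets;
    size_t max_index;
};

/* Buggy scan: the loop bound treats max_index as a count, so the last
 * bucket is never examined. */
static int find_key_buggy(const struct toy_hash *h, const char *key)
{
    for (size_t i = 0; i < h->max_index; i++)
        if (h->buckets[i] != NULL && strcmp(h->buckets[i], key) == 0)
            return 1;
    return 0;
}

/* Fixed scan: the bound includes bucket max_index. */
static int find_key_fixed(const struct toy_hash *h, const char *key)
{
    for (size_t i = 0; i <= h->max_index; i++)
        if (h->buckets[i] != NULL && strcmp(h->buckets[i], key) == 0)
            return 1;
    return 0;
}
```

With eight buckets and a key stored in the last one, the buggy scan reports
the key as missing while the fixed scan finds it, which matches the
"1/8th of the time" failure rate mentioned above.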

commit d157de647524a51f922f85bf562dd5a0e7518d09
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 10:21:26 2012 +0200

    enable "superfasthash" and add back in the len key distributor (which imo it needs)

M       hv.h

commit b2f0c84125797567ce52db0dc2f3edd874323b71
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 10:20:37 2012 +0200

    fixup for autodie

M       cpan/autodie/lib/Fatal.pm

commit d790a97b849ce42d89a84d252cd10d112a3fe33a
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 10:15:40 2012 +0200

    fix another very subtle hash ordering dependency
    
    Currently our hash implementation is order dependent on insertion.
    
    When two keys collide and have to be stored in the same bucket the
    order in which they are inserted into the hash will govern the order
    in which they are fetched out by things like keys() and values().
    
    This means that a copy of such a hash may order its keys differently.
    It is possible this can be fixed at low cost, but until then you
    cannot rely on two hashes with the same keys having the same ordering
    of those keys.
    
    Depending on the hash algorithm and the seed values used this test
    would fail. By changing it so there is one initial hash and then all
    tests are done on copies of that hash we avoid the problem.

M       t/op/smartkve.t
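
The insertion-order effect described in the commit message above can be
reproduced with a toy chained hash table in C. Everything here is
hypothetical and only for illustration: the deliberately weak toy_bucket
function, the fixed bucket count, and the assumption that new entries go at
the head of a bucket's chain.

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 4

struct entry {
    const char *key;
    struct entry *next;
};

struct toy_hash {
    struct entry *buckets[NBUCKETS];
};

/* Deliberately weak hash: keys sharing a first byte always collide. */
static size_t toy_bucket(const char *key)
{
    return (unsigned char)key[0] % NBUCKETS;
}

/* New entries are pushed onto the head of the bucket's chain. */
static void toy_insert(struct toy_hash *h, const char *key)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    size_t b = toy_bucket(key);
    e->key = key;
    e->next = h->buckets[b];
    h->buckets[b] = e;
}

/* Write keys in iteration order into out[]; returns how many were written. */
static size_t toy_keys(const struct toy_hash *h, const char **out, size_t cap)
{
    size_t n = 0;
    for (size_t b = 0; b < NBUCKETS; b++)
        for (const struct entry *e = h->buckets[b]; e != NULL && n < cap; e = e->next)
            out[n++] = e->key;
    return n;
}

/* Naive copy: walk the source in iteration order and insert each key. */
static void toy_copy(struct toy_hash *dst, const struct toy_hash *src)
{
    for (size_t b = 0; b < NBUCKETS; b++)
        for (const struct entry *e = src->buckets[b]; e != NULL; e = e->next)
            toy_insert(dst, e->key);
}

/* Release all entries and reset the table to empty. */
static void toy_free(struct toy_hash *h)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        struct entry *e = h->buckets[b];
        while (e != NULL) {
            struct entry *next = e->next;
            free(e);
            e = next;
        }
        h->buckets[b] = NULL;
    }
}
```

Inserting "ab" and then "ac" into one table and copying it with toy_copy
gives a copy whose colliding keys iterate in the opposite order from the
original. Two copies made from the same original agree with each other,
which is why running all the tests on copies of one initial hash avoids the
problem.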

commit 41f30bf4a786860423e90e711fedd2b6308811fe
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 09:24:08 2012 +0200

    fix a hash order dependency in autodie tests
    
    At the same time make part of the internals deterministic Just In Case.

M       cpan/autodie/lib/Fatal.pm
M       cpan/autodie/t/hints_pod_examples.t

commit df2ccefc5235cf2cdced729dc9b302a59194b23e
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 09:23:15 2012 +0200

    fix a hash order dependency in the tests

M       ext/re/t/re_funcs_u.t

commit 87c920b02606b311d3f52f76e6255abeda212481
Author: Yves Orton <[email protected]>
Date:   Tue Aug 28 00:15:55 2012 +0200

    check for unset hash use

M       hv.h

commit ff9290956a0539355d0d9f01db5a92b2ecfb65c3
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 22:39:58 2012 +0200

    catch if we do a hash lookup with a zero seed

M       hv.h

commit 15f7d0d669bff3755b295296f06a9453d7aff293
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 19:28:19 2012 +0200

    initialize the PL_hash_seed and the PL_rehash_seed as part of the startup process
    
    this breaks tests in weird and wonderful ways, and IMO it should NOT.

M       intrpvar.h
M       sv.c

commit e5ba8d436371fd66d90688f76aed8c9d45e7de35
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 11:21:07 2012 +0200

    why does setting the hash seed break tests when using One-At-A-Time?
    
    Especially "cpan/autodie/t/hints_pod_examples.t" - it makes no sense to me.
    
        >not ok 28 - scalar test - zero_scalar("")
        >#   Failed test 'scalar test - zero_scalar("")'
        >#   at cpan/autodie/t/hints_pod_examples.t line 168.
        >#          got: 'Can't zero_scalar(''):  at 
cpan/autodie/t/hints_pod_examples.t line 157
        ># '
        >#     expected: ''
        >#
        >#
        >#         my $scalar = zero_scalar("");
        >#         1;
        >#

M       intrpvar.h

commit 1c27caf181aa0f62cf9bb2cfe1aaab469f877282
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 11:10:22 2012 +0200

    in order to make One-At-A-Time pass tests the hash seed must be 0
    
    I need to figure out why. For now this documents things and restores
    the default to One-At-A-Time.
    
    I don't think it's right; changing the hash seed should not break anything.

M       hv.h
M       intrpvar.h

commit c10b817f73cd28871184cf93cbf9b23a6c117407
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 09:03:55 2012 +0200

    Switch to Paul Hsieh's "superfasthash" hashing algorithm.
    
    Details on the source of this code are available here:
    
      See http://www.azillionmonkeys.com/qed/hash.html
    
    This patch sets the default to Paul's algorithm. Before it is
    rebased to trunk, perhaps the default should be changed back to the
    old algorithm.

M       hv.h

commit 57ef3bed5908ca1b5c3cb5dc4c53f2cd4869a03e
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 09:01:52 2012 +0200

    PERL_DEBUG_HASH_SEED show both seeds.
    
    Before this patch we showed the generally unused PL_rehash_seed and
    did not show the PL_hash_seed which is what we actually use most of
    the time. Additionally when we showed it we called the rehash_seed
    the "seed", which is confusing.

M       perl.c

commit beff4875c3e5c99f5a0fc108d0b7e68145ad623c
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 09:01:00 2012 +0200

    make get_hash_seed() reusable by removing PL_rehash_seed logic from it
    
    This is in preparation for making perl use a different hash seed every
    invocation.

M       perl.c
M       util.c

commit db925bc607fd6ecbd0838df43ec067526629d651
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:58:59 2012 +0200

    use a random number as the hash initializer
    
    This is a value generated by get_hash_seed() which I have used as a
    better PL_hash_seed initializer than 0. A seed of 0 leads to a
    "zero sink" on strings of the form ("\0" x $number).

M       intrpvar.h
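
The "zero sink" mentioned in the commit message above can be demonstrated
with a small C sketch. It assumes the hash in question is Bob Jenkins'
one-at-a-time function (the One-At-A-Time algorithm named elsewhere in this
log) with the seed used as the initial state. The function name oaat_hash is
hypothetical and this is not the PERL_HASH macro itself.

```c
#include <stddef.h>
#include <stdint.h>

/* One-at-a-time hash with an explicit seed as the initial state. */
static uint32_t oaat_hash(const unsigned char *s, size_t len, uint32_t seed)
{
    uint32_t h = seed;
    for (size_t i = 0; i < len; i++) {
        h += s[i];
        h += h << 10;
        h ^= h >> 6;
    }
    /* Final avalanche steps. */
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

With a seed of 0, every step maps a zero state and a zero byte back to zero,
so all strings made up only of "\0" bytes hash to 0 whatever their length.
With a nonzero seed the state does not get stuck there. For example, with a
seed of 1 the one-byte and two-byte all-zero strings hash to different
values.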

commit 6aa90c10c344ca08c43343e166758384edc8bcdd
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:57:19 2012 +0200

    make cpan/CGI/.. robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       cpan/CGI/lib/CGI.pm
M       cpan/CGI/lib/CGI/Util.pm
M       cpan/CGI/t/function.t
M       cpan/CGI/t/html.t

commit 264b4d9dc79ce06d03b34634c25de74c0ce2c975
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:55:32 2012 +0200

    make cpan/Module-Pluggable/t/23depth.t robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       cpan/Module-Pluggable/t/23depth.t

commit 2c4be1d95fe825a6d82c6acb70275568d8dc981d
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:54:50 2012 +0200

    make test robust to weirdness (I needed this during debugging)

M       cpan/autodie/t/hints_pod_examples.t

commit 723968f1471c269fe03b2be7925f1eac44ca2fb4
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:53:54 2012 +0200

    make ext/B/t/b.t robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       ext/B/t/b.t

commit 8779cf3335ca173e876581ee380b9f0b10adc6ef
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:53:24 2012 +0200

    make ext/Hash-Util-FieldHash/t/10_hash.t robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       ext/Hash-Util-FieldHash/t/10_hash.t

commit 15cd114ba1e375814bdcf8254a4ddaa52b1b51c0
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:52:51 2012 +0200

    make warnings tests robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       t/lib/warnings/9uninit

commit 899402a83f526a887cf36681c93232cc75b3bc0c
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:51:39 2012 +0200

    make t/op/defins.t robust to changes in the key hashing algorithm
    
    This is in preparation for allowing people to build with different
    hash algorithms.

M       t/op/defins.t

commit 2ddedde7a9cfd8915e3204b69afc6b1177fb8c93
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:49:56 2012 +0200

    Make t/op/hash.t robust to changes in the key hashing algorithm we use
    
    This is in preparation for allowing builds with other hash algorithms.

M       t/op/hash.t

commit dfb09fad4e24da76f43dce84e3e675b9c6fc085a
Author: Yves Orton <[email protected]>
Date:   Mon Aug 27 08:47:06 2012 +0200

    add a way to find out what hash key Perl would use
    
    This will allow us to make our hash tests robust to changes
    in the hashing algorithm.

M       universal.c
-----------------------------------------------------------------------

--
Perl5 Master Repository
