In perl.git, the branch smoke-me/new_hashes has been created

<http://perl5.git.perl.org/perl.git/commitdiff/5c554842c13d3c8ff6bb90fec8e73c51963b91e8?hp=0000000000000000000000000000000000000000>

        at  5c554842c13d3c8ff6bb90fec8e73c51963b91e8 (commit)

- Log -----------------------------------------------------------------
commit 5c554842c13d3c8ff6bb90fec8e73c51963b91e8
Author: Yves Orton <[email protected]>
Date:   Thu Mar 23 11:57:36 2017 +0100

    improve and update hash algorithm configuration docs in INSTALL
    
    Updated to reflect new hash functions, along with some wordsmithing
    tweaks to make things read more smoothly (hopefully).

M       INSTALL

commit 3ad3c8b70fb4fc74e0b8ec8759a4bff77b55d8d5
Author: Yves Orton <[email protected]>
Date:   Thu Mar 23 11:54:43 2017 +0100

    get rid of USE_HASH_SEED_EXPLICIT
    
    I think this has been outright broken for a long time, and it
    doesnt make any sense to since mandatory randomization anyway,
    so simply remove it.

M       hv_func.h
M       perl.c
M       perl.h
M       util.c

commit 7f89a5572a9e9281879b8786f769d5af5abe8953
Author: Yves Orton <[email protected]>
Date:   Thu Mar 23 11:06:13 2017 +0100

    Move utility macros to their own file
    
    so that hv_func is left with only logic relating to selecting
    and configuring the hash function we use, not the macros our
    hash functions use.

M       MANIFEST
M       hv_func.h
A       hv_macro.h

commit 52d2e04f0bd69ba9f2a8aaca4a04dd3ed63ad7b0
Author: Yves Orton <[email protected]>
Date:   Wed Mar 22 16:40:28 2017 +0100

    Add new hashing and "hash with state" infrastructure
    
    This adds support for three new hash functions: StadtX, Zaphod32 and SBOX,
    and reworks some of our hash internals infrastructure to do so.
    
    SBOX is special in that it is designed to be used in conjuction with any
    other hash function for hashing short strings very efficiently and very
    securely. It features compile time options on how much memory and startup
    time are traded off to control the length of keys that SBOX hashes.
    
    This also adds support for caching the hash values of single byte characters
    which can be used in conjuction with any other hash, including SBOX, 
although
    SBOX itself is as fast as the lookup cache, so typically you wouldnt use 
both
    at the same time.
    
    This also *removes* support for Jenkins One-At-A-Time. It has served us
    well, but it's day is done.
    
    This patch adds three new files: zaphod32_hash.h, stadtx_hash.h,
    sbox32_hash.h

M       MANIFEST
M       embedvar.h
M       hv_func.h
M       perl.c
M       perlapi.h
M       perlvars.h
A       sbox32_hash.h
A       stadtx_hash.h
M       t/porting/globvar.t
A       zaphod32_hash.h

commit 57f6303226e1304da95613ca12511780f25cb456
Author: Yves Orton <[email protected]>
Date:   Wed Mar 22 15:59:31 2017 +0100

    Tweak our hash bucket splitting rules
    
    Prior to this patch we resized hashes when after inserting a key
    the load factor of the hash reached 1 (load factor= keys / buckets).
    
    This patch makes two subtle changes to this logic:
    
    1. We split only after inserting a key into an utilized bucket,
    2. and the maximum load factor exceeds 0.667
    
    The intent and effect of this change is to increase our hash tables
    efficiency. Reducing the maximum load factor 0.667 means that we should
    have much less keys in collision overall, at the cost of some unutilized
    space (2/3rds was chosen as it is easier to calculate than 0.7). On the
    other hand, only splitting after a collision means in theory that we execute
    the "final split" less often. Additionally, insertin a key into an unused
    bucket increases the efficiency of the hash, without changing the worst
    case.[1] In other words without increasing collisions we use the space
    in our hashes more efficiently.
    
    A side effect of this hash is that the size of a hash is more sensitive
    to key insert order. A set of keys with some collisions might be one
    size if those collisions were encountered early, or another if they were
    encountered later. Assuming random distribution of hash values about
    50% of hashes should be smaller than they would be without this rule.
    
    The two changes complement each other, as changing the maximum load
    factor decreases the chance of a collision, but changing to only split
    after a collision means that we won't waste as much of that space we
    might.
    
    [1] Since I personally didnt find this obvious at first here is my
    explanation:
    
    The old behavior was that we doubled the number of buckets when the
    number of keys in the hash matched that of buckets. So on inserting
    the Kth key into a K bucket hash, we would double the number of
    buckets.  Thus the worse case prior to this patch was a hash
    containing K-1 keys which all hash into a single  bucket, and the post
    split worst case behavior would be having K items in a single bucket
    of a hash with 2*K buckets total.
    
    The new behavior says that we double the size of the hash once inserting
    an item into an occupied bucket and after doing so we exceeed the maximum
    load factor (leave aside the change in maximum load factor in this patch).
    If we insert into an occupied bucket (including the worse case bucket) then
    we trigger a key split, and we have exactly the same cases as before.
    If we insert into an empty bucket then we now have a worst case of K-1 items
    in one bucket, and 1 item in another, in a hash with K buckets, thus the
    worst case has not changed.

M       ext/Hash-Util/t/Util.t
M       ext/Hash-Util/t/builtin.t
M       hv.c
M       t/op/coreamp.t
M       t/op/hash.t
M       t/op/sub_lval.t

commit 811d31fed1742985ef21f2b6146ba21bf3364786
Author: Yves Orton <[email protected]>
Date:   Wed Mar 15 15:03:42 2017 +0100

    use a specific define for 64 bit hashing

M       hv_func.h
-----------------------------------------------------------------------

--
Perl5 Master Repository

Reply via email to