On Tue, Aug 11, 2020 at 3:38 PM Ian Denhardt <[email protected]> wrote:

> I think
> it's valuable for this function to be one of the options for
> compatibility


I agree there, but I'm becoming increasingly convinced that it should not
be the default, or even recommended.

I've added a new level of analysis to my benchmark function as of the
latest commit
<https://github.com/bobg/hashsplit/commit/17195adda444fcc11e96cfd6058613edd88af5be>:
in addition to counting how often each bit is zero (which should approach
50%), it counts how often each pair of bits is correlated (which should
also approach 50%). The results for rollsum are not great. On the other
hand, the results for the other algorithms in
github.com/chmduquesne/rollinghash, which are now added to the benchmark,
are great (except for adler32). Try running with and without the env
var BENCHMARK_ROLLSUM_ANALYZE=1 to see the results.
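
A minimal sketch of this kind of analysis follows. It is not the code from
the commit above. The name `analyze` is hypothetical, `math/rand` stands in
for a real rolling hash, and "correlated" is assumed here to mean "the two
bits hold the same value". For a well-distributed hash, both fractions
should sit near 0.5.

```go
package main

import (
	"fmt"
	"math/rand"
)

const width = 32 // number of bits in each hash value

// analyze reports, for a sample of 32-bit hash values, the fraction of
// samples in which each bit is zero, and the fraction in which each pair
// of bits (i, j) with i < j holds the same value.
// Assumption: "correlated" is read as "the two bits agree".
func analyze(values []uint32) (zeroFrac [width]float64, agreeFrac [width][width]float64) {
	if len(values) == 0 {
		return
	}
	var zeros [width]int
	var agree [width][width]int
	for _, v := range values {
		for i := 0; i < width; i++ {
			bi := (v >> uint(i)) & 1
			if bi == 0 {
				zeros[i]++
			}
			for j := i + 1; j < width; j++ {
				if bi == (v>>uint(j))&1 {
					agree[i][j]++
				}
			}
		}
	}
	n := float64(len(values))
	for i := 0; i < width; i++ {
		zeroFrac[i] = float64(zeros[i]) / n
		for j := i + 1; j < width; j++ {
			agreeFrac[i][j] = float64(agree[i][j]) / n
		}
	}
	return
}

func main() {
	// Stand-in input: pseudo-random values instead of real rolling-hash output.
	rng := rand.New(rand.NewSource(1))
	values := make([]uint32, 100000)
	for i := range values {
		values[i] = rng.Uint32()
	}
	zeroFrac, agreeFrac := analyze(values)
	fmt.Printf("bit 0 is zero in %.3f of samples\n", zeroFrac[0])
	fmt.Printf("bits 0 and 1 agree in %.3f of samples\n", agreeFrac[0][1])
}
```

To examine a real hash, one would replace the `math/rand` values with the
digest sampled after each byte is rolled into the window.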

By the way, there's probably more sophisticated analysis that could be done
on the distribution produced by these hashes, but I suspect we're into
diminishing returns after the pairwise bit correlations I'm now doing. I
could be wrong though. Whether I am, and how else the results should be
analyzed, are left as an exercise for other readers of this thread.

Cheers,
- Bob

> and (2) I don't want to give too many options; a
> different hash function is much more extra implementation work than a
> numeric parameter, and if we add too many we've sort of missed the
> point of standardization. So I'd only want to do this if there are clear
> compelling use cases for each of the functions we include.
>
> Whatever parameters we decide to add, we should pick a
> default/recommended set of values for them.
>
