On Tue, Aug 11, 2020 at 3:38 PM Ian Denhardt <[email protected]> wrote:
> I think
> it's valuable for this function to be one of the options for
> compatibility

I agree there, but I'm becoming increasingly convinced that it should not be the default, or even recommended.

I've added a new level of analysis to my benchmark function as of the latest commit <https://github.com/bobg/hashsplit/commit/17195adda444fcc11e96cfd6058613edd88af5be>: in addition to counting how often each bit is zero (which should approach 50%), it counts how often each pair of bits is correlated (which should also approach 50%).

The results for rollsum are not great. On the other hand, the results for the other algorithms in github.com/chmduquesne/rollinghash, which are now added to the benchmark, are great (except for adler32). Try running with and without the env var BENCHMARK_ROLLSUM_ANALYZE=1 to see the results.

By the way, there's probably more sophisticated analysis that could be done on the distribution produced by these hashes, but I suspect we're into diminishing returns after the pairwise bit correlations I'm now doing. I could be wrong, though. Whether I am, and how else the results should be analyzed, are left as an exercise for other readers of this thread.

Cheers,
- Bob

> and (2) I don't want to give too many options; a
> different hash function is much more extra implementation work than a
> numeric parameter, and if we add too many we've sort of missed the
> point of standardization. So I'd only want to do this if there are clear
> compelling use cases for each of the functions we include.
>
> Whatever parameters we decide to add, we should pick a
> default/recommended set of values for them.

--
You received this message because you are subscribed to the Google Groups "Perkeep" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/perkeep/CAEf8c48L50if%2BO0_3Jg8%2BC1hjZf5dpOiO9PuqknJsVNARZaDog%40mail.gmail.com.
