Re: General purpose hashing func in pgbench

Ildar Musin Tue, 06 Mar 2018 05:48:38 -0800

Hello Teodor,

Thank you for reviewing this patch.


On 06.03.2018 15:53, Teodor Sigaev wrote:

Patch applies, compiles, pgbench & global "make check" ok, doc
built ok.


Agree.

If I understand upthread correctly, implementation of Murmur hash
algorithm based on Austin Appleby work
https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp

If so, I have notice and objections:

1) Seems, it's good idea to add credits to Austin Appleby to
comments.


Sounds fair, I'll send an updated version soon.

2) Reference implementaion directly says (link above): // 2. It will
not produce the same results on little-endian and big-endian //
machines.

I don't think that is good thing for testing and benchmarking for
several reasons: it could produce different data collection,
different selects, different distribution.

3) Again, from comments of reference implementation: // Note - This
code makes a few assumptions about how your machine behaves - // 1.
We can read a 4-byte value from any address without crashing

It's not true for all supported platforms. Any box with strict
aligment will SIGBUSed here.


I think that both points refer to the fact that original algorithm
accepts a byte string as an input, slices it up by 8 bytes and form
unsigned int values from it. In that case the order of bytes in the
input string does matter since it may result in different integers on
different architectures. And it is also fair requirement for a byte
string to be aligned as some architectures cannot handle unaligned data.
In this patch though I slightly modified the original algorithm in a way
that it takes unsigned ints as an input (not byte string), so neither of
this points should be a problem as it seems to me. But I'll double check
it on big-endian machine with strict alignment (Sparc).

Thanks!

--
Ildar Musin
i.mu...@postgrespro.ru

Re: General purpose hashing func in pgbench

Reply via email to