On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund <and...@anarazel.de> wrote:
> Just to clarify: I don't think it's a problem to do so for integers and
> most other simple scalar types. There's plenty hash algorithms that are
> endianess independent, and the rest is just a bit of care.
Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?
https://github.com/markokr/pghashlib implements various hash functions
for PostgreSQL, and claims that, of those implemented, crc32, Jenkins,
lookup3be and lookup3le, md5, and siphash24 are endian-independent.
An interesting point here is that Jeff Davis asserted in the original
post on this thread that our existing hash_any() wasn't portable, but
our current hash_any seems to be the Jenkins algorithm -- so I'm
confused. Part of the problem seems to be that, according to
https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of
those. I don't know whether the one in pghashlib is the same one we use.
Kenneth Marshall suggested xxhash as an endian-independent algorithm
upthread. Code for that is available under a 2-clause BSD license.
PostgreSQL page checksums use an algorithm based on, but not exactly,
FNV-1a. See storage/checksum_impl.h. The comments there say this
algorithm was chosen with speed in mind. Our version is not
endian-independent because it folds in 4-byte integers rather than
1-byte integers, but plain old FNV-1a *is* endian-independent and
could be used.
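
To make the byte-at-a-time point concrete, here is a minimal sketch of
plain 32-bit FNV-1a. This is not the code in storage/checksum_impl.h;
the function names fnv1a_32 and fnv1a_32_uint32 are made up for
illustration. The constants are the standard 32-bit FNV offset basis and
prime. Because the loop consumes one byte at a time, the result depends
only on the byte sequence. For an integer key, the "bit of care" is to
feed the bytes in a fixed order rather than hashing the in-memory
representation, which the second function does with shifts.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1A_32_OFFSET_BASIS 2166136261u
#define FNV1A_32_PRIME        16777619u

/* Plain FNV-1a over a byte buffer: XOR in one byte, then multiply. */
static uint32_t
fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    hash = FNV1A_32_OFFSET_BASIS;

    for (size_t i = 0; i < len; i++)
    {
        hash ^= p[i];
        hash *= FNV1A_32_PRIME;
    }
    return hash;
}

/*
 * Hash a 32-bit integer value (hypothetical helper).  The bytes are
 * extracted with shifts, least significant first, so the result is the
 * same on big-endian and little-endian hosts.
 */
static uint32_t
fnv1a_32_uint32(uint32_t value)
{
    uint32_t    hash = FNV1A_32_OFFSET_BASIS;

    for (int shift = 0; shift < 32; shift += 8)
    {
        hash ^= (value >> shift) & 0xFFu;
        hash *= FNV1A_32_PRIME;
    }
    return hash;
}
```

Calling fnv1a_32() directly on the address of an int would still give
different answers on different architectures, since the bytes in memory
are ordered differently; fnv1a_32_uint32() avoids that at the cost of a
few shifts.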
We also have an implementation of CRC32C in core - see port/pg_crc32.h
and port/pg_crc32c_sb8.c. It's not clear to me whether this is
endian-independent or not, although there is stuff that depends on
WORDS_BIGENDIAN, so, uh, maybe?
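
One way to check would be to compare against a bit-at-a-time reference.
The sketch below is not taken from port/pg_crc32c_sb8.c; crc32c_bitwise
is a made-up name, and the code is the textbook reflected CRC-32C
(Castagnoli polynomial, reversed form 0x82F63B78, initial value and
final XOR of 0xFFFFFFFF). It only ever looks at single bytes, so its
output cannot depend on host byte order. If a faster implementation
returns the same values on both big-endian and little-endian machines,
that would answer the question; this sketch by itself does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32C reference; slow, but independent of byte order. */
static uint32_t
crc32c_bitwise(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
        {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t    mask = 0u - (crc & 1u);

            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The usual check value for CRC-32C is 0xE3069283 for the nine ASCII bytes
"123456789", which is a convenient first test on any platform.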
Some other possibly-interesting links: