On Thu, Jun 1, 2017 at 2:25 PM, Andres Freund <and...@anarazel.de> wrote:
> Just to clarify: I don't think it's a problem to do so for integers and
> most other simple scalar types. There are plenty of hash algorithms that
> are endianness-independent, and the rest is just a bit of care.

Do you have any feeling for which of those endianness-independent hash
functions might be a reasonable choice for us?

https://github.com/markokr/pghashlib implements various hash functions
for PostgreSQL, and claims that, of those implemented, crc32, Jenkins,
lookup3be and lookup3le, md5, and siphash24 are endian-independent.

An interesting point here is that Jeff Davis asserted in the original
post on this thread that our existing hash_any() wasn't portable, but
our current hash_any seems to be the Jenkins algorithm -- so I'm
confused.  Part of the problem seems to be that, according to
https://en.wikipedia.org/wiki/Jenkins_hash_function there are 4 of
those.  I don't know whether the one in pghashlib is the same one
we've implemented.
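
To make the "which Jenkins?" question a bit more concrete, here is a
sketch of the simplest member of that family, the one-at-a-time hash as
described on the Wikipedia page above.  I am not claiming this is what
hash_any() or pghashlib implements; the function name is made up for
illustration.  It only ever reads single bytes, so the host byte order
cannot affect the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Jenkins one-at-a-time hash (illustrative name, not a PostgreSQL API).
 * Input is consumed strictly one byte at a time, in memory order.
 */
static uint32_t
jenkins_one_at_a_time(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = 0;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        h += p[i];
        h += h << 10;
        h ^= h >> 6;
    }

    /* Final avalanche steps. */
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

A variant that instead loads 4-byte words from the input would need
explicit byte-order handling to give the same answer everywhere.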

Kenneth Marshall suggested xxhash as an endian-independent algorithm
upthread.  Code for that is available under a 2-clause BSD license.

PostgreSQL page checksums use an algorithm based on, but not exactly,
FNV-1a.  See storage/checksum_impl.h.  The comments there say this
algorithm was chosen with speed in mind.  Our version is not
endian-independent because it folds in 4-byte integers rather than
1-byte integers, but plain old FNV-1a *is* endian-independent and
could be used.
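
For illustration, a minimal sketch of plain 32-bit FNV-1a follows.  This
is not the code in storage/checksum_impl.h; the function names are
hypothetical, and the second helper assumes we would pick a fixed
(here little-endian) byte image for integers before hashing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters. */
#define FNV32_OFFSET_BASIS 2166136261u
#define FNV32_PRIME        16777619u

/*
 * Plain FNV-1a: XOR in one byte, then multiply by the prime.  Only
 * individual bytes are read, so host byte order never enters the result.
 */
static uint32_t
fnv1a_32(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    h = FNV32_OFFSET_BASIS;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= FNV32_PRIME;
    }
    return h;
}

/*
 * Hypothetical helper: hash a 32-bit integer through an explicit
 * little-endian byte image, so big- and little-endian hosts agree.
 */
static uint32_t
fnv1a_32_uint32(uint32_t v)
{
    unsigned char buf[4];

    buf[0] = (unsigned char) (v & 0xff);
    buf[1] = (unsigned char) ((v >> 8) & 0xff);
    buf[2] = (unsigned char) ((v >> 16) & 0xff);
    buf[3] = (unsigned char) ((v >> 24) & 0xff);
    return fnv1a_32(buf, sizeof(buf));
}
```

Folding in the raw in-memory 4-byte value instead of buf[] is the step
that would make the result depend on the host byte order.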

We also have an implementation of CRC32C in core - see port/pg_crc32c.h
and port/pg_crc32c_sb8.c.  It's not clear to me whether this is
endian-independent or not, although there is stuff that depends on
WORDS_BIGENDIAN, so, uh, maybe?
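
One way to settle that would be to compare against a bit-at-a-time
reference on both kinds of hosts.  The sketch below is such a reference,
not the slicing-by-8 code in core; the function name is made up.  It
uses the usual CRC-32C (Castagnoli) parameters: reflected polynomial
0x82F63B78, initial value and final XOR of 0xFFFFFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the Castagnoli polynomial. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Bit-at-a-time CRC-32C (illustrative name).  Slow, but it touches the
 * input one byte at a time, so the result is the same on any host.
 */
static uint32_t
crc32c_bitwise(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t    crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
        {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t    mask = 0u - (crc & 1u);

            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check input "123456789" should come out as 0xE3069283; if
the core implementation matches that on both a little-endian and a
big-endian machine, it is endian-independent in the sense we care about.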

Some other possibly-interesting links:


Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
