Hi all,

I've noticed that Spark's xxhash64 output doesn't match other tools' output
because it uses seed=42 as a default. I've looked at a few libraries, and
they use 0 as the default seed:

- Python https://github.com/ifduyue/python-xxhash
- Java https://github.com/OpenHFT/Zero-Allocation-Hashing/
- Java (slice library, used by Trino)
https://github.com/airlift/slice/blob/master/src/main/java/io/airlift/slice/XxHash64.java

Was there a special motivation behind this? Or is 42 just used for the sake
of the Hitchhiker's Guide reference? It's very common for Spark to interact
with other tools (either via data or direct connection), and this just seems
like an unnecessary footgun.
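
For anyone who wants to see the effect, here is a minimal pure-Python sketch
of the public XXH64 algorithm with a configurable seed. It is written from
the published algorithm description, not taken from Spark's source, and the
helper names (xxh64, to_signed) are my own. I'm assuming Spark applies the
same algorithm to a string's UTF-8 bytes and returns the result as a signed
64-bit long; I haven't verified that byte-for-byte.

```python
MASK = (1 << 64) - 1
P1, P2 = 0x9E3779B185EBCA87, 0xC2B2AE3D27D4EB4F
P3, P4 = 0x165667B19E3779F9, 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def _round(acc, lane):
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK


def _merge(h, v):
    return ((h ^ _round(0, v)) * P1 + P4) & MASK


def _read(data, i, width):
    return int.from_bytes(data[i:i + width], "little")


def xxh64(data, seed=0):
    """Unsigned 64-bit XXH64 of `data` (bytes) for the given seed."""
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
             seed & MASK, (seed - P1) & MASK]
        while i + 32 <= n:
            for k in range(4):
                v[k] = _round(v[k], _read(data, i + 8 * k, 8))
            i += 32
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
        for k in range(4):
            h = _merge(h, v[k])
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    # Remaining input: 8-byte words, then one 4-byte word, then single bytes.
    while i + 8 <= n:
        h ^= _round(0, _read(data, i, 8))
        h = (_rotl(h, 27) * P1 + P4) & MASK
        i += 8
    if i + 4 <= n:
        h ^= (_read(data, i, 4) * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        i += 4
    while i < n:
        h ^= (data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    h ^= h >> 32
    return h


def to_signed(x):
    """Reinterpret an unsigned 64-bit value as a signed long (JVM-style)."""
    return x - (1 << 64) if x >= (1 << 63) else x


payload = "spark".encode("utf-8")
print("seed=0 :", to_signed(xxh64(payload, seed=0)))
print("seed=42:", to_signed(xxh64(payload, seed=42)))
```

The two printed values differ, so any tool that hashes with the seed-0
default will disagree with one that hashes with 42 unless the seed is passed
explicitly on one side. In practice you'd use python-xxhash rather than this
sketch; it takes the seed as an argument.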
