[Spark SQL] xxhash64 default seed of 42 confusion

Igor Calabria Tue, 16 Apr 2024 08:45:20 -0700

Hi all,

I've noticed that Spark's xxhash64 output doesn't match other tools' output
because Spark uses seed=42 as the default. The libraries I've looked at use 0
as the default seed:


- python https://github.com/ifduyue/python-xxhash
- java https://github.com/OpenHFT/Zero-Allocation-Hashing/
- java (slice library, used by Trino)
  https://github.com/airlift/slice/blob/master/src/main/java/io/airlift/slice/XxHash64.java
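To make the mismatch concrete, here is a minimal from-scratch Python sketch of XXH64 following the public xxHash algorithm description. It is not Spark's code, and `xxh64` and `to_signed64` are hypothetical helper names for illustration. It assumes both sides hash the same raw bytes (e.g. the UTF-8 encoding of a string); whether that matches how Spark encodes a particular column type is an assumption worth checking. Spark's xxhash64 returns a signed BIGINT, while libraries such as python-xxhash report an unsigned integer, so the sketch also reinterprets the result as a signed 64-bit value.

```python
# From-scratch XXH64 sketch (public xxHash algorithm), NOT Spark's implementation.
# Assumption: both tools hash identical raw bytes; only the seed differs.
import struct

P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5
MASK = (1 << 64) - 1


def _rotl(x: int, r: int) -> int:
    # 64-bit rotate left.
    return ((x << r) | (x >> (64 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    # Mix one 8-byte lane into an accumulator.
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK


def _merge(acc: int, val: int) -> int:
    # Fold one of the four stripe accumulators into the running hash.
    acc ^= _round(0, val)
    return (acc * P1 + P4) & MASK


def xxh64(data: bytes, seed: int = 0) -> int:
    """Unsigned 64-bit XXH64 of `data`; the seed feeds the initial state."""
    n, p = len(data), 0
    if n >= 32:
        # The seed initialises four accumulators for 32-byte stripes.
        v1 = (seed + P1 + P2) & MASK
        v2 = (seed + P2) & MASK
        v3 = seed & MASK
        v4 = (seed - P1) & MASK
        while p + 32 <= n:
            a, b, c, d = struct.unpack_from("<4Q", data, p)
            v1, v2 = _round(v1, a), _round(v2, b)
            v3, v4 = _round(v3, c), _round(v4, d)
            p += 32
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
        for v in (v1, v2, v3, v4):
            h = _merge(h, v)
    else:
        # Short inputs: the seed enters the hash directly.
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    # Consume the remaining tail: 8-byte, then 4-byte, then single-byte chunks.
    while p + 8 <= n:
        (k,) = struct.unpack_from("<Q", data, p)
        h ^= _round(0, k)
        h = (_rotl(h, 27) * P1 + P4) & MASK
        p += 8
    if p + 4 <= n:
        (k,) = struct.unpack_from("<I", data, p)
        h ^= (k * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        p += 4
    while p < n:
        h ^= (data[p] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        p += 1
    # Final avalanche mixing.
    h ^= h >> 33
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    h ^= h >> 32
    return h


def to_signed64(h: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed Java/Spark long."""
    return h - (1 << 64) if h >= (1 << 63) else h


if __name__ == "__main__":
    data = "Spark".encode("utf-8")
    print("seed=0 :", to_signed64(xxh64(data, seed=0)))   # typical library default
    print("seed=42:", to_signed64(xxh64(data, seed=42)))  # Spark's default seed
```

Under these assumptions, reproducing Spark's default on the other side means passing seed 42 explicitly (python-xxhash accepts it, e.g. `xxhash.xxh64(data, seed=42)`) and comparing signed values rather than unsigned ones.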

Was there a specific motivation behind this, or is 42 just a reference to
The Hitchhiker's Guide to the Galaxy? Spark very often interacts with other
tools (either through shared data or a direct connection), and this seems
like an unnecessary footgun.
