On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants.aa...@cybertec.at> wrote:
> But given that we know the data length and we have it in a register
> already, it's easy enough to just mask out data past the end with a
> shift. See patch 1. The performance benefit is about 1.5x, measured
> on a small test harness that just hashes and finalizes an array of
> strings, with a data dependency between consecutive hashes (the next
> address depends on the previous hash output).

Interesting work! I've taken this idea and improved it (I'm guessing,
since I haven't tested yet) by reusing an intermediate step for the
conditional, simplifying the creation of the mask, and moving the
bitscan out of the longest dependency chain. Since you didn't attach
the test harness, would you like to run this and see how it fares?
(v16-0001 is the same as your 0001, and v16-0002 builds on it.) I plan
to test on my end as well, but since your test tries to model true
latency, I'm more interested in results from that one.
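
For reference, here's the rough shape I have in mind for such a
harness (a guess on my part, since yours wasn't attached; the pool
size, iteration count, and function name are placeholders, and the
init/final calls may need adjusting to the current signatures):

#include "postgres.h"
#include "common/hashfn_unstable.h"

#define POOLSIZE 1024			/* power of two, so we can mask */
#define NITER	10000000

/* Chain hashes so each load address depends on the previous result. */
static uint64
hash_latency_chain(char *const strings[])
{
	uint64		h = 0;

	for (int i = 0; i < NITER; i++)
	{
		const char *s = strings[h & (POOLSIZE - 1)];
		fasthash_state hs;
		int			len;

		fasthash_init(&hs, h);	/* feed previous result back as seed */
		len = fasthash_accum_cstring(&hs, s);
		h = fasthash_final64(&hs, len);
	}
	return h;
}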

> Not sure if the second one is worth the extra code.

I'd say it's not worth optimizing the case we think won't be taken
anyway. I also like having a simple path to assert against.
From be62303df785b80b1bf888b869c5b240a6777af0 Mon Sep 17 00:00:00 2001
From: Ants Aasma <a...@cybertec.at>
Date: Mon, 29 Jan 2024 15:16:02 +0200
Subject: [PATCH v16 1/2] Speed up last iteration of aligned fasthash

We know the length of the string, so we can mask out the end of
the string with a shift. Without this, the aligned version was
slower than the unaligned one on small strings.
---
 src/include/common/hashfn_unstable.h | 31 ++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h
index 3d927e1fb1..8ee1b99a20 100644
--- a/src/include/common/hashfn_unstable.h
+++ b/src/include/common/hashfn_unstable.h
@@ -176,6 +176,19 @@ fasthash_accum(fasthash_state *hs, const char *k, int len)
 #define haszero64(v) \
 	(((v) - 0x0101010101010101) & ~(v) & 0x8080808080808080)
 
+/*
+ * Returns non-zero when first byte in memory order is not NUL
+ */
+static inline int
+first_byte_nonzero(uint64 v)
+{
+#ifdef WORDS_BIGENDIAN
+	return v >> 56;
+#else
+	return v & 0xFF;
+#endif
+}
+
 /*
  * all-purpose workhorse for fasthash_accum_cstring
  */
@@ -211,11 +224,12 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	const char *const start = str;
 	int			remainder;
 	uint64		zero_bytes_le;
+	uint64		chunk;
 
 	Assert(PointerIsAligned(start, uint64));
 	for (;;)
 	{
-		uint64		chunk = *(uint64 *) str;
+		chunk = *(uint64 *) str;
 
 		/*
 		 * With little-endian representation, we can use this calculation,
@@ -243,9 +257,18 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	 * byte within the input word by counting the number of trailing (because
 	 * little-endian) zeros and dividing the result by 8.
 	 */
-	remainder = pg_rightmost_one_pos64(zero_bytes_le) / BITS_PER_BYTE;
-	fasthash_accum(hs, str, remainder);
-	str += remainder;
+	if (first_byte_nonzero(chunk))
+	{
+		remainder = pg_rightmost_one_pos64(zero_bytes_le) / BITS_PER_BYTE;
+#ifdef WORDS_BIGENDIAN
+		hs->accum = chunk & ((~0ULL) << (64 - BITS_PER_BYTE*remainder));
+#else
+		hs->accum = chunk & ((~0ULL) >> (64 - BITS_PER_BYTE*remainder));
+#endif
+		fasthash_combine(hs);
+
+		str += remainder;
+	}
 
 	return str - start;
 }
-- 
2.43.0
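
To make 0001's final step concrete (a worked example of my own,
little-endian): hashing "abc", the last aligned load has 0x61 0x62
0x63 0x00 in its low four bytes. haszero64() flags the NUL by setting
bit 31 of zero_bytes_le, so pg_rightmost_one_pos64() returns 31 and
remainder = 31 / 8 = 3. The mask is then

    (~0ULL) >> (64 - BITS_PER_BYTE * 3) = 0x0000000000FFFFFF

which keeps exactly the three data bytes and zeroes the NUL and
anything after it before combining into hs->accum.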

From a1e1648f3f3a25001c62fffe7dcd422273619e3e Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Tue, 30 Jan 2024 16:14:57 +0700
Subject: [PATCH v16 2/2] Shorten dependency chain for computing hash mask

---
 src/include/common/hashfn_unstable.h | 30 ++++++++++++----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/src/include/common/hashfn_unstable.h b/src/include/common/hashfn_unstable.h
index 8ee1b99a20..0cac3aa380 100644
--- a/src/include/common/hashfn_unstable.h
+++ b/src/include/common/hashfn_unstable.h
@@ -176,19 +176,6 @@ fasthash_accum(fasthash_state *hs, const char *k, int len)
 #define haszero64(v) \
 	(((v) - 0x0101010101010101) & ~(v) & 0x8080808080808080)
 
-/*
- * Returns non-zero when first byte in memory order is not NUL
- */
-static inline int
-first_byte_nonzero(uint64 v)
-{
-#ifdef WORDS_BIGENDIAN
-	return v >> 56;
-#else
-	return v & 0xFF;
-#endif
-}
-
 /*
  * all-purpose workhorse for fasthash_accum_cstring
  */
@@ -225,6 +212,7 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	int			remainder;
 	uint64		zero_bytes_le;
 	uint64		chunk;
+	uint64		mask;
 
 	Assert(PointerIsAligned(start, uint64));
 	for (;;)
@@ -257,14 +245,22 @@ fasthash_accum_cstring_aligned(fasthash_state *hs, const char *str)
 	 * byte within the input word by counting the number of trailing (because
 	 * little-endian) zeros and dividing the result by 8.
 	 */
-	if (first_byte_nonzero(chunk))
+	/*
+	 * Create a mask for the remaining bytes and combine them into the
+	 * hash. It would be harmless if the mask also covered the NUL
+	 * terminator, except for the case where the NUL is the first byte
+	 * in the last input word. In that case we need to return, so we
+	 * check for that as we form the mask for the bytes we need.
+	 */
+	mask = zero_bytes_le >> BITS_PER_BYTE;
+	if (mask)
 	{
 		remainder = pg_rightmost_one_pos64(zero_bytes_le) / BITS_PER_BYTE;
+		mask |= mask - 1;
 #ifdef WORDS_BIGENDIAN
-		hs->accum = chunk & ((~0ULL) << (64 - BITS_PER_BYTE*remainder));
-#else
-		hs->accum = chunk & ((~0ULL) >> (64 - BITS_PER_BYTE*remainder));
+		mask = pg_bswap64(mask);
 #endif
+		hs->accum = chunk & mask;
 		fasthash_combine(hs);
 
 		str += remainder;
-- 
2.43.0
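
Continuing the "abc" example for 0002: zero_bytes_le has bit 31 set
for the NUL, so

    mask = zero_bytes_le >> BITS_PER_BYTE;   /* bit 31 -> bit 23 */
    mask |= mask - 1;                        /* 0x0000000000FFFFFF */

builds the same mask as before from just a shift, a decrement, and an
OR. The point of the rearrangement is that the mask is now derived
directly from zero_bytes_le, which we already have, while the bitscan
only feeds the final pointer advance, keeping it off the dependency
chain into hs->accum. And when the NUL is the first byte of the last
word, its marker bit shifts out entirely, so the zero test on the
shifted value replaces the old first_byte_nonzero() check.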
