Here is a new version of the patch that I feel is in decent shape.

On Mon, Mar 25, 2024 at 10:16:47AM -0500, Nathan Bossart wrote:
> On Mon, Mar 25, 2024 at 11:08:39AM -0400, Tom Lane wrote:
>> * The magic constants (crossover list length and bloom filter size)
>> need some testing to see if there are better values.  They should
>> probably be made into named #defines, too.  I suspect, with little
>> proof, that the bloom filter size isn't particularly critical --- but
>> I know we pulled the crossover of 1000 out of thin air, and I have
>> no certainty that it's even within an order of magnitude of being a
>> good choice.
> 
> I'll try to construct a couple of tests to see if we can determine a proper
> order of magnitude.

I spent some time trying to get some ballpark figures but have thus far
been unsuccessful.  Even if I was able to get good numbers, I'm not sure
how much they'd help us, as we'll still need to decide how much overhead we
are willing to take in comparison to the linear search.  I don't think
~1000 is an unreasonable starting point, as it seems generally more likely
that you will have many more roles to process at that point than if the
threshold was, say, 100.  And if the threshold is too high (e.g., 10,000),
this optimization will only kick in for the most extreme cases, so we'd
likely be leaving a lot on the table.  But, I will be the first to admit
that my reasoning here is pretty unscientific, and I'm open to suggestions
for how to make it less so.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 95fe6e811990895acb9f79544f502408ac472203 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nat...@postgresql.org>
Date: Mon, 25 Mar 2024 23:17:08 -0500
Subject: [PATCH v3 1/1] Optimize roles_is_member_of() with a Bloom filter.

When the list of roles gathered by roles_is_member_of() grows very
large, a Bloom filter is created to help avoid some linear searches
through the list.  The threshold for creating the Bloom filter is
set arbitrarily high and may require future adjustment.

Suggested-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com
---
 src/backend/utils/adt/acl.c | 71 +++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index cf5d08576a..7c4b642ebd 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -36,6 +36,7 @@
 #include "common/hashfn.h"
 #include "foreign/foreign.h"
 #include "funcapi.h"
+#include "lib/bloomfilter.h"
 #include "lib/qunique.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
@@ -78,6 +79,17 @@ static Oid	cached_role[] = {InvalidOid, InvalidOid, InvalidOid};
 static List *cached_roles[] = {NIL, NIL, NIL};
 static uint32 cached_db_hash;
 
+/*
+ * If the list of roles gathered by roles_is_member_of() grows larger than the
+ * below threshold, a Bloom filter is created to speed up list membership
+ * checks.  This threshold is set arbitrarily high to avoid the overhead of
+ * creating the Bloom filter until it seems likely to provide a net benefit.
+ *
+ * XXX: The current threshold of 1024 is little more than a wild guess and may
+ * need to be adjusted in the future.
+ */
+#define ROLES_LIST_BLOOM_THRESHOLD 1024
+static bloom_filter *roles_list_bf = NULL;
 
 static const char *getid(const char *s, char *n, Node *escontext);
 static void putid(char *p, const char *s);
@@ -4918,6 +4930,54 @@ RoleMembershipCacheCallback(Datum arg, int cacheid, uint32 hashvalue)
 	cached_role[ROLERECURSE_SETROLE] = InvalidOid;
 }
 
+/*
+ * A helper function for roles_is_member_of() that provides an optimized
+ * implementation of list_append_unique_oid() via a Bloom filter.  The caller
+ * (i.e., roles_is_member_of()) is responsible for freeing roles_list_bf once
+ * it is done using this function.
+ */
+static inline List *
+roles_list_append(List *roles_list, Oid role)
+{
+	unsigned char *roleptr = (unsigned char *) &role;
+
+	/*
+	 * If there is a previously-created Bloom filter, use it to determine
+	 * whether the role is missing from the list.  Otherwise, do an ordinary
+	 * linear search through the existing role list.
+	 */
+	if ((roles_list_bf &&
+		 bloom_lacks_element(roles_list_bf, roleptr, sizeof(Oid))) ||
+		!list_member_oid(roles_list, role))
+	{
+		/*
+		 * If the list is large, we take on the overhead of creating and
+		 * populating a Bloom filter to speed up future calls to this
+		 * function.
+		 */
+		if (!roles_list_bf &&
+			list_length(roles_list) > ROLES_LIST_BLOOM_THRESHOLD)
+		{
+			roles_list_bf = bloom_create(ROLES_LIST_BLOOM_THRESHOLD * 10,
+										 work_mem, 0);
+			foreach_oid(roleid, roles_list)
+				bloom_add_element(roles_list_bf,
+								  (unsigned char *) &roleid,
+								  sizeof(Oid));
+		}
+
+		/*
+		 * Finally, add the role to the list and the Bloom filter, if it
+		 * exists.
+		 */
+		roles_list = lappend_oid(roles_list, role);
+		if (roles_list_bf)
+			bloom_add_element(roles_list_bf, roleptr, sizeof(Oid));
+	}
+
+	return roles_list;
+}
+
 /*
  * Get a list of roles that the specified roleid is a member of
  *
@@ -5023,16 +5083,21 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 			 * graph, we must test for having already seen this role. It is
 			 * legal for instance to have both A->B and A->C->B.
 			 */
-			roles_list = list_append_unique_oid(roles_list, otherid);
+			roles_list = roles_list_append(roles_list, otherid);
 		}
 		ReleaseSysCacheList(memlist);
 
 		/* implement pg_database_owner implicit membership */
 		if (memberid == dba && OidIsValid(dba))
-			roles_list = list_append_unique_oid(roles_list,
-												ROLE_PG_DATABASE_OWNER);
+			roles_list = roles_list_append(roles_list, ROLE_PG_DATABASE_OWNER);
 	}
 
+	/*
+	 * Free the Bloom filter created by roles_list_append(), if there is one.
+	 */
+	if (roles_list_bf)
+		bloom_free(roles_list_bf);
+
 	/*
 	 * Copy the completed list into TopMemoryContext so it will persist.
 	 */
-- 
2.25.1

Reply via email to