On Tue, 16 Jan 2024 at 16:32, David Rowley <dgrowle...@gmail.com> wrote:
>
> While working on [1], I noticed some strange code in
> DiscreteKnapsack() which seems to be aiming to copy the Bitmapset.
>
> It's not that obvious at a casual glance, but:
>
> sets[j] = bms_del_members(sets[j], sets[j]);
>
> this is aiming to zero all the words in the set by passing the same
> set in both parameters.
>
> Now that 00b41463c changed Bitmapset to have NULL be the only valid
> representation of an empty set, this code no longer makes sense.  We
> may as well just bms_free() the original set and bms_copy() in the new
> set as the bms_del_members() call will always pfree the set anyway.
>
> I've done that in the attached.
>
> I did consider if we might want bms_merge_members() or
> bms_exchange_members() or some other function suitably named function
> to perform a del/add operation, but given the lack of complaints about
> any performance regressions here, I think it's not worthwhile.

After looking at this again and reading more code, I see that
DiscreteKnapsack() goes to some efforts to minimise memory
allocations.

The functions's header comment mentions "The bitmapsets are all
pre-initialized with an unused high bit so that memory allocation is
done only once.".

I tried adding some debugging output to track how many additional
allocations we're now causing as a result of 00b41463c.  Previously
there'd have just been max_weight allocations, but now there's those
plus the number that's mentioned for "frees" below.

NOTICE:  DiscreteKnapsack: frees = 373, max_weight = 140, extra = 266.43%
NOTICE:  DiscreteKnapsack: frees = 373, max_weight = 140, extra = 266.43%
NOTICE:  DiscreteKnapsack: frees = 267, max_weight = 100, extra = 267.00%
NOTICE:  DiscreteKnapsack: frees = 267, max_weight = 100, extra = 267.00%
NOTICE:  DiscreteKnapsack: frees = 200, max_weight = 140, extra = 142.86%
NOTICE:  DiscreteKnapsack: frees = 200, max_weight = 140, extra = 142.86%
NOTICE:  DiscreteKnapsack: frees = 30, max_weight = 40, extra = 75.00%
NOTICE:  DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE:  DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE:  DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%
NOTICE:  DiscreteKnapsack: frees = 110, max_weight = 60, extra = 183.33%

and by the looks of the code, the worst case is much worse.

Given that the code original code was written in a very deliberate way
to avoid reallocations, I now think that we should maintain that.

I've attached a patch which adds bms_replace_members(). It's basically
like bms_copy() but attempts to reuse the member from another set. I
considered if the new function should be called bms_copy_inplace(),
but left it as bms_replace_members() for now.

Now I wonder if this should be backpatched to PG16.

David
diff --git a/src/backend/lib/knapsack.c b/src/backend/lib/knapsack.c
index 13d800718f..439da1ad70 100644
--- a/src/backend/lib/knapsack.c
+++ b/src/backend/lib/knapsack.c
@@ -89,10 +89,7 @@ DiscreteKnapsack(int max_weight, int num_items,
                        {
                                /* copy sets[ow] to sets[j] without realloc */
                                if (j != ow)
-                               {
-                                       sets[j] = bms_del_members(sets[j], 
sets[j]);
-                                       sets[j] = bms_add_members(sets[j], 
sets[ow]);
-                               }
+                                       sets[j] = bms_replace_members(sets[j], 
sets[ow]);
 
                                sets[j] = bms_add_member(sets[j], i);
 
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index b0f908f978..8be61cd081 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -976,6 +976,50 @@ bms_add_members(Bitmapset *a, const Bitmapset *b)
        return result;
 }
 
+/*
+ * bms_replace_members
+ *             Remove all existing members from 'a' and repopulate the set 
with members
+ *             from 'b', recycling 'a', when possible.
+ */
+Bitmapset *
+bms_replace_members(Bitmapset *a, const Bitmapset *b)
+{
+       int i;
+
+       Assert(bms_is_valid_set(a));
+       Assert(bms_is_valid_set(b));
+
+       if (a == NULL)
+               return bms_copy(b);
+       if (b == NULL)
+       {
+               pfree(a);
+               return NULL;
+       }
+
+       if (a->nwords < b->nwords)
+               a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(b->nwords));
+
+       i = 0;
+       do
+       {
+               a->words[i] = b->words[i];
+       } while (++i < b->nwords);
+
+       a->nwords = b->nwords;
+
+#ifdef REALLOCATE_BITMAPSETS
+
+       /*
+        * There's no guarantee that the repalloc returned a new pointer, so 
copy
+        * and free unconditionally here.
+        */
+       a = bms_copy_and_free(a);
+#endif
+
+       return a;
+}
+
 /*
  * bms_add_range
  *             Add members in the range of 'lower' to 'upper' to the set.
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index efc8890ce6..906e8dcc15 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -109,6 +109,7 @@ extern BMS_Membership bms_membership(const Bitmapset *a);
 extern Bitmapset *bms_add_member(Bitmapset *a, int x);
 extern Bitmapset *bms_del_member(Bitmapset *a, int x);
 extern Bitmapset *bms_add_members(Bitmapset *a, const Bitmapset *b);
+extern Bitmapset *bms_replace_members(Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_add_range(Bitmapset *a, int lower, int upper);
 extern Bitmapset *bms_int_members(Bitmapset *a, const Bitmapset *b);
 extern Bitmapset *bms_del_members(Bitmapset *a, const Bitmapset *b);

Reply via email to