On Wed, Mar 18, 2026 at 11:21 AM Sami Imseih <[email protected]> wrote:
> Passing this as a flag seems OK with me. Not sure what other
> flags could be added in the future, but the flexibility is a good
> thing, IMO. I was debating if this should just be a dsh_params
> option, but maybe for the same dshash a caller may want OOM
> in some code path, and retry in some other. maybe?
Yeah, I think this is a property of the individual call to
dshash_find_or_insert(), rather than a property of the table. It does
seem likely that most callers will want the same behavior in all
cases, but it would be very sad if we embedded that idea into the
structure of the code and then somebody has a case where it's not what
they want, and I see no real reason why that couldn't happen. Also,
doing it that way wouldn't really follow existing precedents
(pg_malloc_extended, palloc_extended, MemoryContextAllocExtended,
dsa_allocate_extended).
> I did some testing by triggering an OOM, a no-OOM, and an OOM with
> eviction afterwards, as a sanity check. All looks good.
> I've attached the tests I created with other basic testing, as dshash
> lacks direct testing in general, if you're inclined to add them.
I tried running a coverage report for this. It's pretty good. It
doesn't cover these things:
- dshash_create: OOM while allocating the bucket array.
- dshash_find_or_insert_extended: OOM while inserting an item
(apparently, at least here, it instead hits OOM while resizing)
- dshash_delete_key: the case where the entry is not found
- dshash_dump: not called at all
- resize: the case where we exit early due to a concurrent resize
- delete_item_from_bucket: the case where there is more than one item
in the bucket
I think that adding a call to dshash_dump() somewhere would probably
make sense, and I'd suggest also trying to delete a nonexistent key.
The rest don't seem worth worrying about.
On the code itself:
+ /* Verify all entries via find. */
+ for (int i = 0; i < count; i++)
+ {
+ entry = dshash_find(ht, &i, false);
+ if (entry == NULL)
+ elog(ERROR, "key %d not found", i);
+ dshash_release_lock(ht, entry);
+ }
You could verify that entry->key == i. The current code wouldn't
notice if the hash table returned a garbage pointer. You could
possibly also include some kind of a payload in each record and verify
that, e.g. set entry->value =
some_homomorphism_over_0_to_9(entry->key).
+ dsa_set_size_limit(area, dsa_minimum_size());
This is an abuse of the documented purpose of dsa_set_size_limit().
Seems better to just pick a constant here.
+ /* Insert until OOM — without NO_OOM flag, this should raise ERROR. */
+ for (;;)
I think it would be safer to code this as a fixed iteration count. For
example, if you choose the size limit so that we should run out of
memory after 10 entries, you could terminate this loop after 1000
iterations. That way, if something goes wrong, we're more likely to
see "expected out-of-memory, but completed all %d iterations" in the
log rather than a core dump or whatever.
+ {
+ TestDshashEntry *entry;
+
+ entry = dshash_find_or_insert(ht, &key, &found);
+ dshash_release_lock(ht, entry);
+ key++;
+ }
In other places, you check for entry == NULL, but not here.
+ {
+ TestDshashEntry *entry;
+
+ entry = dshash_find_or_insert_extended(ht, &key, &found,
+ DSHASH_INSERT_NO_OOM);
+ if (entry == NULL)
+ elog(ERROR, "insert after eviction failed");
+ dshash_release_lock(ht, entry);
+ }
I just got in trouble for letting a bare block sneak into my code, so
now it's my turn to complain.
I suggest setting user.email with git config in whatever checkout you
use to generate patches like this, just to make life a bit easier for
anyone who might eventually be committing one of them.
--
Robert Haas
EDB: http://www.enterprisedb.com