On 23/08/2024 22:05, Robert Haas wrote:
On Fri, Aug 23, 2024 at 2:48 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
If we're going to do this, I'd prefer a solution that doesn't force
API changes onto the vast majority of index AMs that don't have a
problem here.
That's a fair concern.
Yeah, although I don't think it's too bad. There are not that many
out-of-tree index AM implementations to begin with, and we do change
things often enough that any interesting AM implementation will likely
need a few #ifdef PG_VERSION blocks for each PostgreSQL major version
anyway. pgvector certainly does.
One way could be to formalize the hack we were just discussing:
"To refuse a proposed path, amcostestimate can set the path's
disabled_nodes value to anything larger than 1". I suspect that
that would actually be sufficient, since the path would then lose
to the seqscan path in add_path even if that were disabled; but
we could put in a hack to prevent it from getting add_path'd at all.
Another way could be to bless what hnsw is already doing:
"To refuse a proposed path, amcostestimate can return an
indexTotalCost of DBL_MAX" (or maybe insisting on +Inf would
be better). That would still require changes comparable to
what you specify above, but only in the core-code call path
not in every AM.
If just setting disabled_nodes to a value larger than one works, I'd
be inclined to not do anything here at all, except possibly document
that you can do that. Otherwise, we should probably change the code
somehow.
Modifying the passed-in Path feels hacky. amcostestimate currently
returns all the estimates in *output parameters, it doesn't modify the
Path at all.
I find both of your proposed solutions above to be pretty inelegant,
and I think if this problem occurred with a core AM, I'd push for an
API break rather than accept the ugliness. "This path is not valid
because the AM cannot support it", "this path is crazy expensive", and
"the user told us not to do it this way" are three different things,
and signalling two or more of them in the same way muddies the water
in a way that I don't like. API breaks aren't free, though, so I
certainly understand why you're not very keen to introduce one where
it can reasonably be avoided.
The +Inf approach seems fine to me. Or perhaps NaN. Your proposal would
certainly be the cleanest interface if we don't mind incurring churn to
AM implementations.
--
Heikki Linnakangas
Neon (https://neon.tech)