Re: Bypassing cursors in postgres_fdw to enable parallel plans

Robert Haas Fri, 06 Mar 2026 12:51:25 -0800

On Mon, Jan 26, 2026 at 6:44 AM Rafia Sabih <[email protected]> wrote:
> Thanks Robert for your time and attention to this patch.
> Based on these review comments and an off list discussion about the design of 
> the patch, I have reworked the patch significantly.
> In this version, a tuplestore is added to the PgFdwScanState along with a 
> flag. Now, in case of a cursor switch, this tuplestore is filled with the 
> remaining tuples of the query. The corresponding flag is set to indicate that 
> the tuplestore is ready to be fetched. To remember the last query that was 
> executing, a pointer to its PgFdwScanState is maintained in the conn_state. 
> Note that we only need to remember just the very last query and once tuples 
> for it are fetched we can forget that pointer, because now its fsstate has 
> the remaining tuples in the tuplestore.
> So, this version of the patch looks simpler than before.
> A few test cases are also added in the attached patch to cover the different 
> scenarios in case of non-cursor mode.


+       DefineCustomBoolVariable("postgres_fdw.use_cursor",

Usually I would expect the GUC for a few feature to be defined in such
a way that a true values means to use the new feature and a false
value means not to do that. This is reversed. Maybe consider renaming
the GUC so that the sense is inverted.

+       /* To be used only in non-cursor mode */
+       Tuplestorestate *tuplestore;    /* Tuplestore to save the tuples of the
+
  * query for later fetc
h. */
+       TupleTableSlot *slot;           /* Slot to be used when
reading the tuple from
+                                                                * the
tuplestore */
+       bool            tuples_ready;   /* To indicate when tuplestore
is ready to be
+                                                                * read. */
+       int                     total_tuples;   /* total tuples in the
tuplestore. */

There's already a tuplestore_tuple_count() function, so it's not clear
why you need total_tuples. It's also not exactly clear what
tuples_ready means here: sure, it means that we're ready to read
tuples, but when does that happen? I think the /* To be used only in
non-cursor mode */ comment could use expansion, too. Maybe use this to
explain the idea that we're going to use a Tuplestore in non-cursor
mode when, and only when, there are overlapping scans.

+       fsstate->conn_state->last_query = NULL;

I think this should be called something like active_scan. last_query
makes it sound like the last query we ran, whether or not it is still
in progress. But the value that it points to is an PgFdwScanState, so
it's not a query, it's a scan, and we should only have this set to a
non-NULL value for as long as that scan is actually active (i.e. has
the exclusive use of the connection).

+ * This routine fetches all the tuples of a query and saves them in a
tuplestore.
+ * This is required when the result of a query is not completely
fetched but the control
+ * switches to a different query.
+ * A call to fetch_data is made from here, hence we need complete
ForeignScanState here.

This is a good explanation but the last sentence probably isn't
needed. Also, you don't mention that this only used in non-cursor mode
(or whatever we end up calling it).

+       if (conn->asyncStatus == PGASYNC_IDLE)
+       {
+               /* If the connection is not active then set up */
+               if (!PQsendQueryParams(conn, last_fsstate->query,
last_fsstate->numParams,
+                                                          NULL,
values, NULL, NULL, 0))
+                       pgfdw_report_error(NULL, conn, last_fsstate->query);
+
+               if (!PQsetChunkedRowsMode(conn, last_fsstate->fetch_size))
+                       pgfdw_report_error(NULL, conn, last_fsstate->query);
+       }

I think this is a problem for multiple reasons. One problem is that
the function header says that we use this "when the result of a query
is not completely fetched". But if the status where PGASYNC_IDLE, the
query has not even started yet. In that case, seems there is no reason
to call this function in the first place. We can just start the new
scan right away and wait to do anything about the other scan until
it's needed. Additionally, asyncStatus is not part of libpq's exposed
API. You're only able to access it here because you've included
libpq-int.h into postgres_fdw.h, but libpq-int.h is so-called because
it's "internal", so you should really be trying very hard not to use
it, and especially not to include it in another header file. I suggest
that there's probably just a bug here -- I don't think you should need
to check asyncStatus here in the first place.

+ if (pgfdw_use_cursor)
+ snprintf(sql, sizeof(sql), "CLOSE c%u",
+ fsstate->cursor_number);
+ else
+ close_cursor = true;

This is extremely confusing coding. If we're using a cursor, we write
some SQL into a buffer but don't actually do anything with the buffer
right away. But if we're not using a cursor, then we set a flag that
tells us to close a cursor ... which we're not using? I think the
logic updates in the function are not at all clear. You need to back
up and think more carefully about what the function is supposed to do
in each case. Look at how the function starts after your patch:

postgresReScanForeignScan(ForeignScanState *node)
{
    PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
    char        sql[64];
    PGresult   *res;
    bool        close_cursor = false;

    /* If we haven't created the cursor yet, nothing to do. */
    if (!fsstate->cursor_exists)
        return;

Well, if we're in non-cursor mode, that presumably means this function
always returns without doing anything, since there should not be a
cursor if we are not using with them. That can't be right.
Alternatively, maybe it means that the cursor_exists flag is now being
used to track something other than whether a cursor exists, which
wouldn't be right either. I don't know off the top of my head exactly
what this function should do in each case, but as the patch author,
it's your job to go through, figure that out, and make sure that it is
all correct and well-commented.

Similarly, you have a bunch of changes to a function called
create_cursor(), but why would a function that creates a cursor get
changed to handle a mode where we don't create cursors? Either the
function needs to be renamed, or there should be two separate
functions, or these changes don't do anything in the first place.

+ while (res != NULL)
+ res = pgfdw_get_result(fsstate->conn);

I don't see how this code can be right. We should know how many result
sets we expect and read exactly that number, not just read over and
over until there's nothing more. But even if we did want to read until
there's nothing more, we would need to free those result sets rather
than leaking them.

-
- /* Clean up */
- pfree(buf.data);

It's hard to understand why this would be a good idea.

  /*
  * We'll store the tuples in the batch_cxt.  First, flush the previous
  * batch.
  */
  fsstate->tuples = NULL;
- MemoryContextReset(fsstate->batch_cxt);
  oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);

It's also hard to understand why this would be a good idea. It seems
like it makes the comment false: we wouldn't be flushing the previous
batch any more. We'd just be forgetting that it existed and leaking
the memory.

+ /*
+ * In non-cursor mode, there are three options for the further
+ * processing: 1. If the tuplestore is already filled, retrieve the
+ * tuples from there. 2. Fetch the tuples till the end of the query
+ * and store them in tuplestore. 3. Perform a normal fetch and process
+ * the tuples.

I find this comment pretty confusing. First of all, it's not really
clear what performing a "normal fetch" means here. But also, this says
that in non-cursor mode there are three options, and that's followed
by an if-else block i.e. two options. Three isn't the same as two, so
something isn't right here.

The code that follows is quite involved. It's rather deeply nested in
places, so possibly some of it needs to be broken out into separate
functions. It contains another instance, of this, which has the same
problems I pointed out earlier:

+ /* There are no more tuples to fetch */
+ while (res != NULL)
+ res = pgfdw_get_result(conn);

If there are no more tuples to fetch, then why do we need to loop
fetching result sets until we get a NULL? And if we ever get any, we
leak them, which is also bad.

+ /*
+ * Reset cursor mode when in asynchronous mode as it is not supported in
+ * non-cursor mode
+ */
+ if (!pgfdw_use_cursor)
+ pgfdw_use_cursor = true;

It's not allowed to change a GUC variable from outside of the GUC
code. If you need to disable the optimization in certain cases, you
need a separate flag for that, which is initially set based on
pgfdw_use_cursor and then can be changed here or wherever is needed.
You can't change the GUC value itself. This would cause a variety of
problems, including the fact that the optimization would then stay
disabled for later queries in the session, but probably also some more
GUC-related stuff.

-
 SELECT current_database() AS current_database,
   current_setting('port') AS current_port
 \gset
-

Please avoid including unnecessary cosmetic changes in the patch file.

Overall, I think the test case changes are in much better shape than
they were in earlier patch versions. I suspect there might still be a
bit of room for improvement, but we can deal with that once some of
these other issues are sorted out.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Bypassing cursors in postgres_fdw to enable parallel plans

Reply via email to