Re: Bypassing cursors in postgres_fdw to enable parallel plans

Rafia Sabih Fri, 10 Apr 2026 00:42:22 -0700

Rebased version of the patch is attached.
It is a minor change from the earlier version.


On Thu, 2 Apr 2026 at 20:00, Rafia Sabih <[email protected]> wrote:

>
>
> On Fri, 6 Mar 2026 at 21:51, Robert Haas <[email protected]> wrote:
>
>> On Mon, Jan 26, 2026 at 6:44 AM Rafia Sabih <[email protected]>
>> wrote:
>> > Thanks Robert for your time and attention to this patch.
>> > Based on these review comments and an off list discussion about the
>> design of the patch, I have reworked the patch significantly.
>> > In this version, a tuplestore is added to the PgFdwScanState along with
>> a flag. Now, in case of a cursor switch, this tuplestore is filled with the
>> remaining tuples of the query. The corresponding flag is set to indicate
>> that the tuplestore is ready to be fetched. To remember the last query that
>> was executing, a pointer to its PgFdwScanState is maintained in the
>> conn_state. Note that we only need to remember just the very last query and
>> once tuples for it are fetched we can forget that pointer, because now its
>> fsstate has the remaining tuples in the tuplestore.
>> > So, this version of the patch looks simpler than before.
>> > A few test cases are also added in the attached patch to cover the
>> different scenarios in case of non-cursor mode.
>>
>> +       DefineCustomBoolVariable("postgres_fdw.use_cursor",
>>
>> Usually I would expect the GUC for a few feature to be defined in such
>> a way that a true values means to use the new feature and a false
>> value means not to do that. This is reversed. Maybe consider renaming
>> the GUC so that the sense is inverted.
>>
>> +       /* To be used only in non-cursor mode */
>> +       Tuplestorestate *tuplestore;    /* Tuplestore to save the tuples
>> of the
>> +
>>   * query for later fetc
>> h. */
>> +       TupleTableSlot *slot;           /* Slot to be used when
>> reading the tuple from
>> +                                                                * the
>> tuplestore */
>> +       bool            tuples_ready;   /* To indicate when tuplestore
>> is ready to be
>> +                                                                * read.
>> */
>> +       int                     total_tuples;   /* total tuples in the
>> tuplestore. */
>>
>> There's already a tuplestore_tuple_count() function, so it's not clear
>> why you need total_tuples. It's also not exactly clear what
>> tuples_ready means here: sure, it means that we're ready to read
>> tuples, but when does that happen? I think the /* To be used only in
>> non-cursor mode */ comment could use expansion, too. Maybe use this to
>> explain the idea that we're going to use a Tuplestore in non-cursor
>> mode when, and only when, there are overlapping scans.
>>
>> +       fsstate->conn_state->last_query = NULL;
>>
>> I think this should be called something like active_scan. last_query
>> makes it sound like the last query we ran, whether or not it is still
>> in progress. But the value that it points to is an PgFdwScanState, so
>> it's not a query, it's a scan, and we should only have this set to a
>> non-NULL value for as long as that scan is actually active (i.e. has
>> the exclusive use of the connection).
>>
>> + * This routine fetches all the tuples of a query and saves them in a
>> tuplestore.
>> + * This is required when the result of a query is not completely
>> fetched but the control
>> + * switches to a different query.
>> + * A call to fetch_data is made from here, hence we need complete
>> ForeignScanState here.
>>
>> This is a good explanation but the last sentence probably isn't
>> needed. Also, you don't mention that this only used in non-cursor mode
>> (or whatever we end up calling it).
>>
>> +       if (conn->asyncStatus == PGASYNC_IDLE)
>> +       {
>> +               /* If the connection is not active then set up */
>> +               if (!PQsendQueryParams(conn, last_fsstate->query,
>> last_fsstate->numParams,
>> +                                                          NULL,
>> values, NULL, NULL, 0))
>> +                       pgfdw_report_error(NULL, conn,
>> last_fsstate->query);
>> +
>> +               if (!PQsetChunkedRowsMode(conn, last_fsstate->fetch_size))
>> +                       pgfdw_report_error(NULL, conn,
>> last_fsstate->query);
>> +       }
>>
>> I think this is a problem for multiple reasons. One problem is that
>> the function header says that we use this "when the result of a query
>> is not completely fetched". But if the status where PGASYNC_IDLE, the
>> query has not even started yet. In that case, seems there is no reason
>> to call this function in the first place. We can just start the new
>> scan right away and wait to do anything about the other scan until
>> it's needed. Additionally, asyncStatus is not part of libpq's exposed
>> API. You're only able to access it here because you've included
>> libpq-int.h into postgres_fdw.h, but libpq-int.h is so-called because
>> it's "internal", so you should really be trying very hard not to use
>> it, and especially not to include it in another header file. I suggest
>> that there's probably just a bug here -- I don't think you should need
>> to check asyncStatus here in the first place.
>>
>> + if (pgfdw_use_cursor)
>> + snprintf(sql, sizeof(sql), "CLOSE c%u",
>> + fsstate->cursor_number);
>> + else
>> + close_cursor = true;
>>
>> This is extremely confusing coding. If we're using a cursor, we write
>> some SQL into a buffer but don't actually do anything with the buffer
>> right away. But if we're not using a cursor, then we set a flag that
>> tells us to close a cursor ... which we're not using? I think the
>> logic updates in the function are not at all clear. You need to back
>> up and think more carefully about what the function is supposed to do
>> in each case. Look at how the function starts after your patch:
>>
>> postgresReScanForeignScan(ForeignScanState *node)
>> {
>>     PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
>>     char        sql[64];
>>     PGresult   *res;
>>     bool        close_cursor = false;
>>
>>     /* If we haven't created the cursor yet, nothing to do. */
>>     if (!fsstate->cursor_exists)
>>         return;
>>
>> Well, if we're in non-cursor mode, that presumably means this function
>> always returns without doing anything, since there should not be a
>> cursor if we are not using with them. That can't be right.
>> Alternatively, maybe it means that the cursor_exists flag is now being
>> used to track something other than whether a cursor exists, which
>> wouldn't be right either. I don't know off the top of my head exactly
>> what this function should do in each case, but as the patch author,
>> it's your job to go through, figure that out, and make sure that it is
>> all correct and well-commented.
>>
>> Similarly, you have a bunch of changes to a function called
>> create_cursor(), but why would a function that creates a cursor get
>> changed to handle a mode where we don't create cursors? Either the
>> function needs to be renamed, or there should be two separate
>> functions, or these changes don't do anything in the first place.
>>
>> + while (res != NULL)
>> + res = pgfdw_get_result(fsstate->conn);
>>
>> I don't see how this code can be right. We should know how many result
>> sets we expect and read exactly that number, not just read over and
>> over until there's nothing more. But even if we did want to read until
>> there's nothing more, we would need to free those result sets rather
>> than leaking them.
>>
>> -
>> - /* Clean up */
>> - pfree(buf.data);
>>
>> It's hard to understand why this would be a good idea.
>>
>>   /*
>>   * We'll store the tuples in the batch_cxt.  First, flush the previous
>>   * batch.
>>   */
>>   fsstate->tuples = NULL;
>> - MemoryContextReset(fsstate->batch_cxt);
>>   oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
>>
>> It's also hard to understand why this would be a good idea. It seems
>> like it makes the comment false: we wouldn't be flushing the previous
>> batch any more. We'd just be forgetting that it existed and leaking
>> the memory.
>>
>> + /*
>> + * In non-cursor mode, there are three options for the further
>> + * processing: 1. If the tuplestore is already filled, retrieve the
>> + * tuples from there. 2. Fetch the tuples till the end of the query
>> + * and store them in tuplestore. 3. Perform a normal fetch and process
>> + * the tuples.
>>
>> I find this comment pretty confusing. First of all, it's not really
>> clear what performing a "normal fetch" means here. But also, this says
>> that in non-cursor mode there are three options, and that's followed
>> by an if-else block i.e. two options. Three isn't the same as two, so
>> something isn't right here.
>>
>> The code that follows is quite involved. It's rather deeply nested in
>> places, so possibly some of it needs to be broken out into separate
>> functions. It contains another instance, of this, which has the same
>> problems I pointed out earlier:
>>
>> + /* There are no more tuples to fetch */
>> + while (res != NULL)
>> + res = pgfdw_get_result(conn);
>>
>> If there are no more tuples to fetch, then why do we need to loop
>> fetching result sets until we get a NULL? And if we ever get any, we
>> leak them, which is also bad.
>>
>> + /*
>> + * Reset cursor mode when in asynchronous mode as it is not supported in
>> + * non-cursor mode
>> + */
>> + if (!pgfdw_use_cursor)
>> + pgfdw_use_cursor = true;
>>
>> It's not allowed to change a GUC variable from outside of the GUC
>> code. If you need to disable the optimization in certain cases, you
>> need a separate flag for that, which is initially set based on
>> pgfdw_use_cursor and then can be changed here or wherever is needed.
>> You can't change the GUC value itself. This would cause a variety of
>> problems, including the fact that the optimization would then stay
>> disabled for later queries in the session, but probably also some more
>> GUC-related stuff.
>>
>> -
>>  SELECT current_database() AS current_database,
>>    current_setting('port') AS current_port
>>  \gset
>> -
>>
>> Please avoid including unnecessary cosmetic changes in the patch file.
>>
>> Overall, I think the test case changes are in much better shape than
>> they were in earlier patch versions. I suspect there might still be a
>> bit of room for improvement, but we can deal with that once some of
>> these other issues are sorted out.
>>
>> --
>> Robert Haas
>> EDB: http://www.enterprisedb.com
>>
>
> Thank you for the detailed review of the patch. Please find the attached
> file for the updated patch.
> To summarise the changes done in this patch following these review
> comments,
> - the new GUC is called disable_cursor, so when we want to use this
> feature we set it, otherwise it remains false by default. I hope this makes
> it easier to understand as you mentioned.
>
> - Simplified the code in fetch_more_data for this new mode. Now we have a
> function called fetch_from_tuplestore, and it is called from
> fetch_more_data when tuples from tuplestore are requested. Another function
> called save_to_tuplestore is also introduced in this patch, it is called
> from create_cursor code when all the tuples are to be fetched for the
> active_scan. So things look better code readability wise.
>
> - For handling the async mode, I went to one level up and modified
> async_capable to false when no_cursor mode is used. So, now we do not
> change any GUC as such.
>
> - With regards to use of flag cursor_exists and functions create_cursor
> and close_cursor, these are used in this no_cursor mode as well. For the
> flag cursor_exists, I used it to check if we have already done the required
> steps for when disale_cursor is set, like setting the ChunkedRowsMode,
> saving the tuples to the tuplestore if required. Inherently there is no
> nothing in fsstate or conn_state through which we can identify that all the
> required steps are done or if this scan is coming for the first time, so
> maybe we can have a different flag for the case when disable_cursor is set
> and use it for such checks. But to me it feels like additional code when I
> can use existing vars, but I agree on making it more readable that way. So,
> I modified in this patch to add a separate flag called scan_init and
> functions init_scan and end_scan for the setting and resetting of the
> required values when cursors are not used. So, let me know what you think
> is better using the existing flag or creating a new one for this mode as
> done in this patch.
>
> - For this change,
>   /*
>   * We'll store the tuples in the batch_cxt.  First, flush the previous
>   * batch.
>   */
>   fsstate->tuples = NULL;
> - MemoryContextReset(fsstate->batch_cxt);
>   oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
> since, the tuplestore is now created in batch_cxt, so we can't reset it
> here. But I now changed it to be done only when disable_cursor is set.
>
> - Regarding the following change,
> +       if (conn->asyncStatus == PGASYNC_IDLE)
> +       {
> +               /* If the connection is not active then set up */
> When we are here after the call to the rescan, the connection is as per
> the current fsstate which has not yet started, hence the idle status, but
> we are here to fetch the tuples for the active scan which got reset after
> the rescan so it is required to set it up again. But you are right in that
> using asyncStatus is not appropriate here, so I added a new flag to check
> for rescan, which is set in rescan and used here to decide if we need to
> set the ChunkedRowsMode.
>
> - Modified and added more comments
> --
> Regards,
> Rafia Sabih
> CYBERTEC PostgreSQL International GmbH
>


-- 
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH

v8-Fetch-without-cursors.patch
Description: Binary data

Re: Bypassing cursors in postgres_fdw to enable parallel plans

Reply via email to