Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Alexander Lakhin writes:
> 08.04.2024 18:08, Tom Lane wrote:
>> Hmm, the point about recursion is still valid isn't it?  I agree the
>> reference to ExecQueryUsingCursor is obsolete, but I think we need to
>> reconstruct what this comment is actually talking about.  It's
>> certainly pretty obscure ...

> Sorry, I wasn't clear enough, I meant to remove only that reference, not
> the quoted comment altogether.

After looking at it, I realized that the comment's last sentence was also
out of date, since SendQuery() isn't where the check of gexec_flag happens
any more.  I concluded that documenting the behavior of other functions
here isn't such a hot idea, and removed both sentences in favor of
expanding the relevant comments in ExecQueryAndProcessResults.

While doing that, I compared the normal and chunked-fetch code paths in
ExecQueryAndProcessResults more carefully, and realized that the patch
was a few other bricks shy of a load:

* it didn't honor pset.queryFout;

* it ignored the passed-in "printQueryOpt *opt" (maybe that's always
  NULL, but doesn't seem like a great assumption);

* it failed to call PrintQueryStatus, so that INSERT RETURNING and the
  like would print a status line only in non-FETCH_COUNT mode.

I cleaned all that up at c21d4c416.

BTW, I had to reverse-engineer the exact reasoning for the cases where
we don't honor FETCH_COUNT.  Most of them are clear enough, but I'm not
totally sure about \watch.  I wrote:

+	 * * We're doing \watch: users probably don't want us to force use of the
+	 * pager for that, plus chunking could break the min_rows check.

It would not be terribly hard to make the chunked-fetch code path handle
min_rows correctly, and AFAICS the only other thing that is_watch does
differently is to not do SetResultVariables, which we could match easily
enough.  So this is really down to whether forcing pager mode is okay
for a \watch'd query.  I wonder if that was actually Daniel's reasoning
for excluding \watch, and how strong that argument really is.
regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Alexander Lakhin wrote:

> >> Now that ExecQueryUsingCursor() is gone, it's not clear, what does
> >> the following comment mean:?
> >>  * We must turn off gexec_flag to avoid infinite recursion.  Note that
> >>  * this allows ExecQueryUsingCursor to be applied to the individual query
> >>  * results.

> > Hmm, the point about recursion is still valid isn't it?  I agree the
> > reference to ExecQueryUsingCursor is obsolete, but I think we need to
> > reconstruct what this comment is actually talking about.  It's
> > certainly pretty obscure ...

> Sorry, I wasn't clear enough, I meant to remove only that reference, not
> the quoted comment altogether.

The comment might want to stress the fact that psql honors FETCH_COUNT
"on top of" \gexec, so if the user issues for instance:

  select 'select ' || i from generate_series(1,) as i \gexec

what's going to be sent to the server is a series of:

  BEGIN
  DECLARE _psql_cursor NO SCROLL CURSOR FOR
    select
  FETCH FORWARD FROM _psql_cursor   (possibly repeated)
  CLOSE _psql_cursor
  COMMIT

Another choice would be to ignore FETCH_COUNT and send exactly the
queries that \gexec produces, with the assumption that it better
matches the user's expectation.  Maybe that alternative was considered
and the comment reflects the decision.

Since the new implementation doesn't rewrite the user-supplied queries,
the point is moot, and this part should be removed: "Note that this
allows ExecQueryUsingCursor to be applied to the individual query
results"

I'll wait a bit for other comments and submit a patch.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
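[Editor's illustration] The rewriting described above can be sketched as a small hypothetical helper (illustrative Python, not psql's actual C code; the placement of the fetch count in the FETCH statement is an assumption, since the mail elides it):

```python
# Hypothetical sketch of psql's old cursor-based FETCH_COUNT rewriting,
# following the command sequence quoted in this mail (not psql's code).
def cursor_wrap(query, fetch_count):
    """Yield the statements sent in place of the bare user query."""
    yield "BEGIN"
    # The user query lands on line 2 of this statement, which is why
    # server error positions were reported as "LINE 2" in this mode.
    yield "DECLARE _psql_cursor NO SCROLL CURSOR FOR\n" + query
    # Repeated until a FETCH returns fewer than fetch_count rows:
    yield "FETCH FORWARD %d FROM _psql_cursor" % fetch_count
    yield "CLOSE _psql_cursor"
    yield "COMMIT"

stmts = list(cursor_wrap("select 1", 1000))
```

This also makes visible why the new implementation, which sends the user query as-is, changes reported error positions from LINE 2 to LINE 1.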
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
08.04.2024 18:08, Tom Lane wrote:
> Alexander Lakhin writes:
>> Now that ExecQueryUsingCursor() is gone, it's not clear, what does
>> the following comment mean:?
>>  * We must turn off gexec_flag to avoid infinite recursion.  Note that
>>  * this allows ExecQueryUsingCursor to be applied to the individual query
>>  * results.
>
> Hmm, the point about recursion is still valid isn't it?  I agree the
> reference to ExecQueryUsingCursor is obsolete, but I think we need to
> reconstruct what this comment is actually talking about.  It's
> certainly pretty obscure ...

Sorry, I wasn't clear enough, I meant to remove only that reference, not
the quoted comment altogether.

Best regards,
Alexander
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Alexander Lakhin writes:
> Now that ExecQueryUsingCursor() is gone, it's not clear, what does
> the following comment mean:?
>  * We must turn off gexec_flag to avoid infinite recursion.  Note that
>  * this allows ExecQueryUsingCursor to be applied to the individual query
>  * results.

Hmm, the point about recursion is still valid isn't it?  I agree the
reference to ExecQueryUsingCursor is obsolete, but I think we need to
reconstruct what this comment is actually talking about.  It's
certainly pretty obscure ...

			regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hello Daniel and Tom,

08.04.2024 17:25, Daniel Verite wrote:
>> So I whacked the patch around till I liked it better, and pushed it.
>
> Thanks for taking care of this!

Now that ExecQueryUsingCursor() is gone, it's not clear what the
following comment means:
 * We must turn off gexec_flag to avoid infinite recursion.  Note that
 * this allows ExecQueryUsingCursor to be applied to the individual query
 * results.
Shouldn't it be removed?

Best regards,
Alexander
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Tom Lane wrote:

> I've reconsidered after realizing that implementing FETCH_COUNT
> atop traditional single-row mode would require either merging
> single-row results into a bigger PGresult or persuading psql's
> results-printing code to accept an array of PGresults not just
> one.  Either of those would be expensive and ugly, not to mention
> needing chunks of code we don't have today.

Yes, we must accumulate results because the aligned format needs to
know the column widths for an entire "page", and the row-by-row logic
does not fit well in that case.  One of the posted patches implemented
this with an array of PGresults in single-row mode [1], but I'm
confident that the newer version you pushed with the libpq changes is
a better approach.

> So I whacked the patch around till I liked it better, and pushed it.

Thanks for taking care of this!

[1] https://www.postgresql.org/message-id/092583fb-97c5-428f-8d99-fd31be4a5...@manitou-mail.org

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
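[Editor's illustration] The width problem can be made concrete with a minimal sketch (illustrative Python, not psql's printing code) of why the aligned format cannot print anything before a whole chunk has been accumulated:

```python
# Minimal illustration: the aligned format must scan every row of a
# chunk to learn each column's maximum width before printing row one.
def render_aligned(rows):
    widths = [max(len(str(v)) for v in col) for col in zip(*rows)]
    return "\n".join(
        " | ".join(str(v).ljust(w) for v, w in zip(row, widths))
        for row in rows
    )

table = render_aligned([("id", "total_pat"), (1, "aaa"), (500, "a")])
```

Had row (500, "a") arrived after the header was already printed, the first column would have been too narrow; this is why chunked mode accumulates FETCH_COUNT rows before formatting each page.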
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
So what was really bothering me about this patchset was that I didn't
think marginal performance gains were a sufficient reason to put a
whole different operating mode into libpq.  However, I've reconsidered
after realizing that implementing FETCH_COUNT atop traditional
single-row mode would require either merging single-row results into a
bigger PGresult or persuading psql's results-printing code to accept
an array of PGresults not just one.  Either of those would be
expensive and ugly, not to mention needing chunks of code we don't
have today.

Also, it doesn't really need to be a whole different operating mode.
There's no reason that single-row mode shouldn't be exactly equivalent
to chunk mode with chunk size 1, except for the result status code.
(We've got to keep PGRES_SINGLE_TUPLE for the old behavior, but using
that for a chunked result would be too confusing.)

So I whacked the patch around till I liked it better, and pushed it.
I hope my haste will not come back to bite me, but we are getting
pretty hard up against the feature-freeze deadline.

			regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Tom Lane wrote:

> > I should say that I've noticed significant latency improvements with
> > FETCH_COUNT retrieving large resultsets, such that it would benefit
> > non-interactive use cases.
>
> Do you have a theory for why that is?  It's pretty counterintuitive
> that it would help at all.

I've been thinking that it's a kind of pipeline/parallelism effect.
When libpq accumulates all rows in one resultset, if the network or
the server are not fast enough, it spends a certain amount of time
waiting for the data to come in.  But when it accumulates fewer rows
and gives back control to the app to display intermediate results,
during that time the network buffers can fill in, resulting, I assume,
in less time waiting overall.

I think the benefit is similar to what we get with \copy.  In fact
with the above-mentioned test, the execution times with
FETCH_COUNT=1000 look very close to \copy of the same query.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
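[Editor's illustration] The overlap effect described above can be captured in a toy timing model (an illustrative assumption, not a measurement): with chunked fetching, formatting one chunk overlaps with the network transfer of the next, so the slower stage dominates instead of the two costs adding up.

```python
# Toy model: per-chunk network time vs. per-chunk formatting time,
# with and without overlap between the two stages.
def total_time(chunks, net, fmt, overlapped):
    if not overlapped:
        return chunks * (net + fmt)          # wait for all data, then format
    # the first chunk must arrive before any formatting can start
    return net + (chunks - 1) * max(net, fmt) + fmt

serial = total_time(10, net=1.0, fmt=0.8, overlapped=False)
piped = total_time(10, net=1.0, fmt=0.8, overlapped=True)
```

Under these made-up numbers the overlapped pipeline finishes in 10.8 units against 18.0 for the serial case, which is the same qualitative shape as the timings reported earlier in the thread.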
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
"Daniel Verite" writes:
> Tom Lane wrote:
>> I do not buy that psql's FETCH_COUNT mode is a sufficient reason
>> to add it.  FETCH_COUNT mode is not something you'd use
>> non-interactively

> I should say that I've noticed significant latency improvements with
> FETCH_COUNT retrieving large resultsets, such that it would benefit
> non-interactive use cases.

Do you have a theory for why that is?  It's pretty counterintuitive
that it would help at all.

			regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Tom Lane wrote:

> I do not buy that psql's FETCH_COUNT mode is a sufficient reason
> to add it.  FETCH_COUNT mode is not something you'd use
> non-interactively

I should say that I've noticed significant latency improvements with
FETCH_COUNT retrieving large resultsets, such that it would benefit
non-interactive use cases.

For instance, with the current v7 patch, a query like the OP's initial
case and batches of 1000 rows:

$ cat fetchcount-test.sql
select repeat('a', 100) || '-' || i || '-' || repeat('b', 500)
  as total_pat from generate_series(1, 500) as i \g /dev/null

$ export TIMEFORMAT=%R
$ for s in $(seq 1 10); do time /usr/local/pgsql/bin/psql -At \
    -v FETCH_COUNT=1000 -f fetchcount-test.sql; done
3.597 3.413 3.362 3.612 3.377 3.416 3.346 3.368 3.504 3.413
=> Average elapsed time = 3.44s

Now without FETCH_COUNT, fetching the 5 million rows in one resultset:

$ for s in $(seq 1 10); do time /usr/local/pgsql/bin/psql -At \
    -f fetchcount-test.sql; done
4.200 4.178 4.200 4.169 4.195 4.217 4.197 4.234 4.225 4.242
=> Average elapsed time = 4.20s

By comparison the unpatched version (cursor-based method) gives these
execution times with FETCH_COUNT=1000:
4.458 4.448 4.476 4.455 4.450 4.466 4.395 4.429 4.387 4.473
=> Average elapsed time = 4.43s

Now that's just one test, but don't these numbers look good?

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
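[Editor's cross-check] The quoted averages can be recomputed from the listed timings (plain Python; the values are copied from this mail):

```python
# Recomputing the average elapsed times from the timings listed above.
fetch_chunked = [3.597, 3.413, 3.362, 3.612, 3.377,
                 3.416, 3.346, 3.368, 3.504, 3.413]
no_fetch_count = [4.200, 4.178, 4.200, 4.169, 4.195,
                  4.217, 4.197, 4.234, 4.225, 4.242]
cursor_based = [4.458, 4.448, 4.476, 4.455, 4.450,
                4.466, 4.395, 4.429, 4.387, 4.473]

def avg(xs):
    return sum(xs) / len(xs)

# Chunked FETCH_COUNT is the fastest of the three runs.
speedup = avg(cursor_based) / avg(fetch_chunked)
```

The ordering chunked < no-FETCH_COUNT < cursor-based holds, with the chunked mode roughly 20-30% faster than the old cursor method on this test.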
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
"Daniel Verite" writes:
> Updated patches are attached.

I started to look through this, and almost immediately noted

- -  Retrieving Query Results Row-by-Row
+ +  Retrieving Query Results in chunks

This is a bit problematic, because changing the sect1 ID will change
the page's URL, eg

https://www.postgresql.org/docs/current/libpq-single-row-mode.html

Aside from possibly breaking people's bookmarks, I'm pretty sure this
will cause the web docs framework to not recognize any cross-version
commonality of the page.

How ugly would it be if we left the ID alone?  Another idea could be
to leave the whole page alone and add a new sect1 for chunked mode.

But ... TBH I'm not convinced that we need the chunked mode at all.
We explicitly rejected that idea back when single-row mode was
designed, see around here:

https://www.postgresql.org/message-id/flat/50173BF7.1070801%40Yahoo.com#7f92ebad0143fb5f575ecb3913c5ce88

and I'm still very skeptical that there's much win to be had.  I do
not buy that psql's FETCH_COUNT mode is a sufficient reason to add it.
FETCH_COUNT mode is not something you'd use non-interactively, and
there is enough overhead elsewhere in psql (notably in result-set
formatting) that it doesn't seem worth micro-optimizing the part about
fetching from libpq.

(I see that there was some discussion in that old thread about
micro-optimizing single-row mode internally to libpq by making
PGresult creation cheaper, which I don't think anyone ever got back to
doing.  Maybe we should resurrect that.)

			regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Laurenz Albe wrote:

> Here is the code review for patch number 2:

> +static void
> +CloseGOutput(FILE *gfile_fout, bool is_pipe)
>
> It makes sense to factor out this code.
> But shouldn't these functions have a prototype at the beginning of the file?

Looking at the other static functions in psql/common.c, there are 22
of them but only 3 have prototypes at the top of the file.  These 3
functions are called before being defined, so these prototypes are
mandatory.  The other static functions that are defined before being
called happen not to have forward declarations, so SetupGOutput() and
CloseGOutput() follow that model.

> Here is a suggestion for a consolidated comment:
>
> Fetch the result in chunks if FETCH_COUNT is set.  We don't enable
> chunking if SHOW_ALL_RESULTS is false, since that requires us to
> accumulate all rows before we can tell what should be displayed,
> which would counter the idea of FETCH_COUNT.  Chunk fetching is also
> disabled if \gset, \crosstab, \gexec and \watch are used.

OK, done like that.

> + if (fetch_count > 0 && result_status == PGRES_TUPLES_CHUNK)
>
> Could it be that result_status == PGRES_TUPLES_CHUNK, but fetch_count
> is 0?  If not, perhaps there should be an Assert that verifies that,
> and the "if" statement should only check for the latter condition.

Good point.  In fact it can be simplified to
  if (result_status == PGRES_TUPLES_CHUNK)
and fetch_count as a variable can be removed from the function.
Done that way.

> > --- a/src/bin/psql/t/001_basic.pl
> > +++ b/src/bin/psql/t/001_basic.pl
> > @@ -184,10 +184,10 @@ like(
> >  "\\set FETCH_COUNT 1\nSELECT error;\n\\errverbose",
> >  on_error_stop => 0))[2],
> >  qr/\A^psql::2: ERROR:  .*$
> > -^LINE 2: SELECT error;$
> > +^LINE 1: SELECT error;$
> > ^ *^.*$
> > ^psql::3: error: ERROR:  [0-9A-Z]{5}: .*$
> > -^LINE 2: SELECT error;$
> > +^LINE 1: SELECT error;$
>
> Why does the output change?
Perhaps there is a good and harmless > explanation, but the naïve expectation would be that it doesn't. Unpatched, psql builds this query: DECLARE _psql_cursor NO SCROLL CURSOR FOR \n therefore the user query starts at line 2. With the patch, the user query is sent as-is, starting at line1, hence the different error location. > After fixing the problem manually, it builds without warning. > The regression tests pass, and the feature works as expected. Thanks for testing. Updated patches are attached. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite From 0fefaee7d5b3003ad0d089ea9e92675c6f50245f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= Date: Mon, 1 Apr 2024 19:46:20 +0200 Subject: [PATCH v7 1/2] Implement retrieval of results in chunks with libpq. This mode is similar to the single-row mode except that chunks of results contain up to N rows instead of a single row. It is meant to reduce the overhead of the row-by-row allocations for large result sets. The mode is selected with PQsetChunkedRowsMode(int maxRows) and results have the new status code PGRES_TUPLES_CHUNK. --- doc/src/sgml/libpq.sgml | 98 +++ .../libpqwalreceiver/libpqwalreceiver.c | 1 + src/bin/pg_amcheck/pg_amcheck.c | 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c| 117 +++--- src/interfaces/libpq/libpq-fe.h | 4 +- src/interfaces/libpq/libpq-int.h | 7 +- 7 files changed, 185 insertions(+), 44 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d3e87056f2..1814921d5a 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3545,7 +3545,20 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult contains a single result tuple from the current command. This status occurs only when single-row mode has been selected for the query -(see ). +(see ). + + + + + + PGRES_TUPLES_CHUNK + + +The PGresult contains several tuples +from the current command. 
The count of tuples cannot exceed +the maximum passed to . +This status occurs only when the chunked mode has been selected +for the query (see ). @@ -5197,8 +5210,8 @@ PGresult *PQgetResult(PGconn *conn); Another frequently-desired feature that can be obtained with and - is retrieving large query results a row at a time. This is discussed - in . + is retrieving large query results a limited number of rows at a time. This is discussed + in . @@ -5562,12 +5575,13 @@ int PQflush(PGconn *conn); - To enter single-row mode, call PQsetSingleRowMode - before
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Laurenz Albe wrote:

> I had a look at patch 0001 (0002 will follow).

Thanks for reviewing this!  I've implemented the suggested doc
changes.  A patch update will follow with the next part of the review.

> > --- a/src/interfaces/libpq/fe-exec.c
> > +++ b/src/interfaces/libpq/fe-exec.c
> > @@ -41,7 +41,8 @@ char *const pgresStatus[] = {
> >  "PGRES_COPY_BOTH",
> >  "PGRES_SINGLE_TUPLE",
> >  "PGRES_PIPELINE_SYNC",
> > - "PGRES_PIPELINE_ABORTED"
> > + "PGRES_PIPELINE_ABORTED",
> > + "PGRES_TUPLES_CHUNK"
> > };
>
> I think that PGRES_SINGLE_TUPLE and PGRES_TUPLES_CHUNK should be next to
> each other, but that's no big thing.
> The same applies to the change in src/interfaces/libpq/libpq-fe.h

I assume we can't renumber/reorder existing values, otherwise it would
be an ABI break.  We can only add new values.

> I understand that we need to keep the single-row mode for compatibility
> reasons.  But I think that under the hood, "single-row mode" should be
> the same as "chunk mode with chunk size one".

I implemented it like that at first, and wasn't thrilled with the
result.  libpq still has to return PGRES_SINGLE_TUPLE in single-row
mode and PGRES_TUPLES_CHUNK with chunks of size 1, so sharing the code
did not work that well in practice.

I also contemplated not creating PGRES_TUPLES_CHUNK and instead using
PGRES_SINGLE_TUPLE for N rows, but I found it too ugly.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Fri, 2024-03-29 at 14:07 +0100, Laurenz Albe wrote:
> I had a look at patch 0001 (0002 will follow).

Here is the code review for patch number 2:

> diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
[...]
> +static bool
> +SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
[...]
> +static void
> +CloseGOutput(FILE *gfile_fout, bool is_pipe)

It makes sense to factor out this code.
But shouldn't these functions have a prototype at the beginning of the
file?

> +	/*
> +	 * If FETCH_COUNT is set and the context allows it, use the single row
> +	 * mode to fetch results and have no more than FETCH_COUNT rows in
> +	 * memory.
> +	 */

That comment talks about single-row mode, when you are using chunked
mode.  You probably forgot to modify the comment from a previous
version of the patch.

> +	if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch
> +		&& !pset.gset_prefix && pset.show_all_results)
> +	{
> +		/*
> +		 * The row-by-chunks fetch is not enabled when SHOW_ALL_RESULTS is false,
> +		 * since we would need to accumulate all rows before knowing
> +		 * whether they need to be discarded or displayed, which contradicts
> +		 * FETCH_COUNT.
> +		 */
> +		if (!PQsetChunkedRowsMode(pset.db, fetch_count))
> +		{

I think that comment should be before the "if" statement, not inside it.

Here is a suggestion for a consolidated comment:

  Fetch the result in chunks if FETCH_COUNT is set.  We don't enable
  chunking if SHOW_ALL_RESULTS is false, since that requires us to
  accumulate all rows before we can tell what should be displayed,
  which would counter the idea of FETCH_COUNT.  Chunk fetching is also
  disabled if \gset, \crosstab, \gexec and \watch are used.

> +	if (fetch_count > 0 && result_status == PGRES_TUPLES_CHUNK)

Could it be that result_status == PGRES_TUPLES_CHUNK, but fetch_count
is 0?  If not, perhaps there should be an Assert that verifies that,
and the "if" statement should only check for the latter condition.

> --- a/src/bin/psql/t/001_basic.pl
> +++ b/src/bin/psql/t/001_basic.pl
> @@ -184,10 +184,10 @@ like(
> 	"\\set FETCH_COUNT 1\nSELECT error;\n\\errverbose",
> 	on_error_stop => 0))[2],
> 	qr/\A^psql::2: ERROR:  .*$
> -^LINE 2: SELECT error;$
> +^LINE 1: SELECT error;$
> ^ *^.*$
> ^psql::3: error: ERROR:  [0-9A-Z]{5}: .*$
> -^LINE 2: SELECT error;$
> +^LINE 1: SELECT error;$

Why does the output change?  Perhaps there is a good and harmless
explanation, but the naïve expectation would be that it doesn't.

The patch does not apply any more because of a conflict with the
non-blocking PQcancel patch.  After fixing the problem manually, it
builds without warning.  The regression tests pass, and the feature
works as expected.

Yours,
Laurenz Albe
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Tue, 2024-01-30 at 15:29 +0100, Daniel Verite wrote:
> PFA a rebased version.

I had a look at patch 0001 (0002 will follow).

> -
> - Retrieving Query Results Row-by-Row
> +
> + Retrieving Query Results by chunks

That should be "in chunks".

> + PQsetChunkedRowsMode
> +
> + Select the mode retrieving results in chunks for the
> currently-executing query.

That is questionable English.  How about

  Select to receive the results for the currently-executing
  query in chunks.

> + This function is similar to linkend="libpq-PQsetSingleRowMode"/>,
> + except that it can retrieve a user-specified number of rows
> + per call to , instead of a single row.

The "user-specified number" is "maxRows".  So a better wording would be:

  ... except that it can retrieve maxRows rows
  per call to ... instead of a single row.

> -error. But in single-row mode, those rows will have already been
> +error. But in single-row or chunked modes, those rows will have
> already been

I'd say it should be "in *the* single-row or chunk modes".

> --- a/src/interfaces/libpq/fe-exec.c
> +++ b/src/interfaces/libpq/fe-exec.c
> @@ -41,7 +41,8 @@ char *const pgresStatus[] = {
> 	"PGRES_COPY_BOTH",
> 	"PGRES_SINGLE_TUPLE",
> 	"PGRES_PIPELINE_SYNC",
> -	"PGRES_PIPELINE_ABORTED"
> +	"PGRES_PIPELINE_ABORTED",
> +	"PGRES_TUPLES_CHUNK"
> };

I think that PGRES_SINGLE_TUPLE and PGRES_TUPLES_CHUNK should be next
to each other, but that's no big thing.
The same applies to the change in src/interfaces/libpq/libpq-fe.h

I understand that we need to keep the single-row mode for
compatibility reasons.  But I think that under the hood, "single-row
mode" should be the same as "chunk mode with chunk size one".
That should save some code repetition.

Yours,
Laurenz Albe
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Jakub Wartak wrote:

> when I run with default pager (more or less):
> \set FETCH_COUNT 1000
> WITH data AS (SELECT generate_series(1, 2000) as Total) select
> repeat('a',100) || data.Total || repeat('b', 800) as total_pat from
> data;
> -- it enters pager, I skip a couple of pages and then "q"
>
> .. then - both backend and psql - go into 100% CPU as if they were
> still receiving

Thanks for looking into this patch!

What's happening after the pager has quit is that psql continues to
pump results from the server until there are no more results.  If the
user wants to interrupt that, they should hit Ctrl+C to cancel the
query.  I think psql should not cancel it implicitly on their behalf,
as it also cancels the transaction.

The behavior differs from the cursor implementation, because in the
cursor case, when the pager is displaying results, no query is
running.  The previous FETCH results have been entirely read, and the
next FETCH has not been sent to the server yet.  This is why quitting
the pager in the middle of this can be dealt with instantly.

> (that doesn't happen e.g. with export PAGER=cat). So I'm
> not sure, maybe ExecQueryAndProcessResults() should somewhat
> faster abort when the $PAGER is exiting normally(?).

I assume that when using PAGER=cat, you cancel the display with
Ctrl+C, which propagates to psql and has the effect of also cancelling
the query.  In that case it displays "Cancel request sent", and then
shortly after it gets back from the server: "ERROR: canceling
statement due to user request".  That case corresponds to the generic
query canceling flow.

OTOH if killing the "cat" process with kill -TERM, I see the same
behavior as with "more" or "less", that is, postgres running the query
to completion and psql pumping the results.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hi Daniel,

On Tue, Jan 30, 2024 at 3:29 PM Daniel Verite wrote:
> PFA a rebased version.

Thanks for the patch!  I've tested it using my original reproducer and
it works great now against the original problem description.  I've
taken a quick look at the patch, and it looks good to me.  I've tested
using -Werror for both gcc 10.2 and clang 11.0 and it was clean.

I have one slight doubt: when I run with default pager (more or less):

\set FETCH_COUNT 1000
WITH data AS (SELECT generate_series(1, 2000) as Total) select
repeat('a',100) || data.Total || repeat('b', 800) as total_pat from
data;
-- it enters pager, I skip a couple of pages and then "q"

.. then - both backend and psql - go into 100% CPU as if they were
still receiving (that doesn't happen e.g. with export PAGER=cat).  So
I'm not sure, maybe ExecQueryAndProcessResults() should abort somewhat
faster when the $PAGER is exiting normally(?).

And oh, btw, in v6-0001 (so if you would be sending v7 for any other
reason -- other reviewers -- maybe worth realigning it as a detail):

+ int PQsetChunkedRowsMode(PGconn *conn,
+			  int maxRows);

but the code has (so "maxRows" != "chunkSize"):

+PQsetChunkedRowsMode(PGconn *conn, int chunkSize)

-J.
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
vignesh C wrote: > patching file src/interfaces/libpq/exports.txt > Hunk #1 FAILED at 191. > 1 out of 1 hunk FAILED -- saving rejects to file > src/interfaces/libpq/exports.txt.rej > > Please post an updated version for the same. PFA a rebased version. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite From 8cfb82b1e36e96996637948a231ae35b9af1e074 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= Date: Tue, 30 Jan 2024 14:38:21 +0100 Subject: [PATCH v6 1/2] Implement retrieval of results in chunks with libpq. This mode is similar to the single-row mode except that chunks of results contain up to N rows instead of a single row. It is meant to reduce the overhead of the row-by-row allocations for large result sets. The mode is selected with PQsetChunkedRowsMode(int maxRows) and results have the new status code PGRES_TUPLES_CHUNK. --- doc/src/sgml/libpq.sgml | 96 ++ .../libpqwalreceiver/libpqwalreceiver.c | 1 + src/bin/pg_amcheck/pg_amcheck.c | 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c| 117 +++--- src/interfaces/libpq/libpq-fe.h | 4 +- src/interfaces/libpq/libpq-int.h | 7 +- 7 files changed, 184 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index d0d5aefadc..f7f5a04df6 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult contains a single result tuple from the current command. This status occurs only when single-row mode has been selected for the query -(see ). +(see ). + + + + + + PGRES_TUPLES_CHUNK + + +The PGresult contains several tuples +from the current command. The count of tuples cannot exceed +the maximum passed to . +This status occurs only when the chunked mode has been selected +for the query (see ). 
@@ -5189,8 +5202,8 @@ PGresult *PQgetResult(PGconn *conn); Another frequently-desired feature that can be obtained with and - is retrieving large query results a row at a time. This is discussed - in . + is retrieving large query results a limited number of rows at a time. This is discussed + in . @@ -5554,12 +5567,13 @@ int PQflush(PGconn *conn); - To enter single-row mode, call PQsetSingleRowMode - before retrieving results with PQgetResult. - This mode selection is effective only for the query currently - being processed. For more information on the use of - PQsetSingleRowMode, - refer to . + To enter single-row or chunked modes, call + respectively PQsetSingleRowMode + or PQsetChunkedRowsMode before retrieving results + with PQgetResult. This mode selection is effective + only for the query currently being processed. For more information on the + use of these functions refer + to . @@ -5926,10 +5940,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; - - Retrieving Query Results Row-by-Row + + Retrieving Query Results by chunks - + libpq single-row mode @@ -5940,13 +5954,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; PGresult. This can be unworkable for commands that return a large number of rows. For such cases, applications can use and in - single-row mode. In this mode, the result row(s) are - returned to the application one at a time, as they are received from the - server. + single-row mode or chunked mode. + In these modes, the result row(s) are returned to the application one at a + time for the single-row mode and by chunks for the chunked mode, as they + are received from the server. - To enter single-row mode, call + To enter these modes, call +or immediately after a successful call of (or a sibling function). This mode selection is effective only for the currently executing query. Then call @@ -5954,7 +5970,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; linkend="libpq-async"/>. 
If the query returns any rows, they are returned as individual PGresult objects, which look like normal query results except for having status code - PGRES_SINGLE_TUPLE instead of + PGRES_SINGLE_TUPLE for the single-row mode and + PGRES_TUPLES_CHUNK for the chunked mode, instead of PGRES_TUPLES_OK. After the last row, or immediately if the query returns zero rows, a zero-row object with status PGRES_TUPLES_OK is returned; this is the signal that no @@ -5967,9 +5984,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; - When using pipeline mode,
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Tue, 2 Jan 2024 at 20:28, Daniel Verite wrote:
>
> Hi,
>
> PFA a rebased version.

CFBot shows that the patch does not apply anymore as in [1]:

=== Applying patches on top of PostgreSQL commit ID
a3a836fb5e51183eae624d43225279306c2285b8 ===
=== applying patch
./v5-0001-Implement-retrieval-of-results-in-chunks-with-lib.patch
patching file doc/src/sgml/libpq.sgml
...
patching file src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
...
patching file src/interfaces/libpq/exports.txt
Hunk #1 FAILED at 191.
1 out of 1 hunk FAILED -- saving rejects to file
src/interfaces/libpq/exports.txt.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4233.log

Regards,
Vignesh
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hi, PFA a rebased version. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite From cd0fe1d517a0e31e031fbbea1e603a715c77ea97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= Date: Tue, 2 Jan 2024 14:15:48 +0100 Subject: [PATCH v5 1/2] Implement retrieval of results in chunks with libpq. This mode is similar to the single-row mode except that chunks of results contain up to N rows instead of a single row. It is meant to reduce the overhead of the row-by-row allocations for large result sets. The mode is selected with PQsetChunkedRowsMode(int maxRows) and results have the new status code PGRES_TUPLES_CHUNK. --- doc/src/sgml/libpq.sgml | 96 ++ .../libpqwalreceiver/libpqwalreceiver.c | 1 + src/bin/pg_amcheck/pg_amcheck.c | 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c| 117 +++--- src/interfaces/libpq/libpq-fe.h | 4 +- src/interfaces/libpq/libpq-int.h | 7 +- 7 files changed, 184 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index ed88ac001a..8007bf67d8 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult contains a single result tuple from the current command. This status occurs only when single-row mode has been selected for the query -(see ). +(see ). + + + + + + PGRES_TUPLES_CHUNK + + +The PGresult contains several tuples +from the current command. The count of tuples cannot exceed +the maximum passed to . +This status occurs only when the chunked mode has been selected +for the query (see ). @@ -5187,8 +5200,8 @@ PGresult *PQgetResult(PGconn *conn); Another frequently-desired feature that can be obtained with and - is retrieving large query results a row at a time. This is discussed - in . + is retrieving large query results a limited number of rows at a time. This is discussed + in . 
@@ -5551,12 +5564,13 @@ int PQflush(PGconn *conn); - To enter single-row mode, call PQsetSingleRowMode - before retrieving results with PQgetResult. - This mode selection is effective only for the query currently - being processed. For more information on the use of - PQsetSingleRowMode, - refer to . + To enter single-row or chunked modes, call + respectively PQsetSingleRowMode + or PQsetChunkedRowsMode before retrieving results + with PQgetResult. This mode selection is effective + only for the query currently being processed. For more information on the + use of these functions refer + to . @@ -5895,10 +5909,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; - - Retrieving Query Results Row-by-Row + + Retrieving Query Results by chunks - + libpq single-row mode @@ -5909,13 +5923,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; PGresult. This can be unworkable for commands that return a large number of rows. For such cases, applications can use and in - single-row mode. In this mode, the result row(s) are - returned to the application one at a time, as they are received from the - server. + single-row mode or chunked mode. + In these modes, the result row(s) are returned to the application one at a + time for the single-row mode and by chunks for the chunked mode, as they + are received from the server. - To enter single-row mode, call + To enter these modes, call +or immediately after a successful call of (or a sibling function). This mode selection is effective only for the currently executing query. Then call @@ -5923,7 +5939,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; linkend="libpq-async"/>. If the query returns any rows, they are returned as individual PGresult objects, which look like normal query results except for having status code - PGRES_SINGLE_TUPLE instead of + PGRES_SINGLE_TUPLE for the single-row mode and + PGRES_TUPLES_CHUNK for the chunked mode, instead of PGRES_TUPLES_OK. 
After the last row, or immediately if the query returns zero rows, a zero-row object with status PGRES_TUPLES_OK is returned; this is the signal that no @@ -5936,9 +5953,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; - When using pipeline mode, single-row mode needs to be activated for each - query in the pipeline before retrieving results for that query - with PQgetResult. + When using pipeline mode, the single-row or chunked mode need to be + activated for each query
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hi,

Here's a new version to improve the performance of FETCH_COUNT and extend the cases when it can be used.

Patch 0001 adds a new mode in libpq to allow the app to retrieve larger chunks of results than the single row of the row-by-row mode. The maximum number of rows per PGresult is set by the user.

Patch 0002 uses that mode in psql and gets rid of the cursor implementation as suggested upthread.

The performance numbers look good. For a query retrieving 50M rows of about 200 bytes:

  select repeat('abc', 200) from generate_series(1,500)

/usr/bin/time -v psql -At -c $query reports these metrics (medians of 5 runs):

  version   | fetch_count | clock_time | user_time | sys_time | max_rss_size (kB)
 -----------+-------------+------------+-----------+----------+-------------------
  16-stable |           0 |       6.58 |      3.98 |     2.09 |           3446276
  16-stable |         100 |       9.25 |      4.10 |     1.90 |              8768
  16-stable |        1000 |      11.13 |      5.17 |     1.66 |              8904
  17-patch  |           0 |        6.5 |      3.94 |     2.09 |           3442696
  17-patch  |         100 |          5 |      3.56 |     0.93 |              4096
  17-patch  |        1000 |       6.48 |      4.00 |     1.55 |              4344

Interestingly, retrieving by chunks of 100 rows appears to be a bit faster than the default one big chunk. It means that independently of using less memory, FETCH_COUNT implemented that way would be a performance enhancement compared to both not using it and using it in v16 with the cursor implementation.

Best regards,
-- Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

From 766bbe84def2db494f646caeaf29eefeba893c1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?=
Date: Mon, 20 Nov 2023 17:24:55 +0100
Subject: [PATCH v4 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks of results contain up to N rows instead of a single row. It is meant to reduce the overhead of the row-by-row allocations for large result sets. The mode is selected with PQsetChunkedRowsMode(int maxRows) and results have the new status code PGRES_TUPLES_CHUNK.
--- doc/src/sgml/libpq.sgml | 96 +++-- src/bin/pg_amcheck/pg_amcheck.c | 1 + src/interfaces/libpq/exports.txt | 1 + src/interfaces/libpq/fe-exec.c | 118 +-- src/interfaces/libpq/libpq-fe.h | 4 +- src/interfaces/libpq/libpq-int.h | 7 +- 6 files changed, 183 insertions(+), 44 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index ed88ac001a..8007bf67d8 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res); The PGresult contains a single result tuple from the current command. This status occurs only when single-row mode has been selected for the query -(see ). +(see ). + + + + + + PGRES_TUPLES_CHUNK + + +The PGresult contains several tuples +from the current command. The count of tuples cannot exceed +the maximum passed to . +This status occurs only when the chunked mode has been selected +for the query (see ). @@ -5187,8 +5200,8 @@ PGresult *PQgetResult(PGconn *conn); Another frequently-desired feature that can be obtained with and - is retrieving large query results a row at a time. This is discussed - in . + is retrieving large query results a limited number of rows at a time. This is discussed + in . @@ -5551,12 +5564,13 @@ int PQflush(PGconn *conn); - To enter single-row mode, call PQsetSingleRowMode - before retrieving results with PQgetResult. - This mode selection is effective only for the query currently - being processed. For more information on the use of - PQsetSingleRowMode, - refer to . + To enter single-row or chunked modes, call + respectively PQsetSingleRowMode + or PQsetChunkedRowsMode before retrieving results + with PQgetResult. This mode selection is effective + only for the query currently being processed. For more information on the + use of these functions refer + to . 
@@ -5895,10 +5909,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; - - Retrieving Query Results Row-by-Row + + Retrieving Query Results by chunks - + libpq single-row mode @@ -5909,13 +5923,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42; PGresult. This can be unworkable for commands that return a large number of rows. For such cases, applications can use and in - single-row mode. In this mode, the result row(s) are - returned to the application
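The consumption loop for the proposed chunked mode can be sketched in a few lines of client code. This is only an illustration of the API described in the patch text — the exact signature of PQsetChunkedRowsMode and the final name of PGRES_TUPLES_CHUNK are assumptions taken from this thread and may differ in the committed version; the connection string and query are placeholders, and a reachable server is required.

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* placeholder conninfo */
	PGresult   *res;
	long		total = 0;

	if (PQstatus(conn) != CONNECTION_OK ||
		!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* Select chunked mode after sending the query, before the first
	 * PQgetResult — mirroring how PQsetSingleRowMode is used. */
	if (!PQsetChunkedRowsMode(conn, 100))	/* at most 100 rows per PGresult */
	{
		fprintf(stderr, "could not select chunked mode\n");
		PQfinish(conn);
		return 1;
	}

	while ((res = PQgetResult(conn)) != NULL)
	{
		switch (PQresultStatus(res))
		{
			case PGRES_TUPLES_CHUNK:	/* a chunk of up to 100 rows */
				total += PQntuples(res);
				break;
			case PGRES_TUPLES_OK:		/* zero-row object: end of result set */
				break;
			default:
				fprintf(stderr, "%s", PQerrorMessage(conn));
				break;
		}
		PQclear(res);					/* only one chunk in memory at a time */
	}
	printf("retrieved %ld rows\n", total);
	PQfinish(conn);
	return 0;
}
```

Each PGresult carries at most the requested number of rows, which is what lets psql cap its memory at FETCH_COUNT rows while avoiding the per-row allocation overhead of single-row mode.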
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Tom Lane wrote: > This gives me several "-Wincompatible-pointer-types" warnings > [...] > I think what you probably ought to do to avoid all that is to change > the arguments of PrintQueryResult and nearby routines to be "const > PGresult *result" not just "PGresult *result". The const-ness issue that I ignored in the previous patch is that while C is fine with passing T* to a function expecting const T*, it's not okay with passing T** to a function expecting const T**, or more generally converting T** to const T**. When callers need to pass arrays of PGresult* instead of const PGresult*, I've opted to remove the const qualifiers for the functions that are concerned by this change. PFA an updated patch. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 5973df2e39..476a9770f0 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error) { case PGRES_COMMAND_OK: case PGRES_TUPLES_OK: + case PGRES_SINGLE_TUPLE: case PGRES_EMPTY_QUERY: case PGRES_COPY_IN: case PGRES_COPY_OUT: @@ -695,7 +696,7 @@ PrintNotifications(void) * Returns true if successful, false otherwise. */ static bool -PrintQueryTuples(const PGresult *result, const printQueryOpt *opt, +PrintQueryTuples(PGresult *result, const printQueryOpt *opt, FILE *printQueryFout) { bool ok = true; @@ -1391,6 +1392,47 @@ DescribeQuery(const char *query, double *elapsed_msec) return OK; } +/* + * Check if an output stream for \g needs to be opened, and if + * yes, open it. + * Return false if an error occurred, true otherwise. 
+ */ +static bool +SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe) +{ + ExecStatusType status = PQresultStatus(result); + if (pset.gfname != NULL && /* there is a \g file or program */ + *gfile_fout == NULL && /* and it's not already opened */ + (status == PGRES_TUPLES_OK || + status == PGRES_SINGLE_TUPLE || + status == PGRES_COPY_OUT)) + { + if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe)) + { + if (is_pipe) +disable_sigpipe_trap(); + } + else + return false; + } + return true; +} + +static void +CloseGOutput(FILE *gfile_fout, bool is_pipe) +{ + /* close \g file if we opened it */ + if (gfile_fout) + { + if (is_pipe) + { + SetShellResultVariables(pclose(gfile_fout)); + restore_sigpipe_trap(); + } + else + fclose(gfile_fout); + } +} /* * ExecQueryAndProcessResults: utility function for use by SendQuery() @@ -1422,10 +1464,18 @@ ExecQueryAndProcessResults(const char *query, bool success; instr_time before, after; + int fetch_count = pset.fetch_count; PGresult *result; + FILE *gfile_fout = NULL; bool gfile_is_pipe = false; + PGresult **result_array = NULL; /* to collect results in single row mode */ + int64 total_tuples = 0; + int ntuples; + int flush_error = 0; + bool is_pager = false; + if (timing) INSTR_TIME_SET_CURRENT(before); else @@ -1448,6 +1498,33 @@ ExecQueryAndProcessResults(const char *query, return -1; } + /* + * If FETCH_COUNT is set and the context allows it, use the single row + * mode to fetch results and have no more than FETCH_COUNT rows in + * memory. + */ + if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch + && !pset.gset_prefix && pset.show_all_results) + { + /* + * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false, + * since we would need to accumulate all rows before knowing + * whether they need to be discarded or displayed, which contradicts + * FETCH_COUNT. 
+ */ + if (!PQsetSingleRowMode(pset.db)) + { + pg_log_warning("fetching results in single row mode is unavailable"); + fetch_count = 0; + } + else + { + result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*)); + } + } + else + fetch_count = 0; /* disable single-row mode */ + /* * If SIGINT is sent while the query is processing, the interrupt will be * consumed. The user's intention, though, is to cancel the entire watch @@ -1467,6 +1544,8 @@ ExecQueryAndProcessResults(const char *query, ExecStatusType result_status; PGresult *next_result; bool last; + /* whether the output starts before results are fully fetched */ + bool partial_display = false; if (!AcceptResult(result, false)) { @@ -1593,6 +1672,94 @@ ExecQueryAndProcessResults(const char *query, success &= HandleCopyResult(, copy_stream); } + if (fetch_count > 0 && result_status == PGRES_SINGLE_TUPLE) + { + FILE *tuples_fout = printQueryFout ? printQueryFout : stdout; + printQueryOpt my_popt = pset.popt; + + ntuples = 0; + total_tuples = 0; + partial_display = true; + + success = SetupGOutput(result, _fout, _is_pipe); + if (gfile_fout) +tuples_fout = gfile_fout; + + /* initialize print options for partial
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
"Daniel Verite" writes:
> PFA an updated patch.

This gives me several "-Wincompatible-pointer-types" warnings (as are also reported by the cfbot):

common.c: In function 'ExecQueryAndProcessResults':
common.c:1686:24: warning: passing argument 1 of 'PrintQueryTuples' from incompatible pointer type [-Wincompatible-pointer-types]
   PrintQueryTuples(result_array, ntuples, &my_popt, tuples_fout);
common.c:679:35: note: expected 'const PGresult **' {aka 'const struct pg_result **'} but argument is of type 'PGresult **' {aka 'struct pg_result **'}
   PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt,
common.c:1720:24: warning: passing argument 1 of 'PrintQueryTuples' from incompatible pointer type [-Wincompatible-pointer-types]
   PrintQueryTuples(result_array, ntuples, &my_popt, tuples_fout);
common.c:679:35: note: expected 'const PGresult **' {aka 'const struct pg_result **'} but argument is of type 'PGresult **' {aka 'struct pg_result **'}
   PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt,

I think the cause is the inconsistency about whether PGresult pointers are pointer-to-const or not. Even without compiler warnings, I find code like this very ugly:

- success = PrintQueryTuples(result, opt, printQueryFout);
+ success = PrintQueryTuples((const PGresult **) &result, 1, opt, printQueryFout);

I think what you probably ought to do to avoid all that is to change the arguments of PrintQueryResult and nearby routines to be "const PGresult *result" not just "PGresult *result".

I find it sad that we can't get rid of ExecQueryUsingCursor(). Maybe a little effort towards reducing overhead in the single-row mode would help?

regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
I wrote: > Here's a POC patch implementing row-by-row fetching. PFA an updated patch. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index f907f5d4e8..ad5e8a5de9 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error) { case PGRES_COMMAND_OK: case PGRES_TUPLES_OK: + case PGRES_SINGLE_TUPLE: case PGRES_EMPTY_QUERY: case PGRES_COPY_IN: case PGRES_COPY_OUT: @@ -675,13 +676,13 @@ PrintNotifications(void) * Returns true if successful, false otherwise. */ static bool -PrintQueryTuples(const PGresult *result, const printQueryOpt *opt, +PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt, FILE *printQueryFout) { bool ok = true; FILE *fout = printQueryFout ? printQueryFout : pset.queryFout; - printQuery(result, opt ? opt : , fout, false, pset.logfile); + printQueryChunks(result, nresults, opt ? opt : , fout, false, pset.logfile); fflush(fout); if (ferror(fout)) { @@ -958,7 +959,7 @@ PrintQueryResult(PGresult *result, bool last, else if (last && pset.crosstab_flag) success = PrintResultInCrosstab(result); else if (last || pset.show_all_results) -success = PrintQueryTuples(result, opt, printQueryFout); +success = PrintQueryTuples((const PGresult**), 1, opt, printQueryFout); else success = true; @@ -1371,6 +1372,47 @@ DescribeQuery(const char *query, double *elapsed_msec) return OK; } +/* + * Check if an output stream for \g needs to be opened, and if + * yes, open it. + * Return false if an error occurred, true otherwise. 
+ */ +static bool +SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe) +{ + ExecStatusType status = PQresultStatus(result); + if (pset.gfname != NULL && /* there is a \g file or program */ + *gfile_fout == NULL && /* and it's not already opened */ + (status == PGRES_TUPLES_OK || + status == PGRES_SINGLE_TUPLE || + status == PGRES_COPY_OUT)) + { + if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe)) + { + if (is_pipe) +disable_sigpipe_trap(); + } + else + return false; + } + return true; +} + +static void +CloseGOutput(FILE *gfile_fout, bool is_pipe) +{ + /* close \g file if we opened it */ + if (gfile_fout) + { + if (is_pipe) + { + pclose(gfile_fout); + restore_sigpipe_trap(); + } + else + fclose(gfile_fout); + } +} /* * ExecQueryAndProcessResults: utility function for use by SendQuery() @@ -1402,10 +1444,16 @@ ExecQueryAndProcessResults(const char *query, bool success; instr_time before, after; + int fetch_count = pset.fetch_count; PGresult *result; + FILE *gfile_fout = NULL; bool gfile_is_pipe = false; + PGresult **result_array = NULL; /* to collect results in single row mode */ + int64 total_tuples = 0; + int ntuples; + if (timing) INSTR_TIME_SET_CURRENT(before); else @@ -1428,6 +1476,33 @@ ExecQueryAndProcessResults(const char *query, return -1; } + /* + * If FETCH_COUNT is set and the context allows it, use the single row + * mode to fetch results and have no more than FETCH_COUNT rows in + * memory. + */ + if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch + && !pset.gset_prefix && pset.show_all_results) + { + /* + * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false, + * since we would need to accumulate all rows before knowing + * whether they need to be discarded or displayed, which contradicts + * FETCH_COUNT. 
+ */ + if (!PQsetSingleRowMode(pset.db)) + { + pg_log_warning("fetching results in single row mode is unavailable"); + fetch_count = 0; + } + else + { + result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*)); + } + } + else + fetch_count = 0; /* disable single-row mode */ + /* * If SIGINT is sent while the query is processing, the interrupt will be * consumed. The user's intention, though, is to cancel the entire watch @@ -1447,6 +1522,7 @@ ExecQueryAndProcessResults(const char *query, ExecStatusType result_status; PGresult *next_result; bool last; + bool partial_display = false; if (!AcceptResult(result, false)) { @@ -1573,6 +1649,96 @@ ExecQueryAndProcessResults(const char *query, success &= HandleCopyResult(, copy_stream); } + if (fetch_count > 0 && result_status == PGRES_SINGLE_TUPLE) + { + FILE *tuples_fout = printQueryFout; + printQueryOpt my_popt = pset.popt; + bool is_pager = false; + int flush_error = 0; + + ntuples = 0; + total_tuples = 0; + partial_display = true; + + success = SetupGOutput(result, _fout, _is_pipe); + if (gfile_fout) +tuples_fout = gfile_fout; + + /* initialize print options for partial table output */ + my_popt.topt.start_table = true; + my_popt.topt.stop_table = false; +
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Tom Lane wrote: > I agree that it seems like a good idea to try. > There will be more per-row overhead, but the increase in flexibility > is likely to justify that. Here's a POC patch implementing row-by-row fetching. If it wasn't for the per-row overhead, we could probably get rid of ExecQueryUsingCursor() and use row-by-row fetches whenever FETCH_COUNT is set, independently of the form of the query. However the difference in processing time seems to be substantial: on some quick tests with FETCH_COUNT=1, I'm seeing almost a 1.5x increase on large datasets. I assume it's the cost of more allocations. I would have hoped that avoiding the FETCH queries and associated round-trips with the cursor method would compensate for that, but it doesn't appear to be the case, at least with a fast local connection. So in this patch, psql still uses the cursor method if the query starts with "select", and falls back to the row-by-row in the main code (ExecQueryAndProcessResults) otherwise. Anyway it solves the main issue of the over-consumption of memory for CTE and update/insert queries returning large resultsets. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 00627830c4..d3de9d8336 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error) { case PGRES_COMMAND_OK: case PGRES_TUPLES_OK: + case PGRES_SINGLE_TUPLE: case PGRES_EMPTY_QUERY: case PGRES_COPY_IN: case PGRES_COPY_OUT: @@ -675,13 +676,13 @@ PrintNotifications(void) * Returns true if successful, false otherwise. */ static bool -PrintQueryTuples(const PGresult *result, const printQueryOpt *opt, +PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt, FILE *printQueryFout) { bool ok = true; FILE *fout = printQueryFout ? printQueryFout : pset.queryFout; - printQuery(result, opt ? 
opt : , fout, false, pset.logfile); + printQueryChunks(result, nresults, opt ? opt : , fout, false, pset.logfile); fflush(fout); if (ferror(fout)) { @@ -958,7 +959,7 @@ PrintQueryResult(PGresult *result, bool last, else if (last && pset.crosstab_flag) success = PrintResultInCrosstab(result); else if (last || pset.show_all_results) -success = PrintQueryTuples(result, opt, printQueryFout); +success = PrintQueryTuples((const PGresult**), 1, opt, printQueryFout); else success = true; @@ -1369,6 +1370,47 @@ DescribeQuery(const char *query, double *elapsed_msec) return OK; } +/* + * Check if an output stream for \g needs to be opened, and if + * yes, open it. + * Return false if an error occurred, true otherwise. + */ +static bool +SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe) +{ + ExecStatusType status = PQresultStatus(result); + if (pset.gfname != NULL && /* there is a \g file or program */ + *gfile_fout == NULL && /* and it's not already opened */ + (status == PGRES_TUPLES_OK || + status == PGRES_SINGLE_TUPLE || + status == PGRES_COPY_OUT)) + { + if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe)) + { + if (is_pipe) +disable_sigpipe_trap(); + } + else + return false; + } + return true; +} + +static void +CloseGOutput(FILE *gfile_fout, bool is_pipe) +{ + /* close \g file if we opened it */ + if (gfile_fout) + { + if (is_pipe) + { + pclose(gfile_fout); + restore_sigpipe_trap(); + } + else + fclose(gfile_fout); + } +} /* * ExecQueryAndProcessResults: utility function for use by SendQuery() @@ -1400,10 +1442,16 @@ ExecQueryAndProcessResults(const char *query, bool success; instr_time before, after; + int fetch_count = pset.fetch_count; PGresult *result; + FILE *gfile_fout = NULL; bool gfile_is_pipe = false; + PGresult **result_array = NULL; /* to collect results in single row mode */ + int64 total_tuples = 0; + int ntuples; + if (timing) INSTR_TIME_SET_CURRENT(before); @@ -1424,6 +1472,33 @@ ExecQueryAndProcessResults(const char *query, return 
-1; } + /* + * If FETCH_COUNT is set and the context allows it, use the single row + * mode to fetch results and have no more than FETCH_COUNT rows in + * memory. + */ + if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch + && !pset.gset_prefix && pset.show_all_results) + { + /* + * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false, + * since we would need to accumulate all rows before knowing + * whether they need to be discarded or displayed, which contradicts + * FETCH_COUNT. + */ + if (!PQsetSingleRowMode(pset.db)) + { + pg_log_warning("fetching results in single row mode is unavailable"); + fetch_count = 0; + } + else + { + result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*)); + } + } + else + fetch_count = 0; /* disable single-row mode */
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Wed, Jan 4, 2023 at 6:38 PM Robert Haas wrote:
> On Wed, Jan 4, 2023 at 11:36 AM Tom Lane wrote:
> > As you well know, psql's FETCH_COUNT mechanism is far older than
> > single-row mode. I don't think anyone's tried to transpose it
> > onto that. I agree that it seems like a good idea to try.
> > There will be more per-row overhead, but the increase in flexibility
> > is likely to justify that.
>
> Yeah, I was vaguely worried that there might be more per-row overhead,
> not that I know a lot about this topic. I wonder if there's a way to
> mitigate that. I'm a bit suspicious that what we want here is really
> more of an incremental mode than a single-row mode i.e. yeah, you want
> to fetch rows without materializing the whole result, but maybe not in
> batches of exactly size one.

Given the low importance and very low priority of this, how about adding it as a TODO wiki item then, and maybe just emitting a warning instead? I've intentionally avoided grammar parsing and regexps, so it's not perfect (not that I care about this too much either, as web crawlers have already indexed this $thread).

BTW, I've found two related threads, if you know what to look for [1][2].

-Jakub Wartak.

[1] - https://www.postgresql.org/message-id/flat/a0a854b6-563c-4a11-bf1c-d6c6f924004d%40manitou-mail.org
[2] - https://www.postgresql.org/message-id/flat/1274761885.4261.233.camel%40minidragon

0001-psql-warn-about-CTE-queries-to-be-executed-without-u.patch Description: Binary data
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Wed, Jan 4, 2023 at 11:36 AM Tom Lane wrote: > As you well know, psql's FETCH_COUNT mechanism is far older than > single-row mode. I don't think anyone's tried to transpose it > onto that. I agree that it seems like a good idea to try. > There will be more per-row overhead, but the increase in flexibility > is likely to justify that. Yeah, I was vaguely worried that there might be more per-row overhead, not that I know a lot about this topic. I wonder if there's a way to mitigate that. I'm a bit suspicious that what we want here is really more of an incremental mode than a single-row mode i.e. yeah, you want to fetch rows without materializing the whole result, but maybe not in batches of exactly size one. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Robert Haas writes: > On Wed, Jan 4, 2023 at 10:22 AM Daniel Verite wrote: >> A solution would be for psql to use PQsetSingleRowMode() to retrieve >> results row-by-row, as opposed to using a cursor, and then allocate >> memory for only FETCH_COUNT rows at a time. > Is there any reason that someone hasn't, like, already done this? As you well know, psql's FETCH_COUNT mechanism is far older than single-row mode. I don't think anyone's tried to transpose it onto that. I agree that it seems like a good idea to try. There will be more per-row overhead, but the increase in flexibility is likely to justify that. regards, tom lane
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
On Wed, Jan 4, 2023 at 10:22 AM Daniel Verite wrote: > A solution would be for psql to use PQsetSingleRowMode() to retrieve > results row-by-row, as opposed to using a cursor, and then allocate > memory for only FETCH_COUNT rows at a time. Incidentally it solves > other problems like queries containing multiple statements, that also > fail to work properly with cursors, or UPDATE/INSERT... RETURNING.. on > large number of rows that could also benefit from pagination in > memory. Is there any reason that someone hasn't, like, already done this? Because if there isn't, we should really do this. And if there is, like say that it would hurt performance or something, then we should come up with a fix for that problem and then do something like this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs
Jakub Wartak wrote:
> It might be a not so well known fact (?) that CTEs are not executed
> with cursor when asked to do so, but instead silently executed with
> potential huge memory allocation going on. Patch is attached. My one
> doubt is that not every statement starting with "WITH" is WITH(..)
> SELECT of course.

Yes, that's why WITH queries are currently filtered out by the FETCH_COUNT feature. Case in point:

test=> begin;
BEGIN
test=> create table tbl(i int);
CREATE TABLE
test=> declare psql_cursor cursor for
  with r(i) as (values (1))
  insert into tbl(i) select i from r;
ERROR: syntax error at or near "insert"
LINE 3: insert into tbl(i) select i from r;

So the fix you're proposing would fail on that kind of queries.

A solution would be for psql to use PQsetSingleRowMode() to retrieve results row-by-row, as opposed to using a cursor, and then allocate memory for only FETCH_COUNT rows at a time. Incidentally it solves other problems like queries containing multiple statements, that also fail to work properly with cursors, or UPDATE/INSERT... RETURNING.. on large number of rows that could also benefit from pagination in memory.

Best regards,
-- Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
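The row-by-row retrieval Daniel describes uses PQsetSingleRowMode, which is existing libpq API. A minimal sketch of the loop (error handling abbreviated, empty conninfo and the query are placeholders, and a reachable server is assumed):

```c
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");		/* placeholder conninfo */
	PGresult   *res;
	long		rows = 0;

	if (PQstatus(conn) != CONNECTION_OK ||
		!PQsendQuery(conn, "SELECT generate_series(1, 100000)") ||
		!PQsetSingleRowMode(conn))	/* must precede the first PQgetResult */
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			rows++;				/* exactly one row per PGresult */
		/* PGRES_TUPLES_OK with zero rows marks the end of the set */
		PQclear(res);			/* freed immediately: memory stays bounded */
	}
	printf("%ld rows\n", rows);
	PQfinish(conn);
	return 0;
}
```

Unlike the cursor approach, this works regardless of the query's first keyword — WITH ... INSERT, multi-statement strings, or RETURNING clauses all stream the same way; buffering FETCH_COUNT of these per-row results before printing is what the psql patch adds on top.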
psql's FETCH_COUNT (cursor) is not being respected for CTEs
Hi -hackers,

I've spent some time fighting against "out of memory" errors coming out of psql when trying to use the cursor via FETCH_COUNT. It might be a not so well known fact (?) that CTEs are not executed with a cursor when asked to do so, but instead silently executed with potentially huge memory allocation going on. Patch is attached. My one doubt is that not every statement starting with "WITH" is WITH(..) SELECT of course.

Demo (one might also get the "out of memory for query result"):

postgres@hive:~$ psql -Ant --variable='FETCH_COUNT=100' -c "WITH data AS (SELECT generate_series(1, 2000) as Total) select repeat('a', 100) || data.Total || repeat('b', 800) as total_pat from data;"
Killed
postgres@hive:~$ tail -4 /var/log/postgresql/postgresql-14-main.log
[..]
2023-01-04 12:46:20.193 CET [32936] postgres@postgres LOG: could not send data to client: Broken pipe
[..]
2023-01-04 12:46:20.195 CET [32936] postgres@postgres FATAL: connection to client lost

With the patch:

postgres@hive:~$ /tmp/psql16-with-patch -Ant --variable='FETCH_COUNT=100' -c "WITH data AS (SELECT generate_series(1, 2000) as Total) select repeat('a', 100) || data.Total || repeat('b', 800) as total_pat from data;" | wc -l
2000
postgres@hive:~$

Regards,
-Jakub Wartak.

0001-psql-allow-CTE-queries-to-be-executed-also-using-cur.patch Description: Binary data