Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Tom Lane
Alexander Lakhin  writes:
> 08.04.2024 18:08, Tom Lane wrote:
>> Hmm, the point about recursion is still valid isn't it?  I agree the
>> reference to ExecQueryUsingCursor is obsolete, but I think we need to
>> reconstruct what this comment is actually talking about.  It's
>> certainly pretty obscure ...

> Sorry, I wasn't clear enough, I meant to remove only that reference, not
> the quoted comment altogether.

After looking at it, I realized that the comment's last sentence was
also out of date, since SendQuery() isn't where the check of
gexec_flag happens any more.  I concluded that documenting the
behavior of other functions here isn't such a hot idea, and removed
both sentences in favor of expanding the relevant comments in
ExecQueryAndProcessResults.

While doing that, I compared the normal and chunked-fetch code paths
in ExecQueryAndProcessResults more carefully, and realized that the
patch was a few other bricks shy of a load:

* it didn't honor pset.queryFout;

* it ignored the passed-in "printQueryOpt *opt" (maybe that's always
NULL, but doesn't seem like a great assumption);

* it failed to call PrintQueryStatus, so that INSERT RETURNING
and the like would print a status line only in non-FETCH_COUNT
mode.

I cleaned all that up at c21d4c416.

BTW, I had to reverse-engineer the exact reasoning for the cases
where we don't honor FETCH_COUNT.  Most of them are clear enough,
but I'm not totally sure about \watch.  I wrote:

+ * * We're doing \watch: users probably don't want us to force use of the
+ * pager for that, plus chunking could break the min_rows check.

It would not be terribly hard to make the chunked-fetch code path
handle min_rows correctly, and AFAICS the only other thing that
is_watch does differently is to not do SetResultVariables, which
we could match easily enough.  So this is really down to whether
forcing pager mode is okay for a \watch'd query.  I wonder if
that was actually Daniel's reasoning for excluding \watch, and
how strong that argument really is.

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Daniel Verite
Alexander Lakhin wrote:

> >> Now that ExecQueryUsingCursor() is gone, it's not clear, what does
> >> the following comment mean:?
> >>  * We must turn off gexec_flag to avoid infinite recursion.  Note that
> >>  * this allows ExecQueryUsingCursor to be applied to the individual 
> >> query
> >>  * results.
> > Hmm, the point about recursion is still valid isn't it?  I agree the
> > reference to ExecQueryUsingCursor is obsolete, but I think we need to
> > reconstruct what this comment is actually talking about.  It's
> > certainly pretty obscure ...
> 
> Sorry, I wasn't clear enough, I meant to remove only that reference, not
> the quoted comment altogether.

The comment might want to stress the fact that psql honors
FETCH_COUNT "on top of" \gset, so if the user issues for instance:

  select 'select ' || i  from generate_series(1,) as i \gexec

what's going to be sent to the server is a series of:

 BEGIN
 DECLARE _psql_cursor NO SCROLL CURSOR FOR
select 
 FETCH FORWARD  FROM _psql_cursor (possibly repeated)
 CLOSE _psql_cursor
 COMMIT

Another choice would be to ignore FETCH_COUNT and send exactly the
queries that \gset produces, with the assumption that it better
matches the user's expectation. Maybe that alternative was considered
and the comment reflects the decision.

Since the new implementation doesn't rewrite the user-supplied queries,
the point is moot, and this part should be removed:
  "Note that this allows ExecQueryUsingCursor to be applied to the
  individual query results"
I'll wait a bit for other comments and submit a patch.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Alexander Lakhin

08.04.2024 18:08, Tom Lane wrote:

Alexander Lakhin  writes:

Now that ExecQueryUsingCursor() is gone, it's not clear, what does
the following comment mean:?
     * We must turn off gexec_flag to avoid infinite recursion.  Note that
     * this allows ExecQueryUsingCursor to be applied to the individual query
     * results.

Hmm, the point about recursion is still valid isn't it?  I agree the
reference to ExecQueryUsingCursor is obsolete, but I think we need to
reconstruct what this comment is actually talking about.  It's
certainly pretty obscure ...


Sorry, I wasn't clear enough, I meant to remove only that reference, not
the quoted comment altogether.

Best regards,
Alexander




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Tom Lane
Alexander Lakhin  writes:
> Now that ExecQueryUsingCursor() is gone, it's not clear, what does
> the following comment mean:?
>     * We must turn off gexec_flag to avoid infinite recursion.  Note that
>     * this allows ExecQueryUsingCursor to be applied to the individual query
>     * results.

Hmm, the point about recursion is still valid isn't it?  I agree the
reference to ExecQueryUsingCursor is obsolete, but I think we need to
reconstruct what this comment is actually talking about.  It's
certainly pretty obscure ...

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Alexander Lakhin

Hello Daniel and Tom,

08.04.2024 17:25, Daniel Verite wrote:



So I whacked the patch around till I liked it better, and pushed it.

Thanks for taking care of this!


Now that ExecQueryUsingCursor() is gone, it's not clear, what does
the following comment mean:?
   * We must turn off gexec_flag to avoid infinite recursion.  Note that
   * this allows ExecQueryUsingCursor to be applied to the individual query
   * results.

Shouldn't it be removed?

Best regards,
Alexander




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Daniel Verite
Tom Lane wrote:

> I've reconsidered after realizing that implementing FETCH_COUNT
> atop traditional single-row mode would require either merging
> single-row results into a bigger PGresult or persuading psql's
> results-printing code to accept an array of PGresults not just
> one.  Either of those would be expensive and ugly, not to mention
> needing chunks of code we don't have today.

Yes, we must accumulate results because the aligned format needs to
know the columns widths for a entire "page", and the row-by-row logic
does not fit that well in that case.
One of the posted patches implemented this with an array of PGresult
in single-row mode [1] but I'm confident that the newer version you
pushed with the libpq changes is a better approach.

> So I whacked the patch around till I liked it better, and pushed it.

Thanks for taking care of this!


[1]
https://www.postgresql.org/message-id/092583fb-97c5-428f-8d99-fd31be4a5...@manitou-mail.org


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-06 Thread Tom Lane
So what was really bothering me about this patchset was that I
didn't think marginal performance gains were a sufficient reason
to put a whole different operating mode into libpq.  However,
I've reconsidered after realizing that implementing FETCH_COUNT
atop traditional single-row mode would require either merging
single-row results into a bigger PGresult or persuading psql's
results-printing code to accept an array of PGresults not just
one.  Either of those would be expensive and ugly, not to mention
needing chunks of code we don't have today.

Also, it doesn't really need to be a whole different operating mode.
There's no reason that single-row mode shouldn't be exactly equivalent
to chunk mode with chunk size 1, except for the result status code.
(We've got to keep PGRES_SINGLE_TUPLE for the old behavior, but
using that for a chunked result would be too confusing.)

So I whacked the patch around till I liked it better, and pushed it.
I hope my haste will not come back to bite me, but we are getting
pretty hard up against the feature-freeze deadline.

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Daniel Verite
Tom Lane wrote:

> > I should say that I've noticed significant latency improvements with
> > FETCH_COUNT retrieving large resultsets, such that it would benefit
> > non-interactive use cases.
> 
> Do you have a theory for why that is?  It's pretty counterintuitive
> that it would help at all.

I've been thinking that it's a kind of pipeline/parallelism effect.
When libpq accumulates all rows in one resultset, if the network
or the server are not fast enough, it spends a certain amount of
time waiting for the data to come in.
But when it accumulates fewer rows and gives back control
to the app to display intermediate results, during that time the
network buffers can fill in, resulting, I assume, in less time waiting
overall.

I think the benefit is similar to what we get with \copy. In fact
with the  above-mentioned test, the execution times with
FETCH_COUNT=1000 look very close to \copy of the same query.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> I do not buy that psql's FETCH_COUNT mode is a sufficient reason
>> to add it.  FETCH_COUNT mode is not something you'd use
>> non-interactively

> I should say that I've noticed significant latency improvements with
> FETCH_COUNT retrieving large resultsets, such that it would benefit
> non-interactive use cases.

Do you have a theory for why that is?  It's pretty counterintuitive
that it would help at all.

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Daniel Verite
Tom Lane wrote:

> I do not buy that psql's FETCH_COUNT mode is a sufficient reason
> to add it.  FETCH_COUNT mode is not something you'd use
> non-interactively

I should say that I've noticed significant latency improvements with
FETCH_COUNT retrieving large resultsets, such that it would benefit
non-interactive use cases.

For instance, with the current v7 patch, a query like the OP's initial
case and batches of 1000 rows:

$ cat fetchcount-test.sql

select repeat('a', 100) || '-' ||
i || '-' || repeat('b', 500) as total_pat
from generate_series(1, 500) as i
\g /dev/null

$ export TIMEFORMAT=%R

$ for s in $(seq 1 10); do time /usr/local/pgsql/bin/psql -At \
   -v FETCH_COUNT=1000 -f fetchcount-test.sql;  done

3.597
3.413
3.362
3.612
3.377
3.416
3.346
3.368
3.504
3.413

=> Average elapsed time = 3.44s

Now without FETCH_COUNT, fetching the 5 million rows in one resultset:

$ for s in $(seq 1 10); do time /usr/local/pgsql/bin/psql -At \
-f fetchcount-test.sql;  done

4.200
4.178
4.200
4.169
4.195
4.217
4.197
4.234
4.225
4.242

=> Average elapsed time = 4.20s

By comparison the unpatched version (cursor-based method)
gives these execution times with FETCH_COUNT=1000:

4.458
4.448
4.476
4.455
4.450
4.466
4.395
4.429
4.387
4.473

=> Average elapsed time = 4.43s

Now that's just one test, but don't these numbers look good?


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Tom Lane
"Daniel Verite"  writes:
> Updated patches are attached.

I started to look through this, and almost immediately noted

- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results in chunks

This is a bit problematic, because changing the sect1 ID will
change the page's URL, eg

https://www.postgresql.org/docs/current/libpq-single-row-mode.html

Aside from possibly breaking people's bookmarks, I'm pretty sure this
will cause the web docs framework to not recognize any cross-version
commonality of the page.  How ugly would it be if we left the ID
alone?  Another idea could be to leave the whole page alone and add
a new  for chunked mode.

But ... TBH I'm not convinced that we need the chunked mode at all.
We explicitly rejected that idea back when single-row mode was
designed, see around here:

https://www.postgresql.org/message-id/flat/50173BF7.1070801%40Yahoo.com#7f92ebad0143fb5f575ecb3913c5ce88

and I'm still very skeptical that there's much win to be had.
I do not buy that psql's FETCH_COUNT mode is a sufficient reason
to add it.  FETCH_COUNT mode is not something you'd use
non-interactively, and there is enough overhead elsewhere in psql
(notably in result-set formatting) that it doesn't seem worth
micro-optimizing the part about fetching from libpq.

(I see that there was some discussion in that old thread about
micro-optimizing single-row mode internally to libpq by making
PGresult creation cheaper, which I don't think anyone ever got
back to doing.  Maybe we should resurrect that.)

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-01 Thread Daniel Verite
Laurenz Albe wrote:

> Here is the code review for patch number 2:

> +static void
> +CloseGOutput(FILE *gfile_fout, bool is_pipe)
> 
> It makes sense to factor out this code.
> But shouldn't these functions have a prototype at the beginning of the file?

Looking at the other static functions in psql/common.c,  there
are 22 of them but only 3 have prototypes at the top of the file.
These 3 functions are called before being defined, so these prototypes
are mandatory.
The other static functions that are defined before being called happen
not to have forward declarations, so SetupGOutput() and CloseGOutput()
follow that model.

> Here is a suggestion for a consolidated comment:
> 
> Fetch the result in chunks if FETCH_COUNT is set.  We don't enable chunking
> if SHOW_ALL_RESULTS is false, since that requires us to accumulate all rows
>   before we can tell what should be displayed, which would counter the idea
>   of FETCH_COUNT.  Chunk fetching is also disabled if \gset, \crosstab,
>   \gexec and \watch are used.

OK, done like that.

> > +   if (fetch_count > 0 && result_status == PGRES_TUPLES_CHUNK)
> 
> Could it be that result_status == PGRES_TUPLES_CHUNK, but fetch_count is 0?
> if not, perhaps there should be an Assert that verifies that, and the "if"
> statement should only check for the latter condition.

Good point. In fact it can be simplified to
 if (result_status == PGRES_TUPLES_CHUNK),
and fetch_count as a variable can be removed from the function.
Done that way.


> > --- a/src/bin/psql/t/001_basic.pl
> > +++ b/src/bin/psql/t/001_basic.pl
> > @@ -184,10 +184,10 @@ like(
> > "\\set FETCH_COUNT 1\nSELECT error;\n\\errverbose",
> > on_error_stop => 0))[2],
> > qr/\A^psql::2: ERROR:  .*$
> > -^LINE 2: SELECT error;$
> > +^LINE 1: SELECT error;$

> >  ^ *^.*$
> >  ^psql::3: error: ERROR:  [0-9A-Z]{5}: .*$
> > -^LINE 2: SELECT error;$
> > +^LINE 1: SELECT error;$
> 
> Why does the output change?  Perhaps there is a good and harmless
> explanation, but the naïve expectation would be that it doesn't.

Unpatched, psql builds this query:
DECLARE _psql_cursor NO SCROLL CURSOR FOR  \n
   
therefore the user query starts at line 2.

With the patch, the user query is sent as-is, starting at line1,
hence the different error location.


> After fixing the problem manually, it builds without warning.
> The regression tests pass, and the feature works as expected.

Thanks for testing.
Updated patches are attached.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From 0fefaee7d5b3003ad0d089ea9e92675c6f50245f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= 
Date: Mon, 1 Apr 2024 19:46:20 +0200
Subject: [PATCH v7 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml   |  98 +++
 .../libpqwalreceiver/libpqwalreceiver.c   |   1 +
 src/bin/pg_amcheck/pg_amcheck.c   |   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-exec.c| 117 +++---
 src/interfaces/libpq/libpq-fe.h   |   4 +-
 src/interfaces/libpq/libpq-int.h  |   7 +-
 7 files changed, 185 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d3e87056f2..1814921d5a 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3545,7 +3545,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
 The PGresult contains a single result 
tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5197,8 +5210,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This 
is discussed
+   in .
   
 
   
@@ -5562,12 +5575,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before 

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-01 Thread Daniel Verite
Laurenz Albe wrote:

> I had a look at patch 0001 (0002 will follow).

Thanks for reviewing this!

I've implemented the suggested doc changes. A patch update
will follow with the next part of the review.

> > --- a/src/interfaces/libpq/fe-exec.c
> > +++ b/src/interfaces/libpq/fe-exec.c
> > @@ -41,7 +41,8 @@ char *const pgresStatus[] = {
> > "PGRES_COPY_BOTH",
> > "PGRES_SINGLE_TUPLE",
> > "PGRES_PIPELINE_SYNC",
> > -   "PGRES_PIPELINE_ABORTED"
> > +   "PGRES_PIPELINE_ABORTED",
> > +   "PGRES_TUPLES_CHUNK"
> >  };
> 
> I think that PGRES_SINGLE_TUPLE and PGRES_TUPLES_CHUNK should be next to
> each other, but that's no big thing.
> The same applies to the change in src/interfaces/libpq/libpq-fe.h

I assume we can't renumber/reorder existing values, otherwise it would be
an ABI break. We can only add new values.

> I understand that we need to keep the single-row mode for compatibility
> reasons.  But I think that under the hood, "single-row mode" should be the
> same as "chunk mode with chunk size one".

I've implemented it like that at first, and wasn't thrilled with the result.
libpq still has to return PGRES_SINGLE_TUPLE in single-row
mode and PGRES_TUPLES_CHUNK with chunks of size 1, so
the mutualization did not work that well in practice.
I also contemplated not creating PGRES_TUPLES_CHUNK
and instead using PGRES_SINGLE_TUPLE for N rows, but I found
it too ugly.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-03-29 Thread Laurenz Albe
On Fri, 2024-03-29 at 14:07 +0100, Laurenz Albe wrote:
> I had a look at patch 0001 (0002 will follow).

Here is the code review for patch number 2:


> diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
[...]
+static bool
+SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
[...]
+static void
+CloseGOutput(FILE *gfile_fout, bool is_pipe)

It makes sense to factor out this code.
But shouldn't these functions have a prototype at the beginning of the file?

> +   /*
> +* If FETCH_COUNT is set and the context allows it, use the single row
> +* mode to fetch results and have no more than FETCH_COUNT rows in
> +* memory.
> +*/

That comment talks about single-row mode, whey you are using chunked mode.
You probably forgot to modify the comment from a previous version of the patch.

> +   if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && 
> !is_watch
> +   && !pset.gset_prefix && pset.show_all_results)
> +   {
> +   /*
> +* The row-by-chunks fetch is not enabled when SHOW_ALL_RESULTS is 
> false,
> +* since we would need to accumulate all rows before knowing
> +* whether they need to be discarded or displayed, which contradicts
> +* FETCH_COUNT.
> +*/
> +   if (!PQsetChunkedRowsMode(pset.db, fetch_count))
> +   {

I think that comment should be before the "if" statement, not inside it.

Here is a suggestion for a consolidated comment:

  Fetch the result in chunks if FETCH_COUNT is set.  We don't enable chunking
  if SHOW_ALL_RESULTS is false, since that requires us to accumulate all rows
  before we can tell what should be displayed, which would counter the idea
  of FETCH_COUNT.  Chunk fetching is also disabled if \gset, \crosstab,
  \gexec and \watch are used.

> +   if (fetch_count > 0 && result_status == PGRES_TUPLES_CHUNK)

Could it be that result_status == PGRES_TUPLES_CHUNK, but fetch_count is 0?
if not, perhaps there should be an Assert that verifies that, and the "if"
statement should only check for the latter condition.

> --- a/src/bin/psql/t/001_basic.pl
> +++ b/src/bin/psql/t/001_basic.pl
> @@ -184,10 +184,10 @@ like(
> "\\set FETCH_COUNT 1\nSELECT error;\n\\errverbose",
> on_error_stop => 0))[2],
> qr/\A^psql::2: ERROR:  .*$
> -^LINE 2: SELECT error;$
> +^LINE 1: SELECT error;$
>  ^ *^.*$
>  ^psql::3: error: ERROR:  [0-9A-Z]{5}: .*$
> -^LINE 2: SELECT error;$
> +^LINE 1: SELECT error;$

Why does the output change?  Perhaps there is a good and harmless
explanation, but the naïve expectation would be that it doesn't.


The patch does not apply any more because of a conflict with the
non-blocking PQcancel patch.

After fixing the problem manually, it builds without warning.
The regression tests pass, and the feature works as expected.

Yours,
Laurenz Albe




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-03-29 Thread Laurenz Albe
On Tue, 2024-01-30 at 15:29 +0100, Daniel Verite wrote:
> PFA a rebased version.

I had a look at patch 0001 (0002 will follow).

> - 
> -  Retrieving Query Results Row-by-Row
> + 
> +  Retrieving Query Results by chunks

That should be "in chunks".

> +
> +   
> +
> +  PQsetChunkedRowsMode
> +PQsetChunkedRowsMode
> + 
> +  
> +   Select the mode retrieving results in chunks for the 
> currently-executing query.

That is questionable English.  How about

  Select to receive the results for the currently-executing query in chunks.

> +   This function is similar to  linkend="libpq-PQsetSingleRowMode"/>,
> +   except that it can retrieve a user-specified number of rows
> +   per call to , instead of a single 
> row.

The "user-specified number" is "maxRows".  So a better wording would be:

  ... except that it can retrieve maxRows rows
  per call to  instead of a single row.

> -error.  But in single-row mode, those rows will have already been
> +error.  But in single-row or chunked modes, those rows will have already 
> been

I'd say it should be "in *the* single-row or chunk modes".

> --- a/src/interfaces/libpq/fe-exec.c
> +++ b/src/interfaces/libpq/fe-exec.c
> @@ -41,7 +41,8 @@ char *const pgresStatus[] = {
> "PGRES_COPY_BOTH",
> "PGRES_SINGLE_TUPLE",
> "PGRES_PIPELINE_SYNC",
> -   "PGRES_PIPELINE_ABORTED"
> +   "PGRES_PIPELINE_ABORTED",
> +   "PGRES_TUPLES_CHUNK"
>  };

I think that PGRES_SINGLE_TUPLE and PGRES_TUPLES_CHUNK should be next to each
other, but that's no big thing.
The same applies to the change in src/interfaces/libpq/libpq-fe.h

I understand that we need to keep the single-row mode for compatibility
reasons.  But I think that under the hood, "single-row mode" should be the
same as "chunk mode with chunk size one".
That should save some code repetition.

Yours,
Laurenz Albe




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-02-12 Thread Daniel Verite
Jakub Wartak wrote:

> when I run with default pager (more or less):
> \set FETCH_COUNT 1000
> WITH data AS (SELECT  generate_series(1, 2000) as Total) select
> repeat('a',100) || data.Total || repeat('b', 800) as total_pat from
> data;
> -- it enters pager, a skip couple of pages and then "q"
> 
> .. then - both backend and psql - go into 100% CPU as it were still
> receiving

Thanks for looking into this patch!

What's happening after the pager has quit is that psql continues
to pump results from the server until there are no more results.

If the user wants to interrupt that, they should hit Ctrl+C to
cancel the query. I think psql should not cancel it implicitly
on their behalf, as it also cancels the transaction.

The behavior differs from the cursor implementation, because in
the cursor case, when the pager is displaying results, no query is
running. The previous FETCH results have been entirely
read, and the next FETCH has not been sent to the server yet.
This is why quitting the pager in the middle of this can
be dealt with instantly.

> (that doesn't happen e.g. with export PAGER=cat).  So I'm
> not sure, maybe ExecQueryAndProcessResults() should somewhat
> faster abort when the $PAGER is exiting normally(?).

I assume that when using PAGER=cat, you cancel the display
with Ctrl+C, which propagates to psql and have the effect
to also cancel the query. In that case it displays
"Cancel request sent",
and then shortly after it gets back from the server:
"ERROR:  canceling statement due to user request".
That case corresponds to the generic query canceling flow.

OTOH if killing the "cat" process with kill -TERM I see the same
behavior than with "more" or "less", that is postgres running
the query to completion and psql pumping the results.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-02-08 Thread Jakub Wartak
Hi Daniel,


On Tue, Jan 30, 2024 at 3:29 PM Daniel Verite  wrote:

> PFA a rebased version.

Thanks for the patch! I've tested it using my original reproducer and
it works great now against the original problem description. I've
taken a quick look at the patch, it looks good for me. I've tested
using -Werror for both gcc 10.2 and clang 11.0 and it was clean. I
have one slight doubt:

when I run with default pager (more or less):
\set FETCH_COUNT 1000
WITH data AS (SELECT  generate_series(1, 2000) as Total) select
repeat('a',100) || data.Total || repeat('b', 800) as total_pat from
data;
-- it enters pager, a skip couple of pages and then "q"

.. then - both backend and psql - go into 100% CPU as it were still
receiving (that doesn't happen e.g. with export PAGER=cat).  So I'm
not sure, maybe ExecQueryAndProcessResults() should somewhat faster
abort when the $PAGER is exiting normally(?).

And oh , btw, in v6-0001 (so if you would be sending v7 for any other
reason -- other reviewers -- maybe worth realigning it as detail):

+  int PQsetChunkedRowsMode(PGconn *conn,
+   int maxRows);

but the code has (so "maxRows" != "chunkSize"):

+PQsetChunkedRowsMode(PGconn *conn, int chunkSize)

-J.




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-30 Thread Daniel Verite
vignesh C wrote:

> patching file src/interfaces/libpq/exports.txt
> Hunk #1 FAILED at 191.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/interfaces/libpq/exports.txt.rej
> 
> Please post an updated version for the same.

PFA a rebased version.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From 8cfb82b1e36e96996637948a231ae35b9af1e074 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= 
Date: Tue, 30 Jan 2024 14:38:21 +0100
Subject: [PATCH v6 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml   |  96 ++
 .../libpqwalreceiver/libpqwalreceiver.c   |   1 +
 src/bin/pg_amcheck/pg_amcheck.c   |   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-exec.c| 117 +++---
 src/interfaces/libpq/libpq-fe.h   |   4 +-
 src/interfaces/libpq/libpq-int.h  |   7 +-
 7 files changed, 184 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index d0d5aefadc..f7f5a04df6 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
 The PGresult contains a single result 
tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5189,8 +5202,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This 
is discussed
+   in .
   
 
   
@@ -5554,12 +5567,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before retrieving results with PQgetResult.
- This mode selection is effective only for the query currently
- being processed. For more information on the use of
- PQsetSingleRowMode,
- refer to .
+ To enter single-row or chunked modes, call
+ respectively PQsetSingleRowMode
+ or PQsetChunkedRowsMode before retrieving results
+ with PQgetResult.  This mode selection is effective
+ only for the query currently being processed. For more information on the
+ use of these functions refer
+ to .
 
 
 
@@ -5926,10 +5940,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
  
 
- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results by chunks
 
-  
+  
libpq
single-row mode
   
@@ -5940,13 +5954,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
PGresult.  This can be unworkable for commands
that return a large number of rows.  For such cases, applications can use
 and  
in
-   single-row mode.  In this mode, the result row(s) are
-   returned to the application one at a time, as they are received from the
-   server.
+   single-row mode or chunked 
mode.
+   In these modes, the result row(s) are returned to the application one at a
+   time for the single-row mode and by chunks for the chunked mode, as they
+   are received from the server.
   
 
   
-   To enter single-row mode, call 
+   To enter these modes, call 
+or 
immediately after a successful call of 
(or a sibling function).  This mode selection is effective only for the
currently executing query.  Then call 
@@ -5954,7 +5970,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
linkend="libpq-async"/>.  If the query returns any rows, they are returned
as individual PGresult objects, which look like
normal query results except for having status code
-   PGRES_SINGLE_TUPLE instead of
+   PGRES_SINGLE_TUPLE for the single-row mode and
+   PGRES_TUPLES_CHUNK for the chunked mode, instead of
PGRES_TUPLES_OK.  After the last row, or immediately if
the query returns zero rows, a zero-row object with status
PGRES_TUPLES_OK is returned; this is the signal that no
@@ -5967,9 +5984,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
 
   
-   When using pipeline mode, 

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-26 Thread vignesh C
On Tue, 2 Jan 2024 at 20:28, Daniel Verite  wrote:
>
>   Hi,
>
> PFA a rebased version.

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
a3a836fb5e51183eae624d43225279306c2285b8 ===
=== applying patch
./v5-0001-Implement-retrieval-of-results-in-chunks-with-lib.patch
patching file doc/src/sgml/libpq.sgml
...
patching file src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
...
patching file src/interfaces/libpq/exports.txt
Hunk #1 FAILED at 191.
1 out of 1 hunk FAILED -- saving rejects to file
src/interfaces/libpq/exports.txt.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4233.log

Regards,
Vignesh




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-02 Thread Daniel Verite
  Hi,

PFA a rebased version.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From cd0fe1d517a0e31e031fbbea1e603a715c77ea97 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= 
Date: Tue, 2 Jan 2024 14:15:48 +0100
Subject: [PATCH v5 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml   |  96 ++
 .../libpqwalreceiver/libpqwalreceiver.c   |   1 +
 src/bin/pg_amcheck/pg_amcheck.c   |   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-exec.c| 117 +++---
 src/interfaces/libpq/libpq-fe.h   |   4 +-
 src/interfaces/libpq/libpq-int.h  |   7 +-
 7 files changed, 184 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index ed88ac001a..8007bf67d8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
 The PGresult contains a single result 
tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5187,8 +5200,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This 
is discussed
+   in .
   
 
   
@@ -5551,12 +5564,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before retrieving results with PQgetResult.
- This mode selection is effective only for the query currently
- being processed. For more information on the use of
- PQsetSingleRowMode,
- refer to .
+ To enter single-row or chunked modes, call
+ respectively PQsetSingleRowMode
+ or PQsetChunkedRowsMode before retrieving results
+ with PQgetResult.  This mode selection is effective
+ only for the query currently being processed. For more information on the
+ use of these functions refer
+ to .
 
 
 
@@ -5895,10 +5909,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
  
 
- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results by chunks
 
-  
+  
libpq
single-row mode
   
@@ -5909,13 +5923,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
PGresult.  This can be unworkable for commands
that return a large number of rows.  For such cases, applications can use
 and  
in
-   single-row mode.  In this mode, the result row(s) are
-   returned to the application one at a time, as they are received from the
-   server.
+   single-row mode or chunked 
mode.
+   In these modes, the result row(s) are returned to the application one at a
+   time for the single-row mode and by chunks for the chunked mode, as they
+   are received from the server.
   
 
   
-   To enter single-row mode, call 
+   To enter these modes, call 
+or 
immediately after a successful call of 
(or a sibling function).  This mode selection is effective only for the
currently executing query.  Then call 
@@ -5923,7 +5939,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
linkend="libpq-async"/>.  If the query returns any rows, they are returned
as individual PGresult objects, which look like
normal query results except for having status code
-   PGRES_SINGLE_TUPLE instead of
+   PGRES_SINGLE_TUPLE for the single-row mode and
+   PGRES_TUPLES_CHUNK for the chunked mode, instead of
PGRES_TUPLES_OK.  After the last row, or immediately if
the query returns zero rows, a zero-row object with status
PGRES_TUPLES_OK is returned; this is the signal that no
@@ -5936,9 +5953,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
 
   
-   When using pipeline mode, single-row mode needs to be activated for each
-   query in the pipeline before retrieving results for that query
-   with PQgetResult.
+   When using pipeline mode, the single-row or chunked mode need to be
+   activated for each query 

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-11-20 Thread Daniel Verite
 Hi,

Here's a new version to improve the performance of FETCH_COUNT
and extend the cases when it can be used.

Patch 0001 adds a new mode in libpq to allow the app to retrieve
larger chunks of results than the single row of the row-by-row mode.
The maximum number of rows per PGresult is set by the user.

Patch 0002 uses that mode in psql and gets rid of the cursor
implementation as suggested upthread.

The performance numbers look good.
For a query retrieving 50M rows of about 200 bytes:
  select repeat('abc', 200) from generate_series(1,500)
/usr/bin/time -v psql -At -c $query reports these metrics
(medians of 5 runs):

  version  | fetch_count | clock_time | user_time | sys_time | max_rss_size
(kB) 
---+-++---+--+---
 16-stable |   0 |   6.58 |  3.98 | 2.09 |  
3446276
 16-stable | 100 |   9.25 |  4.10 | 1.90 | 
8768
 16-stable |1000 |  11.13 |  5.17 | 1.66 | 
8904
 17-patch  |   0 |6.5 |  3.94 | 2.09 |  
3442696
 17-patch  | 100 |  5 |  3.56 | 0.93 | 
4096
 17-patch  |1000 |   6.48 |  4.00 | 1.55 | 
4344

Interestingly, retrieving by chunks of 100 rows appears to be a bit faster
than the default one big chunk. It means that independently
of using less memory, FETCH_COUNT implemented that way
would be a performance enhancement compared to both
not using it and using it in v16 with the cursor implementation.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From 766bbe84def2db494f646caeaf29eefeba893c1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20V=C3=A9rit=C3=A9?= 
Date: Mon, 20 Nov 2023 17:24:55 +0100
Subject: [PATCH v4 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml  |  96 +++--
 src/bin/pg_amcheck/pg_amcheck.c  |   1 +
 src/interfaces/libpq/exports.txt |   1 +
 src/interfaces/libpq/fe-exec.c   | 118 +--
 src/interfaces/libpq/libpq-fe.h  |   4 +-
 src/interfaces/libpq/libpq-int.h |   7 +-
 6 files changed, 183 insertions(+), 44 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index ed88ac001a..8007bf67d8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
 The PGresult contains a single result 
tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5187,8 +5200,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This 
is discussed
+   in .
   
 
   
@@ -5551,12 +5564,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before retrieving results with PQgetResult.
- This mode selection is effective only for the query currently
- being processed. For more information on the use of
- PQsetSingleRowMode,
- refer to .
+ To enter single-row or chunked modes, call
+ respectively PQsetSingleRowMode
+ or PQsetChunkedRowsMode before retrieving results
+ with PQgetResult.  This mode selection is effective
+ only for the query currently being processed. For more information on the
+ use of these functions refer
+ to .
 
 
 
@@ -5895,10 +5909,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
  
 
- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results by chunks
 
-  
+  
libpq
single-row mode
   
@@ -5909,13 +5923,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
PGresult.  This can be unworkable for commands
that return a large number of rows.  For such cases, applications can use
 and  
in
-   single-row mode.  In this mode, the result row(s) are
-   returned to the application 

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-07-07 Thread Daniel Verite
Tom Lane wrote:

> This gives me several "-Wincompatible-pointer-types" warnings
> [...]
> I think what you probably ought to do to avoid all that is to change
> the arguments of PrintQueryResult and nearby routines to be "const
> PGresult *result" not just "PGresult *result".

The const-ness issue that I ignored in the previous patch is that
while C is fine with passing T* to a function expecting const T*, it's
not okay with passing T** to a function expecting const T**,
or more generally converting T** to const T**.

When callers need to pass arrays of PGresult* instead of const
PGresult*, I've opted to remove the const qualifiers for the functions
that are concerned by this change.


PFA an updated patch.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 5973df2e39..476a9770f0 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error)
 		{
 			case PGRES_COMMAND_OK:
 			case PGRES_TUPLES_OK:
+			case PGRES_SINGLE_TUPLE:
 			case PGRES_EMPTY_QUERY:
 			case PGRES_COPY_IN:
 			case PGRES_COPY_OUT:
@@ -695,7 +696,7 @@ PrintNotifications(void)
  * Returns true if successful, false otherwise.
  */
 static bool
-PrintQueryTuples(const PGresult *result, const printQueryOpt *opt,
+PrintQueryTuples(PGresult *result, const printQueryOpt *opt,
  FILE *printQueryFout)
 {
 	bool		ok = true;
@@ -1391,6 +1392,47 @@ DescribeQuery(const char *query, double *elapsed_msec)
 	return OK;
 }
 
+/*
+ * Check if an output stream for \g needs to be opened, and if
+ * yes, open it.
+ * Return false if an error occurred, true otherwise.
+ */
+static bool
+SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
+{
+	ExecStatusType status = PQresultStatus(result);
+	if (pset.gfname != NULL &&			/* there is a \g file or program */
+		*gfile_fout == NULL &&   /* and it's not already opened */
+		(status == PGRES_TUPLES_OK ||
+		 status == PGRES_SINGLE_TUPLE ||
+		 status == PGRES_COPY_OUT))
+	{
+		if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe))
+		{
+			if (is_pipe)
+disable_sigpipe_trap();
+		}
+		else
+			return false;
+	}
+	return true;
+}
+
+static void
+CloseGOutput(FILE *gfile_fout, bool is_pipe)
+{
+	/* close \g file if we opened it */
+	if (gfile_fout)
+	{
+		if (is_pipe)
+		{
+			SetShellResultVariables(pclose(gfile_fout));
+			restore_sigpipe_trap();
+		}
+		else
+			fclose(gfile_fout);
+	}
+}
 
 /*
  * ExecQueryAndProcessResults: utility function for use by SendQuery()
@@ -1422,10 +1464,18 @@ ExecQueryAndProcessResults(const char *query,
 	bool		success;
 	instr_time	before,
 after;
+	int fetch_count = pset.fetch_count;
 	PGresult   *result;
+
 	FILE	   *gfile_fout = NULL;
 	bool		gfile_is_pipe = false;
 
+	PGresult   **result_array = NULL; /* to collect results in single row mode */
+	int64		total_tuples = 0;
+	int			ntuples;
+	int			flush_error = 0;
+	bool		is_pager = false;
+
 	if (timing)
 		INSTR_TIME_SET_CURRENT(before);
 	else
@@ -1448,6 +1498,33 @@ ExecQueryAndProcessResults(const char *query,
 		return -1;
 	}
 
+	/*
+	 * If FETCH_COUNT is set and the context allows it, use the single row
+	 * mode to fetch results and have no more than FETCH_COUNT rows in
+	 * memory.
+	 */
+	if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch
+		&& !pset.gset_prefix && pset.show_all_results)
+	{
+		/*
+		 * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false,
+		 * since we would need to accumulate all rows before knowing
+		 * whether they need to be discarded or displayed, which contradicts
+		 * FETCH_COUNT.
+		 */
+		if (!PQsetSingleRowMode(pset.db))
+		{
+			pg_log_warning("fetching results in single row mode is unavailable");
+			fetch_count = 0;
+		}
+		else
+		{
+			result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*));
+		}
+	}
+	else
+		fetch_count = 0;		/* disable single-row mode */
+
 	/*
 	 * If SIGINT is sent while the query is processing, the interrupt will be
 	 * consumed.  The user's intention, though, is to cancel the entire watch
@@ -1467,6 +1544,8 @@ ExecQueryAndProcessResults(const char *query,
 		ExecStatusType result_status;
 		PGresult   *next_result;
 		bool		last;
+		/* whether the output starts before results are fully fetched */
+		bool		partial_display = false;
 
 		if (!AcceptResult(result, false))
 		{
@@ -1593,6 +1672,94 @@ ExecQueryAndProcessResults(const char *query,
 			success &= HandleCopyResult(, copy_stream);
 		}
 
+		if (fetch_count > 0 && result_status == PGRES_SINGLE_TUPLE)
+		{
+			FILE	   *tuples_fout = printQueryFout ? printQueryFout : stdout;
+			printQueryOpt my_popt = pset.popt;
+
+			ntuples = 0;
+			total_tuples = 0;
+			partial_display = true;
+
+			success = SetupGOutput(result, _fout, _is_pipe);
+			if (gfile_fout)
+tuples_fout = gfile_fout;
+
+			/* initialize print options for partial 

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-03-24 Thread Tom Lane
"Daniel Verite"  writes:
> PFA an updated patch.

This gives me several "-Wincompatible-pointer-types" warnings
(as are also reported by the cfbot):

common.c: In function 'ExecQueryAndProcessResults':
common.c:1686:24: warning: passing argument 1 of 'PrintQueryTuples' from 
incompatible pointer type [-Wincompatible-pointer-types]
   PrintQueryTuples(result_array, ntuples, _popt, tuples_fout);
^~~~
common.c:679:35: note: expected 'const PGresult **' {aka 'const struct 
pg_result **'} but argument is of type 'PGresult **' {aka 'struct pg_result **'}
 PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt 
*opt,
  ~^~
common.c:1720:24: warning: passing argument 1 of 'PrintQueryTuples' from 
incompatible pointer type [-Wincompatible-pointer-types]
   PrintQueryTuples(result_array, ntuples, _popt, tuples_fout);
^~~~
common.c:679:35: note: expected 'const PGresult **' {aka 'const struct 
pg_result **'} but argument is of type 'PGresult **' {aka 'struct pg_result **'}
 PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt 
*opt,
  ~^~

I think the cause is the inconsistency about whether PGresult pointers
are pointer-to-const or not.  Even without compiler warnings, I find
code like this very ugly:

-   success = PrintQueryTuples(result, opt, 
printQueryFout);
+   success = PrintQueryTuples((const 
PGresult**), 1, opt, printQueryFout);

I think what you probably ought to do to avoid all that is to change
the arguments of PrintQueryResult and nearby routines to be "const
PGresult *result" not just "PGresult *result".

I find it sad that we can't get rid of ExecQueryUsingCursor().
Maybe a little effort towards reducing overhead in the single-row
mode would help?

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-03-01 Thread Daniel Verite
  I wrote:

> Here's a POC patch implementing row-by-row fetching.

PFA an updated patch.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index f907f5d4e8..ad5e8a5de9 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error)
 		{
 			case PGRES_COMMAND_OK:
 			case PGRES_TUPLES_OK:
+			case PGRES_SINGLE_TUPLE:
 			case PGRES_EMPTY_QUERY:
 			case PGRES_COPY_IN:
 			case PGRES_COPY_OUT:
@@ -675,13 +676,13 @@ PrintNotifications(void)
  * Returns true if successful, false otherwise.
  */
 static bool
-PrintQueryTuples(const PGresult *result, const printQueryOpt *opt,
+PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt,
  FILE *printQueryFout)
 {
 	bool		ok = true;
 	FILE	   *fout = printQueryFout ? printQueryFout : pset.queryFout;
 
-	printQuery(result, opt ? opt : , fout, false, pset.logfile);
+	printQueryChunks(result, nresults, opt ? opt : , fout, false, pset.logfile);
 	fflush(fout);
 	if (ferror(fout))
 	{
@@ -958,7 +959,7 @@ PrintQueryResult(PGresult *result, bool last,
 			else if (last && pset.crosstab_flag)
 success = PrintResultInCrosstab(result);
 			else if (last || pset.show_all_results)
-success = PrintQueryTuples(result, opt, printQueryFout);
+success = PrintQueryTuples((const PGresult**), 1, opt, printQueryFout);
 			else
 success = true;
 
@@ -1371,6 +1372,47 @@ DescribeQuery(const char *query, double *elapsed_msec)
 	return OK;
 }
 
+/*
+ * Check if an output stream for \g needs to be opened, and if
+ * yes, open it.
+ * Return false if an error occurred, true otherwise.
+ */
+static bool
+SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
+{
+	ExecStatusType status = PQresultStatus(result);
+	if (pset.gfname != NULL &&			/* there is a \g file or program */
+		*gfile_fout == NULL &&   /* and it's not already opened */
+		(status == PGRES_TUPLES_OK ||
+		 status == PGRES_SINGLE_TUPLE ||
+		 status == PGRES_COPY_OUT))
+	{
+		if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe))
+		{
+			if (is_pipe)
+disable_sigpipe_trap();
+		}
+		else
+			return false;
+	}
+	return true;
+}
+
+static void
+CloseGOutput(FILE *gfile_fout, bool is_pipe)
+{
+	/* close \g file if we opened it */
+	if (gfile_fout)
+	{
+		if (is_pipe)
+		{
+			pclose(gfile_fout);
+			restore_sigpipe_trap();
+		}
+		else
+			fclose(gfile_fout);
+	}
+}
 
 /*
  * ExecQueryAndProcessResults: utility function for use by SendQuery()
@@ -1402,10 +1444,16 @@ ExecQueryAndProcessResults(const char *query,
 	bool		success;
 	instr_time	before,
 after;
+	int fetch_count = pset.fetch_count;
 	PGresult   *result;
+
 	FILE	   *gfile_fout = NULL;
 	bool		gfile_is_pipe = false;
 
+	PGresult   **result_array = NULL; /* to collect results in single row mode */
+	int64		total_tuples = 0;
+	int			ntuples;
+
 	if (timing)
 		INSTR_TIME_SET_CURRENT(before);
 	else
@@ -1428,6 +1476,33 @@ ExecQueryAndProcessResults(const char *query,
 		return -1;
 	}
 
+	/*
+	 * If FETCH_COUNT is set and the context allows it, use the single row
+	 * mode to fetch results and have no more than FETCH_COUNT rows in
+	 * memory.
+	 */
+	if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch
+		&& !pset.gset_prefix && pset.show_all_results)
+	{
+		/*
+		 * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false,
+		 * since we would need to accumulate all rows before knowing
+		 * whether they need to be discarded or displayed, which contradicts
+		 * FETCH_COUNT.
+		 */
+		if (!PQsetSingleRowMode(pset.db))
+		{
+			pg_log_warning("fetching results in single row mode is unavailable");
+			fetch_count = 0;
+		}
+		else
+		{
+			result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*));
+		}
+	}
+	else
+		fetch_count = 0;		/* disable single-row mode */
+
 	/*
 	 * If SIGINT is sent while the query is processing, the interrupt will be
 	 * consumed.  The user's intention, though, is to cancel the entire watch
@@ -1447,6 +1522,7 @@ ExecQueryAndProcessResults(const char *query,
 		ExecStatusType result_status;
 		PGresult   *next_result;
 		bool		last;
+		bool		partial_display = false;
 
 		if (!AcceptResult(result, false))
 		{
@@ -1573,6 +1649,96 @@ ExecQueryAndProcessResults(const char *query,
 			success &= HandleCopyResult(, copy_stream);
 		}
 
+		if (fetch_count > 0 && result_status == PGRES_SINGLE_TUPLE)
+		{
+			FILE	   *tuples_fout = printQueryFout;
+			printQueryOpt my_popt = pset.popt;
+			bool		is_pager = false;
+			int			flush_error = 0;
+
+			ntuples = 0;
+			total_tuples = 0;
+			partial_display = true;
+
+			success = SetupGOutput(result, _fout, _is_pipe);
+			if (gfile_fout)
+tuples_fout = gfile_fout;
+
+			/* initialize print options for partial table output */
+			my_popt.topt.start_table = true;
+			my_popt.topt.stop_table = false;
+			

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-12 Thread Daniel Verite
Tom Lane wrote:

> I agree that it seems like a good idea to try.
> There will be more per-row overhead, but the increase in flexibility
> is likely to justify that.

Here's a POC patch implementing row-by-row fetching.

If it wasn't for the per-row overhead, we could probably get rid of
ExecQueryUsingCursor() and use row-by-row fetches whenever
FETCH_COUNT is set, independently of the form of the query.

However the difference in processing time seems to be substantial: on
some quick tests with FETCH_COUNT=1, I'm seeing almost a 1.5x
increase on large datasets. I assume it's the cost of more allocations.
I would have hoped that avoiding the FETCH queries and associated
round-trips with the cursor method would compensate for that, but it
doesn't appear to be the case, at least with a fast local connection.

So in this patch, psql still uses the cursor method if the
query starts with "select", and falls back to the row-by-row in
the main code (ExecQueryAndProcessResults) otherwise.
Anyway it solves the main issue of the over-consumption of memory
for CTE and update/insert queries returning large resultsets.


Best regards,

-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 00627830c4..d3de9d8336 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error)
 		{
 			case PGRES_COMMAND_OK:
 			case PGRES_TUPLES_OK:
+			case PGRES_SINGLE_TUPLE:
 			case PGRES_EMPTY_QUERY:
 			case PGRES_COPY_IN:
 			case PGRES_COPY_OUT:
@@ -675,13 +676,13 @@ PrintNotifications(void)
  * Returns true if successful, false otherwise.
  */
 static bool
-PrintQueryTuples(const PGresult *result, const printQueryOpt *opt,
+PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt,
  FILE *printQueryFout)
 {
 	bool		ok = true;
 	FILE	   *fout = printQueryFout ? printQueryFout : pset.queryFout;
 
-	printQuery(result, opt ? opt : , fout, false, pset.logfile);
+	printQueryChunks(result, nresults, opt ? opt : , fout, false, pset.logfile);
 	fflush(fout);
 	if (ferror(fout))
 	{
@@ -958,7 +959,7 @@ PrintQueryResult(PGresult *result, bool last,
 			else if (last && pset.crosstab_flag)
 success = PrintResultInCrosstab(result);
 			else if (last || pset.show_all_results)
-success = PrintQueryTuples(result, opt, printQueryFout);
+success = PrintQueryTuples((const PGresult**), 1, opt, printQueryFout);
 			else
 success = true;
 
@@ -1369,6 +1370,47 @@ DescribeQuery(const char *query, double *elapsed_msec)
 	return OK;
 }
 
+/*
+ * Check if an output stream for \g needs to be opened, and if
+ * yes, open it.
+ * Return false if an error occurred, true otherwise.
+ */
+static bool
+SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
+{
+	ExecStatusType status = PQresultStatus(result);
+	if (pset.gfname != NULL &&			/* there is a \g file or program */
+		*gfile_fout == NULL &&   /* and it's not already opened */
+		(status == PGRES_TUPLES_OK ||
+		 status == PGRES_SINGLE_TUPLE ||
+		 status == PGRES_COPY_OUT))
+	{
+		if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe))
+		{
+			if (is_pipe)
+disable_sigpipe_trap();
+		}
+		else
+			return false;
+	}
+	return true;
+}
+
+static void
+CloseGOutput(FILE *gfile_fout, bool is_pipe)
+{
+	/* close \g file if we opened it */
+	if (gfile_fout)
+	{
+		if (is_pipe)
+		{
+			pclose(gfile_fout);
+			restore_sigpipe_trap();
+		}
+		else
+			fclose(gfile_fout);
+	}
+}
 
 /*
  * ExecQueryAndProcessResults: utility function for use by SendQuery()
@@ -1400,10 +1442,16 @@ ExecQueryAndProcessResults(const char *query,
 	bool		success;
 	instr_time	before,
 after;
+	int fetch_count = pset.fetch_count;
 	PGresult   *result;
+
 	FILE	   *gfile_fout = NULL;
 	bool		gfile_is_pipe = false;
 
+	PGresult   **result_array = NULL; /* to collect results in single row mode */
+	int64		total_tuples = 0;
+	int			ntuples;
+
 	if (timing)
 		INSTR_TIME_SET_CURRENT(before);
 
@@ -1424,6 +1472,33 @@ ExecQueryAndProcessResults(const char *query,
 		return -1;
 	}
 
+	/*
+	 * If FETCH_COUNT is set and the context allows it, use the single row
+	 * mode to fetch results and have no more than FETCH_COUNT rows in
+	 * memory.
+	 */
+	if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch
+		&& !pset.gset_prefix && pset.show_all_results)
+	{
+		/*
+		 * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false,
+		 * since we would need to accumulate all rows before knowing
+		 * whether they need to be discarded or displayed, which contradicts
+		 * FETCH_COUNT.
+		 */
+		if (!PQsetSingleRowMode(pset.db))
+		{
+			pg_log_warning("fetching results in single row mode is unavailable");
+			fetch_count = 0;
+		}
+		else
+		{
+			result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*));
+		}
+	}
+	else
+		fetch_count = 0;		/* disable single-row mode */

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-10 Thread Jakub Wartak
On Wed, Jan 4, 2023 at 6:38 PM Robert Haas  wrote:
>
> On Wed, Jan 4, 2023 at 11:36 AM Tom Lane  wrote:
> > As you well know, psql's FETCH_COUNT mechanism is far older than
> > single-row mode.  I don't think anyone's tried to transpose it
> > onto that.  I agree that it seems like a good idea to try.
> > There will be more per-row overhead, but the increase in flexibility
> > is likely to justify that.
>
> Yeah, I was vaguely worried that there might be more per-row overhead,
> not that I know a lot about this topic. I wonder if there's a way to
> mitigate that. I'm a bit suspicious that what we want here is really
> more of an incremental mode than a single-row mode i.e. yeah, you want
> to fetch rows without materializing the whole result, but maybe not in
> batches of exactly size one.

Given the low importance and very low priority of this, how about
adding it as a TODO wiki item then and maybe adding just some warning
instead? I've intentionally avoided parsing grammar and regexp so it's
not perfect (not that I do care about this too much either, as web
crawlers already have indexed this $thread). BTW I've found two
threads if know what are you looking for [1][2]

-Jakub Wartak.

[1] - 
https://www.postgresql.org/message-id/flat/a0a854b6-563c-4a11-bf1c-d6c6f924004d%40manitou-mail.org
[2] - 
https://www.postgresql.org/message-id/flat/1274761885.4261.233.camel%40minidragon


0001-psql-warn-about-CTE-queries-to-be-executed-without-u.patch
Description: Binary data


Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-04 Thread Robert Haas
On Wed, Jan 4, 2023 at 11:36 AM Tom Lane  wrote:
> As you well know, psql's FETCH_COUNT mechanism is far older than
> single-row mode.  I don't think anyone's tried to transpose it
> onto that.  I agree that it seems like a good idea to try.
> There will be more per-row overhead, but the increase in flexibility
> is likely to justify that.

Yeah, I was vaguely worried that there might be more per-row overhead,
not that I know a lot about this topic. I wonder if there's a way to
mitigate that. I'm a bit suspicious that what we want here is really
more of an incremental mode than a single-row mode i.e. yeah, you want
to fetch rows without materializing the whole result, but maybe not in
batches of exactly size one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-04 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 4, 2023 at 10:22 AM Daniel Verite  wrote:
>> A solution would be for psql to use PQsetSingleRowMode() to retrieve
>> results row-by-row, as opposed to using a cursor, and then allocate
>> memory for only FETCH_COUNT rows at a time.

> Is there any reason that someone hasn't, like, already done this?

As you well know, psql's FETCH_COUNT mechanism is far older than
single-row mode.  I don't think anyone's tried to transpose it
onto that.  I agree that it seems like a good idea to try.
There will be more per-row overhead, but the increase in flexibility
is likely to justify that.

regards, tom lane




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-04 Thread Robert Haas
On Wed, Jan 4, 2023 at 10:22 AM Daniel Verite  wrote:
> A solution would be for psql to use PQsetSingleRowMode() to retrieve
> results row-by-row, as opposed to using a cursor, and then allocate
> memory for only FETCH_COUNT rows at a time. Incidentally it solves
> other problems like queries containing multiple statements, that also
> fail to work properly with cursors, or UPDATE/INSERT... RETURNING.. on
> large number of rows that could also benefit from pagination in
> memory.

Is there any reason that someone hasn't, like, already done this?

Because if there isn't, we should really do this. And if there is,
like say that it would hurt performance or something, then we should
come up with a fix for that problem and then do something like this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-04 Thread Daniel Verite
Jakub Wartak wrote:

> It might be a not so well known fact (?) that CTEs are not executed
> with cursor when asked to do so, but instead silently executed with
> potential huge memory allocation going on. Patch is attached. My one
> doubt is that not every statement starting with "WITH" is WITH(..)
> SELECT of course.

Yes, that's why WITH queries are currently filtered out by the
FETCH_COUNT feature.

Case in point:

test=> begin;
BEGIN

test=> create table tbl(i int);
CREATE TABLE

test=> declare psql_cursor cursor for
 with r(i) as (values (1))
 insert into tbl(i) select i from r;
ERROR:  syntax error at or near "insert"
LINE 3: insert into tbl(i) select i from r;


So the fix you're proposing would fail on that kind of queries.

A solution would be for psql to use PQsetSingleRowMode() to retrieve
results row-by-row, as opposed to using a cursor, and then allocate
memory for only FETCH_COUNT rows at a time. Incidentally it solves
other problems like queries containing multiple statements, that also
fail to work properly with cursors, or UPDATE/INSERT... RETURNING.. on
large number of rows that could also benefit from pagination in
memory.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-04 Thread Jakub Wartak
Hi -hackers,

I've spent some time fighting against "out of memory" errors coming
out of psql when trying to use the cursor via FETCH_COUNT. It might be
a not so well known fact (?) that CTEs are not executed with cursor
when asked to do so, but instead silently executed with potential huge
memory allocation going on. Patch is attached. My one doubt is that
not every statement starting with "WITH" is WITH(..) SELECT of course.

Demo (one might also get the "out of memory for query result"):

postgres@hive:~$ psql -Ant --variable='FETCH_COUNT=100' -c "WITH data
AS (SELECT  generate_series(1, 2000) as Total) select repeat('a',
100) || data.Total || repeat('b', 800) as total_pat from data;"
Killed
postgres@hive:~$ tail -4 /var/log/postgresql/postgresql-14-main.log
[..]
2023-01-04 12:46:20.193 CET [32936] postgres@postgres LOG:  could not
send data to client: Broken pipe
[..]
2023-01-04 12:46:20.195 CET [32936] postgres@postgres FATAL:
connection to client lost

With the patch:
postgres@hive:~$ /tmp/psql16-with-patch -Ant
--variable='FETCH_COUNT=100' -c "WITH data AS (SELECT
generate_series(1, 2000) as Total) select repeat('a', 100) ||
data.Total || repeat('b', 800) as total_pat from data;" | wc -l
2000
postgres@hive:~$

Regards,
-Jakub Wartak.


0001-psql-allow-CTE-queries-to-be-executed-also-using-cur.patch
Description: Binary data