Yes, of course; that's all verified and taken into account during code
initialization.


From: Vitalii Tymchyshyn [mailto:v...@tym.im]
Sent: Wednesday, February 22, 2017 8:14 PM
To: John Gorman; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

For JDBC there are certain prerequisites for setFetchSize to work, e.g. using
forward-only result sets and running inside a transaction (autocommit off).
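To make those prerequisites concrete, here is a minimal sketch (the table name and fetch size are placeholders, and `cursorStreamingEnabled` is a hypothetical helper that merely mirrors the documented conditions; the real decision is internal to the pgJDBC driver):

```java
import java.sql.*;

public class CursorFetchDemo {

    // Hypothetical helper mirroring the documented conditions under which
    // pgJDBC streams via a server-side cursor instead of buffering the whole
    // result set: autocommit off and a positive fetch size (together with a
    // forward-only result set).
    static boolean cursorStreamingEnabled(boolean autoCommit, int fetchSize) {
        return !autoCommit && fetchSize > 0;
    }

    static void streamLargeTable(Connection conn) throws SQLException {
        conn.setAutoCommit(false);                 // prerequisite: transaction
        try (Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,       // prerequisite: forward-only
                ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000);                 // rows per round trip
            try (ResultSet rs = st.executeQuery("SELECT * FROM large_table")) {
                while (rs.next()) {
                    // process one row; only ~1000 rows are held in memory
                }
            }
        }
        conn.commit();
    }
}
```

If any of the three conditions is not met, the driver silently falls back to fetching the entire result set at once.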

Tue, Feb 21, 2017 at 09:06 John Gorman <jgor...@eldocomp.com> wrote:
My experience with cursors in PostgreSQL from Java has been to stay away from
them. We support two databases with our product, PostgreSQL (the default) and
SQL Server. While re-encrypting data in a database the application used cursors
with a fetch size of 1000.

This worked perfectly on SQL Server, and on PostgreSQL until we got to a table
with more than 11 million rows. After spending weeks trying to figure out what
was happening, I realized that once a table passed roughly 10 million rows the
cursor functionality silently stopped working and the driver was reading the
entire table. I asked another very senior architect to look at it and he came
to the same conclusion. Because of limited time, I ended up working around it
using LIMIT/OFFSET.
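For what it's worth, the LIMIT/OFFSET workaround can be sketched as query-building helpers like the following (table and column names are placeholders, not from the original mail). A keyset variant is shown alongside, since OFFSET still scans and discards every skipped row and therefore gets progressively slower on a multi-million-row table:

```java
public class Paging {

    // LIMIT/OFFSET paging: simple, but page n costs O(n) on the server
    // because the skipped rows are still scanned and discarded.
    static String offsetPage(int limit, long offset) {
        return "SELECT id, payload FROM big ORDER BY id"
             + " LIMIT " + limit + " OFFSET " + offset;
    }

    // Keyset paging: seek past the last key seen on the previous page;
    // each page is an index range scan regardless of table size.
    static String keysetPage(int limit, long lastId) {
        return "SELECT id, payload FROM big WHERE id > " + lastId
             + " ORDER BY id LIMIT " + limit;
    }
}
```

In real code the values would of course go through `PreparedStatement` parameters rather than string concatenation; the strings are spelled out here only to show the SQL shape of each page.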

Again, we are using Java, so the problem could just be in the PostgreSQL JDBC
driver. We were also on 9.1 at the time.

Regards
John

From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

Thanks, Tom.

Wouldn't this mean that cursors are noticeably non-optimal even for normal data
sizes, since the entire result to be streamed from the table is always
duplicated into another buffer and only then streamed?

> if you want the whole query result at once, why are you bothering with a 
> cursor?

The PostgreSQL docs 
(https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) 
clearly recommend cursors as a way to return a reference to a large result set 
from a function (as I understood, this is recommended precisely as a way to 
avoid tuple-based buffering of the data).
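The pattern that docs section recommends looks roughly like this (a sketch only; `big_rows`, `big_cursor`, and `large_table` are placeholder names): the function opens a named cursor and returns the reference, and the caller fetches from it within the same transaction.

```sql
-- Function returns a refcursor instead of the rows themselves.
CREATE FUNCTION big_rows() RETURNS refcursor AS $$
DECLARE
    c refcursor := 'big_cursor';   -- fixed portal name for the caller
BEGIN
    OPEN c FOR SELECT * FROM large_table;
    RETURN c;
END;
$$ LANGUAGE plpgsql;

-- The cursor only lives as long as the enclosing transaction:
BEGIN;
SELECT big_rows();
FETCH 1000 FROM big_cursor;   -- repeat as needed, or FETCH ALL to drain it
COMMIT;
```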

So following that advice, it's not unreasonable that I would actually have a 
cursor to a large dataset.

Then, I would ideally want to be able to fetch the data from that cursor 
without the entire data getting duplicated (even if only a bit at a time 
instead of all at once, which seems to be the best case behaviour) as I go.

Additionally, I thought that if I had a streaming use-case (which I do), and a 
streaming data-access layer (which I do), then since `SELECT * FROM large` is 
absolutely fine, end-to-end, in that situation, then by symmetry and the 
principle of least astonishment `FETCH ALL FROM cursor` might be fine too.
