My experience with cursors in PostgreSQL from Java has been to stay away from 
them. We support two databases with our product, PostgreSQL (the default) and 
SQL Server. While re-encrypting data in a database, the application used 
cursors with a fetch size of 1000.

It worked perfectly on SQL Server, and on PostgreSQL until we got to a 
PostgreSQL table with more than 11 million rows. After spending weeks trying to 
figure out what was happening, I realized that once a table passed roughly 10 
million rows, for some reason the cursor functionality silently stopped working 
and the driver was reading the entire table. I asked another very senior 
architect to look at it, and he came to the same conclusion. Because of limited 
time, I ended up working around it using LIMIT/OFFSET.
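
For reference, the workaround looked roughly like this; a minimal sketch, with hypothetical table and column names (the real code is more involved):

```java
public class PageQuery {
    // Build one page of the scan. A stable ORDER BY column is required,
    // otherwise pages can overlap or skip rows between fetches.
    static String pageQuery(String table, String orderCol, int limit, long offset) {
        return "SELECT * FROM " + table
             + " ORDER BY " + orderCol
             + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        // First three pages of a 1000-rows-per-page scan:
        for (long off = 0; off < 3000; off += 1000) {
            System.out.println(pageQuery("big_table", "id", 1000, off));
        }
    }
}
```

One caveat we accepted: OFFSET still has to walk past all the skipped rows on the server, so later pages get progressively slower on a large table.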

Again, we are using Java, so the problem could just be in the PostgreSQL JDBC 
driver. Also, we were on 9.1 at the time.
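
If it helps anyone reading later: as I understand the pgjdbc documentation, the driver only uses a server-side cursor when all of the following hold, and otherwise it silently buffers the entire result set client-side. A sketch of that check (the helper method is mine, not part of the driver):

```java
import java.sql.ResultSet;

public class PgjdbcCursorCheck {
    // Conditions under which (as I understand it) pgjdbc streams via a
    // server-side cursor instead of buffering the whole result in memory:
    static boolean usesCursor(boolean autoCommit, int fetchSize, int resultSetType) {
        return !autoCommit                                   // must be inside a transaction
            && fetchSize > 0                                 // a non-zero fetch size was set
            && resultSetType == ResultSet.TYPE_FORWARD_ONLY; // forward-only ResultSet
    }

    public static void main(String[] args) {
        System.out.println(usesCursor(false, 1000, ResultSet.TYPE_FORWARD_ONLY)); // streams
        System.out.println(usesCursor(true, 1000, ResultSet.TYPE_FORWARD_ONLY));  // buffers
    }
}
```

If any one of those conditions is violated somewhere in the code path, you get exactly the silent whole-table read described above.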

Regards
John

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in 
Postgres

Thanks, Tom.

Wouldn't this mean that cursors are noticeably non-optimal even for normal data 
sizes, since the entire result to be streamed from the table is always 
duplicated into another buffer and only then streamed?

> if you want the whole query result at once, why are you bothering with a 
> cursor?

The PostgreSQL docs 
(https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) 
clearly recommend cursors as a way to return a reference to a large result set 
from a function (as I understand it, this is recommended precisely as a way to 
avoid tuple-based buffering of the data).

So following that advice, it's not unreasonable that I would actually have a 
cursor to a large dataset.

Then, ideally, I would want to be able to fetch the data from that cursor as I 
go, without the entire result being duplicated (even if only a bit at a time 
rather than all at once, which seems to be the best-case behaviour).
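
Concretely, the statement sequence a client would issue against such a function looks something like this sketch (the function name `get_large_cursor` is hypothetical, and the cursor must be consumed inside the same transaction that created it):

```java
import java.util.ArrayList;
import java.util.List;

public class RefcursorFlow {
    // The SQL a client would send to stream in batches from a refcursor
    // returned by a (hypothetical) function get_large_cursor().
    static List<String> statements(String cursorName, int batchSize, int batches) {
        List<String> out = new ArrayList<>();
        out.add("BEGIN");                      // the cursor only lives inside this transaction
        out.add("SELECT get_large_cursor()");  // returns the cursor's name, e.g. "c1"
        for (int i = 0; i < batches; i++) {
            out.add("FETCH FORWARD " + batchSize + " FROM \"" + cursorName + "\"");
        }
        out.add("COMMIT");
        return out;
    }
}
```

The batched FETCH loop is exactly the "a bit at a time" access pattern I'd want the server and driver to support without duplicating the whole result first.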

Additionally, I thought that if I have a streaming use-case (which I do) and a 
streaming data-access layer (which I do), then since `SELECT * FROM large` is 
absolutely fine, end-to-end, in that situation, by symmetry and the principle 
of least astonishment `FETCH ALL FROM cursor` might be fine too.
