I asked the same question at the same time on Stack Overflow (sincere
apologies if this is a breach of etiquette - I really needed an answer, and
I thought the two communities might not overlap).

Stack Overflow now has an answer, by me:
http://stackoverflow.com/q/42292341/#42297234 - which draws together the
most consistent, coherent information from the answers and comments
given there so far.

I think this is right, and I happily repeat it below, for anyone finding my
question on this list. But I would still *love* to find official PostgreSQL
documentation of all this. And of course to be told - quickly! - if anyone
knows it is wrong after all.

***The answer is:***

**Q1:** For `SELECT * FROM AstronomicallyHugeTable` sent over the wire,
PostgreSQL will *not* generate a huge buffer, and will stream the data
efficiently to the client, starting quickly.

**Q2:** For `FETCH ALL FROM CursorToAstronomicallyHugeTable` sent over the
wire, PostgreSQL will likewise *not* generate a huge buffer, and will
stream the data efficiently to the client, starting quickly.
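
For concreteness, the Q2 case looks like this on the wire (a minimal
sketch; the table and cursor names are just the placeholders from the
question, and the cursor has to live inside a transaction unless it is
declared `WITH HOLD`):

```sql
BEGIN;

DECLARE CursorToAstronomicallyHugeTable CURSOR FOR
    SELECT * FROM AstronomicallyHugeTable;

-- Rows are streamed to the client as they are produced; the server does
-- not materialise the whole result set first.
FETCH ALL FROM CursorToAstronomicallyHugeTable;

CLOSE CursorToAstronomicallyHugeTable;
COMMIT;
```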

**Implications of this for `FETCH ALL FROM cursor`**

IF (and this is a big if) you have client software which is NOT going to
store all the fetched data anywhere, but is just trying to do something
with it row by row (and this presupposes that your data access layer
supports this, which Npgsql does), then there is nothing wrong with `FETCH
ALL FROM cursor`. No huge buffers anywhere. No long setup time. Processing
huge data this way will certainly run for a very long time - or at least
until the user or some other condition aborts the process and the cursor
can be closed. But it will start to run quickly, and its use of resources
will be efficient.

**WARNINGS**

It would *never* make sense to do `FETCH ALL FROM cursor` for
astronomically large data if your client-side code (including your data
access layer) has any bottleneck at all which means that all the data
from a command is fetched before any processing can be done. Many data
access layers (and especially data access wrappers) are like this. So
beware. But it is also true that not all client-side code is built this way.

Returning huge data using a `TABLE` or `SETOF` return type from within a
PostgreSQL function will *always* be broken (i.e. it will create a huge
buffer and take a very long time to start). This is so whether the function
is called from SQL or called over the wire: the bottleneck is before the
function returns. To return very large data sets efficiently you must, in
every case, return a cursor from the function (or else do `SELECT *`
directly over the wire).
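
To make that concrete, here is a minimal sketch of the two patterns (the
function names are hypothetical, and the table name is the placeholder
from the question):

```sql
-- Buffered: RETURN QUERY accumulates the whole result set before the
-- function returns, so the caller waits while a huge buffer is built.
CREATE FUNCTION get_all_rows_buffered()
RETURNS SETOF AstronomicallyHugeTable
LANGUAGE plpgsql AS $$
BEGIN
    RETURN QUERY SELECT * FROM AstronomicallyHugeTable;
END;
$$;

-- Streamed: only a cursor handle is returned; rows are produced as the
-- caller fetches them.
CREATE FUNCTION get_all_rows_cursor(c refcursor)
RETURNS refcursor
LANGUAGE plpgsql AS $$
BEGIN
    OPEN c FOR SELECT * FROM AstronomicallyHugeTable;
    RETURN c;
END;
$$;

-- Usage of the cursor-returning form (inside a transaction):
-- BEGIN;
-- SELECT get_all_rows_cursor('big_cur');
-- FETCH ALL FROM big_cur;
-- CLOSE big_cur;
-- COMMIT;
```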
