Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

Mike Beaton Fri, 17 Feb 2017 23:58:23 -0800

Dear Tom,

This is very helpful, thank you.


You make a very useful point that the limitation is basically on PL/pgSQL
and other PL languages. And someone on SO already pointed out that an
inline SQL function with a enormous sized TABLE return value also doesn't
have any buffering problems. So that's a very convenient option, whenever
SQL alone is powerful enough.

You make the further very helpful point that any library which is written
using `libpq` won't work as desired on `FETCH ALL FROM HugeCursor`. But I
don't know whether that's 'most' libraries. I think that depends on your
programming milieu! I'm working in the world of ADO.NET (but the same seems
to apply to JDBC) where most low level drivers are not written using
`libpq` but rather directly with sockets against the database - which makes
sense because a streaming data reader is part of the contract which those
drivers have to implement.

It's definitely worth noting that the `FETCH 100000 FROM cursor` until
exhausted pattern will *always* be safe. But most fundamentally I did, very
specifically, want to know if the `FETCH ALL FROM
CursorToAstronomicallyLargeData` pattern can *ever* work sensibly. It seems
it clearly can and does if certain assumptions are met. Assumptions which I
actually know *are* met, in the case in which I potentially wanted to use
it!

One outstanding question I have. Based on a lot of helpful responses given
to the SO question I can now test and see what disk buffers are generated
(by setting `log_temp_files` to `0` and then `tail -f log`), as well as how
long it takes for results to start arriving.

With a large (10,000,000 row) test table, if I do `SELECT * FROM table` on
psql it starts to return results immediately with no disk buffer. If I do
`FETCH ALL FROM cursortotable` on psql it takes about 7.5 seconds to start
returning results, and generates a 14MB buffer. If I do `SELECT * FROM
table` on a correctly coded streaming client, it also starts to return
results immediately with no disk buffer. But if I do `FETCH ALL FROM
cursortotable` from my streaming client, it takes about 1.5 seconds for
results to start coming... but again with no disk buffer, as hoped

I was kind of hoping that the 'it creates a buffer' and the 'it takes a
while to start' issues would be pretty much directly aligned, but it's
clearly not as simple as that! I don't know if you can offer any more
helpful insight on this last aspect?

Many thanks,

Mike

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

Reply via email to