Hi, @Francisco - Yeah, the file is around 600 Mb currently, uncompressed.
You're right, our internet connection is going to be the limiting factor. Essentially, the PostgreSQL server is in a datacentre, and the server we're dumping to is in the office. Running a script on the PostgreSQL server in the datacentre is going to be tricky (not so much technically, just from a procedures/security point of view).

Dumping to a spare table seems like an interesting idea - so we'd just create the table, COPY the results into it, use LIMIT/OFFSET to paginate through it, then drop the table afterwards?

Currently, I'm doing a quick hack where we download an ordered list of the ids (an auto-incrementing integer) into Python, chunk it up into groups of ids, then use a WHERE ... IN clause to download each chunk via COPY.

Would dumping to a spare table and paginating through it be a better approach? If so, why? (Not challenging it, I just want to understand everything.)

Cheers,
Victor

On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <fola...@peoplecall.com> wrote:

> On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorh...@yahoo.com> wrote:
> > They think that it might be limited by the network, and how fast the
> > PostgreSQL server can push the data across the internet. (The Postgres
> > server and the box running the query are connected over the internet.)
>
> You previously said you had 600Mb. Over the internet. Is it a very
> fat pipe? Because otherwise the limiting factor is probably not the
> speed at which postgres can push the results, but the throughput of
> your link.
>
> If, as you stated, you need a single transaction to get a 600Mb
> snapshot, I would recommend dumping it to disk, compressing on the fly
> (you should easily get a four- or five-fold reduction on a CSV file
> using any decent compressor), and then sending the file. If you do not
> have disk space for the dump but can run programs near the server, you
> can try compressing on the fly.
> If you have got none of this but do have space for a spare table, use
> a SELECT INTO, paginate that output and drop the table afterwards. Or
> just look at the configs and set longer query timeouts - if your app
> NEEDS two-hour queries, they can be enabled. But anyway, doing a long
> transaction over the internet does not seem like a good idea to me.
>
> Francisco Olarte
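For reference, the chunked-id hack Victor describes could be sketched roughly like this. It is only a sketch: psycopg2 is assumed as the driver, and the table/column names (`events`, `id`) are made up.

```python
def chunked(ids, size):
    """Split an ordered list of ids into groups of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def copy_chunk_sql(table, id_column, id_chunk):
    """Build a COPY statement restricted to one chunk of ids."""
    # int() coercion keeps the inlined id list safe for integer keys.
    id_list = ", ".join(str(int(i)) for i in id_chunk)
    return ("COPY (SELECT * FROM {t} WHERE {c} IN ({ids}) ORDER BY {c}) "
            "TO STDOUT WITH CSV".format(t=table, c=id_column, ids=id_list))

def dump_in_chunks(conn, table, id_column, ids, size, out_file):
    # Each chunk is its own short-lived statement, so no single
    # multi-hour transaction has to survive the internet link.
    with conn.cursor() as cur:
        for chunk in chunked(ids, size):
            cur.copy_expert(copy_chunk_sql(table, id_column, chunk), out_file)
```

One consequence of this approach: each chunk is read at a different moment, so rows changed between chunks can make the overall dump inconsistent unless the table is append-only.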
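The spare-table alternative could look something like the following - again a sketch only, with a hypothetical table name (`report_snapshot`) and psycopg2 assumed. The snapshot is materialised once, so every page reads the same consistent data, and each page query is short.

```python
def page_queries(spare_table, order_column, page_size, total_rows):
    """Generate the paginated SELECTs used to walk the spare table."""
    queries = []
    for offset in range(0, total_rows, page_size):
        queries.append(
            "SELECT * FROM {t} ORDER BY {c} LIMIT {l} OFFSET {o}".format(
                t=spare_table, c=order_column, l=page_size, o=offset))
    return queries

def dump_via_spare_table(conn, source_query, page_size):
    # Materialise the snapshot once, walk it page by page, then drop it.
    with conn.cursor() as cur:
        cur.execute("CREATE TABLE report_snapshot AS " + source_query)
        cur.execute("SELECT count(*) FROM report_snapshot")
        total = cur.fetchone()[0]
        for q in page_queries("report_snapshot", "id", page_size, total):
            cur.execute(q)
            yield cur.fetchall()
        cur.execute("DROP TABLE report_snapshot")
```

Note that an ORDER BY on an indexed column is essential here, since LIMIT/OFFSET pagination is only deterministic over a stable ordering, and OFFSET still has to skip past all earlier rows on each page.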
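Francisco's compress-on-the-fly suggestion can also be done from Python with the standard-library gzip module: COPY output is streamed straight into a compressed file, so the full 600Mb never sits on disk uncompressed. A sketch, assuming psycopg2 (whose copy_expert writes into any file-like object):

```python
import gzip

def compress_lines(lines, path):
    """Write an iterable of CSV text lines into a gzip file."""
    with gzip.open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("utf-8"))

def dump_compressed(conn, query, path):
    # copy_expert streams COPY output directly into the gzip file
    # object, compressing on the fly.
    with gzip.open(path, "wb") as f, conn.cursor() as cur:
        cur.copy_expert("COPY ({q}) TO STDOUT WITH CSV".format(q=query), f)
```

Sending the resulting .gz file over the office link should then take roughly a quarter to a fifth of the time of the raw CSV, per Francisco's estimate.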