Hi All, I am having a performance problem extracting a large volume of data from Postgres 7.4.2, and was wondering if there was a more cunning way to get the data out of the DB...
This isn't a performance problem with any particular PgSQL operation, its more a strategy for getting large volumes of related tables out of the DB whilst perserving the relations between them. Basically we have a number of tables, which are exposed as 2 public views (say PvA and PvB). For each row in PvA, there are a number of related rows in PvB (this number is arbitrary, which is one of the reasons why it cant be expressed as additional columns in PvA - so we really need 2 sets of tables - which leads to two sets of extract calls - interwoven to associate PvA with PvB). The problem is that for extraction, we ultimately want to grab a row from PvA, and then all the related rows from PvB and store them together offline (e.g. in XML). However, the number of rows at any time on the DB is likely to be in the millions, with perhaps 25% of them being suitable for extraction at any given batch run (ie several hundred thousand to several million). Currently, the proposal is to grab several hundred rows from PvA (thus avoiding issues with the resultset being very large), and then process each of them by selecting the related rows in PvB (again, several hundred rows at a time to avoid problems with large result sets). So the algorithm is basically: Do Select the next 200 rows from PvA For each PvA row Do Write current PvA row as XML Do Select the next 200 rows from PvB For each PvB row Do Write current PvB row as XML within the parent PvA XML Element End For While More Rows End For While More Rows However, this has a fairly significant performance impact, and I was wondering if there was a faster way to do it (something like taking a dump of the tables so they can be worked on offline - but a basic dump means we have lost the 1:M relationships between PvA and PvB). Are there any tools/tricks/tips with regards to extracting large volumes of data across related tables from Postgres? It doesnt have to export into XML, we can do post-processing on the extracted data as needed - the important thing is to keep the relationship between PvA and PvB on a row-by-row basis. Many thanks, Damien ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend