I am currently trying to separate two environments that share one database. Essentially I need to break that single database into two, with a portion of the data going to each new database. My plan is to duplicate the database and then strip out of each copy the data it does not need. I have started by trying to delete data from a set of 28 related tables, but performance is terrible. I am deleting from a table called document, which cascades down to 27 tables beneath it, linked by various cascading foreign key constraints. Some of these subsidiary tables have as many as a couple of million records.

Before executing the DELETE on document I tried setting all constraints to deferred within a transaction, but this does not seem to have helped.
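For reference, what I tried was roughly the following (the table and column names here are simplified stand-ins; my real schema has 28 tables, and the `env_id` column is just an illustration of how I pick the rows to remove):

```sql
BEGIN;
-- Only constraints declared DEFERRABLE are affected by this.
SET CONSTRAINTS ALL DEFERRED;
DELETE FROM document WHERE env_id = 2;
COMMIT;
```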

I can't work out whether the indexes on these tables are a help or a hindrance. Presumably any index on the foreign key columns should help, as long as PostgreSQL will actually use it; but given that large numbers of records are being deleted, the planner may decide to do a sequential scan instead. EXPLAIN doesn't show me what happens past the DELETE FROM document itself, i.e. whether indexes are used when the delete cascades. The downside of the indexes is that they have to be maintained, which could be a lot of work during a large-scale deletion.
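If missing indexes on the referencing side turn out to be the problem, I assume fixing that would look something like this (child table and column names are illustrative, one statement per child table):

```sql
-- PostgreSQL does not automatically index the referencing column
-- of a foreign key; without one, each cascaded delete from
-- document may force a scan of the child table.
CREATE INDEX idx_child_table_document_id ON child_table (document_id);
ANALYZE child_table;
```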

What I fear is that for every row deleted from document, the database visits every subsidiary table to delete the data related to that one row before returning to document for the next row. That would mean each table is visited many times, and if that is how it works, the large tables are going to be a real problem. The most efficient approach would be to delete all the document records first, then, with that list of documents in hand, move on to the next table and delete all related records there, so that each table is visited only once. I was hoping that setting constraints deferred would achieve this.
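In other words, the plan I would like the database to follow is something I could also do by hand, along these lines (names again illustrative; the real schema has 27 child tables, so the pattern repeats):

```sql
BEGIN;
-- One pass per child table, then the parent, so each table
-- is touched exactly once.
DELETE FROM child_table_1
 WHERE document_id IN (SELECT id FROM document WHERE env_id = 2);
DELETE FROM child_table_2
 WHERE document_id IN (SELECT id FROM document WHERE env_id = 2);
-- ... repeat for the remaining child tables ...
DELETE FROM document WHERE env_id = 2;
COMMIT;
```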

Can anyone advise me on how PostgreSQL (v8.0.3 on Mac OS X 10.3) processes a DELETE statement and what strategy it uses to remove the data?
Can I specify something like "UNRECOVERABLE" so that it doesn't write WAL (redo)?
Are there any indicators I can use to tell which part of the delete is taking so much time?
And can anyone suggest anything else I can do to speed things up?
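The only way of timing this I have thought of so far is to run the delete with EXPLAIN ANALYZE inside a transaction and then roll it back, e.g.:

```sql
BEGIN;
-- env_id is an illustrative column; the ROLLBACK discards the
-- deletions once the timings have been read.
EXPLAIN ANALYZE DELETE FROM document WHERE env_id = 2;
ROLLBACK;
```

But I don't know whether, on 8.0, the output will attribute any of the time to the cascaded foreign key triggers, or just to the top-level delete.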

Or perhaps it simply is a lot of work and there is no way around it. My fallback option is to SELECT the data I do need rather than DELETE the data I don't, but that route means I cannot make use of the cascading foreign keys.
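What I mean by the SELECT fallback is something along these lines (again with illustrative names):

```sql
-- Copy only the rows to keep into fresh tables, then drop the
-- originals and rename.  The foreign keys do none of the work here:
-- every child table needs its own join, and all constraints and
-- indexes have to be re-created by hand afterwards.
CREATE TABLE document_keep AS
  SELECT * FROM document WHERE env_id = 1;
CREATE TABLE child_table_keep AS
  SELECT c.* FROM child_table c
  JOIN document d ON d.id = c.document_id
  WHERE d.env_id = 1;
-- ... repeat for each child table ...
```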

Regards,
Alex Stanier.
