[PERFORM] Deleting millions of rows

2009-02-02 Thread Brian Cox
I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete
from ts_defects;

Result: out of memory/Can't allocate size: 32

I then did 10 or so deletes to get rid of the rows. Afterwards, inserts
into or queries on this table performed significantly slower. I tried a
vacuum analyze, but this didn't help. To fix this, I dumped and restored
the database.

1) why can't postgres delete all rows in a table if it has millions of rows?
2) is there any other way to restore performance other than restoring 
the database?


Thanks,
Brian




Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Robert Haas
On Mon, Feb 2, 2009 at 1:17 PM, Brian Cox brian@ca.com wrote:
 I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete from
 ts_defects;
 Result: out of memory/Can't allocate size: 32
 I then did 10 or so deletes to get rid of the rows. Afterwards, inserts into
 or queries on this
 table performed significantly slower. I tried a vacuum analyze, but this
 didn't help. To fix this,
 I dumped and restored the database.

 1) why can't postgres delete all rows in a table if it has millions of rows?
 2) is there any other way to restore performance other than restoring the
 database?

Does the table have triggers on it?  Does it have indexes? What is the
result of pg_relation_size() on that table?

How much memory do you have in your machine? What is work_mem set to?

Did you try VACUUM FULL instead of just plain VACUUM to recover
performance?  You might also need to REINDEX.

Or you could TRUNCATE the table.

...Robert



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Jerry Champlin
Brian:

One approach we use for large tables is to partition and then drop
partitions as the data becomes obsolete.  This way you never have the
problem.  Our general rule is to never delete data from a table because it
is too slow.  We have found this to be the preferred approach regardless of
database platform.
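
A minimal sketch of that scheme on 8.3 (inheritance-based partitioning; the
date column and partition names below are hypothetical):

-- each child holds one month, enforced by a CHECK constraint
CREATE TABLE ts_defects_2009_01 (
    CHECK (defect_ts >= DATE '2009-01-01' AND defect_ts < DATE '2009-02-01')
) INHERITS (ts_defects);

-- with constraint_exclusion = on, queries touch only the relevant children;
-- retiring a month of data is then just a near-instant metadata change:
DROP TABLE ts_defects_2009_01;

Inserts do have to be routed to the right child (via a trigger, rule, or the
application), but that is usually a small price for making old-data removal
a simple DROP.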

-Jerry

Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547


-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Brian Cox
Sent: Monday, February 02, 2009 11:18 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Deleting millions of rows

I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete 
from ts_defects;
Result: out of memory/Can't allocate size: 32
I then did 10 or so deletes to get rid of the rows. Afterwards, inserts 
into or queries on this
table performed significantly slower. I tried a vacuum analyze, but this 
didn't help. To fix this,
I dumped and restored the database.

1) why can't postgres delete all rows in a table if it has millions of rows?
2) is there any other way to restore performance other than restoring 
the database?

Thanks,
Brian







Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Brian Cox

Thanks for your response.

Robert Haas [robertmh...@gmail.com] wrote:

Does the table have triggers on it?  Does it have indexes? What is the
result of pg_relation_size() on that table?

No triggers; 3 indexes
cemdb= select pg_relation_size('ts_defects');
 pg_relation_size
------------------
       9464971264
(1 row)

cemdb= select pg_relation_size('ts_defects_DateIndex');
 pg_relation_size
------------------
       1299931136
(1 row)

cemdb= select pg_relation_size('ts_defects_DefectIndex');
 pg_relation_size
------------------
       1217224704
(1 row)

cemdb= select pg_relation_size('ts_defects_EventIndex');
 pg_relation_size
------------------
       1216528384
(1 row)
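
(That's roughly 9.5 GB of heap plus another 3.7 GB across the three indexes.)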



How much memory do you have in your machine? What is work_mem set to?

32G; work_mem=64M


Did you try VACUUM FULL instead of just plain VACUUM to recover
performance?  You might also need to REINDEX.
Or you could TRUNCATE the table.
I didn't try FULL or REINDEX. In this case, TRUNCATE is the best option 
as I was just trying to reset the state of the table for another test. 
But this brings up another question: will autovacuum do the right thing 
to preserve performance on this table when many rows are deleted?


Thanks,
Brian



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Robert Haas
On Mon, Feb 2, 2009 at 3:01 PM, Brian Cox brian@ca.com wrote:
 How much memory do you have in your machine? What is work_mem set to?

 32G; work_mem=64M

Hmm.  Well then I'm not sure why you're running out of memory, that
seems like a bug. Taking a long time, I understand.  Crashing, not so
much.

 Did you try VACUUM FULL instead of just plain VACUUM to recover
 performance?  You might also need to REINDEX.
 Or you could TRUNCATE the table.

 I didn't try FULL or REINDEX. In this case, TRUNCATE is the best option as I
 was just trying to reset the state of the table for another test. But this
 brings up another question: will autovacuum do the right thing to preserve
 performance on this table when many rows are deleted?

I don't think so.  I think you need to VACUUM FULL and REINDEX when
you do a big DELETE.  But if you TRUNCATE then you should be OK - no
further cleanup required in that case.

...Robert



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread David Wilson
On Mon, Feb 2, 2009 at 1:17 PM, Brian Cox brian@ca.com wrote:
 I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete from
 ts_defects;
 Result: out of memory/Can't allocate size: 32

Is this table the target of any foreign keys?
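
For reference, a quick way to check that from the system catalogs (a sketch;
should work as-is on 8.3):

SELECT conname,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'ts_defects'::regclass;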

-- 
- David T. Wilson
david.t.wil...@gmail.com



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Dave Dutcher
 -Original Message-
 From: Brian Cox
 Subject: [PERFORM] Deleting millions of rows
 
 I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: 
 delete from ts_defects;
 Result: out of memory/Can't allocate size: 32 I then did 10 
 or so deletes to get rid of the rows. Afterwards, inserts 
 into or queries on this table performed significantly slower. 
 I tried a vacuum analyze, but this didn't help. To fix this, 
 I dumped and restored the database.
 
 1) why can't postgres delete all rows in a table if it has 
 millions of rows?
 2) is there any other way to restore performance other than 
 restoring the database?
 
 Thanks,
 Brian

If you are deleting an entire table, then the TRUNCATE command is the way to
go.  TRUNCATE is very fast and leaves no dead rows behind.  The problem with
a normal delete is that the rows are not actually removed from the file;
once the table is VACUUMed, the dead space is marked as available for reuse,
but plain VACUUM doesn't return any space to the operating system.  A VACUUM
FULL or CLUSTER will actually remove the dead space, but they can take a
while to run.  (I've heard CLUSTER is supposed to be faster than VACUUM
FULL.)  Another way is to create a new table with the same definition as the
old table, select the rows you want to keep into the new table, drop the old
table, and then rename the new table to have the old table's name.
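
A rough sketch of that copy-and-swap approach (the keep predicate and the
date column are hypothetical; indexes and FKs have to be rebuilt afterwards):

BEGIN;
CREATE TABLE ts_defects_new (LIKE ts_defects INCLUDING DEFAULTS);
INSERT INTO ts_defects_new
     SELECT * FROM ts_defects
      WHERE defect_ts >= now() - interval '90 days';  -- hypothetical keep condition
DROP TABLE ts_defects CASCADE;  -- CASCADE drops the FKs pointing at the old table
ALTER TABLE ts_defects_new RENAME TO ts_defects;
COMMIT;
-- then recreate the indexes and re-add the foreign keys on the new table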


Dave




Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Brian Cox

David Wilson [david.t.wil...@gmail.com] wrote:


Is this table the target of any foreign keys?

There are 2 on delete cascade FKs that reference this table.

Brian




Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread David Wilson
On Mon, Feb 2, 2009 at 3:37 PM, Brian Cox brian@ca.com wrote:
 David Wilson [david.t.wil...@gmail.com] wrote:

 Is this table the target of any foreign keys?

 There are 2 on delete cascade FKs that reference this table.

I believe that's the source of your memory issues. I think TRUNCATE
may handle this more effectively; alternatively, you can handle the
cascading yourself in these cases. (And, as Dave Dutcher mentioned,
TRUNCATE is definitely the way to go for full-table wipes.)


-- 
- David T. Wilson
david.t.wil...@gmail.com



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Scott Marlowe
On Mon, Feb 2, 2009 at 11:17 AM, Brian Cox brian@ca.com wrote:
 I'm using 8.3.5. Table ts_defects has 48M rows. Through psql: delete from
 ts_defects;
 Result: out of memory/Can't allocate size: 32
 I then did 10 or so deletes to get rid of the rows. Afterwards, inserts into
 or queries on this
 table performed significantly slower. I tried a vacuum analyze, but this
 didn't help.

There are two different problems happening here.  One is the failed
delete, the other is the normal bloating caused when a lot of rows are
deleted.

When deleting every row in a table you're much better off just
truncating it.  But foreign keys can get in the way, so you might need
TRUNCATE ... CASCADE.  If you're not sure you really want to do it, you can
wrap the truncate in a BEGIN/COMMIT pair (or ROLLBACK instead) and see how
the database looks after the truncate before you commit.
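
For example (TRUNCATE is transactional in Postgres, so it can be rolled back):

BEGIN;
TRUNCATE TABLE ts_defects CASCADE;  -- CASCADE also empties the referencing tables
-- look around here, then keep it...
COMMIT;
-- ...or back it out with ROLLBACK; instead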

If you choose to use a delete, then foreign keys can slow things down
quite a bit, and if you've got bad stats it's possible for the planner
to choose a plan that runs out of memory.  Was this db recently
analyzed?

If the delete is what you need for some reason, a regular vacuum won't
fix your problem, because it only makes dead tuples available for reuse;
it doesn't remove them.  A CLUSTER command, or a VACUUM FULL followed by
REINDEX, are the two best ways to get the space recovered.
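
Concretely, something like this (the CLUSTER form uses one of the index names
from the earlier post; 8.3 syntax):

-- option 1: compact the heap, then rebuild the bloated indexes
VACUUM FULL ts_defects;
REINDEX TABLE ts_defects;

-- option 2: rewrite the table and its indexes in one pass, in index order
CLUSTER ts_defects USING "ts_defects_DateIndex";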

 To fix this,
 I dumped and restored the database.

That works too.  Since the table was empty, you could have dropped and
recreated it, but if you had foreign keys you'd have to recreate them
too.

 1) why can't postgres delete all rows in a table if it has millions of rows?

It works fine for me.  Often into the billions.  Your test case seems
out of the ordinary.

Can you post all the non-default values in your postgresql.conf /
alter database set ... settings?



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Feb 2, 2009 at 3:01 PM, Brian Cox brian@ca.com wrote:
 How much memory do you have in your machine? What is work_mem set to?
 
 32G; work_mem=64M

 Hmm.  Well then I'm not sure why you're running out of memory,

It's the pending trigger list.  He's got two trigger events per row,
which at 40 bytes apiece would approach 4GB of memory.  Apparently
it's a 32-bit build of Postgres, so he's running out of process address
space.
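
(Roughly: 48M rows x 2 trigger events x 40 bytes each is about 3.8 GB, more
than a 32-bit process can actually use.)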

There's a TODO item to spill that list to disk when it gets too large,
but the reason nobody's done it yet is that actually executing that many
FK check trigger events would take longer than you want to wait anyway.

TRUNCATE is the best solution if you want to get rid of the whole table
contents.  If you're deleting very many but not all rows, people tend
to drop the FK constraints and re-establish them afterwards.  Retail
checking is just too slow.
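
A sketch of that drop-and-recreate pattern (the child table, column, and
constraint names here are all hypothetical):

-- 1. drop the cascading FK
ALTER TABLE ts_defect_events DROP CONSTRAINT ts_defect_events_defect_fk;

-- 2. bulk-delete from the parent, then clean up the orphans yourself
DELETE FROM ts_defects WHERE defect_ts < DATE '2008-01-01';
DELETE FROM ts_defect_events
 WHERE NOT EXISTS (SELECT 1 FROM ts_defects d
                    WHERE d.defect_id = ts_defect_events.defect_id);

-- 3. re-establish the constraint (this re-validates the surviving rows)
ALTER TABLE ts_defect_events
  ADD CONSTRAINT ts_defect_events_defect_fk
  FOREIGN KEY (defect_id) REFERENCES ts_defects (defect_id) ON DELETE CASCADE;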

regards, tom lane



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Brian Cox

Tom Lane [...@sss.pgh.pa.us] wrote:

It's the pending trigger list.  He's got two trigger events per row,
which at 40 bytes apiece would approach 4GB of memory.  Apparently
it's a 32-bit build of Postgres, so he's running out of process address
space.

Yes, this is a 32-bit Postgres running on 32-bit Linux. I assume that
the 2 triggers are due to the 2 on delete cascade FKs. Thanks for
explaining this bit of a mystery.



TRUNCATE is the best solution if you want to get rid of the whole table
contents.  If you're deleting very many but not all rows, people tend
to drop the FK constraints and re-establish them afterwards.  Retail
checking is just too slow.

Thanks also to you (and several others) for reminding me of TRUNCATE.
This will definitely work for what I was trying to do: reset this table 
for more testing.


In production, the table on which I ran DELETE FROM grows constantly 
with old data removed in bunches periodically (say up to a few 100,000s 
of rows [out of several millions] in a bunch). I'm assuming that 
auto-vacuum/analyze will allow Postgres to maintain reasonable 
performance for INSERTs and SELECTs on it; do you think that this is a 
reasonable assumption?


Thanks,
Brian




Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Scott Marlowe
On Mon, Feb 2, 2009 at 3:01 PM, Brian Cox brian@ca.com wrote:

 In production, the table on which I ran DELETE FROM grows constantly with
 old data removed in bunches periodically (say up to a few 100,000s of rows
 [out of several millions] in a bunch). I'm assuming that auto-vacuum/analyze
 will allow Postgres to maintain reasonable performance for INSERTs and
 SELECTs on it; do you think that this is a reasonable assumption?

Yes.  As long as you're deleting a small enough percentage that the table
doesn't get bloated (100k out of millions is a good ratio), autovacuum is
running, AND you have enough FSM entries to track the dead tuples,
you're gold.
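
On 8.3 the free space map is still sized by server-wide settings, so it's
worth a quick sanity check (a sketch):

SHOW max_fsm_pages;
SHOW max_fsm_relations;

-- a database-wide VACUUM VERBOSE reports FSM usage in its final lines and
-- warns if max_fsm_pages is too small to track all the dead space
VACUUM VERBOSE;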



Re: [PERFORM] Deleting millions of rows

2009-02-02 Thread Robert Haas
 It's the pending trigger list.  He's got two trigger events per row,
 which at 40 bytes apiece would approach 4GB of memory.  Apparently
 it's a 32-bit build of Postgres, so he's running out of process address
 space.

 There's a TODO item to spill that list to disk when it gets too large,
 but the reason nobody's done it yet is that actually executing that many
 FK check trigger events would take longer than you want to wait anyway.

Have you ever given any thought to whether it would be possible to
implement referential integrity constraints with statement-level
triggers instead of row-level triggers?  IOW, instead of planning this
and executing it N times:

DELETE FROM ONLY fktable WHERE $1 = fkatt1 [AND ...]

...we could join the original query against fktable with join clauses
on the correct pairs of attributes and then execute it once.

Is this insanely difficult to implement?
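
For reference, done by hand today, that set-oriented version of the cascade
would look roughly like this (child table, columns, and predicate are
hypothetical):

-- one joined delete instead of firing the per-row trigger N times
DELETE FROM ts_defect_events
 USING ts_defects d
 WHERE ts_defect_events.defect_id = d.defect_id
   AND d.defect_ts < DATE '2008-01-01';  -- same predicate as the original DELETE

DELETE FROM ts_defects
 WHERE defect_ts < DATE '2008-01-01';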

...Robert
