Re: [PERFORM] Delete performance

2011-06-01 Thread Jarrod Chesney

On 01/06/2011, at 11:45 AM, Jarrod Chesney wrote:

 I'm executing 30,000 single delete statements in one transaction.
 
 At this point I'm looking into combining the multiple deletes into one 
 statement and breaking my big transaction into smaller ones of about 100 
 deletes or so.
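 
 A batched form might look roughly like this (a sketch; the table name is
 from this thread, the ID values are illustrative):
 
 -- one multi-row delete per batch instead of ~100 single-row statements
 BEGIN;
 DELETE FROM xxx WHERE id = ANY (ARRAY[101, 102, 103]);
 COMMIT;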
 
 On 01/06/2011, at 11:40 AM, Craig Ringer wrote:
 
 On 1/06/2011 7:11 AM, Pierre C wrote:
 If I run 30,000 prepared DELETE FROM xxx WHERE ID = ? commands it
 takes close to 10 minutes.
 
 Do you run those in a single transaction or do you use one transaction
 per DELETE ?
 
 In the latter case, postgres will ensure each transaction is committed to
 disk at each commit. Since this involves waiting for the physical I/O
 to happen, it is slow. If you do it 30,000 times, it will be 30,000
 times slow.
 
 Not only that, but if you're doing it via some application, the app has to 
 wait for Pg to respond before it can send the next query. This adds even 
 more delay, as do all the context switches between Pg and your application.
 
 If you really must issue individual DELETE commands one-by-one, I *think* 
 you can use synchronous_commit=off or
 
 SET LOCAL synchronous_commit TO OFF;
 
 See:
 
 http://www.postgresql.org/docs/current/static/runtime-config-wal.html
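 
 A minimal sketch of the per-transaction form (prepared statement name and
 the ID value are illustrative):
 
 PREPARE del (bigint) AS DELETE FROM xxx WHERE id = $1;
 BEGIN;
 SET LOCAL synchronous_commit TO off;  -- this commit won't wait for WAL flush
 EXECUTE del(42);
 COMMIT;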
 
 
 -- 
 Craig Ringer
 
 Tech-related writing at http://soapyfrogs.blogspot.com/
 

Apologies for top posting.


Re: [PERFORM] Delete performance

2011-05-31 Thread Grzegorz Jaśkiewicz
9.0rc1?
You know that the stable 9.0 release has been out for quite a while now.
It's not going to affect the delete speed in any way, but I would
generally advise you to upgrade to the latest 9.0.x.

As for the delete itself, check that the tables referencing the main table
have indexes on the referencing columns; often that's the issue.
The other thing to check is the number of triggers on the other tables.
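
For instance, each referencing column should carry an index along these lines
(table and column names here are hypothetical):

-- without this index, every cascaded or checked delete scans the
-- child table to find rows referencing the deleted parent row
CREATE INDEX child_table_parent_id_idx ON child_table (parent_id);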

-- 
GJ



Re: [PERFORM] Delete performance

2011-05-31 Thread Pierre C
If I run 30,000 prepared DELETE FROM xxx WHERE ID = ? commands it
takes close to 10 minutes.


Do you run those in a single transaction or do you use one transaction per  
DELETE ?


In the latter case, postgres will ensure each transaction is committed to
disk at each commit. Since this involves waiting for the physical I/O to
happen, it is slow. If you do it 30,000 times, it will be 30,000 times
slow.


Note that you should really do:

DELETE FROM table WHERE id IN (huge list of ids).

or

DELETE FROM table JOIN VALUES (list of ids) ON (...)
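
The second form is pseudo-syntax; in PostgreSQL it would be spelled with
USING, roughly like this (table and column names assumed):

DELETE FROM tbl
USING (VALUES (1), (2), (3)) AS ids(id)
WHERE tbl.id = ids.id;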

Also, check that foreign keys using cascading deletes have indexes on the
referencing tables. Without an index, finding the rows to cascade-delete
will be slow.




[PERFORM] Delete performance

2011-05-30 Thread Jarrod Chesney
Hi All
My database uses joined table inheritance and my server version is 9.0

Version string: PostgreSQL 9.0rc1 on x86_64-pc-linux-gnu, compiled by GCC 
x86_64-pc-linux-gnu-gcc (Gentoo 4.4.4-r1 p1.1, pie-0.4.5) 4.4.4, 64-bit

I have about 120,000 records in the table that everything else inherits from;
if I truncate-cascade this table it happens almost instantly. If I run 30,000
prepared DELETE FROM xxx WHERE ID = ? commands it takes close to 10 minutes.

My foreign keys to the base table are all set with ON DELETE CASCADE. I've
looked through all the fields that relate to the ID in the base table and
created btree indexes for them.
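
For reference, a catalog query along these lines (standard pg_constraint
columns; base_table is a stand-in for the actual parent table name) lists the
foreign keys pointing at the base table, so each referencing column can be
checked for an index:

SELECT conrelid::regclass AS referencing_table,
       conname,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'base_table'::regclass;  -- hypothetical table name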

Can anyone outline what I need to verify/do to ensure I'm getting the best
performance for my deletes?

Regards, Jarrod Chesney 


Re: [PERFORM] DELETE performance problem

2009-11-25 Thread marcin mank
On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti kronos...@gmail.com wrote:
  ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)


This is your problem. The system's estimate for the number of distinct
annotation_ids in t2 is wildly off.

The disk activity is almost certainly swapping (you can check with
iostat on the Linux machine).

Can you try ANALYZE t2 just before the delete query? Maybe try
raising the statistics target for the annotation_id column.

If all else fails, you may try SET enable_hashagg TO false just
before the query.
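
Roughly, those three suggestions in SQL (the statistics target value is
illustrative):

ALTER TABLE t2 ALTER COLUMN annotation_id SET STATISTICS 1000;
ANALYZE t2;
SET enable_hashagg TO off;  -- only for the session running the delete
DELETE FROM t1 WHERE annotation_id IN (SELECT annotation_id FROM t2);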

Greetings
Marcin Mańk





Re: [PERFORM] DELETE performance problem

2009-11-25 Thread Luca Tettamanti
On Wed, Nov 25, 2009 at 04:22:47PM +0100, marcin mank wrote:
 On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti kronos...@gmail.com wrote:
   ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)
 
 This is your problem. The system's estimate for the number of distinct
 annotation_ids in t2 is wildly off.

Ah, I see.

 The disk activity is almost certainly swapping (you can check with
 iostat on the Linux machine).

Nope, zero swap activity. Under Linux postgres tops out at about 4.4GB, leaving
3.6GB of page cache (nothing else is running right now).

 Can you try ANALYZE t2 just before the delete query? Maybe try
 raising the statistics target for the annotation_id column.

I already tried; the estimate is still way off.

 If all else fails, you may try SET enable_hashagg TO false just
 before the query.

 Hash IN Join  (cost=1879362.27..11080576.17 rows=202376 width=6) (actual time=250281.607..608638.141 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.017..193661.353 rows=45874812 loops=1)
   ->  Hash  (cost=879289.12..879289.12 rows=60956812 width=8) (actual time=250271.012..250271.012 rows=60956812 loops=1)
         ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=0.023..178297.862 rows=60956812 loops=1)
 Total runtime: 900019.033 ms
(6 rows)

This is after an analyze.

The alternative query suggested by Shrirang Chitnis:

DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id =
t2.annotation_id)

performs even better:

 Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1 loops=45874812)
           Index Cond: ($0 = annotation_id)
 Total runtime: 629426.014 ms
(6 rows)

Will try on the full data set.

thanks,
Luca



Re: [PERFORM] DELETE performance problem

2009-11-25 Thread Grzegorz Jaśkiewicz
On Wed, Nov 25, 2009 at 4:13 PM, Luca Tettamanti kronos...@gmail.com wrote:



 DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id =
 t2.annotation_id)

 performs even better:

  Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953 loops=1)
    Filter: (subplan)
    SubPlan
      ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1 loops=45874812)
            Index Cond: ($0 = annotation_id)
  Total runtime: 629426.014 ms
 (6 rows)

 Have you tried:

DELETE FROM t1 USING t2 WHERE t1.annotation_id = t2.annotation_id;

?




-- 
GJ


[PERFORM] DELETE performance problem

2009-11-24 Thread Luca Tettamanti
Hello,
I've run into a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (
SELECT t2.annotation_id FROM t2)

t1 contains about 48M records (table size is 5.8GB), while t2 contains about 60M
records (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's
not even unique; in fact there are duplicates - there are about 20M distinct
annotation_ids in this table). There are no FKs on either table.
I've killed the query after 14h(!) of runtime...

I've reproduced the problem using only the ids (extracted from the full
tables) with the following schemas:

test2=# \d t1
       Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (annotation_id)

test2=# \d t2
       Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes:
    "t2_idx" btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as
severe, but (IMHO) the behaviour is strange. On win2k8 with 8.3.8, using
procexp I see the process churning the disk and using more memory until it hits
some limit (at about 1.8GB), then the IO slows down considerably. See this
screenshot[1].
This is exactly what happens with the full dataset.

This is the output of the explain:

test2=# explain analyze delete from t1 where annotation_id in (select annotation_id from t2);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=643395.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=643393.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)

Time: 2274723,284 ms


An identical Linux machine (with 8.4.1) shows the same issue; with strace I see
a lot of seeks:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------------
100.00    0.172054                 26845           total

(30s sample) 

Before hitting the memory limit (address space, on win2k8; unsure about Linux)
the trace is the following:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------------
100.00    0.063862                321676           total


The machines have 8 cores (2 Xeon E5320) and 8GB of RAM. The Postgres data
directory is on a hardware (Dell PERC5) RAID mirror, with the log on a
separate array.
One machine is running 64-bit Linux (Debian/stable), the other win2k8 (32-bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off # Just in case... usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks,
Luca
[1] http://img10.imageshack.us/i/psql2.png/



Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Luca Tettamanti
On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
jchamp...@absolute-performance.com wrote:
 You may want to consider using partitioning.  That way you can drop the
 appropriate partition and never have the overhead of a delete.

Hum, I don't think it's doable in my case; the partitioning is not
known a priori. First t1 is fully populated, then the data is loaded
and manipulated by my application and the result is stored in t2; only
then do I want to remove (part of) the data from t1.

thanks,
Luca



Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Thom Brown
2009/11/24 Luca Tettamanti kronos...@gmail.com

 On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
 jchamp...@absolute-performance.com wrote:
  You may want to consider using partitioning.  That way you can drop the
  appropriate partition and never have the overhead of a delete.

 Hum, I don't think it's doable in my case; the partitioning is not
 known a priori. First t1 is fully populated, then the data is loaded
 and manipulated by my application and the result is stored in t2; only
 then do I want to remove (part of) the data from t1.

 thanks,
 Luca


It's a shame there isn't a LIMIT option on DELETE so this can be done in
small batches.

Thom


Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Jerry Champlin
You may want to consider using partitioning.  That way you can drop the
appropriate partition and never have the overhead of a delete.
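
With the inheritance-based partitioning available in these versions, a sketch
might look like this (all names and the partitioning column are hypothetical):

-- each partition holds one droppable slice of the data
CREATE TABLE parent_t (id bigint, batch_no int);
CREATE TABLE parent_t_batch1 (CHECK (batch_no = 1)) INHERITS (parent_t);
-- ... load batch 1 into parent_t_batch1, query via parent_t ...
DROP TABLE parent_t_batch1;  -- instant, no per-row delete overhead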

Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Luca Tettamanti
Sent: Tuesday, November 24, 2009 6:37 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] DELETE performance problem

--snip: full quote of the original message--

Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Grzegorz Jaśkiewicz
On Tue, Nov 24, 2009 at 3:19 PM, Thom Brown thombr...@gmail.com wrote:

 2009/11/24 Luca Tettamanti kronos...@gmail.com

 On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin
 jchamp...@absolute-performance.com wrote:
  You may want to consider using partitioning.  That way you can drop the
  appropriate partition and never have the overhead of a delete.

 Hum, I don't think it's doable in my case; the partitioning is not
 known a priori. First t1 is fully populated, then the data is loaded
 and manipulated by my application and the result is stored in t2; only
 then do I want to remove (part of) the data from t1.

 thanks,
 Luca


 It's a shame there isn't a LIMIT option on DELETE so this can be done in
 small batches.


You sort of can do it, using the PK on the table as a pointer: DELETE FROM foo
USING ... with a subquery in USING that limits the number of rows ;)
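
A sketch of that batched form (table, column, condition, and batch size are
illustrative):

DELETE FROM foo
USING (SELECT pk FROM foo WHERE delete_condition LIMIT 10000) AS batch
WHERE foo.pk = batch.pk;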




 Thom




-- 
GJ


Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Alan Hodgson
On Tuesday 24 November 2009, Thom Brown thombr...@gmail.com wrote:

 It's a shame there isn't a LIMIT option on DELETE so this can be done in
 small batches.

delete from table where pk in (select pk from table where delete_condition 
limit X);


-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.



Re: [PERFORM] DELETE performance problem

2009-11-24 Thread Kris Kewley
Even though the column in question is not unique on t2, could you not
index it? That should improve the performance of the inline query.


Are dates applicable in any way? In some cases adding a date field,
partitioning or indexing on it, and adding a WHERE clause on the date
(e.g. newer than x days) can be an effective way to limit the records searched.


Kris

On 24-Nov-09, at 9:59, Jerry Champlin jchamp...@absolute-performance.com wrote:


You may want to consider using partitioning. That way you can drop the
appropriate partition and never have the overhead of a delete.

Jerry Champlin|Absolute Performance Inc.|Mobile:  303-588-2547

--snip: full quote of the original message--

Re: [PERFORM] Delete performance again

2008-10-10 Thread Віталій Тимчишин
BTW: I have just tried the clean (without any foreign key constraints)
performance of
delete from tbl where field not in (select)
vs
create temporary table tmp(id) as select distinct field from tbl; delete
from tmp where id in (select); delete from tbl where field in (select id
from tmp).
Both tbl and the select are huge.
tbl cardinality is ~5 million, the select is ~1 million. The number of records
to delete is small.
The select is simply select id from table2.

The first (simple) one could not finish in a night; the second did it in a few
seconds.
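
Spelled out, the fast variant is roughly (names as given above, with table2
standing in for the select):

create temporary table tmp(id) as select distinct field from tbl;
delete from tmp where id in (select id from table2);
delete from tbl where field in (select id from tmp);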


Re: [PERFORM] Delete performance again

2008-10-09 Thread Віталій Тимчишин
OK, I tried your proposal and a correlated subselect.
I have a database of ~90 companies.
The first try was to remove 1000 randomly selected companies:
Uncorrelated subselect: 65899ms
Correlated subselect: 97467ms
USING: 9605ms
My way: 104979ms (without constraints recreate)
Mine is the worst because it is oriented toward massive deletes.
So I thought USING would perform better, and tried 1 companies:
My way: 190527ms (without constraints recreate)
USING: 694144ms
I was a little shocked, but I checked the plans and found that it had
switched from a Nested Loop to a Hash Join.
I disabled Hash Join, and it then showed a Merge Join. This was also disabled,
and I got 747253ms.
Then I tried combinations: without hash join was the best result, at
402629ms; without merge join it was 1096116ms.

My conclusion: until the optimizer takes into account the additional actions
needed (like constraint checks/cascaded deletes/triggers), it cannot make a
good plan.


Re: [PERFORM] Delete performance again

2008-10-03 Thread Marc Mamin
Hi,
 
Maybe you can try this syntax. I'm not sure, but it may perform better:
 
 
delete from company_alias USING comprm
where company_alias.company_id =comprm.id


Cheers,

Marc



[PERFORM] Delete performance again

2008-10-02 Thread Віталій Тимчишин
Hello.

I have a database with company table that have a bunch of related
(delete=cascade) tables.
Also it has 1-M relation to company_descr table.
We found that ~half of our companies do not have any description and
we would like to remove them.
The first thing I tried was
delete from company where id not in (select company_id from company_descr);
I tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've used, it
did not include the cascaded deletes/checks in the query plan. That is the
first problem.
It was SLOW. To make it faster I did the following:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company where id in (select id from comprm);

That was much better. So the question is why PostgreSQL can't do such a thing
itself.
But it was only better while the dataset being removed was small (~5% of the
whole table).
As soon as I tried to remove 50% I got speed problems. I've ensured I have
all indexes on both ends of the foreign keys.
I've tried to remove all cascaded entries by myself, e.g.:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company_alias where company_id in (select id from comprm);
...
delete from company where id in (select id from comprm);

It did not help until I dropped all constraints before and recreated all
constraints after.
Now the job runs in 15 minutes, while previously it could not finish in a day.

Is this OK? I'd say some (if not all) of these optimizations could be done by
the PostgreSQL optimizer.


Re: [PERFORM] Delete performance again

2008-10-02 Thread Tom Lane
Віталій Тимчишин [EMAIL PROTECTED] writes:
 delete from company where id not in (select company_id from company_descr);
 I tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've used,
 it did not include the cascaded deletes/checks in the query plan. That is
 the first problem. It was SLOW.

Usually the reason for that is having forgotten to make an index on the
referencing column(s) ?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Delete performance again

2008-10-02 Thread Віталій Тимчишин
2008/10/2 Tom Lane [EMAIL PROTECTED]

 Віталій Тимчишин [EMAIL PROTECTED] writes:
  delete from company where id not in (select company_id from company_descr);
  I tried to EXPLAIN ANALYZE the command, but unlike other RDBMSs I've used,
  it did not include the cascaded deletes/checks in the query plan. That is
  the first problem. It was SLOW.

 Usually the reason for that is having forgotten to make an index on the
 referencing column(s) ?


Not at all. As you can see below in the original message, simply extending the
query to what should have been done by the optimizer helps. I'd say the
optimizer always uses a fixed plan, not taking into account that this is a
massive update: it does an index lookup of child records for each parent
record, while it would be much more effective to remove all child records in
a single table scan.

It's like a trigger for each record instead of for each statement.


Re: [PERFORM] Delete performance on delete from table with inherited tables

2004-04-05 Thread Chris Kratz
Thanks Stephan and Tom for your responses.  We have been busy, so I haven't 
had time to do any further research on this till yesterday.  I found that the 
large number of triggers on the parent or master table were foreign key 
triggers for each table back to the child tables (update and delete on 
master, insert on child).  The triggers have existed through several versions 
of postgres and as far as we can tell were automatically created using the 
references keyword at inception.

Yesterday I dropped all the current triggers on parent and children and ran a 
script that did an alter table add foreign key constraint to each of the 67 
child tables with update cascade delete cascade.  After this, the delete from 
the parent where no records existed in the child tables was far more 
acceptable.  Instead of taking hours to do the delete, the process ran for 
about 5 minutes on my workstation.  Removing all constraints entirely reduces 
this time to a couple of seconds.  I am currently evaluating if the foreign 
key constraints are worth the performance penalty in this particular case.
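
The rework amounted to statements of roughly this shape per child table (the
constraint, table, and column names here are hypothetical):

-- replaces the old REFERENCES-created constraint triggers
ALTER TABLE child_table
  ADD CONSTRAINT child_table_parent_fk
  FOREIGN KEY (parent_id) REFERENCES parent_table (parent_id)
  ON UPDATE CASCADE ON DELETE CASCADE;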

To finish up, it appears that the foreign key implementation has changed since 
these first tables were created in our database. Dropping the existing 
triggers and re-adding the constraints on each table significantly improved 
performance for us. I do not know enough of the internals to know why this 
happened. But our experience seems to prove that the newer implementation of 
foreign keys is more efficient than previous versions. YMMV.

One other item that was brought up was whether the child tables have the fk 
column indexed, and the answer was yes. Each had a standard btree index on 
the foreign key. Explain showed nothing, as all the time was being spent in 
the triggers. Time spent in triggers is not shown in the pg 7.3.4 version of 
explain (nor would I necessarily expect it to be).

Thanks for your time, expertise and responses.

-Chris

On Tuesday 09 March 2004 7:18 pm, Stephan Szabo wrote:
 On Wed, 3 Mar 2004, Chris Kratz wrote:
  Which certainly points to the triggers being the culprit.  In reading the
  documentation, it seems like the delete from only... statement should
  ignore the constraint triggers.  But it seems quite obvious from the

 Delete from only merely means that children of the table being deleted
 will not have their rows checked against any where conditions and removed
 for that reason.  It does not affect constraint triggers at all.

 Given I'm guessing it's going to be running about 7000 * 67 queries to
 check the validity of the delete for 7000 rows each having 67 foreign
 keys, I'm not sure there's much to do other than hack around the issue
 right now.

 If you're a superuser, you could temporarily hack reltriggers on the
 table's pg_class row to 0, run the delete and then set it back to the
 correct number. I'm guessing from your message that there's never any
 chance of a concurrent transaction putting in a matching row in a way that
 something is marked as deletable when it isn't?

-- 
Chris Kratz
Systems Analyst/Programmer
VistaShare LLC
www.vistashare.com



[PERFORM] Delete performance on delete from table with inherited tables

2004-03-09 Thread Chris Kratz
Hello all,

I have a performance issue that I cannot seem to solve and am hoping that 
someone might be able to make some suggestions.

First some background information. We are using PostgreSQL 7.3.4 on Linux 
with kernel 2.4.19. The box is a single P4 2.4GHz proc with 1GB RAM and UW 
SCSI drives in a hardware RAID setup.

We have a transactioninfo table  with about 163k records.  psql describes the 
table as:

\d transactioninfo
                    Table "public.transactioninfo"
    Column     |           Type           |                Modifiers
---------------+--------------------------+-----------------------------------------
 transactionid | integer                  | not null default nextval('transaction_sequence'::text)
 userid        | integer                  |
 programid     | integer                  |
 time          | timestamp with time zone |
 comment       | text                     |
 undoable      | boolean                  |
 del           | boolean                  |
Indexes: transactioninfo_pkey primary key btree (transactionid),
         delidx btree (del),
         transactioninfo_date btree (time, programid, userid)
Triggers: RI_ConstraintTrigger_6672989,
          RI_ConstraintTrigger_6672990,
          RI_ConstraintTrigger_6672992,
          --snip--
          RI_ConstraintTrigger_6673121,
          RI_ConstraintTrigger_6673122

There are about 67 inherited tables that inherit the fields from this table, 
hence the 134 constraint triggers.  

There is a related table, transactionlog, which has a fk (foreign key) to 
transactioninfo. It contains about 600k records.

There are 67 hist_tablename tables, each with a different structure.  Then an 
additional 67 tables called hist_tablename_log which inherit from the 
transactionlog table and appropriate hist_tablename table.  By the automagic 
of inheritance, since the transactionlog has a fk to transactioninfo, each of 
the hist_tablename_log tables does as well (if I am reading the pg_trigger 
table correctly).

Once a day we run a SQL statement to clear out all records in 
transactioninfo that don't have a matching record in transactionlog. We 
accumulate between 5k-10k records a day that need clearing from 
transactioninfo. That clear ran this morning for 5 hours and 45 minutes.

Today I am working on streamlining the SQL to try and get the delete down to a 
manageable time frame. The original delete statement was quite inefficient. 
So far, I've found that it appears to be much faster to break the task into 
two pieces. The first is to update a flag on transactioninfo to mark empty 
transactions, followed by a delete which clears based on that flag. The 
update takes about a minute or so.

update only transactioninfo set del=TRUE where
not exists (select transactionid from transactionlog l where 
l.transactionid=transactioninfo.transactionid);
UPDATE 6911
Time: 59763.26 ms

Now if I delete a single transactioninfo record found by selecting del=true 
limit 1, I get:

explain analyze delete from only transactioninfo where transactionid=734607;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using transactioninfo_pkey on transactioninfo  (cost=0.00..6.01 rows=1 width=6) (actual time=0.18..0.18 rows=1 loops=1)
   Index Cond: (transactionid = 734607)
 Total runtime: 0.41 msec
(3 rows)

Time: 855.08 ms

With the 7000 records to delete and a delete time of 0.855s, we are looking at 
1.5hrs to do the clear, which is a great improvement over the 6 hours we have 
been seeing. But it still seems like it should run faster.

The actual clear statement used in the clear is as follows:
explain delete from transactioninfo where del=true;
  QUERY PLAN
--
 Seq Scan on transactioninfo  (cost=0.00..6177.21 rows=78528 width=6)
   Filter: (del = true)
(2 rows)
 
Another interesting observation is that the raid subsystem shows very low 
activity during the clear.  The backend process is almost entirely cpu bound.

Some of the documentation implies that inherited tables cause deletes to be 
very slow on the parent table, so I did the following experiment.

vistashare=# create table transactioninfo_copy as select * from 
transactioninfo;
SELECT
Time: 6876.88 ms
vistashare=# create index transinfo_copy_del_idx on transactioninfo_copy(del);
CREATE INDEX
Time: 446.20 ms
vistashare=# delete from transactioninfo_copy where del=true;
DELETE 6904
Time: 202.33 ms

Which certainly points to the triggers being the culprit. In reading the 
documentation, it seems like the "delete from only..." statement should 
ignore the constraint triggers. But it seems quite obvious from the 
experiments that it does not. Also, the fact that the query plan