Re: [PERFORM] Slow loads when indexes added.

2005-03-16 Thread Stef
[EMAIL PROTECTED] mentioned :
= Try ANALYZE after loading the referenced tables, but before loading the main 
table

I attached a new script for creating the load file...
 
Analyze didn't help; it actually took longer to load.
I set autocommit to off, put a commit after every 100
inserts, used chattr to set the noatime attribute recursively on PGDATA, and
set fsync to off. This improved the time from 3min 51sec to 2min 37sec
for the slow scenario.
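For reference, the batching described above amounts to something like the following; the table name, values, and file path are made-up placeholders, not the actual load file:

```sql
-- Hypothetical sketch: commit once per batch instead of once per row.
BEGIN;
INSERT INTO main_table VALUES (1, 'row 1');
INSERT INTO main_table VALUES (2, 'row 2');
-- ... 98 more inserts ...
COMMIT;

-- For bulk loads from a file, COPY is usually much faster than
-- batched INSERTs:
COPY main_table FROM '/path/to/load_file' WITH DELIMITER ',';
```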

But I was already doing all these things in the app that 
used to take 40 minutes but now takes four hours to load.

Any other suggestions?

Kind Regards
Stefan

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Magnus Hagander
  The this day and age argument isn't very convincing. Hard drive 
  capacity growth has far outstripped hard drive seek time 
 and bandwidth improvements.
  Random access has more penalty than ever.
 
  In point of fact, there haven't been noticeable seek time 
 improvements 
  for years. Transfer rates, on the other hand, have gone 
 through the roof.
 
 Er, yeah. I stated it wrong. The real ratio here is between 
 seek time and throughput.
 
 Typical 7200RPM drives have average seek times in the 
 area of 10ms.
 Typical sustained transfer rates are in the range of 40Mb/s. 
 Postgres reads 8kB blocks at a time.
 
 So 800kB/s for random access reads. And 40Mb/s for sequential 
 reads. That's a factor of 49. I don't think anyone wants 
 random_page_cost to be set to 50 though.
 
 For a high end 15k drive I see average seek times get as low 
 as 3ms. And sustained transfer rates get as high as 100Mb/s. 
 So about 2.7Mb/s for random access reads or about a 
 random_page_cost of 37. Still pretty extreme.
 
 So what's going on with the empirically derived value of 4? 
 Perhaps this is because even though Postgres is reading an 
 entire table sequentially it's unlikely to be the only I/O 
 consumer? The sequential reads would be interleaved 
 occasionally by some other I/O forcing a seek to continue.

What about the cache memory on the disk? Even IDE disks have some 8MB of
cache today, which makes a lot of difference for fairly short scans,
even if it's just read cache. That'll bring the speed of random access
down to a 1:1 relationship with sequential access, assuming everything fits
in the cache.
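The arithmetic quoted above is easy to reproduce. The drive figures (10ms/3ms average seeks, 40/100 MB/s sustained transfer) are the ones assumed in the thread, not measurements:

```sql
-- Effective random-read rate is one 8kB block per average seek.
-- 7200RPM: 8192 B / 0.010 s = 800 kB/s random vs ~40 MB/s sequential.
SELECT round(40e6 / (8192 / 0.010)) AS implied_ratio_7200rpm;   -- ~49
-- 15k RPM: 8192 B / 0.003 s = ~2.7 MB/s random vs ~100 MB/s sequential.
SELECT round(100e6 / (8192 / 0.003)) AS implied_ratio_15k;      -- ~37
```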


//Magnus

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread David Gagnon
Hi All,
I reran the example with the debug info turned on in postgresql. As you 
can see, all dependent tables (that have foreign keys on table IC) are 
emptied before the DELETE FROM IC statement is issued.  From what I 
understand, the performance problem seems to come from those selects that 
point back to IC ( LOG:  statement: SELECT 1 FROM ONLY public.ic x 
WHERE icnum = $1 FOR UPDATE OF x).  There are 6 of them.  I don't know 
where they are coming from.  But if I want to delete the content of the 
table (~10k rows), those 6 selects for each deleted row could take a long 
time.  Why are those selects there?  Are they really run for each 
deleted row?

I'm running version 7.4.5 on cygwin.  I ran the same delete from 
pgAdminIII and I got 945562ms for all the deletes within the same 
transaction (so I was wrong saying it took less time in 
PgAdminIII... sorry about that).

Do you have any idea why those 6 selects are there?
Maybe I can drop indexes before deleting the content of the table.  I 
didn't plan to, because the tables are quite small and it's more 
complicated in my environment.  And tell me if I'm wrong, but if I drop 
indexes do I have to reload all my stored procedures (to reset the 
planner-related info)?  I remember having read that somewhere (was it 
in the PostgreSQL General Bits newsletter? ...anyway)

Thanks for your help, I really appreciate it :-)
/David
LOG:  duration: 144.000 ms
LOG:  statement: DELETE FROM YN
LOG:  duration: 30.000 ms
LOG:  statement: DELETE FROM YO
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 
AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yn x WHERE ynyotype = 
$1 AND ynyonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 
AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yr x WHERE yryotype = 
$1 AND yryonum = $2 FOR UPDATE OF x
LOG:  duration: 83.000 ms
LOG:  connection received: host=127.0.0.1 port=2196
LOG:  connection authorized: user=admin database=webCatalog
LOG:  statement: set datestyle to 'ISO'; select version(), case when 
pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else 
getdatabaseencoding() end;
LOG:  duration: 2.000 ms
LOG:  statement: set client_encoding = 'UNICODE'
LOG:  duration: 0.000 ms
LOG:  statement: DELETE FROM IY
LOG:  duration: 71.000 ms
LOG:  statement: DELETE FROM IA
LOG:  duration: 17.000 ms
LOG:  statement: DELETE FROM IQ
LOG:  duration: 384.000 ms
LOG:  statement: DELETE FROM IC
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iq x WHERE iqicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ia x WHERE iaicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumo = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumr = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.il x WHERE ilicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.bd x WHERE bdicnum = $1 
FOR UPDATE OF x
LOG:  duration: 656807.000 ms



---
DELETE FROM BM;
DELETE FROM BD;
DELETE FROM BO;
DELETE FROM IL;
DELETE FROM YR;
DELETE FROM YN;
DELETE FROM YO;
DELETE FROM IY;
DELETE FROM IA;
DELETE FROM IQ;
DELETE FROM IC;
Michael Fuhr wrote:
On Tue, Mar 15, 2005 at 04:24:17PM -0500, David Gagnon wrote:
 

I get this strange problem when deleting rows from a Java program.  
Sometimes (from what I've noticed, not all the time) the server takes 
almost forever to delete rows from a table.
   

Do other tables have foreign key references to the table you're
deleting from?  If so, are there indexes on the foreign key columns?
Do you have triggers or rules on the table?
Have you queried pg_locks during the long-lasting deletes to see
if the deleting transaction is waiting for a lock on something?
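A minimal version of that pg_locks check, run from a second session while the delete appears stuck, might look like this (a sketch, not from the original thread):

```sql
-- Show lock requests that are waiting rather than granted.
SELECT pid, relation::regclass AS locked_relation, mode, granted
FROM pg_locks
WHERE NOT granted;
```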
 

I remember having tried to delete the content of my table (IC) from
PgAdminIII and it took a couple of seconds!!! Not minutes.
   

How many records did you delete in this case?  If there are foreign
key references, how many records were in the referencing tables?
How repeatable is the disparity in delete time?  A single test case
might have been done under different conditions, so it might not
mean much.  No offense intended, but "I remember" doesn't carry as
much weight as a documented example.
 


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread Dave Cramer

David Gagnon wrote:
Hi All,
I reran the example with the debug info turned on in postgresql. As you 
can see, all dependent tables (that have foreign keys on table IC) are 
emptied before the DELETE FROM IC statement is issued.  From what I 
understand, the performance problem seems to come from those selects 
that point back to IC ( LOG:  statement: SELECT 1 FROM ONLY 
public.ic x WHERE icnum = $1 FOR UPDATE OF x).  There are 6 of 
them.  I don't know where they are coming from.  But if I want to 
delete the content of the table (~10k rows), those 6 selects for each 
deleted row could take a long time.  Why are those selects there?  Are 
they really run for each deleted row?
You are using hibernate. Hibernate is generating them to lock the tables.

I'm running version 7.4.5 on cygwin.  I ran the same delete from 
pgAdminIII and I got 945562ms for all the deletes within the same 
transaction (so I was wrong saying it took less time in 
PgAdminIII... sorry about that).

Do you have any idea why those 6 selects are there?
Hibernate
Maybe I can drop indexes before deleting the content of the table.  I 
didn't plan to, because the tables are quite small and it's more 
complicated in my environment.  And tell me if I'm wrong, but if I drop 
indexes do I have to reload all my stored procedures (to reset the 
planner-related info)?  I remember having read that somewhere (was it 
in the PostgreSQL General Bits newsletter? ...anyway)

Thanks for your help, I really appreciate it :-)
/David
LOG:  duration: 144.000 ms
LOG:  statement: DELETE FROM YN
LOG:  duration: 30.000 ms
LOG:  statement: DELETE FROM YO
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = 
$1 AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yn x WHERE ynyotype = 
$1 AND ynyonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = 
$1 AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yr x WHERE yryotype = 
$1 AND yryonum = $2 FOR UPDATE OF x
LOG:  duration: 83.000 ms
LOG:  connection received: host=127.0.0.1 port=2196
LOG:  connection authorized: user=admin database=webCatalog
LOG:  statement: set datestyle to 'ISO'; select version(), case when 
pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else 
getdatabaseencoding() end;
LOG:  duration: 2.000 ms
LOG:  statement: set client_encoding = 'UNICODE'
LOG:  duration: 0.000 ms
LOG:  statement: DELETE FROM IY
LOG:  duration: 71.000 ms
LOG:  statement: DELETE FROM IA
LOG:  duration: 17.000 ms
LOG:  statement: DELETE FROM IQ
LOG:  duration: 384.000 ms
LOG:  statement: DELETE FROM IC
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iq x WHERE iqicnum = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ia x WHERE iaicnum = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumo = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumr = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.il x WHERE ilicnum = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.bd x WHERE bdicnum = 
$1 FOR UPDATE OF x
LOG:  duration: 656807.000 ms



---
DELETE FROM BM;
DELETE FROM BD;
DELETE FROM BO;
DELETE FROM IL;
DELETE FROM YR;
DELETE FROM YN;
DELETE FROM YO;
DELETE FROM IY;
DELETE FROM IA;
DELETE FROM IQ;
DELETE FROM IC;
Michael Fuhr wrote:
On Tue, Mar 15, 2005 at 04:24:17PM -0500, David Gagnon wrote:
 

I get this strange problem when deleting rows from a Java program.  
Sometimes (from what I've noticed, not all the time) the server takes 
almost forever to delete rows from a table.
  

Do other tables have foreign key references to the table you're
deleting from?  If so, are there indexes on the foreign key columns?
Do you have triggers or rules on the table?
Have you queried pg_locks during the long-lasting deletes to see
if the deleting transaction is waiting for a lock on something?
 

I remember having tried to delete the content of my table (IC) from
PgAdminIII and it took a couple of seconds!!! Not minutes.
  

How many records did you delete in this case?  If there are foreign
key references, how many records were in the referencing tables?
How repeatable is the disparity in delete time?  A single test case
might have been done under different conditions, so it might not
mean much.  No offense intended, but "I remember" doesn't carry as
much weight as a documented example.
 


---(end of 

Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread David Gagnon
Hi
I reran the example with the debug info turned on in postgresql. As you 
can see, all dependent tables (that have foreign keys on table IC) are 
emptied before the DELETE FROM IC statement is issued.  From what I 
understand, the performance problem seems to come from those selects that 
point back to IC ( LOG:  statement: SELECT 1 FROM ONLY public.ic x 
WHERE icnum = $1 FOR UPDATE OF x).  There are 6 of them.  I don't know 
where they are coming from.
   

I think they come from the FK checking code.  Try to run a VACUUM on the
IC table just before you delete from the other tables; that should make
the checking almost instantaneous (assuming the vacuuming actually
empties the table, which would depend on other transactions).
 

I'll try to vacuum first before I start the delete to see if it changes 
something.

There is probably a good reason, but I don't understand why a 
foreign key check needs to check the table it points to.

You delete a row from table IC and do a check for integrity on tables 
that have foreign keys on IC (makes sense).  But why check back IC?  
I'm pretty sure there is a good reason, but it seems to have a big 
performance impact in this case.  It means it's not really feasible 
to empty the content of a schema.  The table has only 10k rows... with a 
huge table it wouldn't be feasible just because of the checks on itself!

Can someone explain why there is this extra check?  Can it be 
fixed or improved?

Thanks for your help
/David


LOG:  duration: 144.000 ms
LOG:  statement: DELETE FROM YN
LOG:  duration: 30.000 ms
LOG:  statement: DELETE FROM YO
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 
AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yn x WHERE ynyotype = 
$1 AND ynyonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yo x WHERE yotype = $1 
AND yonum = $2 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.yr x WHERE yryotype = 
$1 AND yryonum = $2 FOR UPDATE OF x
LOG:  duration: 83.000 ms
LOG:  connection received: host=127.0.0.1 port=2196
LOG:  connection authorized: user=admin database=webCatalog
LOG:  statement: set datestyle to 'ISO'; select version(), case when 
pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else 
getdatabaseencoding() end;
LOG:  duration: 2.000 ms
LOG:  statement: set client_encoding = 'UNICODE'
LOG:  duration: 0.000 ms
LOG:  statement: DELETE FROM IY
LOG:  duration: 71.000 ms
LOG:  statement: DELETE FROM IA
LOG:  duration: 17.000 ms
LOG:  statement: DELETE FROM IQ
LOG:  duration: 384.000 ms
LOG:  statement: DELETE FROM IC
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iq x WHERE iqicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ia x WHERE iaicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumo = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.iy x WHERE iyicnumr = 
$1 FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.il x WHERE ilicnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.ic x WHERE icnum = $1 
FOR UPDATE OF x
LOG:  statement: SELECT 1 FROM ONLY public.bd x WHERE bdicnum = $1 
FOR UPDATE OF x
LOG:  duration: 656807.000 ms



It would be better to be able to use TRUNCATE to do this, but in 8.0 you
can't if the tables have FKs.  8.1 is better in that regard ...
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] multi-column index

2005-03-16 Thread Daniel Crisan
Hello.
I have a problem concerning multi-column indexes.
I have a table containing some 250k rows.
Table public.descriptionprodftdiclnk
   Column    |  Type   | Modifiers
-------------+---------+-----------
 idword      | integer | not null
 idqualifier | integer | not null
Indexes:
    descriptionprodftdiclnk_pkey primary key, btree (idword, idqualifier)
    ix_descriptionprodftdiclnk_idqualif btree (idqualifier)

When analyzing a simple query on the idword column the query planner 
displays:

explain analyze select * from descriptionprodftdiclnk where idword=44;
 QUERY PLAN
---
Seq Scan on descriptionprodftdiclnk  (cost=0.00..4788.14 rows=44388 
width=8) (actual time=87.582..168.041 rows=43792 loops=1)
  Filter: (idword = 44)
Total runtime: 195.339 ms
(3 rows)

I don't understand why the query planner would not use the multi-column 
index created by default on the primary key. According to the Postgres 
online documentation it should. By setting the
enable_seqscan parameter to off, I can force the planner to use the index:

explain analyze select * from descriptionprodftdiclnk where idword=44;
   
QUERY PLAN
---
Index Scan using descriptionprodftdiclnk_pkey on 
descriptionprodftdiclnk  (cost=0.00..36720.39 rows=44388 width=8) 
(actual time=0.205..73.489 rows=43792 loops=1)
  Index Cond: (idword = 44)
Total runtime: 100.564 ms
(3 rows)


On the other hand, by defining a new index on the idword column (and 
enable_seqscan set to on),
the query uses the index:

create index ix_tempIndex on descriptionprodftdiclnk(idword);
CREATE INDEX
explain analyze select * from descriptionprodftdiclnk where idword=44;
  QUERY 
PLAN
-
Index Scan using ix_tempindex on descriptionprodftdiclnk  
(cost=0.00..916.24 rows=44388 width=8) (actual time=0.021..79.879 
rows=43792 loops=1)
  Index Cond: (idword = 44)
Total runtime: 107.081 ms
(3 rows)

Could someone provide an explanation for the planner's behaviour?
Thanks for your help,
Daniel
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread Stephan Szabo
On Wed, 16 Mar 2005, David Gagnon wrote:

 Hi

 I reran the example with the debug info turned on in postgresql. As you
 can see, all dependent tables (that have foreign keys on table IC) are
 emptied before the DELETE FROM IC statement is issued.  From what I
 understand, the performance problem seems to come from those selects that
 point back to IC ( LOG:  statement: SELECT 1 FROM ONLY public.ic x
 WHERE icnum = $1 FOR UPDATE OF x).  There are 6 of them.  I don't know
 where they are coming from.
 
 
 
 I think they come from the FK checking code.  Try to run a VACUUM on the
 IC table just before you delete from the other tables; that should make
 the checking almost instantaneous (assuming the vacuuming actually
 empties the table, which would depend on other transactions).
 
 
 I'll try to vacuum first before I start the delete to see if it changes
 something.

 There is probably a good reason, but I don't understand why a
 foreign key check needs to check the table it points to.

 You delete a row from table IC and do a check for integrity on tables
 that have foreign keys on IC (makes sense).  But why check back IC?

Because in the general case another row which satisfies the
constraint might be added between the delete and the check.
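A hypothetical interleaving sketching the hazard described here (table and key values invented for illustration):

```sql
-- Session A                          -- Session B
BEGIN;
DELETE FROM ic WHERE icnum = 42;
                                      BEGIN;
                                      INSERT INTO ic VALUES (42);
                                      COMMIT;
-- By the time A's FK triggers run, B may have re-inserted the key,
-- so the triggers must look at ic again (the SELECT ... FOR UPDATE
-- statements in the log) rather than assume the key is simply gone.
```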


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread David Gagnon

Stephan Szabo wrote:
On Wed, 16 Mar 2005, David Gagnon wrote:
 

Hi
   

I reran the example with the debug info turned on in postgresql. As you
can see, all dependent tables (that have foreign keys on table IC) are
emptied before the DELETE FROM IC statement is issued.  From what I
understand, the performance problem seems to come from those selects that
point back to IC ( LOG:  statement: SELECT 1 FROM ONLY public.ic x
WHERE icnum = $1 FOR UPDATE OF x).  There are 6 of them.  I don't know
where they are coming from.
   

I think they come from the FK checking code.  Try to run a VACUUM on the
IC table just before you delete from the other tables; that should make
the checking almost instantaneous (assuming the vacuuming actually
empties the table, which would depend on other transactions).
 

I'll try to vacuum first before I start the delete to see if it changes
something.
There is probably a good reason, but I don't understand why a
foreign key check needs to check the table it points to.
You delete a row from table IC and do a check for integrity on tables
that have foreign keys on IC (makes sense).  But why check back IC?
   

Because in the general case another row which satisfies the
constraint might be added between the delete and the check.
 

So that means if I want to reset the schema with DELETE FROM table 
statements, I must first drop indexes, delete the data, and then recreate 
the indexes and reload the stored procedures.

Or I can suspend the foreign key checks in the db, right?  I saw something 
on this.  Is it possible to do this from the JDBC interface?

Are there any other options I can consider?
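One option (not from the thread; the constraint and column names below are invented, check pg_constraint for the real ones) is to drop the foreign key around the bulk delete and re-create it afterwards. This works over JDBC as ordinary statements:

```sql
BEGIN;
-- Hypothetical names: iq references ic(icnum) via iq_icnum_fkey.
ALTER TABLE iq DROP CONSTRAINT iq_icnum_fkey;
DELETE FROM iq;
DELETE FROM ic;
-- Re-adding the constraint re-validates the now-empty tables cheaply.
ALTER TABLE iq ADD CONSTRAINT iq_icnum_fkey
    FOREIGN KEY (iqicnum) REFERENCES ic (icnum);
COMMIT;
```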
Thanks for your help!
/David
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] Speeding up select distinct

2005-03-16 Thread Laurent Martelli
Consider this query:

SELECT distinct owner from pictures; 

 Unique  (cost=361.18..382.53 rows=21 width=4) (actual time=14.197..17.639 
rows=21 loops=1)
   ->  Sort  (cost=361.18..371.86 rows=4270 width=4) (actual 
time=14.188..15.450 rows=4270 loops=1)
         Sort Key: owner
         ->  Seq Scan on pictures  (cost=0.00..103.70 rows=4270 width=4) 
(actual time=0.012..5.795 rows=4270 loops=1)
 Total runtime: 19.147 ms

I thought that 19ms to return 20 rows out of a 4270-row table was a lot, 
so I added an index:

CREATE INDEX pictures_owner ON pictures (owner);

It gives a slight improvement:

 Unique  (cost=0.00..243.95 rows=21 width=4) (actual time=0.024..10.293 rows=21 
loops=1)
   ->  Index Scan using pictures_owner on pictures  (cost=0.00..233.27 
rows=4270 width=4) (actual time=0.022..8.227 rows=4270 loops=1)
 Total runtime: 10.369 ms

But still, it's a lot for 20 rows. I looked at other types of indexes,
but they seem to either not give better performance or be irrelevant. 

Any ideas, apart from more or less manually maintaining a list of
distinct owners in another table ?

-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components
http://www.aopsys.com/  http://jac.objectweb.org


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Josh Berkus
Greg,

 So 800kB/s for random access reads. And 40Mb/s for sequential reads. That's
 a factor of 49. I don't think anyone wants random_page_cost to be set to 50
 though.

 For a high end 15k drive I see average seek times get as low as 3ms. And
 sustained transfer rates get as high as 100Mb/s. So about 2.7Mb/s for
 random access reads or about a random_page_cost of 37. Still pretty
 extreme.

Actually, what you're demonstrating here is that there's really no point in 
having a random_page_cost GUC, since the seek/scan ratio is going to be high 
regardless.   

Although I can point out that you left out the fact that the disk needs to do 
a seek to find the beginning of the seq scan area, and even then some file 
fragmentation is possible.   Finally, I've never seen PostgreSQL manage more 
than 70% of the maximum read rate, and in most cases more like 30%. 

 So what's going on with the empirically derived value of 4? 

It's not empirically derived; it's a value we plug into an 
internal-to-postgresql formula.   And 4 is a fairly conservative value that 
works for a lot of systems.

Realistically, the values we should be deriving from are:
-- median file cache size for postgresql files
-- average disk read throughput
-- effective processor calculation throughput
-- median I/O contention

However, working those 4 hardware facts into formulas that allow us to 
calculate the actual cost of a query execution plan is somebody's PhD paper.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread PFC
Try :
SELECT owner from pictures group by owner;
Any ideas, apart from more or less manually maintaining a list of
distinct owners in another table ?
That would be a good idea too for normalizing your database.
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] multi-column index

2005-03-16 Thread Josh Berkus
Daniel,

 Table public.descriptionprodftdiclnk

What is this, German?  ;-)

 explain analyze select * from descriptionprodftdiclnk where idword=44;
   QUERY PLAN
 ---
 Seq Scan on
 descriptionprodftdiclnk  (cost=0.00..4788.14 rows=44388 width=8) (actual
 time=87.582..168.041 rows=43792 loops=1)
Filter: (idword = 44)
  Total runtime: 195.339 ms
 (3 rows)

 explain analyze select * from descriptionprodftdiclnk where idword=44;

 QUERY PLAN
 ---

 Index Scan using descriptionprodftdiclnk_pkey on
 descriptionprodftdiclnk  (cost=0.00..36720.39 rows=44388 width=8)
 (actual time=0.205..73.489 rows=43792 loops=1)
Index Cond: (idword = 44)
  Total runtime: 100.564 ms
 (3 rows)

 create index ix_tempIndex on descriptionprodftdiclnk(idword);
 CREATE INDEX
 explain analyze select * from descriptionprodftdiclnk where idword=44;
QUERY
 PLAN
 ---
 Index Scan using ix_tempindex on descriptionprodftdiclnk
 (cost=0.00..916.24 rows=44388 width=8) (actual time=0.021..79.879
 rows=43792 loops=1)
Index Cond: (idword = 44)
  Total runtime: 107.081 ms
 (3 rows)

 Could someone provide an explanation for the planner's behaviour?

Pretty simple, really.  Look at the cost calculations for the index scan on 
the multi-column index.  PostgreSQL believes that:
The cost of a seq scan is 4788.14
The cost of a 2-column index scan is 36720.39
The cost of a 1-column index scan is 916.24

Assuming that you ran each of these queries multiple times to eliminate 
caching as a factor, the issue is that the cost calculations are wrong.   We 
give you a number of GUC variables to change that:
effective_cache_size
random_page_cost
cpu_tuple_cost
etc.

See the RUNTIME-CONFIGURATION docs for more details.
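Such settings can be tried per-session before touching postgresql.conf; the numbers below are only illustrative, not recommendations:

```sql
SET effective_cache_size = 50000;  -- measured in 8kB pages here (~400MB)
SET random_page_cost = 2;
EXPLAIN ANALYZE
SELECT * FROM descriptionprodftdiclnk WHERE idword = 44;
```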

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Rod Taylor
On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
 Consider this query:
 
 SELECT distinct owner from pictures; 

The performance has nothing to do with the number of rows returned, but
rather the complexity of calculations and amount of data to sift through
in order to find it.

 Any ideas, apart from more or less manually maintaining a list of
 distinct owners in another table ?

This would be the proper thing to do, along with adding a foreign key
from pictures to the new owner structure for integrity enforcement.
-- 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Merlin Moncure
 Consider this query:
 
 SELECT distinct owner from pictures;

[...]
 Any ideas, apart from more or less manually maintaining a list of
 distinct owners in another table ?

You answered your own question.  With a 20-row owners table, you should
be directing your efforts there. GROUP BY is faster than DISTINCT, but
both are very wasteful and essentially require a full seqscan of the
detail table.

With a little hacking, you can change 'manual maintenance' to 'automatic
maintenance'.

1. create table owner as select distinct owner from pictures;
2. alter table owner add constraint owner_pkey primary key (owner);
3. alter table pictures add constraint ri_picture_owner
   foreign key (owner) references owner;
4. Make a little append_owner function which adds an owner to the owner
table if there is not already one there. Inline this into your insert
statement on pictures.

Voila!
Merlin
p.s. normalize your data always!
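Step 4 could also be done server-side as a trigger rather than inlined into the insert. A sketch in 7.4-era PL/pgSQL, untested, with names assumed from the steps above:

```sql
CREATE FUNCTION append_owner() RETURNS trigger AS '
BEGIN
    -- Add the owner to the owner table only if it is not already there.
    PERFORM 1 FROM owner WHERE owner = NEW.owner;
    IF NOT FOUND THEN
        INSERT INTO owner (owner) VALUES (NEW.owner);
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER pictures_append_owner
    BEFORE INSERT OR UPDATE ON pictures
    FOR EACH ROW EXECUTE PROCEDURE append_owner();
```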

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Laurent Martelli
 Rod == Rod Taylor [EMAIL PROTECTED] writes:

  Rod On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
   Consider this query:
   
   SELECT distinct owner from pictures;

  Rod The performance has nothing to do with the number of rows
  Rod returned, but rather the complexity of calculations and amount
  Rod of data to sift through in order to find it.

Yes, but I thought that an index might know what distinct
values there are and help optimize that query very much.

-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components
http://www.aopsys.com/  http://jac.objectweb.org


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Laurent Martelli
Wow, what a fast response !!!

 PFC == PFC  [EMAIL PROTECTED] writes:

  PFC  Try :

  PFC  SELECT owner from pictures group by owner;

That's a slight improvement, but there's still a seq scan on pictures:

 HashAggregate  (cost=114.38..114.38 rows=21 width=4) (actual time=7.585..7.605 
rows=21 loops=1)
   ->  Seq Scan on pictures  (cost=0.00..103.70 rows=4270 width=4) (actual 
time=0.015..3.272 rows=4270 loops=1)
 Total runtime: 7.719 ms




-- 
Laurent Martelli
[EMAIL PROTECTED]  Java Aspect Components
http://www.aopsys.com/  http://jac.objectweb.org


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Rod Taylor
On Wed, 2005-03-16 at 19:31 +0100, Laurent Martelli wrote:
  Rod == Rod Taylor [EMAIL PROTECTED] writes:
 
   Rod On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote:
Consider this query:

SELECT distinct owner from pictures;
 
   Rod The performance has nothing to do with the number of rows
   Rod returned, but rather the complexity of calculations and amount
   Rod of data to sift through in order to find it.
 
 Yes, but I thought that an index might know what distinct
 values there are and help optimize that query very much.

The index does know. You just have to visit all of the pages within the
index to find out, which it does, and that's why you dropped 10ms.

But if you want a sub ms query, you're going to have to normalize the
structure.

-- 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speeding up select distinct

2005-03-16 Thread Merlin Moncure
 I just wish there were a means to fully automate all this and render
 it transparent to the user, just like an index.
 
   Merlin Voila!  Merlin p.s. normalize your data always!
 
 I have this:
 
 pictures(
 PictureID serial PRIMARY KEY,
 Owner integer NOT NULL REFERENCES users,
 [...]);
 CREATE TABLE users (
 UserID serial PRIMARY KEY,
 Name character varying(255),
 [...]);
 
 Isn't it normalized ?

try:
select * from users where UserID in (select owner from pictures);
select userid from users intersect select owner from pictures;
select distinct userid, [...] from users join pictures on userid = owner;

if none of these give you what you want, then you can solve this with a
new table, picture_user, using the instructions I gave previously.

Not sure if your data is normalized, but ISTM you are over-using
surrogate keys.  It may not be possible, but consider downgrading ID
columns to unique and picking a natural key.  Now you get better benefit
of RI and you can sometimes remove joins from certain queries.

Rule: use natural keys when you can, surrogate keys when you have to.
Corollary: use domains for fields used in referential integrity.
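A minimal sketch of that rule and its corollary, assuming the users/pictures schema quoted above (the domain name and column sizes are illustrative):

```sql
-- Domain for the field used in referential integrity:
CREATE DOMAIN username AS varchar(255);

-- Natural key on users; the surrogate UserID is dropped:
CREATE TABLE users (
    name username PRIMARY KEY
    -- [...]
);

CREATE TABLE pictures (
    pictureid serial PRIMARY KEY,
    owner username NOT NULL REFERENCES users
    -- [...]
);

-- SELECT DISTINCT owner FROM pictures now yields names directly,
-- with no join back to users needed.
```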

Merlin


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Help to find out problem with joined tables

2005-03-16 Thread Rodrigo Moreno
Hi all,

Could someone explain to me why queries that take about 1 sec to finish
take 17 secs to complete when I join three tables?

If I join movest/natope, it's fast, and if I join movest/produt, it's fast
too, but when I add a third joined table, forget it, it's very slow.

All tables are vacuumed by vacuumdb --full --analyze every night
All indexes are reindexed every night

TABLES:
---

Movest: +- 2 milions rows, indexed
Natope: 30 rows PK(natope_id)
Produt: +- 1400 Rows PK(codpro)

EXPLAINS:
-
explain analyze
select a.codpro, a.datmov, a.vlrtot
from movest a, natope b
where a.tipmov = 'S'
  and a.codpro = 629001
  and a.datmov between '2005-03-01' and '2005-03-31'
  and a.natope = b.natope_id

Merge Join  (cost=35.68..36.23 rows=1 width=25) (actual time=2.613..2.840 rows=6 loops=1)
  Merge Cond: ("outer".natope = "inner"."?column2?")
  ->  Sort  (cost=32.02..32.04 rows=7 width=35) (actual time=1.296..1.314 rows=10 loops=1)
        Sort Key: a.natope
        ->  Index Scan using ix_movest_03 on movest a  (cost=0.00..31.92 rows=7 width=35) (actual time=0.507..1.215 rows=10 loops=1)
              Index Cond: ((codpro = 629001::numeric) AND (datmov >= '2005-03-01'::date) AND (datmov <= '2005-03-31'::date))
              Filter: (tipmov = 'S'::bpchar)
  ->  Sort  (cost=3.65..3.82 rows=66 width=4) (actual time=1.132..1.203 rows=49 loops=1)
        Sort Key: (b.natope_id)::numeric
        ->  Seq Scan on natope b  (cost=0.00..1.66 rows=66 width=4) (actual time=0.117..0.500 rows=66 loops=1)
Total runtime: 3.077 ms


---
explain analyze
select a.codpro, a.datmov, a.vlrtot
from movest a, natope b, produt c
where a.tipmov = 'S'
  and a.codpro = 629001
  and a.datmov between '2005-03-01' and '2005-03-31'
  and a.natope = b.natope_id
  and a.codpro = c.codpro

Nested Loop  (cost=35.68..144.57 rows=2 width=25) (actual time=2838.121..17257.168 rows=6 loops=1)
  ->  Merge Join  (cost=35.68..36.23 rows=1 width=25) (actual time=1.808..2.280 rows=6 loops=1)
        Merge Cond: ("outer".natope = "inner"."?column2?")
        ->  Sort  (cost=32.02..32.04 rows=7 width=35) (actual time=0.485..0.504 rows=10 loops=1)
              Sort Key: a.natope
              ->  Index Scan using ix_movest_03 on movest a  (cost=0.00..31.92 rows=7 width=35) (actual time=0.135..0.390 rows=10 loops=1)
                    Index Cond: ((codpro = 629001::numeric) AND (datmov >= '2005-03-01'::date) AND (datmov <= '2005-03-31'::date))
                    Filter: (tipmov = 'S'::bpchar)
        ->  Sort  (cost=3.65..3.82 rows=66 width=4) (actual time=1.114..1.209 rows=49 loops=1)
              Sort Key: (b.natope_id)::numeric
              ->  Seq Scan on natope b  (cost=0.00..1.66 rows=66 width=4) (actual time=0.058..0.485 rows=66 loops=1)
  ->  Seq Scan on produt c  (cost=0.00..108.26 rows=8 width=4) (actual time=2688.356..2875.743 rows=1 loops=6)
        Filter: ((codpro)::numeric = 629001::numeric)
Total runtime: 17257.865 ms

Best Regards
Rodrigo Moreno



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance problem on delete from for 10k rows. May

2005-03-16 Thread Stephan Szabo
On Wed, 16 Mar 2005, David Gagnon wrote:



 Stephan Szabo wrote:

 On Wed, 16 Mar 2005, David Gagnon wrote:
 
 
 
 Hi
 
 
 
 I reran the example with the debug info turned on in postgresql. As you
 can see, all dependent tables (that have a foreign key on table IC) are
 emptied before the DELETE FROM IC statement is issued.  From what I
 understand, the performance problem seems to come from those selects that
 point back to IC (LOG:  statement: SELECT 1 FROM ONLY "public"."ic" x
 WHERE "icnum" = $1 FOR UPDATE OF x).  There are 6 of them.  I don't know
 where they are coming from.
 
 
 
 
 I think they come from the FK checking code.  Try to run a VACUUM on the
 IC table just before you delete from the other tables; that should make
 the checking almost instantaneous (assuming the vacuuming actually
 empties the table, which would depend on other transactions).
 
 
 
 
 I'll try to vacuum first before I start the delete to see if it changes
 something.

 There is probably a good reason why, but I don't understand why a
 foreign key check needs to check the data it points to.

 You delete a row from table IC and do a check for integrity on tables
 that have foreign keys on IC (makes sense).  But why check back on IC?
 
 
 
 Because in the general case there might be another row which satisfies the
 constraint added between the delete and the check.
 
 
 
 So that means if I want to reset the schema with DELETE FROM table
 statements, I must first drop the indexes, delete the data, and then
 recreate the indexes and reload the stored procedures.

 Or I can suspend the foreign key check in the db, right?  I saw something
 on this.  Is it possible to do this from the JDBC interface?

I think you can remove the constraints and re-add them afterwards, which
should hopefully be fast (a vacuum on the tables after the delete and
before the re-add might help, but I'm not sure).  You could potentially
defer the constraint if it were deferrable, but I don't think that would
help any.
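As a sketch of the drop/re-add approach (the constraint and child table names below are hypothetical, since the actual schema wasn't posted):

```sql
-- Drop the FK so the bulk delete skips the per-row back-checks on IC:
ALTER TABLE child_table DROP CONSTRAINT child_table_icnum_fkey;

DELETE FROM child_table;
DELETE FROM ic;

-- Vacuum before re-adding, so the constraint re-check scans empty tables:
VACUUM ANALYZE ic;
VACUUM ANALYZE child_table;

ALTER TABLE child_table
    ADD CONSTRAINT child_table_icnum_fkey
    FOREIGN KEY (icnum) REFERENCES ic (icnum);
```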

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cpu_tuple_cost

2005-03-16 Thread Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
 So what's going on with the empirically derived value of 4? 

 It's not empirically derived;

Yes it is.  I ran experiments back in the late 90s to derive it.
Check the archives.

Disks have gotten noticeably bigger since then, but I don't think
the ratio of seek time to rotation rate has changed much.
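For anyone wanting to repeat the experiment, the knob is adjustable per session; the table and value below are just examples, not recommendations:

```sql
SHOW random_page_cost;       -- 4 by default
SET random_page_cost = 2;    -- e.g. if the working set is mostly cached
EXPLAIN ANALYZE SELECT * FROM pictures WHERE owner = 42;
-- Compare the chosen plan and timings before and after the SET.
```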

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] multi-column index

2005-03-16 Thread Tom Lane
David Brown [EMAIL PROTECTED] writes:
 Actually, I'm surprised the planner came up with such a low cost for the 
 single column index, unless ... perhaps correlation statistics aren't 
 used when determining costs for multi-column indexes?

The correlation calculation for multi-column indexes is pretty whacked
out pre-8.0.  I don't think it's that great in 8.0 either --- we really
need to make ANALYZE calculate the correlation explicitly for each
index, probably, rather than trying to use per-column correlations.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Help to find out problem with joined tables

2005-03-16 Thread Michael Fuhr
On Wed, Mar 16, 2005 at 05:10:17PM -0300, Rodrigo Moreno wrote:

 If I join movest/natope, it's fast, if I join movest/produt, it's fast too,
 but when I put a third joined table, forget, it's very slow.

What version of PostgreSQL are you using?

 All tables are vacuumed by vacummdb --full --analyze, every night
 All Indexes are reindexed every night

How many updates/deletes do the tables see between vacuums?

 Movest: +- 2 milions rows, indexed
 Natope: 30 rows PK(natope_id)
 Produt: +- 1400 Rows PK(codpro)

Could you show the table definitions, or at least the definitions
for the relevant columns and indexes?

    ->  Seq Scan on produt c  (cost=0.00..108.26 rows=8 width=4) (actual
  time=2688.356..2875.743 rows=1 loops=6)
        Filter: ((codpro)::numeric = 629001::numeric)

What type is produt.codpro?  You might be missing a potential index
scan here due to mismatched types.
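If that is the case, a sketch of the usual fixes (assuming produt.codpro is integer while the comparison is coerced to numeric; the actual types weren't posted, and both statements use 8.0 syntax):

```sql
-- Either make the column types match across the join...
ALTER TABLE produt ALTER COLUMN codpro TYPE numeric;

-- ...or give the planner an index usable for the numeric comparison:
CREATE INDEX ix_produt_codpro_num ON produt (((codpro)::numeric));
```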

The times (2688.356..2875.743) here look odd, although I might be
overlooking or misinterpreting something.  I don't know what else
might cause that, but one thing that can is a lot of dead tuples
in the table, hence my question about how much activity the tables
see between vacuums.  Maybe somebody else can provide a better
explanation.
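One quick way to check for dead-tuple bloat between the nightly vacuums (the exact output wording varies by version):

```sql
VACUUM VERBOSE produt;
-- Look for the "removed N tuples" / dead row version counts in the
-- output: a large number relative to produt's ~1400 live rows would
-- explain repeated multi-second seq scans of such a small table.
```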

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Join method influences scan method?

2005-03-16 Thread Tom Lane
[EMAIL PROTECTED] writes:
 So, it would seem like my optimal plan should have hash joins with index
 scans.

No.  The thing you are looking at here is a nestloop join with inner
index scan, which has to be understood as a unit even though EXPLAIN
doesn't describe it that way.  The inner indexscan is repeated once
for each outer row, using a join key from the outer row as part of the
index lookup.  That's simply not relevant to the other kinds of joins,
because they expect the inner and outer relations to be scanned
independently.
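The shape Tom describes looks roughly like this in EXPLAIN output (a generic sketch with made-up table names, not the original poster's query):

```sql
EXPLAIN SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id;

--  Nested Loop
--    ->  Seq Scan on orders o
--    ->  Index Scan using customers_pkey on customers c
--          Index Cond: (id = "outer".customer_id)
--
-- The inner index scan runs once per outer row, using o.customer_id
-- from that row as the index lookup key.
```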

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster