Re: [PERFORM] Slow query problem

2004-01-09 Thread Stephan Szabo
On Fri, 9 Jan 2004, Richard Huxton wrote:

> On Friday 09 January 2004 08:57, Dennis Björklund wrote:
> > On Fri, 9 Jan 2004, Richard Huxton wrote:
> > > > > select invheadref, invprodref, sum(units)
> > > > > from invtran
> > > > > group by invheadref, invprodref
> > > >
> > > > For the above query, shouldn't you have one index for both columns
> > > > (invheadref, invprodref). Then it should not need to sort at all to do
> > > > the grouping and it should all be fast.
> > >
> > > Not sure if that would make a difference here, since the whole table is
> > > being read.
> >
> > The goal was to avoid the sorting which should not be needed with that
> > index (I hope). So I still think that it would help in this case.
>
> Sorry - not being clear. I can see how it _might_ help, but will the planner
> take into account the fact that even though:
>   index-cost > seqscan-cost
> that
>   (index-cost + no-sorting) < (seqscan-cost + sort-cost)
> assuming of course, that the costs turn out that way.

AFAICS, yes it does take that effect into account (as best
it can with the estimates).


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow query problem

2004-01-09 Thread Tom Lane
Richard Huxton <[EMAIL PROTECTED]> writes:
>> The goal was to avoid the sorting which should not be needed with that
>> index (I hope). So I still think that it would help in this case.

> Sorry - not being clear. I can see how it _might_ help, but will the planner 
> take into account the fact that even though:
>   index-cost > seqscan-cost
> that
>   (index-cost + no-sorting) < (seqscan-cost + sort-cost)

Yes, it would.

> assuming of course, that the costs turn out that way.

That I'm less sure about.  A sort frequently looks cheaper than a full
indexscan, unless the table is pretty well clustered on that index,
or you knock random_page_cost way down.

With no stats at all, CVS tip has these preferences:

regression=# create table fooey (f1 int, f2 int, unique(f1,f2));
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "fooey_f1_key" for table 
"fooey"
CREATE TABLE
regression=# explain select * from fooey group by f1,f2;
  QUERY PLAN
---
 HashAggregate  (cost=25.00..25.00 rows=1000 width=8)
   ->  Seq Scan on fooey  (cost=0.00..20.00 rows=1000 width=8)
(2 rows)

regression=# set enable_hashagg TO 0;
SET
regression=# explain select * from fooey group by f1,f2;
 QUERY PLAN

 Group  (cost=0.00..57.00 rows=1000 width=8)
   ->  Index Scan using fooey_f1_key on fooey  (cost=0.00..52.00 rows=1000 width=8)
(2 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain select * from fooey group by f1,f2;
 QUERY PLAN
-
 Group  (cost=69.83..77.33 rows=1000 width=8)
   ->  Sort  (cost=69.83..72.33 rows=1000 width=8)
 Sort Key: f1, f2
 ->  Seq Scan on fooey  (cost=0.00..20.00 rows=1000 width=8)
(4 rows)

but remember this is for a relatively small (estimated size of) table.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Slow query problem

2004-01-09 Thread Bradley Tate
Dennis Björklund wrote:

On Fri, 9 Jan 2004, Richard Huxton wrote:

 

select invheadref, invprodref, sum(units)
from invtran
group by invheadref, invprodref
   

For the above query, shouldn't you have one index for both columns
(invheadref, invprodref). Then it should not need to sort at all to do the
grouping and it should all be fast.
 

Not sure if that would make a difference here, since the whole table is being 
read. 
   

The goal was to avoid the sorting which should not be needed with that 
index (I hope). So I still think that it would help in this case.

 

Thanks for the advice. I tried creating a compound index along with 
clustering the invtran table on it, adding another 512MB RAM, increasing 
shared_buffers to 6 and increasing sort_mem to 100MB, playing with 
effective cache size in postgresql.conf. This cut the execution time 
down to 4 minutes, which was helpful but still way behind firebird. 
There was still an awful lot of disk activity while it was happening 
which seems to imply lots of sorting going on (?)

Invtran is a big table but it is clustered and static i.e. no updates, 
select statements only.

Mostly my performance problems are with sorts - group by, order by. I 
was hoping for better results than I've been getting so far.

Thanks.

p.s.
Can someone confirm whether this should work from pgadmin3? i.e.  will 
the size of the sort_mem be changed for the duration of the query or 
session?

set sort_mem  to 10;
select etc;


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Slow query problem

2004-01-09 Thread Richard Huxton
On Friday 09 January 2004 08:57, Dennis Björklund wrote:
> On Fri, 9 Jan 2004, Richard Huxton wrote:
> > > > select invheadref, invprodref, sum(units)
> > > > from invtran
> > > > group by invheadref, invprodref
> > >
> > > For the above query, shouldn't you have one index for both columns
> > > (invheadref, invprodref). Then it should not need to sort at all to do
> > > the grouping and it should all be fast.
> >
> > Not sure if that would make a difference here, since the whole table is
> > being read.
>
> The goal was to avoid the sorting which should not be needed with that
> index (I hope). So I still think that it would help in this case.

Sorry - not being clear. I can see how it _might_ help, but will the planner 
take into account the fact that even though:
  index-cost > seqscan-cost
that
  (index-cost + no-sorting) < (seqscan-cost + sort-cost)
assuming of course, that the costs turn out that way.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Slow query problem

2004-01-09 Thread Dennis Björklund
On Fri, 9 Jan 2004, Richard Huxton wrote:

> > > select invheadref, invprodref, sum(units)
> > > from invtran
> > > group by invheadref, invprodref
> 
> > For the above query, shouldn't you have one index for both columns
> > (invheadref, invprodref). Then it should not need to sort at all to do the
> > grouping and it should all be fast.
> 
> Not sure if that would make a difference here, since the whole table is being 
> read. 

The goal was to avoid the sorting which should not be needed with that 
index (I hope). So I still think that it would help in this case.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Slow query problem

2004-01-09 Thread Richard Huxton
On Friday 09 January 2004 07:29, Dennis Björklund wrote:
> On Thu, 8 Jan 2004, Bradley Tate wrote:
> >
> > select invheadref, invprodref, sum(units)
> > from invtran
> > group by invheadref, invprodref

> For the above query, shouldn't you have one index for both columns
> (invheadref, invprodref). Then it should not need to sort at all to do the
> grouping and it should all be fast.

Not sure if that would make a difference here, since the whole table is being 
read. 

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow query problem

2004-01-09 Thread Dennis Björklund
On Thu, 8 Jan 2004, Bradley Tate wrote:

> We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, 
> Redhat 9) to do some basic comparisons between postgresql and firebird 
> 1.0.3 and 1.5rc8. Mostly the results are comparable, with one 
> significant exception.
> 
> QUERY
> select invheadref, invprodref, sum(units)
> from invtran
> group by invheadref, invprodref
> 
> RESULTS
> pg 7.3.4  -  5.5 min
> pg 7.4.0  -  10 min
> fb 1.0.3   -  64 sec
> fb 1.5 -   44 sec
> 
> * The invtran table has about 2.5 million records, invheadref and 
> invprodref are both char(10) and indexed.

For the above query, shouldn't you have one index for both columns
(invheadref, invprodref). Then it should not need to sort at all to do the
grouping and it should all be fast.

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow query problem

2004-01-08 Thread Tom Lane
Mike Glover <[EMAIL PROTECTED]> writes:
> You should bump sort_mem as high as you can stand.  with only 8MB sort
> memory available, you're swapping intermediate sort pages to disk --
> a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
> memory). 

7.4 will probably flip over to a hash-based aggregation method, and not
sort at all, once you make sort_mem large enough that it thinks the hash
table will fit in sort_mem.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Slow query problem

2004-01-08 Thread Bruno Wolff III
On Thu, Jan 08, 2004 at 19:27:16 -0800,
  Mike Glover <[EMAIL PROTECTED]> wrote:
> 
> You should bump sort_mem as high as you can stand.  with only 8MB sort
> memory available, you're swapping intermediate sort pages to disk --
> a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
> memory). 

Postgres also might be able to switch to a hash aggregate instead of
using a sort if sortmem is made large enough to hold the results for
all of the (estimated) groups.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Slow query problem

2004-01-08 Thread Mike Glover
On Thu, 08 Jan 2004 16:52:05 +1100
Bradley Tate <[EMAIL PROTECTED]> wrote:
> Am I correct in interpreting that most time was spent doing the
> sorting? 

looks so.  your table is about 70MB total size, and its getting loaded
completely into memory (you have 12000 * 8k = 96M available).  26s to
load 70MB from disk seems reasonable.  The rest of the time is used for
sorting.

> Explain confuses the heck out of me and any help on how I could make 
> this run faster would be gratefully received.
> 

You should bump sort_mem as high as you can stand.  with only 8MB sort
memory available, you're swapping intermediate sort pages to disk --
a lot. Try the query with sort_mem set to 75MB (to do the entire sort in
memory). 

-mike

> Cheers,
> 
> Bradley.
> 
> 
> ---(end of
> broadcast)--- TIP 8: explain analyze is your
> friend


-- 
Mike Glover
Key ID BFD19F2C <[EMAIL PROTECTED]>


pgp0.pgp
Description: PGP signature


[PERFORM] Slow query problem

2004-01-08 Thread Bradley Tate
Hi,

We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, 
Redhat 9) to do some basic comparisons between postgresql and firebird 
1.0.3 and 1.5rc8. Mostly the results are comparable, with one 
significant exception.

QUERY
select invheadref, invprodref, sum(units)
from invtran
group by invheadref, invprodref
RESULTS
pg 7.3.4  -  5.5 min
pg 7.4.0  -  10 min
fb 1.0.3   -  64 sec
fb 1.5 -   44 sec
* The invtran table has about 2.5 million records, invheadref and 
invprodref are both char(10) and indexed.
* shared_buffers = 12000 and sort_mem = 8192 are the only changes I've 
made to postgresql.conf, with relevant changes to shmall and shmmax.

This is an explain analyse plan from postgresql 7.4:

QUERY PLAN
 

GroupAggregate  (cost=572484.23..601701.15 rows=1614140 width=39) 
(actual time=500091.171..554203.189 rows=147621 loops=1)
  ->  Sort  (cost=572484.23..578779.62 rows=2518157 width=39) (actual 
time=500090.939..527500.940 rows=2521530 loops=1)
Sort Key: invheadref, invprodref
->  Seq Scan on invtran  (cost=0.00..112014.57 rows=2518157 
width=39) (actual time=16.002..25516.917 rows=2521530 loops=1)
Total runtime: 554826.827 ms
(5 rows)

Am I correct in interpreting that most time was spent doing the sorting? 
Explain confuses the heck out of me and any help on how I could make 
this run faster would be gratefully received.

Cheers,

Bradley.

---(end of broadcast)---
TIP 8: explain analyze is your friend