subject:"\[PERFORM\] select distinct on varchar \- wild performance differences\!"

[PERFORM] select distinct on varchar -- wild performance differences!

2005-06-27 Thread Elliott Bennett

Hey, all.  I've bounced this around in #postgres for an hour or so, and 
it was suggested that I post it here as well.  Hopefully someone can 
help me out.


I have three machines.  All have 512MB of ram.
Machine A is a 2.0ghz celeron, running debian, pg verison 7.4.6.
Machine B is a 1.8ghz celeron, running centos 3.4, pg verison 8.0.3.  
(7.3.9 also exhibited the behaviour below, by the way)

Machine C is a 1.0ghz athlon, running centos 4.0,  pg verison 7.4.7.


The SAME data and schema is loaded (from a pg_dump, default parameters) 
onto all three machines.   With the same query: select distinct model 
from exif_common, machines A and C return results quickly (1/4 
second).  Machine B chews on it for 30ish seconds!  Note, this column is 
a VARCHAR(40).


Here's an explain analyze for it.

Machine A (fast): 
photos=# explain analyze select distinct model from exif_common;

 QUERY PLAN
--
Unique  (cost=2629.74..2732.11 rows=5 width=15) (actual time=211.358..265.049 
rows=6 loops=1)
  -  Sort  (cost=2629.74..2680.93 rows=20473 width=15) (actual 
time=211.351..242.296 rows=20473 loops=1)
Sort Key: model
-  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473 width=15) 
(actual time=0.022..58.635 rows=20473 loops=1)
Total runtime: 265.928 ms
(5 rows)



Machine B (slow): 
photos=# explain analyze select distinct model from exif_common;

 QUERY PLAN
--
Unique  (cost=2640.74..2743.11 rows=6 width=15) (actual 
time=27939.231..32914.134 rows=6 loops=1)
  -  Sort  (cost=2640.74..2691.93 rows=20473 width=15) (actual 
time=27939.222..27983.784 rows=20473 loops=1)
Sort Key: model
-  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473 width=15) 
(actual time=0.071..97.772 rows=20473 loops=1)
Total runtime: 32915.031 ms
(5 rows)


( yes, i know, six distinct rows out of 20,000   But holy moly!  1/4 
sec vs 32.9 sec?!?! )



Now, if I do a similar query against an INT column, the speeds are more 
in line with each other:


Machine A:
photos=# explain analyze select distinct imagewidth from exif_common;

QUERY PLAN
-
Unique  (cost=2629.74..2732.11 rows=36 width=4) (actual time=179.899..225.934 
rows=107 loops=1)
  -  Sort  (cost=2629.74..2680.93 rows=20473 width=4) (actual 
time=179.891..207.632 rows=20473 loops=1)
Sort Key: imagewidth
-  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473 width=4) 
(actual time=0.024..62.946 rows=20473 loops=1)
Total runtime: 226.707 ms
(5 rows)



Machine B:
photos=# explain analyze select distinct imagewidth from exif_common;
QUERY PLAN
-
Unique  (cost=2640.74..2743.11 rows=24 width=4) (actual time=209.394..287.131 
rows=107 loops=1)
  -  Sort  (cost=2640.74..2691.93 rows=20473 width=4) (actual 
time=209.384..251.693 rows=20473 loops=1)
Sort Key: imagewidth
-  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473 width=4) 
(actual time=0.074..94.574 rows=20473 loops=1)
Total runtime: 288.411 ms

(5 rows)




Machine C exhibits the same behaviour as A for all queries.

This weird slow behaviour on machine B also appeared in 7.3.9.  
Upgrading didn't seem to help.


neilc from irc thought it may be a qsort(2) quirk, but a sample C 
program I whipped up testing different sized data sets with a similar 
distribution gave very similar sort timings between the three 
machines..  Therefore, I don't think it's qsort(2) to blame...


Anyone have any ideas as to what may be up with machine B?   


Thanks,
-Elliott

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[PERFORM] select distinct on varchar - wild performance differences!

2005-06-23 Thread Elliott Bennett

Hey, all.  I've bounced this around in #postgres for an hour or so, and
it was suggested that I post it here as well.  Hopefully someone can
help me out.

I have three machines.  All have 512MB of ram.
Machine A is a 2.0ghz celeron, running debian, pg verison 7.4.6.
Machine B is a 1.8ghz celeron, running centos 3.4, pg verison 8.0.3.
(7.3.9 also exhibited the behaviour below, by the way)
Machine C is a 1.0ghz athlon, running centos 4.0,  pg verison 7.4.7.


The SAME data and schema is loaded (from a pg_dump, default parameters)
onto all three machines.   With the same query: select distinct model
from exif_common, machines A and C return results quickly (1/4
second).  Machine B chews on it for 30ish seconds!  Note, this column is
a VARCHAR(40).

Here's an explain analyze for it.

Machine A (fast):
photos=# explain analyze select distinct model from exif_common;
 QUERY PLAN
--
+--
Unique  (cost=2629.74..2732.11 rows=5 width=15) (actual
time=211.358..265.049 rows=6 loops=1)
  -  Sort  (cost=2629.74..2680.93 rows=20473 width=15) (actual
  time=211.351..242.296 rows=20473 loops=1)
Sort Key: model
-  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473
width=15) (actual time=0.022..58.635 rows=20473 loops=1)
Total runtime: 265.928 ms
(5 rows)



Machine B (slow):
photos=# explain analyze select distinct model from exif_common;
 QUERY PLAN

+--
Unique  (cost=2640.74..2743.11 rows=6 width=15) (actual
time=27939.231..32914.134 rows=6 loops=1)
  -  Sort  (cost=2640.74..2691.93 rows=20473 width=15) (actual
  time=27939.222..27983.784 rows=20473 loops=1)
Sort Key: model
-  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473
width=15) (actual time=0.071..97.772 rows=20473 loops=1)
Total runtime: 32915.031 ms
(5 rows)


( yes, i know, six distinct rows out of 20,000   But holy moly!  1/4
sec vs 32.9 sec?!?! )


Now, if I do a similar query against an INT column, the speeds are more
in line with each other:

Machine A:
photos=# explain analyze select distinct imagewidth from exif_common;

QUERY PLAN

+-
Unique  (cost=2629.74..2732.11 rows=36 width=4) (actual
time=179.899..225.934 rows=107 loops=1)
  -  Sort  (cost=2629.74..2680.93 rows=20473 width=4) (actual
  time=179.891..207.632 rows=20473 loops=1)
Sort Key: imagewidth
-  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473
width=4)
(actual time=0.024..62.946 rows=20473 loops=1)
Total runtime: 226.707 ms
(5 rows)



Machine B:
photos=# explain analyze select distinct imagewidth from exif_common;
QUERY PLAN

+-
Unique  (cost=2640.74..2743.11 rows=24 width=4) (actual
time=209.394..287.131 rows=107 loops=1)
  -  Sort  (cost=2640.74..2691.93 rows=20473 width=4) (actual
  time=209.384..251.693 rows=20473 loops=1)
Sort Key: imagewidth
-  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473
width=4)
(actual time=0.074..94.574 rows=20473 loops=1)
Total runtime: 288.411 ms

(5 rows)




Machine C exhibits the same behaviour as A for all queries.

This weird slow behaviour on machine B also appeared in 7.3.9.
Upgrading didn't seem to help.

neilc from irc thought it may be a qsort(2) quirk, but a sample C
program I whipped up testing different sized data sets with a similar
distribution gave very similar sort timings between the three
machines..  Therefore, I don't think it's qsort(2) to blame...

Anyone have any ideas as to what may be up with machine B?

Thanks,
-Elliott




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] select distinct on varchar - wild performance differences!

2005-06-23 Thread Tom Lane

Elliott Bennett [EMAIL PROTECTED] writes:
 Anyone have any ideas as to what may be up with machine B?

Different locale setting?  strcoll() can be horribly slow in some
locales ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] select distinct on varchar - wild performance differences!

2005-06-23 Thread Elliott Bennett

hah!  That did it.  Setting to 'C' makes it just as fast as the other
machines.  I think it defaulted to en_US...

Thanks!

-Elliott

On Thu, Jun 23, 2005 at 11:34:55AM -0400, Tom Lane wrote:
 Elliott Bennett [EMAIL PROTECTED] writes:
  Anyone have any ideas as to what may be up with machine B?
 
 Different locale setting?  strcoll() can be horribly slow in some
 locales ...
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

[PERFORM] select distinct on varchar -- wild performance differences!

[PERFORM] select distinct on varchar - wild performance differences!

Re: [PERFORM] select distinct on varchar - wild performance differences!

Re: [PERFORM] select distinct on varchar - wild performance differences!

4 matches

Site Navigation

Mail list logo

Footer information