Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Ian Westmacott
Yes, sorry, two totally different machines.  The 7.4.8
run was on a dual P4 3.2GHz, and the 7.4.2 run was on
a dual hyperthreaded Xeon 2.4GHz.

--Ian


On Tue, 2005-08-09 at 10:33, Tom Lane wrote:
 Ian Westmacott [EMAIL PROTECTED] writes:
  On Mon, 2005-08-08 at 20:58, Tom Lane wrote:
  I'd be interested to see results from other people using 7.4.* too.
 
  7.4.8:
   Total runtime: 0.198 ms
 
  7.4.2:
   Total runtime: 0.697 ms
 
 Just to be clear: those are two different machines of different speeds,
 right?  I don't believe we put any factor-of-three speedups into 7.4.*
 after release ;-)
 
   regards, tom lane


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Rhett Garber
Duplicated your setup in a separate DB.

At least its reproducable for me.

I tested this on a Xeon 2 Ghz, 1 Gig Ram. Its running on some shared
storage array that I'm not sure the details of.

My production example also shows up on our production machine that is
almost the same hardware but has dual zeon and 6 gigs of ram.

Rhett

Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
rows=1 loops=1)
   Hash Cond: (outer.id = inner.obj2)
   -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
(actual time=0.007..0.015 rows=5 loops=1)
   -  Hash  (cost=4.83..4.83 rows=1 width=4) (actual
time=0.055..0.055 rows=0 loops=1)
 -  Index Scan using connection_regid_obj1_index on
connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
rows=1 loops=1)
   Index Cond: ((connection_registry_id = 40105) AND (obj1
= 73582)) Total runtime: 7.693 ms
(7 rows)


On 8/8/05, Tom Lane [EMAIL PROTECTED] wrote:
 Rhett Garber [EMAIL PROTECTED] writes:
  This is postgres 7.4.1
  All the rows involved are integers.
 
 Hmph.  There is something really strange going on here.  I tried to
 duplicate your problem in 7.4.*, thus:
 
 regression=# create table rtmessagestate(id int, f1 char(6));
 CREATE TABLE
 regression=# insert into rtmessagestate values(1,'z');
 INSERT 559399 1
 regression=# insert into rtmessagestate values(2,'z');
 INSERT 559400 1
 regression=# insert into rtmessagestate values(3,'z');
 INSERT 559401 1
 regression=# insert into rtmessagestate values(4,'z');
 INSERT 559402 1
 regression=# insert into rtmessagestate values(5,'z');
 INSERT 559403 1
 regression=# vacuum analyze rtmessagestate;
 VACUUM
 regression=# create table connection(connection_registry_id int, obj1 int, 
 obj2 int);
 CREATE TABLE
 regression=# create index connection_regid_obj1_index on 
 connection(connection_registry_id,obj1);
 CREATE INDEX
 regression=# insert into  connection values(40105,73582,3);
 INSERT 559407 1
 regression=# explain analyze select rtmessagestate.* from 
 rtmessagestate,connection where (connection_registry_id =  40105) AND (obj1  
 = 73582) and id = obj2;
  QUERY 
 PLAN
 
  Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 
 rows=1 loops=1)
Hash Cond: (outer.id = inner.obj2)
-  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual 
 time=0.030..0.072 rows=5 loops=1)
-  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 
 rows=0 loops=1)
  -  Index Scan using connection_regid_obj1_index on connection  
 (cost=0.00..4.83 rows=1 width=4) (actual time=0.236..0.264 rows=1 loops=1)
Index Cond: ((connection_registry_id = 40105) AND (obj1 = 
 73582))
  Total runtime: 1.119 ms
 (7 rows)
 
 This duplicates your example as to plan and row counts:
 
  Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual
  time=10.591..10.609 rows=1 loops=1)
  Hash Cond: (outer.id = inner.obj2)
  -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
  (actual time=0.011..0.022 rows=5 loops=1)
  -  Hash  (cost=5.96..5.96 rows=1 width=4) (actual
  time=0.109..0.109 rows=0 loops=1)
  -  Index Scan using connection_regid_obj1_index on
  connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076
  rows=1 loops=1)
  Index Cond: ((connection_registry_id = 40105) AND (obj1
  = 73582)) Total runtime: 11.536 ms
  (7 rows)
 
 My machine is considerably slower than yours, to judge by the actual
 elapsed times in the scan nodes ... so why is it beating the pants
 off yours in the join step?
 
 Can you try the above script verbatim in a scratch database and see
 what you get?  (Note it's worth trying the explain two or three
 times to be sure the values have settled out.)
 
 I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one
 possible reason for the discrepancy between my results and yours, but I
 do not see anything in the 7.4 CVS logs that looks like it's related to
 hashjoin performance.
 
 I'd be interested to see results from other people using 7.4.* too.
 
 regards, tom lane


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Tom Lane
Rhett Garber [EMAIL PROTECTED] writes:
 Duplicated your setup in a separate DB.
 At least its reproducable for me.

Hmm.  Well, we now have several data points but they seem to be on
wildly varying hardware.  To try to normalize the results a little,
I computed the total actual time for the hash plan divided by the sum
of the actual times for the two scan nodes.  Thus, for your example:

 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
 rows=1 loops=1)
Hash Cond: (outer.id = inner.obj2)
-  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
 (actual time=0.007..0.015 rows=5 loops=1)
-  Hash  (cost=4.83..4.83 rows=1 width=4) (actual
 time=0.055..0.055 rows=0 loops=1)
  -  Index Scan using connection_regid_obj1_index on
 connection  (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
 rows=1 loops=1)
Index Cond: ((connection_registry_id = 40105) AND (obj1
 = 73582)) Total runtime: 7.693 ms
 (7 rows)

this would be 7.159 / (0.015 + 0.032).  This is probably not an
enormously robust statistic but it at least focuses attention in the
right place.  Here's what I get (rounded off to 4 digits which is surely
as much precision as we have in the numbers):

 Tom 7.4.8+   1.619
 Ian 7.4.86.000
 Ian 7.4.2   13.95
 Steinar 7.4.78.833
 Rhett orig 108.3
 Rhett test 152.3
 Michael 7.4.12.015

My number seems to be a bit of an outlier to the low side, but yours are
way the heck to the high side.  And Michael's test seems to rule out the
idea that it's something broken in 7.4.1 in particular.

I'm now thinking you've got either a platform- or compiler-specific
problem.  Exactly what is the hardware (the CPU not the disks)?  How did
you build or come by the Postgres executables (compiler, configure
options, etc)?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Rhett Garber
 I'm now thinking you've got either a platform- or compiler-specific
 problem.  Exactly what is the hardware (the CPU not the disks)?  How did
 you build or come by the Postgres executables (compiler, configure
 options, etc)?

I've tried it on two of our machines, both HP Proliant DL580:
Production: Intel(R) Xeon(TM) MP CPU 2.80GHz (I think there are 2
physical CPUs with Hyperthreading, shows up as 4)
   6 gigs RAM
Development: Intel(R) XEON(TM) MP CPU 2.00GHz (I have vague
recollection of disabling hyperthreading on this chip because of some
other kernel issue)
  1 gig RAM

They are both running SuSE 8, 2.4.21-128-smp kernel

Compile instructions (I didn't do it myself) indicate we built from
source with nothing fancy:

tar xpvf postgresql-7.4.1.tar.bz2
cd postgresql-7.4.1
./configure --prefix=/usr/local/postgresql-7.4.1
make
make install
make install-all-headers

If i run 'file' on /usr/local/postgresql-7.4.1/bin/postgres :
postgres: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), not stripped

Thanks for all your help guys,

Rhett

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Tom Lane
Rhett Garber [EMAIL PROTECTED] writes:
 They are both running SuSE 8, 2.4.21-128-smp kernel

 Compile instructions (I didn't do it myself) indicate we built from
 source with nothing fancy:

You could double-check the configure options by running pg_config.
But probably the more interesting question is whether any nondefault
CFLAGS were used, and I don't think pg_config records that.
(Hmm, maybe it should.)

In any case, there's no smoking gun there.  I'm now wondering if maybe
there's something unusual about your runtime parameters.  AFAIR you
didn't show us your postgresql.conf settings --- could we see any
nondefault entries there?

(I looked quickly at the 7.4 hashjoin code, and I see that it uses a
hash table sized according to sort_mem even when the input is predicted
to be very small ... so an enormous sort_mem setting would account for
some plan startup overhead to initialize the table ...)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Rhett Garber
Well that could be an issue, is this abnormally large:

#shared_buffers = 1536  # min 16, at least max_connections*2, 8KB each
shared_buffers = 206440
#sort_mem = 131072  # min 64, size in KB
sort_mem = 524288   # min 64, size in KB
vacuum_mem = 131072 # min 1024, size in K

I actually had a lot of trouble finding example values for these... no
one wants to give real numbers in any postgres performance tuning
articles I saw. What would be appropriate for machines with 1 or 6
gigs of RAM and wanting to maximize performance.

Rhett

On 8/9/05, Tom Lane [EMAIL PROTECTED] wrote:
 Rhett Garber [EMAIL PROTECTED] writes:
  They are both running SuSE 8, 2.4.21-128-smp kernel
 
  Compile instructions (I didn't do it myself) indicate we built from
  source with nothing fancy:
 
 You could double-check the configure options by running pg_config.
 But probably the more interesting question is whether any nondefault
 CFLAGS were used, and I don't think pg_config records that.
 (Hmm, maybe it should.)
 
 In any case, there's no smoking gun there.  I'm now wondering if maybe
 there's something unusual about your runtime parameters.  AFAIR you
 didn't show us your postgresql.conf settings --- could we see any
 nondefault entries there?
 
 (I looked quickly at the 7.4 hashjoin code, and I see that it uses a
 hash table sized according to sort_mem even when the input is predicted
 to be very small ... so an enormous sort_mem setting would account for
 some plan startup overhead to initialize the table ...)
 
 regards, tom lane


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Tom Lane
Rhett Garber [EMAIL PROTECTED] writes:
 Well that could be an issue, is this abnormally large:
 #shared_buffers = 1536  # min 16, at least max_connections*2, 8KB each
 shared_buffers = 206440
 #sort_mem = 131072  # min 64, size in KB
 sort_mem = 524288   # min 64, size in KB
 vacuum_mem = 131072 # min 1024, size in K

The vacuum_mem number is OK I think, but both of the others seem
unreasonably large.  Conventional wisdom about shared_buffers is that
the sweet spot is maybe 1 or so buffers, rarely more than 5.
(Particularly in pre-8.0 releases, there are code paths that grovel
through all the buffers linearly, so there is a significant cost to
making it too large.)  Don't worry about it being too small to make
effective use of RAM --- we rely on the kernel's disk cache to do that.

sort_mem is *per sort*, and so half a gig in a machine with only a
couple of gig is far too much except when you know you have only one
query running.  A couple dozen backends each trying to use half a gig
will drive you into the ground in no time.  Conventional wisdom here
is that the global setting should be conservatively small (perhaps
10Mb to 100Mb depending on how many concurrent backends you expect to
have), and then you can explicitly increase it locally with SET for
specific queries that need it.

In terms of the problem at hand, try the test case with a few different
values of sort_mem (use SET to adjust it, you don't need to touch the
config file) and see what happens.  I think the cost you're seeing is
just startup overhead to zero a hash table of a few hundred meg ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-09 Thread Rhett Garber
Bingo, the smaller the sort_mem, the faster that query is.

Thanks a lot to everybody that helped, i'll tweak with these values
more when I get a chance now that I have some guidelines that make
sense.

Rhett

On 8/9/05, Tom Lane [EMAIL PROTECTED] wrote:
 Rhett Garber [EMAIL PROTECTED] writes:
  Well that could be an issue, is this abnormally large:
  #shared_buffers = 1536  # min 16, at least max_connections*2, 8KB 
  each
  shared_buffers = 206440
  #sort_mem = 131072  # min 64, size in KB
  sort_mem = 524288   # min 64, size in KB
  vacuum_mem = 131072 # min 1024, size in K
 
 The vacuum_mem number is OK I think, but both of the others seem
 unreasonably large.  Conventional wisdom about shared_buffers is that
 the sweet spot is maybe 1 or so buffers, rarely more than 5.
 (Particularly in pre-8.0 releases, there are code paths that grovel
 through all the buffers linearly, so there is a significant cost to
 making it too large.)  Don't worry about it being too small to make
 effective use of RAM --- we rely on the kernel's disk cache to do that.
 
 sort_mem is *per sort*, and so half a gig in a machine with only a
 couple of gig is far too much except when you know you have only one
 query running.  A couple dozen backends each trying to use half a gig
 will drive you into the ground in no time.  Conventional wisdom here
 is that the global setting should be conservatively small (perhaps
 10Mb to 100Mb depending on how many concurrent backends you expect to
 have), and then you can explicitly increase it locally with SET for
 specific queries that need it.
 
 In terms of the problem at hand, try the test case with a few different
 values of sort_mem (use SET to adjust it, you don't need to touch the
 config file) and see what happens.  I think the cost you're seeing is
 just startup overhead to zero a hash table of a few hundred meg ...
 
 regards, tom lane


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-08 Thread Rhett Garber
This is postgres 7.4.1

All the rows involved are integers.

Thanks,

Rhett

On 8/5/05, Tom Lane [EMAIL PROTECTED] wrote:
 Rhett Garber [EMAIL PROTECTED] writes:
  Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual
  time=10.591..10.609 rows=1 loops=1)
 Hash Cond: (outer.id = inner.obj2)
 -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
  (actual time=0.011..0.022 rows=5 loops=1)
 -  Hash  (cost=5.96..5.96 rows=1 width=4) (actual
  time=0.109..0.109 rows=0 loops=1)
   -  Index Scan using connection_regid_obj1_index on
  connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076
  rows=1 loops=1)
 Index Cond: ((connection_registry_id = 40105) AND (obj1
  = 73582)) Total runtime: 11.536 ms
  (7 rows)
 
 [ scratches head... ]  If the hash table build takes only 0.109 msec
 and loads only one row into the hash table, and the scan of
 rtmessagestate takes only 0.022 msec and produces only 5 rows, it is
 real hard to see how the join takes 10.609 msec overall.  Unless the id
 and obj2 columns are of a datatype with an incredibly slow equality
 function.  What is the datatype involved here, anyway?  And what PG
 version are we speaking of?
 
 regards, tom lane


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-08 Thread Tom Lane
Rhett Garber [EMAIL PROTECTED] writes:
 This is postgres 7.4.1
 All the rows involved are integers.

Hmph.  There is something really strange going on here.  I tried to
duplicate your problem in 7.4.*, thus:

regression=# create table rtmessagestate(id int, f1 char(6));
CREATE TABLE
regression=# insert into rtmessagestate values(1,'z');
INSERT 559399 1
regression=# insert into rtmessagestate values(2,'z');
INSERT 559400 1
regression=# insert into rtmessagestate values(3,'z');
INSERT 559401 1
regression=# insert into rtmessagestate values(4,'z');
INSERT 559402 1
regression=# insert into rtmessagestate values(5,'z');
INSERT 559403 1
regression=# vacuum analyze rtmessagestate;
VACUUM
regression=# create table connection(connection_registry_id int, obj1 int, obj2 
int);
CREATE TABLE
regression=# create index connection_regid_obj1_index on 
connection(connection_registry_id,obj1);
CREATE INDEX
regression=# insert into  connection values(40105,73582,3);
INSERT 559407 1
regression=# explain analyze select rtmessagestate.* from 
rtmessagestate,connection where (connection_registry_id =  40105) AND (obj1  = 
73582) and id = obj2;
 QUERY PLAN

 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.498..0.544 rows=1 
loops=1)
   Hash Cond: (outer.id = inner.obj2)
   -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual 
time=0.030..0.072 rows=5 loops=1)
   -  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.305..0.305 rows=0 
loops=1)
 -  Index Scan using connection_regid_obj1_index on connection  
(cost=0.00..4.83 rows=1 width=4) (actual time=0.236..0.264 rows=1 loops=1)
   Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 1.119 ms
(7 rows)

This duplicates your example as to plan and row counts:

 Hash Join  (cost=5.96..7.04 rows=1 width=14) (actual
 time=10.591..10.609 rows=1 loops=1)
 Hash Cond: (outer.id = inner.obj2)
 -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14)
 (actual time=0.011..0.022 rows=5 loops=1)
 -  Hash  (cost=5.96..5.96 rows=1 width=4) (actual
 time=0.109..0.109 rows=0 loops=1)
 -  Index Scan using connection_regid_obj1_index on
 connection  (cost=0.00..5.96 rows=1 width=4) (actual time=0.070..0.076
 rows=1 loops=1)
 Index Cond: ((connection_registry_id = 40105) AND (obj1
 = 73582)) Total runtime: 11.536 ms
 (7 rows)

My machine is considerably slower than yours, to judge by the actual
elapsed times in the scan nodes ... so why is it beating the pants
off yours in the join step?

Can you try the above script verbatim in a scratch database and see
what you get?  (Note it's worth trying the explain two or three
times to be sure the values have settled out.)

I'm testing a fairly recent 7.4-branch build (7.4.8 plus), so that's one
possible reason for the discrepancy between my results and yours, but I
do not see anything in the 7.4 CVS logs that looks like it's related to
hashjoin performance.

I'd be interested to see results from other people using 7.4.* too.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Why hash join instead of nested loop?

2005-08-08 Thread Steinar H. Gunderson
On Mon, Aug 08, 2005 at 08:58:26PM -0400, Tom Lane wrote:
 Hmph.  There is something really strange going on here.  I tried to
 duplicate your problem in 7.4.*, thus:

PostgreSQL 7.4.7 (Debian sarge):

create table and stuff, exactly the same as you

regression=# explain analyze select rtmessagestate.* from 
rtmessagestate,connection where (connection_registry_id =  40105) AND (obj1  = 
73582) and id = obj2;

 QUERY PLAN 


 Hash Join  (cost=4.83..5.91 rows=1 width=14) (actual time=0.155..0.159 rows=1 
loops=1)
   Hash Cond: (outer.id = inner.obj2)
   -  Seq Scan on rtmessagestate  (cost=0.00..1.05 rows=5 width=14) (actual 
time=0.003..0.006 rows=5 loops=1)
   -  Hash  (cost=4.83..4.83 rows=1 width=4) (actual time=0.026..0.026 rows=0 
loops=1)
 -  Index Scan using connection_regid_obj1_index on connection  
(cost=0.00..4.83 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1)
   Index Cond: ((connection_registry_id = 40105) AND (obj1 = 73582))
 Total runtime: 0.215 ms
(7 rows)

This is an Opteron (in 32-bit mode), though.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster