Re: [PERFORM] merge>hash>loop

2006-04-20 Thread Jim C. Nasby
On Wed, Apr 19, 2006 at 01:25:28AM -0400, Tom Lane wrote:
> Mark Kirkwood [EMAIL PROTECTED] writes:
>> Jim C. Nasby wrote:
>>> Good point. :/ I'm guessing there's no easy way to see how many blocks
>>> for a given relation are in shared memory, either...
>
>> contrib/pg_buffercache will tell you this -
>
> I think the key word in Jim's comment was easy, ie, cheap.  Grovelling
> through many thousands of buffers to count the matches to a given
> relation doesn't sound appetizing, especially not if it gets done over
> again several times during each query-planning cycle.  Trying to keep
> centralized counts somewhere would be even worse (because of locking/
> contention issues).

Very true. OTOH, it might not be unreasonable to periodically slog
through the buffers and store that information, perhaps once a minute,
or every X number of transactions.

I think a bigger issue is that we currently have no way to really
measure the effectiveness of the planner. Without that it's impossible
to come up with any real data on whether cost formula A is better or
worse than cost formula B. The only means I can think of for doing this
would be to measure estimated cost vs actual cost, but with the overhead
of EXPLAIN ANALYZE and other variables that might not prove terribly
practical. Maybe someone else has some ideas...
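
For a concrete illustration (using numbers from Ian's plans further down
this thread): the hash join plan was estimated at 75426 cost units and ran
in 2194 ms, about 34 units/ms, while the nestloop was estimated at 321097
units but ran in 1040 ms, about 309 units/ms. A well-calibrated cost model
would keep that ratio roughly constant across competing plans; here it is
off by a factor of about nine.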
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

> Well, the other thing that's going on here is that we know we are
> overestimating the cost of nestloop-with-inner-indexscan plans.
> The current estimation for that is basically outer scan cost plus N
> times inner scan cost where N is the estimated number of outer tuples;
> in other words the repeated indexscan probes are each assumed to happen
> from a cold start.  In reality, caching of the upper levels of the index
> means that the later index probes are much cheaper than this model
> thinks.  We've known about this for some time but no one's yet proposed
> a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.
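
(For scale, a back-of-the-envelope check against Ian's nestloop plan: with
roughly 50000 outer tuples, N*inner charges ~50000 inner probes, while
log2(N) is only about 15.6 -- a reduction of roughly 3000x. That is probably
too aggressive as a drop-in replacement, but it shows how much room there is
between the cold-start model and a fully-cached one.)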

> In my mind this is tied into another issue, which is that the planner
> always costs on the basis of each query starting from zero.  In a real
> environment it's much cheaper to use heavily-used indexes than this cost
> model suggests, because they'll already be swapped in due to use by
> previous queries.  But we haven't got any infrastructure to keep track
> of what's been heavily used, let alone a cost model that could make use
> of the info.

An easy first approach would be to add a user tunable cache probability
value to each index (and possibly table) between 0 and 1. Then simply
multiply random_page_cost with (1-that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.
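
(Worked through with illustrative numbers: with the default random_page_cost
of 4.0 and a cache probability of 0.9 on a hot index -- the 0.9 is made up
for the example -- Markus's formula gives an effective random_page_cost of
4.0 * (1 - 0.9) = 0.4 for scans of that index.)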

> I think part of the reason that people commonly reduce random_page_cost
> to values much lower than physical reality would suggest is that it
> provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size correspond with it? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 12:51:59PM +0200, Markus Schaber wrote:
>> In my mind this is tied into another issue, which is that the planner
>> always costs on the basis of each query starting from zero.  In a real
>> environment it's much cheaper to use heavily-used indexes than this cost
>> model suggests, because they'll already be swapped in due to use by
>> previous queries.  But we haven't got any infrastructure to keep track
>> of what's been heavily used, let alone a cost model that could make use
>> of the info.
>
> An easy first approach would be to add a user tunable cache probability
> value to each index (and possibly table) between 0 and 1. Then simply
> multiply random_page_cost with (1-that value) for each scan.
>
> Later, this value could be automatically tuned by stats analysis or
> other means.

Actually, if you run with stats_block_level turned on you have a
first-order approximation of what is and isn't cached. Perhaps the
planner could make use of this information if it's available.
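
As a sketch of what those stats expose (using the standard pg_statio views;
note the counters accumulate forever, which is exactly the decay problem Tom
raises below):

    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(100.0 * heap_blks_hit
                 / (heap_blks_hit + heap_blks_read), 1) AS heap_hit_pct
    FROM pg_statio_user_tables
    WHERE heap_blks_hit + heap_blks_read > 0
    ORDER BY heap_blks_hit + heap_blks_read DESC;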

>> I think part of the reason that people commonly reduce random_page_cost
>> to values much lower than physical reality would suggest is that it
>> provides a crude way of partially compensating for this basic problem.
>
> I totally agree with this; it's just what we did here from time to time. :-)
>
> Hmm, how does effective_cache_size correspond with it? Shouldn't a high
> effective_cache_size have a similar effect?

Generally, effective_cache_size is used to determine the likelihood that
something will be in-cache. random_page_cost tells us how expensive it
will be to get that information if it isn't in cache.
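
(For reference, the two knobs side by side -- illustrative values only, not
recommendations:

    SET effective_cache_size = 100000;  -- in 8 kB pages, ~800 MB assumed cached
    SET random_page_cost = 4.0;         -- price of a random fetch on a cache miss

Roughly speaking, effective_cache_size feeds the index cost estimator's guess
about how many page fetches repeat within a scan, while random_page_cost
prices the fetches that miss.)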
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
> Actually, if you run with stats_block_level turned on you have a
> first-order approximation of what is and isn't cached.

Only if those stats decayed (pretty fast) with time; which they don't.

regards, tom lane



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
> Hmm, how does effective_cache_size correspond with it? Shouldn't a high
> effective_cache_size have a similar effect?

It seems reasonable to suppose that effective_cache_size ought to be
used as a number indicating how much stuff would hang around from
query to query.  Right now it's not used that way...

regards, tom lane



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
>> Actually, if you run with stats_block_level turned on you have a
>> first-order approximation of what is and isn't cached.
>
> Only if those stats decayed (pretty fast) with time; which they don't.

Good point. :/ I'm guessing there's no easy way to see how many blocks
for a given relation are in shared memory, either...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Tue, Apr 18, 2006 at 06:26:48PM -0400, Tom Lane wrote:
> Markus Schaber [EMAIL PROTECTED] writes:
>> Hmm, how does effective_cache_size correspond with it? Shouldn't a high
>> effective_cache_size have a similar effect?
>
> It seems reasonable to suppose that effective_cache_size ought to be
> used as a number indicating how much stuff would hang around from
> query to query.  Right now it's not used that way...

Maybe it would be a reasonable first pass to have estimators calculate
the cost if a node found everything it wanted in cache and then do a
linear interpolation between that and the costs we currently come up
with? Something like pg_class.relpages / sum(pg_class.relpages) would
give an idea of how much of a relation is likely to be cached, which
could be used for the linear interpolation.
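
A sketch of that fraction in SQL (pure catalog arithmetic, no cache
inspection; treat it only as a rough prior):

    SELECT relname,
           relpages,
           relpages::float8 / (SELECT sum(relpages) FROM pg_class) AS share
    FROM pg_class
    WHERE relkind IN ('r', 'i')
    ORDER BY relpages DESC
    LIMIT 10;

The interpolation itself would then be something like
cost = share * cost_all_cached + (1 - share) * cost_current.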

Of course having *any* idea as to how much of a relation was actually in
shared_buffers (or better yet, the OS cache) would be a lot more
accurate, but this simple method might be a good enough first pass.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
> An easy first approach would be to add a user tunable cache probability
> value to each index (and possibly table) between 0 and 1. Then simply
> multiply random_page_cost with (1-that value) for each scan.

That's not the way you'd need to use it.  But on reflection I do think
there's some merit in a cache probability parameter, ranging from zero
(giving current planner behavior) to one (causing the planner to assume
everything is already in cache from prior queries).  We'd have to look
at exactly how such an assumption should affect the cost equations ...
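
(One plausible reading of that -- an assumption on my part, not something
spelled out above: with cache probability p, each scan's cost would be
blended as cost(p) = (1 - p) * cost_cold + p * cost_all_cached, so that
p = 0 reproduces today's estimates and p = 1 prices every page at its
already-in-buffer rate.)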

regards, tom lane



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Mark Kirkwood

Jim C. Nasby wrote:
> On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
>> Jim C. Nasby [EMAIL PROTECTED] writes:
>>> Actually, if you run with stats_block_level turned on you have a
>>> first-order approximation of what is and isn't cached.
>>
>> Only if those stats decayed (pretty fast) with time; which they don't.
>
> Good point. :/ I'm guessing there's no easy way to see how many blocks
> for a given relation are in shared memory, either...


contrib/pg_buffercache will tell you this - what buffers from what
relation are in shared_buffers (if you want to interrogate the os file
buffer cache, that's a different story - tho I've been toying with doing
a utility for FreeBSD that would do this).
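
For reference, the usual per-relation tally with that module looks like this
(close to the example in the contrib README; buffers belonging to other
databases are excluded by the pg_database join):

    SELECT c.relname, count(*) AS buffers
    FROM pg_class c
         JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
         JOIN pg_database d ON b.reldatabase = d.oid
                           AND d.datname = current_database()
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;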


Cheers

Mark



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Jim C. Nasby
On Wed, Apr 19, 2006 at 04:47:40PM +1200, Mark Kirkwood wrote:
> Jim C. Nasby wrote:
>> On Tue, Apr 18, 2006 at 06:22:26PM -0400, Tom Lane wrote:
>>> Jim C. Nasby [EMAIL PROTECTED] writes:
>>>> Actually, if you run with stats_block_level turned on you have a
>>>> first-order approximation of what is and isn't cached.
>>> Only if those stats decayed (pretty fast) with time; which they don't.
>>
>> Good point. :/ I'm guessing there's no easy way to see how many blocks
>> for a given relation are in shared memory, either...
>
> contrib/pg_buffercache will tell you this - what buffers from what
> relation are in shared_buffers (if you want to interrogate the os file

So theoretically with that code we could make the cost estimator
functions more intelligent about actual query costs. Now, how you'd
actually see how those estimates improved...

> buffer cache, that's a different story - tho I've been toying with doing
> a utility for FreeBSD that would do this).

Well, the problem is that I doubt anything that OS-specific would be
accepted into core. What we really need is some method that's
OS-agnostic...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
> Jim C. Nasby wrote:
>> Good point. :/ I'm guessing there's no easy way to see how many blocks
>> for a given relation are in shared memory, either...
>
> contrib/pg_buffercache will tell you this -

I think the key word in Jim's comment was easy, ie, cheap.  Grovelling
through many thousands of buffers to count the matches to a given
relation doesn't sound appetizing, especially not if it gets done over
again several times during each query-planning cycle.  Trying to keep
centralized counts somewhere would be even worse (because of locking/
contention issues).

regards, tom lane



Re: [PERFORM] merge>hash>loop

2006-04-18 Thread Mark Kirkwood

Tom Lane wrote:

> Mark Kirkwood [EMAIL PROTECTED] writes:
>> Jim C. Nasby wrote:
>>> Good point. :/ I'm guessing there's no easy way to see how many blocks
>>> for a given relation are in shared memory, either...
>
>> contrib/pg_buffercache will tell you this -
>
> I think the key word in Jim's comment was easy, ie, cheap.  Grovelling
> through many thousands of buffers to count the matches to a given
> relation doesn't sound appetizing, especially not if it gets done over
> again several times during each query-planning cycle.  Trying to keep
> centralized counts somewhere would be even worse (because of locking/
> contention issues).



Yeah - not sensible for a transaction-oriented system - might be ok for
DSS tho.


Cheers

mark



[PERFORM] merge>hash>loop

2006-04-14 Thread Ian Westmacott
I have this query, where PG (8.1.2) prefers Merge Join over Hash Join
over Nested Loop.  However, actual performance increases in exactly that
order (Nested Loop is fastest).  I was hoping someone might be able to
shed some light on why PG chooses the plans in this order, and what I
might do to influence it otherwise.  Thanks,

itvtrackdata=> explain analyze SELECT mva,blob.* FROM
blobs33c3_c16010 AS blob NATURAL JOIN
objects33c3 WHERE
AREA(BOX(POINT(bbox_x1,bbox_y0),POINT(bbox_x0,bbox_y1))#BOX('(50,10),(10,50)'))/AREA(BOX(POINT(bbox_x1,bbox_y0),POINT(bbox_x0,bbox_y1)))>0
 AND time>=1263627787-32768 AND time<1273458187 AND finish-start>=8738 ORDER BY
time ASC;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Sort  (cost=47170.44..47184.46 rows=5607 width=28) (actual
time=2661.980..2663.642 rows=1687 loops=1)
   Sort Key: blob.time
   ->  Merge Join  (cost=46547.88..46821.32 rows=5607 width=28) (actual
time=2645.685..2657.621 rows=1687 loops=1)
         Merge Cond: ("outer".sva = "inner".sva)
         ->  Sort  (cost=18003.31..18045.36 rows=16821 width=20) (actual
time=181.303..183.092 rows=1741 loops=1)
               Sort Key: blob.sva
               ->  Bitmap Heap Scan on blobs33c3_c16010 blob
(cost=535.77..16822.65 rows=16821 width=20) (actual time=10.827..177.671
rows=1741 loops=1)
                     Recheck Cond: ((time >= 1263595019) AND (time <
1273458187))
                     Filter: ((area((box(point((bbox_x1)::double
precision, (bbox_y0)::double precision), point((bbox_x0)::double
precision, (bbox_y1)::double precision)) # '(50,50),(10,10)'::box)) /
area(box(point((bbox_x1)::double precision, (bbox_y0)::double
precision), point((bbox_x0)::double precision, (bbox_y1)::double
precision)))) > 0::double precision)
                     ->  Bitmap Index Scan on
idx_blobs33c3_c16010_time  (cost=0.00..535.77 rows=50462
width=0) (actual time=8.565..8.565 rows=50673 loops=1)
                           Index Cond: ((time >= 1263595019) AND
(time < 1273458187))
         ->  Sort  (cost=28544.56..28950.88 rows=162526 width=16)
(actual time=2387.726..2437.429 rows=29969 loops=1)
               Sort Key: objects33c3.sva
               ->  Seq Scan on objects33c3
(cost=0.00..14477.68 rows=162526 width=16) (actual time=0.085..826.125
rows=207755 loops=1)
                     Filter: ((finish - start) >= 8738)
 Total runtime: 2675.037 ms
(16 rows)

itvtrackdata=> set enable_mergejoin to false;
SET
itvtrackdata=> explain analyze SELECT mva,blob.* FROM
blobs33c3_c16010 AS blob NATURAL JOIN
objects33c3 WHERE
AREA(BOX(POINT(bbox_x1,bbox_y0),POINT(bbox_x0,bbox_y1))#BOX('(50,10),(10,50)'))/AREA(BOX(POINT(bbox_x1,bbox_y0),POINT(bbox_x0,bbox_y1)))>0
 AND time>=1263627787-32768 AND time<1273458187 AND finish-start>=8738 ORDER BY
time ASC;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Sort  (cost=65783.09..65797.10 rows=5607 width=28) (actual
time=1211.228..1212.671 rows=1687 loops=1)
   Sort Key: blob.time
   ->  Hash Join  (cost=15419.77..65433.97 rows=5607 width=28) (actual
time=1067.514..1207.727 rows=1687 loops=1)
         Hash Cond: ("outer".sva = "inner".sva)
         ->  Bitmap Heap Scan on blobs33c3_c16010 blob
(cost=535.77..16822.65 rows=16821 width=20) (actual time=14.720..149.179
rows=1741 loops=1)
               Recheck Cond: ((time >= 1263595019) AND (time <
1273458187))
               Filter: ((area((box(point((bbox_x1)::double precision,
(bbox_y0)::double precision), point((bbox_x0)::double precision,
(bbox_y1)::double precision)) # '(50,50),(10,10)'::box)) /
area(box(point((bbox_x1)::double precision, (bbox_y0)::double
precision), point((bbox_x0)::double precision, (bbox_y1)::double
precision)))) > 0::double precision)
               ->  Bitmap Index Scan on
idx_blobs33c3_c16010_time  (cost=0.00..535.77 rows=50462
width=0) (actual time=12.880..12.880 rows=50673 loops=1)
                     Index Cond: ((time >= 1263595019) AND (time <
1273458187))
 

Re: [PERFORM] merge>hash>loop

2006-04-14 Thread Tom Lane
Ian Westmacott [EMAIL PROTECTED] writes:
> I have this query, where PG (8.1.2) prefers Merge Join over Hash Join
> over Nested Loop.  However, actual performance increases in exactly that
> order (Nested Loop is fastest).  I was hoping someone might be able to
> shed some light on why PG chooses the plans in this order, and what I
> might do to influence it otherwise.  Thanks,

Reducing random_page_cost would influence it to prefer the nestloop.
However, I doubt you're ever going to get really ideal results for
this query --- the estimated row counts are too far off, and the
WHERE conditions are sufficiently bizarre that there's not much hope
that they'd ever be much better.
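
For anyone following along, the knob is per-session, so experimenting is
cheap (the 2.0 below is arbitrary; the default is 4.0):

    SET random_page_cost = 2.0;
    -- re-run the EXPLAIN ANALYZE and see whether the nestloop now wins
    RESET random_page_cost;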

regards, tom lane



Re: [PERFORM] merge>hash>loop

2006-04-14 Thread Ian Westmacott
On Fri, 2006-04-14 at 12:13 -0400, Tom Lane wrote:
> Ian Westmacott [EMAIL PROTECTED] writes:
>> I have this query, where PG (8.1.2) prefers Merge Join over Hash Join
>> over Nested Loop.  However, actual performance increases in exactly that
>> order (Nested Loop is fastest).  I was hoping someone might be able to
>> shed some light on why PG chooses the plans in this order, and what I
>> might do to influence it otherwise.  Thanks,
>
> Reducing random_page_cost would influence it to prefer the nestloop.
> However, I doubt you're ever going to get really ideal results for
> this query --- the estimated row counts are too far off, and the
> WHERE conditions are sufficiently bizarre that there's not much hope
> that they'd ever be much better.

That's what I feared, thanks.  But then if I simplify things a bit,
such that the row counts are quite good, and PG still chooses a
worse plan, can I conclude anything about my configuration settings
(like random_page_cost)?

itvtrackdata=> explain analyze SELECT mva,blob.* FROM
blobs33c3_c16010 AS blob NATURAL JOIN
objects33c3 WHERE time>=1263627787 AND time<1273458187 ORDER
BY time ASC;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Sort  (cost=75426.47..75552.21 rows=50295 width=28) (actual
time=2083.624..2146.654 rows=50472 loops=1)
   Sort Key: blob.time
   ->  Hash Join  (cost=16411.51..70749.84 rows=50295 width=28) (actual
time=1790.746..1910.204 rows=50472 loops=1)
         Hash Cond: ("outer".sva = "inner".sva)
         ->  Bitmap Heap Scan on blobs33c3_c16010 blob
(cost=533.77..14421.19 rows=50295 width=20) (actual time=12.533..78.284
rows=50472 loops=1)
               Recheck Cond: ((time >= 1263627787) AND (time <
1273458187))
               ->  Bitmap Index Scan on
idx_blobs33c3_c16010_time  (cost=0.00..533.77 rows=50295
width=0) (actual time=12.447..12.447 rows=50472 loops=1)
                     Index Cond: ((time >= 1263627787) AND (time <
1273458187))
         ->  Hash  (cost=12039.79..12039.79 rows=487579 width=16)
(actual time=1618.128..1618.128 rows=487579 loops=1)
               ->  Seq Scan on objects33c3
(cost=0.00..12039.79 rows=487579 width=16) (actual time=0.019..833.931
rows=487579 loops=1)
 Total runtime: 2194.492 ms
(11 rows)

itvtrackdata=> set enable_hashjoin to false;
SET
itvtrackdata=> explain analyze SELECT mva,blob.* FROM
blobs33c3_c16010 AS blob NATURAL JOIN
objects33c3 WHERE time>=1263627787 AND time<1273458187 ORDER
BY time ASC;

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Sort  (cost=321096.91..321222.64 rows=50295 width=28) (actual
time=929.893..992.316 rows=50472 loops=1)
   Sort Key: blob.time
   ->  Nested Loop  (cost=533.77..316420.28 rows=50295 width=28) (actual
time=16.780..719.140 rows=50472 loops=1)
         ->  Bitmap Heap Scan on blobs33c3_c16010 blob
(cost=533.77..14421.19 rows=50295 width=20) (actual time=16.693..84.299
rows=50472 loops=1)
               Recheck Cond: ((time >= 1263627787) AND (time <
1273458187))
               ->  Bitmap Index Scan on
idx_blobs33c3_c16010_time  (cost=0.00..533.77 rows=50295
width=0) (actual time=16.546..16.546 rows=50472 loops=1)
                     Index Cond: ((time >= 1263627787) AND (time <
1273458187))
         ->  Index Scan using idx_objects33c3_sva on
objects33c3  (cost=0.00..5.99 rows=1 width=16) (actual
time=0.006..0.008 rows=1 loops=50472)
               Index Cond: ("outer".sva = objects33c3.sva)
 Total runtime: 1039.725 ms
(10 rows)
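
(Note the mismatch these two runs quantify: the planner prices the nestloop
plan at 321097 vs 75426 for the hash join, about 4.3x more expensive on
paper, yet it actually runs in 1040 ms vs 2194 ms, about 2.1x faster -- a
combined miscalibration of roughly 9x, which is what Tom's reply below
attributes to the cold-start assumption for repeated inner indexscans.)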

-- 
Ian Westmacott [EMAIL PROTECTED]
IntelliVid Corp.




Re: [PERFORM] merge>hash>loop

2006-04-14 Thread Tom Lane
Ian Westmacott [EMAIL PROTECTED] writes:
> That's what I feared, thanks.  But then if I simplify things a bit,
> such that the row counts are quite good, and PG still chooses a
> worse plan, can I conclude anything about my configuration settings
> (like random_page_cost)?

Well, the other thing that's going on here is that we know we are
overestimating the cost of nestloop-with-inner-indexscan plans.
The current estimation for that is basically outer scan cost plus N
times inner scan cost where N is the estimated number of outer tuples;
in other words the repeated indexscan probes are each assumed to happen
from a cold start.  In reality, caching of the upper levels of the index
means that the later index probes are much cheaper than this model
thinks.  We've known about this for some time but no one's yet proposed
a more reasonable cost model.
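
Roughly sketched (a paraphrase of the model as described above, not the
exact code):

    nestloop_cost ~= outer_scan_cost
                   + outer_rows * inner_indexscan_cost  -- every probe priced cold
                   + per-tuple CPU costs

Plugging in Ian's plan: 50295 estimated outer rows * 5.99 per probe is about
301000 units, plus the 14421-unit outer scan, which lands within a fraction
of a percent of the 316420.28 the planner printed.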

In my mind this is tied into another issue, which is that the planner
always costs on the basis of each query starting from zero.  In a real
environment it's much cheaper to use heavily-used indexes than this cost
model suggests, because they'll already be swapped in due to use by
previous queries.  But we haven't got any infrastructure to keep track
of what's been heavily used, let alone a cost model that could make use
of the info.

I think part of the reason that people commonly reduce random_page_cost
to values much lower than physical reality would suggest is that it
provides a crude way of partially compensating for this basic problem.

regards, tom lane
