Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-19 Thread Zeugswetter Andreas DAZ SD

 Incrementing random_page_cost from 4 (the default) to 5 causes the
 planner to make a better decision.
 
 We have such a low default random_page_cost primarily to mask other
 problems in the optimizer, two of which are
 
 . multi-column index correlation
 
 . interpolation between min_IO_Cost and max_IO_cost which approximates
 max_IO_cost too fast.

One other important figure here is concurrency. If you have a lot of backends
concurrently doing other IO, your sequential IO numbers will suffer more than
random IO numbers. Might be, that some super smart OS readahead implementation 
aleviates that problem, but I have not yet experienced one.
Also less of random IO tends to get higher cache rates.

Thus I think if you are alone, 4 tends to be too low, while with concurrent 
load 
4 tends to be too high. (All assuming farly large tables, that don't fit into 
RAM)

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-19 Thread Zeugswetter Andreas DAZ SD

 But to get the estimated cost ratio to match up with the actual cost
 ratio, we'd have to raise random_page_cost to nearly 70, which is a bit
 hard to credit.  What was the platform being tested here?

Why ? Numbers for modern single disks are 1-2Mb/s 8k random and 50-120 Mb/s 
sequential.

Andreas

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-18 Thread Manfred Koizar
On Tue, 17 May 2005 22:12:17 -0700, Jeffrey W. Baker [EMAIL PROTECTED]
wrote:
Incrementing random_page_cost from 4 (the default) to 5 causes the
planner to make a better decision.

We have such a low default random_page_cost primarily to mask other
problems in the optimizer, two of which are

. multi-column index correlation

. interpolation between min_IO_Cost and max_IO_cost which approximates
max_IO_cost too fast.

Servus
 Manfred

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-18 Thread Tom Lane
Jeffrey W. Baker [EMAIL PROTECTED] writes:
 ... If bitmap
 scan is disabled, the planner will pick index scan even in cases when
 sequential scan is 10x faster:

 scratch=# set enable_bitmapscan to off;
 SET
 scratch=# explain analyze select count(1) from test where random = 
 1429076987  and  random  1429076987 + 1000;
  QUERY PLAN   
 
  
 
  Aggregate  (cost=170142.03..170142.03 rows=1 width=0) (actual 
 time=177419.182..177419.185 rows=1 loops=1)
-  Index Scan using test_rand_idx on test  (cost=0.00..170034.11 
 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1)
  Index Cond: ((random = 1429076987) AND (random  1439076987))
  Total runtime: 177419.302 ms
 (4 rows)

 scratch=# set enable_indexscan to off;
 SET
 scratch=# explain analyze select count(1) from test where random = 
 1429076987  and  random  1429076987 + 1000;
   QUERY PLAN  
 
 --
  Aggregate  (cost=204165.55..204165.55 rows=1 width=0) (actual 
 time=12334.042..12334.045 rows=1 loops=1)
-  Seq Scan on test  (cost=0.00..204057.62 rows=43167 width=0) (actual 
 time=17.436..12174.150 rows=46764 loops=1)
  Filter: ((random = 1429076987) AND (random  1439076987))
  Total runtime: 12334.156 ms
 (4 rows)

 Obviously in this case sequential scan was (would have been) a huge win.
 Incrementing random_page_cost from 4 (the default) to 5 causes the
 planner to make a better decision.

But to get the estimated cost ratio to match up with the actual cost
ratio, we'd have to raise random_page_cost to nearly 70, which is a bit
hard to credit.  What was the platform being tested here?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)

2005-05-17 Thread Jeffrey W. Baker
On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote:
 Jeffrey W. Baker [EMAIL PROTECTED] writes:
  On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote:
  This is a fallacy, and I think your concern is largely mistaken.  Have
  you experimented with the cases you are worried about?
 
  Perhaps I have not stated the problem clearly.  Believe me, I have
  experimented extensively with this problem.
 
 Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*?
 The current code is certainly capable of choosing either seqscan,
 bitmap scan, or traditional index scan depending on the given query.
 For example,

[...]

 I think the work that's most needed at the moment is to test the
 bitmap-scan cost model to see if it has much to do with reality...

The cost model seems to work pretty darn well, as a matter of fact.  

This table is in-order by id, in-order by date, and randomly ordered by
random.

scratch=# \d test
 Table public.test
 Column |  Type   | Modifiers 
+-+---
 id | integer | 
 date   | date| 
 random | integer | 
Indexes:
test_pk UNIQUE, btree (id)
test_date_idx btree (date)
test_rand_idx btree (random)

scratch=# select count(1) from test;
  count   
--
 1000

The size of the tables and indexes is about 1G, or double physical
memory.  I tested by selecting on the random column.  For 100 random
values, I selected the row that matches exactly, then in ranges of 1000,
1, 10, and 100.  These touch roughly 5, 50, 500, and 5000
tuples, respectively.  The test is repeated for index scan, bitmap scan,
and sequential scan.

Example query:

select count(1) 
  from test  
 where random = 1429076987 
   and random  1429076987 + 1;

  QUERY PLAN
  
--
 Aggregate  (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 
rows=1 loops=1)
   -  Bitmap Heap Scan on test  (cost=2.26..171.20 rows=43 width=0) (actual 
time=0.145..0.829 rows=42 loops=1)
 Recheck Cond: ((random = 1429076987) AND (random  1429086987))
 -  Bitmap Index Scan on test_rand_idx  (cost=0.00..2.26 rows=43 
width=0) (actual time=0.063..0.063 rows=42 loops=1)
   Index Cond: ((random = 1429076987) AND (random  1429086987))
 Total runtime: 1.114 ms

100 queries returning |  1  |  5  |  50  |  500  |  5000  |
--+-+-+--+---++
bitmap scan   |  1s |  2s |  13s | 1m41s | 11m16s |
index scan|  1s |  1s |  12s | 2m6s  | 24m19s |
--+-+-+--+---++
sequential scan   | 28m20s| 

The planner uses index scan for the first two columns, and chooses
bitmap scan for the rest.  This would seem to be a reasonable decision,
given the results.  The only real problem with the cost model in further
testing is that it doesn't switch to seq scan quickly enough.  If bitmap
scan is disabled, the planner will pick index scan even in cases when
sequential scan is 10x faster:

scratch=# set enable_bitmapscan to off;
SET
scratch=# explain analyze select count(1) from test where random = 1429076987  
and  random  1429076987 + 1000;
 QUERY PLAN 
  
  

 Aggregate  (cost=170142.03..170142.03 rows=1 width=0) (actual 
time=177419.182..177419.185 rows=1 loops=1)
   -  Index Scan using test_rand_idx on test  (cost=0.00..170034.11 rows=43167 
width=0) (actual time=0.035..177255.696 rows=46764 loops=1)
 Index Cond: ((random = 1429076987) AND (random  1439076987))
 Total runtime: 177419.302 ms
(4 rows)

scratch=# set enable_indexscan to off;
SET
scratch=# explain analyze select count(1) from test where random = 1429076987  
and  random  1429076987 + 1000;
  QUERY PLAN
  
--
 Aggregate  (cost=204165.55..204165.55 rows=1 width=0) (actual 
time=12334.042..12334.045 rows=1 loops=1)
   -  Seq Scan on test  (cost=0.00..204057.62 rows=43167 width=0) (actual 
time=17.436..12174.150 rows=46764 loops=1)
 Filter: ((random = 1429076987) AND (random  1439076987))
 Total runtime: 12334.156 ms
(4 rows)

Obviously in this case sequential scan was (would have been) a huge win.
Incrementing random_page_cost from 4 (the default) to 5 causes the
planner to make a better decision.

-jwb