Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. We have such a low default random_page_cost primarily to mask other problems in the optimizer, two of which are . multi-column index correlation . interpolation between min_IO_Cost and max_IO_cost which approximates max_IO_cost too fast. One other important figure here is concurrency. If you have a lot of backends concurrently doing other IO, your sequential IO numbers will suffer more than random IO numbers. Might be, that some super smart OS readahead implementation aleviates that problem, but I have not yet experienced one. Also less of random IO tends to get higher cache rates. Thus I think if you are alone, 4 tends to be too low, while with concurrent load 4 tends to be too high. (All assuming farly large tables, that don't fit into RAM) Andreas ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
But to get the estimated cost ratio to match up with the actual cost ratio, we'd have to raise random_page_cost to nearly 70, which is a bit hard to credit. What was the platform being tested here? Why ? Numbers for modern single disks are 1-2Mb/s 8k random and 50-120 Mb/s sequential. Andreas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
On Tue, 17 May 2005 22:12:17 -0700, Jeffrey W. Baker [EMAIL PROTECTED] wrote: Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. We have such a low default random_page_cost primarily to mask other problems in the optimizer, two of which are . multi-column index correlation . interpolation between min_IO_Cost and max_IO_cost which approximates max_IO_cost too fast. Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
Jeffrey W. Baker [EMAIL PROTECTED] writes: ... If bitmap scan is disabled, the planner will pick index scan even in cases when sequential scan is 10x faster: scratch=# set enable_bitmapscan to off; SET scratch=# explain analyze select count(1) from test where random = 1429076987 and random 1429076987 + 1000; QUERY PLAN Aggregate (cost=170142.03..170142.03 rows=1 width=0) (actual time=177419.182..177419.185 rows=1 loops=1) - Index Scan using test_rand_idx on test (cost=0.00..170034.11 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1) Index Cond: ((random = 1429076987) AND (random 1439076987)) Total runtime: 177419.302 ms (4 rows) scratch=# set enable_indexscan to off; SET scratch=# explain analyze select count(1) from test where random = 1429076987 and random 1429076987 + 1000; QUERY PLAN -- Aggregate (cost=204165.55..204165.55 rows=1 width=0) (actual time=12334.042..12334.045 rows=1 loops=1) - Seq Scan on test (cost=0.00..204057.62 rows=43167 width=0) (actual time=17.436..12174.150 rows=46764 loops=1) Filter: ((random = 1429076987) AND (random 1439076987)) Total runtime: 12334.156 ms (4 rows) Obviously in this case sequential scan was (would have been) a huge win. Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. But to get the estimated cost ratio to match up with the actual cost ratio, we'd have to raise random_page_cost to nearly 70, which is a bit hard to credit. What was the platform being tested here? regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Bitmap scan cost model (was Re: bitmap scans, btree scans, and tid order)
On Mon, 2005-05-16 at 14:35 -0400, Tom Lane wrote: Jeffrey W. Baker [EMAIL PROTECTED] writes: On Mon, 2005-05-16 at 09:53 -0400, Tom Lane wrote: This is a fallacy, and I think your concern is largely mistaken. Have you experimented with the cases you are worried about? Perhaps I have not stated the problem clearly. Believe me, I have experimented extensively with this problem. Sorry, perhaps I wasn't clear: have you experimented *with CVS tip*? The current code is certainly capable of choosing either seqscan, bitmap scan, or traditional index scan depending on the given query. For example, [...] I think the work that's most needed at the moment is to test the bitmap-scan cost model to see if it has much to do with reality... The cost model seems to work pretty darn well, as a matter of fact. This table is in-order by id, in-order by date, and randomly ordered by random. scratch=# \d test Table public.test Column | Type | Modifiers +-+--- id | integer | date | date| random | integer | Indexes: test_pk UNIQUE, btree (id) test_date_idx btree (date) test_rand_idx btree (random) scratch=# select count(1) from test; count -- 1000 The size of the tables and indexes is about 1G, or double physical memory. I tested by selecting on the random column. For 100 random values, I selected the row that matches exactly, then in ranges of 1000, 1, 10, and 100. These touch roughly 5, 50, 500, and 5000 tuples, respectively. The test is repeated for index scan, bitmap scan, and sequential scan. Example query: select count(1) from test where random = 1429076987 and random 1429076987 + 1; QUERY PLAN -- Aggregate (cost=171.31..171.31 rows=1 width=0) (actual time=0.996..1.000 rows=1 loops=1) - Bitmap Heap Scan on test (cost=2.26..171.20 rows=43 width=0) (actual time=0.145..0.829 rows=42 loops=1) Recheck Cond: ((random = 1429076987) AND (random 1429086987)) - Bitmap Index Scan on test_rand_idx (cost=0.00..2.26 rows=43 width=0) (actual time=0.063..0.063 rows=42 loops=1) Index Cond: ((random = 1429076987) AND (random 1429086987)) Total runtime: 1.114 ms 100 queries returning | 1 | 5 | 50 | 500 | 5000 | --+-+-+--+---++ bitmap scan | 1s | 2s | 13s | 1m41s | 11m16s | index scan| 1s | 1s | 12s | 2m6s | 24m19s | --+-+-+--+---++ sequential scan | 28m20s| The planner uses index scan for the first two columns, and chooses bitmap scan for the rest. This would seem to be a reasonable decision, given the results. The only real problem with the cost model in further testing is that it doesn't switch to seq scan quickly enough. If bitmap scan is disabled, the planner will pick index scan even in cases when sequential scan is 10x faster: scratch=# set enable_bitmapscan to off; SET scratch=# explain analyze select count(1) from test where random = 1429076987 and random 1429076987 + 1000; QUERY PLAN Aggregate (cost=170142.03..170142.03 rows=1 width=0) (actual time=177419.182..177419.185 rows=1 loops=1) - Index Scan using test_rand_idx on test (cost=0.00..170034.11 rows=43167 width=0) (actual time=0.035..177255.696 rows=46764 loops=1) Index Cond: ((random = 1429076987) AND (random 1439076987)) Total runtime: 177419.302 ms (4 rows) scratch=# set enable_indexscan to off; SET scratch=# explain analyze select count(1) from test where random = 1429076987 and random 1429076987 + 1000; QUERY PLAN -- Aggregate (cost=204165.55..204165.55 rows=1 width=0) (actual time=12334.042..12334.045 rows=1 loops=1) - Seq Scan on test (cost=0.00..204057.62 rows=43167 width=0) (actual time=17.436..12174.150 rows=46764 loops=1) Filter: ((random = 1429076987) AND (random 1439076987)) Total runtime: 12334.156 ms (4 rows) Obviously in this case sequential scan was (would have been) a huge win. Incrementing random_page_cost from 4 (the default) to 5 causes the planner to make a better decision. -jwb