> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat. A database schema chosen with even minimal attention
The specific data was chosen in presentation, because it comes from TPC-H
definition (data that everybody understands) and they were the only columns
where bitmap index will be more beneficial (i.e. the columns were low
cardinality ones).
> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Comparison of index creation time and index size for similar cardinalities
for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits
are seen in all the cases, index creation time, space taken by index as well
as querying:
Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:
Column | Type | Cardinality
--------+-------------------+-----------
c1 | integer |
i10 | integer | 10
i50 | integer | 50
n10 | numeric | 10
n50 | numeric | 50
ctext | character varying |
Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):
-------------------------------------------------------------------------
Column B-tree size Bitmap size B-tree time Bitmap time
-------------------------------------------------------------------------
i10 35056 2392 11.8 6.0
i50 35056 4328 10.8 6.4
n10 52264 2656 34.8 9.6
n50 52616 4272 34.3 11.7
-------------------------------------------------------------------------
Query timings (in seconds):
-------------------------------------------------------------------------
Query B-tree Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where 4.08 1.61
i10=5 and i50=2 and n10=0.1 and n50=0.05;
-------------------------------------------------------------------------
This test case fits in memory, the benefits seen will be more if we take
bigger test case.
I will like to re-iterate the benefits of bitmap index:
1. Size and time to create bitmap index is less than b-tree index for low
cardinality column of any data-type.
2. The gain in query performance can be attributed to Bitmap And¹ and
Bitmap Or¹ operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, however the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
Bitmap And¹ and Bitmap Or¹ operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.
Ayush
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org