> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat.  A database schema chosen with even minimal attention
The specific data was chosen in presentation, because it comes from TPC-H
definition (data that everybody understands) and they were the only columns
where bitmap index will be more beneficial (i.e. the columns were low
cardinality ones). 

> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Comparison of index creation time and index size for similar cardinalities
for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits
are seen in all the cases, index creation time, space taken by index as well
as querying:
 
Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:
 Column |       Type        | Cardinality
--------+-------------------+-----------
 c1     | integer           |
 i10    | integer           | 10
 i50    | integer           | 50
 n10    | numeric           | 10
 n50    | numeric           | 50
 ctext  | character varying |


Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):

-------------------------------------------------------------------------
Column  B-tree size     Bitmap size     B-tree time     Bitmap time
-------------------------------------------------------------------------
i10     35056           2392            11.8            6.0
i50     35056           4328            10.8            6.4
n10     52264           2656            34.8            9.6
n50     52616           4272            34.3            11.7
-------------------------------------------------------------------------

Query timings (in seconds):
-------------------------------------------------------------------------
Query                                           B-tree          Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where           4.08            1.61
i10=5 and i50=2 and  n10=0.1 and n50=0.05;
-------------------------------------------------------------------------

This test case fits in memory, the benefits seen will be more if we take
bigger test case.
 
I will like to re-iterate the benefits of bitmap index:
1. Size and time to create bitmap index is less than b-tree index for low
cardinality column of any data-type.
2. The gain in query performance can be attributed to ŒBitmap And¹ and
ŒBitmap Or¹ operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, however the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
ŒBitmap And¹ and ŒBitmap Or¹ operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.

Ayush



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to