Josh Berkus <josh@agliodbs.com> writes:
> One particular compelling situation for on-disk bitmaps is for terabyte 
> tables where a btree index would not fit into memory.   Index 
> performance for an index which is 10x or more the size of RAM really 
> sucks ... I can come up with some test results if you doubt that.

Sure...

> Also note that "low cardinality" is relative.  For a 1 billion row 
> table, a column with 10,000 values is "low-cardinality", having around 
> 100,000 rows per value ... but that's still 0.01% of the table per 
> value, making index use still applicable.

And your point is?  Assuming a reasonable datatype like int4, a btree
index on this table would occupy say 20GB (16 bytes per entry plus
fillfactor overhead).  The bitmap version would require 10,000 bitmaps
of 1G bits apiece, or 1250GB (not even counting overhead).  You need
some wildly optimistic assumptions about the compressibility of the
bitmaps to get even within hailing distance of the btree, let alone
fit in RAM.  A realistic assumption for the numbers you mention is
that the bitmaps have 1-bits about 100 bits apart, which means you
could get maybe 3-to-1 compression using the runlength scheme that's
in there ... leaving the bitmap a factor of 20 bigger than the btree.

AFAICS "low cardinality" has to mean just that, a few dozen distinct
values at most, for this scheme to have any hope.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Reply via email to