On Thu, Jul 27, 2006 at 09:13:21AM -0700, Jie Zhang wrote:
> On 7/26/06 11:50 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:
> > "Jie Zhang" <[EMAIL PROTECTED]> writes:
> >> On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:
> >>> ... A nonuniform distribution would probably mean that some
> >>> of the bitmaps compress better-than-expected and others worse.  I have
> >>> no idea how to model that and guess what the overall result is ...
> > 
> >> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
> >> Wu et al gave an approximate answer for this question. Assume that there 
> >> are
> >> c distinct values. Let the i-th value has a probability of p_i, the number
> >> of rows r, and the word size w. then the total size of the compressed 
> >> bitmap
> >> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where 
> >> in
> >> both \sum's, i is from 1 to c.
> > 
> > Hm, but that's still begging the question no?  It's still assuming that
> > any one value is uniformly distributed.  ISTM the cases that would break
> > my simplistic calculation involve clustering of particular values, such
> > that some areas of the bitmap are all-zero while other areas have lots
> > of ones.
> Yes, you are right -- each value is still uniformly distributed. But this
> will be the worst case in terms of the size of a bitmap vector. As for how
> to model the size of a bitmap vector for an non-uniformly distributed value,
> that's a good question. I don't really know. But we do know the best case
> and the worse case.

If the usefulness of bitmap indexes is still in doubt, could someone at
Greenplum provide data from actual data warehouses from actual
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to