On Thu, Jul 27, 2006 at 09:13:21AM -0700, Jie Zhang wrote: > On 7/26/06 11:50 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > > "Jie Zhang" <[EMAIL PROTECTED]> writes: > >> On 7/26/06 10:14 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > >>> ... A nonuniform distribution would probably mean that some > >>> of the bitmaps compress better-than-expected and others worse. I have > >>> no idea how to model that and guess what the overall result is ... > > > >> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng > >> Wu et al gave an approximate answer for this question. Assume that there > >> are > >> c distinct values. Let the i-th value has a probability of p_i, the number > >> of rows r, and the word size w. then the total size of the compressed > >> bitmap > >> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where > >> in > >> both \sum's, i is from 1 to c. > > > > Hm, but that's still begging the question no? It's still assuming that > > any one value is uniformly distributed. ISTM the cases that would break > > my simplistic calculation involve clustering of particular values, such > > that some areas of the bitmap are all-zero while other areas have lots > > of ones. > > Yes, you are right -- each value is still uniformly distributed. But this > will be the worst case in terms of the size of a bitmap vector. As for how > to model the size of a bitmap vector for an non-uniformly distributed value, > that's a good question. I don't really know. But we do know the best case > and the worse case.

If the usefulness of bitmap indexes is still in doubt, could someone at Greenplum provide data from actual data warehouses from actual customers? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster