Re: [HACKERS] Bitmap index thoughts

2006-12-28 Thread Jie Zhang

On 12/27/06 3:10 AM, Gavin Sherry [EMAIL PROTECTED] wrote:

 On Wed, 27 Dec 2006, Heikki Linnakangas wrote:
 
 Gavin Sherry wrote:
 On Tue, 26 Dec 2006, Heikki Linnakangas wrote:
 for typical bitmap index use cases and most of the needed pages should
 stay in memory, but could we simplify this? Why do we need the auxiliary
 heap, couldn't we just store the blk+offset of the LOV item directly in
 the b-tree index item?
 
 The problem is, the b-tree code is very much tied to the heap. I don't
 want to modify the b-tree code to make bitmap indexes work (better).
 What's really tempting is to just manage our own balanced tree within the
 bitmap index file(s) itself. It would start from the metapage and simply
 spill to other 'special' index pages when necessary. The problem is, we do
 not have b-tree code generic enough that it would allow us to do this
 trivially -- consider concurrency and WAL in particular, which we
 currently get for free. I guess this is why I've been ignoring this issue
 :-).
 
 Maybe we could reuse the code in ginbtree.c. Looks like Teodor  Oleg
 had the same problem :).
 
 Modifying the nbtree code doesn't seem that difficult either. AFAICS,
 the only places where the heap is from within nbtree code is in index
 building and uniqueness checks.
 
 I'll take a look and think about it.
 

I will take a look at it as well. I would also like to add eventually the
values for bm_last_tid_location in all bitmap page to the lov heap.
Combining this with the attribute values, we can easily jump into a specific
bitmap page. This will be useful during vacuuming and insertion after
vacuuming -- we can jump into a page to modify a bit. I am not sure if this
will affect the use of the btree code without the heap. But if we can get
rid of the lov heap, that would be great.

 
 And instead of having separate LOV pages that store a number of LOV
 items, how about storing each LOV item on a page of it's own, and using
 the rest of the page to store the last chunk of the bitmap. That would
 eliminate one page access, but more importantly, maybe we could then get
 rid of all the bm_last_* attributes in BMLOVItemData that complicate the
 patch quite a bit, while preserving the performance.
 
 That's an interesting approach. We would still need a concept of
 last_word, at the very least, and probably last_comp_word for convenience.
 
 Why?
 
 The two last words of the bitmap vector, stored in the LOV item, provide a
 few things: they buffer the addition of new matches in the vector and they
 ease the calculation of the current position of the end of the vector.
 Jie, do you want to add more?
 
 I think we could do this differently by calculating everything based on
 the other data stored in the lov item and data page (last_tid_location and
 num_hrl_words_used, from memory). But, I'm not sure. Jie?

Sure. When appending a new bit into an existing bitmap, the last two words
are needed to compute new HRL words, and they are the only words affected by
this new appending bit. So we need a way to know these two words easily.
Like Gavin said, separating them from other words is for convenience.

Also, we won't be able to get rid of bm_last_setbit. We need bm_last_setbit
to tell us how many zeros are there between two bit 1s. There is no other
way to know this.

Thanks,
Jie



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bitmap index thoughts

2006-12-28 Thread Jie Zhang

On 12/27/06 3:16 AM, Gavin Sherry [EMAIL PROTECTED] wrote:

 On Wed, 27 Dec 2006, Heikki Linnakangas wrote:
 
 Jie Zhang wrote:
 The bitmap data segment sounds good in terms of space. The problem is that
 one bitmap is likely to occupy more pages than before, which may hurt the
 query performance.
 
 We could have segments of say 1/5 of page. When a bitmap grows larger
 than that, the bitmap would be moved to a page of its own. That way we
 wouldn't get unnecessary fragmentation with large bitmaps, but small
 bitmaps would be stored efficiently.
 
 Yes.

I see. This sounds good. I will think more about this.

Thanks,
Jie



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bitmap index thoughts

2006-12-27 Thread Jie Zhang

 And instead of having separate LOV pages that store a number of LOV
 items, how about storing each LOV item on a page of it's own, and using
 the rest of the page to store the last chunk of the bitmap. That would
 eliminate one page access, but more importantly, maybe we could then get
 rid of all the bm_last_* attributes in BMLOVItemData that complicate the
 patch quite a bit, while preserving the performance.
 
 That's an interesting approach. We would still need a concept of
 last_word, at the very least, and probably last_comp_word for convenience.
 Your idea doesn't require any extra space, either, which is good.
 Something I've been working through is the idea of a 'bitmap data
 segment'. Currently, we store the HRL compressed bitmap data to the extent
 of the page. That is, sizeof(BMBitmap) is as close to BLCKSZ as we can
 make it. The problem is that we may have some bitmaps where a few values
 occur only a small number of times and are well clustered at the beginning
 of the heap. In that circumstance, we use a whole page to store a small
 number of words and the free space cannot be used by any other vector.
 Now, say a segment was some fraction the size of BLCKSZ, we use less space
 for those vectors with few tuples in the heap. Just an idea...

The bitmap data segment sounds good in terms of space. The problem is that
one bitmap is likely to occupy more pages than before, which may hurt the
query performance. I have been thinking along the lines of increasing the
number of last bitmap words stored in each LOV item, but not to occupy one
page. This may prevent some cases Gavin indicated here, but not all.

Thanks,
Jie



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bitmap index status

2006-10-18 Thread Jie Zhang

On 10/18/06 2:41 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Hi,
 
 I don't want to harass you :), but what's the status with the bitmap
 index code? Is there something I can do to help?

Not at all. We will send you the new patch soon. Your comments are most
appreciated.

Thanks,
Jie



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bitmap index status

2006-09-25 Thread Jie Zhang
Hi Mark,

Thanks for doing the test. I checked out the link you provided below. I am a
little confused about the goal of these tests. Do you plan to test the
overall performance of postgreSQL on handling TPC-H queries?

Thanks,
Jie

On 9/22/06 3:45 PM, Mark Wong [EMAIL PROTECTED] wrote:

 Jie Zhang wrote:
 Hi Heikki and all,
 
 I just sent the latest bitmap index patch to the list. I am not sure if
 there is any size limit for this mailing list. If you have received my
 previous email, please let me know.
 
 Hi Jie,
 
 I know I said I was going to get testing on this months ago but I've
 been juggling between 3 systems due to disk failures and other hardware
 configuration issues.  Anyways, I've take a baseline run of only the
 power test using a 1GB database with the patch 09-17 patch against a
 snapshot of pgsql from 2006-09-17:
 
 http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/
 
 Do you think the 1GB scale factor will be sufficient for testing as it
 will certainly be faster?  Do you think testing with just a power test
 will be sufficient for now?  I really don't have a good reason why I
 didn't run a throughput test other than to save time. :)  I also wanted
 to get your opinion again on which indexes we will want to try first.
 
 Thanks,
 Mark
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bitmap index status

2006-09-23 Thread Jie Zhang
Gavin  Heikki,

 
 The handling of stream and hash bitmaps looks pretty complicated to me.
 All the bitmap-related nodes have logic to handle both types slightly
 differently. It all seems to come down to that if a subnode (or
 amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
 caller needs to call the subnode many times, until it returns a NULL.
 With a HashBitmap, the caller only calls the subnode once.
 
 I think amgetbitmap should be called just once per index scan, and it
 should return either a hash bitmap or a stream bitmap. The same applies
 to all the executor nodes that return bitmaps, they would only return a
 single HashBitmap or StreamBitmap, and the upper node would call
 tbm_iterate repeatedly on that.
 
 StreamBitmap would contain a callback (filled by the indexam) that
 tbm_iterate would call to fill the next TBMIterateResult.
 
 Right, this was the approach taken by an earlier version of the patch I
 had worked on. It was significantly uglified by the need to keep the index
 state around and to be careful about what amrescan might do behind out
 back. I will, however, introduce the idea again because it makes the code
 much cleaner and more logical, as you seem to suggest.
 

I have been thinking about this approach some more. I do agree that this is
more attractive now. The following includes some more detailed design.
Please let me know if you have any comments. (My apologies to Gavin. You
talked to me about this approach before. But you introduced some on-disk
bitmap specific code into the tidbitmap.c, which prevented me from looking
more in this direction.)

Essentially, we want to have a stream bitmap object that has an iterator,
which will be able to iterate through the bitmaps. The BitmapIndexscan,
BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate through
the bitmaps.

The StreamBitmap structure will look like below.

struct StreamBitmap {
NodeTag   type;   /* to make it a valid Node */
PagetableEntryentry;  /* a page of tids in this stream bitmap */

/* the iterator function */
void(*next)(StreamBitmap*);
Node*   state;/* store how this stream bitmap generated,
 and all necessary information to
 obtain the next stream bitmap. */
};

Two new state objects will look like below. At the same time, we introduce
three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
And T_StreamBitmapIndex, to define different states.

/*
 * Stores the necessary information for iterating through the stream bitmaps
 * generated by nodeBitmapAnd or nodeBitmapOr.
 */
struct StreamBitmapOp {
NodeTag type;  /* handles T_StreamBitmapAND and T_StreamBitmapOR */
List*   bitmaps;
};

/*
 * Stores some necessary information for iterating through the stream
 * bitmaps generated by nodeBitmapIndexscan.
 */
struct StreamBitmapIndex {
NodeTag type; /* handle T_StreamBitmapIndex */
IndexScanDescscan;
BlockNumbernextBlockNo;/* next block no to be read */
};

Then we will have the iterator functions like the following:

void StreamBitmapAndNext(StreamBitmap* node) {
  tbm_intersect_stream(((StreampBitmapOp*) node-state)-bitmaps, node);
}

void StreamBitmapOrNext(StreamBitmap* node) {
  tbm_union_stream(((StreampBitmapOp*) node-state)-bitmaps, node);
}

void StreamBitmapIndexNext(StreamBitmap* node) {
  StreamBitmapIndex* sbi = (StreamBitmapIndex*) node-state;
  amgetbitmap(sbi-scan, NULL, sbi-nextBlockNo);
}

What do you think?

Thanks,
Jie



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bitmap index status

2006-09-20 Thread Jie Zhang

On 9/19/06 5:15 AM, Gavin Sherry [EMAIL PROTECTED] wrote:

 On Tue, 19 Sep 2006, Heikki Linnakangas wrote:
 
 Jie Zhang wrote:
 Hi Heikki and all,
 
 Please find the latest bitmap index patch in the attachment. This patch is
 generated against the postgresql cvs head.
 
 
 Thanks.
 
 The handling of stream and hash bitmaps looks pretty complicated to me.
 All the bitmap-related nodes have logic to handle both types slightly
 differently. It all seems to come down to that if a subnode (or
 amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the
 caller needs to call the subnode many times, until it returns a NULL.
 With a HashBitmap, the caller only calls the subnode once.
 
 I think amgetbitmap should be called just once per index scan, and it
 should return either a hash bitmap or a stream bitmap. The same applies
 to all the executor nodes that return bitmaps, they would only return a
 single HashBitmap or StreamBitmap, and the upper node would call
 tbm_iterate repeatedly on that.
 
 StreamBitmap would contain a callback (filled by the indexam) that
 tbm_iterate would call to fill the next TBMIterateResult.
 
 Right, this was the approach taken by an earlier version of the patch I
 had worked on. It was significantly uglified by the need to keep the index
 state around and to be careful about what amrescan might do behind out
 back. I will, however, introduce the idea again because it makes the code
 much cleaner and more logical, as you seem to suggest.

I will think about this approach. But I am not quite convinced that this
approach will be simpler and cleaner than the above approach. :-)

Thanks,
Jie



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Bitmap index status

2006-09-17 Thread Jie Zhang
Hi Heikki and all,

I just sent the latest bitmap index patch to the list. I am not sure if
there is any size limit for this mailing list. If you have received my
previous email, please let me know.

Thanks,
Jie

On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Hi,
 
 What's the status of the bitmap index patch? Have you worked on it since
 the last posted patch
 (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?
 
 I've started to review it, to get it into CVS early in the 8.3 cycle. I
 just want to make sure that I'm working on the latest version.
 
 Beside the issues already discussed, I found two minor bugs:
 * pg_am says that bitmap am supports unique indexes, while it doesn't.
 Second,
 * race condition in _bitmap_inserttuple if two backends try to insert
 the same, new value. If they both find that there's no lov item for the
 key, and try to create one, one backend will get a duplicate key error
 on the lov index.
 
 Also, vacuum actually does a reindex, which seems awfully wasteful. That
 needs to be looked at.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Bitmap index status

2006-09-12 Thread Jie Zhang
Hi Heikki,

Gavin and I are trying to merge our changes together this week. We will post
a new patch by the end of this week. This patch will include some style
fixes, bug fixes, and the stream bitmap implementation.

I will look into the problems you have mentioned in this email. Yes, vacuum
currently does a reindex now. Gavin and I just talked about this yesterday.
We are looking into ways to improve this. One way is not to do reindex for
each vacuum. We maintain a list of updated tids along with the bitmap index.
Only when this list goes to a certain point, vacuum will re-build the index.

Thanks,
Jie

On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

 Hi,
 
 What's the status of the bitmap index patch? Have you worked on it since
 the last posted patch
 (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?
 
 I've started to review it, to get it into CVS early in the 8.3 cycle. I
 just want to make sure that I'm working on the latest version.
 
 Beside the issues already discussed, I found two minor bugs:
 * pg_am says that bitmap am supports unique indexes, while it doesn't.
 Second,
 * race condition in _bitmap_inserttuple if two backends try to insert
 the same, new value. If they both find that there's no lov item for the
 key, and try to create one, one backend will get a duplicate key error
 on the lov index.
 
 Also, vacuum actually does a reindex, which seems awfully wasteful. That
 needs to be looked at.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] WIP: bitmap indexes

2006-08-17 Thread Jie Zhang
 It occurs to me that what tbm_begin_iterate really is is a constructor
 for a stream bitmap object that reads out the contents of a tbm bitmap
 (we need a decent name for the non-stream data structure ... maybe
 hash bitmap?).  If we think of it like that then we can unify the
 ideas some more.
 
 My proposal at this point would be to invent two different Node types,
 one for stream bitmaps and one for hash bitmaps.  The initial input to
 nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
 it stream-ifies it for use.  amgetmulti can return either kind, and
 nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
 Preserving the existing optimization for ORing hash bitmaps is a bit
 tricky but I think it's doable.  Consider this API for amgetmulti:
 
 amgetmulti takes an argument which can be either a hash bitmap or NULL.
 It returns an object that must be either a hash or stream bitmap.
 If it wants to return a stream bitmap, it simply disregards the argument
 and returns a constructed stream-bitmap object.  If it wants to return
 a hash bitmap, then if the argument is not NULL, OR the additional bits
 into the argument object and return it; if the argument is NULL,
 construct a fresh hash-bitmap object, set bits in it, and return it.

This sounds great. One thing I am concern about is that this will add the
dependency of node types into the access methods. If we still keep
nodeBitmapIndexscan and let it do the bitmap construction for tids returned
by amgetmulti.

Otherwise, I think that I get a pretty good picture about how to do this.

Thanks,
Jie



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] WIP: bitmap indexes

2006-08-17 Thread Jie Zhang

On 8/17/06 5:54 AM, Tom Lane [EMAIL PROTECTED] wrote:

 Jie Zhang [EMAIL PROTECTED] writes:
 This sounds great. One thing I am concern about is that this will add the
 dependency of node types into the access methods. If we still keep
 nodeBitmapIndexscan and let it do the bitmap construction for tids returned
 by amgetmulti.
 
 No, I'm assuming the other proposal that was on the table, namely to get
 rid of amgetmulti in its current form and instead have an AM call that
 delivers a bitmap in one step.  (Probably should rename the pg_am column
 to something like amgetbitmap.)  nodeBitmapIndexscan would become pretty
 trivial.  For the existing AMs this just means that they call
 tbm_add_tuple(s) for themselves, which is no big problem, especially
 considering that they probably get to save some code by not having to
 stop the indexscan when the buffer array gets full.

This sounds good. Another problem is about ScalarArrayOpExpr support in
current nodeBitmapIndexscan. This will not work for stream bitmaps. We have
to disable it in the optimizer.

Thanks,
Jie



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] WIP: bitmap indexes

2006-08-17 Thread Jie Zhang


On 8/17/06 12:29 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Jie Zhang [EMAIL PROTECTED] writes:
 This sounds good. Another problem is about ScalarArrayOpExpr support in
 current nodeBitmapIndexscan. This will not work for stream bitmaps.
 
 Sure it will; it's just an OR.
 

Yes, it is just an OR. But we have to introduce an OR logic in
nodeBitmapIndexscan node for stream bitmaps. We can't do what it is done
right now: scanning the underline index one key at a time. We have to scan
the underline index for all keys at the same time. It seems simply that we
just introduce another BitmapOr in the path.

Thanks,
Jie



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] WIP: bitmap indexes

2006-08-16 Thread Jie Zhang


On 8/15/06 6:18 AM, Tom Lane [EMAIL PROTECTED] wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
 On Mon, 14 Aug 2006, Tom Lane wrote:
 Correct me if I'm wrong, but isn't the patch's present hacking on the
 executor intended to make it happen like this?
 
 Not really. It reads ahead on the bitmap index and passes back the bitmap
 words. The other executor routines are hacked up to process the data in
 this format.
 
 Well, as I said, I don't think there's justification for exposing a
 bitmap index's internal data formats to the rest of the system like
 that: it's not very future-proof and I don't see that it's buying any
 significant performance gain.  At some point you have to convert to TIDs
 anyway, at least in the sense of knowing what page and line number you
 are at, so passing the data as an array of TIDs really isn't going to
 hurt much.  So my advice is to rip out all those changes and go back to
 the existing tidbitmap.c readout API.  There's nothing wrong with
 the TBMIterateResult data structure.


The bitmap words in the bitmap index are very simple and can be very
generic. You can think about them as one bit per tuple along with some
padding bits between heap pages. The problem I have is that I do not know a
good way to construct an in-memory version of this for other index
structures, like b-tree. To be able to handle both cases nicely, you are
right -- TBMIterateResult is better. Or, PagetableEntry may be better since
it will make AND/OR easier.

 What I do find interesting to think about is whether, strictly within
 tidbitmap.c, there could be an alternate kind of bitmap object that
 doesn't have to materialize the whole bitmap for an indexscan in memory
 because it knows it can fetch the data on-demand, ie, build the next
 page TBMIterateResult data structure on-the-fly from the index when it's
 requested.  Call it a stream bitmap in contrast to the present
 materialized bitmaps.  The trick here is to be able to AND and OR a
 stream bitmap with another stream bitmap or a materialized bitmap.
 I don't see any reason in principle why that couldn't be done: the
 output of the AND/OR would be a stream of TBMIterateResults just like
 the inputs.  That is, it's another stream bitmap, but with a different
 generating function and some internal state that includes its source
 bitmaps.  You'd have to sort a materialized bitmap into order before
 starting to AND/OR it with a stream bitmap, but that code is there
 already.

I like this idea. I think that we can define a new TBMStatus to be
TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
that we add a new returning bool argument, stating if this is a stream
bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
bitmaps, the result bitmap is a stream bitmap if there is at least one
bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
be able to pull more data from its subnode.

Thanks,
Jie



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] On-disk bitmap index patch

2006-07-27 Thread Jie Zhang



On 7/26/06 10:14 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Mark Kirkwood [EMAIL PROTECTED] writes:
 An obvious deduction is that the TPCH dataset is much more amenable to
 run compression than my synthetic Zipfian data was. The interesting
 question is how well real datasets are run compressable,
 
 Yeah --- the back-of-the-envelope calculations I was making presupposed
 uniform random distribution, and we know that's often not realistic for
 real datasets.  A nonuniform distribution would probably mean that some
 of the bitmaps compress better-than-expected and others worse.  I have
 no idea how to model that and guess what the overall result is ...
 

The paper Optimizing Bitmap Indices With Efficient Compression by Kesheng
Wu et al gave an approximate answer for this question. Assume that there are
c distinct values. Let the i-th value has a probability of p_i, the number
of rows r, and the word size w. then the total size of the compressed bitmap
index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
both \sum's, i is from 1 to c.

The constraint for this equation is \sum(p_i)=1. Therefore, when all p_i are
equal, or the attribute has randomly distributed values, the size of the
bitmap index is the largest.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] On-disk bitmap index patch

2006-07-27 Thread Jie Zhang


On 7/26/06 11:50 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Jie Zhang [EMAIL PROTECTED] writes:
 On 7/26/06 10:14 PM, Tom Lane [EMAIL PROTECTED] wrote:
 ... A nonuniform distribution would probably mean that some
 of the bitmaps compress better-than-expected and others worse.  I have
 no idea how to model that and guess what the overall result is ...
 
 The paper Optimizing Bitmap Indices With Efficient Compression by Kesheng
 Wu et al gave an approximate answer for this question. Assume that there are
 c distinct values. Let the i-th value has a probability of p_i, the number
 of rows r, and the word size w. then the total size of the compressed bitmap
 index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
 both \sum's, i is from 1 to c.
 
 Hm, but that's still begging the question no?  It's still assuming that
 any one value is uniformly distributed.  ISTM the cases that would break
 my simplistic calculation involve clustering of particular values, such
 that some areas of the bitmap are all-zero while other areas have lots
 of ones.

Yes, you are right -- each value is still uniformly distributed. But this
will be the worst case in terms of the size of a bitmap vector. As for how
to model the size of a bitmap vector for an non-uniformly distributed value,
that's a good question. I don't really know. But we do know the best case
and the worse case.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] On-disk bitmap index patch

2006-07-26 Thread Jie Zhang

On 7/26/06 8:55 PM, Luke Lonergan [EMAIL PROTECTED] wrote:

 Tom,
 
 On 7/26/06 7:26 AM, Tom Lane [EMAIL PROTECTED] wrote:
 
 I wonder
 whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs
 
 We tried variations from 8-bit to 32-bit and came to the conclusion that
 8-bit was a better match for the more general case.  For the moment I forget
 exactly why (maybe Jie or Ayush will recall).  It's a #define I believe, so
 you can play with it to find out.
 
 There's a lot more to be done with this - I think we've got some big gains
 ahead.

For low-cardinality cases, 8-bit word size is usually better than 16-bit and
32-bit. The reason is that in this case, the compression is better in 8-bit
than in 16-bit or 32-bit. But for high-cardinality cases, like 10,000,
16-bit is better. That's because a compressed 8-bit can only represent at
most 2^7*8=1024 bits. So every 10,000 bits requires about 10 bytes. However,
a compressed 16-bit can represent 2^15*16=524288 bits. We only need 2 bytes.

I have been thinking about this recently -- maybe we can support different
word sizes for different cardinalities.

 
 BTW - lots of excitement here at OSCON about bitmap index, and there's a
 fellow who is using the current Bizgres version, but wants *higher*
 cardinality support, so I discussed Jie's thoughts about implementing
 binning on values, basically combining bit vectors into value bins as the
 cardinality increases.

Yes, that's the basic idea. Essentially we want to limit the number of bins
into an ideal value so that (1) the size of the bitmap index can be small;
(2) this will still guarantee good query performance. The details still need
to be working out.

 
 I am still puzzled why our index creation time increases so dramatically as
 we approach cardinality 10,000.  I know that we found that index page
 traversal for append began to take a lot of time as we started to increase
 the number of bitvector pages - we had talked about aggregating the append
 operations into groups before they were implemented to sequentialize the
 access.
 

I believe that this is because of 8-bit word size. We can try to increase it
to 16-bit, and see how the result looks like.

Thanks,
Jie



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Jie Zhang

Thanks Tom and Gavin for your comments!

Yes, this patch is generated against 8.2 in a short time. As long as the
code is working, I post the patch to get some comments and help.

 
 * The xlog routines need help; they seem to not be updated for recent
 changes in the API for xlog recovery code.
 
 Yep. The patch was actually against 8.1 and was hastily brought up to 8.2.
 I think Jie's intention was to simply let everyone know that this was
 going on.

Thanks for pointing this out. I didn't notice that these are changed in 8.2.

 
 
 * The hacks on vacuum.c (where it tests the AM name) are utterly
 unacceptable.  If you need the main code to do something special for a
 particular index AM, define a bool flag column for it in pg_am.
 
 Yes.

Sounds good.

 
 
 * The interface to the existing executor bitmap scan code is mighty
 messy --- seems like there's a lot of almost duplicate code, a lot
 of rather invasive hacking, etc.  This needs to be rethought and
 refactored.
 
 I agree.

I will think about this more.

 
 
 * Why in the world is it introducing duplicate copies of a lot
 of btree comparison functions?  Use the ones that are there.
 
 Yes, I raised this with Jie and she has fixed it. One thought is, we may
 want to rename those comparison functions prefixed with 'bm' to make their
 naming less confusing. They'll be used by btree, gin and bitmap index
 methods. Anyway, a seperate patch.

Yeah, the main problem I hesitated to use btree's comparison functions
because of those function names starting with 'bt'. Since Gavin told me that
Gin is using those functions as well, I had changed them. Renaming them
would be good.

 
 
 * The patch itself is a mess: it introduces .orig and .rej files,
 changes around $PostgreSQL$ lines, etc.
 
 
 Right, not to mention patches to configure and a lot of style which needs
 to be knocked into shape.
 

The way I generate a patch is kind of clumsy. I need to find a better way to
do that.

I will start fixing these.

Thanks,
Jie



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Jie Zhang


On 7/24/06 6:59 AM, Hannu Krosing [EMAIL PROTECTED] wrote:

 Ühel kenal päeval, P, 2006-07-23 kell 20:25, kirjutas Tom Lane:
 Gavin Sherry [EMAIL PROTECTED] writes:
 On Sun, 23 Jul 2006, Tom Lane wrote:
 However, the main problem I've got with this is that a new index AM is a
 pretty large burden, and no one's made the slightest effort to sell
 pghackers on taking this on.
 
 For low cardinality sets, bitmaps greatly out perform btree.
 
 If the column is sufficiently low cardinality, you might as well just do
 a seqscan --- you'll be hitting most of the heap's pages anyway.  I'm
 still waiting to be convinced that there's a sweet spot wide enough to
 justify supporting another index AM.  (I'm also wondering whether this
 doesn't overlap the use-case for GIN.)
 
 IIRC they quoted the cardinality of 1 as something that is still
 faster than btree for several usecases.
 
 And also for AND-s of several indexes, where indexes are BIG, your btree
 indexes may be almost as big as tables but the resulting set of pages is
 small.

Yeah, Hannu points it out very well -- the bitmap index works very well when
columns have low cardinalities and AND operations will produce small number
of results.

Also, the bitmap index is very small in low cardinality cases, where the
btree tends to take up at least 10 times more space.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] On-disk bitmap index patch

2006-07-24 Thread Jie Zhang


On 7/24/06 6:04 PM, Bruce Momjian [EMAIL PROTECTED] wrote:

 Jie Zhang wrote:
 IIRC they quoted the cardinality of 1 as something that is still
 faster than btree for several usecases.
 
 And also for AND-s of several indexes, where indexes are BIG, your btree
 indexes may be almost as big as tables but the resulting set of pages is
 small.
 
 Yeah, Hannu points it out very well -- the bitmap index works very well when
 columns have low cardinalities and AND operations will produce small number
 of results.
 
 What operations on columns of low cardinality produce a small number of
 results?  That seems contradictory.

Let's see an example. Table 'T' includes two columns, 'p' and 's'. The
column 'p' has 100 distinct values, say p1-p100 and the column 'status' has
20 distinct values, say s1-s20. The query

  'select * from order where priority=p1 and status=s1'

may produce small number of results. Also, if these related rows are
clustered together, that would be even better.

 
 Also, the bitmap index is very small in low cardinality cases, where the
 btree tends to take up at least 10 times more space.
 
 Also, are adding/changing rows is more expensive with bitmaps than
 btrees?

Inserting a row will only affect the last word (at most last several words)
of a bitmap vector, so this should not be very expensive: 3-4 IOs. When a
row is updated and the new row is inserted in the middle of the heap,
currently the code will update the bit in the place -- where the bit should
be. Searching for the page which includes the bit to be updated is not very
efficient now, but this can be fixed. Currently, we have to scan the pages
for a bitmap vector one by one until we hit the right page. Since the bitmap
vector is compressed, updating a bit in the middle may cause its page to
overflow. In this case, we create a new page to accommodate those extra
bits, and insert this new page right after the original page.

Overall, inserting a row or updating a row can be done efficiently. But it
is true that the bitmap index does not perform well if there are lots of
inserts and updates, especially updates.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] On-disk bitmap index patch

2006-07-18 Thread Jie Zhang
Hi,

I have posted a patch to the CVS head for on-disk bitmap index to
pgsql-patches. If this can get in 8.2, that would be great. Any comments and
suggestions are welcome.

I still need to add several items:

(1) README file in src/backend/access/bitmap.
(2) Bitmap index documentation.
(3) Hiding the internal btree.

Also, I have disabled the multi-column index support because there is a
known problem. Assume that there is a bitmap index on a and b. When a query
predicate has only a, the current code may generate a wrong result. That's
because the current code assumes that b is null. The ultimate problem is
because the search code only handles one bitmap vector now. I need a fix to
support manipulating multiple bitmap vectors.

If you find any other problems, please let me know.

Thanks,
Jie



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] An In-memory Bitmap Index Bug

2005-07-23 Thread Jie Zhang

There is a bug in the in-memory bitmap index on the CVS Tip.

Consider this query: select * from bt1 where g=2 and e=20, which will generate 
the following query plan:

   QUERY PLAN   


 Bitmap Heap Scan on bt1  (cost=2041.47..19807.47 rows=8451 width=159)
   Recheck Cond: ((e = 20) AND (g = 2))
   -  BitmapAnd  (cost=2041.47..2041.47 rows=8451 width=0)
 -  Bitmap Index Scan on bt1_btree_e  (cost=0.00..145.91 rows=25404 
width=0)
   Index Cond: (e = 20)
 -  Bitmap Index Scan on bt1_btree_g  (cost=0.00..1895.31 rows=332661 
width=0)
   Index Cond: (g = 2)
(7 rows)

With default work_mem setting (1024), the bitmap generated for (e=20) will not 
have lossy pages, while the bitmap generated for (g=2) will have lossy pages.

The problem appears when a page in the first bitmap tries to intersect with the 
second bitmap. The current code didn't set the page in the first bitmap to 
lossy after the intersection. As a result, incorrect answers are returned.

A patch is attached in this email.

Sincerely,

Jie Zhang
Greenplum



pgsql-bitmapscan-bugfix.patch
Description: pgsql-bitmapscan-bugfix.patch

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq