Re: [HACKERS] Bitmap index thoughts
On 12/27/06 3:10 AM, Gavin Sherry [EMAIL PROTECTED] wrote: On Wed, 27 Dec 2006, Heikki Linnakangas wrote: Gavin Sherry wrote: On Tue, 26 Dec 2006, Heikki Linnakangas wrote: for typical bitmap index use cases and most of the needed pages should stay in memory, but could we simplify this? Why do we need the auxiliary heap, couldn't we just store the blk+offset of the LOV item directly in the b-tree index item? The problem is, the b-tree code is very much tied to the heap. I don't want to modify the b-tree code to make bitmap indexes work (better). What's really tempting is to just manage our own balanced tree within the bitmap index file(s) itself. It would start from the metapage and simply spill to other 'special' index pages when necessary. The problem is, we do not have b-tree code generic enough that it would allow us to do this trivially -- consider concurrency and WAL in particular, which we currently get for free. I guess this is why I've been ignoring this issue :-). Maybe we could reuse the code in ginbtree.c. Looks like Teodor Oleg had the same problem :). Modifying the nbtree code doesn't seem that difficult either. AFAICS, the only places where the heap is from within nbtree code is in index building and uniqueness checks. I'll take a look and think about it. I will take a look at it as well. I would also like to add eventually the values for bm_last_tid_location in all bitmap page to the lov heap. Combining this with the attribute values, we can easily jump into a specific bitmap page. This will be useful during vacuuming and insertion after vacuuming -- we can jump into a page to modify a bit. I am not sure if this will affect the use of the btree code without the heap. But if we can get rid of the lov heap, that would be great. And instead of having separate LOV pages that store a number of LOV items, how about storing each LOV item on a page of it's own, and using the rest of the page to store the last chunk of the bitmap. That would eliminate one page access, but more importantly, maybe we could then get rid of all the bm_last_* attributes in BMLOVItemData that complicate the patch quite a bit, while preserving the performance. That's an interesting approach. We would still need a concept of last_word, at the very least, and probably last_comp_word for convenience. Why? The two last words of the bitmap vector, stored in the LOV item, provide a few things: they buffer the addition of new matches in the vector and they ease the calculation of the current position of the end of the vector. Jie, do you want to add more? I think we could do this differently by calculating everything based on the other data stored in the lov item and data page (last_tid_location and num_hrl_words_used, from memory). But, I'm not sure. Jie? Sure. When appending a new bit into an existing bitmap, the last two words are needed to compute new HRL words, and they are the only words affected by this new appending bit. So we need a way to know these two words easily. Like Gavin said, separating them from other words is for convenience. Also, we won't be able to get rid of bm_last_setbit. We need bm_last_setbit to tell us how many zeros are there between two bit 1s. There is no other way to know this. Thanks, Jie ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Bitmap index thoughts
On 12/27/06 3:16 AM, Gavin Sherry [EMAIL PROTECTED] wrote: On Wed, 27 Dec 2006, Heikki Linnakangas wrote: Jie Zhang wrote: The bitmap data segment sounds good in terms of space. The problem is that one bitmap is likely to occupy more pages than before, which may hurt the query performance. We could have segments of say 1/5 of page. When a bitmap grows larger than that, the bitmap would be moved to a page of its own. That way we wouldn't get unnecessary fragmentation with large bitmaps, but small bitmaps would be stored efficiently. Yes. I see. This sounds good. I will think more about this. Thanks, Jie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Bitmap index thoughts
And instead of having separate LOV pages that store a number of LOV items, how about storing each LOV item on a page of it's own, and using the rest of the page to store the last chunk of the bitmap. That would eliminate one page access, but more importantly, maybe we could then get rid of all the bm_last_* attributes in BMLOVItemData that complicate the patch quite a bit, while preserving the performance. That's an interesting approach. We would still need a concept of last_word, at the very least, and probably last_comp_word for convenience. Your idea doesn't require any extra space, either, which is good. Something I've been working through is the idea of a 'bitmap data segment'. Currently, we store the HRL compressed bitmap data to the extent of the page. That is, sizeof(BMBitmap) is as close to BLCKSZ as we can make it. The problem is that we may have some bitmaps where a few values occur only a small number of times and are well clustered at the beginning of the heap. In that circumstance, we use a whole page to store a small number of words and the free space cannot be used by any other vector. Now, say a segment was some fraction the size of BLCKSZ, we use less space for those vectors with few tuples in the heap. Just an idea... The bitmap data segment sounds good in terms of space. The problem is that one bitmap is likely to occupy more pages than before, which may hurt the query performance. I have been thinking along the lines of increasing the number of last bitmap words stored in each LOV item, but not to occupy one page. This may prevent some cases Gavin indicated here, but not all. Thanks, Jie ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Bitmap index status
On 10/18/06 2:41 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Hi, I don't want to harass you :), but what's the status with the bitmap index code? Is there something I can do to help? Not at all. We will send you the new patch soon. Your comments are most appreciated. Thanks, Jie ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Bitmap index status
Hi Mark, Thanks for doing the test. I checked out the link you provided below. I am a little confused about the goal of these tests. Do you plan to test the overall performance of postgreSQL on handling TPC-H queries? Thanks, Jie On 9/22/06 3:45 PM, Mark Wong [EMAIL PROTECTED] wrote: Jie Zhang wrote: Hi Heikki and all, I just sent the latest bitmap index patch to the list. I am not sure if there is any size limit for this mailing list. If you have received my previous email, please let me know. Hi Jie, I know I said I was going to get testing on this months ago but I've been juggling between 3 systems due to disk failures and other hardware configuration issues. Anyways, I've take a baseline run of only the power test using a 1GB database with the patch 09-17 patch against a snapshot of pgsql from 2006-09-17: http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/ Do you think the 1GB scale factor will be sufficient for testing as it will certainly be faster? Do you think testing with just a power test will be sufficient for now? I really don't have a good reason why I didn't run a throughput test other than to save time. :) I also wanted to get your opinion again on which indexes we will want to try first. Thanks, Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Bitmap index status
Gavin Heikki, The handling of stream and hash bitmaps looks pretty complicated to me. All the bitmap-related nodes have logic to handle both types slightly differently. It all seems to come down to that if a subnode (or amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the caller needs to call the subnode many times, until it returns a NULL. With a HashBitmap, the caller only calls the subnode once. I think amgetbitmap should be called just once per index scan, and it should return either a hash bitmap or a stream bitmap. The same applies to all the executor nodes that return bitmaps, they would only return a single HashBitmap or StreamBitmap, and the upper node would call tbm_iterate repeatedly on that. StreamBitmap would contain a callback (filled by the indexam) that tbm_iterate would call to fill the next TBMIterateResult. Right, this was the approach taken by an earlier version of the patch I had worked on. It was significantly uglified by the need to keep the index state around and to be careful about what amrescan might do behind out back. I will, however, introduce the idea again because it makes the code much cleaner and more logical, as you seem to suggest. I have been thinking about this approach some more. I do agree that this is more attractive now. The following includes some more detailed design. Please let me know if you have any comments. (My apologies to Gavin. You talked to me about this approach before. But you introduced some on-disk bitmap specific code into the tidbitmap.c, which prevented me from looking more in this direction.) Essentially, we want to have a stream bitmap object that has an iterator, which will be able to iterate through the bitmaps. The BitmapIndexscan, BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate through the bitmaps. The StreamBitmap structure will look like below. struct StreamBitmap { NodeTag type; /* to make it a valid Node */ PagetableEntryentry; /* a page of tids in this stream bitmap */ /* the iterator function */ void(*next)(StreamBitmap*); Node* state;/* store how this stream bitmap generated, and all necessary information to obtain the next stream bitmap. */ }; Two new state objects will look like below. At the same time, we introduce three new node types: T_StreamBitmapAND, T_StreamBitmapOR, And T_StreamBitmapIndex, to define different states. /* * Stores the necessary information for iterating through the stream bitmaps * generated by nodeBitmapAnd or nodeBitmapOr. */ struct StreamBitmapOp { NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */ List* bitmaps; }; /* * Stores some necessary information for iterating through the stream * bitmaps generated by nodeBitmapIndexscan. */ struct StreamBitmapIndex { NodeTag type; /* handle T_StreamBitmapIndex */ IndexScanDescscan; BlockNumbernextBlockNo;/* next block no to be read */ }; Then we will have the iterator functions like the following: void StreamBitmapAndNext(StreamBitmap* node) { tbm_intersect_stream(((StreampBitmapOp*) node-state)-bitmaps, node); } void StreamBitmapOrNext(StreamBitmap* node) { tbm_union_stream(((StreampBitmapOp*) node-state)-bitmaps, node); } void StreamBitmapIndexNext(StreamBitmap* node) { StreamBitmapIndex* sbi = (StreamBitmapIndex*) node-state; amgetbitmap(sbi-scan, NULL, sbi-nextBlockNo); } What do you think? Thanks, Jie ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Bitmap index status
On 9/19/06 5:15 AM, Gavin Sherry [EMAIL PROTECTED] wrote: On Tue, 19 Sep 2006, Heikki Linnakangas wrote: Jie Zhang wrote: Hi Heikki and all, Please find the latest bitmap index patch in the attachment. This patch is generated against the postgresql cvs head. Thanks. The handling of stream and hash bitmaps looks pretty complicated to me. All the bitmap-related nodes have logic to handle both types slightly differently. It all seems to come down to that if a subnode (or amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the caller needs to call the subnode many times, until it returns a NULL. With a HashBitmap, the caller only calls the subnode once. I think amgetbitmap should be called just once per index scan, and it should return either a hash bitmap or a stream bitmap. The same applies to all the executor nodes that return bitmaps, they would only return a single HashBitmap or StreamBitmap, and the upper node would call tbm_iterate repeatedly on that. StreamBitmap would contain a callback (filled by the indexam) that tbm_iterate would call to fill the next TBMIterateResult. Right, this was the approach taken by an earlier version of the patch I had worked on. It was significantly uglified by the need to keep the index state around and to be careful about what amrescan might do behind out back. I will, however, introduce the idea again because it makes the code much cleaner and more logical, as you seem to suggest. I will think about this approach. But I am not quite convinced that this approach will be simpler and cleaner than the above approach. :-) Thanks, Jie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Bitmap index status
Hi Heikki and all, I just sent the latest bitmap index patch to the list. I am not sure if there is any size limit for this mailing list. If you have received my previous email, please let me know. Thanks, Jie On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Hi, What's the status of the bitmap index patch? Have you worked on it since the last posted patch (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)? I've started to review it, to get it into CVS early in the 8.3 cycle. I just want to make sure that I'm working on the latest version. Beside the issues already discussed, I found two minor bugs: * pg_am says that bitmap am supports unique indexes, while it doesn't. Second, * race condition in _bitmap_inserttuple if two backends try to insert the same, new value. If they both find that there's no lov item for the key, and try to create one, one backend will get a duplicate key error on the lov index. Also, vacuum actually does a reindex, which seems awfully wasteful. That needs to be looked at. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Bitmap index status
Hi Heikki, Gavin and I are trying to merge our changes together this week. We will post a new patch by the end of this week. This patch will include some style fixes, bug fixes, and the stream bitmap implementation. I will look into the problems you have mentioned in this email. Yes, vacuum currently does a reindex now. Gavin and I just talked about this yesterday. We are looking into ways to improve this. One way is not to do reindex for each vacuum. We maintain a list of updated tids along with the bitmap index. Only when this list goes to a certain point, vacuum will re-build the index. Thanks, Jie On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Hi, What's the status of the bitmap index patch? Have you worked on it since the last posted patch (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)? I've started to review it, to get it into CVS early in the 8.3 cycle. I just want to make sure that I'm working on the latest version. Beside the issues already discussed, I found two minor bugs: * pg_am says that bitmap am supports unique indexes, while it doesn't. Second, * race condition in _bitmap_inserttuple if two backends try to insert the same, new value. If they both find that there's no lov item for the key, and try to create one, one backend will get a duplicate key error on the lov index. Also, vacuum actually does a reindex, which seems awfully wasteful. That needs to be looked at. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
It occurs to me that what tbm_begin_iterate really is is a constructor for a stream bitmap object that reads out the contents of a tbm bitmap (we need a decent name for the non-stream data structure ... maybe hash bitmap?). If we think of it like that then we can unify the ideas some more. My proposal at this point would be to invent two different Node types, one for stream bitmaps and one for hash bitmaps. The initial input to nodeBitmapHeapscan can be either, but if it's given a hash bitmap then it stream-ifies it for use. amgetmulti can return either kind, and nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do. Preserving the existing optimization for ORing hash bitmaps is a bit tricky but I think it's doable. Consider this API for amgetmulti: amgetmulti takes an argument which can be either a hash bitmap or NULL. It returns an object that must be either a hash or stream bitmap. If it wants to return a stream bitmap, it simply disregards the argument and returns a constructed stream-bitmap object. If it wants to return a hash bitmap, then if the argument is not NULL, OR the additional bits into the argument object and return it; if the argument is NULL, construct a fresh hash-bitmap object, set bits in it, and return it. This sounds great. One thing I am concern about is that this will add the dependency of node types into the access methods. If we still keep nodeBitmapIndexscan and let it do the bitmap construction for tids returned by amgetmulti. Otherwise, I think that I get a pretty good picture about how to do this. Thanks, Jie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
On 8/17/06 5:54 AM, Tom Lane [EMAIL PROTECTED] wrote: Jie Zhang [EMAIL PROTECTED] writes: This sounds great. One thing I am concern about is that this will add the dependency of node types into the access methods. If we still keep nodeBitmapIndexscan and let it do the bitmap construction for tids returned by amgetmulti. No, I'm assuming the other proposal that was on the table, namely to get rid of amgetmulti in its current form and instead have an AM call that delivers a bitmap in one step. (Probably should rename the pg_am column to something like amgetbitmap.) nodeBitmapIndexscan would become pretty trivial. For the existing AMs this just means that they call tbm_add_tuple(s) for themselves, which is no big problem, especially considering that they probably get to save some code by not having to stop the indexscan when the buffer array gets full. This sounds good. Another problem is about ScalarArrayOpExpr support in current nodeBitmapIndexscan. This will not work for stream bitmaps. We have to disable it in the optimizer. Thanks, Jie ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
On 8/17/06 12:29 PM, Tom Lane [EMAIL PROTECTED] wrote: Jie Zhang [EMAIL PROTECTED] writes: This sounds good. Another problem is about ScalarArrayOpExpr support in current nodeBitmapIndexscan. This will not work for stream bitmaps. Sure it will; it's just an OR. Yes, it is just an OR. But we have to introduce an OR logic in nodeBitmapIndexscan node for stream bitmaps. We can't do what it is done right now: scanning the underline index one key at a time. We have to scan the underline index for all keys at the same time. It seems simply that we just introduce another BitmapOr in the path. Thanks, Jie ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] WIP: bitmap indexes
On 8/15/06 6:18 AM, Tom Lane [EMAIL PROTECTED] wrote: Gavin Sherry [EMAIL PROTECTED] writes: On Mon, 14 Aug 2006, Tom Lane wrote: Correct me if I'm wrong, but isn't the patch's present hacking on the executor intended to make it happen like this? Not really. It reads ahead on the bitmap index and passes back the bitmap words. The other executor routines are hacked up to process the data in this format. Well, as I said, I don't think there's justification for exposing a bitmap index's internal data formats to the rest of the system like that: it's not very future-proof and I don't see that it's buying any significant performance gain. At some point you have to convert to TIDs anyway, at least in the sense of knowing what page and line number you are at, so passing the data as an array of TIDs really isn't going to hurt much. So my advice is to rip out all those changes and go back to the existing tidbitmap.c readout API. There's nothing wrong with the TBMIterateResult data structure. The bitmap words in the bitmap index are very simple and can be very generic. You can think about them as one bit per tuple along with some padding bits between heap pages. The problem I have is that I do not know a good way to construct an in-memory version of this for other index structures, like b-tree. To be able to handle both cases nicely, you are right -- TBMIterateResult is better. Or, PagetableEntry may be better since it will make AND/OR easier. What I do find interesting to think about is whether, strictly within tidbitmap.c, there could be an alternate kind of bitmap object that doesn't have to materialize the whole bitmap for an indexscan in memory because it knows it can fetch the data on-demand, ie, build the next page TBMIterateResult data structure on-the-fly from the index when it's requested. Call it a stream bitmap in contrast to the present materialized bitmaps. The trick here is to be able to AND and OR a stream bitmap with another stream bitmap or a materialized bitmap. I don't see any reason in principle why that couldn't be done: the output of the AND/OR would be a stream of TBMIterateResults just like the inputs. That is, it's another stream bitmap, but with a different generating function and some internal state that includes its source bitmaps. You'd have to sort a materialized bitmap into order before starting to AND/OR it with a stream bitmap, but that code is there already. I like this idea. I think that we can define a new TBMStatus to be TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except that we add a new returning bool argument, stating if this is a stream bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages, and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several bitmaps, the result bitmap is a stream bitmap if there is at least one bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to be able to pull more data from its subnode. Thanks, Jie ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] On-disk bitmap index patch
On 7/26/06 10:14 PM, Tom Lane [EMAIL PROTECTED] wrote: Mark Kirkwood [EMAIL PROTECTED] writes: An obvious deduction is that the TPCH dataset is much more amenable to run compression than my synthetic Zipfian data was. The interesting question is how well real datasets are run compressable, Yeah --- the back-of-the-envelope calculations I was making presupposed uniform random distribution, and we know that's often not realistic for real datasets. A nonuniform distribution would probably mean that some of the bitmaps compress better-than-expected and others worse. I have no idea how to model that and guess what the overall result is ... The paper Optimizing Bitmap Indices With Efficient Compression by Kesheng Wu et al gave an approximate answer for this question. Assume that there are c distinct values. Let the i-th value has a probability of p_i, the number of rows r, and the word size w. then the total size of the compressed bitmap index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in both \sum's, i is from 1 to c. The constraint for this equation is \sum(p_i)=1. Therefore, when all p_i are equal, or the attribute has randomly distributed values, the size of the bitmap index is the largest. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] On-disk bitmap index patch
On 7/26/06 11:50 PM, Tom Lane [EMAIL PROTECTED] wrote: Jie Zhang [EMAIL PROTECTED] writes: On 7/26/06 10:14 PM, Tom Lane [EMAIL PROTECTED] wrote: ... A nonuniform distribution would probably mean that some of the bitmaps compress better-than-expected and others worse. I have no idea how to model that and guess what the overall result is ... The paper Optimizing Bitmap Indices With Efficient Compression by Kesheng Wu et al gave an approximate answer for this question. Assume that there are c distinct values. Let the i-th value has a probability of p_i, the number of rows r, and the word size w. then the total size of the compressed bitmap index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in both \sum's, i is from 1 to c. Hm, but that's still begging the question no? It's still assuming that any one value is uniformly distributed. ISTM the cases that would break my simplistic calculation involve clustering of particular values, such that some areas of the bitmap are all-zero while other areas have lots of ones. Yes, you are right -- each value is still uniformly distributed. But this will be the worst case in terms of the size of a bitmap vector. As for how to model the size of a bitmap vector for an non-uniformly distributed value, that's a good question. I don't really know. But we do know the best case and the worse case. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] On-disk bitmap index patch
On 7/26/06 8:55 PM, Luke Lonergan [EMAIL PROTECTED] wrote: Tom, On 7/26/06 7:26 AM, Tom Lane [EMAIL PROTECTED] wrote: I wonder whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs We tried variations from 8-bit to 32-bit and came to the conclusion that 8-bit was a better match for the more general case. For the moment I forget exactly why (maybe Jie or Ayush will recall). It's a #define I believe, so you can play with it to find out. There's a lot more to be done with this - I think we've got some big gains ahead. For low-cardinality cases, 8-bit word size is usually better than 16-bit and 32-bit. The reason is that in this case, the compression is better in 8-bit than in 16-bit or 32-bit. But for high-cardinality cases, like 10,000, 16-bit is better. That's because a compressed 8-bit can only represent at most 2^7*8=1024 bits. So every 10,000 bits requires about 10 bytes. However, a compressed 16-bit can represent 2^15*16=524288 bits. We only need 2 bytes. I have been thinking about this recently -- maybe we can support different word sizes for different cardinalities. BTW - lots of excitement here at OSCON about bitmap index, and there's a fellow who is using the current Bizgres version, but wants *higher* cardinality support, so I discussed Jie's thoughts about implementing binning on values, basically combining bit vectors into value bins as the cardinality increases. Yes, that's the basic idea. Essentially we want to limit the number of bins into an ideal value so that (1) the size of the bitmap index can be small; (2) this will still guarantee good query performance. The details still need to be working out. I am still puzzled why our index creation time increases so dramatically as we approach cardinality 10,000. I know that we found that index page traversal for append began to take a lot of time as we started to increase the number of bitvector pages - we had talked about aggregating the append operations into groups before they were implemented to sequentialize the access. I believe that this is because of 8-bit word size. We can try to increase it to 16-bit, and see how the result looks like. Thanks, Jie ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] On-disk bitmap index patch
Thanks Tom and Gavin for your comments! Yes, this patch is generated against 8.2 in a short time. As long as the code is working, I post the patch to get some comments and help. * The xlog routines need help; they seem to not be updated for recent changes in the API for xlog recovery code. Yep. The patch was actually against 8.1 and was hastily brought up to 8.2. I think Jie's intention was to simply let everyone know that this was going on. Thanks for pointing this out. I didn't notice that these are changed in 8.2. * The hacks on vacuum.c (where it tests the AM name) are utterly unacceptable. If you need the main code to do something special for a particular index AM, define a bool flag column for it in pg_am. Yes. Sounds good. * The interface to the existing executor bitmap scan code is mighty messy --- seems like there's a lot of almost duplicate code, a lot of rather invasive hacking, etc. This needs to be rethought and refactored. I agree. I will think about this more. * Why in the world is it introducing duplicate copies of a lot of btree comparison functions? Use the ones that are there. Yes, I raised this with Jie and she has fixed it. One thought is, we may want to rename those comparison functions prefixed with 'bm' to make their naming less confusing. They'll be used by btree, gin and bitmap index methods. Anyway, a seperate patch. Yeah, the main problem I hesitated to use btree's comparison functions because of those function names starting with 'bt'. Since Gavin told me that Gin is using those functions as well, I had changed them. Renaming them would be good. * The patch itself is a mess: it introduces .orig and .rej files, changes around $PostgreSQL$ lines, etc. Right, not to mention patches to configure and a lot of style which needs to be knocked into shape. The way I generate a patch is kind of clumsy. I need to find a better way to do that. I will start fixing these. Thanks, Jie ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] On-disk bitmap index patch
On 7/24/06 6:59 AM, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, P, 2006-07-23 kell 20:25, kirjutas Tom Lane: Gavin Sherry [EMAIL PROTECTED] writes: On Sun, 23 Jul 2006, Tom Lane wrote: However, the main problem I've got with this is that a new index AM is a pretty large burden, and no one's made the slightest effort to sell pghackers on taking this on. For low cardinality sets, bitmaps greatly out perform btree. If the column is sufficiently low cardinality, you might as well just do a seqscan --- you'll be hitting most of the heap's pages anyway. I'm still waiting to be convinced that there's a sweet spot wide enough to justify supporting another index AM. (I'm also wondering whether this doesn't overlap the use-case for GIN.) IIRC they quoted the cardinality of 1 as something that is still faster than btree for several usecases. And also for AND-s of several indexes, where indexes are BIG, your btree indexes may be almost as big as tables but the resulting set of pages is small. Yeah, Hannu points it out very well -- the bitmap index works very well when columns have low cardinalities and AND operations will produce small number of results. Also, the bitmap index is very small in low cardinality cases, where the btree tends to take up at least 10 times more space. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] On-disk bitmap index patch
On 7/24/06 6:04 PM, Bruce Momjian [EMAIL PROTECTED] wrote: Jie Zhang wrote: IIRC they quoted the cardinality of 1 as something that is still faster than btree for several usecases. And also for AND-s of several indexes, where indexes are BIG, your btree indexes may be almost as big as tables but the resulting set of pages is small. Yeah, Hannu points it out very well -- the bitmap index works very well when columns have low cardinalities and AND operations will produce small number of results. What operations on columns of low cardinality produce a small number of results? That seems contradictory. Let's see an example. Table 'T' includes two columns, 'p' and 's'. The column 'p' has 100 distinct values, say p1-p100 and the column 'status' has 20 distinct values, say s1-s20. The query 'select * from order where priority=p1 and status=s1' may produce small number of results. Also, if these related rows are clustered together, that would be even better. Also, the bitmap index is very small in low cardinality cases, where the btree tends to take up at least 10 times more space. Also, are adding/changing rows is more expensive with bitmaps than btrees? Inserting a row will only affect the last word (at most last several words) of a bitmap vector, so this should not be very expensive: 3-4 IOs. When a row is updated and the new row is inserted in the middle of the heap, currently the code will update the bit in the place -- where the bit should be. Searching for the page which includes the bit to be updated is not very efficient now, but this can be fixed. Currently, we have to scan the pages for a bitmap vector one by one until we hit the right page. Since the bitmap vector is compressed, updating a bit in the middle may cause its page to overflow. In this case, we create a new page to accommodate those extra bits, and insert this new page right after the original page. Overall, inserting a row or updating a row can be done efficiently. But it is true that the bitmap index does not perform well if there are lots of inserts and updates, especially updates. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] On-disk bitmap index patch
Hi, I have posted a patch to the CVS head for on-disk bitmap index to pgsql-patches. If this can get in 8.2, that would be great. Any comments and suggestions are welcome. I still need to add several items: (1) README file in src/backend/access/bitmap. (2) Bitmap index documentation. (3) Hiding the internal btree. Also, I have disabled the multi-column index support because there is a known problem. Assume that there is a bitmap index on a and b. When a query predicate has only a, the current code may generate a wrong result. That's because the current code assumes that b is null. The ultimate problem is because the search code only handles one bitmap vector now. I need a fix to support manipulating multiple bitmap vectors. If you find any other problems, please let me know. Thanks, Jie ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] An In-memory Bitmap Index Bug
There is a bug in the in-memory bitmap index on the CVS Tip. Consider this query: select * from bt1 where g=2 and e=20, which will generate the following query plan: QUERY PLAN Bitmap Heap Scan on bt1 (cost=2041.47..19807.47 rows=8451 width=159) Recheck Cond: ((e = 20) AND (g = 2)) - BitmapAnd (cost=2041.47..2041.47 rows=8451 width=0) - Bitmap Index Scan on bt1_btree_e (cost=0.00..145.91 rows=25404 width=0) Index Cond: (e = 20) - Bitmap Index Scan on bt1_btree_g (cost=0.00..1895.31 rows=332661 width=0) Index Cond: (g = 2) (7 rows) With default work_mem setting (1024), the bitmap generated for (e=20) will not have lossy pages, while the bitmap generated for (g=2) will have lossy pages. The problem appears when a page in the first bitmap tries to intersect with the second bitmap. The current code didn't set the page in the first bitmap to lossy after the intersection. As a result, incorrect answers are returned. A patch is attached in this email. Sincerely, Jie Zhang Greenplum pgsql-bitmapscan-bugfix.patch Description: pgsql-bitmapscan-bugfix.patch ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq