Re: [HACKERS] Best way to scan on-disk bitmaps
On Mon, 13 Jun 2005, Tom Lane wrote: Teodor Sigaev [EMAIL PROTECTED] wrote: ... So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other operations. Number of supported operations by GiST is indefinite unlike, for example, btree which supported only five: , =, =, =, . I have committed changes to the planner to arrange that a GiST indexscan must supply at least one restriction clause for the first index column, and can supply restriction clauses for any, all, or none of the remaining columns; the old left-to-right heuristic is gone. As far as I can tell, this doesn't require any changes to the GiST code, but please take another look if you aren't too sure about it. I did quick test and found no problem with gist(a,b,c) and index does used for (a,*,c) case regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Best way to scan on-disk bitmaps
Teodor Sigaev [EMAIL PROTECTED] wrote: ... So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other operations. Number of supported operations by GiST is indefinite unlike, for example, btree which supported only five: , =, =, =, . I have committed changes to the planner to arrange that a GiST indexscan must supply at least one restriction clause for the first index column, and can supply restriction clauses for any, all, or none of the remaining columns; the old left-to-right heuristic is gone. As far as I can tell, this doesn't require any changes to the GiST code, but please take another look if you aren't too sure about it. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Best way to scan on-disk bitmaps
Tom Lane [EMAIL PROTECTED] writes: I think it would be easy to change the planner and btree to handle this (where easy means I remember where all the skeletons are buried). But I don't know the gist code hardly at all. Can anyone offer an informed opinion on whether gist can handle this now, and if not what it would take to handle it? Currently gist indexes only use the first column for page splits, making multi-key gist indexes basically useless. The problem is it's hard to imagine an API for a pickSplit function that could handle multi-key indexes with disparate data types and operator classes. I had an idea of a new way to deal with gist indexes that simplified the API and side-stepped the whole issue but you raised concerns that it might be too limiting. Unfortunately the mailing list archive seems to have missed this discussion. I've attached three messages from the discussion at the time. ---BeginMessage--- Oleg Teodor, If I understand the code correctly, GiST will only pass the first attribute of each index tuple to the user-defined PickSplit method when it wants to split a node. (see circa line 1269 of gist.c) Is this a wise design decision? Granted, in many situations the first attribute in the index is sufficient to make a reasonable decision about how to divide the node into two halves, but I don't think that is universally true. For example, consider a two column index whose first attribute has a small number of distinct values. It could well be that all the first attribute values in a node-to-be-split would be the same. Only passing the first attribute to PickSplit would result in an essentially random distribution of tuples among the split nodes, rather than allowing the GiST extension to make use of the second attribution to partition the nodes. That's an extreme example, but it is easy to construct more realistic scenarios (basically, any situation in which the cardinality of the first index attribute is low -- a reasonably common occurrence with a multi-column index, I believe). I'm not sure whether this would be a problem in practice. Speculation: repeated invocations of PickSplit are one of the main factors in deriving the ultimate shape of the GiST tree. Poor distribution of keys resulting from PickSplit would eventually result in unnecessarily loose bounding predicates in internal nodes, which would hurt performance. Comments welcome, Neil ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings ---End Message--- ---BeginMessage--- Greg Stark [EMAIL PROTECTED] writes: I'm not sure that GiST indexes behave the same way as btree indexes for the multi-column case. In a btree index the second column is entirely subordinate to the first column. In a GiST index the data is multi-dimensional, and all dimensions are equally important. In fact on further consideration I do have a proposal. If you look at the GiST implementations for various index types you'll see that many (all?) take the same approach for PickSplit. In fact they pretty much all have the same code copy/pasted to handle it. The approach they take is to have a function which calculates an abstract distance between any two entries. There's an algorithm that they use to pick the split based on this distance function. If you abandoned PickSplit and instead exposed this distance function as the external API then the behaviour for multi-column indexes is clear. You calculate the distance along all the axes and calculate the diagonal distance. I think abandoning PickSplit for the distance function might also mean you don't need a separate function for Penalty either, but I'm not sure on that. -- greg ---End Message--- ---BeginMessage--- Greg Stark [EMAIL PROTECTED] writes: The approach they take is to have a function which calculates an abstract distance between any two entries. There's an algorithm that they use to pick the split based on this distance function. If you abandoned PickSplit and instead exposed this distance function as the external API then the behaviour for multi-column indexes is clear. You calculate the distance along all the axes and calculate the diagonal distance. Hmm ... the problem with that is the assumption that different opclasses will compute similarly-scaled distances. If opclass A generates distances in the range (0,1e6) while B generates in the range (0,1), combining them with Euclidean distance won't work well at all. OTOH you can't blindly normalize, because in some cases maybe the data is such that a massive difference in distances is truly appropriate. I'm also a bit leery of the assumption that every GiST application can reduce its PickSplit logic to Euclidean distances. regards, tom lane ---End Message--- (hash and rtree are not at issue since they don't support multi-key indexes.) It seems like it would be trivial to make
Re: [HACKERS] Best way to scan on-disk bitmaps
About page splitting algorithm in GiST in multikey case. For the beginning, page is splitted by calling pickSplit method of key of first column (pickSplit method is defined for opclass and it is a user function), then it try to find equal values of first column in left and right pages ( gist.c lines 1264-1901 ). If it is, then GiST core will try to resort tuples with first equal keys between left and right pages using penalty method for second and higher column's key. If it's not, it leave pages untouched. But unions for parent page of second and higher column's keys will be formed. So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other operations. Number of supported operations by GiST is indefinite unlike, for example, btree which supported only five: , =, =, =, . -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Best way to scan on-disk bitmaps
If people have GIST TODOs, please post them. --- Teodor Sigaev wrote: About page splitting algorithm in GiST in multikey case. For the beginning, page is splitted by calling pickSplit method of key of first column (pickSplit method is defined for opclass and it is a user function), then it try to find equal values of first column in left and right pages ( gist.c lines 1264-1901 ). If it is, then GiST core will try to resort tuples with first equal keys between left and right pages using penalty method for second and higher column's key. If it's not, it leave pages untouched. But unions for parent page of second and higher column's keys will be formed. So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other operations. Number of supported operations by GiST is indefinite unlike, for example, btree which supported only five: , =, =, =, . -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Best way to scan on-disk bitmaps
Bruce Momjian wrote: If people have GIST TODOs, please post them. Concurrency :) ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Best way to scan on-disk bitmaps
On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote: Bruce Momjian wrote: If people have GIST TODOs, please post them. Concurrency :) And WAL support. -- Alvaro Herrera (alvherre[a]surnet.cl) No necesitamos banderas No reconocemos fronteras (Jorge González) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Best way to scan on-disk bitmaps
Alvaro Herrera wrote: On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote: Bruce Momjian wrote: If people have GIST TODOs, please post them. Concurrency :) And WAL support. Already there: * Add WAL index reliability improvement to non-btree indexes and this too: * Add concurrency to GIST -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Best way to scan on-disk bitmaps
Bruce Momjian pgman@candle.pha.pa.us writes: Hmm. That particular case will work, but the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. This is true for btree and gist indexes, so I suppose we'd need to introduce a pg_am column that tells what to do. We do have a TODO for this: * Use index to restrict rows returned by multi-key index when used with non-consecutive keys to reduce heap accesses For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and col3 = 9, spin though the index checking for col1 and col3 matches, rather than just col1; also called skip-scanning. That TODO is something else. Though it is related in that it is another example of why the existing code is too simplistic and will eventually need to be enhanced. Not only would bitmap indexes and (possibly) gist indexes, but even btree indexes would need to do so if this TODO were implemented. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Best way to scan on-disk bitmaps
On Thu, 12 May 2005 17:40:06 -0400, Tom Lane [EMAIL PROTECTED] wrote: the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. This is true for btree [...] It's not difficult to setup a test case where an index is used, but only with a=42 as an index condition, and c='foo' is applied as a filter condition. Now add a redundant condition on b ... AND b BETWEEN minb AND maxb ... and watch how c='foo' moves into the index condition. I did this test some time ago and I believe that adding the condition on b did not change the estimated cost, only the actual execution time. Servus Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Best way to scan on-disk bitmaps
Greg Stark [EMAIL PROTECTED] writes: Bruce Momjian pgman@candle.pha.pa.us writes: Hmm. That particular case will work, but the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. We do have a TODO for this: * Use index to restrict rows returned by multi-key index when used with non-consecutive keys to reduce heap accesses That TODO is something else. No, I think Bruce is right --- it's essentially the same thing. It certainly would be a good idea to change btrees to work like that, if we are going to modify the planner to relax the restriction for other index types. I think it would be easy to change the planner and btree to handle this (where easy means I remember where all the skeletons are buried). But I don't know the gist code hardly at all. Can anyone offer an informed opinion on whether gist can handle this now, and if not what it would take to handle it? (hash and rtree are not at issue since they don't support multi-key indexes.) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Best way to scan on-disk bitmaps
On Sun, 15 May 2005, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Bruce Momjian pgman@candle.pha.pa.us writes: Hmm. That particular case will work, but the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. We do have a TODO for this: * Use index to restrict rows returned by multi-key index when used with non-consecutive keys to reduce heap accesses That TODO is something else. No, I think Bruce is right --- it's essentially the same thing. It certainly would be a good idea to change btrees to work like that, if we are going to modify the planner to relax the restriction for other index types. I think it would be easy to change the planner and btree to handle this (where easy means I remember where all the skeletons are buried). But I don't know the gist code hardly at all. Can anyone offer an informed opinion on whether gist can handle this now, and if not what it would take to handle it? I think that handling this in GiST is depends solely on how users consistent function designed to handle NULLs in keys. Other words, it should works as soon as users consistent function know what to do with NULLs in internal keys. Teodor will comment multi-key GiST tomorrow. We used Paul Aoki paper Generalizing ''Search'' in Generalized Search Trees, (available from http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf ) for implementation of multi-key GiST index support. It's true, that first key is used for splitting, but elements with duplicated first key could be shuffled to get better clustering on second key. (hash and rtree are not at issue since they don't support multi-key indexes.) regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Best way to scan on-disk bitmaps
Tom Lane wrote: Victor Y. Yegorov [EMAIL PROTECTED] writes: If I have on-disk bitmap ON (a, b, c) will the planner pick an index scan for WHERE a = 42 AND b = 'foo' (i.e. only part of the index attributes are involved)? Any modifications needed to achieve this functionality? Hmm. That particular case will work, but the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. This is true for btree and gist indexes, so I suppose we'd need to introduce a pg_am column that tells what to do. We do have a TODO for this: * Use index to restrict rows returned by multi-key index when used with non-consecutive keys to reduce heap accesses For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and col3 = 9, spin though the index checking for col1 and col3 matches, rather than just col1; also called skip-scanning. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Best way to scan on-disk bitmaps
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Plan B would be to remove that restriction and teach btree and gist to cope. While a btree couldn't use a nonconsecutive restriction as part of its where-to-scan logic, I don't see any good reason why it couldn't still perform the test before returning the TID, thus possibly saving a trip to the heap. [ snip ] In this model the columns listed in the gist index are unordered. Any subset of columns can be used to perform an index lookup. Making it more like the bitmap index behaviour you're looking at than the btree index behaviour. I thought some more about this since sending my earlier message. As far as I can recall at the moment, there really isn't anything fundamental that depends on the consecutive-columns rule. The one place where the rubber meets the road is in the index cost estimation functions: if we were to relax that rule, then btcostestimate would have to be taught to include only the consecutive columns when estimating how much of a btree index is going to be touched. And more than that: if you've studied the btree code at all, you realize that that's only an incomplete heuristic anyway. For instance, if the leading key is a xxx, second keys like b yyy and b yyy act completely differently in terms of indexscan cost, but btcostestimate doesn't presently know that. I wonder if we shouldn't migrate the amcostestimate functions into the individual index AMs (which would mean adding a column to pg_am, but so what). btcostestimate could be much less phony about this if it had access to the same infrastructure that _bt_first uses to examine the index clauses. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] Best way to scan on-disk bitmaps
Greetings. I have questions on how to implement on-disk bitmap scan. I've been working on on-disk bitmaps as an ordinary Index Access Method, but now it's clear to me, that it'll lose all it's strengths this way. One on-disk bitmap has exactly one list of indexed table's TIDs and potentially unlimited number of bitmaps (number of index attributes multiplied by attribute's cardinality, to be precise). So, for better performance, one should first retrieve all the needed bitmaps from the index, then join all bitmaps according to the query clauses and, as the last phase, retrieve TIDs from index, that matches final bitmap. According to the docs the index access method is responsible for regurgitating the TIDs ..., but for on-disk bitmaps index scan is devided into 3 phases. So, to perform the scan in phases, to my mind, executor should be involved. (I'd like to mention again, that this is the first time I got so deep inside postgres code). I wanted to use Tom's nodeBitmap* stuff, but it's problematic a bit. Bitmaps in nodeBitmap* are built upon list of TIDs retrieved during relation scan. For on-disk bitmap indexes, there's no need for that, as all bitmaps are already inside the index. The question is: Is it possible to extend nodeBitmap functionality in such a way, that Executor can either build bitmap after list of TIDs, obtained from RelationScan, or ask index access method to give bitmaps it contain (and TIDs at given position in the map later)? This will, probably, require more functions in the pg_am catalog. Or should I create a completely new node for on-disk bitmaps? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Best way to scan on-disk bitmaps
Victor Y. Yegorov [EMAIL PROTECTED] writes: I have questions on how to implement on-disk bitmap scan. I think your best plan might be 1. Be sure that all the indexable WHERE conditions are passed to the indexscan as indexquals. This might be, say, WHERE a = 42 and b = 'foo' 2. Within the index AM, form the AND of the relevant bitmaps (here the ones for a = 42 and b = 'foo'). 3. Within the index AM, pick up the TIDs for the remaining one-bits, and pass them back. 4. Let the existing machinery handle the OR-ing problem as well as actual fetching of the heap rows. This can be done without any restructuring of the index AM API. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Best way to scan on-disk bitmaps
* Tom Lane [EMAIL PROTECTED] [12.05.2005 23:09]: 1. Be sure that all the indexable WHERE conditions are passed to the indexscan as indexquals. This might be, say, WHERE a = 42 and b = 'foo' If I have on-disk bitmap ON (a, b, c) will the planner pick an index scan for WHERE a = 42 AND b = 'foo' (i.e. only part of the index attributes are involved)? Any modifications needed to achieve this functionality? To my mind, bitmap scan even for 1 attribute of a multi-column index would be a win, though I haven't tested this yet. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Best way to scan on-disk bitmaps
Victor Y. Yegorov [EMAIL PROTECTED] writes: If I have on-disk bitmap ON (a, b, c) will the planner pick an index scan for WHERE a = 42 AND b = 'foo' (i.e. only part of the index attributes are involved)? Any modifications needed to achieve this functionality? Hmm. That particular case will work, but the planner believes that only consecutive columns in the index are usable --- that is, if you have quals for a and c but not for b, it will think that the condition for c isn't usable with the index. This is true for btree and gist indexes, so I suppose we'd need to introduce a pg_am column that tells what to do. [ thinks some more ... ] Plan B would be to remove that restriction and teach btree and gist to cope. While a btree couldn't use a nonconsecutive restriction as part of its where-to-scan logic, I don't see any good reason why it couldn't still perform the test before returning the TID, thus possibly saving a trip to the heap. Offhand it seems this should be true of gist as well, but I don't know that code well enough to be sure. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Best way to scan on-disk bitmaps
Tom Lane [EMAIL PROTECTED] writes: Plan B would be to remove that restriction and teach btree and gist to cope. While a btree couldn't use a nonconsecutive restriction as part of its where-to-scan logic, I don't see any good reason why it couldn't still perform the test before returning the TID, thus possibly saving a trip to the heap. Offhand it seems this should be true of gist as well, but I don't know that code well enough to be sure. Not long ago there was some discussion about how gist indexes don't really handle multicolumn indexes usefully currently. They use only the first column to determine page splits. The discussion wandered and it became clear that it wasn't even clear what a multicolumn gist index should mean. I suggested treating each column as independent axes. Independently ask each column's datatype for the distance value and then calculate the inner product as a kind of geometric n-dimensional distance. There was some resistance since this limits gist indexes to always basing page splits on a single distance based algorithm. (Though all current gist index methods in the postgres source tree do work this way, mostly with copy-pasted code in fact.) In this model the columns listed in the gist index are unordered. Any subset of columns can be used to perform an index lookup. Making it more like the bitmap index behaviour you're looking at than the btree index behaviour. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org