Re: [HACKERS] Indexam interface proposal
Added to TODO: * Allow index scans to return matching index keys http://archives.postgresql.org/pgsql-hackers/2007-03/msg01079.php --- Heikki Linnakangas wrote: > Currently amgettuple returns one matching tuple at a time, in index > order. I'm proposing two changes to add support for > - candidate matches > - partial ordering > > Those two features are essential to make clustered indexes work, and in > the future, binned bitmap indexes that don't have a bitmap for each > distinct value but for ranges of values. > > There's a third feature looming in the future, that I haven't addressed: > - returning index values, for index-only scans or at least for filtering > rows before fetching heap tuples. > > > I'm proposing that we keep the one tuple per call nature of the > interface, but add a flag to mark candidate matches. index_getnext or > the executor would need to recheck the original quals for tuples marked > as candidates. > > Another flag would be used to mean "this tuple marks the boundary of a > partial ordering". Let's call it boundary_mark for now. > > For example, if an index scan returned tuples with the following keys, > with tuples on same line meaning the index doesn't know their relative > ordering. > > 1 > 3 4 2 > 5 8 6 7 > 9 > 10 > > amgettuple would return the above tuples like this: > > 1 3 4 2 5 8 6 7 9 10 > * * * * * > > where the tuples marked with * would have the boundary_mark-flag set. If > the plan requires ordered results, index_getnext would have to sort > tuples between two markers before returning them to the caller. > > amgetmulti would also need to have the candidate-flag added as I > proposed in the "Bitmapindexscan changes" patch I sent earlier to > pgsql-patches. > > This interface change would solve much of the ugliness of my GIT patch, > by generalizing the index quals checking and sorting code to index_getnext. > > Another source of ugliness in the patch is in inserting new tuples. > Inserts need to reach to the heap to fetch heap tuples, to compare keys > when splitting a group. I don't see a clean fix for that, but I don't > think it's as bad as the index scan code. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org -- Bruce Momjian <[EMAIL PROTECTED]>http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indexam interface proposal
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: Martijn van Oosterhout wrote: IIRC indexes can already ask to have the system recheck conditions on returned tuples. For example GiST can return more tuples than actually match. That's what the amopreqcheck column is for in pg_amop. Right, except that flag is per operator in operator class, and what I'm proposing is that the index could pass a flag per tuple in the scan. The reason for attaching the flag to operators is so that the system (particularly the planner) can tell *which* conditions need to be rechecked, and can prepare the necessary expression infrastructure. I dislike the idea of having to be prepared to do that every time for every indexscan. I don't see any big down-side in preparing for that. We'd need to always store the original index quals in the executor node, like we do now with recheck-flagged operators, but that doesn't seem too bad to me. I suppose we would want to keep the existing per-operator recheck-flag and quals as it is, and add another field like indexqualorig to be used to recheck tuples amgetnext flags as candidates. The notion of having to be prepared to sort (according to what?) is even worse. That we wouldn't need for clustered indexes, if we change the current design a bit. Either: * store a sorted list of offsetnumbers for each group, instead of a bitmap, * or store a bitmap like now, but require that heap tuples in a grouped index tuple are in cluster order within the heap page. The first option eats away some of the space savings, the second option makes clustered indexes to become declustered quicker if there's out-of-order updates or inserts. Choosing either option would also reduce the CPU overhead of index scans, because we could use binary search within a grouped index tuple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Indexam interface proposal
Given that rechecking requires Expr and state structures, maybe it would be easier to make the operators RECHECK so the planner does the right thing now, but make a flag that tells the index scan *not* to recheck this tuple. That would seem slightly less work and fit better with the existing code. (In other words, it's an optimisation rather than a big change). I like your suggestion - it's useful for GiST/GIN fulltext indexing. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Indexam interface proposal
Interesting. So we'd add a flag to the index tuples in GiST indicating if the tuple is lossily compressed or not. The compress-function would set that flag when it performs lossy compression, and gistgettuple would return it to the caller. Keys in GiST index may be another type than column on which index was created, so gistgettuple can not return tuple in original form - but sometimes gistgettuple may be sure that recheck isn't needed. That would completely replace the current RECHECK-option we have, right? Yeah, this is possible. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Indexam interface proposal
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Martijn van Oosterhout wrote: >> IIRC indexes can already ask to have the system recheck conditions on >> returned tuples. For example GiST can return more tuples than actually >> match. That's what the amopreqcheck column is for in pg_amop. > Right, except that flag is per operator in operator class, and what I'm > proposing is that the index could pass a flag per tuple in the scan. The reason for attaching the flag to operators is so that the system (particularly the planner) can tell *which* conditions need to be rechecked, and can prepare the necessary expression infrastructure. I dislike the idea of having to be prepared to do that every time for every indexscan. The notion of having to be prepared to sort (according to what?) is even worse. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Indexam interface proposal
On Mon, Mar 19, 2007 at 04:40:52PM +0300, Teodor Sigaev wrote: > >Right, except that flag is per operator in operator class, and what I'm > >proposing is that the index could pass a flag per tuple in the scan. > > That might make sense even for GiST. Sometimes complex compressions is used > in GiST opclasses. If indexing value is rather small then it's stored in > index as is, but large value is compressed with lossy techniques. So, GiST > might return a tuple which is allowed to not recheck. Given that rechecking requires Expr and state structures, maybe it would be easier to make the operators RECHECK so the planner does the right thing now, but make a flag that tells the index scan *not* to recheck this tuple. That would seem slightly less work and fit better with the existing code. (In other words, it's an optimisation rather than a big change). Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
Re: [HACKERS] Indexam interface proposal
Teodor Sigaev wrote: Right, except that flag is per operator in operator class, and what I'm proposing is that the index could pass a flag per tuple in the scan. That might make sense even for GiST. Sometimes complex compressions is used in GiST opclasses. If indexing value is rather small then it's stored in index as is, but large value is compressed with lossy techniques. So, GiST might return a tuple which is allowed to not recheck. Interesting. So we'd add a flag to the index tuples in GiST indicating if the tuple is lossily compressed or not. The compress-function would set that flag when it performs lossy compression, and gistgettuple would return it to the caller. That would completely replace the current RECHECK-option we have, right? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Indexam interface proposal
Right, except that flag is per operator in operator class, and what I'm proposing is that the index could pass a flag per tuple in the scan. That might make sense even for GiST. Sometimes complex compressions is used in GiST opclasses. If indexing value is rather small then it's stored in index as is, but large value is compressed with lossy techniques. So, GiST might return a tuple which is allowed to not recheck. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Indexam interface proposal
Martijn van Oosterhout wrote: On Mon, Mar 19, 2007 at 12:23:01PM +, Heikki Linnakangas wrote: Currently amgettuple returns one matching tuple at a time, in index order. I'm proposing two changes to add support for - candidate matches IIRC indexes can already ask to have the system recheck conditions on returned tuples. For example GiST can return more tuples than actually match. That's what the amopreqcheck column is for in pg_amop. Right, except that flag is per operator in operator class, and what I'm proposing is that the index could pass a flag per tuple in the scan. Some tuples in the scan might need rechecking, some might not. The need for rechecking in clustered indexes is orthogonal to the need arising from the lossyness of GiST operators. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Indexam interface proposal
On Mon, Mar 19, 2007 at 12:23:01PM +, Heikki Linnakangas wrote: > Currently amgettuple returns one matching tuple at a time, in index > order. I'm proposing two changes to add support for > - candidate matches IIRC indexes can already ask to have the system recheck conditions on returned tuples. For example GiST can return more tuples than actually match. That's what the amopreqcheck column is for in pg_amop. Or am I misunderstanding? Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate. signature.asc Description: Digital signature
[HACKERS] Indexam interface proposal
Currently amgettuple returns one matching tuple at a time, in index order. I'm proposing two changes to add support for - candidate matches - partial ordering Those two features are essential to make clustered indexes work, and in the future, binned bitmap indexes that don't have a bitmap for each distinct value but for ranges of values. There's a third feature looming in the future, that I haven't addressed: - returning index values, for index-only scans or at least for filtering rows before fetching heap tuples. I'm proposing that we keep the one tuple per call nature of the interface, but add a flag to mark candidate matches. index_getnext or the executor would need to recheck the original quals for tuples marked as candidates. Another flag would be used to mean "this tuple marks the boundary of a partial ordering". Let's call it boundary_mark for now. For example, if an index scan returned tuples with the following keys, with tuples on same line meaning the index doesn't know their relative ordering. 1 3 4 2 5 8 6 7 9 10 amgettuple would return the above tuples like this: 1 3 4 2 5 8 6 7 9 10 * * * * * where the tuples marked with * would have the boundary_mark-flag set. If the plan requires ordered results, index_getnext would have to sort tuples between two markers before returning them to the caller. amgetmulti would also need to have the candidate-flag added as I proposed in the "Bitmapindexscan changes" patch I sent earlier to pgsql-patches. This interface change would solve much of the ugliness of my GIT patch, by generalizing the index quals checking and sorting code to index_getnext. Another source of ugliness in the patch is in inserting new tuples. Inserts need to reach to the heap to fetch heap tuples, to compare keys when splitting a group. I don't see a clean fix for that, but I don't think it's as bad as the index scan code. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org