Re: [PATCH] kNN for btree
Hi, Andrey! On 31.03.2024 12:22, Andrey M. Borodin wrote: On 15 Jan 2024, at 13:11, Anton A. Melnikov wrote: If there are any ideas pro and contra would be glad to discuss them. Hi, Anton! This is kind of ancient thread. I've marked CF entry [0] as "Needs review" and you as an author (seems like you are going to be a point of correspondence on this feature). That's right, i would like to bring the work on this feature to a positive result. First of all, let me share a simple test that allows one to estimate the effect of applying this patch and, i hope, can be considered as a criterion for future versions. For all the tests below, one should set the following settings: set enable_seqscan to false; set enable_indexscan to true; set enable_bitmapscan to false; set enable_indexonlyscan to true; set max_parallel_workers_per_gather = 0; To carry out the test, one can use the table "events" mentioned in the first message of this thread, linked here [1]. psql -f events.dump Then perform a query like that: explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 10; When using the existing btree_gist extension and preliminary commands executing: create extension btree_gist; CREATE INDEX event_date_idx ON events USING gist (date); the test query gives: postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 10; QUERY PLAN -- Limit (actual time=0.759..102.992 rows=10 loops=1) -> Index Only Scan using event_date_idx on events (actual time=0.757..97.021 rows=10 loops=1) Order By: (date <-> '1957-10-04'::date) Heap Fetches: 0 Planning Time: 0.504 ms Execution Time: 108.311 ms Average value on my PC was 107+-1 ms. When using an existing patch from [1] and creating a btree index: CREATE INDEX event_date_idx ON events USING btree (date); instead of btree_gist one, the test query gives: postgres=# explain (costs off, analyze on) SELECT date FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT 10; QUERY PLAN -- Limit (actual time=0.120..48.817 rows=10 loops=1) -> Index Only Scan using event_date_idx on events (actual time=0.117..42.610 rows=10 loops=1) Order By: (date <-> '1957-10-04'::date) Heap Fetches: 0 Planning Time: 0.487 ms Execution Time: 54.463 ms 55+-1 ms on average. The execution time is reduced by ~2 times. So the effect is obvious but the implementation problems are reasonable too. On 15.01.2024 11:11, Anton A. Melnikov wrote: On 16.03.2020 16:17, Alexander Korotkov wrote: After another try to polish this patch I figured out that the way it's implemented is unnatural. I see the two reasonable ways to implement knn for B-tree, but current implementation matches none of them. 1) Implement knn as two simultaneous scans over B-tree: forward and backward. It's similar to what current patchset does. But putting this logic to B-tree seems artificial. What B-tree does here is still unidirectional scan. On top of that we merge results of two unidirectional scans. The appropriate place to do this high-level work is IndexScan node or even Optimizer/Executor (Merge node over to IndexScan nodes), but not B-tree itself. 2) Implement arbitrary scans in B-tree using priority queue like GiST and SP-GiST do. That would lead to much better support for KNN. We can end up in supporting interesting cases like "ORDER BY col1 DESC, col2 <> val1, col2 ASC" or something. But that's requires way more hacking in B-tree core. At first i'm going to implement p.1). I think it's preferable for now because it seems easier and faster to get a working version. I was wrong here. Firstly, this variant turned out to be not so easy and fast, and secondly, when i received the desired query plan, i was not happy with the results: In the case of btree_gist, splitting the query into two scans at the optimizer level and adding MergeAppend on the top of it resulted in a ~13% slowdown in query execution. The average time became ~121 ms. Limit (actual time=1.205..117.689 rows=10 loops=1) -> Merge Append (actual time=1.202..112.260 rows=10 loops=1) Sort Key: ((events.date <-> '1957-10-04'::date)) -> Index Only Scan using event_date_idx on events (actual time=0.713..43.372 rows=42585 loops=1) Index Cond: (date < '1957-10-04'::date) Order By: (date <-> '1957-10-04'::date) Heap Fetches: 0 -> Index Only Scan using event_date_idx on events events_1
Re: [PATCH] kNN for btree
> On 15 Jan 2024, at 13:11, Anton A. Melnikov wrote: > > If there are any ideas pro and contra would be glad to discuss them. Hi, Anton! This is kind of ancient thread. I've marked CF entry [0] as "Needs review" and you as an author (seems like you are going to be a point of correspondence on this feature). At this point it's obvious that the feature won't make it to 17, so let's move to July CF. Given 7 years of history in this thread, you might want to start a new one with a summarisation of this thread. This will attract more reviewers, either way they have to dig on their own. Thanks! Best regards, Andrey Borodin. [0] https://commitfest.postgresql.org/47/4871/
Re: [PATCH] kNN for btree
On Wed, Mar 4, 2020 at 2:39 PM Alexander Korotkov wrote: > On Wed, Mar 4, 2020 at 4:58 AM Peter Geoghegan wrote: > > On Mon, Mar 2, 2020 at 1:27 PM Alexander Korotkov > > wrote: > > > I've rebased the patchset to the current master and made some > > > refactoring. I hope it would be possible to bring it to committable > > > shape during this CF. This need more refactoring though. > > > > This patch doesn't change anything about the B-Tree on-disk format -- right? > > Yes, this is correct. No on-disk format changes, just new scanning strategy. After another try to polish this patch I figured out that the way it's implemented is unnatural. I see the two reasonable ways to implement knn for B-tree, but current implementation matches none of them. 1) Implement knn as two simultaneous scans over B-tree: forward and backward. It's similar to what current patchset does. But putting this logic to B-tree seems artificial. What B-tree does here is still unidirectional scan. On top of that we merge results of two unidirectional scans. The appropriate place to do this high-level work is IndexScan node or even Optimizer/Executor (Merge node over to IndexScan nodes), but not B-tree itself. 2) Implement arbitrary scans in B-tree using priority queue like GiST and SP-GiST do. That would lead to much better support for KNN. We can end up in supporting interesting cases like "ORDER BY col1 DESC, col2 <> val1, col2 ASC" or something. But that's requires way more hacking in B-tree core. So, I'm marking this patch RWF. We should try re-implement this for v14. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
On Wed, Mar 4, 2020 at 4:58 AM Peter Geoghegan wrote: > On Mon, Mar 2, 2020 at 1:27 PM Alexander Korotkov > wrote: > > I've rebased the patchset to the current master and made some > > refactoring. I hope it would be possible to bring it to committable > > shape during this CF. This need more refactoring though. > > This patch doesn't change anything about the B-Tree on-disk format -- right? Yes, this is correct. No on-disk format changes, just new scanning strategy. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
On Mon, Mar 2, 2020 at 1:27 PM Alexander Korotkov wrote: > I've rebased the patchset to the current master and made some > refactoring. I hope it would be possible to bring it to committable > shape during this CF. This need more refactoring though. This patch doesn't change anything about the B-Tree on-disk format -- right? -- Peter Geoghegan
Re: [PATCH] kNN for btree
On Tue, Dec 3, 2019 at 3:00 AM Alexander Korotkov wrote: > On Mon, Sep 9, 2019 at 11:28 PM Alexander Korotkov > wrote: > > On Sun, Sep 8, 2019 at 11:35 PM Alexander Korotkov > > wrote: > > > I'm going to push 0001 changing "attno >= 1" to assert. > > > > Pushed. Rebased patchset is attached. I propose to limit > > consideration during this commitfest to this set of 7 remaining > > patches. The rest of patches could be considered later. I made some > > minor editing in preparation to commit. But I decided I've couple > > more notes to Nikita. > > > > * 0003 extracts part of fields from BTScanOpaqueData struct into new > > BTScanStateData struct. However, there is a big comment regarding > > BTScanOpaqueData just before definition of BTScanPosItem. It needs to > > be revised. > > * 0004 adds "knnState" field to BTScanOpaqueData in addition to > > "state" field. Wherein "knnState" might unused during knn scan if it > > could be done in one direction. This seems counter-intuitive. Could > > we rework this to have "forwardState" and "backwardState" fields > > instead of "state" and "knnState"? > > I have reordered patchset into fewer more self-consistent patches. > > Besides comments, code beautification and other improvements, now > there are dedicated fields for forward and backward scan states. The > forward scan state is the pointer to data structure, which is used in > ordinal unidirectional scan. So, it's mostly cosmetic change, but it > improves the code readability. > > One thing bothers me. We have to move scalar distance operators from > btree_gist to core. btree_gist extension upgrade script have to > qualify operators with schema in order to distinguish core and > extension implementations. So, I have to use @extschema@. But it's > forbidden for relocatable extensions. For now, I marken btree_gist as > non-relocatable. Our comment in extension.c says "For a relocatable > extension, we needn't do this. There cannot be any need for > @extschema@, else it wouldn't be relocatable.". Is it really true? I > think now we have pretty valid example for relocatable extension, > which needs @extschema@ in upgrade script. Any thoughts? I've rebased the patchset to the current master and made some refactoring. I hope it would be possible to bring it to committable shape during this CF. This need more refactoring though. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Introduce-IndexAmRoutine.ammorderbyopfirstcol-v17.patch.gz Description: GNU Zip compressed data 0002-Allow-ordering-by-operator-in-ordered-indexes-v17.patch.gz Description: GNU Zip compressed data 0004-Move-scalar-distance-operators-from-btree_gist-t-v17.patch.gz Description: GNU Zip compressed data 0003-Extract-BTScanState-from-BTScanOpaqueData-v17.patch.gz Description: GNU Zip compressed data 0005-Add-knn-support-to-btree-indexes-v17.patch.gz Description: GNU Zip compressed data
Re: [PATCH] kNN for btree
On Mon, Sep 9, 2019 at 11:28 PM Alexander Korotkov wrote: > On Sun, Sep 8, 2019 at 11:35 PM Alexander Korotkov > wrote: > > I'm going to push 0001 changing "attno >= 1" to assert. > > Pushed. Rebased patchset is attached. I propose to limit > consideration during this commitfest to this set of 7 remaining > patches. The rest of patches could be considered later. I made some > minor editing in preparation to commit. But I decided I've couple > more notes to Nikita. > > * 0003 extracts part of fields from BTScanOpaqueData struct into new > BTScanStateData struct. However, there is a big comment regarding > BTScanOpaqueData just before definition of BTScanPosItem. It needs to > be revised. > * 0004 adds "knnState" field to BTScanOpaqueData in addition to > "state" field. Wherein "knnState" might unused during knn scan if it > could be done in one direction. This seems counter-intuitive. Could > we rework this to have "forwardState" and "backwardState" fields > instead of "state" and "knnState"? I have reordered patchset into fewer more self-consistent patches. Besides comments, code beautification and other improvements, now there are dedicated fields for forward and backward scan states. The forward scan state is the pointer to data structure, which is used in ordinal unidirectional scan. So, it's mostly cosmetic change, but it improves the code readability. One thing bothers me. We have to move scalar distance operators from btree_gist to core. btree_gist extension upgrade script have to qualify operators with schema in order to distinguish core and extension implementations. So, I have to use @extschema@. But it's forbidden for relocatable extensions. For now, I marken btree_gist as non-relocatable. Our comment in extension.c says "For a relocatable extension, we needn't do this. There cannot be any need for @extschema@, else it wouldn't be relocatable.". Is it really true? I think now we have pretty valid example for relocatable extension, which needs @extschema@ in upgrade script. Any thoughts? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0002-Allow-ordering-by-operator-in-ordered-indexes-v16.patch.gz Description: GNU Zip compressed data 0003-Extract-BTScanState-from-BTScanOpaqueData-v16.patch.gz Description: GNU Zip compressed data 0004-Move-scalar-distance-operators-from-btree_gist-t-v16.patch.gz Description: GNU Zip compressed data 0005-Add-knn-support-to-btree-indexes-v16.patch.gz Description: GNU Zip compressed data 0001-Introduce-IndexAmRoutine.ammorderbyopfirstcol-v16.patch.gz Description: GNU Zip compressed data
Re: [PATCH] kNN for btree
On Sun, Sep 8, 2019 at 11:35 PM Alexander Korotkov wrote: > I'm going to push 0001 changing "attno >= 1" to assert. Pushed. Rebased patchset is attached. I propose to limit consideration during this commitfest to this set of 7 remaining patches. The rest of patches could be considered later. I made some minor editing in preparation to commit. But I decided I've couple more notes to Nikita. * 0003 extracts part of fields from BTScanOpaqueData struct into new BTScanStateData struct. However, there is a big comment regarding BTScanOpaqueData just before definition of BTScanPosItem. It needs to be revised. * 0004 adds "knnState" field to BTScanOpaqueData in addition to "state" field. Wherein "knnState" might unused during knn scan if it could be done in one direction. This seems counter-intuitive. Could we rework this to have "forwardState" and "backwardState" fields instead of "state" and "knnState"? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0002-Enable-ORDER-BY-operator-scans-on-ordered-indexe-v15.patch.gz Description: GNU Zip compressed data 0001-Introduce-ammorderbyopfirstcol-v15.patch.gz Description: GNU Zip compressed data 0003-Extract-structure-BTScanState-v15.patch.gz Description: GNU Zip compressed data 0004-Add-kNN-support-to-btree-v15.patch.gz Description: GNU Zip compressed data 0005-Add-btree-distance-operators-v15.patch.gz Description: GNU Zip compressed data 0006-Remove-distance-operators-from-btree_gist-v15.patch.gz Description: GNU Zip compressed data 0007-Add-regression-tests-for-kNN-btree-v15.patch.gz Description: GNU Zip compressed data
Re: [PATCH] kNN for btree
On Tue, Sep 3, 2019 at 2:19 AM Alvaro Herrera wrote: > > On 2019-Sep-03, Alexander Korotkov wrote: > > > I think patches 0001-0008 are very clear and extends our index-AM > > infrastructure in query straightforward way. I'm going to propose > > them for commit after some further polishing. > > Hmm. Why is 0001 needed? I see that 0005 introduces a call to that > function, but if attnum == 0 then it doesn't call it. Maybe it was > necessary in an older version of the patch? Regarding "attno >= 1" check I agree with you. It should be changed to assert. But "attno <= rd_index->indnkeyatts" check appears to be needed for current code already. It appears that gistproperty() can ask get_index_column_opclass() for non-key attribute. Then get_index_column_opclass() returns garbage past oidvector value. Typically get_opclass_opfamily_and_input_type() doesn't find this garbage opclass oid and gistproperty() returns null as expected. But this is bug and needs to be fixed. I'm going to push 0001 changing "attno >= 1" to assert. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
On 2019-Sep-03, Alexander Korotkov wrote: > I think patches 0001-0008 are very clear and extends our index-AM > infrastructure in query straightforward way. I'm going to propose > them for commit after some further polishing. Hmm. Why is 0001 needed? I see that 0005 introduces a call to that function, but if attnum == 0 then it doesn't call it. Maybe it was necessary in an older version of the patch? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] kNN for btree
On Mon, Sep 2, 2019 at 9:11 PM Alvaro Herrera wrote: > > On 2019-Jul-12, Nikita Glukhov wrote: > > > Attached 13th version of the patches. > > Hello Nikita, > > Can you please rebase this again? Nikita is on vacation now. Rebased patchset is attached. I think patches 0001-0008 are very clear and extends our index-AM infrastructure in query straightforward way. I'm going to propose them for commit after some further polishing. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0004-Extract-structure-BTScanState-v14.patch.gz Description: GNU Zip compressed data 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexe-v14.patch.gz Description: GNU Zip compressed data 0005-Add-kNN-support-to-btree-v14.patch.gz Description: GNU Zip compressed data 0002-Introduce-ammorderbyopfirstcol-v14.patch.gz Description: GNU Zip compressed data 0001-Fix-get_index_column_opclass-v14.patch.gz Description: GNU Zip compressed data 0006-Add-btree-distance-operators-v14.patch.gz Description: GNU Zip compressed data 0007-Remove-distance-operators-from-btree_gist-v14.patch.gz Description: GNU Zip compressed data 0009-Introduce-ammatchorderby-function-v14.patch.gz Description: GNU Zip compressed data 0008-Add-regression-tests-for-kNN-btree-v14.patch.gz Description: GNU Zip compressed data 0010-Add-btree-support-of-kNN-on-non-first-column-v14.patch.gz Description: GNU Zip compressed data 0011-Allow-ammatchorderby-to-return-pathkey-sublists-v14.patch.gz Description: GNU Zip compressed data 0012-Add-support-of-array-ops-to-btree-kNN-v14.patch.gz Description: GNU Zip compressed data
Re: [PATCH] kNN for btree
On 2019-Jul-12, Nikita Glukhov wrote: > Attached 13th version of the patches. Hello Nikita, Can you please rebase this again? Thanks, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] kNN for btree
On Sat, Jul 13, 2019 at 5:32 AM Nikita Glukhov wrote: > Attached 13th version of the patches. While moving this to the next CF, I noticed that this needs to be adjusted for the new pg_list.h API. -- Thomas Munro https://enterprisedb.com
Re: [PATCH] kNN for btree
Attached 13th version of the patches. On 08.07.2019 21:09, Alexander Korotkov wrote: I have more thoughts about planner/executor infrastructure. It appears that supporting "ORDER BY col1, col2 <-> val" is too complex for initial version of patch. Handling of "ORDER BY col" and "ORDER BY col <-> val" cases uses different code paths in optimizer. So, I'd like to limit first version of this patch to support just most simple "ORDER BY col <-> val" case. We would probably be able to enhance it even in v13 release cycle, but let's start with just simple case. I also think we can evade replacing amcanorderbyop flag with method, but introduce just new boolean flag indicating knn supports only first column. Now patches 1-8 implement only "ORDER BY col <-> const" case. ammatchorderby() is replaced with amorderbyopfirstcol flag. Patches 9-12 contain ammatchorderby() and other features, so they may not be reviewed right now. On 08.07.2019 11:07, Thomas Munro wrote: make check-world fails for me, and in tmp_install/log/install.log I see: btree_int2.c:97:9: error: implicit declaration of function 'int2dist' is invalid in C99 [-Werror,-Wimplicit-function-declaration] return int2dist(fcinfo); ^ btree_int2.c:97:9: note: did you mean 'int2_dist'? btree_int2.c:95:1: note: 'int2_dist' declared here int2_dist(PG_FUNCTION_ARGS) ^ 1 error generated. Fixed. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-get_index_column_opclass-v13.patch.gz Description: application/gzip 0002-Introduce-ammorderbyopfirstcol-v13.patch.gz Description: application/gzip 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v13.patch.gz Description: application/gzip 0004-Extract-structure-BTScanState-v13.patch.gz Description: application/gzip 0005-Add-kNN-support-to-btree-v13.patch.gz Description: application/gzip 0006-Add-btree-distance-operators-v13.patch.gz Description: application/gzip 0007-Remove-distance-operators-from-btree_gist-v13.patch.gz Description: application/gzip 0008-Add-regression-tests-for-kNN-btree-v13.patch.gz Description: application/gzip 0009-Introduce-ammatchorderby-function-v13.patch.gz Description: application/gzip 0010-Add-btree-support-of-kNN-on-non-first-column-v13.patch.gz Description: application/gzip 0011-Allow-ammatchorderby-to-return-pathkey-sublists-v13.patch.gz Description: application/gzip 0012-Add-support-of-array-ops-to-btree-kNN-v13.patch.gz Description: application/gzip
Re: [PATCH] kNN for btree
On Mon, Jul 1, 2019 at 8:47 PM Nikita Glukhov wrote: > On 01.07.2019 13:41, Thomas Munro wrote: > > On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov > > wrote: > Fixed two bugs in patches 3 and 10 (see below). > Patch 3 was extracted from the main patch 5 (patch 4 in v9). > >>> This patch no longer applies so marking Waiting on Author. > >> Attached 11th version of the patches rebased onto current master. > > Hi Nikita, > > > > Since a new Commitfest is here and this doesn't apply, could we please > > have a fresh rebase? > > Attached 12th version of the patches rebased onto the current master. I have more thoughts about planner/executor infrastructure. It appears that supporting "ORDER BY col1, col2 <-> val" is too complex for initial version of patch. Handling of "ORDER BY col" and "ORDER BY col <-> val" cases uses different code paths in optimizer. So, I'd like to limit first version of this patch to support just most simple "ORDER BY col <-> val" case. We would probably be able to enhance it even in v13 release cycle, but let's start with just simple case. I also think we can evade replacing amcanorderbyop flag with method, but introduce just new boolean flag indicating knn supports only first column. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
On Tue, Jul 2, 2019 at 5:47 AM Nikita Glukhov wrote: > Attached 12th version of the patches rebased onto the current master. Hi Nikita, make check-world fails for me, and in tmp_install/log/install.log I see: btree_int2.c:97:9: error: implicit declaration of function 'int2dist' is invalid in C99 [-Werror,-Wimplicit-function-declaration] return int2dist(fcinfo); ^ btree_int2.c:97:9: note: did you mean 'int2_dist'? btree_int2.c:95:1: note: 'int2_dist' declared here int2_dist(PG_FUNCTION_ARGS) ^ 1 error generated. -- Thomas Munro https://enterprisedb.com
Re: [PATCH] kNN for btree
On 01.07.2019 13:41, Thomas Munro wrote: On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov wrote: Fixed two bugs in patches 3 and 10 (see below). Patch 3 was extracted from the main patch 5 (patch 4 in v9). This patch no longer applies so marking Waiting on Author. Attached 11th version of the patches rebased onto current master. Hi Nikita, Since a new Commitfest is here and this doesn't apply, could we please have a fresh rebase? Attached 12th version of the patches rebased onto the current master. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-get_index_column_opclass-v12.patch.gz Description: application/gzip 0002-Introduce-ammatchorderby-function-v12.patch.gz Description: application/gzip 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v12.patch.gz Description: application/gzip 0004-Extract-structure-BTScanState-v12.patch.gz Description: application/gzip 0005-Add-kNN-support-to-btree-v12.patch.gz Description: application/gzip 0006-Add-btree-distance-operators-v12.patch.gz Description: application/gzip 0007-Remove-distance-operators-from-btree_gist-v12.patch.gz Description: application/gzip 0008-Add-regression-tests-for-kNN-btree-v12.patch.gz Description: application/gzip 0009-Allow-ammatchorderby-to-return-pathkey-sublists-v12.patch.gz Description: application/gzip 0010-Add-support-of-array-ops-to-btree-kNN-v12.patch.gz Description: application/gzip
Re: [PATCH] kNN for btree
On Tue, Mar 26, 2019 at 4:30 AM Nikita Glukhov wrote: > >> Fixed two bugs in patches 3 and 10 (see below). > >> Patch 3 was extracted from the main patch 5 (patch 4 in v9). > > > > This patch no longer applies so marking Waiting on Author. > > > Attached 11th version of the patches rebased onto current master. Hi Nikita, Since a new Commitfest is here and this doesn't apply, could we please have a fresh rebase? Thanks, -- Thomas Munro https://enterprisedb.com
Re: [PATCH] kNN for btree
On 25.03.2019 11:17, David Steele wrote: On 3/15/19 2:11 AM, Nikita Glukhov wrote: Attached 10th versions of the patches. Fixed two bugs in patches 3 and 10 (see below). Patch 3 was extracted from the main patch 5 (patch 4 in v9). This patch no longer applies so marking Waiting on Author. Attached 11th version of the patches rebased onto current master. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-get_index_column_opclass-v11.patch.gz Description: application/gzip 0002-Introduce-ammatchorderby-function-v11.patch.gz Description: application/gzip 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v11.patch.gz Description: application/gzip 0004-Extract-structure-BTScanState-v11.patch.gz Description: application/gzip 0005-Add-kNN-support-to-btree-v11.patch.gz Description: application/gzip 0006-Add-btree-distance-operators-v11.patch.gz Description: application/gzip 0007-Remove-distance-operators-from-btree_gist-v11.patch.gz Description: application/gzip 0008-Add-regression-tests-for-kNN-btree-v11.patch.gz Description: application/gzip 0009-Allow-ammatchorderby-to-return-pathkey-sublists-v11.patch.gz Description: application/gzip 0010-Add-support-of-array-ops-to-btree-kNN-v11.patch.gz Description: application/gzip
Re: Re: [PATCH] kNN for btree
On 3/15/19 2:11 AM, Nikita Glukhov wrote: Attached 10th versions of the patches. Fixed two bugs in patches 3 and 10 (see below). Patch 3 was extracted from the main patch 5 (patch 4 in v9). This patch no longer applies so marking Waiting on Author. Regards, -- -David da...@pgmasters.net
Re: [PATCH] kNN for btree
Attached 10th versions of the patches. Fixed two bugs in patches 3 and 10 (see below). Patch 3 was extracted from the main patch 5 (patch 4 in v9). On 11.03.2019 20:42, Alexander Korotkov wrote: Hi! I've some questions regarding this patchset. 1) This comment needs to be revised. Now, B-tree supports both ammatchorderby and amcanbackward. How do we guarantee that kNN is not backwards scan? /* * Only forward scan is supported with reordering. Note: we can get away * with just Asserting here because the system will not try to run the * plan backwards if ExecSupportsBackwardScan() says it won't work. * Currently, that is guaranteed because no index AMs support both * ammatchorderby and amcanbackward; if any ever do, * ExecSupportsBackwardScan() will need to consider indexorderbys * explicitly. */ Yes, there was problem with backward kNN scans: they were not disabled in ExecSupportsBackwardScan(). This can lead to incorrect results for backward fetches from cursors. Corresponding regression test is included into patch #8. And the comment was also fixed. 2) Why this should be so? EXPLAIN (COSTS OFF) SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23) ORDER BY thousand DESC, tenthous <-> 3500; QUERY PLAN - Sort Sort Key: thousand DESC, ((tenthous <-> 3500)) -> Index Only Scan using tenk1_thous_tenthous on tenk1 Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[])) (4 rows) EXPLAIN (COSTS OFF) SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23) ORDER BY thousand, tenthous <-> 3500; QUERY PLAN --- Index Only Scan using tenk1_thous_tenthous on tenk1 Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[])) Order By: (thousand AND (tenthous <-> 3500)) (3 rows) It seems that we restart bidirectional scan for each value specified in IN clause. Then why does it matter whether it is forward or backward scan? kNN scans now can only be forward, and in forward btree scans array iteration order matches the index sort order. We could determine array iteration order by ScanKey strategy, but ASC/DESC info flag is not passed now to the place of ScanKeys initialization (see ExecIndexBuildScanKeys()). ASC/DESC passing needs refactoring of whole passing of orderbyclauses/orderbyclausecols. There also was a problem in btmmatchorderby()/match_patchkey_to_indexcol(): array keys were incorrectly matched to DESC index columns. 3) What is idea of 8th patch? It doesn't seem to be really needed for subsequent 9th patch, because we anyway ignore partial match pathkeys. Then why bother producing them? Is it stub for further incremental sort? Yes, this is a kind of stub for incremental sort. But also this simplifies a bit ammatchorderby() functions, because they should not care about the length of returned pathkey list, they simply return after the first unsupported pathkey. I event think that ammacthorderby() should not depend on whether we support incremental sorting or not. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-Fix-get_index_column_opclass-v10.patch.gz Description: application/gzip 0002-Introduce-ammatchorderby-function-v10.patch.gz Description: application/gzip 0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v10.patch.gz Description: application/gzip 0004-Extract-structure-BTScanState-v10.patch.gz Description: application/gzip 0005-Add-kNN-support-to-btree-v10.patch.gz Description: application/gzip 0006-Add-btree-distance-operators-v10.patch.gz Description: application/gzip 0007-Remove-distance-operators-from-btree_gist-v10.patch.gz Description: application/gzip 0008-Add-regression-tests-for-kNN-btree-v10.patch.gz Description: application/gzip 0009-Allow-ammatchorderby-to-return-pathkey-sublists-v10.patch.gz Description: application/gzip 0010-Add-support-of-array-ops-to-btree-kNN-v10.patch.gz Description: application/gzip
Re: [PATCH] kNN for btree
Hi! I've some questions regarding this patchset. 1) This comment needs to be revised. Now, B-tree supports both ammatchorderby and amcanbackward. How do we guarantee that kNN is not backwards scan? /* * Only forward scan is supported with reordering. Note: we can get away * with just Asserting here because the system will not try to run the * plan backwards if ExecSupportsBackwardScan() says it won't work. * Currently, that is guaranteed because no index AMs support both * ammatchorderby and amcanbackward; if any ever do, * ExecSupportsBackwardScan() will need to consider indexorderbys * explicitly. */ 2) Why this should be so? EXPLAIN (COSTS OFF) SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23) ORDER BY thousand DESC, tenthous <-> 3500; QUERY PLAN - Sort Sort Key: thousand DESC, ((tenthous <-> 3500)) -> Index Only Scan using tenk1_thous_tenthous on tenk1 Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[])) (4 rows) EXPLAIN (COSTS OFF) SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23) ORDER BY thousand, tenthous <-> 3500; QUERY PLAN --- Index Only Scan using tenk1_thous_tenthous on tenk1 Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[])) Order By: (thousand AND (tenthous <-> 3500)) (3 rows) It seems that we restart bidirectional scan for each value specified in IN clause. Then why does it matter whether it is forward or backward scan? 3) What is idea of 8th patch? It doesn't seem to be really needed for subsequent 9th patch, because we anyway ignore partial match pathkeys. Then why bother producing them? Is it stub for further incremental sort? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
Attached 9th version of the patches. On 03.03.2019 12:46, Anastasia Lubennikova wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hi, thank you for your work on this patch. Thank you for you review. Patch #1 is ready for commit. It only fixes lack of refactoring after INCLUDE index patch. Patches 2-7 are ready for committer's review. I found no significant problems in algorithm or implementation. Here are few suggestions to improve readability: 1) patch 0002. * Returned matched index clause exression. * Number of matched index column is returned in *indexcol_p. Typos in comment: exPression index columnS "exPression" is fixed. But there should be "column" because only single index column is matched. 2) patch 0002. + /* +* We allow any column of this index to match each pathkey; they +* don't have to match left-to-right as you might expect. +*/ Before refactoring this comment had a line about gist and sp-gist specific: -* We allow any column of the index to match each pathkey; they -* don't have to match left-to-right as you might expect. This is -* correct for GiST, and it doesn't matter for SP-GiST because -* that doesn't handle multiple columns anyway, and no other -* existing AMs support amcanorderbyop. We might need different -* logic in future for other implementations. Now, when the code was moved to a separate function, it is not clear why the same logic is ok for b-tree and other index methods. If I got it right, it doesn't affect the correctness of the algorithm, because b-tree specific checks are performed in another function. Still it would be better to explain it in the comment for future readers. It seems that match_orderbyop_pathkey() is simply the wrong place for this comment. I moved it into match_orderbyop_pathkeys() which is implementation of ammatchorderby() for GiST an SP-GiST. Also I added old sentence about its correctness for GiST and SP-GiST. 3) patch 0004 if (!so->distanceTypeByVal) { so->state.currDistance = PointerGetDatum(NULL); so->state.markDistance = PointerGetDatum(NULL); } Why do we reset these fields only for !distanceTypeByVal? These fields should be initialized (it is initialization, not reset) only for by-ref types because before writing a new distance values to these fields, the previous by-ref values are pfreed. The corresponding comment was added. 4) patch 0004 + * _bt_next_item() -- Advance to next tuple on current page; + * or if there's no more, try to step to the next page with data. + * + * If there are no more matching records in the given direction */ Looks like the last sentence of the comment is unfinished. Yes, "false is returned" is missing. Fixed. 5) patch 0004 _bt_start_knn_scan() so->currRightIsNearest = _bt_compare_current_dist(so, rstate, lstate); /* Reset right flag if the left item is nearer. */ right = so->currRightIsNearest; If this comment relates to the string above it? No, it relates only to string below. 'right' flag determines later the selected scan direction, so 'currRightIsNearest' should be assigned to it. This comment was rewritten. 6) patch 0004 _bt_parallel_seize() + scanPage = state == >state + ? >btps_scanPage + : >btps_knnScanPage; This code looks a bit tricke to me. Why do we need to pass BTScanState state to _bt_parallel_seize() at all? Won't it be enough to choose the page before function call and pass it? If we will pass page, then we will have to pass it through the whole function tree: _bt_parallel_seize() _bt_steppage() _bt_next_item() _bt_next_nearest() _bt_load_first_page() _bt_init_knn_scan() _bt_readnextpage() _bt_parallel_readpage() _bt_first() I decided simply to add flag 'isKnn' to BtScanState, so the code now looks like this: scanPage = state->isKnn ? >btps_scanPage : >btps_knnScanPage; I also can offer to add 'id' (0/1) to BtScanState instead, then the code will look like this: scanPage = >btps_scanPages[state->id]; 7) patch 0006 + Upgrade notes for version 1.6 + + + In version 1.6 btree_gist switched to using in-core + distance operators, and its own implementations were removed. References to + these operators in btree_gist opclasses will be updated + automatically during the extension upgrade, but if the user has created + objects referencing these operators or functions, then these objects must be + dropped manually before updating the extension. Is this comment still relevant after the latest changes? Functions are not removed, they're still in
Re: [PATCH] kNN for btree
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, passed Hi, thank you for your work on this patch. Patch #1 is ready for commit. It only fixes lack of refactoring after INCLUDE index patch. Patches 2-7 are ready for committer's review. I found no significant problems in algorithm or implementation. Here are few suggestions to improve readability: 1) patch 0002. * Returned matched index clause exression. * Number of matched index column is returned in *indexcol_p. Typos in comment: exPression index columnS 2) patch 0002. + /* +* We allow any column of this index to match each pathkey; they +* don't have to match left-to-right as you might expect. +*/ Before refactoring this comment had a line about gist and sp-gist specific: -* We allow any column of the index to match each pathkey; they -* don't have to match left-to-right as you might expect. This is -* correct for GiST, and it doesn't matter for SP-GiST because -* that doesn't handle multiple columns anyway, and no other -* existing AMs support amcanorderbyop. We might need different -* logic in future for other implementations. Now, when the code was moved to a separate function, it is not clear why the same logic is ok for b-tree and other index methods. If I got it right, it doesn't affect the correctness of the algorithm, because b-tree specific checks are performed in another function. Still it would be better to explain it in the comment for future readers. 3) patch 0004 if (!so->distanceTypeByVal) { so->state.currDistance = PointerGetDatum(NULL); so->state.markDistance = PointerGetDatum(NULL); } Why do we reset these fields only for !distanceTypeByVal? 4) patch 0004 + * _bt_next_item() -- Advance to next tuple on current page; + * or if there's no more, try to step to the next page with data. + * + * If there are no more matching records in the given direction */ Looks like the last sentence of the comment is unfinished. 5) patch 0004 _bt_start_knn_scan() so->currRightIsNearest = _bt_compare_current_dist(so, rstate, lstate); /* Reset right flag if the left item is nearer. */ right = so->currRightIsNearest; If this comment relates to the string above it? 6) patch 0004 _bt_parallel_seize() + scanPage = state == >state + ? >btps_scanPage + : >btps_knnScanPage; This code looks a bit tricke to me. Why do we need to pass BTScanState state to _bt_parallel_seize() at all? Won't it be enough to choose the page before function call and pass it? 7) patch 0006 + Upgrade notes for version 1.6 + + + In version 1.6 btree_gist switched to using in-core + distance operators, and its own implementations were removed. References to + these operators in btree_gist opclasses will be updated + automatically during the extension upgrade, but if the user has created + objects referencing these operators or functions, then these objects must be + dropped manually before updating the extension. Is this comment still relevant after the latest changes? Functions are not removed, they're still in contrib. Patches #8 and #9 need more review and tests. I'll try to give a feedback on them in the week. P.S. many thanks for splitting the code into separate patches. It made review a lot easier. The new status of this patch is: Waiting on Author
Re: [PATCH] kNN for btree
On Wed, Feb 20, 2019 at 2:18 PM Nikita Glukhov wrote: > On 04.02.2019 8:35, Michael Paquier wrote: > This patch set needs a rebase because of conflicts caused by the > recent patches for pluggable storage. Hi Nikita, >From the department of trivialities: according to cfbot the documentation doesn't build. Looks like you have some cases of , but these days you have to write out (or whatever) in full. -- Thomas Munro https://enterprisedb.com
Re: [PATCH] kNN for btree
On Fri, Jan 11, 2019 at 04:01:51PM +0300, Nikita Glukhov wrote: > All new distance functions except oiddist() are not leakproof, > so I had to relax condition in opr_sanity.sql test: This patch set needs a rebase because of conflicts caused by the recent patches for pluggable storage. -- Michael signature.asc Description: PGP signature
Re: [PATCH] kNN for btree
Hi! I've couple more notes regarding this patch. 1) There are two loops over scan key determining scan strategy: existing loop in _bt_first(), and in new function _bt_select_knn_search_strategy(). It's kind of redundant that we've to process scan keys twice for knn searches. I think scan keys processing should be merged into one loop. 2) We're not supporting knn ordering only using the first key. Supporting multiple knn keys would require significant reword of B-tree traversal algorithm making it closer to GiST and SP-GiST. So, I think supporting only one knn key is reasonable restriction, at least for now. But we could support single-key knn ordering in more cases. For instance, knn search in "SELECT * FROM tbl WHERE a = const1 ORDER BY b <-> const2" could benefit from (a, b) B-tree index. So, we can support knn search on n-th key if there are equality scan keys for [1, n-1] index keys. I've already discussed these issues with Nikita personally. Hopefully, new version will be published soon. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
On Sun, Dec 30, 2018 at 1:19 AM Alexander Korotkov wrote: > On Thu, Dec 27, 2018 at 5:46 AM Alexander Korotkov > wrote: > > * 0006-Remove-distance-operators-from-btree_gist-v04.patch > > > > I see you provide btree_gist--1.6.sql and remove btree_gist--1.2.sql. > > Note, that in order to better checking of extension migrations, we're > > now providing just migration script to new version. So, everybody > > installing new version will go through the migration. However, in > > this particular case we've mass deletion of former extension objects. > > So, I think this case should be an exception to the rules. And it's > > good to provide new version of extension script in this case. Other > > opinions? > > I also note that you've removed implementation of distance functions > from btree_gist. But during pg_upgrade extensions are moved "as is". > Not just CREATE EXTENSION command is dumped, but the whole extension > content. pg_upgrade'd instances would have old version of extension > metadata with new .so until ALTER EXTENSION UPDATE. So, user would get > errors about missed function in .so until updates the extension. > > We're typically evade this by inclusion of old functions into new .so. > Then user can work normally before extension update. In this > particular case, we can leave the distance functions in the .so, but > make them just wrappers over core functions. I've run regression tests with patch applied and opr_sanity showed some errors: 1) date_dist_timestamptz(), timestamp_dist_timestamptz(), timestamptz_dist_date(), timestamptz_dist_timestamp() should be stable, not immutable. These functions use timezone during conversion. 2) date_dist_timestamp(), date_dist_timestamptz(), timestamp_dist_date(), timestamp_dist_timestamptz(), timestamptz_dist_date(), timestamptz_dist_timestamp() should be not leafproof. These functions perform conversion, which might fail in corner case. So, this error should be considered as a leak. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
Hi! On Fri, Nov 30, 2018 at 3:02 PM Nikita Glukhov wrote: > On 29.11.2018 18:24, Dmitry Dolgov wrote: > >> On Wed, Sep 26, 2018 at 5:41 PM Nikita Glukhov > >> wrote: > >> > >> Attached 3rd version of the patches rebased onto the current master. > >> > >> Changes from the previous version: > >> - Added support of INCLUDE columns to get_index_column_opclass() (1st > >> patch). > >> - Added parallel kNN scan support. > >> - amcanorderbyop() was transformed into ammatchorderby() which takes a > >> List of > >>PathKeys and checks each of them with new function > >> match_orderbyop_pathkey() > >>extracted from match_pathkeys_to_index(). I think that this design can > >> be > >>used in the future to support a mix of ordinary and order-by-op > >> PathKeys, > >>but I am not sure. > > Hi, > > > > Unfortunately, the patch has some conflicts, could you rebase it? In the > > meantime I'll move it to the next CF, hoping to have more reviewers for this > > item. > > Attached 4th version of the patches rebased onto the current master. I think this patchset in general has a good shape. After some rounds of review, it might be committed during January commitfest. For now, I have following notes. * 0002-Introduce-ammatchorderby-function-v04.patch I think match_orderbyop_pathkey() and match_orderbyop_pathkeys() deserve some high-level commends describing what these functions are expected to do. * 0004-Add-kNN-support-to-btree-v04.patch + + FIXME!!! + To implement the distance ordered (nearest-neighbor) search, we only need + to define a distance operator (usually it called -) with a correpsonding + operator family for distance comparison in the index's operator class. + These operators must satisfy the following assumptions for all non-null + values A,B,C of the datatype: + + A - B = B - A symmetric law + if A = B, then A - C = B - C distance equivalence + if (A = B and B = C) or (A = B and B = C), + then A - B = A - C monotonicity + What exactly you're going to fix here? I think you at least should provide a proper formatting to this paragraph * 0006-Remove-distance-operators-from-btree_gist-v04.patch I see you provide btree_gist--1.6.sql and remove btree_gist--1.2.sql. Note, that in order to better checking of extension migrations, we're now providing just migration script to new version. So, everybody installing new version will go through the migration. However, in this particular case we've mass deletion of former extension objects. So, I think this case should be an exception to the rules. And it's good to provide new version of extension script in this case. Other opinions? A see btree_gist--1.5--1.6.sql contains a sophisticated query updating extension operators to builtin operators. However, what do you think about just long sequence of ALTER OPERATOR FAMILY commands removing old operators and adding new operators? It would be longer, but more predictable and easier for understanding. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [PATCH] kNN for btree
> On Wed, Sep 26, 2018 at 5:41 PM Nikita Glukhov > wrote: > > Attached 3rd version of the patches rebased onto the current master. > > Changes from the previous version: > - Added support of INCLUDE columns to get_index_column_opclass() (1st patch). > - Added parallel kNN scan support. > - amcanorderbyop() was transformed into ammatchorderby() which takes a List of > PathKeys and checks each of them with new function match_orderbyop_pathkey() > extracted from match_pathkeys_to_index(). I think that this design can be > used in the future to support a mix of ordinary and order-by-op PathKeys, > but I am not sure. Hi, Unfortunately, the patch has some conflicts, could you rebase it? In the meantime I'll move it to the next CF, hoping to have more reviewers for this item.