Hi, > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual > > > time=0.097..0.099 rows=10 loops=1) > > > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual > > > time=0.096..0.097 rows=10 loops=1) > > > Sort Key: v1, v2 > > > Sort Method: top-N heapsort Memory: 25kB > > > -> Index Scan using test_v1_idx on test (cost=0.42..47604.42 > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1) > > > Total runtime: 0.125 ms > > > (6 rows) > > > > Is that actually all that beneficial when sorting with a bog standard > > qsort() since that doesn't generally benefit from data being pre-sorted? > > I think we might need to switch to a different algorithm to really > > benefit from mostly pre-sorted input. > > > > In this patch I don't do full sort of dataset. For instance, index returns > data ordered by first column and we need to order them also by second > column.
Ah, that makes sense. > But, I don't think we should expect pre-sorted values of second column > inside a group. Yes, if you do it that way, there doesn't seem to any need to assume that any more than we usually do. I think you should make the explain output reflect the fact that we're assuming v1 is presorted and just sorting v2. I'd be happy enough with: Sort Key: v1, v2 Partial Sort: v2 or even just "Partial Sort Key: [v1,] v2" but I am sure others disagree. > > > *partial-knn-1.patch* > > Rechecking from the heap means adding a sort node though, which I don't > > see here? Or am I misunderstanding something? > KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked > inside GiST and reinserted into same RB-tree. It appears to be much easier > implementation for PoC and also looks very efficient. I'm not sure what is > actually right design for it. This is what I like to discuss. I don't have enough clue about gist to say wether it's the right design, but it doesn't look wrong to my eyes. It'd probably be useful to export the knowledge that we are rechecking and how often that happens to the outside. While I didn't really look into the patch, I noticed in passing that you pass a all_dead variable to heap_hot_search_buffer without using the result - just pass NULL instead, that performs a bit less work. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers