On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <and...@2ndquadrant.com>wrote:
> Hi, > > > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual > > > > time=0.097..0.099 rows=10 loops=1) > > > > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual > > > > time=0.096..0.097 rows=10 loops=1) > > > > Sort Key: v1, v2 > > > > Sort Method: top-N heapsort Memory: 25kB > > > > -> Index Scan using test_v1_idx on test > (cost=0.42..47604.42 > > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1) > > > > Total runtime: 0.125 ms > > > > (6 rows) > > > > > > Is that actually all that beneficial when sorting with a bog standard > > > qsort() since that doesn't generally benefit from data being > pre-sorted? > > > I think we might need to switch to a different algorithm to really > > > benefit from mostly pre-sorted input. > > > > > > > In this patch I don't do full sort of dataset. For instance, index > returns > > data ordered by first column and we need to order them also by second > > column. > > Ah, that makes sense. > > > But, I don't think we should expect pre-sorted values of second column > > inside a group. > > Yes, if you do it that way, there doesn't seem to any need to assume > that any more than we usually do. > > I think you should make the explain output reflect the fact that we're > assuming v1 is presorted and just sorting v2. I'd be happy enough with: > Sort Key: v1, v2 > Partial Sort: v2 > or even just > "Partial Sort Key: [v1,] v2" > but I am sure others disagree. > Sure, I just didn't change explain output yet. It should look like what you propose. > > > *partial-knn-1.patch* > > > > Rechecking from the heap means adding a sort node though, which I don't > > > see here? Or am I misunderstanding something? > > > KNN-GiST contain RB-tree of scanned items. In this patch item is > rechecked > > inside GiST and reinserted into same RB-tree. It appears to be much > easier > > implementation for PoC and also looks very efficient. I'm not sure what > is > > actually right design for it. This is what I like to discuss. > > I don't have enough clue about gist to say wether it's the right design, > but it doesn't look wrong to my eyes. It'd probably be useful to export > the knowledge that we are rechecking and how often that happens to the > outside. > While I didn't really look into the patch, I noticed in passing that you > pass a all_dead variable to heap_hot_search_buffer without using the > result - just pass NULL instead, that performs a bit less work. Useful notice, thanks. ------ With best regards, Alexander Korotkov.