On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <and...@2ndquadrant.com>wrote:
> > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > > time=0.097..0.099 rows=10 loops=1)
> > > > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > > time=0.096..0.097 rows=10 loops=1)
> > > > Sort Key: v1, v2
> > > > Sort Method: top-N heapsort Memory: 25kB
> > > > -> Index Scan using test_v1_idx on test
> > > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > > > Total runtime: 0.125 ms
> > > > (6 rows)
> > >
> > > Is that actually all that beneficial when sorting with a bog standard
> > > qsort() since that doesn't generally benefit from data being
> > > I think we might need to switch to a different algorithm to really
> > > benefit from mostly pre-sorted input.
> > >
> > In this patch I don't do full sort of dataset. For instance, index
> > data ordered by first column and we need to order them also by second
> > column.
> Ah, that makes sense.
> > But, I don't think we should expect pre-sorted values of second column
> > inside a group.
> Yes, if you do it that way, there doesn't seem to any need to assume
> that any more than we usually do.
> I think you should make the explain output reflect the fact that we're
> assuming v1 is presorted and just sorting v2. I'd be happy enough with:
> Sort Key: v1, v2
> Partial Sort: v2
> or even just
> "Partial Sort Key: [v1,] v2"
> but I am sure others disagree.
Sure, I just didn't change explain output yet. It should look like what you
> > > *partial-knn-1.patch*
> > > Rechecking from the heap means adding a sort node though, which I don't
> > > see here? Or am I misunderstanding something?
> > KNN-GiST contain RB-tree of scanned items. In this patch item is
> > inside GiST and reinserted into same RB-tree. It appears to be much
> > implementation for PoC and also looks very efficient. I'm not sure what
> > actually right design for it. This is what I like to discuss.
> I don't have enough clue about gist to say wether it's the right design,
> but it doesn't look wrong to my eyes. It'd probably be useful to export
> the knowledge that we are rechecking and how often that happens to the
> While I didn't really look into the patch, I noticed in passing that you
> pass a all_dead variable to heap_hot_search_buffer without using the
> result - just pass NULL instead, that performs a bit less work.
Useful notice, thanks.
With best regards,