> > >  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > time=0.097..0.099 rows=10 loops=1)
> > >    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > time=0.096..0.097 rows=10 loops=1)
> > >          Sort Key: v1, v2
> > >          Sort Method: top-N heapsort  Memory: 25kB
> > >          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > >  Total runtime: 0.125 ms
> > > (6 rows)
> >
> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column.

Ah, that makes sense.

> But, I don't think we should expect pre-sorted values of second column
> inside a group.

Yes, if you do it that way, there doesn't seem to any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
"Partial Sort Key: [v1,] v2"
but I am sure others disagree.

> > > *partial-knn-1.patch*

> > Rechecking from the heap means adding a sort node though, which I don't
> > see here? Or am I misunderstanding something?

> KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
> inside GiST and reinserted into same RB-tree. It appears to be much easier
> implementation for PoC and also looks very efficient. I'm not sure what is
> actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say wether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
While I didn't really look into the patch, I noticed in passing that you
pass a all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead, that performs a bit less work.


Andres Freund

 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to