Hi, Cool stuff.
On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote: > Currently when we need to get ordered result from table we have to choose > one of two approaches: get results from index in exact order we need or do > sort of tuples. However, it could be useful to mix both methods: get > results from index in order which partially meets our requirements and do > rest of work from heap. > ------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual > time=0.097..0.099 rows=10 loops=1) > -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual > time=0.096..0.097 rows=10 loops=1) > Sort Key: v1, v2 > Sort Method: top-N heapsort Memory: 25kB > -> Index Scan using test_v1_idx on test (cost=0.42..47604.42 > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1) > Total runtime: 0.125 ms > (6 rows) Is that actually all that beneficial when sorting with a bog standard qsort() since that doesn't generally benefit from data being pre-sorted? I think we might need to switch to a different algorithm to really benefit from mostly pre-sorted input. > *partial-knn-1.patch* > > KNN-GiST provides ability to get ordered results from index, but this order > is based only on index information. For instance, GiST index contains > bounding rectangles for polygons, and we can't get exact distance to > polygon from index (similar situation is in PostGIS). In attached patch, > GiST distance method can set recheck flag (similar to consistent method). > This flag means that distance method returned lower bound of distance and > we should recheck it from heap. > See an example. > > create table test as (select id, polygon(3+(random()*10)::int, > circle(point(random(), random()), 0.0003 + random()*0.001)) as p from > generate_series(1,1000000) id); > create index test_idx on test using gist (p); > > We can get results ordered by distance from polygon to point. > > postgres=# select id, p <-> point(0.5,0.5) from test order by p <-> > point(0.5,0.5) limit 10; > ---------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230 > rows=10 loops=1) > -> Index Scan using test_idx on test (cost=0.29..157672.29 > rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1) > Order By: (p <-> '(0.5,0.5)'::point) > Total runtime: 0.305 ms > (4 rows) Rechecking from the heap means adding a sort node though, which I don't see here? Or am I misunderstanding something? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers