Re: [HACKERS] PoC: Partial sort

Andres Freund Sat, 14 Dec 2013 04:55:48 -0800

Hi,

Cool stuff.


On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
> Currently when we need to get ordered result from table we have to choose
> one of two approaches: get results from index in exact order we need or do
> sort of tuples. However, it could be useful to mix both methods: get
> results from index in order which partially meets our requirements and do
> rest of work from heap.

> ------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> time=0.097..0.099 rows=10 loops=1)
>    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> time=0.096..0.097 rows=10 loops=1)
>          Sort Key: v1, v2
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
>  Total runtime: 0.125 ms
> (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

> *partial-knn-1.patch*
> 
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

> See an example.
> 
> create table test as (select id, polygon(3+(random()*10)::int,
> circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
> generate_series(1,1000000) id);
> create index test_idx on test using gist (p);
> 
> We can get results ordered by distance from polygon to point.
> 
> postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
> point(0.5,0.5) limit 10;

> ----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
> rows=10 loops=1)
>    ->  Index Scan using test_idx on test  (cost=0.29..157672.29
> rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
>          Order By: (p <-> '(0.5,0.5)'::point)
>  Total runtime: 0.305 ms
> (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?
Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

Reply via email to