Re: [HACKERS] workaround for expensive KNN?

Oleg Bartunov Fri, 08 Apr 2011 08:25:46 -0700

Hans,

what if you create index (price,title) ?



On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:

hello ...

i got that one ...

   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a prober KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, say, 1 mio 
rows which have "handy" or so in it (1 mio out of 11 mio or so). you are moving 
out from one specific place.

my maths is like that:
        11 mio in total
        1 mio matching "iphone"
        cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 as distance;
distance
----------
       6
(1 row)

what does "buffers true" in your case say?

many thanks,

        hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:

Probably, you miss two-columnt index. From my early post:
http://www.sai.msu.su/~megera/wiki/knngist

=# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
to_tsvector('french',address));
=# SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) 
AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point 
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
  id    |                           address                           |         
dist 
---------+-------------------------------------------------------------+---------------------
 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
2.32488941293945e-05
4356328 | r Champ de Mars 75007 PARIS                                 |  
0.00421854756964406
5200167 | Champ De Mars 75007 Paris                                   |  
0.00453564562587288
9301676 | Champ de Mars, 75007 Paris,                                 |  
0.00453564562587288
2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France     |  
0.00624152097590896
1923818 | Champ de Mars Paris, France                                 |  
0.00838214733539654
5165953 | 39 Rue Champ De Mars Paris, France                          |  
0.00874410234569529
7395870 | 39 Rue Champ De Mars Paris, France                          |  
0.00874410234569529
4358671 | 32 Rue Champ De Mars Paris, France                          |  
0.00876089659276339
1923742 | 12 rue du Champ de Mars Paris, France                       |  
0.00876764731845995
(10 rows)

Time: 7.859 ms

=# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
'(2.29470491409302,48.858263472125)'::point
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;

                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit
  ->  Index Scan using spots_idx on spots
        Index Cond: ((coordinates >< 
'(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, 
address) @@ '''mar'''::tsquery))
(3 rows)


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:

hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;
                                                                          QUERY 
PLAN

-----------------------------------------------------------------------------------------------------------------------------
--------------------------------------
Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
 Buffers: shared hit=9 read=5004
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
       Order By: (int_price <-> 0::bigint)
       Buffers: shared hit=9 read=5004
Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') 
ORDER BY int_price <-> 0 LIMIT 10;
                                                                          QUERY 
PLAN

-----------------------------------------------------------------------------------------------------------------------------
-------------------------------------
Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
 Buffers: shared hit=3 read=2316
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1)
       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
       Order By: (int_price <-> 0::bigint)
       Buffers: shared hit=3 read=2316
Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') 
ORDER BY int_price <-> 0 LIMIT 1;
                                                                       QUERY 
PLAN

-----------------------------------------------------------------------------------------------------------------------------
-------------------------------
Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 
loops=1)
 Buffers: shared hit=1 read=1577
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
28.525..28.525 rows=1 loops=1)
       Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
       Order By: (int_price <-> 0::bigint)
       Buffers: shared hit=1 read=1577
Total runtime: 28.558 ms
(7 rows)


under any circumstances - there is no way to reduce the number of buffers 
needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a 
single block from disk you will die a painful random I/O death.
is there any alternative which does not simply die when i try to achieve what i 
want?

the use case is quite simple: all products with a certain word (10 cheapest or 
so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.

        many thanks,

                hans

--
Cybertec Sch?nig & Sch?nig GmbH
Gr?hrm?hlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Cybertec Sch?nig & Sch?nig GmbH
Gr?hrm?hlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] workaround for expensive KNN?

Reply via email to