[HACKERS] workaround for expensive KNN?

PostgreSQL - Hans-Jürgen Schönig Fri, 08 Apr 2011 06:06:45 -0700

hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:


test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;
                                                                            
QUERY PLAN                                       
                                      
-----------------------------------------------------------------------------------------------------------------------------
--------------------------------------
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
   Buffers: shared hit=9 read=5004
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=9 read=5004
 Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 10;
                                                                            
QUERY PLAN                                       
                                     
-----------------------------------------------------------------------------------------------------------------------------
-------------------------------------
 Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
   Buffers: shared hit=3 read=2316
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=3 read=2316
 Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 1;
                                                                         QUERY 
PLAN                                          
                               
-----------------------------------------------------------------------------------------------------------------------------
-------------------------------
 Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 
loops=1)
   Buffers: shared hit=1 read=1577
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
28.525..28.525 rows=1 loops=1)
         Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
         Order By: (int_price <-> 0::bigint)
         Buffers: shared hit=1 read=1577
 Total runtime: 28.558 ms
(7 rows)


under any circumstances - there is no way to reduce the number of buffers 
needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a 
single block from disk you will die a painful random I/O death.
is there any alternative which does not simply die when i try to achieve what i 
want?

the use case is quite simple: all products with a certain word (10 cheapest or 
so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.

        many thanks,

                hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] workaround for expensive KNN?

Reply via email to