subject:"\[HACKERS\] workaround for expensive KNN\?"

Re: [HACKERS] workaround for expensive KNN?

2011-04-11 Thread PostgreSQL - Hans-Jürgen Schönig

price has a problem :(.
"iphone" can be a 20 cents bag or a sticker or a 900 euro thing signed by 
whoever ...
so, words and the sort-number / price are not related in anyway. price is in 
this case no way to narrow down the problem (e.g. evaluate first or so).

many thanks,

hans


On Apr 8, 2011, at 5:25 PM, Oleg Bartunov wrote:

> Hans,
> 
> what if you create index (price,title) ?
> 
> 
> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
> 
>> hello ...
>> 
>> i got that one ...
>> 
>>   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
>> title), int_price)
>> 
>> so, i have a combined index on text + number.
>> to me the plan seems fine ... it looks like a prober KNN traversal.
>> the difference between my plan and your plan seems to be the fact that i 
>> have, say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or 
>> so). you are moving out from one specific place.
>> 
>> my maths is like that:
>>  11 mio in total
>>  1 mio matching "iphone"
>>  cheapest / most expensive 10 out of this mio needed.
>> 
>> operator classes are all nice and in place:
>> 
>> SELECT 10 <-> 4 as distance;
>> distance
>> --
>>   6
>> (1 row)
>> 
>> what does "buffers true" in your case say?
>> 
>> many thanks,
>> 
>>  hans
>> 
>> 
>> On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:
>> 
>>> Probably, you miss two-columnt index. From my early post:
>>> http://www.sai.msu.su/~megera/wiki/knngist
>>> 
>>> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
>>> to_tsvector('french',address));
>>> =# SELECT id, address,  (coordinates <-> 
>>> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
>>> coordinates >< '(2.29470491409302,48.858263472125)'::point AND 
>>> to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>>  id|   address   |  
>>>dist 
>>> -+-+-
>>> 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
>>> 2.32488941293945e-05
>>> 4356328 | r Champ de Mars 75007 PARIS |  
>>> 0.00421854756964406
>>> 5200167 | Champ De Mars 75007 Paris   |  
>>> 0.00453564562587288
>>> 9301676 | Champ de Mars, 75007 Paris, |  
>>> 0.00453564562587288
>>> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
>>> 0.00624152097590896
>>> 1923818 | Champ de Mars Paris, France |  
>>> 0.00838214733539654
>>> 5165953 | 39 Rue Champ De Mars Paris, France  |  
>>> 0.00874410234569529
>>> 7395870 | 39 Rue Champ De Mars Paris, France  |  
>>> 0.00874410234569529
>>> 4358671 | 32 Rue Champ De Mars Paris, France  |  
>>> 0.00876089659276339
>>> 1923742 | 12 rue du Champ de Mars Paris, France   |  
>>> 0.00876764731845995
>>> (10 rows)
>>> 
>>> Time: 7.859 ms
>>> 
>>> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point
>>> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>>> 
>>>   QUERY PLAN
>>> --
>>> Limit
>>>  ->  Index Scan using spots_idx on spots
>>>Index Cond: ((coordinates >< 
>>> '(2.29470491409302,48.858263472125)'::point) AND 
>>> (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
>>> (3 rows)
>>> 
>>> 
>>> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
>>> 
 hello all ...
 
 given oleg's posting before i also wanted to fire up some KNN related 
 question.
 let us consider a simple example. i got some million lines and i want all 
 rows matching a tsquery sorted by price.
 i did some tests:
 
 test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
 product.t_product WHERE to_tsvector('german', title) @@ 
 to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
  
 QUERY PLAN
 
 -
 --
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual 
 time=36391.717..45542.590 rows=10 loops=1)
 Buffers: shared hit=9 read=5004
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
 (cost=0.00..13251.91 rows=3224 width=16) (actual time=
 36391.715..45542.573 rows=10 loops=1)
   Index Cond: (to_tsvector('german'::regconfig, title) @@ 
 '''iphon'''::

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread Oleg Bartunov


Oops, my previous example was fromm early prototype :)  I just
recreated test environment for 9.1:

knn=# select count(*) from spots;
 count 


 908846
(1 row)


knn=# explain (analyze true, buffers true) SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots 
WHERE to_tsvector('french',address) @@ to_tsquery('french','mars') 
ORDER BY coordinates <-> '(2.29470491409302,48.858263472125)'::point

LIMIT 10;
   QUERY PLAN 


 Limit  (cost=0.00..33.63 rows=10 width=58) (actual time=1.541..1.875 rows=10 
loops=1)
   Buffers: shared hit=251
   ->  Index Scan using spots_idx on spots  (cost=0.00..15279.12 rows=4544 
width=58) (actual time=1.540..1.874 rows=10 loops=1)
 Index Cond: (to_tsvector('french'::regconfig, address) @@ 
'''mar'''::tsquery)
 Order By: (coordinates <-> '(2.29470491409302,48.858263472125)'::point)
 Buffers: shared hit=251
 Total runtime: 1.905 ms
(7 rows)

Time: 2.372 ms


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:


hello ...

i got that one ...

   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a prober KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, say, 1 mio 
rows which have "handy" or so in it (1 mio out of 11 mio or so). you are moving 
out from one specific place.

my maths is like that:
11 mio in total
1 mio matching "iphone"
cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 as distance;
distance
--
   6
(1 row)

what does "buffers true" in your case say?

many thanks,

hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:


Probably, you miss two-columnt index. From my early post:
http://www.sai.msu.su/~megera/wiki/knngist

=# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
to_tsvector('french',address));
=# SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) 
AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point 
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
  id|   address   | 
dist 
-+-+-
 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
2.32488941293945e-05
4356328 | r Champ de Mars 75007 PARIS |  
0.00421854756964406
5200167 | Champ De Mars 75007 Paris   |  
0.00453564562587288
9301676 | Champ de Mars, 75007 Paris, |  
0.00453564562587288
2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
0.00624152097590896
1923818 | Champ de Mars Paris, France |  
0.00838214733539654
5165953 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
7395870 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
4358671 | 32 Rue Champ De Mars Paris, France  |  
0.00876089659276339
1923742 | 12 rue du Champ de Mars Paris, France   |  
0.00876764731845995
(10 rows)

Time: 7.859 ms

=# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
'(2.29470491409302,48.858263472125)'::point
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;

   QUERY PLAN
--
Limit
  ->  Index Scan using spots_idx on spots
Index Cond: ((coordinates >< 
'(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, 
address) @@ '''mar'''::tsquery))
(3 rows)


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:


hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;
  QUERY 
PLAN

-
--
Limit  (cost=0.00..41.11 rows=10 wi

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig

hello ...

i got that one ...

"idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a prober KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, 
say, 1 mio rows which have "handy" or so in it (1 mio out of 11 mio or so). you 
are moving out from one specific place.

my maths is like that:
11 mio in total
1 mio matching "iphone"
cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 as distance;
 distance 
--
6
(1 row)

what does "buffers true" in your case say?

many thanks,

hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:

> Probably, you miss two-columnt index. From my early post:
> http://www.sai.msu.su/~megera/wiki/knngist
> 
> =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
> to_tsvector('french',address));
> =# SELECT id, address,  (coordinates <-> 
> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE 
> coordinates >< '(2.29470491409302,48.858263472125)'::point AND 
> to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
>   id|   address   |   
>   dist 
> -+-+-
>  366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
> 2.32488941293945e-05
> 4356328 | r Champ de Mars 75007 PARIS |  
> 0.00421854756964406
> 5200167 | Champ De Mars 75007 Paris   |  
> 0.00453564562587288
> 9301676 | Champ de Mars, 75007 Paris, |  
> 0.00453564562587288
> 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
> 0.00624152097590896
> 1923818 | Champ de Mars Paris, France |  
> 0.00838214733539654
> 5165953 | 39 Rue Champ De Mars Paris, France  |  
> 0.00874410234569529
> 7395870 | 39 Rue Champ De Mars Paris, France  |  
> 0.00874410234569529
> 4358671 | 32 Rue Champ De Mars Paris, France  |  
> 0.00876089659276339
> 1923742 | 12 rue du Champ de Mars Paris, France   |  
> 0.00876764731845995
> (10 rows)
> 
> Time: 7.859 ms
> 
> =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
> '(2.29470491409302,48.858263472125)'::point
> AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
> 
>QUERY PLAN
> --
> Limit
>   ->  Index Scan using spots_idx on spots
> Index Cond: ((coordinates >< 
> '(2.29470491409302,48.858263472125)'::point) AND 
> (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
> (3 rows)
> 
> 
> On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:
> 
>> hello all ...
>> 
>> given oleg's posting before i also wanted to fire up some KNN related 
>> question.
>> let us consider a simple example. i got some million lines and i want all 
>> rows matching a tsquery sorted by price.
>> i did some tests:
>> 
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>> product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
>> 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
>>   
>> QUERY PLAN
>> 
>> -
>> --
>> Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
>> rows=10 loops=1)
>>  Buffers: shared hit=9 read=5004
>>  ->  Index Scan using idx_product_t_product_titleprice on t_product  
>> (cost=0.00..13251.91 rows=3224 width=16) (actual time=
>> 36391.715..45542.573 rows=10 loops=1)
>>Index Cond: (to_tsvector('german'::regconfig, title) @@ 
>> '''iphon'''::tsquery)
>>Order By: (int_price <-> 0::bigint)
>>Buffers: shared hit=9 read=5004
>> Total runtime: 45542.676 ms
>> (7 rows)
>> 
>> test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
>> product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
>> 'handy') ORDER BY int_price <-> 0 LIMIT 10;
>>   
>> QUERY PLAN
>> 
>> -
>> -
>> Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
>> rows=10 loops=1)

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread Tom Lane

Oleg Bartunov  writes:
> what if you create index (price,title) ?

I think that SELECT ... WHERE ... ORDER BY ... LIMIT is basically an
intractable problem.  We've recognized the difficulty in connection with
btree indexes for a long time, and there is no reason at all to think
that KNNGist will somehow magically dodge it.  You can either visit
*all* of the rows satisfying WHERE (and then sort them), or you can
visit the rows in ORDER BY order and hope that you find enough of them
satisfying the WHERE in a reasonable amount of time.  Either of these
strategies loses badly in many real-world cases.

Maybe with some sort of fuzzy notion of ordering it'd be possible to go
faster, but as long as you insist on an exact ORDER BY result, I don't
see any way out of it.

One way to be fuzzy is to introduce a maximum search distance:

SELECT ... WHERE x < limit AND other-conditions ORDER BY x LIMIT n

which essentially works by limiting the damage in the visit-all-the-rows
approach.  Hans didn't do that in his example, but I wonder how much
it'd help (and whether the existing GIST support is adequate for it).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread Oleg Bartunov


Hans,

what if you create index (price,title) ?


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:


hello ...

i got that one ...

   "idx_product_t_product_titleprice" gist (to_tsvector('german'::regconfig, 
title), int_price)

so, i have a combined index on text + number.
to me the plan seems fine ... it looks like a prober KNN traversal.
the difference between my plan and your plan seems to be the fact that i have, say, 1 mio 
rows which have "handy" or so in it (1 mio out of 11 mio or so). you are moving 
out from one specific place.

my maths is like that:
11 mio in total
1 mio matching "iphone"
cheapest / most expensive 10 out of this mio needed.

operator classes are all nice and in place:

SELECT 10 <-> 4 as distance;
distance
--
   6
(1 row)

what does "buffers true" in your case say?

many thanks,

hans


On Apr 8, 2011, at 3:22 PM, Oleg Bartunov wrote:


Probably, you miss two-columnt index. From my early post:
http://www.sai.msu.su/~megera/wiki/knngist

=# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
to_tsvector('french',address));
=# SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) 
AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point 
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
  id|   address   | 
dist 
-+-+-
 366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
2.32488941293945e-05
4356328 | r Champ de Mars 75007 PARIS |  
0.00421854756964406
5200167 | Champ De Mars 75007 Paris   |  
0.00453564562587288
9301676 | Champ de Mars, 75007 Paris, |  
0.00453564562587288
2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
0.00624152097590896
1923818 | Champ de Mars Paris, France |  
0.00838214733539654
5165953 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
7395870 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
4358671 | 32 Rue Champ De Mars Paris, France  |  
0.00876089659276339
1923742 | 12 rue du Champ de Mars Paris, France   |  
0.00876764731845995
(10 rows)

Time: 7.859 ms

=# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< 
'(2.29470491409302,48.858263472125)'::point
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;

   QUERY PLAN
--
Limit
  ->  Index Scan using spots_idx on spots
Index Cond: ((coordinates >< 
'(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, 
address) @@ '''mar'''::tsquery))
(3 rows)


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:


hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;
  QUERY 
PLAN

-
--
Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
 Buffers: shared hit=9 read=5004
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
   Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
   Order By: (int_price <-> 0::bigint)
   Buffers: shared hit=9 read=5004
Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') 
ORDER BY int_price <-> 0 LIMIT 10;
  QUERY 
PLAN

-
-
Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
 Buffers: shared hit=3 read=2316
 ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.0

Re: [HACKERS] workaround for expensive KNN?

2011-04-08 Thread Oleg Bartunov


Probably, you miss two-columnt index. From my early post:
http://www.sai.msu.su/~megera/wiki/knngist

=# CREATE INDEX spots_idx ON spots USING knngist (coordinates, 
to_tsvector('french',address));
=# SELECT id, address,  (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist 
FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point 
AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;
   id|   address   | dist 
-+-+-

  366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 
2.32488941293945e-05
 4356328 | r Champ de Mars 75007 PARIS |  
0.00421854756964406
 5200167 | Champ De Mars 75007 Paris   |  
0.00453564562587288
 9301676 | Champ de Mars, 75007 Paris, |  
0.00453564562587288
 2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France |  
0.00624152097590896
 1923818 | Champ de Mars Paris, France |  
0.00838214733539654
 5165953 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
 7395870 | 39 Rue Champ De Mars Paris, France  |  
0.00874410234569529
 4358671 | 32 Rue Champ De Mars Paris, France  |  
0.00876089659276339
 1923742 | 12 rue du Champ de Mars Paris, France   |  
0.00876764731845995
(10 rows)

Time: 7.859 ms

=# EXPLAIN (COSTS OFF) 
SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point

AND to_tsvector('french',address) @@ to_tsquery('french','mars')  LIMIT 10;

QUERY PLAN
--
 Limit
   ->  Index Scan using spots_idx on spots
 Index Cond: ((coordinates >< 
'(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, 
address) @@ '''mar'''::tsquery))
(3 rows)


On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:


hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;
   
QUERY PLAN

-
--
Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
  Buffers: shared hit=9 read=5004
  ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
Order By: (int_price <-> 0::bigint)
Buffers: shared hit=9 read=5004
Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') 
ORDER BY int_price <-> 0 LIMIT 10;
   
QUERY PLAN

-
-
Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
  Buffers: shared hit=3 read=2316
  ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1)
Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
Order By: (int_price <-> 0::bigint)
Buffers: shared hit=3 read=2316
Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') 
ORDER BY int_price <-> 0 LIMIT 1;
QUERY 
PLAN

-
---
Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 
loops=1)
  Buffers: shared hit=1 read=1577
  ->  Index Scan using idx_product_t_product_titleprice on t_

[HACKERS] workaround for expensive KNN?

2011-04-08 Thread PostgreSQL - Hans-Jürgen Schönig

hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows 
matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'iphone') ORDER BY int_price <-> 0 LIMIT 10;

QUERY PLAN   
  
-
--
 Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 
rows=10 loops=1)
   Buffers: shared hit=9 read=5004
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..13251.91 rows=3224 width=16) (actual time=
36391.715..45542.573 rows=10 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''iphon'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=9 read=5004
 Total runtime: 45542.676 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 10;

QUERY PLAN   
 
-
-
 Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 
rows=10 loops=1)
   Buffers: shared hit=3 read=2316
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
7243.524..10935.217 rows=10 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=3 read=2316
 Total runtime: 10935.265 ms
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM 
product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 
'handy') ORDER BY int_price <-> 0 LIMIT 1;
 QUERY 
PLAN  
   
-
---
 Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 
loops=1)
   Buffers: shared hit=1 read=1577
   ->  Index Scan using idx_product_t_product_titleprice on t_product  
(cost=0.00..29762.61 rows=7255 width=16) (actual time=
28.525..28.525 rows=1 loops=1)
 Index Cond: (to_tsvector('german'::regconfig, title) @@ 
'''handy'''::tsquery)
 Order By: (int_price <-> 0::bigint)
 Buffers: shared hit=1 read=1577
 Total runtime: 28.558 ms
(7 rows)


under any circumstances - there is no way to reduce the number of buffers 
needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a 
single block from disk you will die a painful random I/O death.
is there any alternative which does not simply die when i try to achieve what i 
want?

the use case is quite simple: all products with a certain word (10 cheapest or 
so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] workaround for expensive KNN?

Re: [HACKERS] workaround for expensive KNN?

Re: [HACKERS] workaround for expensive KNN?

Re: [HACKERS] workaround for expensive KNN?

Re: [HACKERS] workaround for expensive KNN?

Re: [HACKERS] workaround for expensive KNN?

[HACKERS] workaround for expensive KNN?

7 matches

Site Navigation

Mail list logo

Footer information