Re: [GENERAL] tgrm index for word_similarity

2017-10-22 Thread Arthur Zakirov
On Sat, Oct 21, 2017 at 10:01:56PM -0700, Igal @ Lucee.org wrote:
> 
> 1) I thought that the whole idea behind indexes on expressions is that the
> index would be used in a WHERE clause?  See
> https://www.postgresql.org/docs/10/static/indexes-expressional.html - Am I
> missing something?
>

I think the idea is a little bit different. It is about computing index
entries only once, during index creation. During scan PostgreSQL doesn't
compute such entries every time.
I am not very good at PostgreSQL's planner. But I know that PostgreSQL
uses index scan for pg_trgm only with %, <%, ~~, ~~*, ~, ~* operators.
pg_trgm's operator classes (which should be implemented for index scan) are 
designed in this way.
 
> 2) A query with `WHERE input <% name` utilizes the index, but a query
> without a WHERE clause at all does not?

Because sequential scan is cheaper here than index scan.

> 
> 3) What happens if I do not create an index at all?  Does the query that I
> run in 30 - 40ms, the one that does not utilize an index, creates all of the
> tri-grams on the fly each time that it runs?  Would it be possible for me to
> create a TABLE or a VIEW with the tri-grams so that there is no need to
> create them each time the query runs?
> 

As far as I know you can't do it nowadays. You can't create an trigram
column, as you can do it for FTS, you can create an tsvector column.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tgrm index for word_similarity

2017-10-21 Thread Igal @ Lucee.org

On 10/21/2017 5:01 AM, Arthur Zakirov wrote:

PostgreSQL doesn't use index scan with functions within WHERE clause. So
you always need to use operators instead. You can try <% operator and
pg_trgm.word_similarity_threshold variable:

=# SET pg_trgm.word_similarity_threshold TO 0.1;
=# SELECT name, popularity
FROM temp.items3_v
,(values ('some phrase'::text)) consts(input)
WHERE input <% name
ORDER BY 2, input <<-> name;


Thank you, your solution does show that the index is used when I do 
`explain analyze`, and makes the query finish in about 20ms so it's 
about 1.5 - 2 times faster than without the index, but that raises a few 
questions for me:


1) I thought that the whole idea behind indexes on expressions is that 
the index would be used in a WHERE clause?  See 
https://www.postgresql.org/docs/10/static/indexes-expressional.html - Am 
I missing something?


2) A query with `WHERE input <% name` utilizes the index, but a query 
without a WHERE clause at all does not?


3) What happens if I do not create an index at all?  Does the query that 
I run in 30 - 40ms, the one that does not utilize an index, creates all 
of the tri-grams on the fly each time that it runs?  Would it be 
possible for me to create a TABLE or a VIEW with the tri-grams so that 
there is no need to create them each time the query runs?


Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] tgrm index for word_similarity

2017-10-21 Thread Arthur Zakirov
On Thu, Oct 19, 2017 at 04:54:19PM -0700, Igal @ Lucee.org wrote:
> 
> My query at the moment is:
> 
>     SELECT name, popularity
>     FROM   temp.items3_v
>       ,(values ('some phrase'::text)) consts(input)
>     WHERE  true
>         and word_similarity(input, name) > 0.01  -- be lenient as some names
> are 75 characters long and we want to match even on a few characters of
> input
>     ORDER BY 2, input <<-> name
> 

PostgreSQL doesn't use index scan with functions within WHERE clause. So
you always need to use operators instead. You can try <% operator and
pg_trgm.word_similarity_threshold variable:

=# SET pg_trgm.word_similarity_threshold TO 0.1;
=# SELECT name, popularity
FROM temp.items3_v
,(values ('some phrase'::text)) consts(input)
WHERE input <% name
ORDER BY 2, input <<-> name;

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] tgrm index for word_similarity

2017-10-20 Thread Igal @ Lucee.org

On 10/19/2017 4:54 PM, Igal @ Lucee.org wrote:
I want to use Postgres for a fuzzy auto-suggest search field.  As the 
user will be typing their search phrase, Postgres will show a list of 
items that fuzzy-matches what they typed so far, ordered by popularity 
(ntile(20)) and distance, i.e. 1 - word_similarity().


I created a Materialized View with two columns: name text, popularity int.

My query at the moment is:

    SELECT name, popularity
    FROM   temp.items3_v
      ,(values ('some phrase'::text)) consts(input)
    WHERE  true
        and word_similarity(input, name) > 0.01  -- be lenient as some 
names are 75 characters long and we want to match even on a few 
characters of input

    ORDER BY 2, input <<-> name

I tried to add a GIN trgm index on `name`:

    CREATE INDEX temp_items3_v_tgrm_item_name ON temp.items3_v USING 
GIN(name gin_trgm_ops);


But it is not used

What index would be good for that kind of query?


I see that when I use LIKE or ILIKE the index is used, but I lose all of 
the "fuzzy" benefits by doing that.


Is there any type of INDEX or even building my own COLUMN of trgm that 
can help speed my word_similarity() results?  When used in auto-suggest 
there are usually several queries for each user in a relatively short 
period of time, so speed is important.


Thanks,


Igal



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] tgrm index for word_similarity

2017-10-19 Thread Igal @ Lucee.org

Hello,

I want to use Postgres for a fuzzy auto-suggest search field.  As the 
user will be typing their search phrase, Postgres will show a list of 
items that fuzzy-matches what they typed so far, ordered by popularity 
(ntile(20)) and distance, i.e. 1 - word_similarity().


I created a Materialized View with two columns: name text, popularity int.

My query at the moment is:

    SELECT name, popularity
    FROM   temp.items3_v
      ,(values ('some phrase'::text)) consts(input)
    WHERE  true
        and word_similarity(input, name) > 0.01  -- be lenient as some 
names are 75 characters long and we want to match even on a few 
characters of input

    ORDER BY 2, input <<-> name

Which seems to yield pretty good results, but takes over 40+ ms on a 
table that's not that large.


So I tried to add a GIN trgm index on `name`:

    CREATE INDEX temp_items3_v_tgrm_item_name ON temp.items3_v USING 
GIN(name gin_trgm_ops);


But it is not used:

QUERY PLAN |
-|
Sort  (cost=264.42..269.91 rows=2198 width=43) (actual 
time=41.060..41.117 rows=1044 loops=1)    |
  Sort Key: items3_v.popularity, (('kandels'::text <<-> 
items3_v.name))  |

  Sort Method: quicksort  Memory: 149kB |
  ->  Seq Scan on items3_v  (cost=0.00..142.41 rows=2198 width=43) 
(actual time=0.217..40.471 rows=1044 loops=1) |
    Filter: (word_similarity('kandels'::text, name) > 
'0.01'::double precision)  |

    Rows Removed by Filter: 5550 |
Planning time: 0.149 ms |
Execution time: 41.308 ms |

What index would be good for that kind of query?

Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org