Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Jeff Janes
On Sun, Nov 13, 2016 at 3:54 AM, Aaron Lewis 
wrote:

> I have a simple table with Trigram index,
>
> create table mytable(hash char(40), title text);
> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
>
> When I run a query with 10m rows, it uses the Trigram index, but takes
> 3s to execute, very slow.
> (I have 80m rows, but only inserted 10m for testing purpose)
>
> test=# select count(*) from mytable;
>   count
> --
>  13971887
> (1 row)
>
> test=# explain analyze select * from mytable where title ilike 'x264';
>   QUERY PLAN
> 
> --
>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
>Recheck Cond: (title ~~* 'x264'::text)
>Rows Removed by Index Recheck: 11402855
>Heap Blocks: exact=39557 lossy=158010
>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
>  Index Cond: (title ~~* 'x264'::text)
>  Planning time: 0.611 ms
>  Execution time: 2937.729 ms
> (8 rows)
>
> Any ideas to speed things up?
>


What version of postgresql and pg_trgm are you using?  It might work better
under 9.6/1.3

Since your query doesn't use wildcards, it is probably more well suited to
a regular btree index, perhaps with citext.

Cheers,

Jeff


Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Oleg Bartunov
On Sun, Nov 13, 2016 at 5:25 PM, Aaron Lewis 
wrote:

> Thanks Oleg.
>
> I've increased work_mem to 128MB, now the query falls down to 1.7s,
> faster but still not good enough.
>
> Is there any other thing I can do about it?
>

your query 'x264' is short in terms of the number of trigrams, so trigram
index isn't good. Did you tried text_pattern_ops for btree ? Something like
create index title_btree_idx on mytable using btree(title text_pattern_ops
);

>
> test=# explain analyze select * from mytable where title ilike 'x264';
>   QUERY PLAN
> 
> --
>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
> width=83) (actual time=1754.656..1754.656 rows=0 loops=1)
>Recheck Cond: (title ~~* 'x264'::text)
>Rows Removed by Index Recheck: 1220793
>Heap Blocks: exact=197567
>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
> rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1)
>  Index Cond: (title ~~* 'x264'::text)
>  Planning time: 1.168 ms
>  Execution time: 1755.944 ms
>
>
> On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov 
> wrote:
> >
> >
> > On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis  >
> > wrote:
> >>
> >> I have a simple table with Trigram index,
> >>
> >> create table mytable(hash char(40), title text);
> >> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
> >>
> >> When I run a query with 10m rows, it uses the Trigram index, but takes
> >> 3s to execute, very slow.
> >> (I have 80m rows, but only inserted 10m for testing purpose)
> >>
> >> test=# select count(*) from mytable;
> >>   count
> >> --
> >>  13971887
> >> (1 row)
> >>
> >> test=# explain analyze select * from mytable where title ilike 'x264';
> >>   QUERY PLAN
> >>
> >> 
> --
> >>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
> >> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
> >>Recheck Cond: (title ~~* 'x264'::text)
> >>Rows Removed by Index Recheck: 11402855
> >>Heap Blocks: exact=39557 lossy=158010
> >>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
> >> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
> >>  Index Cond: (title ~~* 'x264'::text)
> >>  Planning time: 0.611 ms
> >>  Execution time: 2937.729 ms
> >> (8 rows)
> >>
> >> Any ideas to speed things up?
> >
> >
> >Rows Removed by Index Recheck: 11402855
> >Heap Blocks: exact=39557 lossy=158010
> >
> > You need to increase work_mem
> >>
> >>
> >> --
> >> Best Regards,
> >> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> >> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
> >>
> >>
> >> --
> >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-general
> >
> >
>
>
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>


Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Aaron Lewis
Thanks Oleg.

I've increased work_mem to 128MB, now the query falls down to 1.7s,
faster but still not good enough.

Is there any other thing I can do about it?

test=# explain analyze select * from mytable where title ilike 'x264';
  QUERY PLAN
--
 Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
width=83) (actual time=1754.656..1754.656 rows=0 loops=1)
   Recheck Cond: (title ~~* 'x264'::text)
   Rows Removed by Index Recheck: 1220793
   Heap Blocks: exact=197567
   ->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
rows=1380 width=0) (actual time=346.663..346.663 rows=1220793 loops=1)
 Index Cond: (title ~~* 'x264'::text)
 Planning time: 1.168 ms
 Execution time: 1755.944 ms


On Sun, Nov 13, 2016 at 10:04 PM, Oleg Bartunov  wrote:
>
>
> On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis 
> wrote:
>>
>> I have a simple table with Trigram index,
>>
>> create table mytable(hash char(40), title text);
>> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
>>
>> When I run a query with 10m rows, it uses the Trigram index, but takes
>> 3s to execute, very slow.
>> (I have 80m rows, but only inserted 10m for testing purpose)
>>
>> test=# select count(*) from mytable;
>>   count
>> --
>>  13971887
>> (1 row)
>>
>> test=# explain analyze select * from mytable where title ilike 'x264';
>>   QUERY PLAN
>>
>> --
>>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
>> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
>>Recheck Cond: (title ~~* 'x264'::text)
>>Rows Removed by Index Recheck: 11402855
>>Heap Blocks: exact=39557 lossy=158010
>>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
>> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
>>  Index Cond: (title ~~* 'x264'::text)
>>  Planning time: 0.611 ms
>>  Execution time: 2937.729 ms
>> (8 rows)
>>
>> Any ideas to speed things up?
>
>
>Rows Removed by Index Recheck: 11402855
>Heap Blocks: exact=39557 lossy=158010
>
> You need to increase work_mem
>>
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Trigram is slow when 10m rows

2016-11-13 Thread Oleg Bartunov
On Sun, Nov 13, 2016 at 2:54 PM, Aaron Lewis 
wrote:

> I have a simple table with Trigram index,
>
> create table mytable(hash char(40), title text);
> create index title_trgm_idx on mytable using gin(title gin_trgm_ops);
>
> When I run a query with 10m rows, it uses the Trigram index, but takes
> 3s to execute, very slow.
> (I have 80m rows, but only inserted 10m for testing purpose)
>
> test=# select count(*) from mytable;
>   count
> --
>  13971887
> (1 row)
>
> test=# explain analyze select * from mytable where title ilike 'x264';
>   QUERY PLAN
> 
> --
>  Bitmap Heap Scan on mytable  (cost=462.69..5639.67 rows=1380
> width=83) (actual time=2937.308..2937.308 rows=0 loops=1)
>Recheck Cond: (title ~~* 'x264'::text)
>Rows Removed by Index Recheck: 11402855
>Heap Blocks: exact=39557 lossy=158010
>->  Bitmap Index Scan on title_trgm_idx  (cost=0.00..462.35
> rows=1380 width=0) (actual time=342.440..342.440 rows=1220793 loops=1)
>  Index Cond: (title ~~* 'x264'::text)
>  Planning time: 0.611 ms
>  Execution time: 2937.729 ms
> (8 rows)
>
> Any ideas to speed things up?
>

   Rows Removed by Index Recheck: 11402855
   Heap Blocks: exact=39557 lossy=158010

You need to increase work_mem

>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>