Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Hey guys,

I'm trying to understand the performance impact of "Index Recheck", I
googled for Index Recheck, but didn't find much details about it,
where can I know more about it?

And how did you know the performance is being significantly hurt by
inadequate work_mem?

I'm running PG 9.6.1, built from source.


On Mon, Nov 14, 2016 at 2:51 AM, Tom Lane  wrote:
> Oleg Bartunov  writes:
>> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis 
>>> It takes 500ms with 10m rows, could it be faster?
>
>> sure.  Recheck with function call is pretty expensive, so I'd not recommend
>> to create functional index, just create separate column of type tsvector
>> (materialize to_tsvector) and create gin index on it.  You should surprise.
>
> I doubt it'll help that much --- more than half the time is going into the
> bitmap indexscan, and with over 1m candidate matches, there's no way
> that's going to be super cheap.
>
> I wonder whether a gist index would be better here, since it would support
> a plain indexscan which should require scanning much less of the index
> given the small LIMIT.
>
> (Materializing the tsvector would probably help for gist, too, by reducing
> the cost of lossy-index rechecks.)
>
> BTW, it still looks like the performance is being significantly hurt by
> inadequate work_mem.
>
> regards, tom lane



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Tom Lane
Oleg Bartunov  writes:
> On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis 
>> It takes 500ms with 10m rows, could it be faster?

> sure.  Recheck with function call is pretty expensive, so I'd not recommend
> to create functional index, just create separate column of type tsvector
> (materialize to_tsvector) and create gin index on it.  You should surprise.

I doubt it'll help that much --- more than half the time is going into the
bitmap indexscan, and with over 1m candidate matches, there's no way
that's going to be super cheap.

I wonder whether a gist index would be better here, since it would support
a plain indexscan which should require scanning much less of the index
given the small LIMIT.

(Materializing the tsvector would probably help for gist, too, by reducing
the cost of lossy-index rechecks.)

BTW, it still looks like the performance is being significantly hurt by
inadequate work_mem.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Oleg Bartunov
On Sun, Nov 13, 2016 at 6:05 PM, Aaron Lewis 
wrote:

> Sigh, didn't notice that. Thanks for the heads up.
>
> It takes 500ms with 10m rows, could it be faster?
>

sure.  Recheck with function call is pretty expensive, so I'd not recommend
to create functional index, just create separate column of type tsvector
(materialize to_tsvector) and create gin index on it.  You should surprise.


> I've increased work_mem to 256MB
>
> test=# explain analyze select * from mytable where
> to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
>   QUERY PLAN
> 
> 
> ---
>  Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
> time=348.506..536.483 rows=1000 loops=1)
>->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
> width=83) (actual time=345.354..536.199 rows=1010 loops=1)
>  Recheck Cond: (to_tsvector('english'::regconfig, title) @@
> '''x264'''::tsquery)
>  Rows Removed by Index Recheck: 12242
>  Heap Blocks: exact=20 lossy=186
>  ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
> rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
> loops=1)
>Index Cond: (to_tsvector('english'::regconfig, title)
> @@ '''x264'''::tsquery)
>  Planning time: 0.144 ms
>  Execution time: 537.212 ms
> (9 rows)
>
> On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
>  wrote:
> > On 13/11/2016 15:26, Aaron Lewis wrote:
> >> Hi Oleg,
> >>
> >> Can you elaborate on the title column? I don't get it.
> >>
> >
>  create table mytable(hash char(40), title varchar(500));
>  create index name_fts on mytable using gin(to_tsvector('english',
>  'title'));
> >
> > You created an index on the text 'title', not on the title column, so
> > the index is useless.
> >
> > Drop the existing index and create this one instead:
> >
> > create index name_fts on mytable using gin(to_tsvector('english',
> title));
> >
> >> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov 
> wrote:
> >>>
> >>>
> >>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis <
> the.warl0ck.1...@gmail.com>
> >>> wrote:
> 
>  I have a simple table, and a gin index,
> 
>  create table mytable(hash char(40), title varchar(500));
>  create index name_fts on mytable using gin(to_tsvector('english',
>  'title'));
> >>>
> >>>
> >>>
> >>> ^
> >>>
> 
>  create unique index md5_uniq_idx on mytable(hash);
> 
>  When I execute a query with tsquery, the GIN index was not in use:
> 
>  test=# explain analyze select * from mytable where
>  to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>   QUERY PLAN
> 
>  
> 
>   Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>  time=0.111..75.549 rows=10 loops=1)
> ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>  (actual time=0.110..75.546 rows=10 loops=1)
>   Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>  '''abc'' | ''def'''::tsquery)
>   Rows Removed by Filter: 10221
>   Planning time: 0.176 ms
>   Execution time: 75.564 ms
>  (6 rows)
> 
>  Any ideas?
> 
> >
> > --
> > Julien Rouhaud
> > http://dalibo.com - http://dalibo.org
>
>
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Sigh, didn't notice that. Thanks for the heads up.

It takes 500ms with 10m rows, could it be faster?
I've increased work_mem to 256MB

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'x264'::tsquery limit 1000 offset 10;
  QUERY PLAN
---
 Limit  (cost=684.06..2949.42 rows=1000 width=83) (actual
time=348.506..536.483 rows=1000 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=661.41..158917.22 rows=69859
width=83) (actual time=345.354..536.199 rows=1010 loops=1)
 Recheck Cond: (to_tsvector('english'::regconfig, title) @@
'''x264'''::tsquery)
 Rows Removed by Index Recheck: 12242
 Heap Blocks: exact=20 lossy=186
 ->  Bitmap Index Scan on name_fts  (cost=0.00..643.95
rows=69859 width=0) (actual time=333.703..333.703 rows=1044673
loops=1)
   Index Cond: (to_tsvector('english'::regconfig, title)
@@ '''x264'''::tsquery)
 Planning time: 0.144 ms
 Execution time: 537.212 ms
(9 rows)

On Sun, Nov 13, 2016 at 10:33 PM, Julien Rouhaud
 wrote:
> On 13/11/2016 15:26, Aaron Lewis wrote:
>> Hi Oleg,
>>
>> Can you elaborate on the title column? I don't get it.
>>
>
 create table mytable(hash char(40), title varchar(500));
 create index name_fts on mytable using gin(to_tsvector('english',
 'title'));
>
> You created an index on the text 'title', not on the title column, so
> the index is useless.
>
> Drop the existing index and create this one instead:
>
> create index name_fts on mytable using gin(to_tsvector('english', title));
>
>> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov  wrote:
>>>
>>>
>>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
>>> wrote:

 I have a simple table, and a gin index,

 create table mytable(hash char(40), title varchar(500));
 create index name_fts on mytable using gin(to_tsvector('english',
 'title'));
>>>
>>>
>>>
>>> ^
>>>

 create unique index md5_uniq_idx on mytable(hash);

 When I execute a query with tsquery, the GIN index was not in use:

 test=# explain analyze select * from mytable where
 to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
  QUERY PLAN

 
  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
 time=0.111..75.549 rows=10 loops=1)
->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
 (actual time=0.110..75.546 rows=10 loops=1)
  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
 '''abc'' | ''def'''::tsquery)
  Rows Removed by Filter: 10221
  Planning time: 0.176 ms
  Execution time: 75.564 ms
 (6 rows)

 Any ideas?

>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Julien Rouhaud
On 13/11/2016 15:26, Aaron Lewis wrote:
> Hi Oleg,
> 
> Can you elaborate on the title column? I don't get it.
> 

>>> create table mytable(hash char(40), title varchar(500));
>>> create index name_fts on mytable using gin(to_tsvector('english',
>>> 'title'));

You created an index on the text 'title', not on the title column, so
the index is useless.

Drop the existing index and create this one instead:

create index name_fts on mytable using gin(to_tsvector('english', title));

> On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov  wrote:
>>
>>
>> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
>> wrote:
>>>
>>> I have a simple table, and a gin index,
>>>
>>> create table mytable(hash char(40), title varchar(500));
>>> create index name_fts on mytable using gin(to_tsvector('english',
>>> 'title'));
>>
>>
>>
>> ^
>>
>>>
>>> create unique index md5_uniq_idx on mytable(hash);
>>>
>>> When I execute a query with tsquery, the GIN index was not in use:
>>>
>>> test=# explain analyze select * from mytable where
>>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>>  QUERY PLAN
>>>
>>> 
>>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>>> time=0.111..75.549 rows=10 loops=1)
>>>->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>>> (actual time=0.110..75.546 rows=10 loops=1)
>>>  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>>> '''abc'' | ''def'''::tsquery)
>>>  Rows Removed by Filter: 10221
>>>  Planning time: 0.176 ms
>>>  Execution time: 75.564 ms
>>> (6 rows)
>>>
>>> Any ideas?
>>> 

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
Hi Oleg,

Can you elaborate on the title column? I don't get it.

On Sun, Nov 13, 2016 at 10:10 PM, Oleg Bartunov  wrote:
>
>
> On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
> wrote:
>>
>> I have a simple table, and a gin index,
>>
>> create table mytable(hash char(40), title varchar(500));
>> create index name_fts on mytable using gin(to_tsvector('english',
>> 'title'));
>
>
>
> ^
>
>>
>> create unique index md5_uniq_idx on mytable(hash);
>>
>> When I execute a query with tsquery, the GIN index was not in use:
>>
>> test=# explain analyze select * from mytable where
>> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>>  QUERY PLAN
>>
>> 
>>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
>> time=0.111..75.549 rows=10 loops=1)
>>->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
>> (actual time=0.110..75.546 rows=10 loops=1)
>>  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
>> '''abc'' | ''def'''::tsquery)
>>  Rows Removed by Filter: 10221
>>  Planning time: 0.176 ms
>>  Execution time: 75.564 ms
>> (6 rows)
>>
>> Any ideas?
>>
>>
>> --
>> Best Regards,
>> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
>> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>



-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Oleg Bartunov
On Sun, Nov 13, 2016 at 2:50 PM, Aaron Lewis 
wrote:

> I have a simple table, and a gin index,
>
> create table mytable(hash char(40), title varchar(500));
> create index name_fts on mytable using gin(to_tsvector('english',
> 'title'));
>


^


> create unique index md5_uniq_idx on mytable(hash);
>
> When I execute a query with tsquery, the GIN index was not in use:
>
> test=# explain analyze select * from mytable where
> to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
>  QUERY PLAN
> 
> 
>  Limit  (cost=0.00..277.35 rows=10 width=83) (actual
> time=0.111..75.549 rows=10 loops=1)
>->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
> (actual time=0.110..75.546 rows=10 loops=1)
>  Filter: (to_tsvector('english'::regconfig, (title)::text) @@
> '''abc'' | ''def'''::tsquery)
>  Rows Removed by Filter: 10221
>  Planning time: 0.176 ms
>  Execution time: 75.564 ms
> (6 rows)
>
> Any ideas?
>
>
> --
> Best Regards,
> Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
> Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


[GENERAL] Why is this query not using GIN index?

2016-11-13 Thread Aaron Lewis
I have a simple table, and a gin index,

create table mytable(hash char(40), title varchar(500));
create index name_fts on mytable using gin(to_tsvector('english', 'title'));
create unique index md5_uniq_idx on mytable(hash);

When I execute a query with tsquery, the GIN index was not in use:

test=# explain analyze select * from mytable where
to_tsvector('english', title) @@ 'abc | def'::tsquery limit 10;
 QUERY PLAN

 Limit  (cost=0.00..277.35 rows=10 width=83) (actual
time=0.111..75.549 rows=10 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..381187.45 rows=13744 width=83)
(actual time=0.110..75.546 rows=10 loops=1)
 Filter: (to_tsvector('english'::regconfig, (title)::text) @@
'''abc'' | ''def'''::tsquery)
 Rows Removed by Filter: 10221
 Planning time: 0.176 ms
 Execution time: 75.564 ms
(6 rows)

Any ideas?


-- 
Best Regards,
Aaron Lewis - PGP: 0x13714D33 - http://pgp.mit.edu/
Finger Print:   9F67 391B B770 8FF6 99DC  D92D 87F6 2602 1371 4D33


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general