Hi all.
 
I'm doing full-text-search and want do display the results in the order the 
articles were received (timestamp). I have millions of articles where the 
newest are the most interesting, and a search may match many articles so doing 
the sort using some INDEX would be great.
 
Take the following example-schema:
 
create extension if not exists btree_gin; 
drop table if EXISTS delivery; create table delivery( id BIGSERIAL primary key
, fts_allTSVECTOR not null, folder_id BIGINT NOT NULL, received_timestamp 
TIMESTAMP not null, message varchar not null ); create index fts_idx ON delivery
using gin(fts_all, folder_id); CREATE OR REPLACE FUNCTION 
update_delivery_tsvector_tf()RETURNS TRIGGER AS $$ BEGIN  NEW.fts_all = 
to_tsvector('simple', NEW.message); return NEW; END; $$ LANGUAGE PLPGSQL; 
CREATE TRIGGERupdate_delivery_tsvector_t BEFORE INSERT OR UPDATE ON delivery 
FOR EACH ROW EXECUTE PROCEDUREupdate_delivery_tsvector_tf(); insert into 
delivery(folder_id, received_timestamp,message) values (1, '2015-01-01', 'Yes 
hit four') , (1, '2014-01-01', 'Hi man') , (2, '2013-01-01', 'Hi man') , (2, 
'2013-01-01', 'fish') ; analyze delivery; set ENABLE_SEQSCAN to off; explain 
analyze SELECTdel.id , del.received_timestamp FROM delivery del WHERE 1 = 1 AND 
del.fts_all @@ to_tsquery('simple', 'hi:*') AND del.folder_id = 1 ORDER BY 
del.received_timestampDESC LIMIT 101 OFFSET 0; 
 
I use btree_gin extention to make folder_id part of index.
 
I get the following plan (using 9.6 from master):
                                                         QUERY 
PLAN                                                         
 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Limit  (cost=5.23..5.23 rows=1 width=16) (actual time=0.042..0.043 rows=2 
loops=1)
    ->  Sort  (cost=5.23..5.23 rows=1 width=16) (actual time=0.040..0.040 
rows=2 loops=1)
          Sort Key: received_timestamp DESC
          Sort Method: quicksort  Memory: 25kB
          ->  Bitmap Heap Scan on delivery del  (cost=3.90..5.22 rows=1 
width=16) (actual time=0.029..0.030 rows=2 loops=1)
                Recheck Cond: (fts_all @@ '''hi'':*'::tsquery)
                Filter: (folder_id = 1)
                Rows Removed by Filter: 1
                Heap Blocks: exact=1
                ->  Bitmap Index Scan on fts_idx  (cost=0.00..3.90 rows=1 
width=0) (actual time=0.018..0.018 rows=3 loops=1)
                      Index Cond: (fts_all @@ '''hi'':*'::tsquery)
  Planning time: 0.207 ms
  Execution time: 0.085 ms
 (13 rows)
 
Here is the explain from a real-world query:
  EXPLAIN ANALYZE SELECT del.entity_id , del.received_timestamp FROM 
origo_email_delivery delWHERE 1 = 1 AND del.fts_all @@ to_tsquery('simple', 
'andre:*') AND del.folder_id = 44964 ORDER BY del.received_timestamp DESC LIMIT 
101OFFSET 0; 
                                                                        QUERY 
PLAN                                                                        
 
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Limit  (cost=92260.66..92260.91 rows=101 width=16) (actual 
time=347.891..347.907 rows=101 loops=1)
    ->  Sort  (cost=92260.66..92291.08 rows=12167 width=16) (actual 
time=347.888..347.899 rows=101 loops=1)
          Sort Key: received_timestamp DESC
          Sort Method: top-N heapsort  Memory: 29kB
          ->  Bitmap Heap Scan on origo_email_delivery del  
(cost=2480.95..91794.77 rows=12167 width=16) (actual time=152.568..346.132 
rows=18257 loops=1)
                Recheck Cond: (fts_all @@ '''andre'':*'::tsquery)
                Filter: (folder_id = 44964)
                Rows Removed by Filter: 264256
                Heap Blocks: exact=80871
                ->  Bitmap Index Scan on temp_fts_idx  (cost=0.00..2477.91 
rows=309588 width=0) (actual time=134.903..134.903 rows=282513 loops=1)
                      Index Cond: (fts_all @@ '''andre'':*'::tsquery)
  Planning time: 0.530 ms
  Execution time: 347.967 ms
 (13 rows)
 

 
1. Why isnt' folder_id part of the index-cond?
2. Is there a way to make it use the (same) index to sort by 
received_timestamp?
3. Using a GIN-index, is there a way to use the index at all for sorting?
4. It doesn't seem like ts_rank uses the index for sorting either.
 
Thanks.

 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


Reply via email to