Re: Bad plan for ltree predicate <@

Roman Konoval Fri, 01 Dec 2017 18:35:38 -0800

Hi Tom,

Thanks for your help.


> On Dec 1, 2017, at 22:33, Tom Lane <[email protected]> wrote:
> 
> 
>  The seqscan formulation of the query results in evaluating
> this function afresh at most of the rows

The function is defined as STABLE. I though that means that there is no need
 to reevaluate it on every row as input parameter is the same for every row and
 return value will be the same during the same query execution. Do I understand
 incorrectly what STABLE means?
Why is the function evaluated more than once?

> , whereas shoving it into an
> uncorrelated sub-select causes it to be evaluated only once.  That, I
> think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
> formulation go faster.  Certainly you'd not expect that a bitmap scan that
> has to hit most of the rows anyway is going to win over a seqscan.
> 
> The fact that the planner goes for a bitmap scan in the second formulation
> is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
> for selectivity estimation purposes, so you end up with a default estimate
> that says that the <@ condition only selects a small fraction of the rows.
> Not sure if we should try to change that or not.
> 
> I'd suggest setting the function's cost to 1000 or so and seeing if that
> doesn't improve matters.
> 


If I set function cost to 1000 I get slightly better plan but still 3.5 more 
buffers are read when compared to bitmap scan which as you wrote one would 
expect to be slower than seq scan.
Here is the plan:

                                                                                
 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=216438.81..216438.82 rows=1 width=0) (actual 
time=1262.244..1262.245 rows=1 loops=1)
   Buffers: shared hit=169215
   CTE trees
     ->  Index Scan using document_head__id_path__gist__idx on document_head d  
(cost=2.91..212787.85 rows=162265 width=74) (actual time=0.115..727.119 
rows=154854 loops=1)
           Index Cond: (id_path <@ 
get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying))
           Filter: (((id)::text <> 
'78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 
'DELETED'::text))
           Rows Removed by Filter: 23
           Buffers: shared hit=169215
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual 
time=0.119..1118.899 rows=154854 loops=1)
         Buffers: shared hit=169215
 Total runtime: 1277.010 ms
(11 rows)

My understanding is that the optimal plan in this case should read less data 
than bitmap scan by the amount of buffers hit by bitmap index scan. 
It should read roughly all buffers of the table itself. Something like the 
query with predicate using ltree literal instead of function invocation:

explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
  FROM document_head AS d
  WHERE (d.id_path <@ 
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree
 AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;
                                                                                
                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=42114.02..42114.03 rows=1 width=0) (actual 
time=997.427..997.427 rows=1 loops=1)
   Buffers: shared hit=35230
   CTE trees
     ->  Seq Scan on document_head d  (cost=0.00..38463.06 rows=162265 
width=74) (actual time=0.013..593.082 rows=154854 loops=1)
           Filter: ((id_path <@ 
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree)
 AND ((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND 
((state)::text <> 'DELETED'::text))
           Rows Removed by Filter: 23357
           Buffers: shared hit=35230
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual 
time=0.017..888.076 rows=154854 loops=1)
         Buffers: shared hit=35230
 Total runtime: 1011.565 ms
(10 rows)


The question is if it possible to get plan like that using function or some 
other way to get ltree value for given document_head.id value in one query?

As an alternative I can get ltree value with the separate query but this would 
require
1. a round-trip to postgres
2. me to change isolation level to REPEATABLE READ to make sure that I get 
consistent result 
so I would like to avoid that.

Regards,
Roman Konoval

Re: Bad plan for ltree predicate <@

Reply via email to