Re: Bad plan for ltree predicate <@

2017-12-01 Thread Roman Konoval
Hi Tom,

Thanks for your help.

> On Dec 1, 2017, at 22:33, Tom Lane  wrote:
> 
> 
>  The seqscan formulation of the query results in evaluating
> this function afresh at most of the rows

The function is defined as STABLE. I though that means that there is no need
 to reevaluate it on every row as input parameter is the same for every row and
 return value will be the same during the same query execution. Do I understand
 incorrectly what STABLE means?
Why is the function evaluated more than once?

> , whereas shoving it into an
> uncorrelated sub-select causes it to be evaluated only once.  That, I
> think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
> formulation go faster.  Certainly you'd not expect that a bitmap scan that
> has to hit most of the rows anyway is going to win over a seqscan.
> 
> The fact that the planner goes for a bitmap scan in the second formulation
> is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
> for selectivity estimation purposes, so you end up with a default estimate
> that says that the <@ condition only selects a small fraction of the rows.
> Not sure if we should try to change that or not.
> 
> I'd suggest setting the function's cost to 1000 or so and seeing if that
> doesn't improve matters.
> 


If I set function cost to 1000 I get slightly better plan but still 3.5 more 
buffers are read when compared to bitmap scan which as you wrote one would 
expect to be slower than seq scan.
Here is the plan:


 QUERY PLAN
-
 Aggregate  (cost=216438.81..216438.82 rows=1 width=0) (actual 
time=1262.244..1262.245 rows=1 loops=1)
   Buffers: shared hit=169215
   CTE trees
 ->  Index Scan using document_head__id_path__gist__idx on document_head d  
(cost=2.91..212787.85 rows=162265 width=74) (actual time=0.115..727.119 
rows=154854 loops=1)
   Index Cond: (id_path <@ 
get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying))
   Filter: (((id)::text <> 
'78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 
'DELETED'::text))
   Rows Removed by Filter: 23
   Buffers: shared hit=169215
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual 
time=0.119..1118.899 rows=154854 loops=1)
 Buffers: shared hit=169215
 Total runtime: 1277.010 ms
(11 rows)

My understanding is that the optimal plan in this case should read less data 
than bitmap scan by the amount of buffers hit by bitmap index scan. 
It should read roughly all buffers of the table itself. Something like the 
query with predicate using ltree literal instead of function invocation:

explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
  FROM document_head AS d
  WHERE (d.id_path <@ 
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree
 AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;

 QUERY PLAN
-
 Aggregate  (cost=42114.02..42114.03 rows=1 width=0) (actual 
time=997.427..997.427 rows=1 loops=1)
   Buffers: shared hit=35230
   CTE trees
 ->  Seq Scan on document_head d  (cost=0.00..38463.06 rows=162265 
width=74) (actual time=0.013..593.082 rows=154854 loops=1)
   Filter: ((id_path <@ 
'869c0187_51ae_4deb_a36f_0425fdafda6e.78157c60_45bc_42c1_9aad_c5651995db5c'::ltree)
 AND ((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND 
((state)::text <> 'DELETED'::text))
   Rows Removed by Filter: 23357
   Buffers: shared hit=35230
   ->  CTE Scan on trees  (cost=0.00..3245.30 rows=162265 width=0) (actual 
time=0.017..888.076 rows=154854 loops=1)
 Buffers: shared hit=35230
 Total runtime: 1011.565 ms
(10 rows)


The question is if it possible to get plan like that using function or some 
other way to get ltree value for given document_head.id value in one query?

As an alternative I can get ltree value with the separate query but this would 
require
1. a round-trip to postgres
2. me to change isolation level to REPEATABLE READ to make sure that I get 
consistent result 
so I would like to avoid that.

Regards,
Roman Konoval

Re: Bad plan for ltree predicate <@

2017-12-01 Thread Tom Lane
Roman Konoval  writes:
> I have a problem on 9.3.14 with a query that accesses table:

I think the root of the problem is your intermediate function:

> CREATE OR REPLACE FUNCTION public.get_doc_path(document_id character varying)
>  RETURNS ltree
>  LANGUAGE plpgsql
>  STABLE
> AS $function$
> DECLARE
> path ltree;
> BEGIN
> select id_path into path from document_head where id = document_id;
> RETURN path;
> END $function$

This is quite expensive, as it involves another table search, but the
planner doesn't know that since you've not marked it as having higher than
normal cost.  The seqscan formulation of the query results in evaluating
this function afresh at most of the rows, whereas shoving it into an
uncorrelated sub-select causes it to be evaluated only once.  That, I
think, and not the seqscan-vs-indexscan aspect, is what makes the bitmap
formulation go faster.  Certainly you'd not expect that a bitmap scan that
has to hit most of the rows anyway is going to win over a seqscan.

The fact that the planner goes for a bitmap scan in the second formulation
is an artifact of the fact that it doesn't try to pre-evaluate sub-selects
for selectivity estimation purposes, so you end up with a default estimate
that says that the <@ condition only selects a small fraction of the rows.
Not sure if we should try to change that or not.

I'd suggest setting the function's cost to 1000 or so and seeing if that
doesn't improve matters.

(BTW, what tipped me off to this was that the "buffers hit" count for
the seqscan node was so high, several times more than the actual size
of the table.  I couldn't account for that until I realized that the
function itself would be adding a few buffer hits per execution.)

regards, tom lane



Bad plan for ltree predicate <@

2017-12-01 Thread Roman Konoval
Hi,


I have a problem on 9.3.14 with a query that accesses table:

Size: (retrieved by query 
https://gist.github.com/romank0/74f9d1d807bd3f41c0729d0fc6126749)

 schemaname |relname|  size  | toast  | associated_idx_size |  total
+---+++-+-
 public | document_head | 275 MB | 630 MB | 439 MB  | 1345 MB


Definition:
 Table "public.document_head"
   Column|   Type   |  Modifiers
-+--+-
 snapshot_id | character varying(36)| not null
 id  | character varying(36)| not null
 base_type   | character varying(50)| not null
 is_cascade  | boolean  | not null default false
 parent_id   | character varying(36)|
 fileplan_node_id| character varying(36)|
 state   | character varying(10)| default 
'ACTIVE'::character varying
 title   | character varying(4096)  | not null
 properties  | text | not null
 properties_cache| hstore   | not null
 serial_number   | integer  | not null
 major_version   | integer  | not null
 minor_version   | integer  | not null
 version_description | text |
 sensitivity | integer  | not null default 10
 id_path | ltree|
 path_name   | character varying(4096)  | collate C not null
 ltx_id  | bigint   | not null
 created_by  | integer  | not null
 created_date| timestamp with time zone | not null
 modified_by | integer  | not null
 modified_date   | timestamp with time zone | not null
 responsible_user_ids| integer[]|
 origin_id   | character varying(36)|
 origin_snapshot_id  | character varying(36)|
 ssn | character varying(64)|
 record_physical_location| text |
 record_physical_location_id | text |
 record_created_date | timestamp with time zone |
 record_aggregated_date  | timestamp with time zone |
 record_last_review_comment  | text |
 record_last_review_date | timestamp with time zone |
 record_next_review_date | timestamp with time zone |
 record_originated_date  | timestamp with time zone |
 record_is_vital | boolean  | not null default false
 storage_plan_state  | text | not null default 
'New'::text
 cut_off_date| timestamp with time zone |
 dispose_date| timestamp with time zone |
 archive_date| timestamp with time zone |
Indexes:
"document_head__id__uniq_key" PRIMARY KEY, btree (id)
"document_head__parent_id__path_name__unq_idx" UNIQUE, btree (parent_id, 
path_name) WHERE state::text = 'ACTIVE'::text
"document_head__snapshot_id__unq" UNIQUE, btree (snapshot_id)
"document_head__base_type__idx" btree (base_type) WHERE state::text <> 
'DELETED'::text
"document_head__fileplan_node_id__idx" btree (fileplan_node_id)
"document_head__id__idx" btree (id) WHERE state::text <> 'DELETED'::text
"document_head__id_path__btree__idx" btree (id_path) WHERE state::text <> 
'DELETED'::text
"document_head__id_path__gist__idx" gist (id_path)
"document_head__ltx_id__idx" btree (ltx_id)
"document_head__origin_id__hotfix__idx" btree (origin_id) WHERE origin_id 
IS NOT NULL
"document_head__origin_id__idx" btree (origin_id) WHERE state::text <> 
'DELETED'::text AND origin_id IS NOT NULL
"document_head__parent_id__idx" btree (parent_id)
"document_head__properties_cache__contact_username_idx" btree 
((properties_cache -> 'person_meta_info.username'::text)) WHERE base_type::text 
= 'Contact'::text AND exist(properties_cache, 'person_meta_info.username'::text)
"document_head__properties_cache__emailmeta_message_id__idx" btree 
((properties_cache -> 'emailmeta.message_id'::text)) WHERE base_type::text = 
'File'::text AND exist(properties_cache, 'emailmeta.message_id'::text)
"document_head__properties_cache__idx" gist (properties_cache) WHERE 
state::text <> 'DELETED'::text
"document_head__properties_cache__project_identifier__idx" btree 
((properties_cache -> 'project.identifier'::text)) WHERE base_type::text = 
'Project'::text AND exist(properties_cache, 'project.identifier'::text)
"document_head__properties_cache__published_origin__idx" btree 
((properties_ca