The Problem
While using the following settings to clean up some nodes that I had terminated, I ran into an extreme slowdown in performance.

node-ttl = 12h
node-purge-ttl = 4h
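For context, a node becomes eligible for purging once its deactivated or expired timestamp is older than node-purge-ttl. As a rough SQL sketch of that condition (the 4-hour interval is taken from the settings above, the columns from the certnames schema below):

```sql
-- Sketch: certnames eligible for purge with node-purge-ttl = 4h
SELECT certname
FROM certnames
WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)
  AND (deactivated < now() - interval '4 hours'
       OR expired < now() - interval '4 hours');
```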
I traced it back to multiple DELETE statements that were running, which I believe were coming from the foreign-key constraints on the certnames table.
pe-puppetdb=# \d+ certnames
                                                        Table "public.certnames"
      Column      |           Type           |                       Modifiers                        | Storage  | Stats target | Description
------------------+--------------------------+--------------------------------------------------------+----------+--------------+-------------
 id               | bigint                   | not null default nextval('certname_id_seq'::regclass)  | plain    |              |
 certname         | text                     | not null                                               | extended |              |
 latest_report_id | bigint                   |                                                        | plain    |              |
 deactivated      | timestamp with time zone |                                                        | plain    |              |
 expired          | timestamp with time zone |                                                        | plain    |              |
Indexes:
    "certnames_transform_pkey" PRIMARY KEY, btree (id)
    "certnames_transform_certname_key" UNIQUE CONSTRAINT, btree (certname)
Foreign-key constraints:
    "certnames_reports_id_fkey" FOREIGN KEY (latest_report_id) REFERENCES reports(id) ON DELETE SET NULL
Referenced by:
    TABLE "catalogs" CONSTRAINT "catalogs_certname_fkey" FOREIGN KEY (certname) REFERENCES certnames(certname) ON DELETE CASCADE
    TABLE "factsets" CONSTRAINT "factsets_certname_fk" FOREIGN KEY (certname) REFERENCES certnames(certname) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "reports" CONSTRAINT "reports_certname_fkey" FOREIGN KEY (certname) REFERENCES certnames(certname) ON DELETE CASCADE
Options: autovacuum_vacuum_scale_factor=0.20, autovacuum_analyze_scale_factor=0.10
My understanding is that instead of bulk-deleting the reports that belong to certnames that are no longer active, we simply delete reports, catalogs, and factsets via a cascading delete from certnames. This means that if I have 200 certnames to delete, I think PostgreSQL will simply shoot out 200 queries to delete the reports, each looking something like

DELETE FROM reports WHERE certname = $1;

and another 200 queries to delete catalogs, and another 200 to delete factsets.
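Given the foreign keys on certnames shown above, the per-row work done by the cascade triggers would look roughly like the following (a sketch of the effect; internally PostgreSQL runs pre-planned parameterized statements per deleted row, but the shape is the same):

```sql
-- Fired once per certname row deleted:
DELETE FROM reports  WHERE certname = $1;   -- reports_certname_fkey
DELETE FROM catalogs WHERE certname = $1;   -- catalogs_certname_fkey
DELETE FROM factsets WHERE certname = $1;   -- factsets_certname_fk
```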
I think I proved that to myself by getting the query plan for deleting two certnames:
pe-puppetdb=# EXPLAIN ANALYZE DELETE FROM certnames WHERE certname IN ('mono-scale-agent-61.us-west-2.compute.internal', 'mono-scale-agent-60.us-west-2.compute.internal');
                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on certnames  (cost=8.84..16.45 rows=2 width=6) (actual time=0.116..0.116 rows=0 loops=1)
   ->  Bitmap Heap Scan on certnames  (cost=8.84..16.45 rows=2 width=6) (actual time=0.050..0.053 rows=2 loops=1)
         Recheck Cond: (certname = ANY ('{mono-scale-agent-61.us-west-2.compute.internal,mono-scale-agent-60.us-west-2.compute.internal}'::text[]))
         Heap Blocks: exact=2
         ->  Bitmap Index Scan on certnames_transform_certname_key  (cost=0.00..8.84 rows=2 width=0) (actual time=0.045..0.045 rows=2 loops=1)
               Index Cond: (certname = ANY ('{mono-scale-agent-61.us-west-2.compute.internal,mono-scale-agent-60.us-west-2.compute.internal}'::text[]))
 Planning time: 0.171 ms
 Trigger for constraint catalogs_certname_fkey on certnames: time=0.711 calls=2
 Trigger for constraint factsets_certname_fk on certnames: time=0.468 calls=2
 Trigger for constraint reports_certname_fkey on certnames: time=4063.079 calls=2
 Trigger for constraint catalog_resources_catalog_id_fkey on catalogs: time=159.127 calls=2
 Trigger for constraint factset_id_fk on factsets: time=1.617 calls=2
 Trigger for constraint certnames_reports_id_fkey on reports: time=491.102 calls=154
 Trigger for constraint resource_events_report_id_fkey on reports: time=197.813 calls=154
 Execution time: 4914.278 ms
(15 rows)
I first made sure those certnames existed and checked that they had reports.
pe-puppetdb=# select count(*) from reports where certname = 'mono-scale-agent-60.us-west-2.compute.internal';
 count
-------
    77
(1 row)

pe-puppetdb=# select count(*) from reports where certname = 'mono-scale-agent-61.us-west-2.compute.internal';
 count
-------
    77
(1 row)
Assuming the DELETE that the constraint fires looks like the one below, we can see that deleting from reports also cascades to another two tables (back to certnames, and also to resource_events):
pe-puppetdb=# EXPLAIN ANALYZE DELETE FROM reports WHERE certname = 'mono-scale-agent-62.us-west-2.compute.internal';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Delete on reports  (cost=0.55..189.25 rows=46 width=6) (actual time=49.588..49.588 rows=0 loops=1)
   ->  Index Scan using reports_certname_idx on reports  (cost=0.55..189.25 rows=46 width=6) (actual time=0.064..0.423 rows=77 loops=1)
         Index Cond: (certname = 'mono-scale-agent-62.us-west-2.compute.internal'::text)
 Planning time: 0.068 ms
 Trigger for constraint certnames_reports_id_fkey: time=179.471 calls=77
 Trigger for constraint resource_events_report_id_fkey: time=333.978 calls=77
 Execution time: 563.123 ms
(7 rows)
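The two trigger lines in that plan are the second level of fan-out: certnames_reports_id_fkey is declared ON DELETE SET NULL, and resource_events presumably references reports(id) with ON DELETE CASCADE. Per deleted report, the effective statements would be roughly the following (a sketch of what the FK triggers do, one call per deleted row, which matches calls=77; the report_id column name is assumed from the constraint name):

```sql
-- Fired once per report row deleted:
UPDATE certnames SET latest_report_id = NULL
 WHERE latest_report_id = $1;                       -- certnames_reports_id_fkey (SET NULL)
DELETE FROM resource_events WHERE report_id = $1;   -- resource_events_report_id_fkey
```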
While all of these execute quickly now, that's likely because I currently have more memory than the size of the database. I can see how all of these small queries would grind to a halt if they had to go to disk for any reason.
Suggestion:
Remove all cascading DELETEs and UPDATEs and instead opt for one-time bulk queries that resolve the issues the constraints are meant to address.
In some cases, such as reports and resource_events, the situation would resolve itself when the reports are deleted via report-ttl, and that would be an efficient bulk deletion.
We could also move to a delete that looks something like this:

DELETE FROM reports
WHERE reports.id IN ( SELECT reports.id
                      FROM reports
                      JOIN certnames
                        ON reports.certname = certnames.certname
                      WHERE (certnames.deactivated IS NOT NULL OR certnames.expired IS NOT NULL)
                        AND (certnames.deactivated < now() - node_purge_ttl
                             OR certnames.expired < now() - node_purge_ttl) );
We would need to create deletes for catalogs, resource_events and other tables that currently have cascading delete.
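For example, the catalogs and factsets bulk deletes could mirror the reports query above; a sketch (node_purge_ttl again stands in for the configured interval, and since both tables reference certnames(certname) directly, no join through reports is needed):

```sql
DELETE FROM catalogs
WHERE certname IN ( SELECT certname
                    FROM certnames
                    WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)
                      AND (deactivated < now() - node_purge_ttl
                           OR expired < now() - node_purge_ttl) );

DELETE FROM factsets
WHERE certname IN ( SELECT certname
                    FROM certnames
                    WHERE (deactivated IS NOT NULL OR expired IS NOT NULL)
                      AND (deactivated < now() - node_purge_ttl
                           OR expired < now() - node_purge_ttl) );
```

Note that these would themselves still fire the child-table cascades seen in the query plan (catalog_resources_catalog_id_fkey and factset_id_fk) unless those constraints are reworked as well.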
Other Thoughts
I think this is likely uncommon, as I'm not sure how many people use node_purge_ttl, and those who do probably don't have hundreds of nodes to purge at a time. Even so, I think there are benefits to moving to bulk deletes and updates (or, in some cases, doing nothing at all) instead of letting these cascading deletes and updates happen, as they generate a lot of small queries that can back up when the database is busy.