Hey,
our database admin solved this problem by executing the following SQL query
in a 3 second sleep loop over multiple days:
DELETE FROM resource_params_cache
WHERE resource_params_cache.resource IN (SELECT resource_params_cache.resource
FROM resource_params_cache
WHERE NOT EXISTS (SELECT * FROM catalog_resources cr
WHERE cr.resource=resource_params_cache.resource)
LIMIT 2000)
RETURNING resource_params_cache;
After this clean up the table size shrinked by 93% and were returned to the
OS with VACUUM FULL
and the database GC did finish in 3 minutes:
2015-01-07 16:09:34,190 INFO [pool-3-thread-1] [cli.services] Starting
database garbage collection
2015-01-07 16:12:21,296 INFO [pool-3-thread-1] [cli.services] Finished
database garbage collection
On Friday, January 2, 2015 6:19:06 PM UTC+1, Andreas Paul wrote:
>
> Hi Wyatt,
>
> - Have you recently upgraded PuppetDB, or changed any other part of your
> Puppet/PuppetDB installation?
> No, we haven't changed our Puppet client/server or PuppetDB version
> recently.
>
> - Was PuppetDB running fine before this started occurring, or are you
> still getting it set up?
> PuppetDB is still running fine except when it tries to compact the
> database with the garbage collection. After the exception the queue gets
> emptied in a matter of seconds and the commands get processed as usual. We
> are running this setup for over a year now and it runs fine except this GC
> issue.
>
> - What version of PuppetDB are you running?
> 1.6.3
>
> - What version of Puppet are you running?
> 2.7.25
>
> - What version of Postgres are you running?
> PostgreSQL 9.2.9
>
> - Can you post the full stacktrace, perhaps in a gist?
> https://gist.github.com/xorpaul/ba0573fe2245abd407f9
>
> - What is the operating system running PuppetDB?
> Debian Wheezy
>
> - When the new commands are failing, they should be showing up in
> /var/lib/puppetdb/mq/discarded. If so, can you find one of the failed
> commands submitted after this error and gist that as well?
> We have some failing commands, but I don't think these correlate with this
> GC problem, but I can send you one command to your email address as these
> could contain some sensitive information that I don't want to post to
> github. I'm xorpaul on freenode and #puppet just send me a message.
>
>
>
> Our DB admin thinks that the SELECT query over this large table is too
> much for the application, but we don't know why it aborts the GC sometimes
> after 5 minutes and sometimes only after hours of trying.
> He also saw that PuppetDB tries to start a new transaction within an
> transaction without first closing the previous one, but this could be
> coincidental to our problem.
>
> Thanks!
>
> On Friday, January 2, 2015 5:32:36 PM UTC+1, Wyatt Alt wrote:
>>
>> Hey Andreas,
>>
>> Can you give some more information about how this cropped up?
>>
>> - Have you recently upgraded PuppetDB, or changed any other part of your
>> Puppet/PuppetDB installation?
>> - Was PuppetDB running fine before this started occurring, or are you
>> still getting it set up?
>> - What version of PuppetDB are you running?
>> - What version of Puppet are you running?
>> - What version of Postgres are you running?
>> - Can you post the full stacktrace, perhaps in a gist?
>> - What is the operating system running PuppetDB?
>> - When the new commands are failing, they should be showing up in
>> /var/lib/puppetdb/mq/discarded. If so, can you find one of the failed
>> commands submitted after this error and gist that as well?
>>
>> That should help get the ball rolling -- to my knowledge we haven't seen
>> this before.
>>
>> Thanks,
>> Wyatt
>>
>>
>> On 01/02/2015 06:44 AM, Andreas Paul wrote:
>> > Hi there,
>> >
>> > the PuppetDB garbage collection fails sometimes after 5 minutes and
>> > sometimes after 2-3 hours with the following error message:
>> >
>> > 2014-12-29 12:12:57,312 INFO [pool-3-thread-1] [cli.services]
>> > Starting database garbage collection
>> > 2014-12-29 16:31:39,106 ERROR [pool-3-thread-1] [cli.services] Error
>> > during garbage collection
>> >
>> >
>> > 2015-01-02 08:03:18,464 INFO [pool-3-thread-3] [cli.services]
>> > Starting database garbage collection
>> > [...]
>> > 2015-01-02 08:08:38,255 ERROR [pool-3-thread-3] [cli.services] Error
>> > during garbage collection
>> > java.sql.BatchUpdateException: Batch entry 0 DELETE FROM
>> > resource_params_cache WHERE NOT EXISTS (SELECT * FROM
>> > catalog_resources cr WHERE cr.resource=resource_params_cache.resource)
>> > was aborted. Call getNextException to see the cause.
>> > at
>> >
>> org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2746)
>>
>>
>> > at
>> >
>> org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:457)
>>
>>
>> > at
>> >
>> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1887)
>>
>>
>> > at
>> >
>> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:405)
>>
>> > at
>> >
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2893)
>>
>>
>> > [...]
>> > 2015-01-02 08:08:38,256 INFO [pool-3-thread-3] [cli.services]
>> > Starting sweep of stale nodes (threshold: 35 days)
>> > 2015-01-02 08:08:38,500 INFO [pool-3-thread-3] [cli.services]
>> > Finished sweep of stale nodes (threshold: 35 days)
>> >
>> >
>> >
>> > In this time no command can be processed by any PuppetDB server so the
>> > queue starts to increase very quickly.
>> >
>> >
>> > Here are some information about our PuppetDB data sets involved in the
>> > garbage collection query:
>> >
>> > # Explain output for select
>> > explain select * FROM resource_params_cache WHERE NOT EXISTS (SELECT *
>> FROM catalog_resources cr WHERE
>> cr.resource=resource_params_cache.resource);
>> > QUERY PLAN
>> >
>> --------------------------------------------------------------------------------------------------------------------------
>>
>>
>> > Nested Loop Anti Join (cost=0.00..2089716.08 rows=19861941
>> width=314)
>> > -> Seq Scan on resource_params_cache (cost=0.00..1267703.93
>> rows=19876193 width=314)
>> > -> Index Only Scan using idx_catalog_resources_resource on
>> catalog_resources cr (cost=0.00..11.15 rows=270 width=41)
>> > Index Cond: (resource =
>> (resource_params_cache.resource)::text
>> >
>> > Tables:
>> > Schema | Name | Type | Owner | Size |
>> Description
>> >
>> --------+-------------------------+-------+-------------+---------+-------------
>>
>>
>> > public | catalog_resources | table | puppet_prod | 1696 MB |
>> > public | catalogs | table | puppet_prod | 11 MB |
>> > public | certname_facts | table | puppet_prod | 588 MB |
>> > public | certname_facts_metadata | table | puppet_prod | 6128 kB |
>> > public | certnames | table | puppet_prod | 14 MB |
>> > public | edges | table | puppet_prod | 1227 MB |
>> > public | latest_reports | table | puppet_prod | 9296 kB |
>> > public | reports | table | puppet_prod | 942 MB |
>> > public | resource_events | table | puppet_prod | 11 GB |
>> > public | resource_params | table | puppet_prod | 98 GB |
>> > public | resource_params_cache | table | puppet_prod | 91 GB |
>> > public | schema_migrations | table | puppet_prod | 40 kB |
>> >
>> > # Indexes
>> > Schema | Name | Type |
>> Owner | Table | Size | Description
>> >
>> --------+----------------------------------------------+-------+-------------+-------------------------+---------+-------------
>>
>>
>> > public | catalog_resources_pkey | index |
>> puppet_prod | catalog_resources | 536 MB |
>> > public | catalogs_certname_key | index |
>> puppet_prod | catalogs | 4680 kB |
>> > public | catalogs_hash_key | index |
>> puppet_prod | catalogs | 6632 kB |
>> > public | catalogs_pkey | index |
>> puppet_prod | catalogs | 2152 kB |
>> > public | certname_facts_metadata_certname_key | index |
>> puppet_prod | certname_facts_metadata | 5240 kB |
>> > public | certname_facts_metadata_pkey | index |
>> puppet_prod | certname_facts_metadata | 6304 kB |
>> > public | certname_facts_pkey | index |
>> puppet_prod | certname_facts | 505 MB |
>> > public | certnames_pkey | index |
>> puppet_prod | certnames | 5120 kB |
>> > public | constraint_resource_events_unique | index |
>> puppet_prod | resource_events | 8067 MB |
>> > public | edges_certname_source_target_type_unique_key | index |
>> puppet_prod | edges | 2365 MB |
>> > public | idx_catalog_resources_exported_true | index |
>> puppet_prod | catalog_resources | 23 MB |
>> > public | idx_catalog_resources_resource | index |
>> puppet_prod | catalog_resources | 1197 MB |
>> > public | idx_catalog_resources_type | index |
>> puppet_prod | catalog_resources | 1473 MB |
>> > public | idx_catalog_resources_type_title | index |
>> puppet_prod | catalog_resources | 1473 MB |
>> > public | idx_catalogs_transaction_uuid | index |
>> puppet_prod | catalogs | 14 MB |
>> > public | idx_certname_facts_certname | index |
>> puppet_prod | certname_facts | 511 MB |
>> > public | idx_certname_facts_name | index |
>> puppet_prod | certname_facts | 877 MB |
>> > public | idx_latest_reports_report | index |
>> puppet_prod | latest_reports | 11 MB |
>> > public | idx_reports_certname | index |
>> puppet_prod | reports | 543 MB |
>> > public | idx_reports_end_time | index |
>> puppet_prod | reports | 180 MB |
>> > public | idx_reports_transaction_uuid | index |
>> puppet_prod | reports | 210 MB |
>> > public | idx_resource_events_containing_class | index |
>> puppet_prod | resource_events | 1150 MB |
>> > public | idx_resource_events_property | index |
>> puppet_prod | resource_events | 1150 MB |
>> > public | idx_resource_events_report | index |
>> puppet_prod | resource_events | 4089 MB |
>> > public | idx_resource_events_resource_type | index |
>> puppet_prod | resource_events | 1162 MB |
>> > public | idx_resource_events_resource_type_title | index |
>> puppet_prod | resource_events | 3461 MB |
>> > public | idx_resource_events_status | index |
>> puppet_prod | resource_events | 1150 MB |
>> > public | idx_resource_events_timestamp | index |
>> puppet_prod | resource_events | 1259 MB |
>> > public | idx_resources_params_name | index |
>> puppet_prod | resource_params | 5397 MB |
>> > public | idx_resources_params_resource | index |
>> puppet_prod | resource_params | 12 GB |
>> > public | latest_reports_pkey | index |
>> puppet_prod | latest_reports | 5152 kB |
>> > public | reports_pkey | index |
>> puppet_prod | reports | 594 MB |
>> > public | resource_params_cache_pkey | index |
>> puppet_prod | resource_params_cache | 2057 MB |
>> > public | resource_params_pkey | index |
>> puppet_prod | resource_params | 13 GB |
>> > public | schema_migrations_pkey | index |
>> puppet_prod | schema_migrations | 16 kB |
>> >
>> > # Statistics for resource_params_cache
>> > table_len | 8756715520
>> > tuple_count | 24245975
>> > tuple_len | 8300471564
>> > tuple_percent | 94.79
>> > dead_tuple_count | 348
>> > dead_tuple_len | 134886
>> > dead_tuple_percent | 0
>> > free_space | 219676684
>> > free_percent | 2.51
>> >
>> >
>> > select count(*) from catalog_resources;
>> > count
>> > ---------
>> > 3839182
>> >
>> > select count(*) from resource_params_cache;
>> > count
>> > ----------
>> > 24246308
>> >
>> >
>> >
>> >
>> >
>> > We will upgrade when our Puppet 3 servers are ready but, as it looks
>> right now, we're still stuck with having to operate Puppet 2.x for some
>> time to come, so we'd like to fix this problem tight now.
>> >
>> > Thanks in advance!
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "Puppet Users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to [email protected]
>> > <mailto:[email protected]>.
>> > To view this discussion on the web visit
>> >
>> https://groups.google.com/d/msgid/puppet-users/8bd3cd4b-58cc-4377-a34b-fccc235813e8%40googlegroups.com
>>
>> > <
>> https://groups.google.com/d/msgid/puppet-users/8bd3cd4b-58cc-4377-a34b-fccc235813e8%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>
>>
>> > For more options, visit https://groups.google.com/d/optout.
>>
>>
--
You received this message because you are subscribed to the Google Groups
"Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/puppet-users/a523fc47-e096-416b-93a1-18bde34b5e15%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.