[PERFORM] Failing Multi-Job Restores, Missing Indexes on Restore

Cea Stapleton Thu, 29 Sep 2016 05:37:43 -0700

Hi!

We are having a baffling problem we hope you might be able to help with. We 
were hoping to speed up postgres restores to our reporting server. First, we 
were seeing missing indexes with pg_restore to our reporting server for one of 
our databases when we did pg_restore with multiple jobs (a clean restore, we 
also tried dropping the database prior to restore, just in case something was 
extant and amiss). The indexes missed were not consistent, and we were only 
ever seeing errors on import that indicated an index had not yet been built. 
For example:


pg_restore: [archiver (db)] could not execute query: ERROR:  index 
"index_versions_on_item_type_and_item_id" does not exist
   Command was: DROP INDEX public.index_versions_on_item_type_and_item_id;

Which seemed like a reasonable error to us. We had no errors on insertion to 
indicate that index creation was a problem. 

We believed this might be a race condition, so we attempted to do a schema-only 
restore followed by a data-only restore just for this database. This worked a 
few times, and then began growing exponentially in completion time before it 
became unsustainable. We figured we were using too many jobs, so we decreased 
them. Nothing helped.

We decided to move back to a multi-job regular restore, and then the restores 
began crashing thusly:
[2016-09-14 02:20:36 UTC]    LOG:  server process (PID 27624) was terminated by 
signal 9: Killed
[2016-09-14 02:20:36 UTC]    LOG:  terminating any other active server processes
[2016-09-14 02:20:36 UTC] postgres [local] DBNAME WARNING:  terminating 
connection because of crash of another server process
[2016-09-14 02:20:36 UTC] postgres [local] DBNAME DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted shared 
memory.

The restore crashed this way for all job numbers except for one. We’re now 
stuck back where we were prior to increasing job numbers, at one job for this 
restore in order to prevent errors and crashes.  

Background: 
        • 3 ec2 instances with postgres
                • 1 used for reporting, on Postgresql 9.5.4
                        • Reporting server is a c4.2xlarge, and should have 
been able to handle multiple jobs (8cpu / 
https://aws.amazon.com/ec2/instance-types/ )
                • 2 production servers; one leader and one follower, both on 
Postgresql 9.5.3. 

We have one very large database, 678GB, and several others, but the largest is 
our concern. 

I have attached our postgresql.conf file. Thank you so much for your time.

Best,



Cea Stapleton 
Operations Engineer
http://www.healthfinch.com

postgresql.conf
Description: Binary data

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Failing Multi-Job Restores, Missing Indexes on Restore

Reply via email to