On Sep 17, 2013, at 7:43 AM, Kevin Grittner <kgri...@ymail.com> wrote:

> Natalie Wenz <nataliew...@ebureau.com> wrote:
> 
>> I have a large database from our test environment that got into trouble with
>> some high volume and some long-running queries about…six weeks ago? We have a
>> buffer mechanism that has been storing the new data since the database stopped
>> accepting connections, so we haven't really lost any data, which is good.
>> But the single-user backend vacuum freeze is still grinding away, using 100%
>> cpu most of the time, except when it's doing a lot of writes. We did cancel
>> the vacuum once so we could stop the database and take a snapshot of the
>> filesystem (we are using ZFS on FreeBSD) and copy the snapshot to another
>> machine. This allowed us to get a fresh database started (with 9.3!) where we
>> could unspool the last six weeks of data into a fresh database, and be able
>> to access at least that much of our data.
> 
> I'm not sure how you could have done that without dealing with the
> wraparound before the upgrade.

Sorry; my description of what is going on was a little unclear. We didn't
upgrade the existing database. We moved it to different hardware and created a
brand new database to accept the data that had been backing up in SQLite files
while our original database was offline. I'm still dealing with the wraparound
on the original, just on a different machine.
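
For anyone keeping score at home, a query along these lines (just a sketch;
the 2-billion figure is only approximate) shows how close each database is to
wraparound:

    -- Approximate distance to xid wraparound, per database.
    SELECT datname,
           age(datfrozenxid) AS xid_age,
           2000000000 - age(datfrozenxid) AS approx_xids_remaining
    FROM pg_database
    ORDER BY age(datfrozenxid) DESC;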


> 
>> Now:
>> I have a copy of the database (with data from all time up until the database
>> shut itself down six weeks ago) that I just need the data from. I am becoming
>> impatient with the vacuum, as it appears to have not even started working on
>> the files for one of the largest relations in the database (that table was
>> about 14TB last I saw). I'm trying to find alternatives to waiting another
>> who knows how many weeks for the vacuum to finish just to have the database
>> in a state where I can dump the data out, since this is no longer the "live"
>> version. This copy is running on hardware with plenty of space to work with.
>> The database has about a million transactions before it wraps.
> 
> The copy running on 9.3, or the original?
The brand new database on 9.3 is running great. The original, running on 9.1, 
is the one I'd like to extract data from (it is currently vacuuming). After I 
get the data out of the original, whether I wait for the vacuum to finish or 
not, I'll have to manually import all of the old data into the new database, 
but it was decided that that was a reasonable tradeoff if it meant we could get 
our current data out of the buffer files and get our regular tools and reports 
back online today. We want all of the older data eventually, but it's not 
critical for our day-to-day operation.  

> 
>> Is it possible to somehow just dump the data using pg_dump or pg_dumpall? I
>> haven't tried to see if those utilities will work when the database is
>> protecting itself from data loss. If it were possible, would it be wise (do
>> those utilities use tons of transactions to dump, or would it be safely
>> within the million or so that I have)? I suppose I could use copy?
> 
> pg_dump uses COPY (by default, anyway), and does all its work in a
> single transaction.  But it needs to start that transaction.

I'll give that a try. It sounds like I technically have enough transactions
left to copy the data out, so that is good. Hopefully the database will let me
use that utility. Otherwise I'll just do a COPY from the single-user backend.
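
In case it helps anyone else in the same hole, the plan is roughly this (a
sketch only, with placeholder database and file names, and assuming the server
will still let pg_dump start its transaction):

    # Dump the old 9.1 database and restore it into the new 9.3 cluster.
    pg_dump -Fc -f /backup/olddb.dump olddb
    pg_restore -j 4 -d newdb /backup/olddb.dump

If pg_dump can't get a transaction, the fallback would be per-table COPY from
the single-user backend, something like:

    -- Table and path are placeholders; the backend writes the file directly.
    COPY some_big_table TO '/backup/some_big_table.copy';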

> 
>> Would there be any risks if I were to do that? Maybe none of this
>> is risky at this point because we can always clone the original
>> snapshot, and try again.
> 
> I'm not sure that pg_dump followed by restore would be expected to
> be faster than finishing the VACUUM, unless that is configured to
> pace itself way too slowly.

That is actually a great point. I guess I'm anxious to *do* something, but a
dump might be so slow that I wouldn't gain much. I have been wondering about
our vacuum configuration, actually. I would absolutely welcome feedback in
this area: both on how a database like this should be tuned in general, and on
whether any settings should be changed when an emergency single-user backend
vacuum has to be run, so that the vacuum runs as aggressively as possible.


 version                         | PostgreSQL 9.1.9 on x86_64-unknown-freebsd9.1, compiled by gcc (GCC) 4.2.1 20070831 patched [FreeBSD], 64-bit
 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_freeze_max_age       | 800000000
 autovacuum_max_workers          | 3
 autovacuum_vacuum_cost_delay    | 0
 autovacuum_vacuum_scale_factor  | 0.1
 checkpoint_segments             | 128
 effective_cache_size            | 12GB
 listen_addresses                | *
 log_autovacuum_min_duration     | 10s
 log_destination                 | stderr
 log_filename                    | logfile-%A.log
 log_line_prefix                 | %t:%u:%r:[%p]: 
 log_rotation_age                | 1d
 log_rotation_size               | 1GB
 log_truncate_on_rotation        | on
 logging_collector               | on
 maintenance_work_mem            | 10GB
 max_connections                 | 500
 max_stack_depth                 | 2MB
 random_page_cost                | 1
 seq_page_cost                   | 1
 shared_buffers                  | 128MB
 synchronous_commit              | off
 temp_buffers                    | 128MB
 TimeZone                        | US/Central
 vacuum_cost_limit               | 500
 wal_buffers                     | 32MB
 work_mem                        | 256MB

These are the non-default settings of the original database. The machine it
was running on had 48GB of memory. The database was 36TB, with two tables
taking up the bulk of that (about 14TB each), and about 10 other tables and a
few large indexes making up the rest. Our typical usage pattern is mostly
inserts, with some hourly summaries (which take maybe 5 minutes), some daily
summaries (which take about 20-40 minutes), and a couple of end-of-month
queries that take several hours. We have the same setup and tuning in
production, which is about the same size, with an additional end-of-month
query that runs off one of the 14TB tables and can take 4-7 days.
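
For the emergency vacuum itself, this is the kind of thing I assume would help
(only a sketch, and I'd welcome corrections), run from the single-user backend
(postgres --single -D <datadir> <dbname>):

    -- Disable cost-based throttling and give the freeze plenty of memory.
    SET vacuum_cost_delay = 0;
    SET maintenance_work_mem = '1GB';  -- vacuum's dead-tuple list is capped around 1GB anyway
    VACUUM FREEZE;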

>> Any ideas or suggestions?
> 
> After getting past this crisis, I would take a close look at your
> vacuuming regimen -- it sounds like it is not aggressive enough to
> keep you out of trouble.

Normally autovacuum seems to keep up sufficiently. We got into trouble with a
bad combination of atypical long-running queries, a disk failure and
subsequent zpool repair, and the only person who checks the log files
regularly (me) being out of the office. This has been so painful and slow to
recover from that I don't think we'll ever get into this mess again. (At least
not quite like this. I seem to have a knack for finding *new* ways to break
things.) I will also take this opportunity to mention again that if anyone is
considering making a patch for 64-bit xids, you would make at least one small
group of people very, very happy.  :)

Side question: some of the longer queries that were running when the database 
yakked were deletes of old data that had been manually migrated to a new table 
with a more appropriate format. We were running out of disk space, so we were 
trying to clear up some space by removing data we now had in two places. Does a 
delete of previously-frozen rows unfreeze them, or anything like that? Because 
in a series of maybe a dozen queries or so, we deleted billions of rows. Does 
that generate a significant amount of extra work for the autovacuumer? 
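
Once the database is back in multi-user mode, I assume a query along these
lines would show how much dead-row cleanup those deletes left behind (just a
sketch, nothing specific to our schema):

    -- Dead vs. live rows and last vacuum times, biggest cleanup debt first.
    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;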

> I'm sorry that I don't have a better suggestion for resolving the
> crisis than running VACUUM at maximum speed.

Thanks for your feedback! It's kind of a dismal situation, but I'm anxious to 
learn whatever I can from it. 


> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
