Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread Kevin Grittner
Natalie Wenz  wrote:

> I have a large database from our test environment that got into trouble with
> some high volume and some long-running queries about…six weeks ago? We have a
> buffer mechanism that has been storing the new data since the database stopped
> accepting connections, so we haven't really lost any data, which is good.
> But the single-user backend vacuum freeze is still grinding away, using 100% 
> cpu
> most of the time, except when it's doing a lot of writes. We did cancel the
> vacuum once so we could stop the database and take a snapshot of the 
> filesystem
> (we are using ZFS on FreeBSD) and copy the snapshot to another machine. This
> allowed us to get a fresh database started (with 9.3!) where we could unspool
> the last six weeks of data into a fresh database, and be able to access at 
> least
> that much of our data.

I'm not sure how you could have done that without dealing with the
wraparound before the upgrade.

> Now:
> I have a copy of the database (with data from all time up until the database
> shut itself down six weeks ago) that I just need the data from. I am becoming
> impatient with the vacuum, as it appears to have not even started working on 
> the
> files for one of the largest relations in the database (that table was about
> 14TB last I saw). I'm trying to find alternatives to waiting another who
> knows how many weeks for the vacuum to finish just to have the database in a
> state where I can dump the data out, since this is no longer the
> "live" version. This copy running on hardware with plenty of space to
> work with. The database has about a million transactions before it wraps.

The copy running on 9.3, or the original?

> Is it possible to somehow just dump the data using pg_dump or pg_dumpall? I
> haven't tried to see if those utilities will work when the database is
> protecting itself from data loss. If it were possible, would it be wise (do
> those utilities use tons of transactions to dump, or would it be safely within
> the million or so that I have)? I suppose I could use copy?

pg_dump uses COPY (by default, anyway), and does all its work in a
single transaction.  But it needs to start that transaction.
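
If the regular backend refuses to start that transaction, one possible
fallback (a sketch only; the table and path names are hypothetical) is to
run COPY from the single-user backend, the same mode the emergency vacuum
runs in:

    COPY big_table TO '/some/other/volume/big_table.copy';
    COPY other_table TO '/some/other/volume/other_table.copy';

COPY ... TO is read-only, so it normally is not assigned a permanent
transaction ID and should barely dent the million or so XIDs remaining.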

> Would there be any risks if I were to do that? Maybe none of this
> is risky at this point because we can always clone the original
> snapshot, and try again.

I'm not sure that pg_dump followed by restore would be expected to
be faster than finishing the VACUUM, unless that is configured to
pace itself way too slowly.

> Any ideas or suggestions?

After getting past this crisis, I would take a close look at your
vacuuming regimen -- it sounds like it is not aggressive enough to
keep you out of trouble.

I'm sorry that I don't have a better suggestion for resolving the
crisis than running VACUUM at maximum speed.
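
A minimal sketch of what "maximum speed" could look like from the
single-user backend (the settings are real GUCs, but the values and the
table name here are only illustrative):

    SET vacuum_cost_delay = 0;          -- make sure cost-based throttling is off
    SET maintenance_work_mem = '10GB';  -- fewer index-cleanup passes per table
    VACUUM FREEZE VERBOSE some_big_table;

Manual VACUUM is already unthrottled by default (vacuum_cost_delay defaults
to 0), so the first line only matters if that was raised in postgresql.conf.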

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Too many WAL archive files

2013-09-17 Thread Keith Ouellette
Okay, I guess it makes sense that I am at 16GB based on your explanation below. 
I do not have a space constraint, so having 16GB is not the problem. The 
problem is that we have experienced connectivity issues in the past where 
replication would not start up. I would do a "manual" sync using rsync for the 
data directory, which includes the archive directory. I tried to exclude the 
archive directory, but when restarting PostgreSQL I would get "archive file 
missing" errors in the startup log and PostgreSQL would not start. I guess I 
could reduce the number from 1000 (which looks like about 55 days, with an 
average of 18 files being created each day) to a lower number. We picked 1000 
because this is a new application for us and we did not know how fast the WAL 
files would accumulate. If I do make that change, will the excess files be 
deleted automatically, or do I have to delete them manually?



Thanks,

Keith




From: Matheus de Oliveira [matioli.math...@gmail.com]
Sent: Saturday, September 14, 2013 11:55 AM
To: Keith Ouellette
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Too many WAL archive files


On Sat, Sep 14, 2013 at 11:19 AM, Keith Ouellette
<keith.ouelle...@airgas.com> wrote:

My company is using PostgreSQL 9.1 for one of our applications. We have it set 
up replicating between two sites using WAL and Pacemaker to manage the cluster 
and failover. I have noticed that the archive folder is growing very large: 
16GB out of the 19GB total size of the data directory. We have 
wal_keep_segments = 1000 set in the postgresql.conf file.



I thought that meant to keep the last 1000 files in the event that it was 
needed to catch up after falling behind.

No. PostgreSQL will always keep at least wal_keep_segments files (it can be a 
little more) in any situation. So, with wal_keep_segments set to 1000 and each 
WAL file being 16MB, the math says PostgreSQL will use at least 15.625GB 
(~16GB) for WAL files, which is what you are seeing. If you don't have enough 
space for this, you should set wal_keep_segments to a lower value.
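
If you want to sanity-check that arithmetic on the server itself, a quick
query (assuming the default 16MB WAL segment size):

    SELECT setting::int                AS wal_keep_segments,
           setting::int * 16 / 1024.0  AS approx_min_size_gb
    FROM pg_settings
    WHERE name = 'wal_keep_segments';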


I am noticing that I have 12000 files in the archive directory. I know that 
includes the .backup files as well, but that is a smaller portion of the files 
in that directory.

Unless you have a lot of .backup files, there is no way 12000 WAL files would 
use only 16GB; that would be 187.5GB, so this number seems wrong.

Also check whether you have archiving set up and whether it is working, because 
if archive_command fails, PostgreSQL will keep the files that failed to archive 
in the pg_xlog directory and will keep retrying.

Best regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres



Re: [ADMIN] New autovacuum messages in postgres log after upgrade

2013-09-17 Thread Kevin Grittner
Benjamin Krajmalnik  wrote:

> During a maintenance window, we upgraded our systems to Postgres
> 9.0.13 from 9.0.3 running on FreeBSD 8.1 amd64.
> When we restarted the postgres server, I noticed, and continue to
> notice, a recurrence of messages in the log.
> 
> 2013-09-16 21:15:58 MDT LOG:  automatic vacuum of table 
> "ishield.public.tbltmptests": could not (re)acquire exclusive lock for 
> truncate scan
> 2013-09-16 21:15:59 MDT LOG:  automatic vacuum of table 
> "ishield.public.tbltmptests": could not (re)acquire exclusive lock for 
> truncate scan
> 2013-09-16 21:16:00 MDT LOG:  automatic vacuum of table 
> "ishield.public.tbltmptests": could not (re)acquire exclusive lock for 
> truncate scan
> 
> The tables on which I am seeing these messages are essentially
> temporary work tables into which we insert records, which are
> processed by triggers, and then deleted.
> Performance of the system does not seem to have been affected by
> the upgrade.
> Is this simply caused by a higher level of verbosity in the
> autovacuum logging, as a result of the autovacuum fix in 9.0.12?

Yes, table truncation after a large number of deletes is now
smarter, getting more done with less effort and blocking.  This
message, which was useful for developing the fix, made it into
production at the LOG level.  In the next minor release it will be
changed to the DEBUG level to avoid cluttering the log with entries
about routine activities.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


[ADMIN] Script timeout

2013-09-17 Thread Roberto Grandi

Dear all,

I would like to set a script timeout from a .NET application. My problem is how 
to make sure that the timeout gets reset at the end of the script. For example:

SET statement_timeout = 1000;
SELECT pg_sleep(2);

throws an exception.


After that

SELECT pg_sleep(2);

wouldn't throw an exception.


Can you help me to solve this issue?

Many thanks in advance.

BR
Roberto


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread Natalie Wenz
On Sep 17, 2013, at 7:43 AM, Kevin Grittner  wrote:

> Natalie Wenz  wrote:
> 
>> I have a large database from our test environment that got into trouble with
>> some high volume and some long-running queries about…six weeks ago? We have a
>> buffer mechanism that has been storing the new data since the database 
>> stopped
>> accepting connections, so we haven't really lost any data, which is good.
>> But the single-user backend vacuum freeze is still grinding away, using 100% 
>> cpu
>> most of the time, except when it's doing a lot of writes. We did cancel the
>> vacuum once so we could stop the database and take a snapshot of the 
>> filesystem
>> (we are using ZFS on FreeBSD) and copy the snapshot to another machine. This
>> allowed us to get a fresh database started (with 9.3!) where we could unspool
>> the last six weeks of data into a fresh database, and be able to access at 
>> least
>> that much of our data.
> 
> I'm not sure how you could have done that without dealing with the
> wraparound before the upgrade.

Sorry; my description of what is going on was a little unclear. We didn't 
upgrade the existing database. We moved it to different hardware, and just 
created a brand new database to accept the data that had been backing up in 
sqlite files while our original database was offline. I'm still dealing with 
the wraparound on the original, just on a different machine.


> 
>> Now:
>> I have a copy of the database (with data from all time up until the database
>> shut itself down six weeks ago) that I just need the data from. I am becoming
>> impatient with the vacuum, as it appears to have not even started working on 
>> the
>> files for one of the largest relations in the database (that table was about
>> 14TB last I saw). I'm trying to find alternatives to waiting another who
>> knows how many weeks for the vacuum to finish just to have the database in a
>> state where I can dump the data out, since this is no longer the
>> "live" version. This copy running on hardware with plenty of space to
>> work with. The database has about a million transactions before it wraps.
> 
> The copy running on 9.3, or the original?
The brand new database on 9.3 is running great. The original, running on 9.1, 
is the one I'd like to extract data from (it is currently vacuuming). After I 
get the data out of the original, whether I wait for the vacuum to finish or 
not, I'll have to manually import all of the old data into the new database, 
but it was decided that that was a reasonable tradeoff if it meant we could get 
our current data out of the buffer files and get our regular tools and reports 
back online today. We want all of the older data eventually, but it's not 
critical for our day-to-day operation.  

> 
>> Is it possible to somehow just dump the data using pg_dump or pg_dumpall? I
>> haven't tried to see if those utilities will work when the database is
>> protecting itself from data loss. If it were possible, would it be wise (do
>> those utilities use tons of transactions to dump, or would it be safely 
>> within
>> the million or so that I have)? I suppose I could use copy?
> 
> pg_dump uses COPY (by default, anyway), and does all its work in a
> single transaction.  But it needs to start that transaction.

I'll give that a try. It sounds like I technically have enough transactions 
left to copy the data out, so that is good. Hopefully the database will let me 
use that utility. Otherwise I'll just do a copy from the backend. 

> 
>> Would there be any risks if I were to do that? Maybe none of this
>> is risky at this point because we can always clone the original
>> snapshot, and try again.
> 
> I'm not sure that pg_dump followed by restore would be expected to
> be faster than finishing the VACUUM, unless that is configured to
> pace itself way too slowly.

That is actually a great point. I guess I'm anxious to *do* something, but 
maybe it will be so slow I might not gain much. I have been wondering about our 
vacuum configuration, actually. I would absolutely welcome feedback in this 
area: both on how a database like this should be tuned in general, and on 
whether there is any tuning to change when an emergency backend vacuum has to 
be run, in the interest of the vacuum running as aggressively as possible.


 version | PostgreSQL 9.1.9 on 
x86_64-unknown-freebsd9.1, compiled by gcc (GCC) 4.2.1 20070831 patched 
[FreeBSD], 64-bit
 autovacuum  | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_freeze_max_age   | 8
 autovacuum_max_workers  | 3
 autovacuum_vacuum_cost_delay| 0
 autovacuum_vacuum_scale_factor  | 0.1
 checkpoint_segments | 128
 effective_cache_size| 12GB
 listen_addresses| *
 log_autovacuum_min_duration | 10s
 log_destination | stderr
 log_filename| logfile-%A.log
 log_line_prefix | %t:%u:%r:[%p]: 

Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread Natalie Wenz
No…  the shared_buffers value is just a legacy value that never got changed 
(the shmmax value in sysctl is still 1073741824).  When I set up the new 
database, I set the shared_buffers to 25% of system memory, so 12GB. (And since 
the new database is on 9.3, I didn't have to adjust the sysctl value for 
shmmax! Happy day!)

We used to have maintenance_work_mem set to something smaller, but we bumped 
that up after…… the last time this database shut itself down to avoid 
wraparound in March 2012. We were hoping that would help speed the recovery at 
that time. Not sure if it did, but we left it that way afterward anyway.


On Sep 17, 2013, at 2:02 PM, bricklen  wrote:

> On Tue, Sep 17, 2013 at 9:48 AM, Natalie Wenz  wrote:
>  maintenance_work_mem| 10GB
>  shared_buffers  | 128MB
> 
> maintenance_work_mem seems pretty high, and shared_buffers seems really low.  
> Out of curiosity, were those set as a product of internal testing which 
> determined those were effective settings?
> 



Re: [ADMIN] Script timeout

2013-09-17 Thread Roberto Grandi

Hi,

I think your suggestion is


SELECT set_config('statement_timeout','1000 s',false); -- enable timeout

--
-- DO SQL Statements here
--

SELECT set_config('statement_timeout','0',false); -- disable timeout


Am I correct? Moreover, what happens if another query runs on the same 
connection pool?
I would like to limit the effect to this specific query only.
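
One way to keep the timeout from leaking to other queries on a pooled
connection (just a sketch; not tested against a .NET driver) is to scope it
to a single transaction, so it reverts automatically at COMMIT or ROLLBACK:

    BEGIN;
    SET LOCAL statement_timeout = '1000ms';
    -- equivalently: SELECT set_config('statement_timeout', '1000ms', true);
    SELECT pg_sleep(2);   -- subject to the timeout; an error aborts the transaction
    COMMIT;

    SELECT pg_sleep(2);   -- outside the transaction, the default timeout applies again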


Roberto



- Original Message -
From: "Federico" 
To: "Roberto Grandi" 
Cc: pgsql-admin@postgresql.org
Sent: Tuesday, September 17, 2013 6:37:09 PM
Subject: Re: [ADMIN] Script timeout

This should work.
SELECT set_config('statement_timeout','1000 s',false);

The set_config function is quite flexible, as it can accept dynamic values.
More info here
http://www.postgresql.org/docs/current/static/functions-admin.html

I've used it successfully to change the script timeout for each statement in
my maintenance scripts, using a temporary table to store the deadline.

Kind Regards
Federico


On 17 September 2013 17:23, Roberto Grandi wrote:

>
> Dear all,
>
> I would like to set a script timeout from a .NET application. My problem is
> how to make sure that the timeout gets reset at the end of the script. For
> example:
>
> SET statement_timeout = 1000;
> SELECT pg_sleep(2);
>
> throws an exception.
>
>
> After that
>
> SELECT pg_sleep(2);
>
> wouldn't throw an exception.
>
>
> Can you help me to solve this issue?
>
> Many thanks in advance.
>
> BR
> Roberto
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>



-- 
Federico Campoli
DE MATERIALIZING, UK, Planet Earth, The Milky Way Galaxy
/***
There's no point being grown-up if you can't be childish sometimes.
(The fourth Doctor)
http://www.pgdba.co.uk
***/


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread Natalie Wenz
It occurs to me that in asking for feedback on the tuning, I am asking about 
two separate things:

Was there anything in the tuning below that contributed to the database getting 
into trouble? And is there anything I should change in that tuning to make the 
single-user vacuum as fast as it can be for optimal recovery time?

>  version | PostgreSQL 9.1.9 on 
> x86_64-unknown-freebsd9.1, compiled by gcc (GCC) 4.2.1 20070831 patched 
> [FreeBSD], 64-bit
>  autovacuum  | on
>  autovacuum_analyze_scale_factor | 0.1
>  autovacuum_freeze_max_age   | 8
>  autovacuum_max_workers  | 3
>  autovacuum_vacuum_cost_delay| 0
>  autovacuum_vacuum_scale_factor  | 0.1
>  checkpoint_segments | 128
>  effective_cache_size| 12GB
>  listen_addresses| *
>  log_autovacuum_min_duration | 10s
>  log_destination | stderr
>  log_filename| logfile-%A.log
>  log_line_prefix | %t:%u:%r:[%p]: 
>  log_rotation_age| 1d
>  log_rotation_size   | 1GB
>  log_truncate_on_rotation| on
>  logging_collector   | on
>  maintenance_work_mem| 10GB
>  max_connections | 500
>  max_stack_depth | 2MB
>  random_page_cost| 1
>  seq_page_cost   | 1
>  shared_buffers  | 128MB
>  synchronous_commit  | off
>  temp_buffers| 128MB
>  TimeZone| US/Central
>  vacuum_cost_limit   | 500
>  wal_buffers | 32MB
>  work_mem| 256MB
> 
> This is the tuning of the original database: everything that was changed from 
> default settings. The machine it was running on had 48GB of memory. The 
> database was 36TB, with 2 tables taking up the bulk of that (about 14TB 
> each), and about 10 other tables and a few large indexes making up the rest. 
> Our typical usage pattern is mostly inserts, with some hourly summaries 
> (which take maybe 5 minutes), some daily summaries (which take about 20-40 
> minutes), and a couple of end of month queries that take several hours. We 
> have the same setup and tuning in production, which is about the same size, 
> with an additional end of month query that runs off one of the 14TB tables, 
> which can take 4-7 days. 
> 

As far as ideal tuning for the new database, running on 9.3, which will 
eventually hold all the data from the sad, recovering original database with 
the usage patterns described below, how is this for a starting point? I tried 
to follow the basic guidelines in the High Performance book, but sometimes I 
feel like I'm largely guessing.

              name               |                 current_setting
---------------------------------+------------------------------------------
 version | PostgreSQL 9.3.0 on 
x86_64-unknown-freebsd9.1, compiled by gcc (GCC) 4.2.1 20070831 patched 
[FreeBSD], 64-bit
 autovacuum  | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_freeze_max_age   | 8
 autovacuum_max_workers  | 3
 autovacuum_vacuum_cost_delay| 0
 autovacuum_vacuum_scale_factor  | 0.1
 checkpoint_segments | 128
 effective_cache_size| 12GB
 lc_collate  | C
 lc_ctype| C
 listen_addresses| *
 log_autovacuum_min_duration | 1min
 log_destination | stderr
 log_filename| logfile-%A.log
 log_line_prefix | %t:%u:%r:[%p]: 
 log_min_duration_statement  | 1min
 log_rotation_age| 1d
 log_rotation_size   | 1GB
 log_truncate_on_rotation| on
 logging_collector   | on
 maintenance_work_mem| 4GB
 max_connections | 500
 max_stack_depth | 2MB
 random_page_cost| 1
 seq_page_cost   | 1
 shared_buffers  | 12GB
 synchronous_commit  | off
 temp_buffers| 128MB
 TimeZone| US/Central
 vacuum_cost_limit   | 500
 wal_buffers | 16MB
 work_mem| 256MB






Re: [ADMIN] Script timeout

2013-09-17 Thread Federico
This should work.
SELECT set_config('statement_timeout','1000 s',false);

The set_config function is quite flexible, as it can accept dynamic values.
More info here
http://www.postgresql.org/docs/current/static/functions-admin.html

I've used it successfully to change the script timeout for each statement in
my maintenance scripts, using a temporary table to store the deadline.
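
As a rough illustration of that pattern (the table and column names here
are invented), the deadline can be pulled from a temporary table and fed to
set_config as a dynamic value:

    CREATE TEMP TABLE script_deadline (timeout_ms integer);
    INSERT INTO script_deadline VALUES (60000);

    SELECT set_config('statement_timeout',
                      (SELECT timeout_ms FROM script_deadline)::text,
                      false);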

Kind Regards
Federico


On 17 September 2013 17:23, Roberto Grandi wrote:

>
> Dear all,
>
> I would like to set a script timeout from a .NET application. My problem is
> how to make sure that the timeout gets reset at the end of the script. For
> example:
>
> SET statement_timeout = 1000;
> SELECT pg_sleep(2);
>
> throws an exception.
>
>
> After that
>
> SELECT pg_sleep(2);
>
> wouldn't throw an exception.
>
>
> Can you help me to solve this issue?
>
> Many thanks in advance.
>
> BR
> Roberto
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>



-- 
Federico Campoli
DE MATERIALIZING, UK, Planet Earth, The Milky Way Galaxy
/***
There's no point being grown-up if you can't be childish sometimes.
(The fourth Doctor)
http://www.pgdba.co.uk
***/


Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread Kevin Grittner
Natalie Wenz  wrote:

> Sorry; my description of what is going on was a little unclear.
> We didn't upgrade the existing database. We moved it to different
> hardware, and just created a brand new database to accept the
> data that had been backing up in sqlite files while our original
> database was offline. I'm still dealing with the wraparound on
> the original, just on a different machine.

OK, to restate to be sure I understand, the original database is
being vacuumed in a new location, and a new 9.3 database in the
original location has absorbed the queued data?  Once you complete
the vacuum, you will copy the old data back to the new database at
the old location?

> autovacuum_freeze_max_age   | 8

> Normally the autovacuum seem to keep up sufficiently. We got into
> trouble with a bad combination of not-typical long-running
> queries, a disk failure and subsequent zpool repair, and the only
> person who checks the log files regularly (me) was out of the
> office. This has been so painful and slow to recover from, I
> don't think we'll ever get into this mess again. (At least not
> quite like this. I seem to have a knack for finding *new* ways to
> break things.)

A perfect storm of events, eh?  It's hard to have things always go
smoothly in the face of such events, but I see a couple things you
might want to consider.  Increasing autovacuum_freeze_max_age
reduces the amount of time you have to get back on track.  You
might want to take that back down to the default.  There was a bug
causing wraparound prevention autovacuums to trigger too
frequently, which is now fixed in the latest minor releases, so
making that change might not be as painful as you expect.  Make
sure you are monitoring for long-running transactions, so you don't
get burned by one that is accidental.  Also, if you have a
processing cycle where there are off-peak hours on a daily or
weekly basis, you might want to run a VACUUM ANALYZE command
during those windows, to get some of the freezing done before it
is critical.
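
A couple of example monitoring queries along those lines (the column names
are for 9.2+; on 9.1 pg_stat_activity exposes procpid and current_query
instead of pid and query):

    -- transactions that have been open the longest
    SELECT pid, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 10;

    -- how close each database is to wraparound territory
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;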

> I will also take this opportunity to mention again that if anyone
> is considering a making a patch for 64-bit xids, you would make
> at least one small group of people very, very happy.  :)

While 64-bit xids aren't likely, there is some work taking a more
creative approach to the issue, which might make you even happier in
a year or so.  :-)

> Side question: some of the longer queries that were running when
> the database yakked were deletes of old data that had been
> manually migrated to a new table with a more appropriate format.
> We were running out of disk space, so we were trying to clear up
> some space by removing data we now had in two places. Does a
> delete of previously-frozen rows unfreeze them, or anything like
> that? Because in a series of maybe a dozen queries or so, we
> deleted billions of rows. Does that generate a significant amount
> of extra work for the autovacuumer? 

The pages which had tuples deleted would need to be cleaned up by
vacuum, and rewritten.  It would also remove all index entries for
all deleted rows.  It might also scan backward from the end of the
table to release space to the OS.  That could conceivably be enough
avoidable work to make your idea of copying out the remaining data
feasible.  What percentage of the rows were deleted?  Could your
copy out be to a separate set of drives?

If you know at the time a row is added what group it will be in for
deletion, it might pay to move to partitioning, so that a group of
rows could be deleted pretty much as fast as you can drop a table.
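
On 9.1 or 9.3 that would be the inheritance-based flavor of partitioning; a
minimal sketch, with table and column names invented for illustration:

    CREATE TABLE events (ts timestamptz NOT NULL, payload text);

    CREATE TABLE events_2013_09
        (CHECK (ts >= '2013-09-01' AND ts < '2013-10-01'))
        INHERITS (events);
    -- one child per month, with inserts routed to the current child by a
    -- trigger or directly by the application

    -- retiring a month of rows is then a metadata operation that leaves no
    -- dead tuples for vacuum to clean up:
    DROP TABLE events_2013_08;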

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] Dumping a database that is not accepting commands?

2013-09-17 Thread bricklen
On Tue, Sep 17, 2013 at 9:48 AM, Natalie Wenz wrote:

>  maintenance_work_mem| 10GB
>  shared_buffers  | 128MB
>

maintenance_work_mem seems pretty high, and shared_buffers seems really
low.  Out of curiosity, were those set as a product of internal testing
which determined those were effective settings?