Re: [ADMIN] wrong database name in error message?

2013-09-16 Thread Kevin Grittner
Rural Hunter ruralhun...@gmail.com wrote:

 This was changed quite long time ago when I saw too frequent auto
 vacuums to prevent the wrap-around on a very busy/large table
 which slow down the performance. I will change it back to the
 default to see how it works.

There was a long-standing bug which could cause over-frequent
wraparound prevention autovacuums.  As long as you are on the
latest minor release, things should be much better now.

 I will try the parameters as you suggested too.

Possibly.  As I said before, I think the symptoms might better fit
a situation where the table in need of VACUUM was a shared table
and it just happened to mention db1 because that was the database
it was scanning at the time.  (Every database includes the shared
system tables in its catalog.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-16 Thread Rural Hunter

于 2013/9/17 0:02, Kevin Grittner 写道:
Possibly. As I said before, I think the symptoms might better fit a 
situation where the table in need of VACUUM was a shared table and it 
just happened to mention db1 because that was the database it was 
scanning at the time. (Every database includes the shared system 
tables in its catalog.)

OK, that sounds reasonable. thanks a lot for your advice and explanation.



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-15 Thread Kevin Grittner
Rural Hunter ruralhun...@gmail.com wrote:

 I'm on Ubuntu 12.04.1 64bit with 32 cores and 377G memory. The
 data is stored on several rai10 SAS 15k disks.

With a machine that beefy I have found it necessary to make the
autovacuum settings more aggressive.  Otherwise the need for
vacuuming can outpace the ability of autovacuum to keep up.

   autovacuum_freeze_max_age  | 20 | configuration file

   vacuum_freeze_table_age    | 10 | configuration file

There's your problem.  You left so little space between when
autovacuum would kick in for wraparound prevention (2 billion
transactions) and when the server prevents new transactions in
order to protect your data (2 ^ 31 - 100 transactions) that
autovacuum didn't have enough time to complete its effort to do so.

Changing a setting to ten times its default value is something
which should always be approached with caution.  In this case you
changed the threshold for starting the work to prevent data loss
from a little under 10% of the distance to the disastrous condition
to a little under 100% of that distance.

You could play with non-standard setting for these, but if you go
anywhere near this extreme you risk downtime like you have just
experienced.  Personally, I have never had a reason to change these
from the defaults.

To ensure that autovacuum can keep up with the activity on a
machine like this, I have generally gone to something like:

autovacuum_cost_limit = 800

If you have more than a couple large tables which take long enough
to scan to prevent small, frequently-updated tables from getting
attention soon enough, you might want to boost
autovacuum_max_workers, too.
 
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-15 Thread Rural Hunter

于 2013/9/16 1:31, Kevin Grittner 写道:

There's your problem.  You left so little space between when
autovacuum would kick in for wraparound prevention (2 billion
transactions) and when the server prevents new transactions in
order to protect your data (2 ^ 31 - 100 transactions) that
autovacuum didn't have enough time to complete its effort to do so.

Changing a setting to ten times its default value is something
which should always be approached with caution.  In this case you
changed the threshold for starting the work to prevent data loss
from a little under 10% of the distance to the disastrous condition
to a little under 100% of that distance.

You could play with non-standard setting for these, but if you go
anywhere near this extreme you risk downtime like you have just
experienced.  Personally, I have never had a reason to change these
from the defaults.
OK, thanks for pointing out the problem. This was changed quite long 
time ago when I saw too frequent auto vacuums to prevent the wrap-around 
on a very busy/large table which slow down the performance. I will 
change it back to the default to see how it works.


To ensure that autovacuum can keep up with the activity on a
machine like this, I have generally gone to something like:

autovacuum_cost_limit = 800

If you have more than a couple large tables which take long enough
to scan to prevent small, frequently-updated tables from getting
attention soon enough, you might want to boost
autovacuum_max_workers, too.
  

I will try the parameters as you suggested too.

So you guys still think the problem is on db1(that's my main db) as the 
error message stated? Just auto-vacuum on db1 kicked off somehow and 
fixed the problem when I was running vacuum on other dbs?



--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-14 Thread Kevin Grittner
Rural Hunter ruralhun...@gmail.com wrote:

 Let me put here the whole scenario:
 1. I was called by our application users that all the updating was
 failing. So I went to check the db. Any update transaction including
 manual vacuum is blocked out by the error message:
 ERROR: database is not accepting commands to avoid wraparound data loss
 in database db1
   Suggestion:Stop the postmaster and use a standalone backend to
 vacuum that database.

 2. Since db1 is a very large database(it is the main db the user is
 using) I can not afford to take long time to vacuum full on that. So I
 thought about to try on other small dbs first.

Why in the world would you want to use VACUUM FULL in this circumstance?

 3. I stop the instance.

 4. I use echo 'vacuum full;' | postgres --single -D /pgdata [other
 dbs] to vacuum some other dbs. I still got several warning messages
 when vacuum the first database(let's say db2):
 2013-09-14 08:56:44 CST [5536]: [453-1] user=,db=,host= WARNING:
 database db1 must be vacuumed within 999775 transactions
 2013-09-14 08:56:44 CST [5536]: [454-1] user=,db=,host= HINT:  To avoid
 a database shutdown, execute a database-wide VACUUM in that database.

WARNING, not error, so the VACUUM would have run.

 Here the error message still points to db1.

I'm not sure which database would be referenced if the table which
needed the VACUUM was a shared table, like pg_database or
pg_authid.

 5.  When I ran the single connection vacuum on other dbs(not db1), there
 was not any error/warning message. So I tried to start whole instance.

 6. I started the instance and found everything is fine.

 So actually I have 3 questions here:
 1. Was the db name in the error message wrong?

Probably not, to the extent that running VACUUM (FULL is not
necessary) against that database would have solved the problem.  If
it was a shared catalog table it might be that it was not the
*only* database which would work.

 2. How would that happend? Shouldn't auto vacuum handle it and avoid
 such problem?

There are two possibilities -- either you had a long-running
transaction in the cluster or your autovacuum is not configured to
be aggressive enough to keep you out of trouble.

 3. How to detect such problem earlier?

We would need a description of the machine (cores, RAM, storage
system) and the output of these queries to be able to make good
suggestions on tuning autovacuum:

SELECT version();
SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

In addition, make sure that you are monitoring for long-running
transactions.  A reasonable monitoring scheme might be to alert
when either of these queries returns any rows:

select * from pg_stat_activity where xact_start  (now() - interval '1 hour');
select * from pg_prepared_xacts where prepared  (now() - interval '1 minute');

You can, of course, adjust the intervals to what makes the most
sense for your environment.  If you have max_prepared_transactions
set to zero, the latter query is not really necessary.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-14 Thread Rural Hunter

于 2013/9/15 1:06, Kevin Grittner 写道:

Rural Hunter ruralhun...@gmail.com wrote:

Why in the world would you want to use VACUUM FULL in this circumstance?
the db name in the error message wrong?

I just googled around and found the solution. What's the other option?

There are two possibilities -- either you had a long-running
transaction in the cluster or your autovacuum is not configured to
be aggressive enough to keep you out of trouble.
I checked the running transactions before I stopped the instance. There 
was no long running transaction exception one auto-vacuum running for 
about 1 hour.


We would need a description of the machine (cores, RAM, storage
system) and the output of these queries to be able to make good
suggestions on tuning autovacuum:

SELECT version();
SELECT name, current_setting(name), source
   FROM pg_settings
   WHERE source NOT IN ('default', 'override');
I'm on Ubuntu 12.04.1 64bit with 32 cores and 377G memory. The data is 
stored on several rai10 SAS 15k disks.


postgres=# SELECT version();
version
---
 PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 
4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit

(1 row)
postgres=# SELECT name, current_setting(name), source
postgres-#   FROM pg_settings
postgres-#   WHERE source NOT IN ('default', 'override');
name | 
current_setting   | source

-++--
 application_name| psql.bin | client
 archive_command | test ! -f /dbbk/postgres/logarch/%f.gz 
 gzip -c %p /dbbk/postgres/logarch/%f.gz | configuration file

 archive_mode| on | configuration file
 autovacuum  | on | configuration file
 autovacuum_freeze_max_age   | 20 | configuration file
 checkpoint_segments | 20 | configuration file
 client_encoding | UTF8 | client
 DateStyle   | ISO, YMD | configuration file
 default_text_search_config  | chinesecfg | configuration file
 effective_cache_size| 100GB | configuration file
 full_page_writes| off | configuration file
 lc_messages | zh_CN.utf8 | configuration file
 lc_monetary | zh_CN.utf8 | configuration file
 lc_numeric  | zh_CN.utf8 | configuration file
 lc_time | zh_CN.utf8 | configuration file
 listen_addresses| * | configuration file
 log_autovacuum_min_duration | 30min | configuration file
 log_destination | stderr | configuration file
 log_line_prefix | %t [%p]: [%l-1] 
user=%u,db=%d,host=%h  | 
configuration file

 log_min_duration_statement  | 10s | configuration file
 log_statement   | ddl | configuration file
 log_timezone| PRC | configuration file
 logging_collector   | on | configuration file
 maintenance_work_mem| 20GB | configuration file
 max_connections | 2500 | configuration file
 max_stack_depth | 2MB | environment variable
 max_wal_senders | 1 | configuration file
 port| 3500 | configuration file
 shared_buffers  | 16GB | configuration file
 synchronous_commit  | off | configuration file
 TimeZone| PRC | configuration file
 track_activities| on | configuration file
 track_counts| on | configuration file
 vacuum_freeze_table_age | 10 | configuration file
 wal_buffers | 16MB | configuration file
 wal_level   | hot_standby | configuration file
 work_mem| 8MB | configuration file
(37 rows)


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Re: [ADMIN] wrong database name in error message?

2013-09-14 Thread bricklen
On Sat, Sep 14, 2013 at 6:05 PM, Rural Hunter ruralhun...@gmail.com wrote:

 于 2013/9/15 1:06, Kevin Grittner 写道:

 Rural Hunter ruralhun...@gmail.com wrote:

 Why in the world would you want to use VACUUM FULL in this circumstance?
 the db name in the error message wrong?

 I just googled around and found the solution. What's the other option?


A plain VACUUM will suffice -- there is no need to also use the FULL
keyword in this case (wraparound).


Re: [ADMIN] wrong database name in error message?

2013-09-14 Thread Alvaro Herrera
Rural Hunter escribió:

 2. Since db1 is a very large database(it is the main db the user is
 using) I can not afford to take long time to vacuum full on that. So
 I thought about to try on other small dbs first.
 
 3. I stop the instance.
 
 4. I use echo 'vacuum full;' | postgres --single -D /pgdata [other
 dbs] to vacuum some other dbs.

Two things.  One is you don't need VACUUM FULL, as already pointed out;
plain VACUUM suffices.  The other is that you don't actually need to
vacuum all tables; only those with a very old pg_class.relfrozenxid.
The one with the oldest value is that feeds pg_database.datfrozenxid;
and that's what feeds the must be vacuumed within XY transactions
messages.  So you can just connect to db1, examine pg_class looking for
tables whose age(relfrozenxid) is old, and vacuum only those.  No need
for downtime.

Now the interesting question is why didn't autovacuum get to these.
Normally it does, but when there's conflicting activity (say you have
periodic ALTER TABLE for some reason) it might not be able to.  Check
the log for ERRORs that made autovacuum kill itself, for example.  In
the long run, the best solution is to not have such conflicting activity
in the first place.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin