Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-16 Thread Bruce McAlister
Hi All,

Okay, I'm getting a little further now. I'm about to create entries in the 
pg_autovacuum system tables. However, I'm a little confused as to how I go 
about finding out the OID value of the tables. The pg_autovacuum table 
requires the OID of the table you want to create settings for (vacrelid). 
Can anyone shed some light on how I can extract the OID of the table? Also, 
what happens if you create a table without OID's, are you still able to add 
it's details in the pg_autovacuum table if there is no OID associated with a 
table?

  Name Type References Description
  vacrelid oid pg_class.oid The table this entry is for
  enabled bool   If false, this table is never autovacuumed
  vac_base_thresh integer   Minimum number of modified tuples before 
vacuum
  vac_scale_factor float4   Multiplier for reltuples to add to 
vac_base_thresh
  anl_base_thresh integer   Minimum number of modified tuples before 
analyze
  anl_scale_factor float4   Multiplier for reltuples to add to 
anl_base_thresh
  vac_cost_delay integer   Custom vacuum_cost_delay parameter
  vac_cost_limit integer   Custom vacuum_cost_limit parameter
  freeze_min_age integer   Custom vacuum_freeze_min_age parameter
  freeze_max_age integer   Custom autovacuum_freeze_max_age parameter


Thanks
Bruce


Bruce McAlister [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Hi Tom,

 Thanks for the suggestion. It's been a while since I replied to this as I 
 had to go and do some further investigation of the docs with regards the 
 autovacuum daemons configuration. According to the documentation, the 
 formula's for the vacuum and analyze are as follows:

 Vacuum
vacuum threshold = vacuum base threshold + vacuum scale factor * number 
 of tuples
 Analyze
analyze threshold = analyze base threshold + analyze scale factor * 
 number of tuples

 My current settings for autovacuum are as follows:

 # - Cost-Based Vacuum Delay -

 vacuum_cost_delay = 200 # 0-1000 milliseconds
 vacuum_cost_page_hit= 1 # 0-1 credits
 vacuum_cost_page_miss = 10   # 0-1 credits
 vacuum_cost_page_dirty = 20   # 0-1 credits
 vacuum_cost_limit   = 200 # 0-1 credits

 #---
 # AUTOVACUUM PARAMETERS
 #---

 autovacuum   = on  # 
 enable autovacuum subprocess?
 
 # 'on' requires stats_start_collector
 
 # and stats_row_level to also be on
 autovacuum_naptime = 1min   # time 
 between autovacuum runs
 autovacuum_vacuum_threshold = 500 # min # of tuple 
 updates before
 
 # vacuum
 autovacuum_analyze_threshold  = 250# min # of 
 tuple updates before
 
 # analyze
 autovacuum_vacuum_scale_factor = 0.2 # fraction of rel 
 size before
 
 # vacuum
 autovacuum_analyze_scale_factor = 0.1 # fraction of 
 rel size before
 
 # analyze
 autovacuum_freeze_max_age   = 2 # maximum XID age 
 before forced vacuum
 
 # (change requires restart)
 autovacuum_vacuum_cost_delay  = -1   # default vacuum 
 cost delay for
 
 # autovacuum, -1 means use
 
 # vacuum_cost_delay
 autovacuum_vacuum_cost_limit= -1   # default 
 vacuum cost limit for
 
 # autovacuum, -1 means use
 
 # vacuum_cost_limit

 Thus to make the autovacuum more aggressive I am thinking along the lines 
 of changing the following parameters:

 autovacuum_vacuum_threshold = 250
 autovacuum_analyze_threshold = 125

 The documentation also mentions that when the autovacuum runs it selects a 
 single database to process on that run. This means that the particular 
 table that we are interrested in will only be vacuumed once every 17 
 minutes, assuming we have 18 databases and the selection process is 
 sequential through the database list.

 From my understanding of the documentation, the only way to work around 
 this issue is to manually update the system catalog table pg_autovacuum 
 and set the pg_autovacuum.enabled field to false to skip the autovacuum on 
 tables that dont require such frequent vacuums. If I do enable this 
 feature, and I manually run a vacuumdb from the command line against that 
 particular disabled table, will the vacuum still process the table? I'm 
 assuming too, that the best tables to disable autovacuum on will be ones 
 with a minimal amount of update/delete queries run against it. For 
 example, if we have a table that only has inserts applied to it, it is 
 safe to assume that that table can safely be ignored by autovacuum.

 Do you have any other suggestions as to which tables generally can be 
 excluded from the autovacuum based on the usage 

Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-13 Thread Anton Melser

In fact, getting rid of vacuum full, or changing it to work like
cluster, has been proposed in the past. The use case really is pretty
narrow; cluster is a lot faster if there's a lot of unused space in the
table, and if there's not, vacuum full isn't going to do much so there's
not much point running it in the first place. The reason it exists is
largely historical, there hasn't been a pressing reason to remove it either.


I can assure you it is a great way to get back gigabytes when someone
has put no vacuum strategy in place and your 200K row table (with
about 200 bytes per row) is taking up 1.7gig!!!
Vive le truncate table, and vive le vacuum full!
:-)
Anton

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-09 Thread Bruce McAlister
Hi Tom,

Thanks for the suggestion. It's been a while since I replied to this as I 
had to go and do some further investigation of the docs with regards the 
autovacuum daemons configuration. According to the documentation, the 
formula's for the vacuum and analyze are as follows:

Vacuum
vacuum threshold = vacuum base threshold + vacuum scale factor * number 
of tuples
Analyze
analyze threshold = analyze base threshold + analyze scale factor * 
number of tuples

My current settings for autovacuum are as follows:

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 200 # 0-1000 milliseconds
vacuum_cost_page_hit= 1 # 0-1 credits
vacuum_cost_page_miss = 10   # 0-1 credits
vacuum_cost_page_dirty = 20   # 0-1 credits
vacuum_cost_limit   = 200 # 0-1 credits

#---
# AUTOVACUUM PARAMETERS
#---

autovacuum   = on  # 
enable autovacuum subprocess?
 
  # 'on' requires stats_start_collector
 
  # and stats_row_level to also be on
autovacuum_naptime = 1min   # time 
between autovacuum runs
autovacuum_vacuum_threshold = 500 # min # of tuple 
updates before
 
  # vacuum
autovacuum_analyze_threshold  = 250# min # of tuple 
updates before
 
  # analyze
autovacuum_vacuum_scale_factor = 0.2 # fraction of rel 
size before
 
  # vacuum
autovacuum_analyze_scale_factor = 0.1 # fraction of rel 
size before
 
  # analyze
autovacuum_freeze_max_age   = 2 # maximum XID age before 
forced vacuum
 
  # (change requires restart)
autovacuum_vacuum_cost_delay  = -1   # default vacuum 
cost delay for
 
  # autovacuum, -1 means use
 
  # vacuum_cost_delay
autovacuum_vacuum_cost_limit= -1   # default vacuum 
cost limit for
 
 # autovacuum, -1 means use
 
 # vacuum_cost_limit

Thus to make the autovacuum more aggressive I am thinking along the lines of 
changing the following parameters:

autovacuum_vacuum_threshold = 250
autovacuum_analyze_threshold = 125

The documentation also mentions that when the autovacuum runs it selects a 
single database to process on that run. This means that the particular table 
that we are interrested in will only be vacuumed once every 17 minutes, 
assuming we have 18 databases and the selection process is sequential 
through the database list.

From my understanding of the documentation, the only way to work around this 
issue is to manually update the system catalog table pg_autovacuum and set 
the pg_autovacuum.enabled field to false to skip the autovacuum on tables 
that dont require such frequent vacuums. If I do enable this feature, and I 
manually run a vacuumdb from the command line against that particular 
disabled table, will the vacuum still process the table? I'm assuming too, 
that the best tables to disable autovacuum on will be ones with a minimal 
amount of update/delete queries run against it. For example, if we have a 
table that only has inserts applied to it, it is safe to assume that that 
table can safely be ignored by autovacuum.

Do you have any other suggestions as to which tables generally can be 
excluded from the autovacuum based on the usage patterns?
Can you see anything with respect to my new autovacuum parameters that may 
cause issue's and are there any other parameters that you suggest I need to 
change to make the autovacuum daemon more aggressive?

PS: Currently we have the Cluster command running on the sipaccounts table 
as the vacuum full is taking too long. It would be nice though to have some 
piece of mind that the cluster command is mvcc safe, as Heikki and Aidan 
have mentioned that it is not and may break things in our environment, I'm a 
little afraid of running with the cluster command, and 

Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Heikki Linnakangas

Bruce McAlister wrote:
Over time we have noticed increased response times from the database which 
has an adverse affect on our registration times. After doing some research 
it appears that this may have been related to our maintenance regime, and 
has thus been amended as follows:



[1] AutoVacuum runs during the day over the entire PostgreSQL cluster,

[2] A Vacuum Full Verbose is run during our least busy period (generally 
03:30) against the Database,


[3] A Re-Index on the table is performed,

[4] A Cluster on the table is performed against the most used index,

[5] A Vacuum Analyze Verbose is run against the database.


These maintenance steps have been setup to run every 24 hours.


The database in essence, once loaded up and re-index is generally around 
17MB for data and 4.7MB for indexes in size.



Over a period of 24 hours the database can grow up to around 250MB and the 
indexes around 33MB (Worst case thus far). When the maintenance kicks in, 
the vacuum full verbose step can take up to 15 minutes to complete (worst 
case). The re-index, cluster and vacuum analyze verbose steps complete in 
under 1 second each. The problem here is the vacuum full verbose, which 
renders the environment unusable during the vacuum phase. The idea here is 
to try and get the vacuum full verbose step to complete in less than a 
minute. Ideally, if we could get it to complete quicker then that would be 
GREAT, but our minimal requirement is for it to complete at the very most 1 
minute. Looking at the specifications of our environment below, do you think 
that this is at all possible?


250MB+33MB isn't very much. It should easily fit in memory, I don't see 
why you need the 12 disk RAID array. Are you sure you got the numbers right?


Vacuum full is most likely a waste of time. Especially on the tables 
that you cluster later, cluster will rewrite the whole table and indexes 
anyway. A regular normal vacuum should be enough to keep the table in 
shape. A reindex is also not usually necessary, and for the tables that 
you cluster, it's a waste of time like vacuum full.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Bruce McAlister
Hi Heikki,

Thanks for the reply.

The RAID array was implemented due to a projected growth pattern which 
incorporate all 18 of our databases. The sizings I mentioned only refer to 1 
of those databases, which, is also the most heavily used database :)

If I understand you correctly, we could in essence change our maintenance 
routine to the follwing:

[1] Cluster on most used index
[2] Perform a vacuum analyze on the table

If I read your post correctly, this will regenerate the index that the 
cluster is performed on (1 of 3) and also re-generate the table in the 
sequence of that index?

If that is the case, why would anyone use the vacuum full approach if they 
could use the cluster command on a table/database that will regen these 
files for you. It almost seems like the vacuum full approach would, or 
could, be obsoleted by the cluster command, especially if the timings in 
their respective runs are that different (in our case the vacuum full took 
15 minutes in our worst case, and the cluster command took under 1 second 
for the same table and scenario).

The output of our script for that specific run is as follows (just in-case 
i'm missing something):

Checking disk usage before maintenance on service (sipaccounts) at 
02-Mar-2007 03:30:00

258M /database/pgsql/bf_service/data
33M /database/pgsql/bf_service/index

Completed checking disk usage before maintenance on service (sipaccounts) at 
02-Mar-2007 03:30:00

Starting VACUUM FULL VERBOSE on service (sipaccounts) at 02-Mar-2007 
03:30:00

INFO: vacuuming public.sipaccounts
INFO: sipaccounts: found 71759 removable, 9314 nonremovable row versions 
in 30324 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 318 to 540 bytes long.
There were 439069 unused item pointers.
Total free space (including removable row versions) is 241845076 bytes.
28731 pages are or will become empty, including 41 at the end of the table.
30274 pages containing 241510688 free bytes are potential move destinations.
CPU 0.00s/0.05u sec elapsed 31.70 sec.
INFO: index sippeers_name_key now contains 9314 row versions in 69 pages
DETAIL: 7265 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 1.52 sec.
INFO: index sippeers_pkey now contains 9314 row versions in 135 pages
DETAIL: 7161 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 3.07 sec.
INFO: index accountcode_index now contains 9314 row versions in 3347 pages
DETAIL: 71759 index row versions were removed.
1151 index pages have been deleted, 1151 are currently reusable.
CPU 0.02s/0.08u sec elapsed 56.31 sec.
INFO: sipaccounts: moved 3395 row versions, truncated 30324 to 492 pages
DETAIL: CPU 0.03s/0.56u sec elapsed 751.99 sec.
INFO: index sippeers_name_key now contains 9314 row versions in 69 pages
DETAIL: 3395 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.21 sec.
INFO: index sippeers_pkey now contains 9314 row versions in 135 pages
DETAIL: 3395 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index accountcode_index now contains 9314 row versions in 3347 pages
DETAIL: 3395 index row versions were removed.
1159 index pages have been deleted, 1159 are currently reusable.
CPU 0.01s/0.01u sec elapsed 30.03 sec.
INFO: vacuuming pg_toast.pg_toast_2384131
INFO: pg_toast_2384131: found 0 removable, 0 nonremovable row versions in 
0 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index pg_toast_2384131_index now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Completed VACUUM FULL VERBOSE on service (sipaccounts) at 02-Mar-2007 
03:44:35
Starting REINDEX on service (sipaccounts) at 02-Mar-2007 03:44:35

REINDEX

Completed REINDEX on service (sipaccounts) at 02-Mar-2007 03:44:35
Starting CLUSTER on service (sipaccounts) at 02-Mar-2007 03:44:35

CLUSTER sipaccounts;
CLUSTER

Completed CLUSTER on service (sipaccounts) at 02-Mar-2007 03:44:36
Starting VACUUM ANALYZE VERBOSE on service (sipaccounts) at 02-Mar-2007 
03:44:36

INFO: vacuuming public.sipaccounts
INFO: scanned index sippeers_name_key to remove 9 row versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: scanned index sippeers_pkey to remove 9 row versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.20 sec.
INFO: scanned index accountcode_index to remove 9 row versions
DETAIL: CPU 0.00s/0.00u sec 

Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Tom Lane
Bruce McAlister [EMAIL PROTECTED] writes:
 [1] AutoVacuum runs during the day over the entire PostgreSQL cluster,

Good, but evidently you need to make it more aggressive.

 [2] A Vacuum Full Verbose is run during our least busy period (generally 
 03:30) against the Database,

 [3] A Re-Index on the table is performed,

 [4] A Cluster on the table is performed against the most used index,

 [5] A Vacuum Analyze Verbose is run against the database.

That is enormous overkill.  Steps 2 and 3 are a 100% waste of time if
you are going to cluster in step 4.  Just do the CLUSTER and then
ANALYZE (or VACUUM ANALYZE if you really must, but the value is marginal).

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Heikki Linnakangas

Aidan Van Dyk wrote:

* Heikki Linnakangas [EMAIL PROTECTED] [070305 09:46]:
In fact, getting rid of vacuum full, or changing it to work like 
cluster, has been proposed in the past. The use case really is pretty 
narrow; cluster is a lot faster if there's a lot of unused space in the 
table, and if there's not, vacuum full isn't going to do much so there's 
not much point running it in the first place. The reason it exists is 
largely historical, there hasn't been a pressing reason to remove it either.


I've never used CLUSTER, because I've always heard murmerings of it not
being completely MVCC safe.  From the TODO:
* CLUSTER
o Make CLUSTER preserve recently-dead tuples per MVCC
  requirements


Good point, I didn't remember that. Using cluster in an environment like 
the OP has, cluster might actually break the consistency of concurrent 
transactions.



But the documents don't mention anything about cluster being unsafe.


Really? checks docs. Looks like you're right. Should definitely be 
mentioned in the docs.



AFAIK, Vacuum full doesn't suffer the same MVCC issues that cluster
does.  Is this correct?


That's right. Vacuum full goes to great lengths to be MVCC-safe.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL 8.2.3 VACUUM Timings/Performance

2007-03-05 Thread Aidan Van Dyk
* Heikki Linnakangas [EMAIL PROTECTED] [070305 09:46]:

 If that is the case, why would anyone use the vacuum full approach if they 
 could use the cluster command on a table/database that will regen these 
 files for you. It almost seems like the vacuum full approach would, or 
 could, be obsoleted by the cluster command, especially if the timings in 
 their respective runs are that different (in our case the vacuum full took 
 15 minutes in our worst case, and the cluster command took under 1 second 
 for the same table and scenario).
 
 In fact, getting rid of vacuum full, or changing it to work like 
 cluster, has been proposed in the past. The use case really is pretty 
 narrow; cluster is a lot faster if there's a lot of unused space in the 
 table, and if there's not, vacuum full isn't going to do much so there's 
 not much point running it in the first place. The reason it exists is 
 largely historical, there hasn't been a pressing reason to remove it either.

I've never used CLUSTER, because I've always heard murmerings of it not
being completely MVCC safe.  From the TODO:
* CLUSTER
o Make CLUSTER preserve recently-dead tuples per MVCC
  requirements
But the documents don't mention anything about cluster being unsafe.

AFAIK, Vacuum full doesn't suffer the same MVCC issues that cluster
does.  Is this correct?

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature