Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-09 Thread Grzegorz Jaskiewicz


On Mar 9, 2007, at 6:42 AM, Tom Lane wrote:


Alvaro Herrera [EMAIL PROTECTED] writes:

Now regarding your restartable vacuum work.  I think that stopping a
vacuum at some point and being able to restart it later is very cool and
may get you some hot chicks, but I'm not sure it's really useful.


Too true :-(


Yeah.
Wouldn't a 'divide and conquer' kind of approach make it better? I.e., let
vacuum work on some part of a table/db, then stop, pick up another
part later, vacuum it, and so on?
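
To make the idea concrete, here is a rough sketch in Python rather than backend C. It only shows the bookkeeping: split a table's block range into fixed-size parts and handle one part per call, so work can stop and pick up later. The class and function names are made up for illustration, and the place where a real per-range vacuum would run is just a comment.

```python
def block_ranges(total_blocks, chunk_blocks):
    """Yield (start, end) half-open block ranges covering the table."""
    for start in range(0, total_blocks, chunk_blocks):
        yield start, min(start + chunk_blocks, total_blocks)


class ChunkedVacuum:
    """Hypothetical driver that processes one part of a table per step."""

    def __init__(self, total_blocks, chunk_blocks):
        self._ranges = block_ranges(total_blocks, chunk_blocks)
        self.done = []  # parts already processed, in order

    def step(self):
        """Process one part; return False once the table is finished."""
        rng = next(self._ranges, None)
        if rng is None:
            return False
        # Assumption: a real implementation would vacuum blocks rng[0]..rng[1]-1 here.
        self.done.append(rng)
        return True
```

Note that each part would presumably still need its own index cleanup pass, which is the inefficiency raised elsewhere in this thread.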


--
Grzegorz Jaskiewicz
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-09 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Er, why not just finish out the scan at the reduced I/O rate?  Any sort
 of abort behavior is going to create net inefficiency, eg doing an
 index scan to remove only a few tuples.  ISTM that the vacuum ought to
 just continue along its existing path at a slower I/O rate.

I think the main motivation to abort a vacuum scan is so we can switch to some
more urgent scan. So if in the middle of a 1-hour long vacuum of some big
warehouse table we realize that a small hot table is long overdue for a vacuum
we want to be able to remove the tuples we've found so far, switch to the hot
table, and when we don't have more urgent tables to vacuum resume the large
warehouse table vacuum.
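
A minimal sketch of that switching decision, in Python for brevity. The urgency measure (dead tuples divided by a per-table threshold) and all the names are assumptions for illustration, not anything autovacuum does today.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    dead_tuples: int
    threshold: int  # assumed per-table vacuum threshold, must be > 0

    def urgency(self):
        # How far past its threshold the table is; >= 1.0 means overdue.
        return self.dead_tuples / self.threshold


def next_table(tables, current=None):
    """Return the table to work on next; differs from current on preemption."""
    candidates = [t for t in tables if t.urgency() >= 1.0]
    if not candidates:
        return current
    best = max(candidates, key=Table.urgency)
    if current is None or best.urgency() > current.urgency():
        return best
    return current
```

With a warehouse table slightly past its threshold and a hot table ten times past its own, `next_table` picks the hot table; once the hot table is clean it drops out of the candidates and the warehouse table is chosen again.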

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-09 Thread Alvaro Herrera
Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Er, why not just finish out the scan at the reduced I/O rate?  Any sort
  of abort behavior is going to create net inefficiency, eg doing an
  index scan to remove only a few tuples.  ISTM that the vacuum ought to
  just continue along its existing path at a slower I/O rate.
 
 I think the main motivation to abort a vacuum scan is so we can switch to some
 more urgent scan. So if in the middle of a 1-hour long vacuum of some big
 warehouse table we realize that a small hot table is long overdue for a vacuum
 we want to be able to remove the tuples we've found so far, switch to the hot
 table, and when we don't have more urgent tables to vacuum resume the large
 warehouse table vacuum.

Why not just let another autovac worker do the hot table?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

  Is everybody OK with not putting a per-tablespace worker limit?
  Is everybody OK with putting per-database worker limits on a pg_database
  column?
 
 I don't think we need a new pg_database column.  If it's a GUC you can
 do ALTER DATABASE SET, no?  Or was that what you meant?

No, that doesn't work unless we save the datconfig column to the
pg_database flatfile, because it's the launcher (which is not connected
to any database) that needs to read it.  Same thing with a hypothetical
per-database naptime.  The launcher would also need to parse it, which
is not ideal (though not a dealbreaker either).

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Alvaro Herrera
Galy Lee wrote:

Hi,

 Alvaro Herrera wrote:
  I still haven't received the magic bullet to solve the hot table
  problem, but these at least mean we continue doing *something*.
 
 Could you share your plan or ideas for autovacuum improvement in 8.3?
 And what is the roadmap for autovacuum improvement in 8.4?

Things I want to do for 8.3:

- Make use of the launcher/worker stuff, that is, allow multiple
  autovacuum processes in parallel.  With luck we'll find out how to
  deal with hot tables.

Things I'm not sure we'll be able to have in 8.3, in which case I'll get
to them for early 8.4:

- The maintenance window stuff, i.e., being able to throttle workers
  depending on a user-defined schedule.

8.4 material:

- per-tablespace throttling, coordinating IO from multiple workers


I don't have anything else as detailed as a plan.  If you have
suggestions, I'm all ears.

Now regarding your restartable vacuum work.  I think that stopping a
vacuum at some point and being able to restart it later is very cool and
may get you some hot chicks, but I'm not sure it's really useful.  I
think it makes more sense to do something like throttling an ongoing
vacuum to a reduced IO rate, if the maintenance window closes.  So if
you're in the middle of a heap scan and the maintenance window closes,
you immediately stop the scan and go to the index cleanup phase, *at a
reduced IO rate*.  This way, the user will be able to get the benefits
of vacuuming at some not-too-distant future, without requiring the
maintenance window to open again, but without the heavy IO impact that
was allowed during the maintenance window.

Does this make sense?
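
To illustrate the throttling part only, here is a small Python sketch that picks the sleep between I/O batches depending on whether we are inside the maintenance window. The function names and the delay values are assumptions for illustration; the window is allowed to wrap past midnight.

```python
from datetime import time


def in_window(now, start, end):
    """True if now falls in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def cost_delay_ms(now, start, end, window_delay_ms=0, outside_delay_ms=20):
    """Pick the sleep between I/O batches for the current time of day."""
    return window_delay_ms if in_window(now, start, end) else outside_delay_ms
```

A worker would re-evaluate `cost_delay_ms` periodically during the scan, so a vacuum that started inside the window slows down as soon as the window closes.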

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Galy Lee

Alvaro Herrera wrote:
 I don't have anything else as detailed as a plan.  If you have
 suggestions, I'm all ears.
Cool, thanks for the update. :) We also have some new ideas for
improving autovacuum now. I will raise them later.

 Now regarding your restartable vacuum work.  
 Does this make sense?
I also have reached a similar conclusion now.  Thank you.

Regards
Galy



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Now regarding your restartable vacuum work.  I think that stopping a
 vacuum at some point and being able to restart it later is very cool and
 may get you some hot chicks, but I'm not sure it's really useful.

Too true :-(

 I think it makes more sense to do something like throttling an ongoing
 vacuum to a reduced IO rate, if the maintenance window closes.  So if
 you're in the middle of a heap scan and the maintenance window closes,
 you immediately stop the scan and go to the index cleanup phase, *at a
 reduced IO rate*.

Er, why not just finish out the scan at the reduced I/O rate?  Any sort
of abort behavior is going to create net inefficiency, eg doing an
index scan to remove only a few tuples.  ISTM that the vacuum ought to
just continue along its existing path at a slower I/O rate.

regards, tom lane



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Galy Lee

Tom Lane wrote:
 Er, why not just finish out the scan at the reduced I/O rate?  Any sort

Sometimes you may need to vacuum a large table in the maintenance window
and a hot table during service time. If the vacuum of the hot table does
not eat too much foreground resource, then you can vacuum the large table
at a lower I/O rate outside the maintenance window; but if the vacuum of
the hot table is eating too much of the system's resources, then the
launcher had better stop the long-running vacuum outside the maintenance
window.

But I am not insisting on the stop-start feature at this moment.
Changing the cost delay dynamically sounds more reasonable. We can use
it to balance total I/O of workers in service time or maintenance time.
It is not so difficult to achieve this by leveraging the shared memory of
autovacuum.
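
As a rough illustration of what I mean by balancing, here is a Python sketch that splits one shared cost budget evenly among the active workers. The `Worker` class, the `rebalance` function, and the even split are all assumptions for illustration; the list stands in for the per-worker slots that would live in shared memory.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cost_limit: int = 0  # this worker's share of the I/O budget


def rebalance(workers, total_cost_limit):
    """Split one shared cost budget evenly among the active workers."""
    if not workers:
        return
    # Never hand out a zero share, or a worker could stall forever.
    share = max(1, total_cost_limit // len(workers))
    for w in workers:
        w.cost_limit = share
```

`rebalance` would be called whenever a worker starts or finishes, and with a different `total_cost_limit` when the maintenance window opens or closes.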

Best Regards
Galy Lee



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Is everybody OK with changing the autovacuum_naptime semantics?

It already seems different from 8.2, so no objection to further change.

 Is everybody OK with not putting a per-tablespace worker limit?
 Is everybody OK with putting per-database worker limits on a pg_database
 column?

I don't think we need a new pg_database column.  If it's a GUC you can
do ALTER DATABASE SET, no?  Or was that what you meant?

regards, tom lane



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Galy Lee
Alvaro,

Alvaro Herrera wrote:
 I still haven't received the magic bullet to solve the hot table
 problem, but these at least mean we continue doing *something*.

Could you share your plan or ideas for autovacuum improvement in 8.3?
And what is the roadmap for autovacuum improvement in 8.4?

Thanks,

Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center





Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Jim Nasby

On Mar 7, 2007, at 4:00 PM, Alvaro Herrera wrote:
Is everybody OK with putting per-database worker limits on a
pg_database column?


I'm worried that we would live to regret such a limit. I can't really  
see any reason to limit how many vacuums are occurring in a database,  
because there's no limiting factor there; you're either going to be  
IO bound (per-tablespace), or *maybe* CPU-bound (perhaps the  
Greenplum folks could enlighten us as to whether they run into vacuum  
being CPU-bound on thumpers).


Changing the naptime behavior to be database related makes perfect  
sense, because the minimum XID you have to worry about is a
per-database thing; I just don't see limiting the number of vacuums as  
being per-database, though. I'm also skeptical that we'll be able to  
come up with a good way to limit the number of backends until we get  
the hot table issue addressed. Perhaps a decent compromise for now  
would be to limit how many 'small table' vacuums could run on each  
tablespace, and then limit how many 'unlimited table size' vacuums  
could run on each tablespace, where 'small table' would probably have  
to be configurable. I don't think it's the best final solution, but  
it should at least solve the immediate need.
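
Roughly what I have in mind, sketched in Python. This is only an illustration under my own assumptions: tables at or below a configurable size count as 'small', everything else takes an 'unlimited table size' slot, and each tablespace has a separate counter for each class. None of these names exist anywhere.

```python
from collections import defaultdict


class TablespaceSlots:
    """Hypothetical per-tablespace admission control for vacuum workers."""

    def __init__(self, small_limit, large_limit, small_table_blocks):
        self.limits = {"small": small_limit, "large": large_limit}
        self.small_table_blocks = small_table_blocks  # configurable cutoff
        self.running = defaultdict(lambda: {"small": 0, "large": 0})

    def classify(self, table_blocks):
        return "small" if table_blocks <= self.small_table_blocks else "large"

    def try_start(self, tablespace, table_blocks):
        """Claim a slot if one is free in this tablespace; else return False."""
        kind = self.classify(table_blocks)
        if self.running[tablespace][kind] >= self.limits[kind]:
            return False
        self.running[tablespace][kind] += 1
        return True

    def finish(self, tablespace, table_blocks):
        """Release the slot claimed by a matching try_start call."""
        self.running[tablespace][self.classify(table_blocks)] -= 1
```

The point of the split is that a long vacuum of a huge table can never occupy the slots that small, hot tables need on the same tablespace.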

--
Jim Nasby                     [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


