Tom Lane wrote:
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote:
1. Grab the AutovacSchedule LWLock exclusively.
2. Check to see if another worker is currently processing
that table; if so drop LWLock and go to next list entry.
3. Recompute whether table needs vacuuming; if not,
Alvaro Herrera wrote:
worker to-do list
-----------------
It removes from its to-do list the tables being processed. Finally, it
writes the list to disk.
I am worried about the worker-to-do-list in your proposal. I think
the worker isn't suitable for maintaining any vacuum task list; instead
it is
Galy Lee [EMAIL PROTECTED] writes:
I am worried about the worker-to-do-list in your proposal. I think
the worker isn't suitable for maintaining any vacuum task list; instead
it is better to maintain a unified vacuum task queue in autovacuum shared
memory.
Shared memory is fixed-size.
Galy Lee wrote:
Alvaro Herrera wrote:
worker to-do list
-----------------
It removes from its to-do list the tables being processed. Finally, it
writes the list to disk.
I am worried about the worker-to-do-list in your proposal. I think
the worker isn't suitable for maintaining any vacuum
Alvaro Herrera wrote:
worker to-do list
-----------------
When each worker starts, it determines which tables to process in the
usual fashion: get pg_autovacuum and pgstat data and compute the
equations.
The worker then takes a snapshot of what's currently going on in the
database, by
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote:
1. Grab the AutovacSchedule LWLock exclusively.
2. Check to see if another worker is currently processing
that table; if so drop LWLock and go to next list entry.
3. Recompute whether table needs vacuuming; if not,
drop LWLock and go
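In code form, that recipe comes out roughly as below. This is only a
sketch: LWLockAcquire/LWLockRelease and the list macros are the real API,
but AutovacScheduleLock and the claim/recheck helpers are made-up names
for illustration.

    ListCell   *cell;

    /* Sketch of the per-table loop; helper functions are hypothetical. */
    foreach(cell, my_todo_list)
    {
        Oid     relid = lfirst_oid(cell);

        /* 1. Grab the AutovacSchedule LWLock exclusively. */
        LWLockAcquire(AutovacScheduleLock, LW_EXCLUSIVE);

        /* 2. Skip if another worker already claimed this table. */
        if (table_claimed_by_other_worker(relid))
        {
            LWLockRelease(AutovacScheduleLock);
            continue;
        }

        /* 3. Recheck, with fresh pgstat data, that it still needs vacuum. */
        if (!table_still_needs_vacuum(relid))
        {
            LWLockRelease(AutovacScheduleLock);
            continue;
        }

        /* 4. Claim it in shared memory, release the lock, and vacuum. */
        claim_table_for_worker(MyProcPid, relid);
        LWLockRelease(AutovacScheduleLock);

        vacuum_one_table(relid);
        unclaim_table(MyProcPid, relid);    /* again under the lock, really */
    }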
Hi, Alvaro
Alvaro Herrera wrote:
keep such a task list in shared memory, because we aren't able to grow
that memory after postmaster start.
We can use the fixed-size shared memory to maintain such a queue. The
maximum number of tasks is the number of all tables. So the size of the
queue can be the
Galy Lee [EMAIL PROTECTED] writes:
We can use the fixed-size shared memory to maintain such a queue. The
maximum number of tasks is the number of all tables. So the size of the
queue can be the same as max_fsm_relations, which is usually larger than
the number of tables and indexes in the cluster.
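Something like the structure below, sized off max_fsm_relations at
postmaster start, is what I have in mind. Just a sketch: the struct and
function names are invented, though MaxFSMRelations itself is the existing
backing variable for the GUC.

    /* Sketch of a fixed-size task queue in shared memory. */
    typedef struct AutoVacTask
    {
        Oid     avt_database;       /* database containing the table */
        Oid     avt_relation;       /* table to vacuum */
        bool    avt_in_progress;    /* claimed by some worker? */
    } AutoVacTask;

    typedef struct AutoVacTaskQueue
    {
        int         avtq_len;       /* number of entries in use */
        AutoVacTask avtq_tasks[1];  /* VARIABLE LENGTH ARRAY */
    } AutoVacTaskQueue;

    /* Size to request at shared-memory creation time. */
    Size
    AutoVacTaskQueueShmemSize(void)
    {
        return offsetof(AutoVacTaskQueue, avtq_tasks) +
            mul_size(MaxFSMRelations, sizeof(AutoVacTask));
    }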
Tom Lane [EMAIL PROTECTED] wrote:
In any case, I still haven't seen a good case made why a global work
queue will provide better behavior than each worker keeping a local
queue. The need for small hot tables to be visited more often than
big tables suggests to me that a global queue will
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] wrote:
In any case, I still haven't seen a good case made why a global work
queue will provide better behavior than each worker keeping a local
queue.
If we have some external vacuum schedulers, we need to see and touch
Tom Lane [EMAIL PROTECTED] wrote:
Who said anything about external schedulers? I remind you that this is
AUTOvacuum. If you want to implement manual scheduling you can still
use plain ol' vacuum commands.
I think we can split autovacuum into two (or more?) functions:
task gatherers and task
My initial reaction is that this looks good to me, but still a few
comments below.
Alvaro Herrera wrote:
Here is a low-level, very detailed description of the implementation of
the autovacuum ideas we have so far.
launcher's dealing with databases
---------------------------------
[ Snip ]
Matthew T. O'Connor matthew@zeut.net writes:
Does a new worker really care about the PID of other workers or what
table they are currently working on?
As written, it needs the PIDs so it can read in the other workers' todo
lists (which are in files named by PID).
It's not clear to me why a
Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
It's not clear to me why a worker cares that there is a new worker,
since the new worker is going to ignore all the tables that are already
claimed by all worker todo lists.
That seems wrong to me, since it means that new workers
On Tue, Feb 27, 2007 at 01:26:00AM -0500, Matthew T. O'Connor wrote:
Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
I'm not sure what you are saying here, are you now saying that partial
vacuum won't work for autovac? Or are you saying that saving state as
Jim is describing
On Tue, Feb 27, 2007 at 12:54:28AM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
Here is a worst case example: A DB with 6 tables all of which are highly
active and will need to be vacuumed
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
Jim C. Nasby wrote:
The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before, I realized
that's probably not the best idea. For example, I've seen clusters
Jim C. Nasby wrote:
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
Jim C. Nasby wrote:
The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before, I realized
that's probably not the best idea. For example, I've
On Tue, Feb 27, 2007 at 12:12:22PM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
Jim C. Nasby wrote:
The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that
On Feb 26, 2007, at 12:49 PM, Alvaro Herrera wrote:
Jim C. Nasby wrote:
That's why I'm thinking it would be best to keep the maximum size of
stuff for the second worker small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
Tom Lane wrote:
Saving the array is
expensive both in runtime and code complexity, and I don't believe we
can trust it later --- at least not without even more expensive-and-complex
measures, such as WAL-logging every such save :-(
I don't fully understand what you are worrying about.
Jim C. Nasby wrote:
That's why I'm thinking it would be best to keep the maximum size of
stuff for the second worker small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
the high-update tables every X number of seconds.
If we
Alvaro Herrera wrote:
Jim C. Nasby wrote:
That's why I'm thinking it would be best to keep the maximum size of
stuff for the second worker small. It probably also makes sense to tie
it to time and not size, since the key factor is that you want it to hit
the high-update tables every X number
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
The second mode is the hot table worker mode, enabled when the worker
detects that there's already a worker in the database. In this mode,
the worker is limited to those tables that can be vacuumed in less than
autovacuum_naptime, so large
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
How can you determine what tables can be vacuumed within
autovacuum_naptime?
My assumption is that
pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to vacuum
This is of course not the reality, because the delay is not how
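To put rough numbers on it (folding in vacuum_cost_limit, which the formula
above leaves out, and pessimistically counting every page as a miss): with,
say, vacuum_cost_page_miss = 10, vacuum_cost_limit = 200 and
vacuum_cost_delay = 20ms, the worker sleeps 20ms for every 200/10 = 20
pages, i.e. it covers about 1000 pages per second; with autovacuum_naptime
= 60s, tables up to roughly 60,000 pages (about 470 MB) would then count as
"hot". These parameter values are only for illustration.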
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
How can you determine what tables can be vacuumed within
autovacuum_naptime?
My assumption is that
pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to vacuum
This is of course not the
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
I'm not sure how pg_class.relpages is maintained, but what happens to a
bloated table? For example, a 100 row table that is constantly updated
and hasn't been vacuumed in a while (say the admin disabled autovacuum
for a while), now that small
Alvaro Herrera [EMAIL PROTECTED] writes:
Matthew T. O'Connor wrote:
I'm not sure it's a good idea to tie this to the vacuum cost delay
settings either, so let me ask you this: how is this better than just
allowing the admin to set a new GUC variable like
autovacuum_hot_table_size_threshold
Tom Lane wrote:
Alvaro Herrera [EMAIL PROTECTED] writes:
Matthew T. O'Connor wrote:
I'm not sure it's a good idea to tie this to the vacuum cost delay
settings either, so let me ask you this: how is this better than just
allowing the admin to set a new GUC variable like
On Mon, Feb 26, 2007 at 09:22:42PM -0500, Tom Lane wrote:
Alvaro Herrera [EMAIL PROTECTED] writes:
Matthew T. O'Connor wrote:
I'm not sure it's a good idea to tie this to the vacuum cost delay
settings either, so let me ask you this: how is this better than just
allowing the admin to set
On Mon, Feb 26, 2007 at 08:11:44PM -0300, Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
The second mode is the hot table worker mode, enabled when the worker
detects that there's already a worker in the database. In this mode,
the worker is limited to those
On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
How can you determine what tables can be vacuumed within
autovacuum_naptime?
My assumption is that
pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay = time to
Jim C. Nasby wrote:
The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before, I realized
that's probably not the best idea. For example, I've seen clusters that
are running dozens to hundreds of databases; in that environment
Jim C. Nasby [EMAIL PROTECTED] writes:
On Mon, Feb 26, 2007 at 09:22:42PM -0500, Tom Lane wrote:
I'm not liking any of these very much, as they seem critically dependent
on impossible-to-tune parameters. I think it'd be better to design this
around having the first worker explicitly expose
[ oh, I forgot to respond to this: ]
Jim C. Nasby [EMAIL PROTECTED] writes:
Isn't there a special lock acquired on a relation by vacuum? Can't we
just check for that?
I think you're thinking that ConditionalLockRelation solves the problem,
but it does not, because it will fail if someone has
Tom Lane wrote:
I think an absolute minimum requirement for a sane design is that no two
workers ever try to vacuum the same table concurrently, and I don't see
where that behavior will emerge from your proposal; whereas it's fairly
easy to make it happen if non-first workers pay attention to
Jim C. Nasby wrote:
On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
I'm not sure how pg_class.relpages is maintained, but what happens to a
bloated table? For example, a 100 row table that is constantly updated
and hasn't been vacuumed in a while (say the admin disabled
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote:
I think an absolute minimum requirement for a sane design is that no two
workers ever try to vacuum the same table concurrently,
FWIW, I've always considered this to be a very important and obvious
issue, and I think I've neglected
Tom Lane wrote:
Jim C. Nasby [EMAIL PROTECTED] writes:
The real problem is trying to set that up in such a fashion that keeps
hot tables frequently vacuumed;
Are we assuming that no single worker instance will vacuum a given table
more than once? (That's not a necessary assumption,
Tom Lane wrote:
BTW, to what extent might this whole problem be simplified if we adopt
chunk-at-a-time vacuuming (compare current discussion with Galy Lee)?
If the unit of work has a reasonable upper bound regardless of table
size, maybe the problem of big tables starving small ones goes away.
Matthew T. O'Connor matthew@zeut.net writes:
Tom Lane wrote:
I'm inclined to propose an even simpler algorithm in which every worker
acts alike;
That is what I'm proposing except for one difference, when you catch up
to an older worker, exit.
No, that's a bad idea, because it means that
Matthew T. O'Connor matthew@zeut.net writes:
That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?
It seems fairly trivial to me to have a scheme where you do one
fill-workmem-and-scan-indexes cycle per invocation, and store the
next-heap-page-to-scan in some handy place (new
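As a sketch of that scheme (RelationGetNumberOfBlocks is the real API; the
other helpers, and where the resume point is stored, are hypothetical):

    /* One chunk: fill maintenance_work_mem, scan indexes, clean heap. */
    static BlockNumber
    vacuum_one_chunk(Relation onerel, BlockNumber next_block)
    {
        BlockNumber blkno = next_block;
        BlockNumber nblocks = RelationGetNumberOfBlocks(onerel);

        /* collect dead tuple TIDs until the work-mem array is full */
        while (blkno < nblocks && !dead_tid_array_full())
            collect_dead_tids(onerel, blkno++);

        vacuum_all_indexes(onerel);     /* one index pass per chunk */
        remove_dead_tuples(onerel);     /* second heap pass */

        /* store next-heap-page-to-scan in "some handy place" */
        save_resume_block(onerel, (blkno < nblocks) ? blkno : 0);

        return blkno;
    }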
Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
Tom Lane wrote:
I'm inclined to propose an even simpler algorithm in which every worker
acts alike;
That is what I'm proposing except for one difference, when you catch up
to an older worker, exit.
No, that's a bad idea, because
Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?
It seems fairly trivial to me to have a scheme where you do one
fill-workmem-and-scan-indexes cycle per invocation, and store the
next-heap-page-to-scan in
On Tue, Feb 27, 2007 at 12:00:41AM -0300, Alvaro Herrera wrote:
Jim C. Nasby wrote:
The advantage to keying this to autovac_naptime is that it means we
don't need another GUC, but after I suggested that before I realized
that's probably not the best idea. For example, I've seen clusters
On Mon, Feb 26, 2007 at 10:48:49PM -0500, Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
That does sound simpler. Is chunk-at-a-time a realistic option for 8.3?
It seems fairly trivial to me to have a scheme where you do one
fill-workmem-and-scan-indexes cycle per invocation,
Jim C. Nasby [EMAIL PROTECTED] writes:
The proposal to save enough state to be able to resume a vacuum at
pretty much any point in its cycle might work; we'd have to benchmark
it. With the default maintenance_work_mem of 128M it would mean writing
out 64M of state every minute on average,
On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Mon, Feb 26, 2007 at 06:23:22PM -0500, Matthew T. O'Connor wrote:
I'm not sure how pg_class.relpages is maintained, but what happens to a
bloated table? For example, a 100 row table that is constantly
On Tue, Feb 27, 2007 at 12:37:42AM -0500, Tom Lane wrote:
Jim C. Nasby [EMAIL PROTECTED] writes:
The proposal to save enough state to be able to resume a vacuum at
pretty much any point in its cycle might work; we'd have to benchmark
it. With the default maintenance_work_mem of 128M it
Jim C. Nasby wrote:
On Mon, Feb 26, 2007 at 10:18:36PM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
Here is a worst case example: A DB with 6 tables all of which are highly
active and will need to be vacuumed constantly. While this is totally
hypothetical, it is how I envision
Tom Lane wrote:
Jim C. Nasby [EMAIL PROTECTED] writes:
The proposal to save enough state to be able to resume a vacuum at
pretty much any point in its cycle might work; we'd have to benchmark
it. With the default maintenance_work_mem of 128M it would mean writing
out 64M of state every minute
Matthew T. O'Connor matthew@zeut.net writes:
I'm not sure what you are saying here, are you now saying that partial
vacuum won't work for autovac? Or are you saying that saving state as
Jim is describing above won't work?
I'm saying that I don't like the idea of trying to stop on a dime by
Tom Lane wrote:
Matthew T. O'Connor matthew@zeut.net writes:
I'm not sure what you are saying here, are you now saying that partial
vacuum won't work for autovac? Or are you saying that saving state as
Jim is describing above won't work?
I'm saying that I don't like the idea of trying to
Jim C. Nasby wrote:
On Thu, Feb 22, 2007 at 10:32:44PM -0500, Matthew T. O'Connor wrote:
I'm not sure this is a great idea, but I don't see how this would result
in large numbers of workers working in one database. If workers work
on tables in size order, and exit as soon as they catch
On Fri, Feb 23, 2007 at 01:22:17PM -0300, Alvaro Herrera wrote:
Jim C. Nasby wrote:
On Thu, Feb 22, 2007 at 10:32:44PM -0500, Matthew T. O'Connor wrote:
I'm not sure this is a great idea, but I don't see how this would result
in large numbers of workers working in one database. If
vacuum should be a process with the least amount of voodoo.
If we can just have vacuum_delay and vacuum_threshold, where
threshold allows an arbitrary setting of how much bandwidth
we will allot to the process, then that is a beyond wonderful thing.
It is easy to determine how much IO
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with hot tables first, then regular tables.
3) Launcher continues to launch workers to
Jim C. Nasby wrote:
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2) worker1 deals with hot tables first, then regular tables.
3) Launcher
On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Wed, Feb 21, 2007 at 05:40:53PM -0500, Matthew T. O'Connor wrote:
My Proposal: If we require admins to identify hot tables, then:
1) Launcher fires-off a worker1 into database X.
2)
On Thu, Feb 22, 2007 at 09:35:45AM +0100, Zeugswetter Andreas ADI SD wrote:
vacuum should be a process with the least amount of voodoo.
If we can just have vacuum_delay and vacuum_threshold, where
threshold allows an arbitrary setting of how much bandwidth
we will allot to the
vacuum should be a process with the least amount of voodoo.
If we can just have vacuum_delay and vacuum_threshold, where
threshold allows an arbitrary setting of how much bandwidth we will
allot to the process, then that is a beyond wonderful thing.
It is easy to determine
Jim C. Nasby wrote:
On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
So the heuristic would be:
* Launcher fires off workers into a database at a given interval
(perhaps configurable?)
* Each worker works on tables in size order.
* If a worker ever catches up to an
On Thu, Feb 22, 2007 at 10:32:44PM -0500, Matthew T. O'Connor wrote:
Jim C. Nasby wrote:
On Thu, Feb 22, 2007 at 09:32:57AM -0500, Matthew T. O'Connor wrote:
So the heuristic would be:
* Launcher fires off workers into a database at a given interval
(perhaps configurable?)
* Each worker
Ok, scratch that :-) Another round of braindumping below.
Launcher starts one worker in each database. This worker is not going
to do vacuum work, just report how much vacuum effort is needed in the
database. Vacuum effort is measured as the total number of pages in
need of vacuum, being the
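The measuring pass could be little more than a scan of pg_class, along
these lines. A sketch: the catalog-scan calls are the real API, but
relation_needs_vacuum() just stands in for the usual threshold equations.

    /* Sketch: total pages in need of vacuum in the current database. */
    static long
    measure_vacuum_effort(void)
    {
        long        effort = 0;
        Relation    classRel = heap_open(RelationRelationId, AccessShareLock);
        HeapScanDesc scan = heap_beginscan(classRel, SnapshotNow, 0, NULL);
        HeapTuple   tuple;

        while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tuple);

            if (classForm->relkind != RELKIND_RELATION)
                continue;

            /* hypothetical: applies the pg_autovacuum/pgstat equations */
            if (relation_needs_vacuum(HeapTupleGetOid(tuple)))
                effort += classForm->relpages;
        }

        heap_endscan(scan);
        heap_close(classRel, AccessShareLock);
        return effort;
    }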
Alvaro Herrera wrote:
Ok, scratch that :-) Another round of braindumping below.
I still think this is a solution in search of a problem. The main problem
we have right now is that hot tables can be starved from vacuum. Most
of this proposal doesn't touch that. I would like to see that
Alvaro Herrera [EMAIL PROTECTED] writes:
Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace. I say we're not :-)
I did say that. But your comment about using a high cost_delay was fairly
convincing too. It would be a simpler design and
Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc. He proposes that instead of having
cost_delay etc., we have an mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could make that after this
proposal is
I'm wondering if we can do one better...
Since what we really care about is I/O responsiveness for the rest of
the system, could we just time how long I/O calls take to complete? I
know that gettimeofday can have a non-trivial overhead, but do we care
that much about it in the case of autovac?
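Roughly this, say (gettimeofday and smgrread are real; the threshold
variable and the back-off rule are invented for illustration):

    #include <sys/time.h>

    /* Sketch: time one read and stretch the cost delay if I/O is slow. */
    static void
    timed_read(SMgrRelation reln, BlockNumber blocknum, char *buffer)
    {
        struct timeval before, after;
        long        elapsed_usec;

        gettimeofday(&before, NULL);
        smgrread(reln, blocknum, buffer);
        gettimeofday(&after, NULL);

        elapsed_usec = (after.tv_sec - before.tv_sec) * 1000000L +
                       (after.tv_usec - before.tv_usec);

        /* io_latency_threshold and the doubling rule are hypothetical */
        if (elapsed_usec > io_latency_threshold)
            VacuumCostDelay *= 2;       /* system looks busy: back off */
    }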
One option that I've heard before is to have vacuum, after a single iteration
(i.e., after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.
From my experience this is not acceptable... I have tables
Gregory Stark wrote:
If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult, though perhaps we could do something nearly as good.
I think to make vacuum have this
Alvaro Herrera wrote:
After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:
[Snip Detailed Description]
How does this sound?
On first blush, I'm not sure I like this
In an ideal world I think you want precisely one vacuum process running per
tablespace on the assumption that each tablespace represents a distinct
physical device.
The cases where we currently find ourselves wanting more are where small
tables are due for vacuuming more frequently than the time
Matthew T. O'Connor wrote:
Alvaro Herrera wrote:
After staring at my previous notes for autovac scheduling, it has become
clear that the basics of it are not really going to work as specified.
So here is a more realistic plan:
[Snip Detailed Description]
How does this sound?
On first
Alvaro Herrera wrote:
Matthew T. O'Connor wrote:
On first blush, I'm not sure I like this as it doesn't directly attack
the table starvation problem, and I think it could be a net loss of speed.
VACUUM is I/O bound, as such, just sending multiple vacuum commands at a
DB isn't going to make
[EMAIL PROTECTED] (Alvaro Herrera) writes:
When there is a single worker processing a database, it does not recheck
pgstat data after each table. This is to prevent a high-update-rate
table from starving the vacuuming of other databases.
This case is important; I don't think that having
Alvaro Herrera [EMAIL PROTECTED] writes:
Each worker, including the initial one, starts vacuuming tables
according to pgstat data. They recheck the pgstat data after finishing
each table, so that a table vacuumed by another worker is not processed
twice (maybe problematic: a table with high
Alvaro Herrera wrote:
Once autovacuum_naptime... autovacuum_max_workers...
How does this sound?
The knobs exposed on autovacuum feel kinda tangential to
what I think I'd really want to control.
IMHO vacuum_mbytes_per_second would be quite a bit more
intuitive than cost_delay, naptime, etc.