[HACKERS] Changing semantics of autovacuum_cost_limit
Hi all,

Now that autovacuum has multiple workers, the semantics of autovacuum_cost_limit also need to be redefined. Currently, autovacuum_cost_limit is the accumulated cost that causes a single worker's vacuuming process to sleep; it restricts the I/O consumption of one vacuum worker. With N workers, the I/O consumed by autovacuum workers can increase N-fold, so these semantics produce unpredictable I/O consumption when multiple autovacuum workers run.

One simple idea is to set the cost limit of every worker to autovacuum_cost_limit / max_autovacuum_workers. But in scenarios with fewer active workers, that is obviously unfair to the active ones. A better way is to set the cost limit of every active worker to autovacuum_cost_limit / autovacuum_active_workers. This keeps the total I/O consumption of autovacuum stable.

Workers can be extended to keep their own cost_limit in shared memory. Whenever a worker is brought up or finishes its work, the launcher recalculates worker_cost_limit = autovacuum_cost_limit / autovacuum_active_workers and sets the new value for each active worker.

The above approach requires that the launcher can change the cost delay settings of workers on the fly. This can be achieved by forcing VACUUM to re-read the cost delay settings from its worker's shared memory at every vacuum_delay_point.

Any comments or suggestions?

Best Regards
Galy Lee
[EMAIL PROTECTED]
NTT Open Source Software Center

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
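The recalculation the launcher would perform can be sketched as follows. This is a minimal illustration in C under stated assumptions: the WorkerSlot array and rebalance_cost_limit are hypothetical names, not actual PostgreSQL symbols.

```c
#include <assert.h>

/* Hypothetical sketch: the launcher recomputes each active worker's
 * cost-limit share whenever a worker starts or finishes.  WorkerSlot
 * stands in for a per-worker shared-memory slot. */
typedef struct WorkerSlot {
    int active;      /* 1 if this slot holds a running worker */
    int cost_limit;  /* per-worker share, kept in shared memory */
} WorkerSlot;

/* Divide the cluster-wide budget among the active workers only.
 * Returns the number of active workers found. */
static int rebalance_cost_limit(WorkerSlot *slots, int nslots,
                                int autovac_cost_limit)
{
    int nactive = 0;
    for (int i = 0; i < nslots; i++)
        if (slots[i].active)
            nactive++;
    if (nactive == 0)
        return 0;
    int share = autovac_cost_limit / nactive;
    for (int i = 0; i < nslots; i++)
        if (slots[i].active)
            slots[i].cost_limit = share;
    return nactive;
}
```

With a budget of 200 and two of three workers active, each active worker gets 100 while the idle slot is untouched, which is the "fair to active workers" behavior the mail argues for.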
Re: [HACKERS] autovacuum next steps, take 3
Alvaro Herrera wrote:
> worker to-do list
> It removes from its to-do list the tables being processed. Finally,
> it writes the list to disk.

I am worried about the worker to-do list in your proposal. I think a worker is not the right place to maintain any vacuum task list; it is better to maintain a unified vacuum task queue in autovacuum shared memory. Here are the basic ideas:

* Why is such a task queue needed?

- The launcher can schedule all vacuum tasks through such a queue. It provides a facility for scheduling tasks smartly in further autovacuum improvements.
- Such a task list can also be exposed easily through a system view. This can be implemented easily in 8.3 on top of the task queue.

* VACUUM task queue

The VACUUM tasks of the cluster are maintained in a unified, cluster-wide queue in autovacuum shared memory:

    global shared TaskInfo tasks[];

It can be viewed as:

    SELECT * FROM pg_autovacuum_tasks;
     dbid | relid | group | worker
    ------+-------+-------+--------
        2 | 20001 |     0 |   1001
        2 | 20002 |     0 |
        3 | 30001 |     0 |   1002

VACUUM tasks belonging to the same database might be divided into several groups, and one worker might be assigned to process one specific task group. The task queue might be filled by a dedicated task-gathering worker, or by an *external task gatherer*, which would allow an external program to implement a more sophisticated vacuum scheme.

Based on previous discussion, it appears difficult to implement an all-purpose algorithm that satisfies the requirements of all applications. It is better to allow users to develop their own vacuum strategies: a *user-defined external program* might fill the task queue and schedule tasks by its own strategy, while the launcher is responsible only for coordinating workers. This pluggable-vacuum-strategy approach seems like a good solution.
* Status of workers

It is also convenient to let users monitor the status of vacuum workers through a system view. A snapshot of the workers can be viewed as:

    SELECT * FROM pg_autovacuum_workers;
     pid  | dbid | relid | group
    ------+------+-------+-------
     1001 |    2 | 20001 |     0
     1002 |    3 | 30001 |     0

Best Regards
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center
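A minimal C sketch of what such a shared-memory task queue might look like. TaskInfo and its columns follow the view above; MAX_TASKS, task_enqueue, and task_claim are hypothetical names invented for illustration, not part of any actual patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Illustrative fixed-size task queue in autovacuum shared memory. */
typedef struct TaskInfo {
    Oid dbid;
    Oid relid;
    int group;       /* task group; one worker serves one group */
    int worker_pid;  /* 0 while the task is unclaimed */
} TaskInfo;

#define MAX_TASKS 64
static TaskInfo tasks[MAX_TASKS];
static int ntasks = 0;

/* Add a task; on overflow the caller simply retries next naptime,
 * since the gatherer rescans the cluster anyway. */
static int task_enqueue(Oid dbid, Oid relid, int group)
{
    if (ntasks >= MAX_TASKS)
        return 0;
    tasks[ntasks].dbid = dbid;
    tasks[ntasks].relid = relid;
    tasks[ntasks].group = group;
    tasks[ntasks].worker_pid = 0;
    ntasks++;
    return 1;
}

/* Assign the first unclaimed task in a group to a worker. */
static TaskInfo *task_claim(int group, int worker_pid)
{
    for (int i = 0; i < ntasks; i++)
        if (tasks[i].group == group && tasks[i].worker_pid == 0) {
            tasks[i].worker_pid = worker_pid;
            return &tasks[i];
        }
    return NULL;
}
```

The pg_autovacuum_tasks view in the mail would then be a straight dump of this array, with worker_pid shown as the "worker" column.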
Re: [HACKERS] autovacuum next steps, take 3
Hi, Alvaro

Alvaro Herrera wrote:
> keep such a task list in shared memory, because we aren't able to grow
> that memory after postmaster start.

We can use fixed-size shared memory to maintain such a queue. The maximum number of tasks is the number of all tables, so the size of the queue can be the same as max_fsm_relations, which is usually larger than the number of tables and indexes in the cluster. This is sufficient to hold most vacuum tasks. Even if the queue overflows, the task gatherer scans the whole cluster every autovacuum_naptime, so it will quickly pick the dropped tasks up again. We don't need to write anything to an external file, so there is no problem with using fixed-size shared memory for a global queue.

> Other idea I had was having a third kind of autovacuum process, namely
> a schedule builder

If we have such a global queue, a task-gathering worker can connect to every database every naptime to gather tasks in time. The task-gathering worker won't build the schedule; the launcher or an external program is responsible for that. How to dispatch tasks to workers is just a scheduling problem: a good dispatching algorithm needs to ensure that each worker can finish its tasks on time, which might resolve the headache of the hot-table problem. But this is a further issue to be discussed after 8.3.

Best Regards
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
Alvaro Herrera wrote:
> I don't have anything else as detailed as a plan. If you have
> suggestions, I'm all ears.

Cool, thanks for the update. :) We also have some new ideas for improving autovacuum; I will raise them later.

> Now regarding your restartable vacuum work.
> Does this make sense?

I have also reached a similar conclusion now. Thank you.

Regards
Galy
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
Tom Lane wrote:
> Er, why not just finish out the scan at the reduced I/O rate? Any sort

Sometimes you may need to vacuum a large table in the maintenance window and a hot table during service time. If vacuuming the hot table does not consume too much foreground resource, then you can vacuum the large table at a lower I/O rate outside the maintenance window; but if vacuuming the hot table is overeating the system resources, then the launcher had better stop the long-running vacuum outside the maintenance window.

But I am not insisting on the stop-start feature at this moment. Changing the cost delay dynamically sounds more reasonable. We can use it to balance the total I/O of the workers in service time or maintenance time. It is not so difficult to achieve this by leveraging the shared memory of autovacuum.

Best Regards
Galy Lee
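The "change the cost delay dynamically" idea could work roughly as sketched below: the worker re-reads a delay value from shared memory each time it reaches a delay point, so the launcher can adjust it mid-run. WorkerShared and this simplified vacuum_delay_point are assumptions for illustration, not the real backend code.

```c
#include <assert.h>

/* Hypothetical per-worker shared state, written by the launcher. */
typedef struct WorkerShared {
    volatile int cost_delay_ms;  /* updated by the launcher on the fly */
} WorkerShared;

static int vacuum_cost_balance = 0;

/* Simplified delay point: accumulate cost, and once the limit is
 * reached, return the nap length read fresh from shared memory.
 * Returns 0 when no nap is due yet. */
static int vacuum_delay_point(WorkerShared *shm,
                              int cost_accumulated, int cost_limit)
{
    vacuum_cost_balance += cost_accumulated;
    if (vacuum_cost_balance >= cost_limit) {
        vacuum_cost_balance = 0;
        return shm->cost_delay_ms;  /* picked up fresh every time */
    }
    return 0;
}
```

Because the delay is read at every delay point rather than once at startup, a launcher write to cost_delay_ms takes effect at the worker's very next nap.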
Re: [HACKERS] RFC: changing autovacuum_naptime semantics
Alvaro,

Alvaro Herrera wrote:
> I still haven't received the magic bullet to solve the hot table
> problem, but these at least means we continue doing *something*.

May I ask what your plan or ideas are for improving autovacuum in 8.3? And what is the roadmap for autovacuum improvement in 8.4?

Thanks,
Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center
[HACKERS] Restartable VACUUM design overview version 2
Hi

Thanks for all the feedback and good ideas on restartable vacuum. Here is a new design overview based on the previous discussions.

There are several ideas for addressing the problem of a long-running VACUUM in a defined maintenance window. When the maintenance time is running out, we can do the following things:

- send a smart stop request to vacuum, so that vacuum stops at a safe place (this might take a long time), or
- change the cost delay settings of vacuum on the fly to make vacuum less aggressive.

The following discusses both of them.

Restartable vacuum design overview
--

* Where to stop

There are two approaches to stopping a vacuum:

(1) Tell VACUUM where to stop when it starts. VACUUM can be told to stop at a safe point when it starts (by SQL syntax like: VACUUM SOME). The optional stop point is after one full fill-workmem / clean-index / clean-deadtuple cycle; VACUUM stops when it has finished such a cycle.

(2) Interrupt VACUUM while it is running. VACUUM checks for a smart stop request at the normal vacuum delay points; if such a request is detected, a flag is set telling VACUUM to stop at a safe point, namely the end of one full fill-workmem / clean-index / clean-deadtuple cycle.

But I cannot figure out a simple way to ask a running VACUUM to stop in (2), because the backend signals have all been used up. (1) is simple to implement, since it doesn't require communicating with a running VACUUM.

* How to stop

When VACUUM is stopping,
- it saves the block number it has reached to pg_class;
- it also flushes the free space information to the FSM. (This might be posted as a separate patch.)

* How to restart

When VACUUM restarts, it reads the stored block number from pg_class and resumes the interrupted scan from there.

Change VACUUM cost delay settings on-the-fly feature
--

When the end of the maintenance window comes, we might notify VACUUM to use a set of less aggressive cost delay settings.
I don't have a clear idea how to implement this feature yet. Maybe we need a message-passing mechanism between backends to exchange the cost delay settings, like the patch here: http://archives.postgresql.org/pgsql-patches/2006-04/msg00047.php

Another simple way to achieve this is to store the settings for different maintenance windows in a system catalog. There have been some previous discussions about implementing a maintenance window, but they have not been taken further, so it seems better to implement this feature after the maintenance window itself is implemented.

Implementation plan
---

Changing VACUUM cost delay settings on the fly requires an internal message-passing mechanism or the maintenance window implementation, so maybe this is not a good time to rush it. But I hope the *restartable VACUUM feature* can be accepted for 8.3.

I hope for your comments and suggestions.

Best Regards
Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center
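Approach (1) from the design above might be sketched like this: the stop request is known from the start, and is honored only at the boundary of a complete fill-workmem / clean-index / clean-deadtuple cycle, where the indexes are consistent. LVState and lazy_cycle_boundary are hypothetical names, not from an actual patch.

```c
#include <assert.h>

/* Illustrative state for a vacuum started with "VACUUM SOME". */
typedef struct LVState {
    int stop_requested;  /* set at start when a bounded run was asked for */
    int cycles_done;     /* completed fill/clean cycles so far */
} LVState;

/* Called once at the end of each full cycle.  Returns nonzero when it
 * is safe and requested to stop here; the block number reached would
 * then be saved to pg_class for the next run. */
static int lazy_cycle_boundary(LVState *st)
{
    st->cycles_done++;
    return st->stop_requested;
}
```

Note that the check lives only at cycle boundaries: a stop request never interrupts index cleaning mid-pass, which is what makes the stop point safe.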
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs wrote:
> You haven't explained how saving the dead-tuple-list could be done in
> a safe manner and it seems risky to me.

The files are placed in a new directory, $PGDATA/pg_vacuum, with the name spcNode.dbNode.relNode for each relation whose vacuum has been interrupted. Each file has a format like:

1. VacStateFileHeader
2. VacStateData
3. Dead tuple list
4. CRC32

The files are removed:
- when the original physical heap files are removed,
- when a VACUUM FULL has been issued,
- after the content has been read into memory,
- etc.

Is there any big potential risk there? Correct me if I am wrong.

> Deferring completion of VACUUM means deferring refreshing the FSM.

I borrowed code from the DSM patch to merge the free space information into the FSM when vacuum stops.

> Are you saying you know for certain this lock is held for a long time,
> or are you just saying you think it is? If you have some evidence for
> long truncation times then that would be a separate issue of concern,
> since that might starve out normal users. Please say more?

Sorry, I *thought* it was. The benchmarks have not shown that kind of problem anyway. Thanks for the clarification. :)

Regards,
--
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center
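The file layout described above might look roughly like this in C. The struct fields and the size helper are assumptions for illustration; the actual patch's definitions may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int uint32;

/* Illustrative on-disk layout for $PGDATA/pg_vacuum/<spc>.<db>.<rel>:
 * header, state, dead-tuple list, trailing CRC32. */
typedef struct VacStateFileHeader {
    uint32 magic;            /* identifies a vacuum state file */
    uint32 version;
    uint32 num_dead_tuples;  /* entries in the dead tuple list */
} VacStateFileHeader;

typedef struct VacStateData {
    uint32 last_scanned_block;  /* where the heap scan stopped */
    uint32 unswept_index;       /* first index not yet fully swept */
    uint32 freeze_limit_xid;    /* FreezeLimit of the interrupted run */
} VacStateData;

/* Total file size: header + state + dead-tuple entries (simplified
 * here to one uint32 each) + trailing CRC32. */
static size_t vac_state_file_size(uint32 num_dead_tuples)
{
    return sizeof(VacStateFileHeader)
         + sizeof(VacStateData)
         + (size_t) num_dead_tuples * sizeof(uint32)
         + sizeof(uint32);  /* CRC32 */
}
```

A reader that finds a bad magic number or a CRC mismatch would simply discard the file and fall back to a full scan, which is the safety argument made elsewhere in this thread.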
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs wrote:
> Galy, please hear that people like your idea and understand your use
> case, but just don't like all of the proposal, just the main thrust of
> it. The usual way is that (people that agree + amount of your exact
> idea remaining) = 100%

Thank you. I am glad to hear that. :) But I still cannot give up my idea yet.

Let's come to the core issue we care about: do we need the stop-on-dime feature to stop vacuum immediately? As I said before: if there are problems with a long-running vacuum, then yes, we *do need* to stop vacuum immediately. I am still not convinced that we don't have such problems. The potential problem with a long-running vacuum is that it may block some foreground transactions, like ALTER TABLE; if it is true that a long-running vacuum can block some DDL for a long time, that is a big problem, and I think we need to stop vacuum immediately to handle it.

I admit that the implementation is much more complex, but I cannot see any big problem with saving the dead tuples out and reading them back in again (like two-phase commit does). Why would we need to hold the lock and the transaction? We can release the lock and abandon the transaction ID; vacuum can take the lock and get a new ID when restarting. Why would we need to worry about whether a dead tuple is still dead? Only vacuum will sweep them; HOT cannot touch a tuple until we have finished sweeping it.

But it is true that in some scenarios it is better to stop after the current cycle has finished, so that the effect of vacuum appears quickly. So I hope the final design can combine the two.
Re: [HACKERS] Resumable vacuum proposal and design overview
Tom Lane wrote:
> One problem with it is that a too-small target would result in vacuum
> proceeding to scan indexes after having accumulated only a few dead
> tuples, resulting in increases (potentially enormous ones) in the
> total work needed to vacuum the table completely.

Yeah. This is also my big concern about Simon's idea and yours. Every vacuum stop causes an index scan, which means the total time of vacuum is proportional to how many times vacuum has stopped.

> I think it's sufficient to have two cases: abort now, and restart from
> the last cycle-completion point next time (this would basically just be

If there is only one cycle, then there is a problem with this approach. (If maintenance work memory is not that small, this situation is normal.)

> or set a flag to stop at the next cycle-completion point.

The extra cost of cleaning indexes may prevent this approach from working in practice.

> Perhaps a more useful answer to the problem of using a defined
> maintenance window is to allow VACUUM to respond to changes in the
> vacuum cost delay settings on-the-fly.

This is a good idea! Itagaki also talked about exactly the same idea with me yesterday. But if we change the parameters on the fly to make vacuum less aggressive, my concern is: are there any potential problems with running a vacuum over several days? Although I have no plan to touch VACUUM FULL, it seems that concurrent VACUUM also holds an exclusive lock when truncating the table. I am a little worried about that kind of problem with this approach. Also, maybe we need some shared memory area to share the cost delay parameters between VACUUMs, or are there other ideas?
Re: [HACKERS] autovacuum next steps, take 2
Tom Lane wrote:
> Saving the array is expensive both in runtime and code complexity, and
> I don't believe we can trust it later --- at least not without even
> more expensive-and-complex measures, such as WAL-logging every such
> save :-(

I don't quite understand what you are worrying about. If we find that we cannot trust the saved file, or that the file is corrupted, we can drop it and scan from the beginning of the heap. If something like CLUSTER or PITR has changed the physical layout of the heap, we can simply drop the files. Why would we need WAL for it? I don't see the point.

Also, I don't think it is expensive. If it is combined with a maintenance window so that vacuum stops once per day, writing out 256MB/2 = 128MB of data can hardly be called expensive. Of course, this feature isn't meant for autovacuum to use every minute; autovacuum can use it after it has adopted the maintenance window.
Re: [HACKERS] Resumable vacuum proposal and design overview
Tom Lane wrote:
> Huh? There is no extra cost in what I suggested; it'll perform exactly
> the same number of index scans that it would do anyway.

What I wanted to say is: if we can stop at any point, we can make maintenance memory large enough to contain all of the dead tuples, and then we only need to clean the indexes once. No matter how many times vacuum stops, the indexes are cleaned once. But in your proposal, the indexes will be scanned as many times as vacuum stops. Those extra index cleanings are the extra cost compared with the stop-on-dime approach. For vacuuming a large table with 8 stops, tests show the extra cost can be one third of the stop-on-dime approach.

> So I'm not really convinced that being able to stop a table vacuum
> halfway is critical.

To run vacuum on the same table for a long period, it is critical to ensure that it does:
1. not eat resources that foreground processes need;
2. not block the vacuuming of hot-updated tables;
3. not block any transaction, and not block any backup activities.

In the current implementation of concurrent vacuum, the third is obviously not satisfied. The first issue that comes to my mind is lazy_truncate_heap: it takes an AccessExclusiveLock for a long time, which is problematic. Unless we change those kinds of mechanisms to ensure there is no problem with running vacuum on the same table for several days, we cannot say we don't need to stop halfway.

Best Regards,
--
Galy Lee
[EMAIL PROTECTED]
NTT Open Source Software Center
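The cost argument can be put into a toy model: with a stop-on-dime approach the indexes are cleaned in a single pass regardless of how often vacuum stops, while cycle-boundary stopping pays one index pass per stop. The function and its unit-less costs are purely illustrative, not measurements.

```c
#include <assert.h>

/* Toy cost model: one heap cost plus some number of full index-clean
 * passes.  Stop-on-dime => index_passes == 1; stopping at cycle
 * boundaries N times => index_passes == N. */
static int total_vacuum_cost(int heap_cost, int index_pass_cost,
                             int index_passes)
{
    return heap_cost + index_passes * index_pass_cost;
}
```

With heap cost 100 and index-pass cost 10, one pass costs 110 while 8 stops cost 180; the gap is exactly the "extra index cleanings" the mail complains about, and it grows linearly with the number of stops.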
[HACKERS] Resumable vacuum proposal and design overview
Hi

We are developing a new feature for vacuum; here is a brief overview of it.

Introduction
---

A) What is it?

This feature makes vacuum resumable: vacuum remembers the point at which it stopped, then resumes the interrupted operation from that point next time. The SQL syntax for this feature looks like:

  # VACUUM ALL -- same functionality as the current vacuum;
  # VACUUM     -- uses the information saved by a previously interrupted
                  vacuum to continue processing.

B) Why do we need it?

A large table can be vacuumed with cost-based delay enabled, but the processing may then last for several days, which definitely has a negative effect on system performance. So on systems that have maintenance time, it is preferable to vacuum within the maintenance window. A vacuum task can be split into small subtasks, and these can be scheduled into maintenance window time slots; this reduces the impact of vacuum on system service. But currently a vacuum task cannot be split: if an interrupt or error occurs during vacuum processing, vacuum totally forgets what it has done and terminates itself, and a subsequent vacuum on the same table has to scan from the beginning of the heap. This proposal gives vacuum the capability to stop and resume.

C) How can we use it?

This feature enables autovacuum or a cron vacuum scheduler to implement more sophisticated vacuum scheduling schemes combined with a *maintenance window*. For example, if the system has a two-hour maintenance window every day, vacuum can be interrupted this way:

  SET statement_timeout TO 2*3600*1000; -- two hours
  VACUUM FREEZE tblname;
  SET statement_timeout TO DEFAULT;

Or it can be interrupted by SIGINT directly. Autovacuum or a manual vacuum scheduler can split a large vacuum task into small subtasks with this feature, and the subtasks can be scheduled according to system load.

Design Overview
---

A) Vacuum internals overview

Concurrent vacuum mainly takes the following steps to vacuum a table:

1. scan the heap to collect the dead tuple list
2. (if the table has indexes) scan and sweep the indexes
3. sweep the dead tuples collected in step 1
4. perform additional index cleanup operations
5. (if a certain amount of free space was found) truncate the table
6. register free space in the FSM
7. update the statistics of the table

If maintenance memory is not sufficient to contain all the dead tuples, steps 1-3 are repeated.

B) Where to stop

The first option is to accept a stop request at the boundary of each step and resume from the unfinished step. But some steps take a short time and some take a long time: for example, step 1 takes more than 40% of the total time, while steps 5 and 6 take less than 1%. This granularity is too coarse.

The second option is to accept a stop request before vacuum begins to process each block in steps 1-4. In the current vacuum implementation, vacuum_delay_point is also placed in those locations. This option has much finer granularity than option 1. This implementation accepts stop requests at the *block level* in steps 1-4.

C) How to stop and resume

- Stop: when vacuum stops in steps 1-4, it saves the dead tuple list, the heap block number at which it stopped, the unswept index relations, the unswept dead tuples, and the FreezeLimit into a disk file. The free space information collected in steps 1-3 can be registered in the FSM when vacuum is interrupted.

- Resume: when vacuum resumes, it reads the saved information, skips the finished operations, and continues with the remaining ones.

There are two additional issues that need to be discussed here:

*) FreezeLimit. There are two options for the FreezeLimit of a resumed vacuum: (a) the FreezeLimit of the interrupted vacuum, or (b) the FreezeLimit of the resuming vacuum. FreezeLimit-(b) is safe to use while scanning; but because not all heap blocks have been scanned under it, when FreezeLimit-(b) is used, relfrozenxid should still be updated with FreezeLimit-(a) at the end of vacuum, and the CLOG can only be truncated up to FreezeLimit-(a).
*) Resuming the index operation. There are two possible resume levels when vacuum is interrupted in step 2: (a) skip the *index relations* that have been swept completely; (b) skip the *index blocks* that have been swept. Level (a) is safe and simple to implement; level (b) needs to consider scenarios in which a leaf page is split, and further investigation is needed to clarify whether it is safe. This implementation adopts *level (a) resuming*.

Implementation Plan
---

We are working on the patch now; I will send the WIP patch to the list later. I am sorry for this late proposal, but I hope it can go into 8.3.

Your comments and ideas are welcome.

Best Regards
Galy Lee ([EMAIL PROTECTED])
NTT Open Source Software Center
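The resume path for the heap scan might be sketched as follows: the scan restarts from the block number saved when the previous vacuum stopped, and a saved point that no longer fits the relation (say, after a truncation) simply means there is nothing left to skip. ScanState and blocks_to_scan are illustrative names, not the patch's actual code.

```c
#include <assert.h>

typedef unsigned int BlockNumber;

/* Illustrative resumed-scan state. */
typedef struct ScanState {
    BlockNumber start_block;  /* 0 for a fresh vacuum; saved point otherwise */
    BlockNumber nblocks;      /* current relation size in blocks */
} ScanState;

/* How many blocks this (possibly resumed) scan will still visit. */
static BlockNumber blocks_to_scan(const ScanState *st)
{
    if (st->start_block >= st->nblocks)
        return 0;  /* saved point is past EOF: nothing left to do */
    return st->nblocks - st->start_block;
}
```

The saved-point-past-EOF case doubles as the safety fallback: if the physical layout changed underneath the saved state, the stale state is harmlessly ignored.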
Re: [HACKERS] Resumable vacuum proposal and design overview
Simon Riggs wrote:
> old dead tuple list. If the system manages the dead tuple list we may
> need to keep such files around for long periods, which doesn't sound
> great either.

The system manages such files. The files are kept in a location like $PGDATA/pg_vacuum. They are removed when CLUSTER, DROP TABLE, ALTER TABLE, VACUUM, etc., changes the physical layout of the heap. I think this is a reasonable way to do it.

> ISTM simpler to make the optional stop/restart point be after one full
> cycle of cleaning, so exactly at the point where we discard one tuple
> list and we move on the next.

Let me just summarize your ideas here:

Where to stop?
- Stop after one full cycle of cleaning has finished.

How to stop?
- When stopping in step 1, go straight to steps 2 and 3 to clean the dead tuples harvested so far.
- When stopping in step 2 or step 3, vacuum ignores the stop request and finishes all of the steps.

Merit:
- Since it does not require an external file to store the dead tuple list, it simplifies the implementation.

Demerit:
- The time between a stop request being issued and vacuum actually stopping is still quite unpredictable. (Consider that step 2 sometimes takes 2 or 3 hours, and step 3 may take as long as step 1.)
- It is difficult to confine vacuum to a maintenance window frame.

The point here is *how long* we can accept vacuum taking to stop. For example, consider one table:

- The table is hundreds of GBs.
- It takes 4-8 hours to vacuum such a large table.
- Enabling cost-based delay may make that last for 24 hours.
- It can only be vacuumed during the night, for 2-4 hours.

It is true that there is no strict requirement that vacuum be interrupted immediately, but it should stop in a *predictable way*. In the above example, if we have to wait for the end of one full cycle of cleaning, it may take up to 8 hours for vacuum to stop after it receives the stop request. That seems quite unacceptable.
Re: [HACKERS] autovacuum next steps
Gregory Stark wrote:
> If we could have autovacuum interrupt a vacuum in mid-sweep, perform a
> cycle of vacuums on smaller tables, then resume, that problem would go
> away. That sounds too difficult though, but perhaps we could do
> something nearly as good.

I think giving vacuum this interrupt-and-resume capability is quite useful for large tables. It provides more flexibility for autovacuum to create a good scheduling scheme. Sometimes it takes a whole day to vacuum a large table (a hundreds-of-GB table may qualify), and setting cost_delay makes it last even longer, possibly several days. If the system has maintenance time, the vacuum task for the large table can be split to fit into it with the interrupt-and-resume feature.

> One option that I've heard before is to have vacuum after a single
> iteration (ie, after it fills maintenance_work_mem and does the index
> cleanup and the second heap pass), remember where it was and pick up
> from that point next time.

Even a single iteration may take a long time, so it is not very useful to break only at iteration boundaries.

I think it is not so difficult to have vacuum remember where it left off and start from there next time. The following is the basic idea. A typical vacuum process mainly has the following phases:

Phase 1. scan heap
Phase 2. scan and sweep indexes
Phase 3. sweep heap
Phase 4. update FSM
Phase 5. truncate CLOG

Wherever vacuum is interrupted, we can save the collected information to disk and restore it when vacuum restarts. When the vacuum process is interrupted, we can remember the dead tuple list and the block number it has scanned up to in phase 1, the indexes it has cleaned up in phase 2, and the tuples it has swept in phase 3. Before exiting from vacuum, we can also merge the free space information into the FSM.

We are working on this feature now. I will propose it later to discuss with you.
Best Regards
Galy Lee
--
NTT Open Source Software Center
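The phase list above suggests a simple resume dispatch: save the current phase (plus per-phase progress) at interrupt time, and on restart skip everything before the saved phase. A toy sketch, with every name invented for illustration:

```c
#include <assert.h>

/* Phases follow the mail's list; PHASE_DONE is a sentinel. */
typedef enum {
    PHASE_SCAN_HEAP = 1,
    PHASE_SWEEP_INDEX,
    PHASE_SWEEP_HEAP,
    PHASE_UPDATE_FSM,
    PHASE_TRUNCATE_CLOG,
    PHASE_DONE
} VacPhase;

/* Run every phase from the saved one onward; here we only count them,
 * where real code would dispatch to each phase's worker routine.
 * Returns the number of phases executed. */
static int resume_vacuum(VacPhase saved_phase)
{
    int phases_run = 0;
    for (int p = saved_phase; p < PHASE_DONE; p++)
        phases_run++;
    return phases_run;
}
```

A fresh vacuum is just resume_vacuum(PHASE_SCAN_HEAP); a vacuum interrupted during the heap sweep restarts at PHASE_SWEEP_HEAP and skips the two phases already completed.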
Re: [HACKERS] [PERFORM] how to plan for vacuum?
Jim C. Nasby wrote:
> On Thu, Jan 25, 2007 at 12:52:02AM -0300, Alvaro Herrera wrote:
> > Jim C. Nasby wrote:
> > > I'll generally start with a cost delay of 20ms and adjust based on
> > > IO utilization.
> > I've been considering setting a default autovacuum cost delay to
> > 10ms; does this sound reasonable?

The problem here is that we cannot easily find a direct relation between

  cost delay -> CPU/IO utilization -> real performance
                (response time in peak hours)

It is very hard for any normal user to set this correctly. I think the trial-and-error approach is awful for the user: every DBA needs to be a vacuum expert to keep the system stable. Since vacuum is still a big threat to performance, a more intelligent way is needed. A lot of effort has gone into making vacuum a more lightweight operation, but I think we still need more effort on making it easy and safe to use.

So I previously proposed the vacuum-in-time feature: just tell vacuum how long it may run, and it will minimize the impact within that time span for you. Some argue that it should not assume a maintenance window, but the safest way is still to run within the maintenance window.
Re: [HACKERS] Deadline-Based Vacuum Delay
Tom Lane wrote:
> I think the context for this is that you have an agreed-on maintenance
> window, say extending from 2AM to 6AM local time, and you want to get
> all your vacuuming done in that window without undue spikes in the
> system load (because you do still have live users then, just not as
> many as during prime time). If there were a decent way to estimate the
> amount of work to be done then it'd be possible to spread the work
> fairly evenly across the window. What I do not see is where you get
> that estimate from --- especially since you probably have more than
> one table to vacuum in your window.

It is true that there is no decent way to estimate the amount of work to be done. But the purpose here is not to "spread the vacuum over exactly 6 hours"; it is to "finish the vacuum within 6 hours, and spread the spikes as much as possible". So a maximum estimate of the work is enough to confine the vacuum to the window; it is fine if vacuum runs faster than the schedule. Also, we don't need to estimate the duration of the vacuum; we only need to compare the actual progress through the time window with the progress of the work, and adjust them to the same pace at each delay point.

The maximum amount of vacuum work can be estimated from the size of the heap, the size of the indexes, and the number of dead tuples. For example, lazy vacuum performs the following work:

1. scan heap
2. vacuum indexes
3. vacuum heap
4. truncate heap

Although 2 and 4 are quite unpredictable, the total amount of work over 1-4 can still be estimated.

> The other problem is that vacuum only during a maintenance window
> doesn't seem all that compelling a policy anyway. We see a lot of
> examples of tables that need to be vacuumed much more often than once
> a day. So I'd rather put effort into making sure that vacuum can be
> run in the background even under high load, instead of designing
> around a maintenance-window assumption.

This feature does not necessarily assume a maintenance window.
For example, if a table needs to be vacuumed every 3 hours to sweep out the garbage, then instead of painstakingly tuning the cost delay GUCs to fit vacuum into 3 hours, we can make vacuum finish within the time frame with the "VACUUM IN time" feature. If we could find a good way to tune the cost delay GUCs so that vacuum keeps up with the speed of garbage generation in a high-frequency-update system, then we wouldn't need this feature. For example, the interval between two vacuums can be estimated by tracking the speed of dead tuple generation, but how can you tune the vacuum duration to fit into that interval? It does not seem easy to tune the vacuum delay time correctly.

Best Regards
--
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center
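The pacing idea above (compare the progress of the time window with the progress of the work at each delay point, and keep the two in step) can be sketched as a small adjustment function. The adjustment rule, step size, and all names here are assumptions for illustration, not an actual patch.

```c
#include <assert.h>

/* At each delay point, compare the fraction of blocks processed with
 * the fraction of the window elapsed, and lengthen or shorten the nap
 * so the two stay in step.  Fractions are in parts-per-thousand to
 * keep everything in integer arithmetic. */
static int next_delay_ms(long blocks_done, long blocks_total,
                         long elapsed_ms, long window_ms, int cur_delay_ms)
{
    long work_frac = blocks_done * 1000 / blocks_total;
    long time_frac = elapsed_ms * 1000 / window_ms;

    if (work_frac > time_frac)
        return cur_delay_ms + 10;  /* ahead of schedule: slow down */
    if (work_frac < time_frac && cur_delay_ms >= 10)
        return cur_delay_ms - 10;  /* behind schedule: speed up */
    return cur_delay_ms;
}
```

Because the delay only ever shrinks when vacuum is behind, this matches the stated goal: vacuum is allowed to finish early, but pressure is smoothed out over the window rather than released in one spike.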
[HACKERS] Deadline-Based Vacuum Delay
Hello,

I'd like to propose a new feature, Deadline-Based Vacuum Delay; the syntax is something like VACUUM IN 6 HOURS.

Vacuum is a non-trivial task. The database needs to be vacuumed before system performance suffers from the garbage, but we also need to ensure the system isn't hammered by the vacuum operation itself. Cost-based vacuum delay can reduce the impact of the disk I/O storm, but the vacuum can then last for several hours and it is hard to know when it will end, so there are many complaints about the unpredictable execution time of vacuum. On the other hand, several users want to run vacuum in their maintenance window. Also, autovacuum kicks off a vacuum when a certain amount of garbage has been generated, which constrains the interval between vacuums of a relation.

So I am thinking of another way to perform vacuum. For example, vacuum can be confined to a maintenance time frame, as in VACUUM IN 6 HOURS, and the vacuum operation will then be performed within that window. The delay time is adjusted internally to spread the disk I/O over the time frame. This makes vacuum more predictable, and also ensures that vacuum has no side effects on the production system outside the maintenance window.

Any ideas or comments?

Best Regards,
--
Galy Lee
NTT Open Source Software Center