[HACKERS] Changing semantics of autovacuum_cost_limit

2007-03-26 Thread Galy Lee
Hi all,

Now that autovacuum has multiple workers, the semantics of
autovacuum_cost_limit also need to be redefined.

Currently, autovacuum_cost_limit is the accumulated cost that causes a
single vacuuming worker process to sleep; it restricts the I/O
consumption of one vacuum worker. When there are N workers, the total
I/O consumed by autovacuum workers can increase by a factor of N, so
these semantics produce unpredictable I/O consumption when multiple
autovacuum workers are running.

One simple idea is to set the cost limit of every worker to
autovacuum_cost_limit / max_autovacuum_workers. But in scenarios with
fewer active workers, this is obviously unfair to the active workers. A
better way is to set the cost limit of every active worker to
autovacuum_cost_limit / autovacuum_active_workers. This keeps the total
I/O consumption of autovacuum stable.

Each worker can be extended to keep its own cost_limit in shared
memory. When a worker is brought up, or a worker has finished its work,
the launcher recalculates:

   worker_cost_limit = (autovacuum_cost_limit / autovacuum_active_workers)

and sets the new value for each active worker.

The above approach requires that the launcher can change the cost
settings of workers on the fly. This can be achieved by having VACUUM
re-read the cost settings in its worker's shared memory at every
vacuum_delay_point.
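
As a rough sketch of the launcher-side rebalancing described above: the
structure and field names below (WorkerInfo, wi_active, wi_cost_limit)
are assumptions for illustration, not the actual autovacuum data
structures, and locking around the shared array is omitted.

  /* Illustrative only: not the real autovacuum shared-memory layout. */
  typedef struct WorkerInfo
  {
      int     wi_active;      /* nonzero if this slot runs a vacuum */
      int     wi_cost_limit;  /* re-read by the worker at every
                               * vacuum_delay_point */
  } WorkerInfo;

  static void
  rebalance_cost_limit(WorkerInfo *workers, int max_workers,
                       int autovacuum_cost_limit)
  {
      int     i;
      int     active = 0;

      for (i = 0; i < max_workers; i++)
          if (workers[i].wi_active)
              active++;

      if (active == 0)
          return;

      /* Divide the global I/O budget among the active workers. */
      for (i = 0; i < max_workers; i++)
          if (workers[i].wi_active)
              workers[i].wi_cost_limit = autovacuum_cost_limit / active;
  }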

Any comments or suggestions?

Best Regards

Galy Lee
[EMAIL PROTECTED]
NTT Open Source Software Center



Re: [HACKERS] autovacuum next steps, take 3

2007-03-12 Thread Galy Lee

Alvaro Herrera wrote:
worker to-do list
-
It removes from its to-do list the tables being processed.  Finally, it
writes the list to disk.

I am worried about the worker to-do list in your proposal. I don't
think a worker is the right place to maintain a vacuum task list;
instead, it is better to maintain a unified vacuum task queue in
autovacuum shared memory.

Here are the basic ideas:

* Why is such a task queue needed?

- The launcher can schedule all vacuum tasks through such a queue. It
provides a facility for scheduling tasks smartly in further autovacuum
improvements.

- Such a task list can also be viewed easily from a system view. This
can be implemented easily in 8.3 on top of the task queue.

* VACUUM task queue

The cluster's VACUUM tasks are maintained in a unified, cluster-wide
queue in autovacuum shared memory:

  global shared TaskInfo tasks[];

It can be viewed as:

SELECT * FROM pg_autovacuum_tasks;
  dbid | relid | group | worker
 ------+-------+-------+--------
     2 | 20001 |     0 |   1001
     2 | 20002 |     0 |
     3 | 30001 |     0 |   1002

VACUUM tasks belonging to the same database might be divided into
several groups, with one worker assigned to process one specific task
group.
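
As a minimal sketch, one entry of the shared queue might look like the
following, matching the columns of the view above; the struct layout
and field names are assumptions for illustration only.

  typedef unsigned int Oid;   /* object identifier, as in PostgreSQL */

  /* Illustrative only: one entry in the shared VACUUM task queue. */
  typedef struct TaskInfo
  {
      Oid     dbid;       /* database containing the table */
      Oid     relid;      /* table to be vacuumed */
      int     group;      /* task group within the database */
      int     worker;     /* PID of the assigned worker, or 0 if none */
  } TaskInfo;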

The task queue might be filled by a dedicated task-gathering worker, or
it might be filled by an *external task gatherer*.

The latter allows an external program to implement a more sophisticated
vacuum scheme. Based on previous discussion, it appears to be difficult
to implement an all-purpose algorithm that satisfies the requirements
of all applications, so it is better to let users develop their own
vacuum strategies. A *user-defined external program* might fill the
task queue and schedule tasks by its own strategy, with the launcher
responsible only for coordinating workers. This
pluggable-vacuum-strategy approach seems like a good solution.

* status of workers

It would also be convenient to let users monitor the status of vacuum
workers through a system view. A snapshot of the workers could be
viewed as:

SELECT * FROM pg_autovacuum_workers;
  pid  | dbid | relid | group
 ------+------+-------+-------
  1001 |    2 | 20001 |     0
  1002 |    3 | 30001 |     0


Best Regards
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center




Re: [HACKERS] autovacuum next steps, take 3

2007-03-12 Thread Galy Lee
Hi, Alvaro

Alvaro Herrera wrote:
 keep such a task list in shared memory, because we aren't able to grow
 that memory after postmaster start. 

We can use fixed-size shared memory to maintain such a queue. The
maximum number of tasks is the number of tables in the cluster, so the
queue can be given the same size as max_fsm_relations, which is usually
larger than the number of tables and indexes in the cluster. That is
sufficient to contain most of the vacuum tasks.

Even if the queue overflows, the task gatherer scans the whole cluster
every autovacuum_naptime, so it will quickly pick up the dropped tasks
again. We don't need to write anything to an external file. So there is
no problem with using fixed-size shared memory to maintain a global
queue.
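
A minimal sketch of such a fixed-size queue with the drop-and-regather
overflow behavior; the size, the names, and the reduced TaskInfo are
assumptions for illustration only.

  #define AV_QUEUE_SIZE 1000      /* e.g. sized like max_fsm_relations */

  typedef struct TaskInfo
  {
      unsigned int dbid;
      unsigned int relid;
  } TaskInfo;

  typedef struct TaskQueue
  {
      int      ntasks;
      TaskInfo tasks[AV_QUEUE_SIZE];
  } TaskQueue;

  /* On overflow the task is simply dropped; the task gatherer will
   * rediscover it on its next naptime scan of the cluster. */
  static int
  queue_push(TaskQueue *q, TaskInfo task)
  {
      if (q->ntasks >= AV_QUEUE_SIZE)
          return 0;               /* full: drop now, re-gather later */
      q->tasks[q->ntasks++] = task;
      return 1;
  }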

 Other idea I had was having a third kind of autovacuum process, namely a
 schedule builder

If we have such a global queue, the task-gathering worker can connect
to every database each naptime to gather tasks in time.

The task-gathering worker won't build the schedule; the launcher or an
external program is responsible for that activity. How to dispatch
tasks to workers is just a scheduling problem: a good dispatching
algorithm needs to ensure that each worker can finish its tasks on
time, which might resolve the headache of the hot-table problem. But
this is a further issue, to be discussed after 8.3.

Best Regards

Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center




Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Galy Lee

Alvaro Herrera wrote:
 I don't have anything else as detailed as a plan.  If you have
 suggestions, I'm all ears.
Cool, thanks for the update. :) We also have some new ideas for
improving autovacuum now; I will bring them up later.

 Now regarding your restartable vacuum work.  
 Does this make sense?
I also have reached a similar conclusion now.  Thank you.

Regards
Galy



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-08 Thread Galy Lee

Tom Lane wrote:
 Er, why not just finish out the scan at the reduced I/O rate?  Any sort

Sometimes you may need to vacuum a large table in the maintenance
window and a hot table during service time. If the vacuum of the hot
table does not eat too much foreground resource, you can vacuum the
large table at a lower I/O rate outside the maintenance window; but if
the vacuum of the hot table is overeating system resources, the
launcher had better stop the long-running vacuum outside the
maintenance window.

But I am not insisting on the stop-start feature at this moment.
Changing the cost delay dynamically sounds more reasonable. We can use
it to balance the total I/O of the workers during service time or
maintenance time. It is not so difficult to achieve this by leveraging
autovacuum's shared memory.
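
A minimal sketch of the worker side under this scheme: at every delay
point the worker re-reads the settings published in its shared-memory
slot, so a change made by the launcher takes effect on the fly.
WorkerSlot and its fields are assumed names; pg_usleep is PostgreSQL's
microsecond sleep.

  typedef struct WorkerSlot
  {
      int     cost_limit;     /* may be rewritten by the launcher */
      int     cost_delay;     /* sleep time in milliseconds */
  } WorkerSlot;

  extern void pg_usleep(long microsec);   /* provided by PostgreSQL */

  static void
  vacuum_delay_point_sketch(volatile WorkerSlot *slot, int *cost_balance)
  {
      /* Re-read the limit: a rebalance may just have happened. */
      if (*cost_balance >= slot->cost_limit)
      {
          pg_usleep((long) slot->cost_delay * 1000L);
          *cost_balance = 0;
      }
  }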

Best Regards
Galy Lee



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Galy Lee
Alvaro,

Alvaro Herrera wrote:
 I still haven't received the magic bullet to solve the hot table
 problem, but these at least means we continue doing *something*.

May I ask what your plan or ideas are for autovacuum improvements in
8.3 now? And also, what is the roadmap of autovacuum improvements for
8.4?

Thanks,

Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center





[HACKERS] Restartable VACUUM design overview version 2

2007-03-05 Thread Galy Lee
Hi

Thanks for all the feedback and good ideas on restartable vacuum. Here
is a new design overview of it based on the previous discussions.

There are several ideas for addressing the problem of a long-running
VACUUM in a defined maintenance window. One idea is: when maintenance
time is running out, we can do the following things:
- send a smart stop request to vacuum, so that it stops at a suitable
  point (this might take a long time);
- change the cost delay settings of vacuum on the fly to make it less
  aggressive.

The following sections discuss each of these.

Restartable vacuum design overview
--

* Where to stop:

  There are two approaches to stopping a vacuum:

  (1) Tell VACUUM where to stop when it starts.
   VACUUM can be told to stop at a suitable point when it starts (by
   SQL syntax like: VACUUM SOME). The optional stop point is after one
   full fill-workmem / clean-index / clean-dead-tuple cycle: VACUUM
   stops when it has finished such a cycle.

  (2) Interrupt VACUUM while it is running.
   Another approach is to interrupt the running VACUUM. VACUUM checks
   for a smart stop request at the normal vacuum delay points; if such
   a request is detected, a flag is set telling VACUUM to stop at a
   suitable point, namely the end of one full fill-workmem /
   clean-index / clean-dead-tuple cycle.

   But I cannot figure out a simple way to ask a running VACUUM to
   stop in (2), because the backend's signals are all used up. (1) is
   simple to implement, since it doesn't require communication with a
   running VACUUM.

* How to stop

   When VACUUM is stopping,
   - it saves the block number it has reached to pg_class;
   - it also flushes the free space information it has collected to the
     FSM. (This might be posted as a separate patch.)

* How to restart:

  When VACUUM restarts, it reads the stored block number from pg_class
  and resumes the interrupted scan from there.
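
A minimal sketch of how the stop could sit in the main scan loop,
stopping only at the end of a full cycle and saving the block number
for the next run; every function name here is a placeholder for
illustration, not actual PostgreSQL code.

  extern int  scan_heap_page(unsigned int blkno);     /* collect dead tuples */
  extern int  dead_tuple_space_full(void);
  extern void clean_indexes(void);                    /* clean-index step */
  extern void sweep_dead_tuples(void);                /* clean-dead-tuple step */
  extern void save_resume_block(unsigned int blkno);  /* store into pg_class */
  extern volatile int stop_requested;                 /* set by a stop request */

  void
  vacuum_scan_sketch(unsigned int resume_block, unsigned int nblocks)
  {
      unsigned int blkno;

      for (blkno = resume_block; blkno < nblocks; blkno++)
      {
          scan_heap_page(blkno);

          if (dead_tuple_space_full())
          {
              clean_indexes();            /* finish the cycle first... */
              sweep_dead_tuples();

              if (stop_requested)         /* ...then honor the request */
              {
                  save_resume_block(blkno + 1);
                  return;
              }
          }
      }
  }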


Change VACUUM cost delay settings on-the-fly feature
--

When the end of the maintenance window comes, we might notify VACUUM to
switch to a set of less aggressive cost delay settings.

I don't have a clear idea of how to implement this feature yet. Maybe
we need a message-passing mechanism between backends to exchange the
cost delay settings, like the patch here:
   http://archives.postgresql.org/pgsql-patches/2006-04/msg00047.php

Another simple way to achieve this is to store the settings for
different maintenance windows in a system catalog. There have been some
previous discussions about the implementation of maintenance windows,
but they have not been taken further. So it seems better to implement
this feature after maintenance windows themselves are implemented.

Implementation plan
---

Changing VACUUM cost delay settings on the fly requires an internal
message-passing mechanism or the implementation of maintenance windows,
so maybe this is not a good time to rush at it. But I hope the
*restartable VACUUM feature* can be accepted for 8.3.

I look forward to your comments and suggestions.

Best Regards

Galy Lee lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center




Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Galy Lee


Simon Riggs wrote:
 You haven't explained how saving the dead-tuple-list could be done
 in a safe mannner and it seems risky to me.

The files are placed in a new directory, $PGDATA/pg_vacuum, with the
name spcNode.dbNode.relNode, one file for each relation whose vacuum
has been interrupted.
Each file has a format like:

1. VacStateFileHeader
2. VacStateData
3. Dead Tuple list
4. CRC32
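
As a minimal sketch, those four parts might map onto C structures like
the ones below, drawing on the state listed in the design overview
(stop block, unswept indexes, FreezeLimit); the exact fields are
assumptions for illustration, not an actual on-disk format.

  typedef unsigned int TransactionId;
  typedef unsigned int BlockNumber;

  typedef struct VacStateFileHeader
  {
      unsigned int magic;                 /* identifies a vacuum state file */
      unsigned int version;
  } VacStateFileHeader;

  typedef struct VacStateData
  {
      BlockNumber   last_scanned_block;   /* where the heap scan stopped */
      int           unswept_index_count;  /* indexes still to be cleaned */
      TransactionId freeze_limit;         /* FreezeLimit of the stopped run */
      long          num_dead_tuples;      /* dead tuple entries that follow */
  } VacStateData;

  /* The file body then carries num_dead_tuples dead tuple pointers (the
   * dead tuple list), followed by a CRC32 over the whole file. */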

The files are removed
 - when the original physical heap files are removed,
 - when VACUUM FULL has been issued,
 - or after the content has been read into memory,
 - etc.

Is there any potential big risk there? Correct me if I am
wrong.

 Deferring completion of VACUUM means deferring refreshing the FSM.

I borrowed the code from the DSM patch to merge the free space
information into the FSM when vacuum stops.

 Are you saying you know for certain this lock is held for a long time,
 or are you just saying you think it is? If you have some evidence for
 long truncation times then that would be a separate issue of concern,
 since that might starve out normal users. Please say more?

Sorry, I *thought* it was. The benchmarks have not shown that kind of
problem anyway. Thanks for the clarification. :)

Regards,
-- 
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center



Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-28 Thread Galy Lee

Simon Riggs wrote:
 Galy, please hear that people like your idea and understand your use
 case, but just don't like all of the proposal, just the main thrust of
 it. The usual way is that 
 (people that agree + amount of your exact idea remaining) = 100%

Thank you, I am glad to hear that. :) But I still cannot kill my idea
yet.

Let's come to the core issue we care about: do we need the
stop-on-a-dime feature, stopping vacuum immediately? As in my previous
opinion: if a long-running vacuum causes problems, then yes, we *do*
need to stop vacuum immediately.

I am still not convinced that we don't have such problems. The
potential problem with running a long vacuum is that it may block some
foreground transactions, like ALTER TABLE; if it is true that a
long-running vacuum can block DDL for a long time, that is a big
problem. I think we need to stop vacuum immediately to handle that kind
of problem.

I admit that the implementation is quite complex, but I cannot see any
big problems with saving the dead tuples out and reading them in again
(as two-phase commit does). Why do we need to hold the lock and the
transaction? We can release the lock and abandon the transaction ID;
vacuum can take the lock and get a new ID when restarting. Why do we
need to worry about whether the dead tuples are still there? Only
vacuum will sweep them, and HOT cannot touch those tuples until we have
finished sweeping.

But it is true that in some scenarios it is better to stop after the
current cycle has finished, so that the effect of the vacuum appears
quickly. So I hope the final design can combine the two.




Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-27 Thread Galy Lee
Tom Lane wrote:
One problem with it is that a too-small target would result in vacuum
proceeding to scan indexes after having accumulated only a few dead
tuples, resulting in increases (potentially enormous ones) in the total
work needed to vacuum the table completely.

Yeah. This is also my big concern about your and Simon's idea. Every
vacuum stop causes an index scan, which means the total time of a
vacuum is proportional to the number of times it has stopped.

 I think it's sufficient to have two cases: abort now, and restart from
 the last cycle-completion point next time (this would basically just be

If there is only one cycle, then there is a problem with this approach.
(If maintenance work memory is not too small, this situation is
normal.)

 or set a flag to stop at the next cycle-completion point.

The extra cost of cleaning the indexes may prevent this approach from
working in practice.

Perhaps a more useful answer to the problem of using a
defined maintenance window is to allow VACUUM to respond to changes in
the vacuum cost delay settings on-the-fly.

This is a good idea! Itagaki also talked about exactly the same idea
with me yesterday.

But if we change the parameters on the fly to make vacuum less
aggressive, my concern is: are there any potential problems with
running a vacuum for several days? Although I have no plan to touch
VACUUM FULL, it seems that concurrent VACUUM also holds an exclusive
lock while truncating the table. I am a little worried about that kind
of problem with this approach.

Also, maybe we need some shared memory area to share the cost-delay
parameters between VACUUMs; or are there any other ideas?




Re: [HACKERS] autovacuum next steps, take 2

2007-02-27 Thread Galy Lee

Tom Lane wrote:
 Saving the array is
 expensive both in runtime and code complexity, and I don't believe we
 can trust it later --- at least not without even more expensive-and-
 complex measures, such as WAL-logging every such save :-(
I don't understand well what you are worrying about. If we find that we
cannot trust the saved file, or that the file is corrupted, we can drop
it and scan from the beginning of the heap. If something like CLUSTER
or PITR has changed the physical layout of the heap, we can simply drop
the file. Why do we need WAL for it? I don't see any point in that.

Also, I don't think it is expensive. If it is combined with a
maintenance window, stopping once in a whole day, writing out
256MB/2 = 128MB cannot be called expensive. Of course, this feature
isn't meant for autovacuum to use every minute; autovacuum can use it
after it has adopted maintenance windows.



Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-27 Thread Galy Lee


Tom Lane wrote:
 Huh?  There is no extra cost in what I suggested; it'll perform
 exactly the same number of index scans that it would do anyway.

What I wanted to say is this: if we can stop at any point, we can make
maintenance memory large enough to contain all of the dead tuples, and
then we only need to clean the indexes once. No matter how many times
the vacuum stops, the indexes are cleaned once.

But in your proposal, the indexes will be scanned as many times as the
vacuum stops. Those extra index cleanings are the extra cost compared
with the stop-on-a-dime approach. For vacuuming a large table with 8
stops, tests show the extra cost can be one third of the stop-on-a-dime
approach.


So I'm not really convinced that being able to stop a table
 vacuum halfway is critical.
To run vacuum on the same table over a long period, it is critical to
be sure that it:
1. does not eat resources that foreground processes need;
2. does not block vacuuming of hot-updated tables;
3. does not block any transaction or any backup activity.

In the current implementation of concurrent vacuum, the third is
obviously not satisfied. The first issue that comes to mind is
lazy_truncate_heap, which takes AccessExclusiveLock for a long time;
that is problematic. Unless we change such mechanisms to ensure that
there is no problem with running vacuum on the same table for several
days, we cannot say that we don't need to stop halfway.


Best Regards,
-- 
Galy Lee [EMAIL PROTECTED]
NTT Open Source Software Center



[HACKERS] Resumable vacuum proposal and design overview

2007-02-26 Thread Galy Lee

Hi

We are developing a new feature for vacuum; here is a brief overview of
it.

Introduction
------------

A) What is it?

This feature gives vacuum a resumable capability: vacuum can remember
the point at which it stops, and then resume the interrupted vacuum
operation from that point next time.

The SQL syntax for this feature looks something like:

  # VACUUM ALL  -- has the same functionality as the current vacuum;
  # VACUUM      -- uses the information saved by a previously
                   interrupted vacuum to continue vacuum processing.

B) Why do we need it?

A large table can be vacuumed with cost-based delay enabled, but the
processing may then last for several days, which definitely has a
negative effect on system performance. So on systems that have
maintenance time, it is preferable to vacuum within the maintenance
window. A vacuum task can be split into small subtasks, and these can
be scheduled into maintenance window time slots. This can reduce the
impact of vacuum on system service.

But currently a vacuum task cannot be split: if an interrupt or error
occurs during vacuum processing, vacuum totally forgets what it has
done and terminates itself. A subsequent vacuum on the same table has
to scan from the beginning of the heap. This proposal gives vacuum the
capability to stop and resume.

C) How can we use it?

This feature enables autovacuum or a cron vacuum scheduler to build
more sophisticated vacuum scheduling schemes combined with a
*maintenance window*.

For example, if the system has a two-hour maintenance window every day,
vacuum can be interrupted this way (statement_timeout is given in
milliseconds):

  SET statement_timeout TO 7200000;  -- two hours
  VACUUM FREEZE tablename;
  SET statement_timeout TO DEFAULT;

Or it can be interrupted by SIGINT directly.

Autovacuum or a manual vacuum scheduler can use this feature to split a
large vacuum task into small subtasks, which can be scheduled according
to system load.

Design Overview
---

A) Vacuum internal overview

Concurrent vacuum mainly has the following steps to vacuum a table:

  1. scan heap to collect dead tuple list
  2. (if the table has indexes) scan and sweep indexes
  3. sweep dead tuples collected in step 1
  4. perform additional index cleanup operation
  5. (if a certain amount of free space is found) truncate table
  6. register free space to FSM
  7. update statistics of the table

If maintenance memory is not sufficient to contain all of the dead
tuples, steps 1-3 are repeated, as in the sketch below.
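
A minimal sketch of that control flow, with placeholder function names
standing in for steps 1-3; this is illustrative, not the actual lazy
vacuum code.

  extern int  scan_heap_chunk(void);      /* step 1: returns 0 at end of heap */
  extern void sweep_indexes(void);        /* step 2 */
  extern void sweep_dead_tuples(void);    /* step 3 */

  void
  vacuum_table_sketch(void)
  {
      int more;

      /* Steps 1-3 repeat whenever maintenance memory fills up. */
      do
      {
          more = scan_heap_chunk();       /* scan until memory is full */
          sweep_indexes();
          sweep_dead_tuples();
      } while (more);

      /* Steps 4-7: index cleanup, truncation, FSM update, statistics. */
  }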

C) Where to stop

The first option is to accept a stop request at the boundary of each
step, resuming from the unfinished step. But some steps take a short
time and some take a long time: for example, step 1 takes more than 40%
of the total time, while steps 5 and 6 take less than 1%. This
granularity is too coarse.

The second option is to accept a stop request before vacuum begins to
process each block in steps 1-4. In the current vacuum implementation,
vacuum_delay_point is placed at just such locations. This option has
much better granularity than option 1.

This implementation accepts stop requests at the *block level* in
steps 1-4.

D) How to stop and resume

- stop:

   When vacuum stops in steps 1-4, it performs the following: it saves
   the dead tuple list, the heap block number at which it stopped, the
   unswept index relations, the unswept dead tuples, and the
   FreezeLimit into a disk file.

   The free space information collected in steps 1-3 can be registered
   in the FSM when vacuum is interrupted.

- resume:

   When vacuum is resuming, it reads out the saved information, skips
   the finished operations, and then continues with the remaining
   operations.

   There are two additional issues that need to be discussed here:

*) FreezeLimit.
   There are two options for choosing the FreezeLimit of a resuming
   vacuum: (a) the FreezeLimit of the interrupted vacuum, or (b) the
   FreezeLimit of the resuming vacuum. FreezeLimit (b) is safe, but
   because the heap blocks are not fully scanned, when FreezeLimit (b)
   is used the relfrozenxid should still be updated with
   FreezeLimit (a) at the end of the vacuum, and CLOG can only be
   truncated according to FreezeLimit (a).

*) Resuming index operations.
   There are two possible resumption levels when vacuum is interrupted
   in step 2: (a) skip the *index relations* that have been swept
   completely, or (b) skip the *index blocks* that have been swept.
   Level (a) is safe and simple to implement; level (b) needs to
   consider scenarios in which a leaf page is split, and further
   investigation is needed to clarify whether it is safe.

   This implementation adopts *level (a) resuming*.

3. Implementation Plan
--

We are working on the patch now; I will send a WIP patch to the list
later. I am sorry for this late proposal, but I hope it can go into
8.3.

Your comments and ideas are welcome.

Best Regards
Galy Lee ([EMAIL PROTECTED])
NTT Open Source Software Center


Re: [HACKERS] Resumable vacuum proposal and design overview

2007-02-26 Thread Galy Lee
Simon Riggs wrote:
old dead tuple list. If the system manages the dead tuple list we may
need to keep such files around for long periods, which doesn't sound
great either.

The system manages such files. The files are kept in a location like
$PGDATA/pg_vacuum. They are removed when CLUSTER, DROP TABLE, ALTER
TABLE, VACUUM, etc., change the physical layout of the heap. I think
this is a reasonable way to do it.

ISTM simpler to make the optional stop/restart point be after one full
cycle of cleaning, so exactly at the point where we discard one tuple
list and we move on the next.

Let me just summarize your idea here:

  Where to stop?
   - stop after one full cycle of cleaning has finished

How to stop?
   - when stopping during step 1, go straight to step 2 and step 3
     to clean the dead tuples harvested so far;
   - when stopping during step 2 or step 3, vacuum ignores the
     stop request and finishes all of the steps.

Merit:
  - Since it does not require an external file to store the dead
    tuple list, it simplifies the implementation.

Demerit:
  - The time between the stop request being issued and vacuum actually
    stopping is still quite unpredictable. (Consider that step 2
    sometimes takes 2 or 3 hours, and step 3 may take the same time as
    step 1.)
  - It is difficult to confine vacuum within a maintenance window
    frame.

The point here is *how long* we can accept for vacuum to take to stop.

For example, consider one table:
   - the table is hundreds of GB;
   - it takes 4-8 hours to vacuum such a large table;
   - enabling cost-based delay may make the vacuum last for 24 hours;
   - it can be vacuumed during night time for 2-4 hours.

It is true that there is no strict requirement that vacuum be
interrupted immediately, but it should stop in a *predictable way*. In
the above example, if we have to wait for the end of one full cycle of
cleaning, it may take up to 8 hours for vacuum to stop after it has
received the stop request. That seems quite unacceptable.






Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Galy Lee


Gregory Stark wrote:

If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult though, but perhaps we could do something nearly as good.


I think giving vacuum this interrupt-and-resume capability would be
quite useful for large tables.


It can provide more flexibility for autovacuum to create a good
scheduling scheme. Sometimes it takes a whole day to vacuum a large
table (a table of hundreds of GB may qualify); setting cost_delay can
make it last even for several days. If the system has maintenance time,
the vacuum task for the large table can be split to fit into the
maintenance time using the interrupt-and-resume feature.



One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.



Even a single iteration may take a long time, so a break only at
iteration boundaries is not very useful. I think it is not so difficult
to get vacuum to remember where it left off and to start from that
point next time. The following is the basic idea.


A typical vacuum process mainly has the following phases:
 Phase 1. scan heap
 Phase 2. scan and sweep index
 Phase 3. sweep heap
 Phase 4. update FSM
 Phase 5. truncate CLOG

Wherever vacuum is interrupted, we can save the collected information
to disk and restore it later when vacuum restarts. When the vacuum
process is interrupted, we can remember the dead tuple list and the
block number it has scanned in phase 1, the indexes it has cleaned up
in phase 2, and the tuples it has swept in phase 3. Before exiting from
vacuum, we can also merge the free space information into the FSM.


We are working on this feature now. I will propose it later to discuss
it with you.


Best Regards
Galy Lee
--
NTT Open Source Software Center



Re: [HACKERS] [PERFORM] how to plan for vacuum?

2007-01-25 Thread Galy Lee



Jim C. Nasby wrote:
 On Thu, Jan 25, 2007 at 12:52:02AM -0300, Alvaro Herrera wrote:
  Jim C. Nasby wrote:
   I'll generally start with a cost delay of 20ms and adjust based on IO
   utilization.
  I've been considering set a default autovacuum cost delay to 10ms; does
  this sound reasonable?


The problem here is that we cannot easily find a direct relation
between

  cost delay -> CPU/IO utilization -> real performance (response time
  in peak hours)


It is very hard for any normal user to set this correctly. I think the
experience / trial-and-error approach is awful for the user; every DBA
needs to be a vacuum expert to keep the system stable. Since vacuum is
still a big threat to performance, a more intelligent way is needed.


A lot of effort has gone into making vacuum a more lightweight
operation, but I think we still need more effort on making it easy and
safe to use.


So I have previously proposed the vacuum-in-time feature: just tell
vacuum how long it can run, and it will minimize the impact within that
time span for you. Some argue that it should not make the
maintenance-window assumption, but the safest way is to run in the
maintenance window.








Re: [HACKERS] Deadline-Based Vacuum Delay

2007-01-05 Thread Galy Lee

Tom Lane wrote:
 I think the context for this is that you have an agreed-on maintenance
 window, say extending from 2AM to 6AM local time, and you want to get
 all your vacuuming done in that window without undue spikes in the
 system load (because you do still have live users then, just not as many
 as during prime time).  If there were a decent way to estimate the
 amount of work to be done then it'd be possible to spread the work
 fairly evenly across the window.  What I do not see is where you get
 that estimate from --- especially since you probably have more than one
 table to vacuum in your window.

It is true that there is not a decent way to estimate the amount of
work to be done. But the purpose here is not to "spread the vacuum over
6 hours exactly"; it is to "finish the vacuum within 6 hours, and
spread the spikes as much as possible". So a maximum estimate of the
work is enough to confine the vacuum within the window, and it is fine
if the vacuum runs more quickly than scheduled. Also, we don't need to
estimate the duration of the vacuum; we only need to compare the actual
progress through the time window with the progress of the work, and
adjust them to the same pace at each delay point.


The maximum work of a vacuum can be estimated from the size of the
heap, the size of the indexes, and the number of dead tuples. For
example, lazy vacuum does the following work:

 1. scan heap
 2. vacuum indexes
 3. vacuum heap
 4. truncate heap

Although 2 and 4 are quite unpredictable, the total amount of work
including 1, 2, 3, and 4 can be bounded, as the sketch below assumes.
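
A minimal sketch of that pacing comparison at a delay point, assuming
the maximum amount of work has been estimated up front; the names and
the adjustment rule are assumptions for illustration.

  extern void pg_usleep(long microsec);   /* provided by PostgreSQL */

  /* Sleep just long enough that the fraction of work done never runs
   * ahead of the fraction of the window that has elapsed. */
  static void
  deadline_delay_point(double work_done, double work_total_max,
                       double elapsed_secs, double window_secs)
  {
      double  work_frac = work_done / work_total_max;
      double  time_frac = elapsed_secs / window_secs;

      if (work_frac > time_frac)
      {
          double  lag_secs = (work_frac - time_frac) * window_secs;

          pg_usleep((long) (lag_secs * 1000000.0));
      }
      /* If behind schedule, do not sleep: vacuum runs at full speed. */
  }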


 The other problem is that vacuum only during a maintenance window
 doesn't seem all that compelling a policy anyway.  We see a lot of
 examples of tables that need to be vacuumed much more often than once
 a day.  So I'd rather put effort into making sure that vacuum can be run
 in the background even under high load, instead of designing around a
 maintenance-window assumption.

This feature does not necessarily assume a maintenance window. For
example, if a table needs to be vacuumed every 3 hours to sweep out the
garbage, then instead of laboriously tuning the cost delay GUCs to
confine the vacuum to 3 hours, we can make the vacuum finish within the
time frame with the "VACUUM IN time" feature.


If we can find a good way to tune the cost delay GUCs so that vacuum
keeps up with the speed of garbage generation in a high-frequency
update system, then we won't need this feature. For example, the
interval between two vacuums can be estimated by tracking the speed of
dead tuple generation, but how can you tune the vacuum duration to fit
within the interval between two vacuums? It does not seem easy to tune
the vacuum delay settings correctly.


Best Regards
--
Galy Lee lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center





[HACKERS] Deadline-Based Vacuum Delay

2006-12-27 Thread Galy Lee
Hello,

I'd like to propose a new feature, Deadline-Based Vacuum Delay; the
syntax is something like VACUUM IN 6 HOURS.

Vacuum is a non-trivial task to perform. The database needs to be
vacuumed before system performance suffers from the garbage; we also
need to ensure the system won't be hammered during the vacuum
operation.

Cost-based vacuum delay can reduce the impact of the disk I/O storm,
but the vacuum can then last for several hours and it is hard to know
when it will end. So there are many complaints about the unpredictable
execution time of vacuum.

On the other hand, several users want to run vacuum in their
maintenance window. Also, autovacuum kicks off a vacuum when a certain
amount of garbage has been generated, which constrains the interval
between vacuums of a relation.

So I am thinking of another way to perform vacuum. For example, the
vacuum could be confined to a maintenance time frame, like VACUUM IN 6
HOURS, and the vacuum operation would then be performed within that
window. The delay time is adjusted internally to spread the disk I/O
over the time frame. This makes vacuum more predictable, and also
ensures vacuum has no side effects on the production system outside the
maintenance window.

Any ideas or comments?

Best Regards,
-- 
Galy Lee
NTT Open Source Software Center

