Re: [PERFORM] 'Real' auto vacuum?

2005-08-31 Thread Brad Nicholson

Mindaugas Riauba wrote:


When a row is orphaned it's added to a list of possibly available rows.
When a new row is needed the list of possible rows is examined and the
first one with a transaction id less then the lowest running transaction
id is chosen to be the new row?  These rows can be in a heap so it's
really fast to find one.
 


This is the long-term plan.However, it's actually a lot harder than it
sounds.  Patches welcome.
   



 Some ETA? Since that would be the most welcome addition for us. We
have few very heavily updated databases where table bloat and constant
vacuuming is killing performance.



How often are you vacuuming (the definition of 'constantly' tends to 
vary)?  Are you vacuuming the whole database each time?  If so, identify 
which tables are being updated frequently, and vacuum those often.  
Vacuum other tables less frequently.


Also, are you you using VACUUM FULL (if so, you certainly don't want to be).

--
Brad Nicholson  416-673-4106[EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] 'Real' auto vacuum?

2005-08-30 Thread Ralph Mason
This is a wild and crazy thought which I am sure is invalid for some 
good reason.


But why can't postgres just vacuum itself as it goes along?

When a row is orphaned it's added to a list of possibly available rows.  
When a new row is needed the list of possible rows is examined and the 
first one with a transaction id less then the lowest running transaction 
id is chosen to be the new row?  These rows can be in a heap so it's 
really fast to find one.


Like magic - no more vacuuming.  No more holes for people to fall into.

Is this an oversimplification of the problem?

Ralph

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 'Real' auto vacuum?

2005-08-30 Thread mark
On Wed, Aug 31, 2005 at 10:21:20AM +1200, Ralph Mason wrote:
 This is a wild and crazy thought which I am sure is invalid for some 
 good reason.
 
 But why can't postgres just vacuum itself as it goes along?
 
 When a row is orphaned it's added to a list of possibly available rows.  
 When a new row is needed the list of possible rows is examined and the 
 first one with a transaction id less then the lowest running transaction 
 id is chosen to be the new row?  These rows can be in a heap so it's 
 really fast to find one.
 
 Like magic - no more vacuuming.  No more holes for people to fall into.

Yes please. :-)

 Is this an oversimplification of the problem?

But, yeah. It's probably not that easy, especially with really big
databases. Where is this free list stored? How efficient is it to keep
track of the lowest running transaction at all times? How does one
synchronize access to this free list, to ensure that processes don't
block up waiting for access to the free list? Is the fre list
journalled to prevent corruption, and the accidental re-use of a still
in use row? And, there would be a cost to scanning this list on every
insert or update.

As an outsider (like you?) I see the current model as a design flaw as
well. A neat and tidy model on paper. Not so nice in real life. The
need to vacuum in batch mode, to keep the database from dying, seems
intuitively bad.

I think there must be answers to this problem. Even simple
optimizations, such as defining a table such that any delete or update
within a table, upon commit, will attempt to vacuum just the rows that
should not be considered free for any new transactions. If it's in
use by an active transaction, oh well. It can be picked up by a batch
run of vacuum. If it's free though - let's do it now.

I think any optimizations we come up with, will be more happily accepted
with a working patch that causes no breakage... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 'Real' auto vacuum?

2005-08-30 Thread Josh Berkus
Ralph,

 When a row is orphaned it's added to a list of possibly available rows.
 When a new row is needed the list of possible rows is examined and the
 first one with a transaction id less then the lowest running transaction
 id is chosen to be the new row?  These rows can be in a heap so it's
 really fast to find one.

This is the long-term plan.However, it's actually a lot harder than it 
sounds.  Patches welcome.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] 'Real' auto vacuum?

2005-08-30 Thread Ralph Mason

[EMAIL PROTECTED] wrote:


But, yeah. It's probably not that easy, especially with really big
databases. Where is this free list stored? How efficient is it to keep
track of the lowest running transaction at all times? How does one
synchronize access to this free list, to ensure that processes don't
block up waiting for access to the free list? Is the fre list
journalled to prevent corruption, and the accidental re-use of a still
in use row? And, there would be a cost to scanning this list on every
insert or update.
 

I suspect the freelist could be stored as an index, and just handily 
postgres supports those out of the box.There would be a cost yes, 
but then what is the cost of adding pages to the file all the time?  I 
guess as with all things there is no one size fits all, so perhaps you 
could turn it off - although I expect for 99.9% of the cases 'on' would 
be the better choice. If it gets broken there is already the reindex 
code that can fix it. A coherency / fixing / recover of a table command 
would probably be a useful tool anyway.



As an outsider (like you?) I see the current model as a design flaw as
well. A neat and tidy model on paper. Not so nice in real life. The
need to vacuum in batch mode, to keep the database from dying, seems
intuitively bad.
 

We have a script that vacuums the database every 5 minutes, excessive - 
yes, but turns out that any less is no good really.  I think that this 
is sub optimal, the DB work keeps running, but the vacuum can slow down 
other tasks.  It also probably flushes data that we would need out of 
the page cache so it can look at data that isn't used often as the 
vacuum runs. Not the most optimal data access pattern I could imagine.



I think there must be answers to this problem. Even simple
optimizations, such as defining a table such that any delete or update
within a table, upon commit, will attempt to vacuum just the rows that
should not be considered free for any new transactions. If it's in
use by an active transaction, oh well. It can be picked up by a batch
run of vacuum. If it's free though - let's do it now.
 

Anything would be good - I think it's the achilles  heel of postgres. 
Perhaps there is something simple like that could fix 95% of the problem.



I think any optimizations we come up with, will be more happily accepted
with a working patch that causes no breakage... :-)

 


I am sure they would.

Cheers
Ralph


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 'Real' auto vacuum?

2005-08-30 Thread Mindaugas Riauba

  When a row is orphaned it's added to a list of possibly available rows.
  When a new row is needed the list of possible rows is examined and the
  first one with a transaction id less then the lowest running transaction
  id is chosen to be the new row?  These rows can be in a heap so it's
  really fast to find one.

 This is the long-term plan.However, it's actually a lot harder than it
 sounds.  Patches welcome.

  Some ETA? Since that would be the most welcome addition for us. We
have few very heavily updated databases where table bloat and constant
vacuuming is killing performance.

  Mindaugas


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly