Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-09-04 Thread Bruce Momjian
Manfred Koizar wrote:
 [ still brainstorming ... ]
 
 On Thu, 21 Aug 2003 17:16:50 -0400, Tom Lane [EMAIL PROTECTED]
 wrote:
  Whenever a backend encounters a dead tuple it inserts a reference to
  its page into the RSM.
 
 This assumes that backends will visit dead tuples with significant
 probability.  I doubt that assumption is tenable;
 
 Good point.  What about:  Whenever a backend *deletes* a tuple it
 inserts a reference to its page into the RSM?  Then an entry in the
 RSM doesn't necessarily mean that the referenced page has reclaimable
 space, but it would still be valuable information.

Added to TODO:

* Maintain a map of recently-expired pages so vacuum can reclaim
  free space without a sequential scan

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-28 Thread Grant Succeeded
[EMAIL PROTECTED] (Tom Lane) wrote in message news:[EMAIL PROTECTED]...
 Manfred Koizar [EMAIL PROTECTED] writes:
  better.  AFAICS Vivek's problem is that it is hard enough to hold a
  good part of the working set in the cache, and still his disks are
  saturated.  Now a VACUUM not only adds one more process to disk I/O
  contention, but also makes sure that the working set pages are *not*
  in memory which leads to higher I/O rates after the VACUUM.

This is a phenomenon I'm very familiar with from large databases (not
pg, but the same techniques should apply).  Generally, large OS
filesystem caches just get in the way when you're doing lots of I/O on
a db that's big relative to the cache footprint.

perhaps restating:
The best approach for me, by far, is to get the OS to *not* cache
stuff.  As long as the database uses the information it inherently has
available, it can make far more effective use of the same amount of
memory the OS would have used to cache the whole filesystem.
Furthermore, a cache hit in the db is just a pointer lookup, which is
far cheaper than a read() system call that results in an OS cache hit.
Even if the OS does a good job, every hit still costs a read system
call.

 
 We have had some people looking at improved buffer management
 algorithms; LRU-2 or something smarter would help.  I dunno whether
 we can dissuade the kernel from flushing its cache though.
 
Once you get the relatively dumb, uninformed OS cache out of the way,
the DB has enough info to cache more effectively.

I'd love to see the new cache stuff; I'm still grokking the 7.3 source
code.

my two favorite features of a certain other rdbms:
- It puts sequential I/O pages near the end of the LRU, not the top.
  Pretty effective in avoiding cache churn.  The essential heuristic is
  that index scans go to the MRU end, while full scans and (for pg)
  vacuum go near the LRU end.  It's an effective, and hopefully
  straightforward, thing.  Does pg do this?
- For still tougher performance/stability cases, it allows one to
  segregate the cache into different LRU lists, each with a
  configurable slice of the overall shared buffer memory.  This is for
  when the application itself causes the same cache churn phenomenon
  that vacuum causes for the user above.  One configures distinct LRU
  pools, then optionally associates key objects with non-default
  pools.  When a block from a given object gets pulled in, it goes to
  the LRU list the user chose for it (or the default, if the user
  didn't put it in a distinct pool).  Then when some random query does
  a huge scan of a huge table, it won't flush the default cache pool
  if that table has been configured into another pool.  In most
  applications it's pretty clear which big, historical tables should
  be segregated in this manner, but the default behavior (one big
  pool) is the same as pg works today.

Hopefully these features are only incremental changes to the existing
pg LRU?


  If we teach VACUUM to not read pages that don't contain any dead
  tuples, this could be a significant improvement.  I'm envisioning a
  data structure (reclaimable space map, RSM) similar to the FSM.
  Whenever a backend encounters a dead tuple it inserts a reference to
  its page into the RSM.

 This assumes that backends will visit dead tuples with significant
 probability.  I doubt that assumption is tenable; it's certainly not
 if you assume that no backend is doing seqscans.  (And if they are,
 then VACUUM is not the only I/O culprit...)
 
   regards, tom lane
 


Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-28 Thread Tom Lane
[EMAIL PROTECTED] (Grant Succeeded) writes:
 The best for me by far, is to get the OS to *not* cache stuff.  As
 long as the database uses the information it inherently has available,
 it can make far more effective use of the same amount of memory the OS
 would have used to cache the whole filesystem.

This is a very long-running debate in this community, and I think at
this point the majority opinion contradicts yours.  The OS buffer cache
has a number of fundamental advantages over what the database can do,
the most obvious being that it actually knows how much free memory is
available for buffer cache at any instant.  Also, the existing DBs that
take the we'll-do-the-buffer-management approach are designed around
thirty-year-old OS technology.  I'm not inclined to accept this position
without some substantial evidence in its favor.

regards, tom lane



Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-26 Thread Curt Sampson
On Thu, 21 Aug 2003, Tom Lane wrote:

 We have had some people looking at improved buffer management
 algorithms; LRU-2 or something smarter would help.  I dunno whether
 we can dissuade the kernel from flushing its cache though.

Using open/read/write system calls, you can't.  You can, however, use
an OS like Solaris that should detect the sequential read and avoid
blowing out the buffer cache.

Most operating systems support the madvise system call, which does let
you say things like "I'm accessing this sequentially" or "I don't need
this to be buffered any more", though how much attention operating
systems pay to this advice varies with the OS.  However, it turns out
to be non-trivial to get postgres to use mmap for data blocks: you
can't write any data to an mmap'd block until you've confirmed that
the log entry has been written, because once you've touched data in an
mmap'd block you have no way of stopping it from being written to the
disk right away.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.NetBSD.org
Don't you know, in this new Dark Age, we're all light.  --XTC



Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-26 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 better.  AFAICS Vivek's problem is that it is hard enough to hold a
 good part of the working set in the cache, and still his disks are
 saturated.  Now a VACUUM not only adds one more process to disk I/O
 contention, but also makes sure that the working set pages are *not*
 in memory which leads to higher I/O rates after the VACUUM.

We have had some people looking at improved buffer management
algorithms; LRU-2 or something smarter would help.  I dunno whether
we can dissuade the kernel from flushing its cache though.

 If we teach VACUUM to not read pages that don't contain any dead
 tuples, this could be a significant improvement.  I'm envisioning a
 data structure (reclaimable space map, RSM) similar to the FSM.
 Whenever a backend encounters a dead tuple it inserts a reference to
 its page into the RSM.

This assumes that backends will visit dead tuples with significant
probability.  I doubt that assumption is tenable; it's certainly not
if you assume that no backend is doing seqscans.  (And if they are,
then VACUUM is not the only I/O culprit...)

regards, tom lane



Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-26 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 Good point.  What about:  Whenever a backend *deletes* a tuple it
 inserts a reference to its page into the RSM?  Then an entry in the
 RSM doesn't necessarily mean that the referenced page has reclaimable
 space, but it would still be valuable information.

That might work if the RSM were lossless, but in practice I think it'd
have to be lossy, like the FSM.  Which would mean that you'd still have
to do full-scan vacuums fairly regularly to make sure you hadn't
forgotten any freeable tuples.  Conceivably it could be a win, though,
if you could do frequent vacuum decents and only a full-scan vacuum
once in a while (once a day, maybe).

regards, tom lane



[HACKERS] Decent VACUUM (was: Buglist)

2003-08-21 Thread Manfred Koizar
On Wed, 20 Aug 2003 15:39:26 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
But I think the real point here is that there's no reason to think that
doing tuple deletion on-the-fly in foreground transactions is superior
to doing it in background with a vacuum process.  You're taking what
should be noncritical maintenance work and moving it into the critical
paths of your foreground applications.  Not only that, but you're
probably doing more total work per tuple --- VACUUM batches its work
in more ways than just the index cleanup aspect, IIRC.

Yes, I second that, 100%.  That doesn't mean that we couldn't do any
better.  AFAICS Vivek's problem is that it is hard enough to hold a
good part of the working set in the cache, and still his disks are
saturated.  Now a VACUUM not only adds one more process to disk I/O
contention, but also makes sure that the working set pages are *not*
in memory which leads to higher I/O rates after the VACUUM.

I can imagine several use cases where only a small part of a large
relation is subject to DELETEs/UPDATEs.  Maybe Vivek's application
falls into this category.

If we teach VACUUM to not read pages that don't contain any dead
tuples, this could be a significant improvement.  I'm envisioning a
data structure (reclaimable space map, RSM) similar to the FSM.
Whenever a backend encounters a dead tuple it inserts a reference to
its page into the RSM.  Dead tuple detection is no problem, it is
already implemented for marking dead index tuples.  VACUUM, when run
in a new mode ("decent"), only checks pages that are listed in the
RSM.  To get full advantage of not doing unnecessary page reads, we'll
also need to redesign the index bulk delete routines.

The autovacuum daemon will watch the RSM and when the number of
entries is above a configurable threshold, it will start a
VACUUM DECENT ...

Servus
 Manfred



Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-21 Thread Manfred Koizar
[ still brainstorming ... ]

On Thu, 21 Aug 2003 17:16:50 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
 Whenever a backend encounters a dead tuple it inserts a reference to
 its page into the RSM.

This assumes that backends will visit dead tuples with significant
probability.  I doubt that assumption is tenable;

Good point.  What about:  Whenever a backend *deletes* a tuple it
inserts a reference to its page into the RSM?  Then an entry in the
RSM doesn't necessarily mean that the referenced page has reclaimable
space, but it would still be valuable information.

Servus
 Manfred



Re: [HACKERS] Decent VACUUM (was: Buglist)

2003-08-21 Thread Manfred Koizar
On Thu, 21 Aug 2003 17:56:02 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
Conceivably it could be a win, though,
if you could do frequent vacuum decents and only a full-scan vacuum
once in a while (once a day, maybe).

That's what I had in mind; similar to the current situation where you
can avoid expensive VACUUM FULL by doing lazy VACUUM frequently
enough.

Servus
 Manfred
