Re: Reviving Time Travel (was Re: [HACKERS] 'TID index')

2004-09-30 Thread Jim C. Nasby
To answer Simon's earlier comment about whether I was looking to start
hacking on PostgreSQL... I'm not. :) I might take a look at the TODO
again, but I seem to do a great job of finding things to put on my plate
as it is. I am interested in minimizing the impact of vacuum, which is
why I brought my idea up originally. So, if anything, I'll be much more
interested in working on improving vacuum than on something else.

Heh, I was about to ask for the pages with dead tuples list to be added
to the TODO, but it seems it's already there ('Maintain a map of
recently-expired rows'). One thing that isn't there which I remember
being discussed was having the page-write daemon do a vacuum of a page
before it's written; has this been done already?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: Reviving Time Travel (was Re: [HACKERS] 'TID index')

2004-09-30 Thread Alvaro Herrera
On Thu, Sep 30, 2004 at 04:57:37PM -0500, Jim C. Nasby wrote:

 Heh, I was about to ask for the pages with dead tuples list to be added
 to the TODO, but it seems it's already there ('Maintain a map of
 recently-expired rows'). One thing that isn't there which I remember
 being discussed was having the page-write daemon do a vacuum of a page
 before it's written; has this been done already?

Not done, but discussed.  One problem is that to vacuum a page, you may
need to bring the related index pages into memory, which is more work than
you are saving.

One idea would be to vacuum the page if it can be determined that the
relation doesn't have indexes.  This information is generally not known,
because the index list isn't constructed until/unless someone explicitly
asks for it.  But anyway, do you have a lot of tables with no index?
And how many of them are big enough to warrant doing this?

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today (Mary Gardiner)




Re: Reviving Time Travel (was Re: [HACKERS] 'TID index')

2004-09-30 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 One idea would be to vacuum the page if it can be determined that the
 relation doesn't have indexes.  This information is generally not known,

... especially not by the page writer.  You can't assume that you have
access to the relation descriptor --- for instance, it's entirely
possible that the bgwriter (or another backend) will need to push out a
dirty page that belongs to a newly created relation for which the
catalog data remains uncommitted.  There are related scenarios involving
uncommitted drops.

More generally I think that invoking VACUUM processing from the bgwriter
would be a serious violation of the module hierarchy, and would inflict
more pain in the form of bugs and maintenance headaches than it could
possibly be worth.  We just this version managed to get smgr decoupled
from the relcache, like it should have been all along.  (bufmgr should
be too, but I haven't tackled that yet...)  This was actually a
necessary step to make the separate bgwriter feasible.  Let's not
reverse that cleanup in pursuit of dubious optimizations.

regards, tom lane



Re: Reviving Time Travel (was Re: [HACKERS] 'TID index')

2004-09-30 Thread Jim C. Nasby
On Thu, Sep 30, 2004 at 07:11:29PM -0400, Tom Lane wrote:
 More generally I think that invoking VACUUM processing from the bgwriter
 would be a serious violation of the module hierarchy, and would inflict
 more pain in the form of bugs and maintenance headaches than it could
 possibly be worth.  We just this version managed to get smgr decoupled
 from the relcache, like it should have been all along.  (bufmgr should
 be too, but I haven't tackled that yet...)  This was actually a
 necessary step to make the separate bgwriter feasible.  Let's not
 reverse that cleanup in pursuit of dubious optimizations.

Yeah, I thought about the same thing. It would certainly be more modular
to have a vacuum daemon that runs ahead of the page writer.

As for the indexes, would it be reasonable to see if the required index
pages were already in memory?

Ultimately, it's going to depend on the table and access patterns as to
whether this provides a speed improvement. A table with a lot of indexes
and a lot of updates per page might not see much benefit. A table where
updates are spread across the table would more likely benefit even if it
does have to bring index pages into memory. The alternative is the
vacuum running later, having to bring in the base data page, and then
all the indexes anyway. But once there is a list of pages with dead
tuples, only one read and one write would be saved, which probably isn't
worth the extra code.
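
As a rough illustration of that "one read and one write" saving (the index
page count below is an assumption chosen only to show the arithmetic, not a
measured figure from any workload):

/* Rough page-I/O comparison for cleaning one heap page's dead tuples.
 * All numbers are assumptions chosen only to illustrate the tradeoff. */
#include <stdio.h>

int main(void)
{
    int idx_pages = 3;  /* index leaf pages touched per heap page (assumed) */

    /* Pruning at page-write time: the heap page is already dirty and in
     * memory, so the extra cost is only the index page traffic. */
    int at_write_time = idx_pages;

    /* Vacuuming later, driven by a list of pages with dead tuples: the
     * same index traffic, plus one extra read and one extra write of the
     * heap page itself. */
    int later = idx_pages + 2;

    printf("page I/Os at write time: %d, later: %d, saved: %d\n",
           at_write_time, later, later - at_write_time);
    return 0;
}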
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: Reviving Time Travel (was Re: [HACKERS] 'TID index')

2004-09-28 Thread Simon Riggs
Hannu Krosing [mailto:[EMAIL PROTECTED]]
 On Sun, 2004-09-26 at 09:17, Tom Lane wrote:
  Ross J. Reedstrom [EMAIL PROTECTED] writes:
   ... So, all this append-only writing leads to files with lots of dead
   tuples, so the vacuum command was added to reclaim space.
  
  Actually, I believe the original Postgres design envisioned that no
  tuple would ever be permanently deleted: the idea was that you would
  always be able to query the database state as of past times as well
  as the present instant.  Stonebraker intended to use the WORM drive as
  the repository for dead tuples that might be needed to answer such
  historical queries.  The vacuum cleaner was originally a background
  process that pushed dead tuples from magnetic disk to WORM storage.
 
 We are now getting back to the point where the background process part
 is true again - how hard would it be to modify vacuum to write reclaimed
 tuples somewhere (a network pipe, a WORM drive, another DB)?
 
 It seems that in addition to writing deleted tuples out to a history DB
 and making create and delete transaction ids explicitly visible (and doing
 something(tm) about the transaction id wraparound while at it), the only
 thing left to do is some kind of transaction time log - and voila! we
 have the original Time Travel feature back - a great feature for
 resolving both audit trail and clumsy-DBA problems. 
 
 The modern WORM drive equivalent is an IDE(-RAID) disk with its very
 tape-like access profile (3+ hours to write a full 300GB disk, random
 access order(s) of magnitude slower than sequential).
 
 So if someone is looking for a project, this seems to be something that
 is both theoretically possible and also theoretically useful ;)
 

Yes, I thought that too - but not for me, not now.

Look here for something similar: http://research.microsoft.com/db/ImmortalDB/

Best Regards, Simon Riggs



Re: [HACKERS] 'TID index'

2004-09-26 Thread Tom Lane
Ross J. Reedstrom [EMAIL PROTECTED] writes:
 ... So, all this append-only writing leads to files with lots of dead
 tuples, so the vacuum command was added to reclaim space.

Actually, I believe the original Postgres design envisioned that no
tuple would ever be permanently deleted: the idea was that you would
always be able to query the database state as of past times as well
as the present instant.  Stonebraker intended to use the WORM drive as
the repository for dead tuples that might be needed to answer such
historical queries.  The vacuum cleaner was originally a background
process that pushed dead tuples from magnetic disk to WORM storage.
Its current manifestation is dumbed-down from that, since we only
delete rows and make no attempt to save them somewhere else --- but
it was always an integral part of the system design.

It's quite entertaining to read "The design of the POSTGRES storage
system" (ERL-M87-06, available at http://db.cs.berkeley.edu//papers/)
and compare it to where we are now.  There is just enough similarity
that it's obviously the ancestor of our present code ... but there is
also a lot in that paper that has left *no* trace in our present code.
I would love to know just how much of the paper actually got implemented
and then discarded, and how much never got beyond the arm-waving stage.

regards, tom lane



Re: [HACKERS] 'TID index'

2004-09-25 Thread Simon Riggs
Jim C. Nasby wrote:
 On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
  There are many good ideas out there, yet it is almost impossible to find
  somebody else to implement yours!
 
  The acid test is to try and write it...
 
  Overall, I agree VACUUM could do with some tuning - and 8.0 has
 just that.
  It needs very careful thought to make sure both concurrency and
  recoverability considerations are fully met in any solution you come up
  with.

 Before I try and tackle this I wanted to investigate why PostgreSQL's
 MVCC depends on vacuum instead of going with an undo log ala Oracle or
 InnoDB. The only thing I could find is this thread
 (http://tinyurl.com/42opl) which doesn't say all that much.

 Has anyone looked at the performance impact of having to vacuum vs. a
 method that doesn't require vacuuming? Obviously rollback with an undo
 log is slow, but it seems it's better to optimize for the normal
 situation of committing.

 I'm not looking for a holy war here but I'd like to know if any analysis
 has happened to determine how efficient vacuum can be made and if it's
 as efficient as just deleting an undo log once you no longer need it.

Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole
of its history. There is much written on this and you should search some
more - references are in the manual.
http://citeseer.ist.psu.edu/cache/papers/cs/4130/http:zSzzSzwuarchive.wustl.eduzSzpackageszSzpostgreszSzpaperszSzERL-M87-06.pdf/stonebraker87design.pdf

Put simply, MVCC requires vacuum - they are two halves of the overall
mechanism. An undo log would be an alternative to MVCC and would therefore
offer an alternative to vacuum. But there are other things to consider - no
other RDBMS requires a vacuum, but all of them have SOME requirement for
off-line scanning of the database, whether it's ANALYZE, or REORG, or dbcc
(or more than one of those...). Removing vacuum won't remove the other
requirements, so there is less to be gained than you might think.

IMHO vacuum could be optimised further in terms of what it does and how it
does it, but many of those optimizations wouldn't apply generally - so you'd
then either be introducing further tweaking requirements for DBAs (which are
frowned upon) or giving yourself the additional task of auto-optimizing
their deployment.

In PostgreSQL 8.0, vacuum has been enhanced so that it operates in the
background, avoids locks, doesn't spoil the dbpage cache - on top of being
automated. That approach is simple and pragmatic and that's hard to beat.

I think altering the current storage system is not something people (round
here) would invest their time in; they would probably regard even the debate
as wasted effort (and remember that time is the most closely guarded resource
on the project). PG does allow you to write your own storage manager...

If you're keen to code, there are many TODO items that are pre-agreed as
being required (from earlier discussions). That's probably the best place to
start. Having said that, research into why the code is the way it is can be
interesting and very educational - overall, the code really is written
fairly optimally from a robustness and performance perspective - you need to
be a fair way in to spot opportunities for improvement. Don't be put off - get stuck
in.

Best Regards, Simon Riggs




Re: [HACKERS] 'TID index'

2004-09-25 Thread Ross J. Reedstrom
On Sat, Sep 25, 2004 at 11:14:53AM +0100, Simon Riggs wrote:
 Jim C. Nasby
 
 Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole
 of its history. There is much written on this and you should search some
 more - references are in the manual.

Well, not quite its whole history: MVCC showed up in 6.5. Vacuum's been
there since before SQL. Actually, it's a bit of a historical accident. My
understanding of the sequence of events is that Hewlett-Packard donated
an early WORM optical drive to the Stonebraker lab. Since it's write
once, it had the behavior that you could only append to files. Someone
thought it might be useful for auditing, etc., so they wrote the first
storage manager for postgres to accommodate that drive. The other
storage manager at the time was for battery-backed, persistent RAM.
So, all this append-only writing leads to files with lots of dead
tuples, so the vacuum command was added to reclaim space. I think on the
WORM drive, this was supposed to mark blocks 'invisible' in some sense.

I don't know if the WORM drive ever actually got used with postgres.

Ross

-- 
Ross Reedstrom, Ph.D. [EMAIL PROTECTED]
Research Scientist  phone: 713-348-6166
The Connexions Project  http://cnx.rice.edu   fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE



[HACKERS] 'TID index'

2004-09-15 Thread Jim C. Nasby
I just had a thought that could potentially greatly improve vacuum
performance. What about some kind of TID (or does vacuum use CID?)
index? This would allow vacuum to visit only the pages it needs to
visit. Actually, I guess TID/CID wouldn't even be involved; the only
information needed would be whether any tuples on a page have been marked
deleted. Something as simple as a bitmap could work. Storing TID info
might provide added vacuum efficiency, but my guess is it's probably not
worth the extra effort.

This might not help much for tables that just see a lot of random update
activity, but I think it would be very useful for large tables where
pages with dead tuples are likely to be a small percentage of the total
number of pages.

Maintaining this information on a per-transaction basis might prove
difficult to do without causing concurrency issues. Luckily, I think
this could probably be done in the background without much difficulty.
One possibility is to check for dead tuples as pages are written to disk
(actually, by definition, there would have to be dead tuples at that
point, I would think). If memory serves, writing these pages is now a
background process, so this shouldn't cause contention issues.
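
A minimal sketch of what that bitmap might look like, assuming a simple
one-bit-per-heap-page structure kept per relation (all names and the
in-memory-only layout here are illustrative, not existing PostgreSQL code):

/* Hypothetical per-relation map of pages containing dead tuples. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct DeadPageMap
{
    uint32_t nblocks;  /* number of heap pages tracked */
    uint8_t *bits;     /* one bit per heap page */
} DeadPageMap;

static DeadPageMap *dpm_create(uint32_t nblocks)
{
    DeadPageMap *map = malloc(sizeof(DeadPageMap));
    map->nblocks = nblocks;
    map->bits = calloc((nblocks + 7) / 8, 1);
    return map;
}

/* Mark a heap page as containing dead tuples (e.g. noticed at write time). */
static void dpm_mark_dead(DeadPageMap *map, uint32_t blkno)
{
    map->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

/* Clear the bit once vacuum has cleaned the page. */
static void dpm_clear(DeadPageMap *map, uint32_t blkno)
{
    map->bits[blkno / 8] &= (uint8_t) ~(1u << (blkno % 8));
}

/* Vacuum would visit only the pages whose bit is set. */
static int dpm_needs_vacuum(const DeadPageMap *map, uint32_t blkno)
{
    return (map->bits[blkno / 8] >> (blkno % 8)) & 1;
}

int main(void)
{
    DeadPageMap *map = dpm_create(1000);

    dpm_mark_dead(map, 42);  /* pretend the page writer saw dead tuples here */

    for (uint32_t blkno = 0; blkno < map->nblocks; blkno++)
        if (dpm_needs_vacuum(map, blkno))
            printf("vacuum should visit block %u\n", (unsigned) blkno);

    dpm_clear(map, 42);
    free(map->bits);
    free(map);
    return 0;
}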
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [HACKERS] 'TID index'

2004-09-15 Thread Simon Riggs
 Jim C. Nasby wrote:
 I just had a thought that could potentially greatly improve vacuum
 performance. What about some kind of TID (or does vacuum use CID?)
 index? This would allow vacuum to visit only the pages it needs to
 visit. Actually, I guess TID/CID wouldn't even be involved; the only
 information needed would be if any tuples on a page have been marked
 deleted. Something as simple as a bitmap could work. Storing TID info
 might provide added vacuum efficiency, but my guess is it's probably not
 worth the extra effort.

 This might not help much for tables that just see a lot of random update
 activity, but I think it would be very useful for large tables where
 pages with dead tuples are likely to be a small percentage of the total
 number of pages.


 Maintaining this information on a per-transaction basis might prove
 difficult to do without causing concurrency issues. Luckily, I think
 this could probably be done in the background without much difficulty.
 One possibility is to check for dead tuples as pages are written to disk
 (actually, by definition, there would have to be dead tuples at that
 point I would think). If memory serves writing these pages is now a
 background process, so this shouldn't cause contention issues.

There are many good ideas out there, yet it is almost impossible to find
somebody else to implement yours!

The acid test is to try and write it...

Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
It needs very careful thought to make sure both concurrency and
recoverability considerations are fully met in any solution you come up
with.

Best regards, Simon Riggs




Re: [HACKERS] 'TID index'

2004-09-15 Thread Jim C. Nasby
On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
 There are many good ideas out there, yet it is almost impossible to find
 somebody else to implement yours!
 
 The acid test is to try and write it...
 
 Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
 It needs very careful thought to make sure both concurrency and
 recoverability considerations are fully met in any solution you come up
 with.
 
Heh, I wasn't even thinking of implementation yet. :) I fully understand
the lack of developers.

Unfortunately, I have very little idea of the internals of PGSQL, and
I'm decidedly not a C coder. I *might* be able to get something hacked
up that stores info in a table (since that would mean all the space
management stuff would be handled for me).

If this is a worthwhile idea, can we at least get a TODO? Would it be
useful to come up with a high-level design (something I could probably
do)?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
