So are whole pages stored in rollback segments or just
the modified data?
This is implementation dependent. Storing whole pages is
much easier to do, but obviously it's better to store just
the modified data.
I am not sure it is necessarily better. Seems to be a tradeoff here.
You mean it is restored in session that is running the transaction ?
Depends on what you mean with restored. It first reads the heap page,
sees that it needs an older version and thus reads it from the rollback
segment.
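The read path just described can be sketched as follows. This is an illustrative model only (all names are invented, not PostgreSQL or Oracle code): the reader looks at the current heap tuple, and if that version was written by a transaction too new for its snapshot, it follows the version chain back into the rollback store.

```python
# Toy model of MVCC reads through a rollback segment. A tuple version
# written by a too-new transaction forces the reader to chase the
# back-pointer chain ("prev") into the rollback store.

class TupleVersion:
    def __init__(self, xid, data, prev=None):
        self.xid = xid        # transaction that wrote this version
        self.data = data
        self.prev = prev      # older version, kept in the rollback segment

def read_tuple(heap_version, snapshot_xid):
    """Return the newest version visible to a snapshot taken at snapshot_xid."""
    v = heap_version
    while v is not None and v.xid > snapshot_xid:
        v = v.prev            # read the older copy from the rollback segment
    return v.data if v else None

# Version chain: xid 10 wrote "old", xid 50 later overwrote it with "new".
old = TupleVersion(10, "old")
new = TupleVersion(50, "new", prev=old)

print(read_tuple(new, 60))    # a snapshot after xid 50 sees "new"
print(read_tuple(new, 30))    # an older snapshot falls back to the rollback copy
```

If the needed old version has already been recycled out of the rollback store, the chain ends and the reader gets nothing, which is essentially Oracle's "snapshot too old" failure mentioned later in this thread.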
Seems overwrite smgr has mainly advantages in terms of
speed for operations other than rollback.
... And rollback is required for 5% of transactions ...
This obviously depends on the application.
A small number of aborted transactions was used to show the
uselessness of UNDO in terms of space.
OTOH it is possible to do without rolling back at all as
MySQL folks have shown us ;)
Not with SDB tables which support transactions.
My point was that MySQL was used quite a long time without it
and still quite many useful applications were produced.
And my point was that
You mean it is restored in session that is running the transaction ?
Depends on what you mean with restored. It first reads the heap page,
sees that it needs an older version and thus reads it from the rollback segment.
I guess that it could be slower than our current way of doing it.
Yes, that is a good description. And an old version is only required in the following
two cases:
1. the txn that modified this tuple is still open (reader in default committed read)
2. reader is in serializable transaction isolation and has earlier xtid
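The two cases above can be written out as a small predicate. This is a hedged sketch with invented names, not the actual tuple-visibility code:

```python
# The two situations in which a reader must fetch an old tuple version:
# (1) the writing transaction is still open, so even a read-committed
#     reader cannot see its change yet;
# (2) the reader runs in serializable isolation with a snapshot taken
#     before the writer's xid.

def needs_old_version(writer_xid, writer_open,
                      reader_snapshot_xid, reader_serializable):
    if writer_open:                                   # case 1
        return True
    if reader_serializable and reader_snapshot_xid < writer_xid:
        return True                                   # case 2
    return False

print(needs_old_version(100, True, 120, False))   # open writer
print(needs_old_version(100, False, 90, True))    # older serializable snapshot
print(needs_old_version(100, False, 120, False))  # committed, newer reader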
Impractical ? Oracle does it.
Oracle has MVCC?
With restrictions, yes.
What restrictions? Rollback segments size?
No, that is not the whole story. The problem with their rollback segment approach is
that they do not guard against overwriting a tuple version in the rollback segment.
Do we want to head for an overwriting storage manager?
Not sure.
Advantages: UPDATE has easy space reuse because usually done
in-place, no index change on UPDATE unless key is changed.
Disadvantages: Old records have to be stored somewhere for MVCC use.
Could limit transaction
Removing dead records from rollback segments should
be faster than from datafiles.
Is it for better locality or are they stored in a different way ?
Locality - all dead data would be localized in one place.
Do you think that there is some fundamental performance advantage
in making a
What restrictions? Rollback segments size?
Non-overwriting smgr can eat all disk space...
You didn't know that? Vadim did ...
Didn't I mention a few times that I was
inspired by Oracle? -:)
Vadim
If PostgreSQL wants to stay MVCC, then we should imho forget
overwriting smgr very fast.
Let me try to list the pros and cons that I can think of:
Pro:
no index modification if key stays same
no search for free space for update (if tuple still
fits into page)
I think so too. I've never said that an overwriting smgr
is easy, and I don't love it particularly.
What I'm objecting to is avoiding UNDO without giving up
an overwriting smgr. We shouldn't be noncommittal now.
Why not? We could decide to do overwriting smgr later
and implement UNDO then.
- A simple typo in psql can currently cause a forced
rollback of the entire TX. UNDO should avoid this.
Yes, I forgot to mention this very big advantage, but undo is
not the only possible way to implement savepoints. Solutions
using CommandCounter have been discussed.
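The CommandCounter alternative mentioned above can be modeled roughly like this. All names here are hypothetical; this only illustrates the idea that was discussed, not an actual PostgreSQL design:

```python
# Savepoints built on the command counter instead of UNDO: every change
# inside a transaction is stamped with a command id (cid). ROLLBACK TO
# SAVEPOINT does not physically undo anything - it just marks the cids
# issued after the savepoint as aborted, and visibility checks skip them.

class Transaction:
    def __init__(self):
        self.cid = 0
        self.aborted_cids = set()
        self.changes = []              # list of (cid, payload)

    def execute(self, payload):
        self.cid += 1
        self.changes.append((self.cid, payload))

    def savepoint(self):
        return self.cid                # just remember the command counter

    def rollback_to(self, saved_cid):
        self.aborted_cids.update(
            cid for cid, _ in self.changes if cid > saved_cid)

    def visible_changes(self):
        return [p for cid, p in self.changes if cid not in self.aborted_cids]

tx = Transaction()
tx.execute("INSERT a")
sp = tx.savepoint()
tx.execute("INSERT b")                 # the statement with the typo
tx.rollback_to(sp)                     # abort only the failed part
print(tx.visible_changes())
```

The space cost is that the aborted tuples stay on disk until a vacuum-style cleanup finds them, which is exactly the tradeoff against UNDO being argued in this thread.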
Non-overwriting smgr can eat all disk space...
Isn't the same true for an overwriting smgr ? ;)
At 01:51 25/05/01 +0500, Hannu Krosing wrote:
How does it do MVCC with an overwriting storage manager ?
I don't know about Oracle, but Dec/RDB also does overwriting and MVCC. It
does this by taking a snapshot of pages that are participating in both RW
and RO transactions (Dec/RDB has the
The downside would only be, that long running txn's cannot
[easily] rollback to savepoint.
We should implement savepoints for all transactions or none, no?
We should not limit transaction size to online available disk space for WAL.
Imho that is much more important. With guaranteed undo
If community will not like UNDO then I'll probably try to implement
Imho UNDO would be great under the following circumstances:
1. The undo is only registered for some background work process
and not done in the client's backend (or only if it is a small txn).
2.
People also have referred to an overwriting smgr
easily. Please tell me how to introduce an overwriting smgr
without UNDO.
There is no way. Although undo for an overwriting smgr would involve a
very different approach than with non-overwriting. See Vadim's post about what
info suffices
At 14:33 22/05/01 -0700, Mikheev, Vadim wrote:
If community will not like UNDO then I'll probably try to implement
dead space collector which will read log files and so on.
I'd vote for UNDO; in terms of usability friendliness it's a big win.
Tom's plans for FSM etc are, at least, going to
I'd vote for UNDO; in terms of usability friendliness it's a big win.
Could you please try it a little more verbose ? I am very interested in
the advantages you
Hiroshi Inoue [EMAIL PROTECTED] writes:
I guess that is the question. Are we heading for an overwriting storage
manager?
I've never heard that it was given up. So there seems to be
at least a possibility to introduce it in the future.
Unless we want to abandon MVCC (which I don't), I think
Hiroshi Inoue [EMAIL PROTECTED] writes:
Tom Lane wrote:
Unless we want to abandon MVCC (which I don't), I think an overwriting
smgr is impractical.
Impractical ? Oracle does it.
Oracle has MVCC?
regards, tom lane
Todo:
1. Compact log files after checkpoint (save records of uncommitted
transactions and remove/archive others).
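Item 1 above could look roughly like the following. The record format is invented for illustration; only the policy (keep uncommitted transactions' records, drop the rest) comes from the proposal:

```python
# Sketch of post-checkpoint log compaction: walk the log, keep records
# belonging to still-open (uncommitted) transactions, and drop or
# archive everything else.

def compact_log(records, open_xids):
    """records: list of (xid, payload). Keep only open transactions' records."""
    kept = [(xid, p) for xid, p in records if xid in open_xids]
    dropped = len(records) - len(kept)
    return kept, dropped

log = [(1, "upd t1"), (2, "ins t2"), (1, "del t3"), (3, "upd t4")]
kept, dropped = compact_log(log, open_xids={2})
print(kept)      # only xid 2's record must survive compaction
print(dropped)
```

The cost objection raised in the next message is that this rewrite pass itself reads and copies a lot of log data.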
On the grounds that undo is not guaranteed anyway (concurrent heap access),
why not simply forget it, since above sounds rather expensive ?
The downside would only be, that
As a rule of thumb, online applications that hold open
transactions during user interaction are considered to be
Broken By Design (tm). So I'd slap the programmer/design
team with - let's use the server box since it doesn't contain
anything useful.
We
UNDO in oracle is done by something known as a 'rollback segment'.
You are not seriously saying that you like the rollback segments in Oracle.
They only cause trouble:
1. configuration (for every different workload you need a different config)
2. snapshot too old
3. tx abort because
And, I cannot say that I would implement UNDO because of
1. (cleanup) OR 2. (savepoints) OR 4. (pg_log management)
but because of ALL of 1., 2., 4.
OK, I understand your reasoning here, but I want to make a comment.
Looking at the previous features you added, like subqueries, MVCC, or
We could keep the share buffer lock (or add some other kind of lock)
until the tuple is projected - after projection we need not read data
for the fetched tuple from the shared buffer, and the time between fetching
the tuple and projection is very short, so keeping the lock on the buffer will
not impact concurrency.
Mikheev, Vadim [EMAIL PROTECTED] writes:
I'm not sure that the time to do projection is short though
--- what if there are arbitrary user-defined functions in the quals
or the projection targetlist?
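The two orderings being debated can be modeled with a toy lock counter (all names invented): hold the shared buffer lock across projection, or copy the tuple out and project after releasing. A slow user-defined function in the target list is exactly what makes the first strategy hold the lock for a long time.

```python
# Compare lock hold time for "project under lock" vs "copy, then project".
# `cost` stands in for the expense of the projection target list,
# e.g. an arbitrary user-defined function.

class Buffer:
    def __init__(self, tuples):
        self.tuples = tuples
        self.lock_held_steps = 0      # crude measure of lock hold time

def project_holding_lock(buf, idx, project, cost):
    buf.lock_held_steps += 1 + cost   # fetch AND projection under the lock
    return project(buf.tuples[idx])

def project_after_copy(buf, idx, project, cost):
    copied = dict(buf.tuples[idx])    # memcpy-style copy under the lock
    buf.lock_held_steps += 1          # lock released right after the fetch
    return project(copied)            # projection runs without the lock

buf = Buffer([{"a": 1, "b": 2}])
r1 = project_holding_lock(buf, 0, lambda t: t["a"], cost=100)
r2 = project_after_copy(buf, 0, lambda t: t["a"], cost=100)
print(r1, r2, buf.lock_held_steps)    # same answer, very different lock time
```

Both strategies return the same result; the difference is only how long other backends are kept off the buffer.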
Well, while we are on this subject I finally should say about an issue
that has bothered me for long
Vadim Mikheev [EMAIL PROTECTED] writes:
It probably will not cause more IO than vacuum does right now.
But unfortunately it will not reduce that IO.
Uh ... what? Certainly it will reduce the total cost of vacuum,
because it won't bother to try to move tuples to fill holes.
The index cleanup
Vadim, can you remind me what UNDO is used for?
4. Split pg_log into small files with ability to remove old ones (which
do not hold statuses for any running transactions).
They are already small (16Mb). Or do you mean even smaller ?
This imposes one huge risk, that is already a pain in
Vadim, can you remind me what UNDO is used for?
4. Split pg_log into small files with ability to remove old ones (which
do not hold statuses for any running transactions).
and I wrote:
They are already small (16Mb). Or do you mean even smaller ?
Sorry for above little confusion of
Really?! Once again: WAL records give you *physical* address of tuples
(both heap and index ones!) to be removed and size of log to read
records from is not comparable with size of data files.
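Vadim's point can be illustrated with a short sketch (record and heap formats invented): the WAL records carry the physical address of each tuple an aborted transaction touched, so a cleaner can jump straight to those slots instead of scanning every page of the data files.

```python
# WAL-driven space reclamation: collect the physical (page, offset)
# slots written by aborted transactions, then clear exactly those slots.

def collect_dead_tids(wal_records, aborted_xids):
    """Return the physical (page, offset) slots written by aborted txns."""
    return [(rec["page"], rec["offset"])
            for rec in wal_records if rec["xid"] in aborted_xids]

def reclaim(heap, dead_tids):
    for page, offset in dead_tids:   # direct jumps, no full table scan
        heap[page][offset] = None    # mark the slot free for reuse

wal = [{"xid": 7, "page": 0, "offset": 1},
       {"xid": 8, "page": 1, "offset": 0}]
heap = [["t0", "t1"], ["t2"]]
reclaim(heap, collect_dead_tids(wal, aborted_xids={7}))
print(heap)    # only the aborted transaction's slot is cleared
```

The counter-argument raised later in the thread is that PostgreSQL's WAL also dumps whole data pages on first change after a checkpoint, so the log to be read is not necessarily small.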
So how about a background vacuum like process, that reads the WAL
and does the cleanup ? Seems that
Would it be possible to split the WAL traffic into two sets of files,
Sure, downside is two fsyncs :-( When I first suggested physical log
I had a separate file in mind, but that is imho only a small issue.
Of course people with more than 3 disks could benefit from a split.
Tom: If your
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
Tom: If your ratio of physical pages vs WAL records is so bad, the config
should simply be changes to do fewer checkpoints (say every 20 min like a
typical Informix setup).
I was using the default configuration. What caused the problem was
My point is that we'll need dynamic cleanup anyway and UNDO is
what should be implemented for dynamic cleanup of aborted changes.
I do not yet understand why you want to handle aborts different than outdated
tuples. The ratio in a well tuned system should well favor outdated tuples.
If
Uh ... what? Certainly it will reduce the total cost of vacuum,
because it won't bother to try to move tuples to fill holes.
Oh, you're right here, but daemon will most likely
I hope we can avoid on-disk FSM. Seems to me that that would create
problems both for performance (lots of extra disk I/O) and reliability
(what happens if FSM is corrupted? A restart won't fix it).
We can use WAL for FSM.
Vadim
Jan Wieck [EMAIL PROTECTED] writes:
I think the in-shared-mem FSM could have some max-per-table
limit and the background VACUUM just skips the entire table
as long as nobody reused any space.
I was toying with the notion of trying to use Vadim's MNMB idea
(see his
I do not yet understand why you want to handle aborts different than
outdated tuples.
Maybe because aborted tuples have a shorter Time-To-Live.
And
From: Mikheev, Vadim
Sent: Monday, May 21, 2001 10:23 AM
To: 'Jan Wieck'; Tom Lane
Cc: The Hermit Hacker; 'Bruce Momjian';
[EMAIL PROTECTED]
Strange address, Jan?
Subject: RE: [HACKERS] Plans for solving the VACUUM problem
I think the in-shared-mem FSM could have some max-per
Correct me if I am wrong, but both cases do present a problem
currently in 7.1. The WAL log will not remove any WAL files
for transactions that are still open (even after a checkpoint
occurs). Thus if you do a bulk insert of gigabyte size you will
require a gigabyte sized WAL directory.
Hm. On the other hand, relying on WAL for undo means you cannot drop
old WAL segments that contain records for any open transaction. We've
already seen several complaints that the WAL logs grow unmanageably huge
when there is a long-running transaction, and I think we'll see a lot
more.
Vadim Mikheev [EMAIL PROTECTED] writes:
1. Space reclamation via UNDO doesn't excite me a whole lot, if we can
make lightweight VACUUM work well.
Sorry, but I'm going to consider background vacuum as temporary solution
only. As I've already pointed, original PG authors finally became
disillusioned with the same approach.
Were you going to use WAL to get free space from old copies too?
An approach worth considering.
Vadim, I think I am missing something. You mentioned UNDO would be used
for these cases and I don't understand the purpose of adding what would
seem to be a pretty complex capability:
Yeh, we already
Vadim Mikheev [EMAIL PROTECTED] writes:
Really?! Once again: WAL records give you *physical* address of tuples
(both heap and index ones!) to be removed and size of log to read
records from is not comparable with size of data files.
You sure? With our current approach of dumping data pages
The Hermit Hacker [EMAIL PROTECTED] writes:
If its an experiment, shouldn't it be done outside of the main source
tree, with adequate testing in a high load situation, with a patch
released to the community for further testing/comments, before it is added
to the source tree?
Mebbe we
Seriously, I don't think that my proposed changes need be treated with
quite that much suspicion. The only part that is really intrusive is
Agreed. I fight for UNDO, not against background vacuum -:)
the shared-memory free-heap-space-management change. But AFAICT that
will be a necessary
Vadim Mikheev [EMAIL PROTECTED] writes:
Unfortunately, I think that we'll need an on-disk FSM and that FSM is
actually the most complex thing to do in the space reclamation project.
I hope we can avoid on-disk FSM. Seems to me that that would create
problems both for performance (lots of extra
Bruce Momjian [EMAIL PROTECTED] writes:
In fact, multi-query transactions are just a special case of
subtransactions, where all previous subtransactions are
committed/visible. We could use the same pg_log-style memory area for
multi-query transactions, eliminating the command counter and
Philip Warner [EMAIL PROTECTED] writes:
At 19:05 17/05/01 -0400, Tom Lane wrote:
1. Forget moving tuples from one page to another.
Could this be done opportunistically, meaning it builds up a list of
candidates to move (perhaps based on emptiness of page), then moves a
subset of these in
I have been thinking about the problem of VACUUM and how we
might fix it for 7.2. Vadim has suggested that we should
attack this by implementing an overwriting storage manager
and transaction UNDO, but I'm not totally comfortable with
that approach: it seems to me that it's an awfully
Mikheev, Vadim [EMAIL PROTECTED] writes:
If a tuple is dead, we care not whether its index entries are still
around or not; so there's no risk to logical consistency.
What does this sentence mean? We canNOT remove a dead heap tuple until
we know that there are no index tuples referencing it
On Fri, May 18, 2001 at 06:10:10PM -0700, Mikheev, Vadim wrote:
Vadim, can you remind me what UNDO is used for?
Ok, last reminder -:))
On transaction abort, read WAL records and undo (rollback)
changes made in storage. Would allow:
1. Reclaim space allocated by aborted transactions.
Isn't current implementation bulk delete ?
No, the index AM is called separately for each index tuple to be
deleted; more to the point, the search for deletable index tuples
should be moved inside the index AM for performance reasons.
Wouldn't a sequential scan on the heap table be
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
foreach tuple in heap that can be deleted do:
foreach index
call the current index delete with constructed key and xtid
See discussion with Hiroshi. This is much more complex than TID-based
delete and would be faster only
A particular point worth making is that in the common case where you've
updated the same row N times (without changing its index key), the above
approach has O(N^2) runtime. The indexscan will find all N index tuples
matching the key ... only one of which is the one you are looking for on
Zeugswetter Andreas SB [EMAIL PROTECTED] writes:
It was my understanding, that the heap xtid is part of the key now,
It is not.
There was some discussion of doing that, but it fell down on the little
problem that in normal index-search cases you *don't* know the heap tid
you are looking for.
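The quadratic behaviour described two messages up can be counted directly. This is only an operation-count model, not index code: deleting each of N same-key index entries by key means scanning all N matching entries each time, while a TID-based delete locates each entry once.

```python
# Operation counts for deleting N index entries that share one key:
# key-based search scans all N matching entries per deletion (~N^2 total),
# a TID-based delete touches each entry once (~N total).

def key_scan_comparisons(n):
    # each of the n deletions wades through the n same-key entries
    return sum(n for _ in range(n))

def tid_delete_comparisons(n):
    return n          # each entry is found directly by its heap TID

print(key_scan_comparisons(10), tid_delete_comparisons(10))
```

This is why the TID-based bulk delete is argued to be the cheaper interface for vacuum-style cleanup, even though, as noted above, ordinary index searches cannot use the TID because they don't know it yet.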
Oleg Bartunov [EMAIL PROTECTED] writes:
On Thu, 17 May 2001, Tom Lane wrote:
We will also want to look at upgrading the non-btree index types to allow
concurrent operations.
am I right you plan to work with GiST indexes as well ?
We read a paper Concurrency and Recovery in Generalized
Vadim, can you remind me what UNDO is used for?
Ok, last reminder -:))
On transaction abort, read WAL records and undo (rollback)
changes made in storage. Would allow:
1. Reclaim space allocated by aborted transactions.
2. Implement SAVEPOINTs.
Just to remind -:) - in the event of error
Bruce Momjian [EMAIL PROTECTED] writes:
I am confused why we can't implement subtransactions as part of our
command counter? The counter is already 4 bytes long. Couldn't we
rollback to counter number X-10?
That'd work within your own transaction, but not from outside it.
After you commit,
Bruce Momjian [EMAIL PROTECTED] writes:
Hey, I have an idea. Can we do subtransactions as separate transactions
(as Tom mentioned), and put the subtransaction id's in the WAL, so they
can be safely committed/rolled back as a group?
It's not quite that easy: all the subtransactions have to
Matthew T. O'Connor [EMAIL PROTECTED] writes:
Another quick thought for handling FSM contention problems. A backend could
give up waiting for access to the FSM after a short period of time, and just
append its data to the end of the file the same way it's done now. Dunno
if that is
At 19:05 17/05/01 -0400, Tom Lane wrote:
But having said that, there's no reason to remove the existing VACUUM
code: we can keep it around for situations where you need to crunch a
table as much as possible and you can afford to lock the table while
you do it.
It would be great if this was
Mike Mascari [EMAIL PROTECTED] writes:
Very neat. You mention that the truncation of both heap and index
relations is not necessarily mandatory. Under what conditions would
either of them be truncated?
In the proposal as given, a heap file would be truncated if (a) it
has at least one
Bruce Momjian [EMAIL PROTECTED] writes:
The only question I have is about the Free Space Map. It would seem
better to me if we could get this map closer to the table itself, rather
than having every table of every database mixed into the same shared
memory area. I can just see random table