Title: Farewell
It's time for formal acknowledgement that I'm not in The Project any
more.
I'm not interested in small features/fixes and have
no time for big ones.
It has been this way for a very long time and I don't see
how/when that could change.
My participation in The Project was one of the
So if you do this, do you still need to store that information in
pg_control at all?
Yes: to speed up the recovery process.
If it's going to slow down the performance of my database when not doing
recovery (because I have to write two files for every transaction,
rather than one)
I'll be on vacation from 12/27/02 till 01/20/03.
Vadim
---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
It seems that locking tuples via LockTable at Phase 1 is not
required anymore, right?
We haven't put those hooks in yet, so the current version is master/slave.
So, you are not going to use any LockTable in Phase 1 on master right
now but you still need some LockTable in Phase 3 on slaves.
| Placing a restriction on an application that says it must treat the
values
| returned from a sequence as if they might not be committed is absurd.
|
| Why? The fact that you are not able to rollback sequences does not
| necessary mean that you are not required to perform commit to ensure
I have committed changes to implement this proposal. I'm not seeing
any significant performance difference on pgbench on my single-CPU
system ... but pgbench is I/O bound anyway on this hardware, so that's
not very surprising. I'll be interested to see what other people
observe. (Tatsuo,
Just wondering what is the status of this patch. Is seems from comments
that people like the idea. I have also looked in the archives for other
people looking for this kind of feature and have found a lot of interest.
If you think it is a good idea for 7.2, let me know what needs to be
Can you explain how I would get the tblNode for an existing database's
index files if it doesn't have the same OID as the database entry in
pg_databases?
Well, keeping in mind future tablespace implementation I would
add tblNode to pg_class and in pg_databases I'd have
defaultTblNode and
Well, the ability to lock only unlocked rows in SELECT FOR UPDATE is useful,
of course. But the unique features of user locks are:
1. They don't interfere with normal locks held by a session/transaction.
2. Share lock is available.
3. User can lock *and unlock objects* inside a transaction, which is not
1. Tx Old is running.
2. Tx S reads the new transaction ID in GetSnapshotData() and is swapped out
before the SInval lock is acquired.
3. Tx New gets new transaction ID, makes changes and commits.
4. Tx Old changes some row R changed by Tx New and commits.
5. Tx S gets snapshot data and now sees R
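The five steps above can be played out in a toy model (hypothetical names and a deliberately simplified visibility rule, not PostgreSQL source): if the snapshot's xmax and its list of in-progress XIDs are not captured atomically, Tx Old escapes the snapshot and its changes become visible too early.

```python
# Toy model of the GetSnapshotData() race. A snapshot is (xmax, set of
# in-progress XIDs); a tuple's xmin is visible iff it is below xmax and
# was not in progress when the snapshot was taken.

def visible(xmin, snapshot):
    xmax, xip = snapshot
    return xmin < xmax and xmin not in xip

OLD, NEW = 50, 100          # Tx Old's XID; Tx New will receive XID 100

# 1-2. Tx S reads the next XID (so xmax = 100), then is descheduled.
xmax = NEW
# 3-4. Tx New (XID 100) commits, then Tx Old (XID 50) commits as well.
# 5. Tx S resumes and only now collects in-progress XIDs: none are left.
broken_snapshot = (xmax, set())      # Tx Old was never recorded as running

# Tx Old's change is wrongly visible, although Old was still in progress
# when xmax was read -- an inconsistent snapshot.
assert visible(OLD, broken_snapshot)

# Capturing xmax and the in-progress list atomically (e.g. both under the
# SInval lock) records Tx Old as running, and the anomaly disappears.
atomic_snapshot = (xmax, {OLD})
assert not visible(OLD, atomic_snapshot)
```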
Baby girl on Jun 27.
Vadim
Yes, that is a good description. And the old version is only required in the
following two cases:
1. the txn that modified this tuple is still open (reader in default read committed)
2. reader is in serializable transaction isolation and has an earlier xid
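The two cases can be condensed into a minimal sketch (the `need_old_version` helper and the snapshot object are hypothetical, not names from any PostgreSQL source): a reader under an overwriting storage manager must go to the rollback segment only when one of the conditions holds.

```python
# Sketch: when must a reader fetch the older tuple version from a rollback
# segment? (Assumed names; case 2 approximates "reader has an earlier xid"
# as "the writer's XID is at or beyond the reader's snapshot horizon".)

class Snapshot:
    def __init__(self, xmax, in_progress):
        self.xmax = xmax
        self.in_progress = set(in_progress)
    def is_in_progress(self, xid):
        return xid in self.in_progress

def need_old_version(tuple_xmin, snapshot, reader_serializable):
    # Case 1: the transaction that modified this tuple is still open.
    if snapshot.is_in_progress(tuple_xmin):
        return True
    # Case 2: a serializable reader whose snapshot predates the writer.
    if reader_serializable and tuple_xmin >= snapshot.xmax:
        return True
    return False

snap = Snapshot(xmax=100, in_progress={90})
assert need_old_version(90, snap, reader_serializable=False)   # writer open
assert not need_old_version(80, snap, reader_serializable=False)
assert need_old_version(120, snap, reader_serializable=True)   # later writer
```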
Seems overwrite smgr has mainly
You mean it is restored in session that is running the transaction ?
Depends on what you mean with restored. It first reads the heap page,
sees that it needs an older version and thus reads it from the rollback segment.
So are whole pages stored in rollback segments or just the
Hm. On the other hand, relying on WAL for undo means you cannot drop
old WAL segments that contain records for any open transaction. We've
already seen several complaints that the WAL logs grow unmanageably huge
when there is a long-running transaction, and I think we'll see a lot
more.
1. Space reclamation via UNDO doesn't excite me a whole lot, if we can
make lightweight VACUUM work well.
Sorry, but I'm going to consider background vacuum as a temporary solution
only. As I've already pointed out, the original PG authors finally became
disillusioned with the same approach.
Were you going to use WAL to get free space from old copies too?
An approach worth considering.
Vadim, I think I am missing something. You mentioned UNDO would be used
for these cases and I don't understand the purpose of adding what would
seem to be a pretty complex capability:
Yeh, we already
Really?! Once again: WAL records give you the *physical* address of tuples
(both heap and index ones!) to be removed, and the size of log to read
records from is not comparable with the size of the data files.
You sure? With our current approach of dumping data pages into the WAL
on first change since
If it's an experiment, shouldn't it be done outside of the main source
tree, with adequate testing in a high load situation, with a patch
released to the community for further testing/comments, before it is added
to the source tree? From reading Vadim's comment above (re:
pre-Postgres95),
Seriously, I don't think that my proposed changes need be treated with
quite that much suspicion. The only part that is really intrusive is
Agreed. I fight for UNDO, not against background vacuum -:)
the shared-memory free-heap-space-management change. But AFAICT that
will be a necessary
There's a report of startup recovery failure in Japan.
Redo done but ...
Unfortunately I have no time today.
Please ask to start up with wal_debug = 1...
Vadim
BTW, I've got ~320tps with 50 clients inserting (int4, text[1-256])
records into 50 tables (-B 16384, wal_buffers = 256) on Ultra10
with 512Mb RAM, IDE (clients run on the same host as server).
Not bad. What were you getting before these recent changes?
As I already reported - with
Just committed changes in bufmgr.c
Regress tests passed but need more specific tests,
as usual. Description as in CVS:
Check bufHdr-cntxDirty and call StartBufferIO in BufferSync()
*before* acquiring shlock on buffer context. This way we should be
protected against conflicts with
Tom, since you appear to be able to recreate the bug, can you comment on
this, as to whether we are okay now?
Sorry for the delay --- I was down in Norfolk all day, and am just now
catching up on email. I will pull Vadim's update and run the test some
more. However, last night I only
At this point I must humbly say "yes, you told me so", because if I
No, I didn't - I must humbly say that I didn't foresee this deadlock,
so "I didn't tell you so" -:)
Anyway, deadlocks in my tests are strongly correlated with new log file
creation - something probably is still wrong...
Vadim
Anyway I like idea of StartUpID in page headers - this will help
Can you please describe StartUpID for me ?
Ideal would be a stamp that has the last (smallest open) XID, or something else
that has more or less timestamp characteristics (without the actual need of
wallclock)
in regard
StartUpID counts database startups and so has timestamp characteristics.
Actually, the idea is to use SUI in future to allow reusing XIDs after startup:
seeing an old SUI in data pages we'll know that all transactions on this page
were committed "long ago" (i.e. visible from the MVCC POV). This requires
Ok, I've made changes in xlog.c and run tests:
Could you send me your diffs?
Sorry, Monday only.
Vadim
What do we debate?
I never said that we shouldn't worry about WAL's current inability to restart.
And WAL already works correctly in the situation of "failing to write a couple
of disk blocks when the system crashes".
My statement in the first place that "WAL can't help in the event of disk errors"
was to
Was the following bug already fixed ?
I was going to ask the same Q.
I see that seek+write was changed to write-s in XLogFileInit
(that was induced by the subj, right?), but what about the problem
itself?
DEBUG: redo starts at (0, 21075520)
The Data Base System is starting up
DEBUG: open(logfile 0
I thought the intended way to change a GUC parameter permanently was to
edit data/postgresql.conf . No ?
What I've thought is to implement a new command to
change archdir under WAL's control.
If it's different from Vadim's plan I don't object.
Actually, I have no concrete plans for
Before commit or rollback the xlog is not flushed to disk, thus you can lose
those xlog entries, but the index page might already be on disk because of
LRU buffer reuse, no?
No. A buffer page is written to disk *only after the corresponding records are
flushed to log* (WAL means Write-Ahead-Log -
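The rule just stated can be sketched like this (assumed names, not PostgreSQL's bufmgr): before a dirty page goes to disk, the log must be flushed at least up to that page's last-modification LSN, so no on-disk page can ever be "ahead" of the log.

```python
# Minimal sketch of the write-ahead rule: flush the WAL up to the page's LSN
# before the data page itself may be written out.

class Wal:
    def __init__(self):
        self.records = []        # (lsn, payload) appended in memory
        self.flushed_lsn = 0
    def append(self, lsn, payload):
        self.records.append((lsn, payload))
    def flush(self, upto_lsn):
        self.flushed_lsn = max(self.flushed_lsn, upto_lsn)  # the "fsync"

def write_buffer(page, wal, disk):
    if page["lsn"] > wal.flushed_lsn:
        wal.flush(page["lsn"])   # log first: only then may the page hit disk
    disk[page["id"]] = page["data"]

wal, disk = Wal(), {}
wal.append(40, b"btree insert")
page = {"id": 7, "lsn": 40, "data": b"index page"}
write_buffer(page, wal, disk)    # forces a log flush up to LSN 40 first
assert wal.flushed_lsn >= page["lsn"] and disk[7] == b"index page"
```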
I have just sent to the pgsql-patches list a rather large set of
Please send it to me directly - pgsql-patches' archive is dated Feb -:(
proposed diffs for the WAL code. These changes:
* Store two past checkpoint locations, not just one, in pg_control.
On startup, we fall back to
Consider the following scenario:
1. A new transaction inserts a tuple. The tuple is entered into its
heap file with the new transaction's XID, and an associated WAL log
entry is made. Neither one of these are on disk yet --- the heap tuple
is in a shmem disk buffer, and the WAL entry is
The point is to make the allocation of XIDs and OIDs work the same way.
In particular, if we are forced to reset the XLOG using what's stored in
pg_control, it would be good if what's stored in pg_control is a value
beyond the last-used XID/OID, not a value less than the last-used ones.
If
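The batching idea can be sketched as follows (hypothetical names; the `BATCH` size is arbitrary): persist the *end* of each allocation batch in pg_control before handing values out, so a recovery that falls back to pg_control always restarts beyond anything that might already have been used.

```python
# Sketch of batched XID/OID allocation with a persisted high-water mark.

class CounterAllocator:
    BATCH = 1024
    def __init__(self, recorded_next):
        self.next_val = recorded_next   # value read back from "pg_control"
        self.recorded = recorded_next   # what is durably recorded
    def allocate(self):
        if self.next_val >= self.recorded:
            # Persist the batch end *before* handing out any value from it.
            self.recorded = self.next_val + self.BATCH
        val = self.next_val
        self.next_val += 1
        return val

alloc = CounterAllocator(recorded_next=500)
used = [alloc.allocate() for _ in range(10)]
# Everything handed out stays strictly below the persisted value, so a
# reset from pg_control can never re-issue a used XID/OID.
assert max(used) < alloc.recorded
```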
On third thought --- we could still log the original page contents and
the modification log record atomically, if what were logged in the xlog
record were (essentially) the parameters to the operation being logged,
not its results. That is, make the log entry before you start doing the
mod
Hm, wasn't it handling non-atomic disk writes, Andreas?
Yes, but for me, that was only one (for me rather minor) issue.
I still think that the layout of PostgreSQL pages was designed to
reduce the risk of a (heap) page being inconsistent because it is
only partly written to an
Hi!
Snow in New York - I arrived only today.
Reading mail...
Vadim
This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done
on just the odd-numbered bytes and one on just the even-numbered bytes
of the datastream. That's hardly any stronger than a single 32-bit CRC;
I believe that the longer the data, the greater the chance of getting the same
CRC/hash for
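The construction criticized above can be written out directly (using Python's `zlib.crc32` as a stand-in for the actual polynomial): the "64-bit" checksum is just two independent 32-bit CRCs over the even- and odd-indexed bytes.

```python
import zlib

# Two independent 32-bit CRCs, one over even-indexed bytes and one over
# odd-indexed bytes, concatenated into a nominal "64-bit" checksum.
def split_crc(data: bytes):
    return zlib.crc32(data[0::2]), zlib.crc32(data[1::2])

# A single corrupted byte perturbs only one of the two halves, so the
# per-error detection strength is roughly that of one 32-bit CRC, not of
# a true 64-bit CRC over the whole stream.
good = b"some WAL record payload"
bad = b"some WAL recorD payload"      # one byte flipped (even index)
g, b_ = split_crc(good), split_crc(bad)
assert g != b_                         # the corruption is still detected...
assert g[1] == b_[1]                   # ...but the odd-byte half never noticed
```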
I've reported the major problems to the mailing lists
but gotten almost no feedback about what to do.
I can't comment without access to code -:(
commit: 2001-02-26 17:19:57
0/0059996C: prv 0/00599948; xprv 0/; xid 0;
RM 0 info 00 len 32
checkpoint: redo 0/0059996C; undo
So 7.0.3 is twice as fast only with fsync off.
Are there FK updates/deletes in pgbench's tests?
Remember how SELECT FOR UPDATE in FK triggers
affects performance...
Also, 5 clients is small number.
Vadim
P.S. Sorry for delays with my replies -
the internet connection is a pain here: it takes
5-10
Regardless of whether this particular behavior is fixable,
this brings up something that I think we *must* do before
7.1 release: create a utility that blows away a corrupted
logfile to allow the system to restart with whatever is in
the datafiles. Otherwise, there is no recovery technique
It may be that WAL has changed the rollback
time-characteristics for the worse compared to pre-WAL?
Nothing changed ... yet. And in future rollbacks
of read-only transactions will be as fast as now,
anyway.
What about rollbacks of a bunch of inserts/updates/deletes?
I remember a scenario
So my guess is that the 7.1 updates (with default
fsync) are significantly slower than 7.0.3
It removes the need to disable fsync to get best performance!
-F performance is still better, only the difference is not as big as before.
Well, when "checkpoint seek in logs" is implemented the difference
will be the same - lost consistency.
Since there is a fundamental recovery
As you can see from the current open items list, there isn't much left
to do for the 7.1 release. I am going to suggest we remove the LAZY
VACUUM option at this point. I know Tom Lane posted an item about the
Well, leaving for vacation tomorrow I have to agree -:(
LAZY patch will be
Oh, your system reached the max transaction ID -:(
That's two reports now of people who have managed to wrap around the XID
counter. It doesn't seem that hard to do in a heavily used database.
Does anyone want to take more seriously the stopgap solution I proposed
for this problem
from Feb 15 till Mar 6...
I won't be able to read the mailing lists, so
in the event of need please use the
[EMAIL PROTECTED] address.
Regards!
Vadim
#2 0x20dc71 in abort () from /lib/libc.so.6
#3 0x8080495 in XLogFileOpen ()
Hm. Evidently it's failing to open the xlog file, but the code is set
up in such a way that it dies before telling you why :-( Take a look
at XLogFileOpen in src/backend/access/transam/xlog.c and tweak the
DEBUG: starting up
DEBUG: database system was interrupted at 2001-02-11 04:08:12
DEBUG: Checkpoint record at (0, 805076492)
postmaster: reaping dead processes...
Startup failed - abort
And that is it, from running 'postmaster -D /usr/local/pgsql/data/'. I get
the same thing each time I
Hm. It was OK to use spinlocks to control buffer access when the max
delay was just the time to read or write one disk page. But it sounds
Actually, a btree split requires 3 simultaneous buffer locks, and after that
_bt_getstackbuf may read *many* parent buffers while holding locks on
2 buffers.
Shouldn't we increase S_MAX_BUSY and use ERROR instead of FATAL?
No. If you have delays exceeding a minute, or that are even a visible
fraction of a minute, then a spinlock is NOT the correct mechanism to be
using to wait ... because guess what, it's spinning, and consuming
processor time
During the nightly vacuum pgsql shut down and does not start any more.
The log is attached.
Seems the problem was rebuilding an Index,
Is there a way to force WAL to ignore indexes?
The problem was in redoing tuple movement in *table*.
Can I delete it ?
...
DEBUG: redo starts at (6,
Here are the open items for 7.1. Much shorter:
+ Runtime btree recovery
Vadim
With Apache mod_perl, Apache::DBI, stress testing with apache bench (ab -n
10 -c 4), in the apache error_log I've got:
[Pg7.1beta3 with standard conf files.]
And how many simultaneous connections did you use?
..
[Fri Jan 12 07:48:58 2001] [error] DBI-connect(dbname=mydb) failed: The
Data
No, I thought we agreed disk block CRC was way overkill. If the CRC on
the WAL log checks for errors that are not checked anywhere else, then
fine, but I thought disk CRC would just duplicate the I/O subsystem/disk
checks.
A disk-block CRC would detect partially written blocks (ie,
But CRC is used in WAL records only.
Oh. I thought we'd agreed that a CRC on each stored disk block would
be a good idea as well. I take it you didn't do that.
Do we want to consider doing this (and forcing another initdb)?
Or shall we say "too late for 7.1"?
I personally was never
Well, it's not a good idea because SIGTERM is used for ABORT + EXIT
(pg_ctl -m fast stop), but shouldn't ABORT clean up everything?
Er, shouldn't ABORT leave the system in the exact state that it's
in so that one can get a crashdump/traceback on a wedged process
without it trying to
just committed. initdb is required.
*I didn't do any serious tests yet*, simple regression only.
I'll run more tests in the next couple of days.
Please help with this.
Vadim
When doing txlog switches there seems to be a problem with remembering the
correct (= active) logfile when the postmaster crashes.
This is one of the problems I tried to point out previously:
You cannot rely on writes to other files except the txlog itself !!!
Why? If you handle those files
I totally missed your point here. How is closing the source of ERserver related
to closing the code of the PostgreSQL DB server? Let me clear things up:
(not based on WAL)
That wasn't clear from the blurb.
Still, this notion that PG, Inc will start producing closed-source products
poisons the well.
There is risk here. It isn't so much in the fact that PostgreSQL, Inc
is doing a couple of modest closed-source things with the code. After
all, the PG community has long acknowledged that the BSD license would
allow others to co-opt the code and commercialize it with no obligations.
It is
As for replaying logs against a restored snapshot dump... AIUI, a
dump records tuples by OID, but the WAL refers to TIDs. Therefore,
the WAL won't work as a re-do log to recover your transactions
because the TIDs of the restored tables are all different.
True for current
In xlog.c, the declaration of struct ControlFileData says:
/*
* MORE DATA FOLLOWS AT THE END OF THIS STRUCTURE - locations of data
* dirs
*/
Is this comment accurate? I don't see any sign in the code of placing
extra data after the declared structure. If you're
Now WAL is ON by default. make distclean + initdb are required.
Vadim
Nope. Still fails...
I know, but looks better, eh? -:)
*** ./expected/opr_sanity.out Tue Nov 14 13:32:58 2000
--- ./results/opr_sanity.out Mon Nov 20 20:27:46 2000
***
*** 482,489
(p2.pronargs = 1 AND p1.aggbasetype = 0)));
oid | aggname | oid |
Larry Rosenman [EMAIL PROTECTED] writes:
Nope. Still fails...
You should've said that the OIDs are now just off-by-one from where they
were before, instead of off by several thousand. That I'm willing to
accept as an implementation change ;-) I've updated the expected file.
Actually,
Ok, so with CHECKPOINTS, we could move the offline log files to
somewhere else so that we could archive them, in my
understanding. Now the question is, how could we recover from a disaster
like losing every table file except the log files. Can we do this with
WAL? If so, how can we do it?
Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer less-than-
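The fsync-sharing idea just described can be sketched as follows (assumed names; the nap length is arbitrary): after appending its commit record, a backend waits briefly, and if another backend's fsync has meanwhile flushed past its record, it skips its own fsync entirely.

```python
import time

# Sketch of shared commit fsyncs: nap after writing the commit record; only
# fsync ourselves if nobody else's fsync already covered our LSN.

class WalState:
    def __init__(self):
        self.flushed_lsn = 0
        self.fsync_calls = 0
    def fsync(self, upto_lsn):
        self.fsync_calls += 1
        self.flushed_lsn = max(self.flushed_lsn, upto_lsn)

def commit_flush(wal, my_commit_lsn, nap_seconds=0.003):
    time.sleep(nap_seconds)              # give someone else a chance to fsync
    if wal.flushed_lsn >= my_commit_lsn:
        return                           # piggybacked on another backend's fsync
    wal.fsync(my_commit_lsn)

wal = WalState()
wal.fsync(200)                  # another backend already flushed to LSN 200
commit_flush(wal, 150)          # our record at 150 is covered: no extra fsync
assert wal.fsync_calls == 1
commit_flush(wal, 250)          # not covered: we must fsync ourselves
assert wal.fsync_calls == 2
```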
Hi!
I'll be in Las Vegas (Comdex) till next week
Wednesday.
I wasn't able to implement redo for sequences but
was going to turn WAL on by default anyway.
Unfortunately, I got a core dump in the regress:opr_sanity
test (in the FileSeek code), so WAL is still not the
default.
See you!
Vadim
One idea I had from this is actually truncating pg_log at some point if
we know all the tuples have the special committed xid. It would prevent
the file from growing without bounds.
Not truncating, but implementing pg_log as a set of files - we could remove
files for old xids.
Vadim, can you
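The segmented-pg_log proposal can be sketched like so (assumed names and segment size): with transaction status kept in fixed-size segment files keyed by XID range, any whole segment below the oldest XID still of interest can simply be unlinked.

```python
# Sketch of "pg_log as a set of files": status segments removable once every
# XID in them is older than anything any snapshot still cares about.

XIDS_PER_SEGMENT = 1 << 16

def segment_of(xid):
    return xid // XIDS_PER_SEGMENT

def removable_segments(existing_segments, oldest_interesting_xid):
    cutoff = segment_of(oldest_interesting_xid)
    return sorted(s for s in existing_segments if s < cutoff)

segs = {0, 1, 2, 3}
# With the oldest interesting XID in segment 2, segments 0 and 1 can go.
assert removable_segments(segs, 2 * XIDS_PER_SEGMENT + 5) == [0, 1]
```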
So, we'll have to abort some long running transaction.
Well, yes, some transaction that continues running while ~ 500 million
other transactions come and go might give us trouble. I wasn't really
planning to worry about that case ;-)
Agreed, I just don't like to rely on assumptions -:)
I think that to handle locations we could symlink catalogs - ln -s
path_to_database_in_some_location .../base/DatabaseOid
But that's a kludge. We ought to discourage people from messing with the
storage internals.
It's not a kluge, it's a perfectly fine implementation. The only
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
I think that at least items 1 and 2 from the WAL todo (checkpoints and port to
machines without TAS) are required before beta.
I'm not sure that you do need to add support for machines without TAS.
I pointed out a couple months ago that the non-TAS support
First, as I've already mentioned in my answer to Tom about DROP TABLE, undo
logic will not be implemented in 7.1 -:( Doable for tables, but for indices we
would need either compensation records or xmin/cmin in index tuples.
So, we'll still live with dust from aborted xactions in our
Mmmm, why not call FlushRelationBuffers? Calling bufmgr from smgr
doesn't look like the right thing, no?
Yes, it's a little bit ugly, but if we call FlushRelationBuffers then we
will likely be doing some useless writes (to flush out pages that we are
only going to throw away anyway). If we
BTW, why do we force buffers to disk in FlushRelationBuffers at all?
Seems all that is required is to flush them *from* the pool, not *to* disk
immediately.
Good point. Seems like it'd be sufficient to do a standard async write
rather than write + fsync.
We'd still need some additional
Now that we have numeric file names, I would like to have a command I
can run from psql that will dump a mapping of numeric file name to table
name, i.e.,
121233 pg_proc
143423 pg_index
select oid, relname from pg_class;
No. select relfilenode, relname from pg_class - in theory
In my understanding, the locking levels you provided contain
an implicit share/exclusive lock on the corresponding
pg_class tuple, i.e. AccessExclusiveLock acquires an
exclusive lock on the corresponding pg_class tuple and
other locks acquire a share lock. Is that right?
No.
in general. What I'm proposing is that once an xact has touched a
table, other xacts should not be able to apply schema updates to that
table until the first xact commits.
I agree with you.
I don't know. We discussed this issue just after 6.5 and decided to
allow concurrent schema
As for locks, weak locks don't pass intensive locks. A dba
seems to be able to alter a table at any time.
Sorry, I don't understand this sentence. Tom suggested placing a shared lock
on any table that is accessed until end of tx. No one can alter a table until
all users have closed their txns.
Bruce Momjian [EMAIL PROTECTED] writes:
Speaking of error messages, one idea for 7.2 might be to prepend
numbers to the error messages.
Isn't that long since on the TODO list? I know we've had long
discussions about a thoroughgoing revision of error reporting.
Yes, yes, yes! We need
I notice that ProcessUtility() calls SetQuerySnapshot() for FETCH
and COPY TO statements, and nothing else.
Seems to me this is very broken. Isn't a query snapshot needed for
any utility command that might do database accesses?
Not needed. We don't support multi-versioning for schema
Snapshot is made per top-level statement and functions/subqueries
use the same snapshot as that of top-level statement.
Not so. SetQuerySnapshot is executed per querytree, not per top-level
statement --- for example, if a rule generates multiple queries from
a user statement,
Seems to me this is very broken. Isn't a query snapshot needed for
any utility command that might do database accesses?
Not needed. We don't support multi-versioning for schema operations.
No? Seems to me we're almost there. Look for instance at that DROP
USER bug I just fixed: it
I am inclined to think that we should do SetQuerySnapshot in the outer
loop of pg_exec_query_string, just before calling
pg_analyze_and_rewrite. This would ensure that parse/plan accesses to
^^
Actually not - snapshot is passed as
Well, hopefully WAL will be ready for alpha testing in a few days. Unfortunately,
at the moment I have to step aside from the main stream to implement the new
file naming, the biggest todo for integrating WAL into the system.
I would really appreciate any help with the following issues (testing can start