On Fri, 2006-05-19 at 17:27 +0100, Simon Riggs wrote:
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
OK, I'm on it.
What solution have you got in mind? I was thinking about an fcntl lock
to ensure only one archiver is active in a given data
Simon Riggs [EMAIL PROTECTED] writes:
This doesn't quite get to the nub of the problem: archiver is designed
to keep archiving files, even in the event that the postmaster explodes.
It will keep archiving until they're all gone.
I think we just need a PostmasterIsAlive check in the per-file
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
This doesn't quite get to the nub of the problem: archiver is designed
to keep archiving files, even in the event that the postmaster explodes.
It will keep archiving until they're all gone.
I think
Simon Riggs [EMAIL PROTECTED] writes:
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
I think we just need a PostmasterIsAlive check in the per-file loop.
...which would mean the archiver would not outlive postmaster in the
event it crashes...which is exactly the time you want it to keep
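The per-file check being debated here can be sketched roughly as follows. This is a Python stand-in for the C archiver code, not PostgreSQL's actual implementation; `archive_one` is a hypothetical callback, and the `getppid()`-based liveness test is a simplified assumption (an orphaned Unix child is re-parented to init, pid 1):

```python
import os

def postmaster_is_alive():
    # Crude stand-in for PostgreSQL's PostmasterIsAlive(): a parent pid
    # of 1 means this process has been orphaned, i.e. the postmaster is
    # gone. (Simplified assumption for illustration only.)
    return os.getppid() != 1

def archive_pending_files(pending, archive_one, alive=postmaster_is_alive):
    # Archive each pending WAL segment, re-checking postmaster liveness
    # before every file so the archiver stops instead of outliving a
    # dead postmaster indefinitely -- the trade-off argued over above.
    done = []
    for name in pending:
        if not alive():
            break  # postmaster gone: stop rather than keep archiving
        archive_one(name)
        done.append(name)
    return done
```

With `alive` injectable, the behavior on both sides of the argument is easy to exercise: a check that always returns false archives nothing, one that returns true drains the whole list.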
On Tue, 2006-05-23 at 11:09 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote:
I think we just need a PostmasterIsAlive check in the per-file loop.
...which would mean the archiver would not outlive postmaster in the
event it
Simon Riggs [EMAIL PROTECTED] writes:
My recent patch will prevent server startup, so if you do a fast restart
to bounce the server and change parameters you'll have to keep the
server down while the archiver completes (or you kill it).
BTW, I was not planning on having it do that. The
On Sun, 21 May 2006, Jeff Frost wrote:
So the chances of the original problem being archiver related are
receding...
This is possible, but I guess I should try to reproduce the actual problem
with the same archive_command script and a CIFS mount just to see what
happens. Perhaps the real
Jeff Frost [EMAIL PROTECTED] writes:
I tried both pulling the plug on the CIFS server and unsharing the CIFS
share, but pgbench continued completely unconcerned. I guess the failure
mode of the NAS device in the customer colo must be something different
that I don't yet know how to
On Tue, 23 May 2006, Tom Lane wrote:
I'm still thinking that the simplest explanation is that $PGDATA/pg_clog/
is on the NAS device. Please double-check the file locations.
I know that seems like an excellent candidate, but it really isn't, I swear.
In fact, you almost had me convinced the
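One quick way to double-check whether pg_clog has ended up on the NAS (for instance via a symlink out of $PGDATA) is to compare device ids with `stat`; if the directory resolves to a different filesystem than the rest of the data directory, it is on the mount. A minimal sketch, not a PostgreSQL tool:

```python
import os

def same_filesystem(path_a, path_b):
    # st_dev identifies the filesystem a path lives on; differing device
    # ids for $PGDATA and $PGDATA/pg_clog would mean the clog directory
    # (or its symlink target) sits on a separate mount such as the NAS.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Usage would be `same_filesystem(pgdata, os.path.join(pgdata, "pg_clog"))`; `os.stat` follows symlinks, so a symlinked pg_clog is resolved to its real location before the comparison.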
On Fri, 19 May 2006, Simon Riggs wrote:
Now I can run my same pgbench, or do you guys
have any other suggestions on attempting to reproduce the problem?
No. We're back on track to try to reproduce the original error.
I've been futzing with trying to reproduce the original problem for a few
On Sun, 2006-05-21 at 14:16 -0700, Jeff Frost wrote:
On Fri, 19 May 2006, Simon Riggs wrote:
Now I can run my same pgbench, or do you guys
have any other suggestions on attempting to reproduce the problem?
No. We're back on track to try to reproduce the original error.
I've been
Jeff Frost [EMAIL PROTECTED] writes:
Well now, will you look at this:
postgres 20228     1  0 May17 ?        00:00:00 postgres: archiver process
postgres 20573     1  0 May17 ?        00:00:00 postgres: archiver process
postgres 23817 23810  0 May17 pts/11   00:00:00 postgres: archiver
On Sun, 21 May 2006, Simon Riggs wrote:
I've been futzing with trying to reproduce the original problem for a few days
and so far postgres seems to be just fine with a long delay on archiving, so
now I'm rather at a loss. In fact, I currently have 1,234 xlog files in
pg_xlog, but the archiver
Simon Riggs [EMAIL PROTECTED] writes:
OK, I'm on it.
What solution have you got in mind? I was thinking about an fcntl lock
to ensure only one archiver is active in a given data directory. That
would fix the problem without affecting anything outside the archiver.
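The proposed fcntl interlock could look roughly like this. This is a Python sketch of the idea rather than the eventual C patch, and the lockfile name is an illustrative choice, not anything PostgreSQL actually uses:

```python
import fcntl
import os

def acquire_archiver_lock(datadir):
    # Take a non-blocking exclusive fcntl lock on a lockfile inside the
    # data directory. Returns the open fd while the lock is held, or
    # None if another archiver process already holds it.
    # ("archiver.lock" is a hypothetical name for this sketch.)
    path = os.path.join(datadir, "archiver.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)  # another archiver has the lock; back off
        return None
    return fd  # keep this fd open for the archiver's lifetime
```

Note that POSIX fcntl locks only conflict across processes (and are released automatically when the holder exits, which is exactly the property wanted for an orphaned archiver); a second `lockf` from within the same process would succeed.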
Not sure what's the most
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote:
Simon Riggs [EMAIL PROTECTED] writes:
OK, I'm on it.
What solution have you got in mind? I was thinking about an fcntl lock
to ensure only one archiver is active in a given data directory. That
would fix the problem without affecting
On Fri, 19 May 2006, Tom Lane wrote:
Well, there's our smoking gun. IIRC, all the failures you showed us are
consistent with race conditions caused by multiple archiver processes
all trying to do the same tasks concurrently.
Do you frequently stop and restart the postmaster? Because I don't
Jeff Frost [EMAIL PROTECTED] writes:
Hurray! Unfortunately, the postmaster on the original troubled server almost
never gets restarted, and in fact has only one archiver process running
right now. Drat!
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more
I wrote:
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened. The orphaned archiver
would eventually quit.
But, actually, nevermind: we have explained the failures you were seeing
in the test setup, but a
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote:
I wrote:
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened. The orphaned archiver
would eventually quit.
But, actually, nevermind: we have explained the failures you
On Fri, 19 May 2006, Tom Lane wrote:
Well, the fact that there's only one archiver *now* doesn't mean there
wasn't more than one when the problem happened. The orphaned archiver
would eventually quit.
Do you have logs that would let you check when the production postmaster
was restarted?
I
On Fri, 19 May 2006, Tom Lane wrote:
What I'd suggest is resuming the test after making sure you've killed
off any old archivers, and seeing if you can make any progress on
reproducing the original problem. We definitely need a
multiple-archiver interlock, but I think that must be unrelated to
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote:
On Fri, 19 May 2006, Tom Lane wrote:
What I'd suggest is resuming the test after making sure you've killed
off any old archivers, and seeing if you can make any progress on
reproducing the original problem. We definitely need a