Re: [HACKERS] [ADMIN] does wal archiving block the current client

2006-05-23 Thread Simon Riggs
On Fri, 2006-05-19 at 17:27 +0100, Simon Riggs wrote: On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: OK, I'm on it. What solution have you got in mind? I was thinking about an fcntl lock to ensure only one archiver is active in a given data

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: This doesn't quite get to the nub of the problem: archiver is designed to keep archiving files, even in the event that the postmaster explodes. It will keep archiving until they're all gone. I think we just need a PostmasterIsAlive check in the per-file

Re: [HACKERS] [ADMIN] does wal archiving block the current client

2006-05-23 Thread Simon Riggs
On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: This doesn't quite get to the nub of the problem: archiver is designed to keep archiving files, even in the event that the postmaster explodes. It will keep archiving until they're all gone. I think

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote: I think we just need a PostmasterIsAlive check in the per-file loop. ...which would mean the archiver would not outlive postmaster in the event it crashes...which is exactly the time you want it to keep
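Editorial sketch of the proposal quoted above: a liveness check at the top of the per-file loop, so the archiver stops taking new files once the postmaster is gone. This is a sketch in Python, not the PostgreSQL C code. `postmaster_is_alive`, `archive_pending`, and `archive_one` are hypothetical names, and the signal-0 probe is only an assumed stand-in for the PostmasterIsAlive check. As the reply notes, the trade-off is that the archiver then no longer outlives a crashed postmaster.

```python
import os


def postmaster_is_alive(postmaster_pid):
    """Assumed stand-in for a PostmasterIsAlive-style check (hypothetical helper)."""
    try:
        os.kill(postmaster_pid, 0)  # signal 0 only probes for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def archive_pending(files, archive_one, is_alive):
    """Archive files in order, re-checking liveness before each file."""
    archived = []
    for name in files:
        if not is_alive():
            break  # postmaster gone: stop before starting another file
        if not archive_one(name):
            break  # archive command failed: leave the rest for a later retry
        archived.append(name)
    return archived
```

With this shape the check costs one call per file, and a file already being copied is always finished before the loop exits.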

Re: [HACKERS] [ADMIN] does wal archiving block the current client

2006-05-23 Thread Simon Riggs
On Tue, 2006-05-23 at 11:09 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Tue, 2006-05-23 at 10:53 -0400, Tom Lane wrote: I think we just need a PostmasterIsAlive check in the per-file loop. ...which would mean the archiver would not outlive postmaster in the event it

Re: [HACKERS] [ADMIN] does wal archiving block the current client

2006-05-23 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: My recent patch will prevent server startup, so if you do a fast restart to bounce the server and change parameters you'll have to keep the server down while the archiver completes (or you kill it). BTW, I was not planning on having it do that. The

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Jeff Frost
On Sun, 21 May 2006, Jeff Frost wrote: So the chances of the original problem being archiver related are receding... This is possible, but I guess I should try and reproduce the actual problem with the same archive_command script and a CIFS mount just to see what happens. Perhaps the real

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes: I tried both pulling the plug on the CIFS server and unsharing the CIFS share, but pgbench continued completely unconcerned. I guess the failure mode of the NAS device in the customer colo must be something different that I don't yet know how to

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Jeff Frost
On Tue, 23 May 2006, Tom Lane wrote: I'm still thinking that the simplest explanation is that $PGDATA/pg_clog/ is on the NAS device. Please double-check the file locations. I know that seems like an excellent candidate, but it really isn't, I swear. In fact, you almost had me convinced the

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Jeff Frost
On Fri, 19 May 2006, Simon Riggs wrote: Now I can run my same pgbench, or do you guys have any other suggestions on attempting to reproduce the problem? No. We're back on track to try to reproduce the original error. I've been futzing with trying to reproduce the original problem for a few

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Simon Riggs
On Sun, 2006-05-21 at 14:16 -0700, Jeff Frost wrote: On Fri, 19 May 2006, Simon Riggs wrote: Now I can run my same pgbench, or do you guys have any other suggestions on attempting to reproduce the problem? No. We're back on track to try to reproduce the original error. I've been

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes: Well now, will you look at this:
postgres 20228     1 0 May17 ?      00:00:00 postgres: archiver process
postgres 20573     1 0 May17 ?      00:00:00 postgres: archiver process
postgres 23817 23810 0 May17 pts/11 00:00:00 postgres: archiver

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Jeff Frost
On Sun, 21 May 2006, Simon Riggs wrote: I've been futzing with trying to reproduce the original problem for a few days and so far postgres seems to be just fine with a long delay on archiving, so now I'm rather at a loss. In fact, I currently have 1,234 xlog files in pg_xlog, but the archiver

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: OK, I'm on it. What solution have you got in mind? I was thinking about an fcntl lock to ensure only one archiver is active in a given data directory. That would fix the problem without affecting anything outside the archiver. Not sure what's the most
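Editorial sketch of the interlock idea quoted above: a non-blocking exclusive fcntl lock on a file inside the data directory, so a second archiver started against the same directory fails to get the lock and can exit. This is a sketch in Python, not the patch under discussion. The lock file name `archiver.lock` and both function names are hypothetical.

```python
import fcntl
import os


def try_acquire_archiver_lock(data_dir, lock_name="archiver.lock"):
    """Return an open descriptor holding the lock, or None if another process has it.

    lock_name is a hypothetical file name chosen for this sketch.
    """
    path = os.path.join(data_dir, lock_name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Exclusive and non-blocking: fail at once instead of waiting.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def release_archiver_lock(fd):
    """Drop the lock explicitly; process exit would also release it."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the kernel releases an fcntl lock when the holding process exits, an orphaned archiver keeps later archivers out only for as long as it is still running, and no stale lock file needs cleaning up after a crash.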

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 12:03 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: OK, I'm on it. What solution have you got in mind? I was thinking about an fcntl lock to ensure only one archiver is active in a given data directory. That would fix the problem without affecting

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: Well, there's our smoking gun. IIRC, all the failures you showed us are consistent with race conditions caused by multiple archiver processes all trying to do the same tasks concurrently. Do you frequently stop and restart the postmaster? Because I don't

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes: Hurray! Unfortunately, the postmaster on the original troubled server almost never gets restarted, and in fact has only one archiver process running right now. Drat! Well, the fact that there's only one archiver *now* doesn't mean there wasn't more

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
I wrote: Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. But, actually, never mind: we have explained the failures you were seeing in the test setup, but a

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote: I wrote: Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. But, actually, never mind: we have explained the failures you

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. Do you have logs that would let you check when the production postmaster was restarted? I

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: What I'd suggest is resuming the test after making sure you've killed off any old archivers, and seeing if you can make any progress on reproducing the original problem. We definitely need a multiple-archiver interlock, but I think that must be unrelated to

Re: [HACKERS] [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote: On Fri, 19 May 2006, Tom Lane wrote: What I'd suggest is resuming the test after making sure you've killed off any old archivers, and seeing if you can make any progress on reproducing the original problem. We definitely need a