Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-22 18:19:25 -0600, Jim Nasby wrote: On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Simon Riggs
On 23 January 2014 01:19, Jim Nasby j...@nasby.net wrote: On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-23 13:56:49 +0100, Simon Riggs wrote: IMHO we need to resolve the deadlock inherent in the disk-full/WALlock-up/checkpoint situation. My view is that can be solved in a similar way to the way the buffer pin deadlock was resolved for Hot Standby. I don't think that approach works

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 01:23, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 18:59:13 -0500, Tom Lane wrote: Another thing to think about is whether we couldn't put a hard limit on WAL record size somehow. Multi-megabyte WAL records are an abuse of

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 01:30, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: How are we supposed to wait while e.g. ProcArrayLock? Aborting transactions doesn't work either, that writes abort records which can get significantly large. Yeah, that's an interesting

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Heikki Linnakangas
On 01/22/2014 02:10 PM, Simon Riggs wrote: As Jeff points out, the blocks being modified would be locked until space is freed up. Which could make other users wait. The code required to avoid that wait would be complex and not worth any overhead. Checkpoint also acquires the content lock of

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 13:14, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:10 PM, Simon Riggs wrote: As Jeff points out, the blocks being modified would be locked until space is freed up. Which could make other users wait. The code required to avoid that wait would be

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Well, PANIC is certainly bad, but what I'm suggesting is that we just focus on getting that down to ERROR and not worry about trying to get out of the disk-shortage situation automatically. Nor do I believe that it's such a good idea to have the database

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Andres Freund
On 2014-01-21 21:42:19 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 19:45:19 -0500, Tom Lane wrote: I don't think that's a comparable case. Incomplete actions are actions to be taken immediately, and which the replayer then has to complete somehow if

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 21:42:19 -0500, Tom Lane wrote: Uh, what? The behavior I'm talking about is *exactly the same* as what happens now. The only change is that the data sent to the WAL file is laid out a bit differently, and the replay logic has to

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Simon Riggs
On 22 January 2014 14:25, Simon Riggs si...@2ndquadrant.com wrote: On 22 January 2014 13:14, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:10 PM, Simon Riggs wrote: As Jeff points out, the blocks being modified would be locked until space is freed up. Which could make

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-22 Thread Jim Nasby
On 1/21/14, 6:46 PM, Andres Freund wrote: On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the Redesigning checkpoint_segments thread, many people opined that there should be a hard limit on the amount of disk space used for WAL:

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. I don't see we need to prevent WAL

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 21 January 2014 18:35, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Greg Stark
Fwiw I think all transactions lock up until space appears is *much* better than PANICing. Often disks fill up due to other transient storage or people may have options to manually increase the amount of space. it's much better if the database just continues to function after that rather than need

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Greg Stark st...@mit.edu writes: Fwiw I think all transactions lock up until space appears is *much* better than PANICing. Often disks fill up due to other transient storage or people may have options to manually increase the amount of space. it's much better if the database just continues to

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Jeff Janes
On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: My preference would be that we simply start failing writes with ERRORs rather than PANICs. I'm not real sure ATM why this has to be a PANIC condition. Probably the cause is that it's

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Peter Geoghegan
On Tue, Jan 21, 2014 at 3:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe we could get some mileage out of the fact that very approximate techniques would be good enough. For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of future WAL space always

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 18:24:39 -0500, Tom Lane wrote: Jeff Janes jeff.ja...@gmail.com writes: On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: My preference would be that we simply start failing writes with ERRORs rather than PANICs. I'm not real sure ATM why this has to be a

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 18:24:39 -0500, Tom Lane wrote: Maybe we could get some mileage out of the fact that very approximate techniques would be good enough. For instance, I doubt anyone would bleat if the system insisted on having 10MB or even 100MB of

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 18:59:13 -0500, Tom Lane wrote: Another thing to think about is whether we couldn't put a hard limit on WAL record size somehow. Multi-megabyte WAL records are an abuse of the design anyway, when you get right down to it. So for example maybe we could split up commit records,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Simon Riggs
On 21 January 2014 23:01, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Jan 21, 2014 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 6 June 2013 16:00, Heikki Linnakangas hlinnakan...@vmware.com wrote: The current situation is that if you run

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-22 01:18:36 +0100, Simon Riggs wrote: My understanding is that if it runs out of buffer space while in an XLogInsert, it will be holding one or more buffer content locks exclusively, and unless it can complete the xlog (or scrounge up the info to return that buffer to its

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 18:59:13 -0500, Tom Lane wrote: Another thing to think about is whether we couldn't put a hard limit on WAL record size somehow. Multi-megabyte WAL records are an abuse of the design anyway, when you get right down to it. So for

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: How are we supposed to wait while e.g. ProcArrayLock? Aborting transactions doesn't work either, that writes abort records which can get significantly large. Yeah, that's an interesting point ;-). We can't *either* commit or abort without emitting

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Peter Geoghegan
On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar has to be very high when we're considering the trade-off between taking some emergency precaution

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 19:23:57 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 18:59:13 -0500, Tom Lane wrote: Another thing to think about is whether we couldn't put a hard limit on WAL record size somehow. Multi-megabyte WAL records are an abuse of the design

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 19:23:57 -0500, Tom Lane wrote: I'm not suggesting that we stop providing that information! I'm just saying that we perhaps don't need to store it all in one WAL record, if instead we put the onus on WAL replay to be able to

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote: On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com wrote: I personally think this isn't worth complicating the code for. You're probably right. However, I don't see why the bar has to be very high when we're considering

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Andres Freund
On 2014-01-21 19:45:19 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 19:23:57 -0500, Tom Lane wrote: I'm not suggesting that we stop providing that information! I'm just saying that we perhaps don't need to store it all in one WAL record, if instead we

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-01-21 19:45:19 -0500, Tom Lane wrote: I don't think that's a comparable case. Incomplete actions are actions to be taken immediately, and which the replayer then has to complete somehow if it doesn't find the rest of the action in the WAL

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-22 Thread Bruce Momjian
On Mon, Jun 10, 2013 at 07:28:24AM +0800, Craig Ringer wrote: (I'm still learning the details of Pg's WAL, WAL replay and recovery, so the below's just my understanding): The problem is that WAL for all tablespaces is mixed together in the archives. If you lose your tablespace then you have

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-17 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes: I suspect that there are actually only about 5 or 6 common ways to do archiving (say, local, NFS, scp, rsync, S3, ...). There's no reason why we can't fully specify and/or script what to do in each of these cases. And provide either fully reliable

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Sat, Jun 8, 2013 at 10:36 AM, MauMau maumau...@gmail.com wrote: Yes, I feel designing reliable archiving, even for the simplest case - copy WAL to disk, is very difficult. I know there are following three problems if you just follow the PostgreSQL manual. Average users won't notice them.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Claudio Freire
On Wed, Jun 12, 2013 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: I hope PostgreSQL will provide a reliable archiving facility that is ready to use. +1. I think we should have a way to set an archive DIRECTORY, rather than an archive command. And if you set it, then PostgreSQL

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Tatsuo Ishii
On Sat, Jun 8, 2013 at 10:36 AM, MauMau maumau...@gmail.com wrote: Yes, I feel designing reliable archiving, even for the simplest case - copy WAL to disk, is very difficult. I know there are following three problems if you just follow the PostgreSQL manual. Average users won't notice them.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Magnus Hagander
On Jun 12, 2013 4:56 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jun 8, 2013 at 10:36 AM, MauMau maumau...@gmail.com wrote: Yes, I feel designing reliable archiving, even for the simplest case - copy WAL to disk, is very difficult. I know there are following three problems if you

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Sat, Jun 8, 2013 at 7:20 PM, Jeff Janes jeff.ja...@gmail.com wrote: If archiving is on and failure is due to no space, could we just keep trying XLogFileInit again for a couple minutes to give archiving a chance to do its things? Doing that while holding onto locks and a critical section

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Wed, Jun 12, 2013 at 11:32 AM, Magnus Hagander mag...@hagander.net wrote: Wouldn't that encourage people to do local archiving, which is almost always a bad idea? Maybe, but refusing to improve the UI because people might then use the feature seems wrong-headed. I'd rather improve the

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Peter Eisentraut
On 6/12/13 10:55 AM, Robert Haas wrote: But it's got to be pretty common to archive to a local path that happens to be a remote mount, or to a local directory whose contents are subsequently copied off by a batch job. Making that work nicely with near-zero configuration would be a significant

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Robert Haas
On Wed, Jun 12, 2013 at 12:07 PM, Peter Eisentraut pete...@gmx.net wrote: On 6/12/13 10:55 AM, Robert Haas wrote: But it's got to be pretty common to archive to a local path that happens to be a remote mount, or to a local directory whose contents are subsequently copied off by a batch job.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Joshua D. Drake
On 06/12/2013 08:49 AM, Robert Haas wrote: Sure, remote archiving is great, and I'm glad you've been working on it. In general, I think that's a cleaner approach, but there are still enough people using archive_command that we can't throw them under the bus. Correct. I guess archiving

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-12 Thread Claudio Freire
On Wed, Jun 12, 2013 at 6:03 PM, Joshua D. Drake j...@commandprompt.com wrote: Right now you have to be a rocket scientist no matter what configuration you're running. This is quite a bit overblown. Assuming your needs are simple. Archiving is, as it is now, a relatively simple process to

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread MauMau
From: Craig Ringer cr...@2ndquadrant.com The problem is that WAL for all tablespaces is mixed together in the archives. If you lose your tablespace then you have to keep *all* WAL around and replay *all* of it again when the tablespace comes back online. This would be very inefficient, would

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
Josh, Daniel, Right now, what we're telling users is You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there's no packages for, or you might accidentally cause your database

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:07 AM, Joshua D. Drake j...@commandprompt.com wrote: On 06/08/2013 07:36 AM, MauMau wrote: 1. If the machine or postgres crashes while archive_command is copying a WAL file, later archive recovery fails. This is because cp leaves a file of less than 16MB in archive

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Daniel Farina
On Mon, Jun 10, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote: Anyway, what I'm pointing out is that this is a business decision, and there is no way that we can make a decision for the users what to do when we run out of WAL space. And that the stop archiving option needs to be there

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
Daniel, Jeff, I don't doubt this, that's why I do have a no-op fallback for emergencies. The discussion was about defaults. I still think that drop-wal-from-archiving-whenever is not a good one. Yeah, we can argue defaults for a long time. What would be better is some way to actually

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Daniel Farina
On Mon, Jun 10, 2013 at 4:42 PM, Josh Berkus j...@agliodbs.com wrote: Daniel, Jeff, I don't doubt this, that's why I do have a no-op fallback for emergencies. The discussion was about defaults. I still think that drop-wal-from-archiving-whenever is not a good one. Yeah, we can argue

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Joshua D. Drake
On 06/10/2013 04:42 PM, Josh Berkus wrote: Actually we describe what archive_command needs to fulfill, and tell them to use something that accomplishes that. The example with cp is explicitly given as an example, not a recommendation. If we offer cp as an example, we *are* recommending it.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-10 Thread Josh Berkus
Not a bad idea. One that supports rsync and another that supports robocopy. That should cover every platform we support. Example script: = #!/usr/bin/env bash # Simple script to copy WAL archives from one server to another # to be called as archive_command (call

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Craig Ringer
On 06/09/2013 08:32 AM, MauMau wrote: - Failure of a disk containing data directory or tablespace If checkpoint can't write buffers to disk because of disk failure, checkpoint cannot complete, thus WAL files accumulate in pg_xlog/. This means that one disk failure will lead to postgres

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Craig Ringer
On 06/09/2013 03:02 AM, Jeff Janes wrote: It would be nice to have the ability to specify multiple log destinations with different log_min_messages for each one. I'm sure syslog already must implement some kind of method for doing that, but I've been happy enough with the text logs that I've

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Andres Freund
On 2013-06-08 13:26:56 -0700, Joshua D. Drake wrote: At the points where the XLogInsert()s happens we're in critical sections out of which we *cannot* ERROR out because we already may have made modifications that cannot be allowed to be performed partially/unlogged. That's why we're throwing a

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread MauMau
From: Craig Ringer cr...@2ndquadrant.com On 06/09/2013 08:32 AM, MauMau wrote: - Failure of a disk containing data directory or tablespace If checkpoint can't write buffers to disk because of disk failure, checkpoint cannot complete, thus WAL files accumulate in pg_xlog/. This means that one

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-09 Thread Craig Ringer
On 06/10/2013 06:39 AM, MauMau wrote: The problem is that the reliability of the database system decreases with more disks, because failure of any one of those disks would result in a database PANIC shutdown More specifically, with more independent sets of disks / file systems. I'd rather

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: Daniel Farina dan...@heroku.com On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote: Right now, what we're telling users is You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/07/2013 12:14 PM, Josh Berkus wrote: Right now, what we're telling users is You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there's no packages for, or you might

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/08/2013 07:36 AM, MauMau wrote: 1. If the machine or postgres crashes while archive_command is copying a WAL file, later archive recovery fails. This is because cp leaves a file of less than 16MB in archive area, and postgres refuses to start when it finds such a small archive WAL file.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/06/2013 07:52 AM, Heikki Linnakangas wrote: I think it can be made fairly robust otherwise, and the performance impact should be pretty easy to measure with e.g pgbench. Once upon a time in a land far, far away, we expected users to manage their own systems. We had things like soft and

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote: To me, a more pragmatic approach makes sense. Obviously having some kind of code that checks the space makes sense but I don't know that it needs to be around any operation other than we are creating a segment. What do we care why the

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-07 12:02:57 +0300, Heikki Linnakangas wrote: On 07.06.2013 00:38, Andres Freund wrote: On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote: The archive command can be made a shell script (or for that matter a compiled program) which can do anything it wants upon failure, including emailing people. You're talking about using external tools -- frequently

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:15 AM, Joshua D. Drake j...@commandprompt.com wrote: On 06/06/2013 07:52 AM, Heikki Linnakangas wrote: I think it can be made fairly robust otherwise, and the performance impact should be pretty easy to measure with e.g. pgbench. Once upon a time in a land far, far

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake
On 06/08/2013 11:27 AM, Andres Freund wrote: On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote: To me, a more pragmatic approach makes sense. Obviously having some kind of code that checks the space makes sense but I don't know that it needs to be around any operation other than we are

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Simon Riggs
On 7 June 2013 10:02, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 07.06.2013 00:38, Andres Freund wrote: On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:27 AM, Andres Freund and...@2ndquadrant.com wrote: You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. At the points where the XLogInsert()s happens we're in

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: Joshua D. Drake j...@commandprompt.com On 06/08/2013 07:36 AM, MauMau wrote: 3. You cannot know the reason of archive_command failure (e.g. archive area full) if you don't use PostgreSQL's server logging. This is because archive_command failure is not logged in syslog/eventlog. Wait,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: Joshua D. Drake j...@commandprompt.com On 06/08/2013 11:27 AM, Andres Freund wrote: You know, the PANIC isn't there just because we like to piss off users. There's actual technical reasons that don't just go away by judging the PANIC as stupid. Yes I know we aren't trying to piss off

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau
From: Josh Berkus j...@agliodbs.com There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it up. - Archiving: archiving is failing or

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/06/2013 10:00 PM, Heikki Linnakangas wrote: I've seen a case, where it was even worse than a PANIC and shutdown. pg_xlog was on a separate partition that had nothing else on it. The partition filled up, and the system shut down with a PANIC. Because there was no space left, it could not

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/08/2013 10:57 AM, Daniel Farina wrote: At which point most sensible users say no thanks, I'll use something else. [snip] I have a clear bias in experience here, but I can't relate to someone who sets up archives but is totally okay losing a segment unceremoniously, because it only

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 07.06.2013 00:38, Andres Freund wrote: On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Bernd Helmle
--On 6. Juni 2013 16:25:29 -0700 Josh Berkus j...@agliodbs.com wrote: Archiving - In some ways, this is the simplest case. Really, we just need a way to know when the available WAL space has become 90% full, and abort archiving at that stage. Once we stop attempting to archive, we

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 06.06.2013 17:00, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert, heap_update, etc., and all similar indexam entry points.

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 06.06.2013 17:00, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Heikki Linnakangas
On 07.06.2013 19:33, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 06.06.2013 17:00, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: On 07.06.2013 19:33, Tom Lane wrote: Not only is that a horrible layering/modularity violation, but surely LockBuffer can have no idea how much WAL space will be needed. It can be just a conservative guess, like, 32KB. That should be enough

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Josh Berkus
I would oppose that as the solution, either an unconditional one, or configurable with is it as the default. Those segments are not unneeded. I need them. That is why I set up archiving in the first place. If you need to shut down the database rather than violate my established retention

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-07 Thread Daniel Farina
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote: Right now, what we're telling users is You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there's no

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Andres Freund
On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert, heap_update, etc., and all similar indexam entry

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Heikki Linnakangas
On 06.06.2013 17:17, Andres Freund wrote: On 2013-06-06 17:00:30 +0300, Heikki Linnakangas wrote: A more workable idea is to sprinkle checks in higher-level code, before you hold any critical locks, to check that there is enough preallocated WAL. Like, at the beginning of heap_insert,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Christian Ullrich
* Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow stop new WAL insertions from happening, before it's too late. A naive idea is to check if

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Andres Freund
On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow stop new WAL insertions from happening,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Greg Stark
On Thu, Jun 6, 2013 at 10:38 PM, Andres Freund and...@2ndquadrant.com wrote: That's not a bad technique. I wonder how reliable it would be in postgres. Do all filesystems allow a rename() to succeed if there isn't actually any space left? E.g. on btrfs I wouldn't be sure. We need to rename

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Josh Berkus
Let's talk failure cases. There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it up. - Archiving: archiving is failing or too slow,

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Jaime Casanova
On Thu, Jun 6, 2013 at 4:28 PM, Christian Ullrich ch...@chrullrich.net wrote: * Heikki Linnakangas wrote: The current situation is that if you run out of disk space while writing WAL, you get a PANIC, and the server shuts down. That's awful. We can So we need to somehow stop new WAL

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Jeff Janes
On Thursday, June 6, 2013, Josh Berkus wrote: Let's talk failure cases. There's actually three potential failure cases here: - One Volume: WAL is on the same volume as PGDATA, and that volume is completely out of space. - XLog Partition: WAL is on its own partition/volume, and fills it

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Joshua D. Drake
On 06/06/2013 09:30 PM, Jeff Janes wrote: Archiving - In some ways, this is the simplest case. Really, we just need a way to know when the available WAL space has become 90% full, and abort archiving at that stage. Once we stop attempting to archive, we can

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-06 Thread Daniel Farina
On Thu, Jun 6, 2013 at 9:30 PM, Jeff Janes jeff.ja...@gmail.com wrote: I would oppose that as the solution, either an unconditional one, or configurable with is it as the default. Those segments are not unneeded. I need them. That is why I set up archiving in the first place. If you need