Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Tom Lane wrote:
> Joe Conway writes:
>> Maybe specify an archive location (that of course could be on a separate partition) that the external archiver should check in addition to the normal WAL location. At some predetermined interval, push WAL log segments no longer needed to the archive location.
>
> Does that really help? The panic happens when you fill the normal and archive partitions, how's that different from one partition?

I see your point. But it would allow you to use a relatively modest local partition for WAL segments, while you might be using a 1TB netapp tray over NFS for the archive segments.

I guess if the archive partition fills up, I would err on the side of dropping archive segments on the floor. That would mean a new full backup would be needed, but at least it wouldn't result in a corrupt, or shut down, database.

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Simon Riggs writes:
> Strict behaviour is fairly straightforward, you just PANIC!

There is another mode possible as well. Oracle, for example, neither panics nor continues; it just freezes. It keeps retrying the transaction until it finds it has space. The sysadmin or DBA just has to create additional space somehow, by removing old files or otherwise, and the database will continue where it left off. That seems a bit nicer than panicking.

When I first heard that I was shocked. It means implementing archive logs *created* a new failure mode where there was none before. I thought that was the dumbest idea in the world: who needed a backup process that increased the chances of an outage? Now I can see the logic, but I'm still not sure which mode I would pick if it was up to me. As others have said, I guess it would depend on the situation.

-- greg
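A minimal sketch of the freeze-and-retry mode described above, assuming a simple polling loop. The function name wait_for_space, its parameters, and the polling interval are hypothetical illustrations; this is not how Oracle or PostgreSQL implement it.

```python
import shutil
import time


def wait_for_space(path, needed_bytes, poll_seconds=5.0, max_polls=None,
                   free_bytes=lambda p: shutil.disk_usage(p).free,
                   sleep=time.sleep):
    """Block until `path` has at least `needed_bytes` free.

    Returns the number of polls that found too little space. With
    max_polls=None the call waits indefinitely, which is the "freeze"
    behaviour; a finite max_polls raises TimeoutError instead.
    free_bytes and sleep are injectable so the loop can be tested.
    """
    polls = 0
    while free_bytes(path) < needed_bytes:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"no space on {path} after {polls} polls")
        polls += 1
        sleep(poll_seconds)  # the admin is expected to free space meanwhile
    return polls
```

The write is simply not attempted until the check passes, so nothing is dropped and nothing panics; the cost is that every writer stalls for as long as the disk stays full.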
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Bruce Momjian wrote:
> Simon Riggs wrote:
>> User-selectable behaviour? OK. That's how we deal with fsync; I can relate to that. That hadn't been part of my thinking because of the importance I'd attached to the log files themselves, but I can go with that, if that's what was meant.
>>
>> So, if we had a parameter called Wal_archive_policy that has 3 settings:
>> None = no archiving
>> Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
>> Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.
>>
>> That way, we've got something akin to transaction isolation level with various levels of protection.
>
> Yep, we will definitely need something like that. Basically whenever the logs are being archived, you have to stop the database if you can't archive, no?

That certainly was my initial feeling, though I believe it is possible to accommodate both viewpoints. I would not want to have only the alternative viewpoint, I must confess.

Best Regards, Simon Riggs
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Joe Conway wrote:
> Simon Riggs wrote:
>> Tom Lane wrote:
>>> That should be user-scriptable policy, in my worldview.
>>
>> O... and other dbms will freeze when this situation is hit, rather than continue and drop archive logs.]
>
> Been there, done that, don't see how it's any better. I hesitate to be real specific here, but let's just say the end result was restore from backup :-(
>
>> So, if we had a parameter called Wal_archive_policy that has 3 settings:
>> None = no archiving
>> Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
>> Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.
>
> That sounds good to me. For the Optimistic case, we need to yell loudly if we do find ourselves needing to drop segments. For the Strict case, we just need to be sure it works correctly ;-)

Good.

Yell loudly really needs to happen sometime earlier, which is, as Gavin originally thought, something to do with tablespaces.

Strict behaviour is fairly straightforward, you just PANIC!

I'd think we could rename these to
Fail Operational rather than Optimistic
Fail Safe rather than Strict
...the other names were a bit like I'm right and but I'll do yours too ;}

Best Regards, Simon Riggs
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Simon Riggs wrote:
> Bruce Momjian wrote:
>> Simon Riggs wrote:
>>> User-selectable behaviour? OK. That's how we deal with fsync; I can relate to that. That hadn't been part of my thinking because of the importance I'd attached to the log files themselves, but I can go with that, if that's what was meant.
>>>
>>> So, if we had a parameter called Wal_archive_policy that has 3 settings:
>>> None = no archiving
>>> Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
>>> Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.
>>>
>>> That way, we've got something akin to transaction isolation level with various levels of protection.
>>
>> Yep, we will definitely need something like that. Basically whenever the logs are being archived, you have to stop the database if you can't archive, no?
>
> That certainly was my initial feeling, though I believe it is possible to accommodate both viewpoints. I would not want to have only the alternative viewpoint, I must confess.

Added to PITR TODO list. Anything else to add:

http://momjian.postgresql.org/main/writings/pgsql/project

--
Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Tom Lane wrote:
> Joe Conway writes:
>> Tom Lane wrote:
>>> Joe Conway writes:
>>>> Maybe specify an archive location (that of course could be on a separate partition) that the external archiver should check in addition to the normal WAL location. At some predetermined interval, push WAL log segments no longer needed to the archive location.
>>>
>>> Does that really help? The panic happens when you fill the normal and archive partitions, how's that different from one partition?
>>
>> I see your point. But it would allow you to use a relatively modest local partition for WAL segments, while you might be using a 1TB netapp tray over NFS for the archive segments.
>
> Fair enough, but it seems to me that that sort of setup really falls in the category of a user-defined archiving process --- that is, the hook that Postgres calls will push WAL segments from the local partition to the NFS server, and then pushing them off NFS to tape is the responsibility of some other user-defined subprocess. Database panic happens if and only if the local partition overflows. I don't see that making Postgres explicitly aware of the secondary NFS arrangement will buy anything.

Tom's last sentence there summarises the design I was working with. I had considered Joe's suggested approach (which was Oracle's also). However, the PITR design will come with a usable low-function program which can easily copy logs from pg_xlog to another archive directory. That's needed as a test harness anyway, so it may as well be part of the package. You'd be able to use that in production to copy xlogs to another larger directory as a staging area to tape/failover on another system: effectively Joe's idea is catered for in the basic package.

Anyway I'm answering questions before publishing the design as it stands...though people do keep spurring me to refine it as I'm writing it down! That's why it's good to document it I guess.

>> I guess if the archive partition fills up, I would err on the side of dropping archive segments on the floor.
>
> That should be user-scriptable policy, in my worldview.

Hmmm. Very difficult that one.

My experience is in commercial systems. Dropping archive segments on the floor is just absolutely NOT GOOD, if that is the only behaviour. The whole purpose of having a dbms is so that you can protect your business data, while using it. Such behaviour would most likely be a barrier to wider commercial adoption. [Oracle and other dbms will freeze when this situation is hit, rather than continue and drop archive logs.]

User-selectable behaviour? OK. That's how we deal with fsync; I can relate to that. That hadn't been part of my thinking because of the importance I'd attached to the log files themselves, but I can go with that, if that's what was meant.

So, if we had a parameter called Wal_archive_policy that has 3 settings:
None = no archiving
Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.

That way, we've got something akin to transaction isolation level with various levels of protection.

Best Regards, Simon Riggs
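To make the three proposed settings concrete, here is a small sketch of how such a policy might be dispatched when log space runs out. Only the setting names None, Optimistic and Strict come from the proposal above; the WalArchivePolicy class, the on_log_space_exhausted function, the action strings, and the assumption that segment names sort oldest-first are hypothetical.

```python
from enum import Enum


class WalArchivePolicy(Enum):
    NONE = "none"              # no archiving
    OPTIMISTIC = "optimistic"  # archive, but drop oldest logs if space runs out
    STRICT = "strict"          # stop write transactions if space runs out


def on_log_space_exhausted(policy, unarchived_segments):
    """Return (action, segments_to_drop) for an out-of-log-space event.

    unarchived_segments holds names of segments still waiting to be
    archived; names are assumed to sort oldest-first.
    """
    if policy is WalArchivePolicy.NONE:
        # Nothing is waiting on an archiver, so segments are just recycled.
        return ("recycle", [])
    if policy is WalArchivePolicy.OPTIMISTIC:
        # Sacrifice the oldest unarchived segment and make noise about it.
        return ("drop_oldest_and_warn", sorted(unarchived_segments)[:1])
    # STRICT: keep every segment and refuse further commits.
    return ("stop_writes", [])
```

For example, on_log_space_exhausted(WalArchivePolicy.OPTIMISTIC, ["0003", "0001"]) picks "0001" as the segment to drop, while the Strict setting never returns anything to drop.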
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Joe Conway writes:
> Simon Riggs wrote:
>> O... and other dbms will freeze when this situation is hit, rather than continue and drop archive logs.]
>
> Been there, done that, don't see how it's any better. I hesitate to be real specific here, but let's just say the end result was restore from backup :-(

It's hard for me to imagine a situation in which killing the database would be considered a more attractive option than dropping old log data. You may or may not ever need the old log data, but you darn well do need a functioning database. (If you don't, you wouldn't be going to all this work.)

I think also that Simon completely misunderstood my intent in saying that this could be user-scriptable policy. By that I meant that the *user* could write the code to behave whichever way he liked. Not that we were going to go into a mad rush of feature invention and try to support every combination we could think of. I repeat: code that pushes logs into a secondary area is not ours to write. We should concentrate on providing an API that lets users write it. We have only limited manpower for this project and we need to spend it on getting the core functionality done right, not on inventing frammishes.

regards, tom lane
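As an illustration of the kind of user-written code meant here, the following is a rough sketch of a script that pushes one finished WAL segment into a secondary area. The function name archive_segment, its arguments, and the True/False return convention are assumptions made for this example; the thread does not define the actual hook interface.

```python
import os
import shutil


def archive_segment(segment_path, archive_dir):
    """Copy one finished WAL segment into archive_dir.

    Returns True only when the copy is fully in place, so the caller
    knows the local file may be recycled. What to do on False (retry,
    drop, stop the database) is left to the user's own policy.
    """
    target = os.path.join(archive_dir, os.path.basename(segment_path))
    if os.path.exists(target):
        return False  # never overwrite an already-archived segment
    partial = target + ".partial"
    try:
        shutil.copyfile(segment_path, partial)
        os.replace(partial, target)  # atomic rename within one filesystem
    except OSError:
        # For example, the archive partition is full or the source is gone.
        if os.path.exists(partial):
            os.remove(partial)
        return False
    return True
```

Copying to a temporary name and renaming means a reader of the archive directory never sees a half-written segment, which matters if a second process later moves files from there to tape.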
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Simon Riggs wrote:
> User-selectable behaviour? OK. That's how we deal with fsync; I can relate to that. That hadn't been part of my thinking because of the importance I'd attached to the log files themselves, but I can go with that, if that's what was meant.
>
> So, if we had a parameter called Wal_archive_policy that has 3 settings:
> None = no archiving
> Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
> Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.
>
> That way, we've got something akin to transaction isolation level with various levels of protection.

Yep, we will definitely need something like that. Basically whenever the logs are being archived, you have to stop the database if you can't archive, no?

--
Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Simon Riggs wrote:
> Tom Lane wrote:
>> That should be user-scriptable policy, in my worldview.
>
> O... and other dbms will freeze when this situation is hit, rather than continue and drop archive logs.]

Been there, done that, don't see how it's any better. I hesitate to be real specific here, but let's just say the end result was restore from backup :-(

> So, if we had a parameter called Wal_archive_policy that has 3 settings:
> None = no archiving
> Optimistic = archive, but if for some reason log space runs out then make space by dropping the oldest archive logs
> Strict = if log space runs out, stop further write transactions from committing, by whatever means, even if this takes down dbms.

That sounds good to me. For the Optimistic case, we need to yell loudly if we do find ourselves needing to drop segments. For the Strict case, we just need to be sure it works correctly ;-)

Joe
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Tom Lane wrote:
> I think also that Simon completely misunderstood my intent in saying that this could be user-scriptable policy. By that I meant that the *user* could write the code to behave whichever way he liked. Not that we were going to go into a mad rush of feature invention and try to support every combination we could think of. I repeat: code that pushes logs into a secondary area is not ours to write. We should concentrate on providing an API that lets users write it. We have only limited manpower for this project and we need to spend it on getting the core functionality done right, not on inventing frammishes.

Hmm... I totally agree. I think the backend could just offer a shared memory segment and a marker message to another process to allow copying from it. Then it is the application's business to do the rest.

Of course there has to be a two-way agreement about it, but an API is a real nice thing rather than an application.

Shridhar
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Tom Lane wrote:
> Simon Riggs writes:
>> You're absolutely right about the not-knowing when you're out of space issue. However, if the xlog has been written then it is not desirable, but at least acceptable that the checkpoint/bgwriter cannot complete on an already committed txn. It's not the txn which is getting the error, that's all.
>
> Right. This is in fact not a fatal situation, as long as you don't run out of preallocated WAL space.

...following on also from thoughts on [PERFORM] list...

Clearly running out of pre-allocated WAL space is likely to be the next issue. Running out of space in the first place is likely to be because of an intense workload, which is exactly the thing which also makes you run out of pre-allocated WAL space. Does that make sense?

Best regards, Simon Riggs
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Simon Riggs writes:
> Tom Lane wrote:
>> Right. This is in fact not a fatal situation, as long as you don't run out of preallocated WAL space.
>
> Clearly running out of pre-allocated WAL space is likely to be the next issue. Running out of space in the first place is likely to be because of an intense workload, which is exactly the thing which also makes you run out of pre-allocated WAL space. Does that make sense?

I think one of the first things people would do with tablespaces is stick the data files onto a separate partition from the WAL and clog files. (Actually you can do this today with a simple symlink hack, but tablespaces will make it easier and clearer.) The space usage for WAL is really pretty predictable, because of the checkpoint-at-least-every-N-segments setting. clog is not exactly a space hog either. Once you have that separation established, out-of-disk-space can kill individual transactions but never the database as a whole.

One of the things that bothers me about the present PITR design is that it presumes that individual WAL log segments can be kept until the external archiver process feels like writing them somewhere. If there's no guarantee that that happens within X amount of time, then you can't bound the amount of space needed on the WAL drive, and so you are back facing the possibility of an out-of-WAL-space panic. I suspect that we cannot really do anything about that, but it's annoying. Any bright ideas out there?

regards, tom lane
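A back-of-the-envelope sketch of the space argument above. It assumes 16 MB segments and the rule of thumb from the PostgreSQL documentation of that era that there are normally no more than 2 * checkpoint_segments + 1 segment files; the function names and the workload figures in the usage note are made up for illustration.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default WAL segment size


def wal_bytes_without_archiving(checkpoint_segments):
    """Usual upper bound on WAL disk usage when segments are recycled freely."""
    return (2 * checkpoint_segments + 1) * SEGMENT_BYTES


def wal_bytes_with_archiver_backlog(checkpoint_segments, segments_per_hour,
                                    archiver_lag_hours):
    """Rough estimate once segments must wait for an external archiver.

    Every segment produced while the archiver lags has to stay on the WAL
    drive, so the total grows linearly with the lag and has no fixed bound.
    """
    backlog = segments_per_hour * archiver_lag_hours
    return wal_bytes_without_archiving(checkpoint_segments) + backlog * SEGMENT_BYTES
```

With checkpoint_segments = 3 the first function gives 7 segments, or 112 MB. Add an archiver that is two hours behind a workload producing 10 segments per hour and the estimate rises to 27 segments, or 432 MB, and it keeps rising for as long as the archiver stays behind.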
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was
Tom Lane wrote:
> One of the things that bothers me about the present PITR design is that it presumes that individual WAL log segments can be kept until the external archiver process feels like writing them somewhere. If there's no guarantee that that happens within X amount of time, then you can't bound the amount of space needed on the WAL drive, and so you are back facing the possibility of an out-of-WAL-space panic. I suspect that we cannot really do anything about that, but it's annoying. Any bright ideas out there?

Maybe specify an archive location (that of course could be on a separate partition) that the external archiver should check in addition to the normal WAL location. At some predetermined interval, push WAL log segments no longer needed to the archive location.

Joe
Re: [HACKERS] Out of space situation and WAL log pre-allocation (was Tablespaces)
Joe Conway writes:
> Tom Lane wrote:
>> facing the possibility of an out-of-WAL-space panic. I suspect that we cannot really do anything about that, but it's annoying. Any bright ideas out there?
>
> Maybe specify an archive location (that of course could be on a separate partition) that the external archiver should check in addition to the normal WAL location. At some predetermined interval, push WAL log segments no longer needed to the archive location.

Does that really help? The panic happens when you fill the normal and archive partitions, how's that different from one partition?

regards, tom lane