Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-27 Thread Simon Riggs
On Tue, 2005-07-26 at 19:15 -0400, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: We should run tests with much higher wal_buffers numbers to nullify the effect described above and reduce contention. That way we will move towards the log disk speed being the limiting factor, patch or

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-27 Thread Josh Berkus
Simon, I guess I'd be concerned that the poor bgwriter can't do all of this work. I was thinking about a separate log writer, so we could have both bgwriter and logwriter active simultaneously on I/O. It has taken a while to get bgwriter to perform its duties efficiently, so I'd rather not

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Simon Riggs
On Fri, 2005-07-22 at 19:11 -0400, Tom Lane wrote: Hmm. Eyeballing the NOTPM trace for cases 302912 and 302909, it sure looks like the post-checkpoint performance recovery is *slower* in the latter. And why is 302902 visibly slower overall than 302905? I thought for a bit that you had gotten

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Josh Berkus
Simon, We should run tests with much higher wal_buffers numbers to nullify the effect described above and reduce contention. That way we will move towards the log disk speed being the limiting factor, patch or no patch. I've run such tests, at a glance they do seem to improve performance. I

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: We should run tests with much higher wal_buffers numbers to nullify the effect described above and reduce contention. That way we will move towards the log disk speed being the limiting factor, patch or no patch. I've run such tests, at a glance they do

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-26 Thread Josh Berkus
Tom, I have no idea whether the DBT benchmarks would benefit at all, but given that they are affected positively by increasing wal_buffers, they must have a fair percentage of not-small transactions. Even if they don't, we'll have series tests for DW here at GreenPlum soon, and I'll bet

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-25 Thread Mark Wong
On Fri, 22 Jul 2005 19:11:36 -0400 Tom Lane [EMAIL PROTECTED] wrote: BTW, I'd like to look at 302906, but its [Details] link is broken. Ugh, I tried digging onto the internal systems and it looks like they were destroyed (or not saved) somehow. It'll have to be rerun. Sorry... Mark

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, This will remove just the CRC calculation work associated with backed-up pages.  Note that any attempt to recover from the WAL will fail, but I assume you don't need that for the purposes of the test run. Looks like the CRC calculation work isn't the issue. I did test runs of no-CRC

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Did you test with full_page_writes on and off? --- Josh Berkus wrote: Tom, This will remove just the CRC calculation work associated with backed-up pages. ?Note that any attempt to recover from the WAL will fail, but

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Bruce, Did you test with full_page_writes on and off? I didn't use your full_page_writes version because Tom said it was problematic. This is CVS from July 3rd. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Josh Berkus wrote: Bruce, Did you test with full_page_writes on and off? I didn't use your full_page_writes version because Tom said it was problematic. This is CVS from July 3rd. I think we need those tests run. -- Bruce Momjian| http://candle.pha.pa.us

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Bruce, I think we need those tests run. Sure. What CVS day should I grab? What's the option syntax? ( -c full_page_writes=false)? I have about 20 tests in queue right now but can stack yours up behind them. -- --Josh Josh Berkus Aglio Database Solutions San Francisco

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Bruce Momjian
Josh Berkus wrote: Bruce, I think we need those tests run. Sure. What CVS day should I grab? What's the option syntax? ( -c full_page_writes=false)? Yes. You can grab any from the day Tom fixed it, which was I think two weeks ago. I have about 20 tests in queue right now but

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Greg Stark
Josh Berkus josh@agliodbs.com writes: I think this test run http://khack.osdl.org/stp/302903/results/0/, with a 30-min checkpoint shows pretty clearly that the behavior of the performance drop is consistent with needing to re-prime the WAL will full page images. Each checkpoint drops

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Looks like the CRC calculation work isn't the issue. I did test runs of no-CRC vs. regular DBT2 with different checkpoint timeouts, and didn't discern any statistical difference. See attached spreadsheet chart (the two different runs are on two

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Bruce, Did you test with full_page_writes on and off? I didn't use your full_page_writes version because Tom said it was problematic. This is CVS from July 3rd. We already know the results: should be equivalent to the hack Josh tried first. So what

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Greg, For any benchmarking to be meaningful you have to set the checkpoint interval to something more realistic. Something like 5 minutes. That way when the final checkpoint cycle isn't completely included in the timing data you'll at least be missing a statistically insignificant portion of

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, Um, where are the test runs underlying this spreadsheet? I don't have a whole lot of confidence in looking at full-run average TPM numbers to discern whether transient dropoffs in TPM are significant or not. Web in the form of: http://khack.osdl.org/stp/#test_number#/ Where

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: For any benchmarking to be meaningful you have to set the checkpoint interval to something more realistic. Something like 5 minutes. That way when the final checkpoint cycle isn't completely included in the timing data you'll at least be missing a

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Um, where are the test runs underlying this spreadsheet? I don't have a whole lot of confidence in looking at full-run average TPM numbers to discern whether transient dropoffs in TPM are significant or not. Web in the form of:

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-22 Thread Josh Berkus
Tom, There's something awfully weird going on here. I was prepared to see no statistically-significant differences, but not multiple cases that seem to be going the wrong direction. There's a lot of variance in the tests. I'm currently running a variance test battery on one machine to

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-21 Thread Josh Berkus
Tom, Josh, I see that all of those runs seem to be using wal_buffers = 8. Have you tried materially increasing wal_buffers (say to 100 or 1000) and/or experimenting with different wal_sync_method options since we fixed the bufmgrlock problem? I am wondering if the real issue is WAL buffer

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: So, now that we know what the performance bottleneck is, how do we fix it? Josh, I see that all of those runs seem to be using wal_buffers = 8. Have you tried materially increasing wal_buffers (say to 100 or 1000) and/or experimenting with different

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-16 Thread Kevin Brown
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I don't think we should care too much about indexes. We can rebuild them...but losing heap sectors means *data loss*. If you're so concerned about *data loss* then none of this will be acceptable to you at all. We are talking about

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-16 Thread Bruce Momjian
I don't think our problem is partial writes of WAL, which we already check, but heap/index page writes, which we currently do not check for partial writes. --- Kevin Brown wrote: Tom Lane wrote: Simon Riggs [EMAIL

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-16 Thread Kevin Brown
Bruce Momjian wrote: I don't think our problem is partial writes of WAL, which we already check, but heap/index page writes, which we currently do not check for partial writes. Hmm...I've read through the thread again and thought about the problem further, and now think I understand what

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 14:45 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I don't think we should care too much about indexes. We can rebuild them...but losing heap sectors means *data loss*. If you're so concerned about *data loss* then none of this will be acceptable to

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 09:34 +0200, Zeugswetter Andreas DAZ SD wrote: The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I can't see that you'd turn off the full-page-write option either. So

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-11 Thread Josh Berkus
Simon, Tom, Will do. Results in a few days. Actually, between the bad patch on the 5th and ongoing STP issues, I don't think I will have results before I leave town.Will e-mail you offlist to give you info to retrieve results. Any chance you'd be able to do this with ext3 and a

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-09 Thread Hannu Krosing
On R, 2005-07-08 at 14:45 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I don't think we should care too much about indexes. We can rebuild them...but losing heap sectors means *data loss*. There might be some merit in idea to disabling WAL/PITR for indexes, where one can

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Zeugswetter Andreas DAZ SD
The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I can't see that you'd turn off the full-page-write option either. So we have not solved anyone's performance problem. Yes, this is basically

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Simon Riggs
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Dawid Kuroczko
On 7/7/05, Bruce Momjian pgman@candle.pha.pa.us wrote: One idea would be to just tie its behavior directly to fsync and remove the option completely (that was the original TODO), or we can adjust it so it doesn't have the same risks as fsync, or the same lack of failure reporting as fsync. I

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Is there also a potential showstopper in the redo machinery? We work on the assumption that the post-checkpoint block is available in WAL as a before image. Redo for all actions merely replay the write action again onto the block. If we must reapply the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Heikki Linnakangas
On Thu, 7 Jul 2005, Tom Lane wrote: We still don't know enough about the situation to know what a solution might look like. Is the slowdown Josh is seeing due to the extra CPU cost of the CRCs, or the extra I/O cost, or excessive locking of the WAL-related data structures while we do this

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Josh Berkus
Tom, Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT and just now --- Bruce's full_page_writes patch introduced a large random negative component into the timing ... Ach. Starting over, then. --Josh -- Josh Berkus Aglio Database Solutions San Francisco

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Simon Riggs
On Fri, 2005-07-08 at 09:47 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Having raised that objection, ISTM that checking for torn pages can be accomplished reasonably well using a few rules... I have zero confidence in this; the fact that you can think of (incomplete,

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I don't think we should care too much about indexes. We can rebuild them...but losing heap sectors means *data loss*. If you're so concerned about *data loss* then none of this will be acceptable to you at all. We are talking about going from a system that

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Wed, 2005-07-06 at 17:17 -0700, Joshua D. Drake wrote: Tom, I think you're the only person that could or would be trusted to make such a change. Even past the 8.1 freeze, I say we need to do something now on this issue. I think if we document full_page_writes as similar to fsync in

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Simon Riggs
On Thu, 2005-07-07 at 00:29 -0400, Bruce Momjian wrote: Tom Lane wrote: Bruno Wolff III [EMAIL PROTECTED] writes: Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't think you can possibly assume that. If the block crosses a cylinder boundary then it's certainly an unsafe assumption,

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Here's an idea: We read the page that we would have backed up, calc the CRC and write a short WAL record with just the CRC, not the block. When we recover we re-read the database page, calc its CRC and compare it with the CRC from the transaction log. If they differ, we know that the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes: Only workable solution would imho be to write the LSN to each 512 byte block (not that I am propagating that idea). We're not doing anything like that, as it would create an impossible space-management problem (or are you happy with limiting

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
I wrote: We still don't know enough about the situation to know what a solution might look like. Is the slowdown Josh is seeing due to the extra CPU cost of the CRCs, or the extra I/O cost, or excessive locking of the WAL-related data structures while we do this stuff, or ???. Need more

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Zeugswetter Andreas DAZ SD
Only workable solution would imho be to write the LSN to each 512 byte block (not that I am propagating that idea). Only workable was a stupid formulation, I meant a solution that works with a LSN. We're not doing anything like that, as it would create an impossible space-management

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote: SCSI tagged queueing certainly allows 512-byte blocks to be reordered during writes. Then a torn-page tell-tale is required that will tell us of any change to any of the 512-byte sectors that make up a block/page. Here's an idea: We read the page that we would have

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Yes, that is a good idea! ... which was shot down in the very next message. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Zeugswetter Andreas DAZ SD wrote: Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't think you can possibly assume that. If the block crosses a cylinder boundary then

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Zeugswetter Andreas DAZ SD [EMAIL PROTECTED] writes: Only workable solution would imho be to write the LSN to each 512 byte block (not that I am propagating that idea). We're not doing anything like that, as it would create an impossible

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: What we *could* do is calculate a page-level CRC and store it in the page header just before writing out. Torn pages would then manifest as a wrong CRC on read. No correction ability, but at least a reliable detection

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: As far as #2, my posted proposal was to write the full pages to WAL when they are written to the file system, and not when they are first modified in the shared buffers --- That is *completely* unworkable. Or were you planning

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I can't see that you'd turn off the full-page-write option either. So we have not

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: The point here is that fsync-off is only realistic for development or playpen installations. You don't turn it off in a production machine, and I can't see that you'd turn off the full-page-write option

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Joshua D. Drake
Just to make my position perfectly clear: I don't want to see this option shipped in 8.1. It's reasonable to have it in there for now as an aid to our performance investigations, but I don't see that it has any value for production. Well, this is the first I am hearing that, and of course

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Joshua D. Drake wrote: Just to make my position perfectly clear: I don't want to see this option shipped in 8.1. It's reasonable to have it in there for now as an aid to our performance investigations, but I don't see that it has any value for production. Well, this is the first I

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Bruce Momjian
Simon Riggs wrote: On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: Well, I added #1 yesterday as 'full_page_writes', and it has the same warnings as fsync (namely, on crash, be prepared to recovery or check your system thoroughly. Yes, which is why I comment now that the GUC

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Josh Berkus
Tom, Josh, is OSDL up enough that you can try another comparison run? Thankfully, yes. If so, please undo the previous patch (which disabled page dumping entirely) and instead try removing this block of code, starting at about xlog.c line 620 in CVS tip: Will do. Results in a few days.

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Kenneth Marshall
On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: What we *could* do is calculate a page-level CRC and store it in the page header just before writing out. Torn pages would then manifest as a wrong CRC on

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-07 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: If so, please undo the previous patch (which disabled page dumping entirely) and instead try removing this block of code, starting at about xlog.c line 620 in CVS tip: Will do. Results in a few days. Great. BTW, don't bother testing snapshots between

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Simon Riggs
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Uh, what exactly did you cut out? I suggested dropping the dumping of full page images, but not removing CRCs altogether ... Attached is the patch I used. OK, thanks for the clarification. So it

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruce Momjian
Simon Riggs wrote: On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Uh, what exactly did you cut out? I suggested dropping the dumping of full page images, but not removing CRCs altogether ... Attached is the patch I used. OK, thanks for

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Simon Riggs
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: Well, I added #1 yesterday as 'full_page_writes', and it has the same warnings as fsync (namely, on crash, be prepared to recovery or check your system thoroughly. Yes, which is why I comment now that the GUC alone is not enough. There

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Joshua D. Drake
Tom, I think you're the only person that could or would be trusted to make such a change. Even past the 8.1 freeze, I say we need to do something now on this issue. I think if we document full_page_writes as similar to fsync in risk, we are OK for 8.1, but if something can be done easily, it

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Oliver Jowett
Simon Riggs wrote: I agree we *must* have the GUC, but we also *must* have a way for crash recovery to tell us for certain that it has definitely worked, not just maybe worked. Doesn't the same argument apply to the existing fsync = off case? i.e. we already have a case where we don't provide

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote: Well, I added #1 yesterday as 'full_page_writes', and it has the same warnings as fsync (namely, on crash, be prepared to recovery or check your system thoroughly. Yes, which is why I comment now

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruno Wolff III
On Wed, Jul 06, 2005 at 21:48:44 +0100, Simon Riggs [EMAIL PROTECTED] wrote: We could implement the torn-pages option, but that seems a lot of work. Another way of implementing a tell-tale would be to append the LSN again as a data page trailer as the last 4 bytes of the page. Thus the LSN

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes: Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't think you can possibly assume that. If the block crosses a cylinder boundary then it's

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-06 Thread Bruce Momjian
Tom Lane wrote: Bruno Wolff III [EMAIL PROTECTED] writes: Are you sure about that? That would probably be the normal case, but are you promised that the hardware will write all of the sectors of a block in order? I don't think you can possibly assume that. If the block crosses a

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Greg Stark
Bruce Momjian pgman@candle.pha.pa.us writes: I have an idea! Currently we write the backup pages (copies of pages modified since the last checkpoint) when we write the WAL changes as part of the commit. See the XLogCheckBuffer() call in XLogInsert(). Can someone explain exactly what the

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Russell Smith
On Sun, 3 Jul 2005 04:47 pm, Greg Stark wrote: Bruce Momjian pgman@candle.pha.pa.us writes: I have an idea! Currently we write the backup pages (copies of pages modified since the last checkpoint) when we write the WAL changes as part of the commit. See the XLogCheckBuffer() call in

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: Can someone explain exactly what the problem being defeated by writing whole pages to the WAL log? Partial writes. Without the full-page image, we do not have enough information in WAL to reconstruct the correct page contents. A further optimization would

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: Can someone explain exactly what the problem being defeated by writing whole pages to the WAL log? Partial writes. Without the full-page image, we do not have enough information in WAL to reconstruct the correct

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-03 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Partial writes. Without the full-page image, we do not have enough information in WAL to reconstruct the correct page contents. Sure, but why not? If a 8k page contains 16 low level segments on disk and the old data is

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-02 Thread Bruce Momjian
Tom Lane wrote: Josh Berkus josh@agliodbs.com writes: Uh, what exactly did you cut out? I suggested dropping the dumping of full page images, but not removing CRCs altogether ... Attached is the patch I used. OK, thanks for the clarification. So it does seem that dumping full page

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-07-01 Thread Qingqing Zhou
Magnus Hagander [EMAIL PROTECTED] writes FWIW, MSSQL deals with this using Torn Page Detection. This is off by default (no check at all!), but can be abled on a per-database level. Note that it only *detects* torn pages. If it finds one, it won't start and tell you to recover from backup. It

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Magnus Hagander
2. Think of a better defense against partial-page writes. I like #2, or would if I could think of a better defense. Ideas anyone? FWIW, MSSQL deals with this using Torn Page Detection. This is off by default (no check at all!), but can be abled on a per-database level. Note that it only

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Josh Berkus
Tom, Database pages. The current theory is that we can completely reconstruct from WAL data every page that's been modified since the last checkpoint. So the first write of any page after a checkpoint dumps a full image of the page into WAL; subsequent writes only write differences. What

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: What I'm confused about is that this shouldn't be anything new for 8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0 does, and it's pretty much entirely due to this check. That's simply not believable --- better recheck your analysis.

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-30 Thread Josh Berkus
Tom, What I'm confused about is that this shouldn't be anything new for 8.1. Yet 8.1 has *worse* performance on the STP machines than 8.0 does, and it's pretty much entirely due to this check. That's simply not believable --- better recheck your analysis. If 8.1 is worse it's not

[HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, All: Ok, finally managed though the peristent efforts of Mark Wong to get some tests through. Here are two tests with the CRC and wall buffer checking completely cut out of the code, as Tom suggested: 5-min checkpoint: http://khack.osdl.org/stp/302738/results/0/

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Ok, finally managed though the peristent efforts of Mark Wong to get some tests through. Here are two tests with the CRC and wall buffer checking completely cut out of the code, as Tom suggested: Uh, what exactly did you cut out? I suggested dropping

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, Uh, what exactly did you cut out? I suggested dropping the dumping of full page images, but not removing CRCs altogether ... Attached is the patch I used. (it's a -Urn patch 'cause that's what STP takes) -- --Josh Josh Berkus Aglio Database Solutions San Francisco diff -urN

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Uh, what exactly did you cut out? I suggested dropping the dumping of full page images, but not removing CRCs altogether ... Attached is the patch I used. OK, thanks for the clarification. So it does seem that dumping full page images is a pretty big

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: 1. Offer a GUC to turn off full-page-image dumping, which you'd use only if you really trust your hardware :-( Are these just WAL pages? Or database pages as well? Database pages. The current theory is that we can completely reconstruct from WAL data

Re: [HACKERS] Checkpoint cost, looks like it is WAL/CRC

2005-06-29 Thread Josh Berkus
Tom, 1. Offer a GUC to turn off full-page-image dumping, which you'd use only if you really trust your hardware :-( Are these just WAL pages? Or database pages as well? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of