Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Florian Pflug
On Jan 4, 2012, at 21:27, Robert Haas wrote: I think the first thing we need to look at is increasing the number of CLOG buffers. What became of the idea to treat the stable (i.e. earlier than the oldest active xid) and the unstable (i.e. the rest) parts of the CLOG differently? On 64-bit

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 5:15 AM, Florian Pflug f...@phlo.org wrote: On Jan 4, 2012, at 21:27, Robert Haas wrote: I think the first thing we need to look at is increasing the number of CLOG buffers. What became of the idea to treat the stable (i.e. earlier than the oldest active xid) and the

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 6:15 AM, Florian Pflug f...@phlo.org wrote: On 64-bit machines at least, we could simply mmap() the stable parts of the CLOG into the backend address space, and access it without any locking at all. True. I think this could be done, but it would take some fairly careful
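To make the mmap() idea concrete, here is a minimal Python sketch of reading one transaction's status from a CLOG segment file mapped read-only, with no lock taken. It assumes the default 8 kB BLCKSZ, two status bits per transaction (codes as in clog.h), 32 pages per segment file and four-hex-digit segment names under pg_clog/. The function names are hypothetical, XID wraparound is ignored, and the caller is assumed to map only segments that are already stable.

```python
import mmap

# Assumed PostgreSQL defaults of this era (BLCKSZ = 8192, pg_clog/ segments).
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE          # 32768
SLRU_PAGES_PER_SEGMENT = 32
CLOG_XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT

# 2-bit status codes, in the order used by clog.h.
STATUS_NAMES = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def segment_name(xid: int) -> str:
    """Name of the segment file holding xid, e.g. '0053' (wraparound ignored)."""
    return "%04X" % (xid // CLOG_XACTS_PER_SEGMENT)


def xid_status(segment, xid: int) -> int:
    """Return xid's 2-bit status from a buffer holding its whole segment file.

    `segment` may be bytes, a bytearray, or a read-only mmap: these are plain
    reads with no lock, which is only safe if nothing in the segment changes.
    """
    offset = xid % CLOG_XACTS_PER_SEGMENT
    byte = segment[offset // CLOG_XACTS_PER_BYTE]
    shift = (offset % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return (byte >> shift) & 0x03


def map_stable_segment(path: str) -> mmap.mmap:
    """Map an (assumed stable) segment file read-only into this process."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A backend would map each stable segment once and then call xid_status() on the mapping for any XID it covers; the per-lookup cost is one byte read and a shift.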

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Benedikt Grundmann
For what it's worth, here are the numbers on one of our biggest databases (same system as I posted about separately wrt seq_scan_cost vs random_page_cost):
0053 1001    00BA 1009
0055 1001    00B9 1020
0054  983    00BB 1010
0056 1001    00BC 1019
0069    0    00BD 1009
006A  224    00BE 1018
006B 1009    00BF 1008
006C 1008

Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Kevin Grittner
Benedikt Grundmann bgrundm...@janestreet.com wrote: For what it's worth, here are the numbers on one of our biggest databases (same system as I posted about separately wrt seq_scan_cost vs random_page_cost). That would be an 88.4% hit rate on the summarized data. -Kevin

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Dec 23, 2011, at 2:23 PM, Kevin Grittner wrote: Jeff Janes jeff.ja...@gmail.com wrote: Could we get some major OLTP users to post their CLOG for analysis? I wouldn't think there would be much security/proprietary issues with CLOG data. FWIW, I got the raw numbers to do my quick check

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those numbers wind up meaning that 18% of the 256-byte blocks (1024 transactions each) were all commits.
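The block arithmetic above can be checked with a short Python sketch (not the Ruby script mentioned elsewhere in the thread). It assumes the CLOG layout of this era: two status bits per transaction with 01 meaning committed, so a 256-byte block covering 1024 transactions is "all commits" exactly when every byte equals 0x55, and a full 256 kB segment holds 1024 such blocks. The function names are hypothetical.

```python
# A 256-byte CLOG block holds 1024 two-bit statuses. Committed is 0b01, so a
# block whose transactions all committed consists of bytes 0b01010101 (0x55).
BLOCK_BYTES = 256
XIDS_PER_BLOCK = BLOCK_BYTES * 4      # 1024
ALL_COMMITTED_BYTE = 0x55


def all_commit_blocks(segment: bytes) -> int:
    """Count whole 256-byte blocks in a segment whose xids all committed."""
    all_committed = bytes([ALL_COMMITTED_BYTE]) * BLOCK_BYTES
    count = 0
    for start in range(0, len(segment) - BLOCK_BYTES + 1, BLOCK_BYTES):
        if segment[start:start + BLOCK_BYTES] == all_committed:
            count += 1
    return count


def all_commit_fraction(counts, blocks_per_segment: int = 1024) -> float:
    """Share of blocks that were all commits across several full segments.

    1024 blocks per segment assumes 256 kB segment files (default BLCKSZ).
    """
    return sum(counts) / (len(counts) * blocks_per_segment)
```

The fraction is the "hit rate" a one-bit-per-block summary would achieve on that data: every all-commit block can be answered from the summary without touching CLOG.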

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 3:02 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those numbers wind up

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: 2. The CLOG code isn't designed to manage a large number of buffers, so adding more might cause a performance regression on small systems. On Nate Boley's 32-core system, running pgbench at scale factor 100, the optimal number of buffers seems to

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Jim Nasby
On Jan 4, 2012, at 2:02 PM, Kevin Grittner wrote: Jim Nasby j...@nasby.net wrote: Here's output from our largest OLTP system... not sure exactly how to interpret it, so I'm just providing the raw data. This spans almost exactly 1 month. Those numbers wind up meaning that 18% of the 256-byte

Re: [HACKERS] Page Checksums + Double Writes

2012-01-04 Thread Robert Haas
On Wed, Jan 4, 2012 at 4:02 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: 2. The CLOG code isn't designed to manage a large number of buffers, so adding more might cause a performance regression on small systems. On Nate Boley's 32-core

Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Greg Stark
On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote: I bet if you kept a judicious number of clog pages in each local process with some smart invalidation you could cover enough cases that scribbling the bits down would become unnecessary. I don't understand how any

Re: [HACKERS] Page Checksums + Double Writes

2011-12-28 Thread Merlin Moncure
On Wed, Dec 28, 2011 at 8:45 AM, Greg Stark st...@mit.edu wrote: On Tue, Dec 27, 2011 at 10:43 PM, Merlin Moncure mmonc...@gmail.com wrote: I bet if you kept a judicious number of clog pages in each local process with some smart invalidation you could cover enough cases that scribbling the

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Thu, 2011-12-22 at 03:50 -0600, Kevin Grittner wrote: Now, on to the separate-but-related topic of double-write. That absolutely requires some form of checksum or CRC to detect torn pages, in order for the technique to work at all. Adding a CRC without double-write would work fine if you
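A minimal Python sketch of why double writes need a checksum: after a crash, recovery must tell a torn data page from an intact one. It assumes a hypothetical layout with a CRC-32 (zlib.crc32, standing in for whatever checksum would be chosen) in the first 4 bytes of each 8 kB page, models files as dicts mapping block number to page bytes, and assumes the double-write buffer is flushed before any data-file write begins. It is not VMware's patch or InnoDB's code.

```python
import zlib

PAGE_SIZE = 8192  # assumed; CRC kept in the first 4 bytes (hypothetical layout)


def make_page(payload: bytes) -> bytes:
    """Build a page image whose first 4 bytes are the CRC of the rest."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "little") + body


def page_ok(page: bytes) -> bool:
    """True if the page is full-sized and its stored CRC matches its body."""
    return (len(page) == PAGE_SIZE
            and int.from_bytes(page[:4], "little") == zlib.crc32(page[4:]))


def recover(data_file: dict, dw_buffer: dict) -> list:
    """Repair torn data pages from intact double-write copies.

    Returns the block numbers that were restored from the double-write buffer.
    """
    repaired = []
    for blkno, copy in dw_buffer.items():
        if not page_ok(copy):
            # Torn while writing the DW buffer: the data-file write never
            # started, so the data page is still the intact old version.
            continue
        if not page_ok(data_file.get(blkno, b"")):
            data_file[blkno] = copy  # data page torn: restore the full image
            repaired.append(blkno)
    return repaired
```

Either way every page ends up internally consistent, which is what lets WAL replay proceed without full-page images; without a checksum there would be no way to decide which copy to trust.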

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Merlin Moncure
On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis pg...@j-davis.com wrote: 3. Attack hint bits problem. A large number of problems would go away if the current hint bit system could be replaced with something that did not require writing to the tuple itself. FWIW, moving the bits around seems like a

Re: [HACKERS] Page Checksums + Double Writes

2011-12-27 Thread Jeff Davis
On Tue, 2011-12-27 at 16:43 -0600, Merlin Moncure wrote: On Tue, Dec 27, 2011 at 1:24 PM, Jeff Davis pg...@j-davis.com wrote: 3. Attack hint bits problem. A large number of problems would go away if the current hint bit system could be replaced with something that did not require writing

Re: [HACKERS] Page Checksums + Double Writes

2011-12-24 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon, does it sound like I understand your proposal? Yes, thanks for restating. I've implemented that proposal, posting patch on a

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: I would suggest you examine how to have an array of N bgwriters, then just slot the code for hinting into the bgwriter. That way a bgwriter can set hints, calc CRC and write pages in sequence on a particular block. The hinting needs to be
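One possible reading of that suggestion, sketched in Python under stated assumptions: blocks are routed to a fixed array of writers by block number modulo N (the message does not say how blocks would be assigned), and each writer sets pending hint bits before computing the CRC, so no hint can be set after the checksum and silently invalidate it. The names, the 4-byte CRC slot at the start of the page, and zlib.crc32 as the checksum are all hypothetical.

```python
import zlib

N_WRITERS = 4  # hypothetical size of the bgwriter array


def writer_for(blkno: int, n_writers: int = N_WRITERS) -> int:
    """Give every block one owning writer, so no two writers touch it at once."""
    return blkno % n_writers


def flush_block(page: bytearray, pending_hints, write) -> None:
    """One writer's per-block sequence: set hints, compute CRC, then write.

    pending_hints is a list of (byte offset, bit mask) pairs; the CRC is kept
    in the first 4 bytes of the page (a hypothetical layout).
    """
    for offset, mask in pending_hints:
        page[offset] |= mask                       # 1. set hint bits
    crc = zlib.crc32(bytes(page[4:]))              # 2. CRC over final contents
    page[0:4] = crc.to_bytes(4, "little")
    write(bytes(page))                             # 3. write the finished image
```

Because one writer owns each block, the hint/CRC/write sequence for that block never interleaves with another process's changes to the same bytes.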

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Robert Haas
On Fri, Dec 23, 2011 at 11:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thoughts? Those are good thoughts. Here's another random idea, which might be completely nuts. Maybe we could consider some kind of summarization of CLOG data, based on the idea that most transactions commit.
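A minimal Python sketch of such a summarization, assuming one rollup bit per 1024-transaction block (the granularity used in the measurements elsewhere in the thread), set only once the block is stable and every transaction in it is known to have committed. The class and function names are hypothetical.

```python
# Hypothetical rollup: one bit per 1024-xid block, set only once the block is
# stable (older than the oldest active xid) and every xid in it committed.
XIDS_PER_BLOCK = 1024
COMMITTED = 0x01


class ClogRollup:
    def __init__(self, n_blocks: int):
        self.bits = bytearray((n_blocks + 7) // 8)

    def mark_all_committed(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def all_committed(self, xid: int) -> bool:
        block = xid // XIDS_PER_BLOCK
        return bool(self.bits[block // 8] & (1 << (block % 8)))


def did_commit(xid: int, rollup: ClogRollup, read_clog_status) -> bool:
    """Check the small summary first; fall back to real CLOG only on a miss."""
    if rollup.all_committed(xid):
        return True
    return read_clog_status(xid) == COMMITTED
```

At one bit per 1024 XIDs the rollup is 2048 times smaller than CLOG itself (which spends 2 bits per XID), so a summary of the full 2^32 XID space would fit in 512 kB.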

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: An obvious problem is that, if the abort rate is significantly different from zero, and especially if the aborts are randomly mixed in with commits rather than clustered together in small portions of the XID space, the CLOG rollup data would become
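To put rough numbers on that concern, assume (a simplification for illustration, not a measurement) that each transaction aborts independently with probability p. A 1024-transaction block can be summarized only if all of its transactions committed:

```latex
\begin{aligned}
P(\text{block all commits}) &= (1-p)^{1024} \approx e^{-1024p} \\
p = 0.001 &\Rightarrow (0.999)^{1024} \approx 0.36 \\
p = 0.01  &\Rightarrow (0.99)^{1024} \approx 3.4 \times 10^{-5}
\end{aligned}
```

Even a 0.1% abort rate, scattered at random, would leave nearly two thirds of the blocks unsummarizable; aborts clustered in a few parts of the XID space do far less damage.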

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Robert Haas
On Fri, Dec 23, 2011 at 12:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: An obvious problem is that, if the abort rate is significantly different from zero, and especially if the aborts are randomly mixed in with commits rather than clustered together in

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Jeff Janes
On 12/23/11, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 23, 2011 at 11:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thoughts? Those are good thoughts. Here's another random idea, which might be completely nuts. Maybe we could consider some kind of summarization of

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: An obvious problem is that, if the abort rate is significantly different from zero, and especially if the aborts are randomly mixed in with commits rather than clustered together in small portions of the XID space,

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote: Could we get some major OLTP users to post their CLOG for analysis? I wouldn't think there would be much security/proprietary issues with CLOG data. FWIW, I got the raw numbers to do my quick check using this Ruby script (put together for me by Peter

Re: [HACKERS] Page Checksums + Double Writes

2011-12-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: I had a perhaps crazier idea. Aren't CLOG pages older than global xmin effectively read only? Could backends that need these bypass locking and shared memory altogether? Hmm ... once they've been written out from the SLRU arena, yes. In fact you don't
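A page-level version of that test can be sketched in Python as follows, assuming the default 32768 transactions per CLOG page and ignoring XID wraparound (real code would need wraparound-aware comparisons). The function name is hypothetical, and it does not check the further condition that the page has already been written out from the SLRU arena.

```python
CLOG_XACTS_PER_PAGE = 32768  # BLCKSZ 8192 * 4 xids per byte (assumed default)


def clog_page_is_stable(pageno: int, global_xmin: int) -> bool:
    """True if every xid on this CLOG page precedes global_xmin.

    No transaction on such a page is still running, so its status bits can
    no longer change. Ignores xid wraparound.
    """
    last_xid_on_page = (pageno + 1) * CLOG_XACTS_PER_PAGE - 1
    return last_xid_on_page < global_xmin
```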

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Florian Weimer
* David Fetter: The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on. How desirable is it to disable full_page_writes? Doesn't it cut down recovery time

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 7:44 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 22.12.2011 01:43, Tom Lane wrote: A utility to bump the page version is equally a whole lot easier said than done, given that the new version has more overhead space and thus less payload space

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 8:42 AM, Florian Weimer fwei...@bfk.de wrote: * David Fetter: The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on. How desirable

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jesper Krogh
On 2011-12-22 09:42, Florian Weimer wrote: * David Fetter: The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on. How desirable is it to disable

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs wrote: So overall, I do now think it's still possible to add an optional checksum in the 9.2 release and am willing to pursue it unless there are technical objections. Just to restate Simon's proposal, to make sure I'm understanding it, we would support a new page header format

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh jes...@krogh.cc wrote: On 2011-12-22 09:42, Florian Weimer wrote: * David Fetter: The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Jignesh Shah jks...@gmail.com wrote: When we use Doublewrite with checksums, we can safely disable full_page_writes causing a HUGE reduction to the WAL traffic without loss of reliability due to a write fault since there are two writes always. (Implementation detail discussable). The

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 11:16 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jignesh Shah jks...@gmail.com wrote: When we use Doublewrite with checksums, we can safely disable full_page_writes causing a HUGE reduction to the WAL traffic without loss of reliability due to a write fault

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Robert Haas
On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote: In the double write implementation, every checkpoint write is double-written, Unless I'm quite thoroughly confused, which is possible, the double write will need to happen the first time a buffer is written following each

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Jignesh Shah
On Thu, Dec 22, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote: In the double write implementation, every checkpoint write is double-written, Unless I'm quite thoroughly confused, which is possible, the double

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Simon Riggs
On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon, does it sound like I understand your proposal? Yes, thanks for restating. Now, on to the separate-but-related topic of double-write. That absolutely requires some form of checksum or CRC to detect torn

Re: [HACKERS] Page Checksums + Double Writes

2011-12-22 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote: It could work that way, but I seriously doubt that a technique only mentioned in dispatches one month before the last CF is likely to become trustable code within one month. We've been discussing CRCs for years, so assembling the puzzle seems much

[HACKERS] Page Checksums + Double Writes

2011-12-21 Thread David Fetter
Folks, One of the things VMware is working on is double writes, per previous discussions of how, for example, InnoDB does things. I'd initially thought that introducing just one of the features in $Subject at a time would help, but I'm starting to see a mutual dependency. The issue is that

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Alvaro Herrera
Excerpts from David Fetter's message of Wed Dec 21 18:59:13 -0300 2011: If not, we'll have to do some extra work on the patch as described below. Thanks to Kevin Grittner for coming up with this :) - Use a header bit to say whether we've got a checksum on the page. We're using 3/16 of
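As an illustration of the header-bit approach, here is a Python sketch of the read-side check. The first three constants mirror the pd_flags bits defined in bufpage.h at the time (3 of the 16 bits); PD_HAS_CHECKSUM, its value 0x0008, where the stored checksum lives, and zlib.crc32 as the algorithm are all assumptions.

```python
import zlib

# pd_flags bits already defined in bufpage.h at the time (3 of 16 in use).
PD_HAS_FREE_LINES = 0x0001
PD_PAGE_FULL = 0x0002
PD_ALL_VISIBLE = 0x0004
# Hypothetical new bit meaning "this page carries a checksum".
PD_HAS_CHECKSUM = 0x0008


def verify_page(pd_flags: int, stored_checksum: int, covered_bytes: bytes) -> bool:
    """Verify only pages that claim a checksum; older pages pass unchecked.

    Where the checksum is stored and which bytes it covers are left open
    here; zlib.crc32 stands in for whatever algorithm would be chosen.
    """
    if not pd_flags & PD_HAS_CHECKSUM:
        return True
    return stored_checksum == zlib.crc32(covered_bytes)
```

The appeal of a flag bit is that pages written before the feature existed remain valid as-is; the cost is that a corrupted flag bit can silently disable verification for that page.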

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote: If you get away with a new page format, let's make sure and coordinate so that we can add more info into the header. One thing I wanted was to have an ID struct on each file, so that you know what DB/relation/segment the file corresponds to.

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Wed, Dec 21, 2011 at 10:19 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Alvaro Herrera alvhe...@commandprompt.com wrote: If you get away with a new page format, let's make sure and coordinate so that we can add more info into the header. One thing I wanted was to have an ID

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Tom Lane
David Fetter da...@fetter.org writes: There's a separate issue we'd like to get clear on, which is whether it would be OK to make a new PG_PAGE_LAYOUT_VERSION. If you're not going to provide pg_upgrade support, I think there is no chance of getting a new page layout accepted. The people who

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: We don't need to use any flag bits at all. We add PG_PAGE_LAYOUT_VERSION to the control file, so that CRC checking becomes an initdb option. All new pages can be created with PG_PAGE_LAYOUT_VERSION from the control file. All existing pages must be
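The version-based alternative quoted above could decide per page whether to verify, roughly as in this Python sketch. pd_pagesize_version packs the page size together with the layout version in its low byte; version 4 is the layout of this era, while version 5 meaning "page carries a CRC" is hypothetical. The sketch shows only the check; the thread's concern that two page formats in one cluster force every page-handling code path to cope with both is not addressed here.

```python
BLCKSZ = 8192
OLD_LAYOUT_VERSION = 4   # PG_PAGE_LAYOUT_VERSION in the releases of this era
CRC_LAYOUT_VERSION = 5   # hypothetical "page carries a CRC" version


def make_pagesize_version(size: int, version: int) -> int:
    """Pack page size and layout version the way pd_pagesize_version does."""
    return size | version


def layout_version(pd_pagesize_version: int) -> int:
    """The layout version lives in the low byte."""
    return pd_pagesize_version & 0x00FF


def should_verify_crc(pd_pagesize_version: int) -> bool:
    """Only pages created under the (hypothetical) CRC layout are checked."""
    return layout_version(pd_pagesize_version) >= CRC_LAYOUT_VERSION
```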

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Wed, Dec 21, 2011 at 11:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: It seems like you've forgotten all of the previous discussion of how we'd manage a page format version change. Maybe I've had too much caffeine. It's certainly late here. Having two different page formats running around in

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Rob Wultsch
On Wed, Dec 21, 2011 at 1:59 PM, David Fetter da...@fetter.org wrote: One of the things VMware is working on is double writes, per previous discussions of how, for example, InnoDB does things. The world is moving to flash, and the lifetime of flash is measured in writes. Potentially doubling the

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Robert Haas
On Wed, Dec 21, 2011 at 7:06 PM, Simon Riggs si...@2ndquadrant.com wrote: My feeling is it probably depends upon how different the formats are, so given we are discussing a 4 byte addition to the header, it might be doable. I agree. When thinking back on Zoltan's patches, it's worth

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread David Fetter
On Wed, Dec 21, 2011 at 04:18:33PM -0800, Rob Wultsch wrote: On Wed, Dec 21, 2011 at 1:59 PM, David Fetter da...@fetter.org wrote: One of the things VMware is working on is double writes, per previous discussions of how, for example, InnoDB does things. The world is moving to flash, and

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Simon Riggs
On Thu, Dec 22, 2011 at 12:06 AM, Simon Riggs si...@2ndquadrant.com wrote: Having two different page formats running around in the system at the same time is far from free; in the worst case it means that every single piece of code that touches pages has to know about and be prepared to cope

Re: [HACKERS] Page Checksums + Double Writes

2011-12-21 Thread Heikki Linnakangas
On 22.12.2011 01:43, Tom Lane wrote: A utility to bump the page version is equally a whole lot easier said than done, given that the new version has more overhead space and thus less payload space than the old. What does it do when the old page is too full to be converted? Move some data
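The space question reduces to a simple check, sketched here in Python under the assumption that the header grows by the 4 bytes discussed in the thread and that the free hole between pd_lower and pd_upper is the only room available (alignment and special space are ignored). The helper name is hypothetical; a page that fails the check is exactly the case raised in this message, where some data would have to move elsewhere.

```python
HEADER_GROWTH = 4  # the 4-byte header addition discussed in the thread (assumed)


def can_convert_in_place(pd_lower: int, pd_upper: int,
                         growth: int = HEADER_GROWTH) -> bool:
    """pd_lower..pd_upper is the page's free hole; growing the header shifts
    the line pointer array forward, which only fits if the hole is at least
    `growth` bytes wide."""
    return pd_upper - pd_lower >= growth
```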