Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Simon Riggs
On 10 April 2013 09:01, Ants Aasma a...@cybertec.at wrote: Using SIMD for WAL is not a requirement at all; I just thought it might be a nice benefit for non-checksum-enabled users in some later release. I think we should first deal with using it for page checksums and if future versions

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Ants Aasma
On Wed, Apr 10, 2013 at 12:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On 10 April 2013 09:01, Ants Aasma a...@cybertec.at wrote: Using SIMD for WAL is not a requirement at all; I just thought it might be a nice benefit for non-checksum-enabled users in some later release. I think we

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Bruce Momjian
On Wed, Apr 10, 2013 at 01:15:12PM +0300, Ants Aasma wrote: This work needs to happen now, since once the checksum algorithm is set we won't easily be able to change it. The page checksum algorithm needs to be decided now, but WAL CRCs and full page writes can be changed in 9.4 and don't

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Jeff Davis
On Wed, 2013-04-10 at 11:01 +0300, Ants Aasma wrote: I think we should first deal with using it for page checksums and if future versions want to reuse some of the code for WAL checksums then we can rearrange the code. Sounds good to me, although I expect we at least want any assembly to be in

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Simon Riggs
On 10 April 2013 11:15, Ants Aasma a...@cybertec.at wrote: * We might even be able to calculate CRC32 checksum for normal WAL records, and use Ants' checksum for full page writes (only). So checking WAL checksum would then be to confirm header passes CRC32 and then re-check the Ants

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Jeff Davis
On Wed, 2013-04-10 at 20:17 +0100, Simon Riggs wrote: OK, so we have a single combined calculate a checksum for a block function. That uses Jeff's zeroing trick and Ants' bulk-oriented performance optimization. For buffer checksums we simply calculate for the block. Sounds good. For

Re: [HACKERS] Enabling Checksums

2013-04-09 Thread Simon Riggs
On 9 April 2013 03:35, Ants Aasma a...@cybertec.at wrote: On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote: Unless somebody tells me not to waste my time I'll go ahead and come up with a workable patch by Monday. And here you go. I decided to be verbose with the comments

Re: [HACKERS] Enabling Checksums

2013-04-09 Thread Ants Aasma
On Tue, Apr 9, 2013 at 10:03 AM, Simon Riggs si...@2ndquadrant.com wrote: Thanks. Would you mind reworking the patch so that you aren't removing the existing code, only IFDEFing it out of the way. I'd like to make it as easy as possible to skip your implementation for both us and the use of the

Re: [HACKERS] Enabling Checksums

2013-04-09 Thread Ants Aasma
On Tue, Apr 9, 2013 at 5:35 AM, Ants Aasma a...@cybertec.at wrote: Quick bench results on the worst case workload: master no checksums: tps = 15.561848; master with checksums: tps = 1.695450; simd checksums: tps = 14.602698. For reference, results for the generic version, with default build

Re: [HACKERS] Enabling Checksums

2013-04-09 Thread Jeff Davis
On Tue, 2013-04-09 at 05:35 +0300, Ants Aasma wrote: And here you go. I decided to be verbose with the comments as it's easier to delete a comment than to write one. I also left in a huge jumble of macros to calculate the contents of a helper var during compile time. This can easily be replaced

Re: [HACKERS] Enabling Checksums

2013-04-08 Thread Simon Riggs
On 6 April 2013 08:40, Heikki Linnakangas hlinnakan...@vmware.com wrote: AFAICS that could be easily avoided by doing a simple PageGetLSN() like we used to, if checksums are not enabled. In XLogCheckBuffer: /* * XXX We assume page LSN is first data on *every* page that can

Re: [HACKERS] Enabling Checksums

2013-04-08 Thread Ants Aasma
On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote: Unless somebody tells me not to waste my time I'll go ahead and come up with a workable patch by Monday. And here you go. I decided to be verbose with the comments as it's easier to delete a comment than to write one. I also left in

Re: [HACKERS] Enabling Checksums

2013-04-06 Thread Heikki Linnakangas
On 05.04.2013 23:25, Kevin Grittner wrote: Jeff Davis pg...@j-davis.com wrote: Also, the first version doesn't necessarily need to perform well; we can leave optimization as future work. +1, as long as we don't slow down instances not using the feature, and we don't paint ourselves into a

Re: [HACKERS] Enabling Checksums

2013-04-05 Thread Jeff Davis
On Tue, 2013-03-26 at 03:34 +0200, Ants Aasma wrote: The main thing to look out for is that we don't have any blind spots for conceivable systemic errors. If we decide to go with the SIMD variant then I intend to figure out what the blind spots are and show that they don't matter. Are you

Re: [HACKERS] Enabling Checksums

2013-04-05 Thread Ants Aasma
On Fri, Apr 5, 2013 at 7:23 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-03-26 at 03:34 +0200, Ants Aasma wrote: The main thing to look out for is that we don't have any blind spots for conceivable systemic errors. If we decide to go with the SIMD variant then I intend to figure out

Re: [HACKERS] Enabling Checksums

2013-04-05 Thread Greg Smith
On 4/5/13 12:23 PM, Jeff Davis wrote: Are you still looking into SIMD? Right now, it's using the existing CRC implementation. Obviously we can't change it after it ships. Or is it too late to change it already? Simon just headed away for a break, so I'll try to answer this. He committed with

Re: [HACKERS] Enabling Checksums

2013-04-05 Thread Jeff Davis
On Fri, 2013-04-05 at 21:39 +0300, Ants Aasma wrote: Yes, I just managed to get myself some time so I can look at it some more. I was hoping that someone would weigh in on what their preferences are on the performance/effectiveness trade-off and the fact that we need to use assembler to make

Re: [HACKERS] Enabling Checksums

2013-04-05 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote: My opinion is that we don't need to be perfect as long as we catch 99% of random errors and we don't have any major blind spots. +1 Also, the first version doesn't necessarily need to perform well; we can leave optimization as future work. +1, as long

Re: [HACKERS] Enabling Checksums

2013-03-29 Thread Andres Freund
On 2013-03-28 21:02:06 -0400, Robert Haas wrote: On Wed, Mar 27, 2013 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-27 10:06:19 -0400, Robert Haas wrote: On Mon, Mar 18, 2013 at 4:31 PM, Greg Smith g...@2ndquadrant.com wrote: to get them going again. If the install

Re: [HACKERS] Enabling Checksums

2013-03-29 Thread Jim Nasby
On 3/25/13 8:25 AM, Bruce Momjian wrote: On Fri, Mar 22, 2013 at 11:35:35PM -0500, Jim Nasby wrote: On 3/20/13 8:41 AM, Bruce Momjian wrote: Also, if a user uses checksums in 9.3, could they initdb without checksums in 9.4 and use pg_upgrade? As coded, the pg_controldata checksum settings

Re: [HACKERS] Enabling Checksums

2013-03-28 Thread Robert Haas
On Wed, Mar 27, 2013 at 10:15 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-27 10:06:19 -0400, Robert Haas wrote: On Mon, Mar 18, 2013 at 4:31 PM, Greg Smith g...@2ndquadrant.com wrote: to get them going again. If the install had checksums, I could have figured out which

Re: [HACKERS] Enabling Checksums

2013-03-27 Thread Robert Haas
On Mon, Mar 18, 2013 at 4:31 PM, Greg Smith g...@2ndquadrant.com wrote: to get them going again. If the install had checksums, I could have figured out which blocks were damaged and manually fixed them, basically go on a hunt for torn pages and the last known good copy via full-page write.

Re: [HACKERS] Enabling Checksums

2013-03-27 Thread Andres Freund
On 2013-03-27 10:06:19 -0400, Robert Haas wrote: On Mon, Mar 18, 2013 at 4:31 PM, Greg Smith g...@2ndquadrant.com wrote: to get them going again. If the install had checksums, I could have figured out which blocks were damaged and manually fixed them, basically go on a hunt for torn pages

Re: [HACKERS] Enabling Checksums

2013-03-25 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 11:35:35PM -0500, Jim Nasby wrote: On 3/20/13 8:41 AM, Bruce Momjian wrote: Also, if a user uses checksums in 9.3, could they initdb without checksums in 9.4 and use pg_upgrade? As coded, the pg_controldata checksum settings would not match and pg_upgrade would throw

Re: [HACKERS] Enabling Checksums

2013-03-25 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 05:09:53PM +0200, Ants Aasma wrote: To see real world performance numbers I dumped the algorithms on top of the checksums patch. I set up postgres with 32MB shared buffers, and ran with concurrency 4 select only pgbench and a worst case workload, results are median of 5

Re: [HACKERS] Enabling Checksums

2013-03-25 Thread Ants Aasma
On Mon, Mar 25, 2013 at 3:51 PM, Bruce Momjian br...@momjian.us wrote: Great analysis. Is there any logic to using a lighter-weight checksum calculation for cases where the corruption is rare? For example, we know that network transmission can easily be corrupted, while buffer corruption is

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Jim Nasby
On 3/18/13 2:25 PM, Simon Riggs wrote: On 18 March 2013 19:02, Jeff Davis pg...@j-davis.com wrote: On Sun, 2013-03-17 at 22:26 -0700, Daniel Farina wrote: as long as I am able to turn them off easily To be clear: you don't get the performance back by doing ignore_checksum_failure = on. You

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Jim Nasby
I realize Simon relented on this, but FWIW... On 3/16/13 4:02 PM, Simon Riggs wrote: Most other data we store doesn't consist of large runs of 0x00 or 0xFF as data. Most data is more complex than that, so any runs of 0s or 1s written to the block will be detected. ... It's not that uncommon

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Jim Nasby
On 3/20/13 8:41 AM, Bruce Momjian wrote: On Mon, Mar 18, 2013 at 01:52:58PM -0400, Bruce Momjian wrote: I assume a user would wait until they suspected corruption to turn it on, and because it is only initdb-enabled, they would have to dump/reload their cluster. The open question is whether

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Ants Aasma
On Sat, Mar 23, 2013 at 5:14 AM, Craig Ringer cr...@2ndquadrant.com wrote: Making zero a not checksummed magic value would significantly detract from the utility of checksums IMO. FWIW using 65521 modulus to compress larger checksums into 16 bits will leave 14 non-zero values unused. Regards,

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Andres Freund
Results for pgbench scale 100: No checksums: tps = 56623.819783; Fletcher checksums: tps = 55282.222687 (1.024x slowdown); CRC Checksums: tps = 50571.324795 (1.120x slowdown); SIMD Checksums: tps = 56608.888985 (1.000x slowdown). So to conclude, the 3 approaches: Great

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Ants Aasma
On Sat, Mar 23, 2013 at 3:10 PM, Andres Freund and...@2ndquadrant.com wrote: Andres showed that switching out the existing CRC for zlib's would result in 8-30% increase in INSERT-SELECT speed (http://www.postgresql.org/message-id/201005202227.49990.and...@anarazel.de) with the speeded up CRC

Re: [HACKERS] Enabling Checksums

2013-03-23 Thread Andres Freund
On 2013-03-23 15:36:03 +0200, Ants Aasma wrote: On Sat, Mar 23, 2013 at 3:10 PM, Andres Freund and...@2ndquadrant.com wrote: Andres showed that switching out the existing CRC for zlib's would result in 8-30% increase in INSERT-SELECT speed

Re: [HACKERS] Enabling Checksums

2013-03-22 Thread Ants Aasma
On Fri, Mar 22, 2013 at 3:04 AM, Jeff Davis pg...@j-davis.com wrote: I've been following your analysis and testing, and it looks like there are still at least three viable approaches: 1. Some variant of Fletcher 2. Some variant of CRC32 3. Some SIMD-based checksum Each of those has some

Re: [HACKERS] Enabling Checksums

2013-03-22 Thread Jeff Davis
On Fri, 2013-03-22 at 17:09 +0200, Ants Aasma wrote: For performance the K8 results gave me confidence that we have a reasonably good overview what the performance is like for the class of CPU's that PostgreSQL is likely to run on. I don't think there is anything left to optimize there, all

Re: [HACKERS] Enabling Checksums

2013-03-22 Thread Jeff Davis
On Fri, 2013-03-22 at 17:09 +0200, Ants Aasma wrote: So to conclude, the 3 approaches: One other question: assuming that the algorithms use the full 16-bit space, is there a good way to avoid zero without skewing the result? Can we do something like un-finalize (after we figure out that it's
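
The zero-avoidance question above can be illustrated with a small sketch. This is only one possible approach, not necessarily what the thread settled on: it assumes the algorithm produces a 32-bit intermediate value, and the function name `fold_to_nonzero_16` is hypothetical. Reducing modulo 65535 and adding 1 yields values in 1..65535, so zero stays free as a "no checksum" marker.

```python
def fold_to_nonzero_16(checksum32: int) -> int:
    """Fold a 32-bit value into the range 1..65535 (never zero).

    Hypothetical helper for illustration only.
    """
    checksum32 &= 0xFFFFFFFF          # treat the input as unsigned 32-bit
    return (checksum32 % 65535) + 1   # residues 0..65534 become 1..65535


# Skew check: 2**32 - 1 == 65535 * 65537, so every residue has 65537
# preimages among 32-bit inputs except residue 0, which has one extra.
assert 2**32 - 1 == 65535 * 65537

for value in (0, 1, 65534, 65535, 0xDEADBEEF, 0xFFFFFFFF):
    print(hex(value), "->", fold_to_nonzero_16(value))
```

Under these assumptions the skew is a single extra input mapping to the output value 1, which is negligible next to the 65537 inputs behind every other output.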

Re: [HACKERS] Enabling Checksums

2013-03-22 Thread Ants Aasma
On Fri, Mar 22, 2013 at 7:35 PM, Jeff Davis pg...@j-davis.com wrote: On Fri, 2013-03-22 at 17:09 +0200, Ants Aasma wrote: For performance the K8 results gave me confidence that we have a reasonably good overview what the performance is like for the class of CPU's that PostgreSQL is likely to

Re: [HACKERS] Enabling Checksums

2013-03-22 Thread Craig Ringer
On 03/23/2013 02:00 AM, Jeff Davis wrote: On Fri, 2013-03-22 at 17:09 +0200, Ants Aasma wrote: So to conclude, the 3 approaches: One other question: assuming that the algorithms use the full 16-bit space, is there a good way to avoid zero without skewing the result? Can we do something like

Re: [HACKERS] Enabling Checksums

2013-03-21 Thread Jeff Davis
On Wed, 2013-03-20 at 02:11 +0200, Ants Aasma wrote: Fletcher is also still a strong contender, we just need to replace the 255 modulus with something less prone to common errors, maybe use 65521 as the modulus. I'd have to think how to best combine the values in that case. I believe we can

Re: [HACKERS] Enabling Checksums

2013-03-20 Thread Greg Stark
On Mon, Mar 18, 2013 at 5:52 PM, Bruce Momjian br...@momjian.us wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. For what it's worth I think cpu overhead of the checksum is totally a red herring. Of course there's no reason not to optimize it to be as

Re: [HACKERS] Enabling Checksums

2013-03-20 Thread Bruce Momjian
On Mon, Mar 18, 2013 at 01:52:58PM -0400, Bruce Momjian wrote: I assume a user would wait until they suspected corruption to turn it on, and because it is only initdb-enabled, they would have to dump/reload their cluster. The open question is whether this is a usable feature as written, or

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Jeff Davis
On Sat, 2013-03-16 at 20:41 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On 15 March 2013 13:08, Andres Freund and...@2ndquadrant.com wrote: I commented on this before, I personally think this property makes fletcher a not so good fit for this. It's not uncommon for

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Jeff Davis
On Fri, 2013-03-15 at 14:32 +0200, Ants Aasma wrote: The most obvious case here is that you can swap any number of bytes from 0x00 to 0xFF or back without affecting the hash. That's a good point. Someone (Simon?) had brought that up before, but you and Tom convinced me that it's a problem. As
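
The 0x00/0xFF blind spot mentioned above can be reproduced with a textbook Fletcher-16, which keeps two running sums modulo 255. This is a minimal sketch for illustration, not the code from the patch under discussion (the thread describes a variant built on int32 addition). Because 0xFF is congruent to 0 modulo 255, replacing a 0x00 byte with 0xFF leaves both sums unchanged.

```python
def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: two running sums, each reduced modulo 255."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


page = bytes([0x12, 0x00, 0x34, 0x00, 0x56])
corrupted = bytes([0x12, 0xFF, 0x34, 0xFF, 0x56])  # every 0x00 flipped to 0xFF

# The two buffers differ, yet the checksum cannot tell them apart.
print(page != corrupted, fletcher16(page) == fletcher16(corrupted))
```

A larger modulus such as 65521, as suggested elsewhere in the thread, would not treat 0xFF as equivalent to zero for single bytes.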

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 19 March 2013 17:18, Jeff Davis pg...@j-davis.com wrote: I will move back to verifying the page hole, as well. That was agreed long ago... There are a few approaches: 1. Verify that the page hole is zero before write and after read. 2. Include it in the calculation (if we think there

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 19 March 2013 00:17, Ants Aasma a...@cybertec.at wrote: I looked for fast CRC implementations on the net. Thanks very much for great input. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: I will move back to verifying the page hole, as well. There are a few approaches: 1. Verify that the page hole is zero before write and after read. 2. Include it in the calculation (if we think there are some corner cases where the hole might not be all
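
Approach 1 above (verify that the page hole is zero) might look roughly like the following. It is a sketch under stated assumptions: the page is a plain byte buffer, and the hole boundaries are passed in as `pd_lower` and `pd_upper` rather than parsed from a real page header. The function name `page_hole_is_zero` is hypothetical.

```python
def page_hole_is_zero(page: bytes, pd_lower: int, pd_upper: int) -> bool:
    """Return True if every byte in page[pd_lower:pd_upper] is zero.

    Hypothetical helper: the offsets are supplied by the caller here,
    not read from an actual page header.
    """
    if not 0 <= pd_lower <= pd_upper <= len(page):
        raise ValueError("hole boundaries are outside the page")
    return not any(page[pd_lower:pd_upper])


# An 8 kB buffer with data at both ends and an untouched hole in between.
demo = bytearray(8192)
demo[0:24] = b"\x01" * 24
demo[8000:8192] = b"\x02" * 192
print(page_hole_is_zero(bytes(demo), 24, 8000))

demo[4096] = 0x80  # simulate a stray bit landing inside the hole
print(page_hole_is_zero(bytes(demo), 24, 8000))
```

A check like this would run before write and after read; including the hole in the checksum calculation instead is the second approach listed above.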

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/8/13 4:40 PM, Greg Stark wrote: On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus j...@agliodbs.com wrote: After some examination of the systems involved, we concluded that the issue was the FreeBSD drivers for the new storage, which were unstable and had custom source patches. However, without

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/18/13 8:17 PM, Ants Aasma wrote: I looked for fast CRC implementations on the net. The fastest plain C variant I could find was one produced by Intel's R&D department (available with a BSD license [1], requires some porting). Very specifically, it references

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Ants Aasma
On Tue, Mar 19, 2013 at 11:28 PM, Greg Smith g...@2ndquadrant.com wrote: I don't remember if there's any good precedent for whether this form of BSD licensed code can be assimilated into PostgreSQL without having to give credit to Intel in impractical places. I hate these licenses with the

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse than what you currently measured for a shared buffer swapping workload.

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Daniel Farina
On Tue, Mar 19, 2013 at 3:52 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Andrew Dunstan
On 03/19/2013 06:52 PM, Greg Smith wrote: While being a lazy researcher today instead of writing code, I discovered that the PNG file format includes a CRC-32 on its data chunks, and to support that there's a CRC32 function inside of zlib: http://www.zlib.net/zlib_tech.html Is there

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/19/13 7:13 PM, Daniel Farina wrote: I'm confused. Postgres includes a CRC32 implementation for WAL, does it not? Are you referring to something else? I'm just pointing out that zlib includes one, too, and they might be more motivated/able as a project to chase after Intel's hardware

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Ants Aasma
On Wed, Mar 20, 2013 at 12:52 AM, Greg Smith g...@2ndquadrant.com wrote: On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Simon Riggs
On 20 March 2013 00:03, Greg Smith g...@2ndquadrant.com wrote: Simon suggested the other day that we should make the exact checksum mechanism used pluggable at initdb time, just some last minute alternatives checking on the performance of the real server code. I've now got the WAL CRC32, the

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/19/13 8:17 PM, Simon Riggs wrote: We know that will work, has reasonable distribution characteristics and might even speed things up rather than have two versions of CRC in the CPU cache. That sounds reasonable to me. All of these CRC options have space/time trade-offs via how large the

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes: While being a lazy researcher today instead of writing code, I discovered that the PNG file format includes a CRC-32 on its data chunks, and to support that there's a CRC32 function inside of zlib: http://www.zlib.net/zlib_tech.html Hah, old sins

Re: [HACKERS] Enabling Checksums

2013-03-19 Thread Greg Smith
On 3/19/13 10:05 PM, Tom Lane wrote: FWIW, I would argue that any tradeoffs we make in this area must be made on the assumption of no such acceleration. If we can later make things better for Intel(TM) users, that's cool, but let's not screw those using other CPUs. I see compatibility with

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Simon Riggs
On 18 March 2013 00:50, Greg Smith g...@2ndquadrant.com wrote: On 3/17/13 1:41 PM, Simon Riggs wrote: So I'm now moving towards commit using a CRC algorithm. I'll put in a feature to allow the algorithm to be selected at initdb time, though that is mainly a convenience to allow us to more easily do

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Bruce Momjian
On Sun, Mar 17, 2013 at 05:50:11PM -0700, Greg Smith wrote: As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code paths, the exposure to potential bugs doesn't seem too bad. New WAL data is no fun, but it's not like this hasn't happened

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Pavel Stehule
2013/3/18 Bruce Momjian br...@momjian.us: On Sun, Mar 17, 2013 at 05:50:11PM -0700, Greg Smith wrote: As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code paths, the exposure to potential bugs doesn't seem too bad. New WAL data is no fun,

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Simon Riggs
On 18 March 2013 17:52, Bruce Momjian br...@momjian.us wrote: On Sun, Mar 17, 2013 at 05:50:11PM -0700, Greg Smith wrote: As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code paths, the exposure to potential bugs doesn't seem too bad. New

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. I'd expect that quite a few people would, myself included on a brand new DB that I didn't have any reason to think would need to be super-performant. I assume a user

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Josh Berkus
With a potential 10-20% overhead, I am unclear who would enable this at initdb time. People who know they have a chronic issue with bad disks/cards/drivers would. Or anyone with enough machines that IO corruption is an operational concern worth more than 10% overhead. Or, in a word: Heroku,

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Jeff Davis
On Mon, 2013-03-18 at 13:52 -0400, Bruce Momjian wrote: In fact, this feature is going to need pg_upgrade changes to detect from pg_controldata that the old/new clusters have the same checksum setting. I believe that has been addressed in the existing patch. Let me know if you see any

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Jeff Davis
On Sun, 2013-03-17 at 22:26 -0700, Daniel Farina wrote: as long as I am able to turn them off easily To be clear: you don't get the performance back by doing ignore_checksum_failure = on. You only get around the error itself, which allows you to dump/reload the good data. Regards, Jeff

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Bruce Momjian
On Mon, Mar 18, 2013 at 11:42:23AM -0700, Jeff Davis wrote: On Mon, 2013-03-18 at 13:52 -0400, Bruce Momjian wrote: In fact, this feature is going to need pg_upgrade changes to detect from pg_controldata that the old/new clusters have the same checksum setting. I believe that has been

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Bruce Momjian
On Mon, Mar 18, 2013 at 06:24:37PM +, Simon Riggs wrote: On 18 March 2013 17:52, Bruce Momjian br...@momjian.us wrote: On Sun, Mar 17, 2013 at 05:50:11PM -0700, Greg Smith wrote: As long as the feature is off by default, so that people have to turn it on to hit the biggest changed code

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Simon Riggs
On 18 March 2013 19:02, Jeff Davis pg...@j-davis.com wrote: On Sun, 2013-03-17 at 22:26 -0700, Daniel Farina wrote: as long as I am able to turn them off easily To be clear: you don't get the performance back by doing ignore_checksum_failure = on. You only get around the error itself, which

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Greg Smith
On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Ants Aasma
On Mon, Mar 18, 2013 at 2:04 AM, Greg Smith g...@2ndquadrant.com wrote: On 3/15/13 5:32 AM, Ants Aasma wrote: Best case using the CRC32 instruction would be 6.8 bytes/cycle [1]. But this got me thinking about how to do this faster... [1]

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Daniel Farina
On Mon, Mar 18, 2013 at 1:31 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Greg Smith
On 3/18/13 5:36 PM, Daniel Farina wrote: Clarification, because I think this assessment as delivered feeds some unnecessary FUD about EBS: EBS is quite reliable. Presuming that all noticed corruptions are strictly EBS's problem (that's quite a stretch), I'd say the defect rate falls somewhere

Re: [HACKERS] Enabling Checksums

2013-03-18 Thread Daniel Farina
On Mon, Mar 18, 2013 at 7:13 PM, Greg Smith g...@2ndquadrant.com wrote: I wasn't trying to flog EBS as any more or less reliable than other types of storage. What I was trying to emphasize, similarly to your quite a stretch comment, was the uncertainty involved when such deployments fail.

Re: [HACKERS] Enabling Checksums

2013-03-17 Thread Simon Riggs
On 17 March 2013 00:41, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 15 March 2013 13:08, Andres Freund and...@2ndquadrant.com wrote: I commented on this before, I personally think this property makes fletcher a not so good fit for this. It's not uncommon

Re: [HACKERS] Enabling Checksums

2013-03-17 Thread Simon Riggs
On 13 March 2013 06:33, Jeff Davis pg...@j-davis.com wrote: On Thu, 2013-03-07 at 13:45 -0800, Jeff Davis wrote: I need to do another self-review after these changes and some more extensive testing, so I might have missed a couple things. New patch attached. Aside from rebasing, I also

Re: [HACKERS] Enabling Checksums

2013-03-17 Thread Greg Smith
On 3/15/13 5:32 AM, Ants Aasma wrote: Best case using the CRC32 instruction would be 6.8 bytes/cycle [1]. But this got me thinking about how to do this faster... [1] http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411 The optimization work you went through here

Re: [HACKERS] Enabling Checksums

2013-03-17 Thread Greg Smith
On 3/17/13 1:41 PM, Simon Riggs wrote: So I'm now moving towards commit using a CRC algorithm. I'll put in a feature to allow the algorithm to be selected at initdb time, though that is mainly a convenience to allow us to more easily do further testing on speedups and whether there are any platform

Re: [HACKERS] Enabling Checksums

2013-03-17 Thread Daniel Farina
On Sun, Mar 17, 2013 at 5:50 PM, Greg Smith g...@2ndquadrant.com wrote: On the testing front, we've seen on-list interest in this feature from companies like Heroku and Enova, who both have some resources and practice to help testing too. Heroku can spin up test instances with workloads any

Re: [HACKERS] Enabling Checksums

2013-03-16 Thread Simon Riggs
On 15 March 2013 13:08, Andres Freund and...@2ndquadrant.com wrote: On 2013-03-15 14:32:57 +0200, Ants Aasma wrote: On Wed, Mar 6, 2013 at 1:34 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64

Re: [HACKERS] Enabling Checksums

2013-03-16 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On 15 March 2013 13:08, Andres Freund and...@2ndquadrant.com wrote: I commented on this before, I personally think this property makes fletcher a not so good fit for this. It's not uncommon for parts of a block being all-zero and many disk corruptions

Re: [HACKERS] Enabling Checksums

2013-03-15 Thread Ants Aasma
On Wed, Mar 6, 2013 at 1:34 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious weakness in it, but if it's indeed faster and as good as a

Re: [HACKERS] Enabling Checksums

2013-03-15 Thread Ants Aasma
On Fri, Mar 15, 2013 at 2:32 PM, Ants Aasma a...@cybertec.at wrote: I was able to coax GCC to vectorize the code in the attached patch Now actually attached. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de

Re: [HACKERS] Enabling Checksums

2013-03-15 Thread Andres Freund
On 2013-03-15 14:32:57 +0200, Ants Aasma wrote: On Wed, Mar 6, 2013 at 1:34 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Fletcher's checksum is good in general, I was mainly worried about truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious weakness in it,

Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Jeff Davis
On Thu, 2013-03-07 at 13:45 -0800, Jeff Davis wrote: I need to do another self-review after these changes and some more extensive testing, so I might have missed a couple things. New patch attached. Aside from rebasing, I also found a problem with temp tables. At first I was going to fix it by

Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Josh Berkus
Jeff, However, I'm willing to be convinced to exclude temp tables again. Those reasons sound persuasive. Let's leave them in for 9.3. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com

Re: [HACKERS] Enabling Checksums

2013-03-13 Thread Jim Nasby
On 3/7/13 9:31 PM, Bruce Momjian wrote: 1 storage 2 storage controller 3 file system 4 RAM 5 CPU I would add 2.5 in there: storage interconnect. iSCSI, FC, what-have-you. Obviously not everyone has that. My guess is that storage checksums only cover

Re: [HACKERS] Enabling Checksums

2013-03-09 Thread Simon Riggs
On 8 March 2013 03:31, Bruce Momjian br...@momjian.us wrote: I also see the checksum patch is taking a beating. I wanted to step back and ask what percentage of known corruptions cases will this checksum patch detect? What percentage of these corruptions would filesystem checksums have

Re: [HACKERS] Enabling Checksums

2013-03-08 Thread Heikki Linnakangas
On 08.03.2013 05:31, Bruce Momjian wrote: Also, don't all modern storage drives have built-in checksums, and report problems to the system administrator? Does smartctl help report storage corruption? Let me take a guess at answering this --- we have several layers in a database server:

Re: [HACKERS] Enabling Checksums

2013-03-08 Thread Heikki Linnakangas
On 07.03.2013 23:45, Jeff Davis wrote: By the way, I can not find any trace of XLogCheckBufferNeedsBackup(), was that a typo? Ah, sorry, that was a new function introduced by another patch I was reviewing at the same time, and I conflated the two. - Heikki

Re: [HACKERS] Enabling Checksums

2013-03-08 Thread Josh Berkus
I also see the checksum patch is taking a beating. I wanted to step back and ask what percentage of known corruptions cases will this checksum patch detect? I'm pretty sure that early on Jeff posted some statistics which indicated that the current approach would detect 99% of corruption

Re: [HACKERS] Enabling Checksums

2013-03-08 Thread Greg Stark
On Fri, Mar 8, 2013 at 5:46 PM, Josh Berkus j...@agliodbs.com wrote: After some examination of the systems involved, we concluded that the issue was the FreeBSD drivers for the new storage, which were unstable and had custom source patches. However, without PostgreSQL checksums, we couldn't

Re: [HACKERS] Enabling Checksums

2013-03-08 Thread Greg Smith
On 3/8/13 3:38 AM, Heikki Linnakangas wrote: See https://www.kernel.org/doc/Documentation/block/data-integrity.txt That includes an interesting comment that's along the lines of the MySQL checksum tests already mentioned: The 16-bit CRC checksum mandated by both the SCSI and SATA specs is

Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Jeff Davis
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover. One of the design

Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Bruce Momjian
On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature

Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Pavel Stehule
2013/3/8 Bruce Momjian br...@momjian.us: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd

Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Daniel Farina
On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available

Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Simon Riggs
On 5 March 2013 09:35, Heikki Linnakangas hlinnakan...@vmware.com wrote: Are there objectors? In addition to my hostility towards this patch in general, there are some specifics in the patch I'd like to raise (read out in a grumpy voice): ;-) We all want to make the right choice here, so

Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Simon Riggs
On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. IIRC the research showed Fletcher was significantly faster

Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Heikki Linnakangas
On 06.03.2013 10:41, Simon Riggs wrote: On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote: Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. IIRC the research
