Re: [HACKERS] Enabling Checksums

2013-09-12 Thread Greg Smith
On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them

Re: [HACKERS] Enabling Checksums

2013-09-12 Thread Greg Smith
On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them

Re: [HACKERS] Enabling Checksums

2013-04-25 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On 24 April 2013 21:06, Jeff Davis pg...@j-davis.com wrote: What goal are you trying to accomplish with this patch? That we might need to patch the checksum version on a production release. I don't actually buy that argument, certainly not as

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 01:10, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote: * make the pg_control.data_checksums field into a version number, for future flexibility... patch attached Commenting on this separately because it's a separate issue. I'd

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 08:20 +0100, Simon Riggs wrote: On 24 April 2013 01:10, Jeff Davis pg...@j-davis.com wrote: I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no checksum, 1 for FNV-1a-SR3, etc. That would allow us to release 9.4 with a new algorithm without forcing

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Simon Riggs
On 24 April 2013 21:06, Jeff Davis pg...@j-davis.com wrote: What goal are you trying to accomplish with this patch? That we might need to patch the checksum version on a production release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support,

Re: [HACKERS] Enabling Checksums

2013-04-24 Thread Jeff Davis
On Wed, 2013-04-24 at 21:09 +0100, Simon Riggs wrote: On 24 April 2013 21:06, Jeff Davis pg...@j-davis.com wrote: What goal are you trying to accomplish with this patch? That we might need to patch the checksum version on a production release. Oh, I see. I don't think we need two output

Re: [HACKERS] Enabling Checksums

2013-04-23 Thread Simon Riggs
On 23 April 2013 02:35, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-04-23 at 01:08 +0300, Ants Aasma wrote: A slight delay, but here it is. I didn't lift the checksum part into a separate file as I didn't have a great idea what I would call it. The code is reasonably compact so I don't

Re: [HACKERS] Enabling Checksums

2013-04-23 Thread Jeff Davis
On Tue, 2013-04-23 at 16:28 +0100, Simon Riggs wrote: * make the pg_control.data_checksums field into a version number, for future flexibility... patch attached Commenting on this separately because it's a separate issue. I'd prefer that it was some kind of a checksum ID code -- e.g. 0 for no

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Robert Haas
On Wed, Apr 17, 2013 at 8:21 PM, Greg Smith g...@2ndquadrant.com wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Andres Freund
On 2013-04-22 11:27:25 -0400, Robert Haas wrote: On Wed, Apr 17, 2013 at 8:21 PM, Greg Smith g...@2ndquadrant.com wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Ants Aasma
On Mon, Apr 22, 2013 at 6:27 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 17, 2013 at 8:21 PM, Greg Smith g...@2ndquadrant.com wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Ants Aasma
On Mon, Apr 22, 2013 at 6:33 PM, Andres Freund and...@2ndquadrant.com wrote: I don't see us changing away from CRCs anymore either by now. But I think at least changing the polynom to something that a) has higher error detection properties b) can noticeably sped up on a a good part of the

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Josh Berkus
On 04/22/2013 09:25 AM, Ants Aasma wrote: This leaves lingering doubts about the quality of the checksum. It's hard if not impossible to prove absence of interesting patterns that would trigger collisions. I do know the checksum quality is miles ahead of the Fletcher sum originally proposed

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Florian Pflug
On Apr22, 2013, at 18:25 , Ants Aasma a...@cybertec.at wrote: I was just now writing up a generic C based patch based on the parallel FNV-1a + shift that we discussed with Florian with an added round of mixing. Testing the performance in isolation indicates that: 1) it is about an order of

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Ants Aasma
On Mon, Apr 22, 2013 at 9:04 PM, Florian Pflug f...@phlo.org wrote: The one downside of the fnv1+shift approach is that it's built around the assumption that processing 64-bytes at once is the sweet spot. That might be true for x86 and x86_64 today, but it won't stay that way for long, and

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Jeff Davis
On Mon, 2013-04-22 at 20:04 +0200, Florian Pflug wrote: The one downside of the fnv1+shift approach is that it's built around the assumption that processing 64-bytes at once is the sweet spot. That might be true for x86 and x86_64 today, but it won't stay that way for long, and quite surely

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Robert Haas
On Mon, Apr 22, 2013 at 3:14 PM, Jeff Davis pg...@j-davis.com wrote: The biggest problem now is getting one of these faster algorithms (FNV or even a faster CRC) into shape that is acceptable to reviewers/committers. If we don't do that, we will be missing out on a lot of potential checksum

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Jeff Davis
On Mon, 2013-04-22 at 19:25 +0300, Ants Aasma wrote: I was just now writing up a generic C based patch based on the parallel FNV-1a + shift that we discussed with Florian with an added round of mixing. Testing the performance in isolation indicates that: 1) it is about an order of magnitude

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Florian Pflug
On Apr22, 2013, at 21:14 , Jeff Davis pg...@j-davis.com wrote: On Mon, 2013-04-22 at 20:04 +0200, Florian Pflug wrote: The one downside of the fnv1+shift approach is that it's built around the assumption that processing 64-bytes at once is the sweet spot. That might be true for x86 and x86_64

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Simon Riggs
On 22 April 2013 20:32, Florian Pflug f...@phlo.org wrote: Assuming that we only ship a plain C implementation with 9.3, what are we missing on that front? The C implementation of FNV1+SHIFT is only a few dozen lines or so. Forgive me, I can't seem to locate the patch for this? Re-post

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Ants Aasma
On Mon, Apr 22, 2013 at 10:54 PM, Simon Riggs si...@2ndquadrant.com wrote: On 22 April 2013 20:32, Florian Pflug f...@phlo.org wrote: Assuming that we only ship a plain C implementation with 9.3, what are we missing on that front? The C implementation of FNV1+SHIFT is only a few dozen lines

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Ants Aasma
On Mon, Apr 22, 2013 at 10:57 PM, Ants Aasma a...@cybertec.at wrote: On Mon, Apr 22, 2013 at 10:54 PM, Simon Riggs si...@2ndquadrant.com wrote: On 22 April 2013 20:32, Florian Pflug f...@phlo.org wrote: Assuming that we only ship a plain C implementation with 9.3, what are we missing on that

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Jeff Davis
On Tue, 2013-04-23 at 01:08 +0300, Ants Aasma wrote: A slight delay, but here it is. I didn't lift the checksum part into a separate file as I didn't have a great idea what I would call it. The code is reasonably compact so I don't see a great need for this right now. It would be more worth

Re: [HACKERS] Enabling Checksums

2013-04-22 Thread Alvaro Herrera
Jeff Davis escribió: I'm not sure what we should call the separate file or where we should put it, though. How about src/backend/utils/checksum/checksum_fnv.c? Is there a clean way to override the compiler flags for a single file so we don't need to put it in its own directory? Sure, see

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Andres Freund
On 2013-04-17 18:16:36 -0700, Daniel Farina wrote: The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Andres Freund
On 2013-04-18 00:44:02 +0300, Ants Aasma wrote: I went ahead and coded up both the parallel FNV-1a and parallel FNV-1a + srl1-xor variants and ran performance tests and detection rate tests on both. Performance results: Mul-add checksums: 12.9 bytes/s FNV-1a checksums: 13.5 bytes/s FNV-1a

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Simon Riggs
On 17 April 2013 22:36, Bruce Momjian br...@momjian.us wrote: I would like to know the answer of how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly doesn't need it. When we get to 9.4, if someone has checksums

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Daniel Farina
On Wed, Apr 17, 2013 at 11:08 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-17 18:16:36 -0700, Daniel Farina wrote: The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Ants Aasma
On Thu, Apr 18, 2013 at 5:08 AM, Greg Smith g...@2ndquadrant.com wrote: On 4/17/13 8:56 PM, Ants Aasma wrote: Nothing from the two points, but the CRC calculation algorithm can be switched out for slice-by-4 or slice-by-8 variant. Speed up was around factor of 4 if I remember correctly...I

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Bruce Momjian
On Thu, Apr 18, 2013 at 09:17:39AM +0100, Simon Riggs wrote: On 17 April 2013 22:36, Bruce Momjian br...@momjian.us wrote: I would like to know the answer of how an upgrade from checksum to no-checksum would behave so I can modify pg_upgrade to allow it. Why? 9.3 pg_upgrade certainly

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Ants Aasma
On Thu, Apr 18, 2013 at 5:57 PM, Ants Aasma a...@cybertec.at wrote: I'll generate an avalanche diagram for CRC32C too, but it will take a while even if I use a smaller dataset. Well that was useless... In CRC flipping each bit in the input flips preset pattern of bits in the output regardless

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Jeff Davis
On Wed, 2013-04-17 at 20:21 -0400, Greg Smith wrote: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. That is fairly easy to fix by using a different modulus:

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Florian Weimer
* Greg Smith: The TCP/IP checksum spec is at https://tools.ietf.org/html/rfc793 ; its error detection limitations are described at http://www.noahdavids.org/self_published/CRC_and_checksum.html ; and a good article about optimizing its code is at

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Florian Pflug
On Apr18, 2013, at 19:04 , Jeff Davis pg...@j-davis.com wrote: On Wed, 2013-04-17 at 20:21 -0400, Greg Smith wrote: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Ants Aasma
On Thu, Apr 18, 2013 at 8:05 PM, Florian Pflug f...@phlo.org wrote: On Apr18, 2013, at 19:04 , Jeff Davis pg...@j-davis.com wrote: On Wed, 2013-04-17 at 20:21 -0400, Greg Smith wrote: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Florian Pflug
On Apr18, 2013, at 18:48 , Ants Aasma a...@cybertec.at wrote: On Thu, Apr 18, 2013 at 5:57 PM, Ants Aasma a...@cybertec.at wrote: I'll generate an avalanche diagram for CRC32C too, but it will take a while even if I use a smaller dataset. Well that was useless... In CRC flipping each bit in

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Ants Aasma
On Thu, Apr 18, 2013 at 8:15 PM, Florian Pflug f...@phlo.org wrote: So either the CRC32-C polynomial isn't irreducible, or there something fishy going on. Could there be a bug in your CRC implementation? Maybe a mixup between big and little endian, or something like that? I'm suspecting an

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Jeff Davis
On Thu, 2013-04-18 at 19:05 +0200, Florian Pflug wrote: On Apr18, 2013, at 19:04 , Jeff Davis pg...@j-davis.com wrote: On Wed, 2013-04-17 at 20:21 -0400, Greg Smith wrote: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Ants Aasma
On Thu, Apr 18, 2013 at 8:24 PM, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 18, 2013 at 8:15 PM, Florian Pflug f...@phlo.org wrote: So either the CRC32-C polynomial isn't irreducible, or there something fishy going on. Could there be a bug in your CRC implementation? Maybe a mixup between

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Florian Pflug
On 18.04.2013, at 20:02, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 18, 2013 at 8:24 PM, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 18, 2013 at 8:15 PM, Florian Pflug f...@phlo.org wrote: So either the CRC32-C polynomial isn't irreducible, or there something fishy going on. Could there

Re: [HACKERS] Enabling Checksums

2013-04-18 Thread Greg Stark
On Thu, Apr 18, 2013 at 6:04 PM, Florian Weimer f...@deneb.enyo.de wrote: The TCP checksum is too weak to be practical. Every now an then, I see data transfers where the checksum is valid, but the content contains bit flips. Well of course, it's only a 16-bit checksum. 64k packets isn't very

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Wed, Apr 17, 2013 at 2:26 AM, Florian Pflug f...@phlo.org wrote: This raises two question. First, why are there two primes? You could just as well using a single prime q and set p=q^64 mod 2^16. You then get S = sum V[i,j] * q^(64*(64-i) + (64-j) = sum V[i,j] * q^(4096 - 64*(i-1) - j)

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 05:47:55PM +0300, Ants Aasma wrote: The SSE4.1 implementation of this would be as fast as the last pat, generic version will be faster and we avoid the linearity issue. By using different offsets for each of the partial hashes we don't directly suffer from commutativity

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 17:09 , Bruce Momjian br...@momjian.us wrote: As much as I love the idea of improving the algorithm, it is disturbing we are discussing this so close to beta, with an algorithm that is under analysis, with no (runtime) CPU detection, and in something that is going to be

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote: This made me remember, the issue I had was with high order bits, not with low order ones, somehow I got them confused. The exact issue is that the high order bits don't affect any bit lower than them. It's easy to see that if you

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Stark
On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually What advantages are we postponing now

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote: However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there any way we can change the checksum algorithm in 9.4 *without*

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 18:15 , Bruce Momjian br...@momjian.us wrote: On Wed, Apr 17, 2013 at 05:28:06PM +0200, Florian Pflug wrote: However, you're right that time's running out. It'd be a shame though if we'd lock ourselves into CRC as the only available algorithm essentially forever. Is there

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 06:33:58PM +0200, Florian Pflug wrote: I was going to ask about the flexibility of pg_upgrade and checksums. Right now you have to match the old and new cluster checksum modes, but it seems it would be possible to allow pg_upgrade use from checksum to no-checksum

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes someday eventually

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:22:01PM -0400, Tom Lane wrote: Greg Stark st...@mit.edu writes: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty shortsighted. The field probably ought to be defined as containing

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Uh, not sure how pg_upgrade would detect that as the version number is not stored in pg_controldata, e.g.: Data page checksums: enabled/disabled That seems pretty

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Bruce Momjian br...@momjian.us writes: On Wed, Apr 17, 2013 at 01:29:18PM -0400, Tom Lane wrote: But having said that, I'm not sure why this would be pg_upgrade's problem. By definition, we do not want pg_upgrade running around looking at individual data pages. Therefore, whatever we might

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Jeff Davis
On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not being able to pg_upgrade at all. Otherwise, why would people

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Jeff Davis
On Wed, 2013-04-17 at 16:58 +0100, Greg Stark wrote: On Wed, Apr 17, 2013 at 4:28 PM, Florian Pflug f...@phlo.org wrote: Is there any way we can change the checksum algorithm in 9.4 *without* breaking pg_upgrade? Personally I think we're going to need a solution for page format changes

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Bruce Momjian
On Wed, Apr 17, 2013 at 01:59:12PM -0700, Jeff Davis wrote: On Wed, 2013-04-17 at 12:42 -0400, Bruce Momjian wrote: AFAIK, there's currently no per-page checksum flag. Still, being only able to go from checksummed to not-checksummed probably is for all practical purposes the same as not

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Wed, Apr 17, 2013 at 6:54 PM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 16:47 , Ants Aasma a...@cybertec.at wrote: This made me remember, the issue I had was with high order bits, not with low order ones, somehow I got them confused. The exact issue is that the high order bits

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Tom Lane
Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other latencies effectively executing one each of movdqu/pmullw/paddw each cycle, that's why the N_SUMS adds up to 128 bytes not 16 bytes. The

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr18, 2013, at 00:32 , Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other latencies effectively executing one each of movdqu/pmullw/paddw each cycle,

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote: Performance results: Mul-add checksums: 12.9 bytes/s FNV-1a checksums: 13.5 bytes/s FNV-1a + srl-1: 7.4 bytes/s Detection rates: False positive rates: Add-mul FNV-1a FNV-1a + srl-1 Single bit flip:

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 1:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: I was thinking about something similar too. The big issue here is that the parallel checksums already hide each other latencies effectively executing one each of movdqu/pmullw/paddw each

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 2:25 AM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 23:44 , Ants Aasma a...@cybertec.at wrote: Performance results: Mul-add checksums: 12.9 bytes/s FNV-1a checksums: 13.5 bytes/s FNV-1a + srl-1: 7.4 bytes/s Detection rates: False positive rates:

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Smith
On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And that's not going to get anyone past review, since all the tests I've been

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Ants Aasma
On Thu, Apr 18, 2013 at 3:21 AM, Greg Smith g...@2ndquadrant.com wrote: On 4/17/13 6:32 PM, Tom Lane wrote: The more I read of this thread, the more unhappy I get. It appears that the entire design process is being driven by micro-optimization for CPUs being built by Intel in 2013. And

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Daniel Farina
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith g...@2ndquadrant.com wrote: Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it

Re: [HACKERS] Enabling Checksums

2013-04-17 Thread Greg Smith
On 4/17/13 8:56 PM, Ants Aasma wrote: Nothing from the two points, but the CRC calculation algorithm can be switched out for slice-by-4 or slice-by-8 variant. Speed up was around factor of 4 if I remember correctly...I can provide you with a patch of the generic version of any of the discussed

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 9 April 2013 03:35, Ants Aasma a...@cybertec.at wrote: On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote: Unless somebody tells me not to waste my time I'll go ahead and come up with a workable patch by Monday. And here you go. I decided to be verbose with the comments as

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Greg Stark
On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote: * WAL checksum is not used as the sole basis for end-of-WAL discovery. We reuse the WAL files, so the prev field in each WAL record shows what the previous end of WAL was. Hence if the WAL checksums give a false positive

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote: * WAL checksum is not used as the sole basis for end-of-WAL discovery. We reuse the WAL files, so the prev field in each WAL record shows what the previous end of WAL was. Hence if the

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Simon Riggs
On 16 April 2013 20:27, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Fri, Apr 12, 2013 at 9:42 PM, Simon Riggs si...@2ndquadrant.com wrote: * WAL checksum is not used as the sole basis for end-of-WAL discovery. We reuse the WAL files, so the prev field in each WAL

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote: Based on current analysis, it is particularly good at detecting single bit errors, as good at detecting burst errors as can be expected from 16 bits and not horrible at detecting burst writes of zeroes. It is quite bad at detecting

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs si...@2ndquadrant.com wrote: On 9 April 2013 03:35, Ants Aasma a...@cybertec.at wrote: On Fri, Apr 5, 2013 at 9:39 PM, Ants Aasma a...@cybertec.at wrote: Unless somebody tells me not to waste my time I'll go ahead and come up with a workable patch

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Tom Lane
Ants Aasma a...@cybertec.at writes: On Tue, Apr 16, 2013 at 5:05 PM, Simon Riggs si...@2ndquadrant.com wrote: My only review comments are to ask for some explanation of the magic numbers... The specific values used are mostly magic to me too. As mentioned in a short sentence in the patch,

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Ants Aasma
On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug f...@phlo.org wrote: On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote: Based on current analysis, it is particularly good at detecting single bit errors, as good at detecting burst errors as can be expected from 16 bits and not

Re: [HACKERS] Enabling Checksums

2013-04-16 Thread Florian Pflug
On Apr16, 2013, at 23:41 , Ants Aasma a...@cybertec.at wrote: On Tue, Apr 16, 2013 at 11:20 PM, Florian Pflug f...@phlo.org wrote: On Apr13, 2013, at 17:14 , Ants Aasma a...@cybertec.at wrote: Based on current analysis, it is particularly good at detecting single bit errors, as good at

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Simon Riggs
On 12 April 2013 23:21, Ants Aasma a...@cybertec.at wrote: In general, we have more flexibility with WAL because there is no upgrade issue. It would be nice to share code with the data page checksum algorithm; but really we should just use whatever offers the best trade-off in terms of

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Bruce Momjian
On Fri, Apr 12, 2013 at 02:38:27PM -0700, Jeff Davis wrote: In general, we have more flexibility with WAL because there is no upgrade issue. It would be nice to share code with the data page checksum algorithm; but really we should just use whatever offers the best trade-off in terms of

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Andres Freund
On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: On Fri, Apr 12, 2013 at 02:38:27PM -0700, Jeff Davis wrote: In general, we have more flexibility with WAL because there is no upgrade issue. It would be nice to share code with the data page checksum algorithm; but really we should just use

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: As I understand it, SIMD is just a CPU-optimized method for producing a CRC checksum. Is that right? Does it produce the same result as a non-CPU-optimized CRC calculation? No we are talking

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Andres Freund
On 2013-04-13 10:58:53 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: As I understand it, SIMD is just a CPU-optimized method for producing a CRC checksum. Is that right? Does it produce the same result as a

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Ants Aasma
On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: As I understand it, SIMD is just a CPU-optimized method for producing a CRC checksum. Is that right? Does it produce the same

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Andres Freund
On 2013-04-13 18:14:28 +0300, Ants Aasma wrote: On Sat, Apr 13, 2013 at 5:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-04-13 09:14:26 -0400, Bruce Momjian wrote: As I understand it, SIMD is just a CPU-optimized method for producing a CRC

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Ants Aasma
On Sat, Apr 13, 2013 at 6:26 PM, Andres Freund and...@2ndquadrant.com wrote: All in all I would say that the performance is worth the loss in detection capability as we are not talking about using the checksum to prove correctness. Is it actually a loss compared to our 16bit flavor of crc32

Re: [HACKERS] Enabling Checksums

2013-04-13 Thread Bruce Momjian
On Sat, Apr 13, 2013 at 06:14:28PM +0300, Ants Aasma wrote: CRCs are known to be good for that sort of thing; it's what they were designed for. I'd like to see some evidence that any substitute algorithm has similar properties. Without that, I'm going to vote against this idea. Sorry

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Bruce Momjian
On Wed, Apr 10, 2013 at 11:19:56AM -0700, Jeff Davis wrote: On Wed, 2013-04-10 at 11:01 +0300, Ants Aasma wrote: I think we should first deal with using it for page checksums and if future versions want to reuse some of the code for WAL checksums then we can rearrange the code. Sounds

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Jeff Davis
On Thu, 2013-04-11 at 20:12 +0100, Simon Riggs wrote: So, if we apply a patch like the one attached, we then end up with the WAL checksum using the page checksum as an integral part of its calculation. (There is no increase in code inside WALInsertLock, nothing at all touched in that area).

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Bruce Momjian
On Fri, Apr 12, 2013 at 12:07:36PM -0700, Jeff Davis wrote: (Attached patch is discussion only. Checking checksum in recovery isn't coded at all.) I like it. A few points: * Given that setting the checksum is unconditional in a backup block, do we want to zero the checksum field when

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Andres Freund
On 2013-04-11 20:12:59 +0100, Simon Riggs wrote: On 11 April 2013 04:27, Jeff Davis pg...@j-davis.com wrote: On Wed, 2013-04-10 at 20:17 +0100, Simon Riggs wrote: OK, so we have a single combined calculate a checksum for a block function. That uses Jeff's zeroing trick and Ants'

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Bruce Momjian
On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote: Only point worth discussing is that this change would make backup blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record header is covered by a CRC32 but the backup blocks only by 16-bit. That means

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Andres Freund
On 2013-04-12 15:31:36 -0400, Bruce Momjian wrote: On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote: Only point worth discussing is that this change would make backup blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record header is covered by a

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Heikki Linnakangas
On 12.04.2013 22:31, Bruce Momjian wrote: On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote: Only point worth discussing is that this change would make backup blocks be covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record header is covered by a CRC32 but the

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Simon Riggs
On 12 April 2013 21:03, Heikki Linnakangas hlinnakan...@vmware.com wrote: No, the patch has to compute the 16-bit checksum for the page when the full-page image is added to the WAL record. There would otherwise be no need to calculate the page checksum at that point, but only later when the

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Jeff Davis
On Fri, 2013-04-12 at 15:21 -0400, Bruce Momjian wrote: * When we do PageSetChecksumInplace(), we need to be 100% sure that the hole is empty; otherwise the checksum will fail when we re-expand it. It might be worth a memset beforehand just to be sure. Do we write the page holes to the

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Jeff Davis
On Fri, 2013-04-12 at 21:28 +0200, Andres Freund wrote: That means we will have to do the verification for this in ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we won't always recognize the end of WAL correctly. And I am a bit wary of reducing the likelihood of noticing

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Jeff Davis
On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote: I think this is a bad idea. It complicates the WAL format significantly. Simon's patch didn't include the changes to recovery to validate the checksum, but I suspect it would be complicated. And it reduces the error-detection

Re: [HACKERS] Enabling Checksums

2013-04-12 Thread Ants Aasma
On Sat, Apr 13, 2013 at 12:38 AM, Jeff Davis pg...@j-davis.com wrote: On Fri, 2013-04-12 at 23:03 +0300, Heikki Linnakangas wrote: I think this is a bad idea. It complicates the WAL format significantly. Simon's patch didn't include the changes to recovery to validate the checksum, but I

Re: [HACKERS] Enabling Checksums

2013-04-11 Thread Simon Riggs
On 11 April 2013 04:27, Jeff Davis pg...@j-davis.com wrote: On Wed, 2013-04-10 at 20:17 +0100, Simon Riggs wrote: OK, so we have a single combined calculate a checksum for a block function. That uses Jeff's zeroing trick and Ants' bulk-oriented performance optimization. For buffer

Re: [HACKERS] Enabling Checksums

2013-04-10 Thread Ants Aasma
On Wed, Apr 10, 2013 at 4:36 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-04-09 at 05:35 +0300, Ants Aasma wrote: And here you go. I decided to be verbose with the comments as it's easier to delete a comment to write one. I also left in a huge jumble of macros to calculate the contents

  1   2   3   4   >