Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 10:30 PM, Bruce Momjian wrote:
> On Fri, Feb 24, 2017 at 10:09:50PM +0200, Ants Aasma wrote:
>> On Fri, Feb 24, 2017 at 9:37 PM, Bruce Momjian wrote:
>> > Oh, that's why we will hopefully eventually change the page checksum
>> > algorithm to use the special CRC32 instruction, and set a new checksum
>> > version --- got it. I assume there is currently no compile-time way to
>> > do this.
>>
>> Using CRC32 as implemented now for the WAL would be significantly
>> slower than what we have now due to instruction latency. Even the best
>> theoretical implementation using the CRC32 instruction would still be
>> about the same speed as what we have now. I haven't seen anybody
>> working on swapping out the current algorithm. And I don't really see
>> a reason to, it would introduce a load of headaches for no real gain.
>
> Uh, I am confused. I thought you said we were leaving some performance
> on the table. What is that? I thought CRC32 was SSE4.1. Why is CRC32
> good for the WAL but bad for the page checksums? What about the WAL
> page images?

The page checksum algorithm was designed to take advantage of CPUs that
provide vectorized 32-bit integer multiplication. On x86 this was
introduced with the SSE4.1 extensions, which means that by default we
can't take advantage of the design. The code is written so that compiler
auto-vectorization works on it, so only the appropriate compilation
flags are needed to build a version that does use vector instructions.
However, to enable that in generic builds, a runtime switch between
different levels of vectorization support is needed. This is what is
leaving the performance on the table.

The page checksum algorithm we have is extremely fast - memcpy fast.
Even without vectorization it is right up there with MurmurHash3a and
xxHash. With vectorization it's 4x faster. And it works this fast on
most modern CPUs, not only Intel.
The downside is that it only works well for large blocks, and with the
current implementation only blocks of a fixed power-of-2 size. WAL page
images have the page hole removed, so they can't easily take advantage
of this.

That said, I haven't really seen either the hardware-accelerated CRC32
calculation or the non-vectorized page checksum take a noticeable amount
of time on real-world workloads. The benchmarks presented in this thread
seem to corroborate this observation.

Regards,
Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 10:09:50PM +0200, Ants Aasma wrote:
> On Fri, Feb 24, 2017 at 9:37 PM, Bruce Momjian wrote:
> > Oh, that's why we will hopefully eventually change the page checksum
> > algorithm to use the special CRC32 instruction, and set a new checksum
> > version --- got it. I assume there is currently no compile-time way to
> > do this.
>
> Using CRC32 as implemented now for the WAL would be significantly
> slower than what we have now due to instruction latency. Even the best
> theoretical implementation using the CRC32 instruction would still be
> about the same speed as what we have now. I haven't seen anybody
> working on swapping out the current algorithm. And I don't really see
> a reason to, it would introduce a load of headaches for no real gain.

Uh, I am confused. I thought you said we were leaving some performance
on the table. What is that? I thought CRC32 was SSE4.1. Why is CRC32
good for the WAL but bad for the page checksums? What about the WAL
page images?

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 9:49 PM, Jim Nasby wrote:
> On 2/24/17 12:30 PM, Tomas Vondra wrote:
>> In any case, we can't just build x86-64 packages with compile-time
>> SSE4.1 checks.
>
> Dumb question... since we're already discussing llvm for the executor,
> would that potentially be an option here? AIUI that also opens the
> possibility of using the GPU as well.

Just transferring the block to the GPU would be slower than what we
have now. Theoretically LLVM could be used to JIT the checksum
calculation, but just precompiling a couple of versions and switching
between them at runtime would be simpler and would give the same
speedup.

Regards,
Ants Aasma
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 9:37 PM, Bruce Momjian wrote:
> Oh, that's why we will hopefully eventually change the page checksum
> algorithm to use the special CRC32 instruction, and set a new checksum
> version --- got it. I assume there is currently no compile-time way to
> do this.

Using CRC32 as implemented now for the WAL would be significantly
slower than what we have now due to instruction latency. Even the best
theoretical implementation using the CRC32 instruction would still be
about the same speed as what we have now. I haven't seen anybody
working on swapping out the current algorithm. And I don't really see
a reason to, it would introduce a load of headaches for no real gain.

Regards,
Ants Aasma
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 10:02 PM, Bruce Momjian wrote:
> Uh, as far as I know, the best you are going to get from llvm is
> standard assembly, while the SSE4.1 instructions use special assembly
> instructions, so they would be faster, and in a way they are a GPU
> built into CPUs.

Both LLVM and GCC are capable of compiling the code that we have to a
vectorized loop using SSE4.1 or AVX2 instructions given the proper
compilation flags. This is exactly what was giving the speedup in the
test I showed in my e-mail.

Regards,
Ants Aasma
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 01:49:07PM -0600, Jim Nasby wrote:
> On 2/24/17 12:30 PM, Tomas Vondra wrote:
> > In any case, we can't just build x86-64 packages with compile-time
> > SSE4.1 checks.
>
> Dumb question... since we're already discussing llvm for the executor,
> would that potentially be an option here? AIUI that also opens the
> possibility of using the GPU as well.

Uh, as far as I know, the best you are going to get from llvm is
standard assembly, while the SSE4.1 instructions use special assembly
instructions, so they would be faster, and in a way they are a GPU
built into CPUs.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Checksums by default?
On 2/24/17 12:30 PM, Tomas Vondra wrote:
> In any case, we can't just build x86-64 packages with compile-time
> SSE4.1 checks.

Dumb question... since we're already discussing llvm for the executor,
would that potentially be an option here? AIUI that also opens the
possibility of using the GPU as well.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 08:31:09PM +0200, Ants Aasma wrote:
> >> We looked at that when picking the algorithm. At that point it seemed
> >> that CRC CPU instructions were not universal enough to rely on them.
> >> The algorithm we ended up on was designed to be fast on SIMD hardware.
> >> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
> >> with default compiles there is a lot of performance left on the table.
> >> A low hanging fruit would be to do CPU detection like the CRC case and
> >> enable an SSE4.1 optimized variant when those instructions are
> >> available. IIRC it was actually a lot faster than the naive hardware
> >> CRC that is used for WAL and about on par with interleaved CRC.
> >
> > Uh, I thought we already did compile-time testing for SSE4.1 and used
> > them if present. Why do you say "with default compiles there is a lot
> > of performance left on the table?"
>
> Compile time checks don't help because the compiled binary could be
> run on a different host that does not have SSE4.1 (as extremely
> unlikely as that is at this point in time). A runtime check is done for

Right.

> WAL checksums that use a special CRC32 instruction. Block checksums
> predate that and use a different algorithm that was picked because it
> could be accelerated with vectorized execution on non-Intel
> architectures. We just never got around to adding runtime checks for
> the architecture to enable this speedup.

Oh, that's why we will hopefully eventually change the page checksum
algorithm to use the special CRC32 instruction, and set a new checksum
version --- got it. I assume there is currently no compile-time way to
do this.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Checksums by default?
On Fri, Feb 24, 2017 at 7:47 PM, Bruce Momjian wrote:
> On Sat, Jan 21, 2017 at 09:02:25PM +0200, Ants Aasma wrote:
>> > It might be worth looking into using the CRC CPU instruction to
>> > reduce this overhead, like we do for the WAL checksums. Since that is
>> > a different algorithm it would be a compatibility break and we would
>> > need to support the old algorithm for upgraded clusters..
>>
>> We looked at that when picking the algorithm. At that point it seemed
>> that CRC CPU instructions were not universal enough to rely on them.
>> The algorithm we ended up on was designed to be fast on SIMD hardware.
>> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
>> with default compiles there is a lot of performance left on the table.
>> A low hanging fruit would be to do CPU detection like the CRC case and
>> enable an SSE4.1 optimized variant when those instructions are
>> available. IIRC it was actually a lot faster than the naive hardware
>> CRC that is used for WAL and about on par with interleaved CRC.
>
> Uh, I thought we already did compile-time testing for SSE4.1 and used
> them if present. Why do you say "with default compiles there is a lot
> of performance left on the table?"

Compile time checks don't help because the compiled binary could be
run on a different host that does not have SSE4.1 (as extremely
unlikely as that is at this point in time). A runtime check is done for
WAL checksums that use a special CRC32 instruction. Block checksums
predate that and use a different algorithm that was picked because it
could be accelerated with vectorized execution on non-Intel
architectures. We just never got around to adding runtime checks for
the architecture to enable this speedup.

The attached test runs 1M iterations of the checksum about 3x faster
when compiled with SSE4.1 and vectorization, 4x if AVX2 is added into
the mix.
Test:

    gcc $CFLAGS -Isrc/include -DN=1000000 testchecksum.c -o testchecksum && time ./testchecksum

Results:

    CFLAGS="-O2":                           2.364s
    CFLAGS="-O2 -msse4.1 -ftree-vectorize": 0.752s
    CFLAGS="-O2 -mavx2 -ftree-vectorize":   0.552s

That 0.552s is 15GB/s per core on a 3 year old laptop.

Regards,
Ants Aasma

#include "postgres.h"
#include "storage/checksum_impl.h"

int main(void)
{
	char page[8192] = {0};
	uint32 i, sum = 0;

	for (i = 0; i < N; i++)
		sum ^= pg_checksum_page(page, i);

	return sum == 0;	/* use sum so the loop isn't optimized away */
}
Re: [HACKERS] Checksums by default?
On 02/24/2017 06:47 PM, Bruce Momjian wrote:
> On Sat, Jan 21, 2017 at 09:02:25PM +0200, Ants Aasma wrote:
>>> It might be worth looking into using the CRC CPU instruction to
>>> reduce this overhead, like we do for the WAL checksums. Since that is
>>> a different algorithm it would be a compatibility break and we would
>>> need to support the old algorithm for upgraded clusters..
>>
>> We looked at that when picking the algorithm. At that point it seemed
>> that CRC CPU instructions were not universal enough to rely on them.
>> The algorithm we ended up on was designed to be fast on SIMD hardware.
>> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
>> with default compiles there is a lot of performance left on the table.
>> A low hanging fruit would be to do CPU detection like the CRC case and
>> enable an SSE4.1 optimized variant when those instructions are
>> available. IIRC it was actually a lot faster than the naive hardware
>> CRC that is used for WAL and about on par with interleaved CRC.
>
> Uh, I thought we already did compile-time testing for SSE4.1 and used
> them if present. Why do you say "with default compiles there is a lot
> of performance left on the table?"

Compile-time is not enough - we build binary packages that may then be
installed on machines without the SSE4.1 instructions available.

On Intel this may not be a huge issue - the first microarchitecture with
SSE4.1 was "Nehalem", announced in 2007, so we're only left with very
old boxes based on "Intel Core" (and perhaps the even older P6). On AMD
it's a bit worse - the first microarchitecture with SSE4.1 was Bulldozer
(late 2011). So quite a few CPUs out there, even if most people use
Intel.

In any case, we can't just build x86-64 packages with compile-time
SSE4.1 checks.
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 09:02:25PM +0200, Ants Aasma wrote:
> > It might be worth looking into using the CRC CPU instruction to reduce
> > this overhead, like we do for the WAL checksums. Since that is a
> > different algorithm it would be a compatibility break and we would
> > need to support the old algorithm for upgraded clusters..
>
> We looked at that when picking the algorithm. At that point it seemed
> that CRC CPU instructions were not universal enough to rely on them.
> The algorithm we ended up on was designed to be fast on SIMD hardware.
> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
> with default compiles there is a lot of performance left on the table.
> A low hanging fruit would be to do CPU detection like the CRC case and
> enable an SSE4.1 optimized variant when those instructions are
> available. IIRC it was actually a lot faster than the naive hardware
> CRC that is used for WAL and about on par with interleaved CRC.

Uh, I thought we already did compile-time testing for SSE4.1 and used
them if present. Why do you say "with default compiles there is a lot
of performance left on the table?"

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Checksums by default?
On Thu, Feb 23, 2017 at 10:37 PM, Bruce Momjian wrote:
> On Sat, Jan 21, 2017 at 12:46:05PM -0500, Stephen Frost wrote:
> > * Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:
> > > As we don't know what the performance impact is (there was no
> > > benchmark done on a reasonably current code base) I really don't
> > > understand how you can judge if it's worth it or not.
> >
> > Because I see having checksums as, frankly, something we always should
> > have had (as most other databases do, for good reason...) and because
> > they will hopefully prevent data loss. I'm willing to give up a fair
> > bit to minimize the risk of losing data.
>
> Do these other databases do checksums because they don't do
> full_page_writes? They just detect torn pages rather than repair them
> like we do?

Torn page detection is usually/often done by other means than checksums.
I don't think those are necessarily related.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 12:46:05PM -0500, Stephen Frost wrote:
> * Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote:
> > As we don't know what the performance impact is (there was no
> > benchmark done on a reasonably current code base) I really don't
> > understand how you can judge if it's worth it or not.
>
> Because I see having checksums as, frankly, something we always should
> have had (as most other databases do, for good reason...) and because
> they will hopefully prevent data loss. I'm willing to give up a fair
> bit to minimize the risk of losing data.

Do these other databases do checksums because they don't do
full_page_writes? They just detect torn pages rather than repair them
like we do?

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Checksums by default?
On 02/11/2017 01:38 AM, Tomas Vondra wrote:
> Incidentally, I've been dealing with a checksum failure reported by a
> customer last week, and based on the experience I tend to agree that we
> don't have the tools needed to deal with checksum failures. I think
> such tooling should be a 'must have' for enabling checksums by default.
>
> In this particular case the checksum failure is particularly annoying
> because it happens during recovery (on a standby, after a restart),
> during startup, so FATAL means shutdown.
>
> I've managed to inspect the page in a different way (dd and pageinspect
> from another instance), and it looks fine - no obvious data corruption,
> the only thing that seems borked is the checksum itself, and only three
> consecutive bits are flipped in the checksum. So this doesn't seem like
> a "stale checksum" - a hardware issue is a possibility (the machine has
> ECC RAM though), but it might just as easily be a bug in PostgreSQL,
> when something scribbles over the checksum due to a buffer overflow,
> just before we write the buffer to the OS. So 'false failures' are not
> an entirely impossible thing.

Not to leave this without any resolution, it seems the issue has been
caused by a SAN. Some configuration changes or something was being done
at the time of the issue, and the SAN somehow ended up writing a page
into a different relfilenode, into a different block. The page was from
a btree index and got written into a heap relfilenode, but otherwise it
was correct - the only thing that changed seems to be the block number,
which explains the minimal difference in the checksum.

I don't think we'll learn much more, but it seems the checksums did
their work in detecting the issue.

> So I think we're not ready to enable checksums by default for everyone,
> not until we can provide tools to deal with failures like this (I don't
> think users will be amused if we tell them to use 'dd' and inspect the
> pages in a hex editor).
> ISTM the way forward is to keep the current default (disabled), but to
> allow enabling checksums on the fly. That will mostly fix the issue for
> people who actually want checksums but don't realize they need to
> enable them at initdb time (and starting from scratch is not an option
> for them), are running on good hardware and are capable of dealing with
> checksum errors if needed, even without more built-in tooling.
>
> Being able to disable checksums on the fly is nice, but it only really
> solves the issue of extra overhead - it does not really help with the
> failures (particularly when you can't even start the database, because
> of a checksum failure in the startup phase).
>
> So, shall we discuss what tooling would be useful / desirable?

Although the checksums did detect the issue (we might never notice
without them, or maybe the instance would mysteriously crash), I still
think better tooling is needed. I've posted some minor pageinspect
improvements I hacked together while investigating this, but I don't
think pageinspect is a very good tool for investigating checksum / data
corruption issues, for a number of reasons:

1) It does not work at all when the instance does not even start - you
   have to manually dump the pages and try inspecting them from another
   instance.

2) Even then it assumes the pages are not corrupted, and may easily
   cause segfaults or other issues if that's not the case.

3) It works on a manual page-by-page basis.

4) It does not even try to resolve the issue somehow.

For example I think it'd be great to have a tool that works even on
instances that are not running - something that recursively walks
through all files in a data directory, verifies checksums on
everything, and lists/dumps pages with broken checksums for further
inspection. I have an alpha-alpha version of something along those
lines, written before the root cause was identified.

It'd be nice to have something that could help with fixing the issues
(e.g.
by fetching the last FPI from the backup, or so). But that's clearly way
more difficult.

There are probably some other tools that might be useful when dealing
with data corruption (e.g. scrubbing to detect it).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checksums by default?
On 02/13/2017 02:29 AM, Jim Nasby wrote:
> On 2/10/17 6:38 PM, Tomas Vondra wrote:
>> And no, backups may not be a suitable solution - the failure happens
>> on a standby, and the page (luckily) is not corrupted on the master.
>> Which means that perhaps the standby got corrupted by a WAL, which
>> would affect the backups too. I can't verify this, though, because the
>> WAL got removed from the archive, already. But it's a possibility.
>
> Possibly related... I've got a customer that periodically has SR
> replicas stop in their tracks due to WAL checksum failure. I don't
> think there's any hardware correlation (they've seen this on multiple
> machines). Studying the code, it occurred to me that if there's any
> bugs in the handling of individual WAL record sizes or pointers during
> SR then you could get CRC failures. So far every one of these
> occurrences has been repairable by replacing the broken WAL file on
> the replica. I've requested that next time this happens they save the
> bad WAL.

I don't follow. You're talking about WAL checksums, this thread is about
data checksums. I'm not seeing any WAL checksum failure, but when the
standby attempts to apply the WAL (in particular a btree/DELETE WAL
record), it detects an incorrect data checksum in the underlying table.

So either there's a hardware issue, or the heap got corrupted by some
preceding WAL. Or maybe one of the tiny gnomes in the CPU got tired and
punched the bits wrong.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Checksums by default?
On 2/10/17 6:38 PM, Tomas Vondra wrote:
> And no, backups may not be a suitable solution - the failure happens on
> a standby, and the page (luckily) is not corrupted on the master. Which
> means that perhaps the standby got corrupted by a WAL, which would
> affect the backups too. I can't verify this, though, because the WAL
> got removed from the archive, already. But it's a possibility.

Possibly related... I've got a customer that periodically has SR
replicas stop in their tracks due to WAL checksum failure. I don't think
there's any hardware correlation (they've seen this on multiple
machines).

Studying the code, it occurred to me that if there's any bugs in the
handling of individual WAL record sizes or pointers during SR then you
could get CRC failures. So far every one of these occurrences has been
repairable by replacing the broken WAL file on the replica. I've
requested that next time this happens they save the bad WAL.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
On Fri, Feb 10, 2017 at 7:38 PM, Tomas Vondra wrote:
> Incidentally, I've been dealing with a checksum failure reported by a
> customer last week, and based on the experience I tend to agree that we
> don't have the tools needed to deal with checksum failures. I think
> such tooling should be a 'must have' for enabling checksums by default.
>
> In this particular case the checksum failure is particularly annoying
> because it happens during recovery (on a standby, after a restart),
> during startup, so FATAL means shutdown.
>
> I've managed to inspect the page in a different way (dd and pageinspect
> from another instance), and it looks fine - no obvious data corruption,
> the only thing that seems borked is the checksum itself, and only three
> consecutive bits are flipped in the checksum. So this doesn't seem like
> a "stale checksum" - a hardware issue is a possibility (the machine has
> ECC RAM though), but it might just as easily be a bug in PostgreSQL,
> when something scribbles over the checksum due to a buffer overflow,
> just before we write the buffer to the OS. So 'false failures' are not
> an entirely impossible thing.
>
> And no, backups may not be a suitable solution - the failure happens on
> a standby, and the page (luckily) is not corrupted on the master. Which
> means that perhaps the standby got corrupted by a WAL, which would
> affect the backups too. I can't verify this, though, because the WAL
> got removed from the archive, already. But it's a possibility.
>
> So I think we're not ready to enable checksums by default for everyone,
> not until we can provide tools to deal with failures like this (I don't
> think users will be amused if we tell them to use 'dd' and inspect the
> pages in a hex editor).
>
> ISTM the way forward is to keep the current default (disabled), but to
> allow enabling checksums on the fly.
> That will mostly fix the issue for people who actually want checksums
> but don't realize they need to enable them at initdb time (and starting
> from scratch is not an option for them), are running on good hardware
> and are capable of dealing with checksum errors if needed, even without
> more built-in tooling.
>
> Being able to disable checksums on the fly is nice, but it only really
> solves the issue of extra overhead - it does not really help with the
> failures (particularly when you can't even start the database, because
> of a checksum failure in the startup phase).
>
> So, shall we discuss what tooling would be useful / desirable?

FWIW, I appreciate this analysis and I think it's exactly the kind of
thing we need to set a strategy for moving forward.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
Hi,

I've repeated those benchmarks on a much smaller/older machine, with
only minimal changes (mostly related to RAM and cores available). I
expected to see more significant differences, assuming that newer CPUs
will handle the checksumming better, but to my surprise the impact of
enabling checksums on this machine is ~2%.

As usual, full results and statistics are available for review here:

    https://bitbucket.org/tvondra/checksum-bench-i5

Looking at average TPS (measured over 2 hours, with checkpoints every
30 minutes), I see this:

    test        scale   checksums   no-checksums
    --------------------------------------------------
    pgbench        50        7444           7518    99.02%
                  300        6863           6936    98.95%
                 1000        4195           4295    97.67%

    read-write     50       48858          48832   100.05%
                  300       41999          42302    99.28%
                 1000       16539          1        99.24%

    skewed         50        7485           7480   100.07%
                  300        7245           7280    99.52%
                 1000        5950           6050    98.35%

    skewed-n       50       10234          10226   100.08%
                  300        9618           9649    99.68%
                 1000        7371           7393    99.70%

And the amount of WAL produced looks like this:

    test        scale   checksums   no-checksums
    --------------------------------------------------
    pgbench        50       24.89          24.67   100.89%
                  300       37.94          37.54   101.07%
                 1000       65.91          64.88   101.58%

    read-write     50       10.00           9.98   100.11%
                  300       23.28          23.35    99.66%
                 1000       54.20          53.20   101.89%

    skewed         50       24.35          24.01   101.43%
                  300       35.12          34.51   101.77%
                 1000       52.14          51.15   101.93%

    skewed-n       50       21.71          21.13   102.73%
                  300       32.23          31.54   102.18%
                 1000       53.24          51.94   102.50%

Again, this is hardly a proof of non-existence of a workload where data
checksums have a much worse impact, but I expected to see a much more
significant impact on those workloads.

Incidentally, I've been dealing with a checksum failure reported by a
customer last week, and based on the experience I tend to agree that we
don't have the tools needed to deal with checksum failures. I think such
tooling should be a 'must have' for enabling checksums by default.

In this particular case the checksum failure is particularly annoying
because it happens during recovery (on a standby, after a restart),
during startup, so FATAL means shutdown.
I've managed to inspect the page in a different way (dd and pageinspect
from another instance), and it looks fine - no obvious data corruption,
the only thing that seems borked is the checksum itself, and only three
consecutive bits are flipped in the checksum. So this doesn't seem like
a "stale checksum" - a hardware issue is a possibility (the machine has
ECC RAM though), but it might just as easily be a bug in PostgreSQL,
when something scribbles over the checksum due to a buffer overflow,
just before we write the buffer to the OS. So 'false failures' are not
an entirely impossible thing.

And no, backups may not be a suitable solution - the failure happens on
a standby, and the page (luckily) is not corrupted on the master. Which
means that perhaps the standby got corrupted by a WAL, which would
affect the backups too. I can't verify this, though, because the WAL got
removed from the archive, already. But it's a possibility.

So I think we're not ready to enable checksums by default for everyone,
not until we can provide tools to deal with failures like this (I don't
think users will be amused if we tell them to use 'dd' and inspect the
pages in a hex editor).

ISTM the way forward is to keep the current default (disabled), but to
allow enabling checksums on the fly. That will mostly fix the issue for
people who actually want checksums but don't realize they need to
enable them at initdb time (and starting from scratch is not an option
for them), are running on good hardware and are capable of dealing with
checksum errors if needed, even without more built-in tooling.

Being able to disable checksums on the fly is nice, but it only really
solves the issue of extra overhead - it does not really help with the
failures (particularly when you can't even start the database, because
of a checksum failure in the startup phase).

So, shall we discuss what tooling would be useful / desirable?
Re: [HACKERS] Checksums by default?
On 2/3/17 5:31 PM, Andres Freund wrote: You can't really see things from other databases that way tho. So you need to write a tool that iterates all databases and such. Not that that's a huge problem, but it doesn't make things easier at least. True. Not terribly hard to iterate though, and if the author of this mythical extension really wanted to they could probably use a bgworker that was free to iterate through the databases. (and you need to deal with things like forks, but that's not a huge issue) Yeah, which maybe requires version-specific hard-coded knowledge of how many forks you might have. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
On Mon, Jan 30, 2017 at 12:29 PM, David Steele wrote: > The solution was to simply ignore the checksums of any pages with an LSN >>= the LSN returned by pg_start_backup(). This means that hot blocks > may never be checked during backup, but if they are active then any > problems should be caught directly by PostgreSQL. I feel like this doesn't fix the problem. Suppose the backup process reads part of a block that hasn't been modified in a while, and then PostgreSQL writes the block, and then the backup process reads the rest of the block. The LSN test will not prevent the checksum from being verified, but the checksum will fail to match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-02-03 17:23:15 -0600, Jim Nasby wrote: > On 1/25/17 6:40 PM, Stephen Frost wrote: > > Obviously, having to bring up a full database is an extra step (one we > > try to make easy to do), but, sadly, we don't have any way to ask PG to > > verify all the checksums with released versions, so that's what we're > > working with. > > Wouldn't it be fairly trivial to write an extension that did that though? > > foreach r in pg_class where relkind in (...) > for (b = 0; b < r.relpages; b++) > ReadBufferExtended(..., BAS_BULKREAD); You can't really see things from other databases that way tho. So you need to write a tool that iterates all databases and such. Not that that's a huge problem, but it doesn't make things easier at least. (and you need to deal with things like forks, but that's not a huge issue) - Andres
Re: [HACKERS] Checksums by default?
On 1/25/17 6:40 PM, Stephen Frost wrote: Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Wouldn't it be fairly trivial to write an extension that did that though?

    foreach r in pg_class where relkind in (...)
        for (b = 0; b < r.relpages; b++)
            ReadBufferExtended(..., BAS_BULKREAD);

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
On 1/25/17 10:38 PM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote: >>> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. >>> >>> Uh. Writes aren't atomic on that granularity. That means you very well >>> *can* see a torn page (in linux you can e.g. on 4KB os page boundaries >>> of a 8KB postgres page). Just read a page while it's being written out. >> >> Yeah. This is also why backups force full page writes on even if >> they're turned off in general. > > I've got a question into David about this, I know we chatted about the > risk at one point, I just don't recall what we ended up doing (I can > imagine a few different possible things- re-read the page, which isn't a > guarantee but reduces the chances a fair bit, or check the LSN, or > perhaps the plan was to just check if it's in the WAL, as I mentioned) > or if we ended up concluding it wasn't a risk for some, perhaps > incorrect, reason and need to revisit it. The solution was to simply ignore the checksums of any pages with an LSN >= the LSN returned by pg_start_backup(). This means that hot blocks may never be checked during backup, but if they are active then any problems should be caught directly by PostgreSQL. This technique assumes that blocks can be consistently read in the order they were written. If the second 4k (or 512 byte, etc.) 
block of the fwrite is visible before the first 4k block then there would be a false positive. I have a hard time imagining any sane buffering system working this way, but I can't discount it. It's definitely possible for pages on disk to have this characteristic (i.e., the first block is not written first) but that should be fixed during recovery before it is possible to take a backup. Note that reports of page checksum errors are informational only and do not have any effect on the backup process. Even so we would definitely prefer to avoid false positives. If anybody can poke a hole in this solution then I would like to hear it. -- -David da...@pgmasters.net
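For illustration, the LSN filter David describes amounts to something like the following sketch (simplified, and not pgbackrest's actual code): any page whose pd_lsn is at or past the LSN returned by pg_start_backup() is skipped, since it may legitimately be torn when read mid-write and will be overwritten by a full-page image during recovery anyway. It assumes a little-endian pd_lsn at the start of the page header.

```python
import struct

def page_lsn(page):
    """pd_lsn occupies the first 8 bytes of the page header:
    xlogid then xrecoff, each a uint32 on a little-endian build."""
    xlogid, xrecoff = struct.unpack_from("<II", page)
    return (xlogid << 32) | xrecoff

def should_verify_checksum(page, backup_start_lsn):
    """Skip pages written at or after the backup started: they may be
    read half-updated (torn), and restore replays full-page images for
    them anyway, so any real damage there is repaired by recovery."""
    return page_lsn(page) < backup_start_lsn
```

Note that this filter is exactly what Robert's objection targets: it assumes the pd_lsn the backup reads is itself consistent with the rest of the block it read.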
Re: [HACKERS] Checksums by default?
On 01/25/2017 05:25 PM, Peter Geoghegan wrote: On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: I understand that my experience with storage devices is unusually narrow compared to everyone else here. That's why I remain neutral on the high level question of whether or not we ought to enable checksums by default. I'll ask other hackers to answer what may seem like a very naive question, while bearing what I just said in mind. The question is: Have you ever actually seen a checksum failure in production? And, if so, how helpful was it? No. JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote: > > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > >> * Peter Geoghegan (p...@heroku.com) wrote: > >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost > >> > wrote: > >> > > As it is, there are backup solutions which *do* check the checksum when > >> > > backing up PG. This is no longer, thankfully, some hypothetical thing, > >> > > but something which really exists and will hopefully keep users from > >> > > losing data. > >> > > >> > Wouldn't that have issues with torn pages? > >> > >> No, why would it? The page has either been written out by PG to the OS, > >> in which case the backup s/w will see the new page, or it hasn't been. > > > > Uh. Writes aren't atomic on that granularity. That means you very well > > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > > of a 8KB postgres page). Just read a page while it's being written out. > > Yeah. This is also why backups force full page writes on even if > they're turned off in general. I've got a question into David about this, I know we chatted about the risk at one point, I just don't recall what we ended up doing (I can imagine a few different possible things- re-read the page, which isn't a guarantee but reduces the chances a fair bit, or check the LSN, or perhaps the plan was to just check if it's in the WAL, as I mentioned) or if we ended up concluding it wasn't a risk for some, perhaps incorrect, reason and need to revisit it. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munro wrote: > > I don't know how comparable it is to our checksum technology, but > MySQL seems to have some kind of checksums on table data, and you can > find public emails, blogs etc lamenting corrupted databases by > searching Google for the string "InnoDB: uncompressed page, stored > checksum in field1" (that's the start of a longer error message that > includes actual and expected checksums). I'm not sure what exactly that teaches us however. I see these were often associated with software bugs (apparently MySQL long assumed that a checksum of 0 never happened, for example). Every non-software case I stumbled across seemed to follow a power failure. Apparently MySQL uses a "doublewrite buffer" to protect against torn pages but when I search for that I get tons of people inquiring how to turn it off... So even without software bugs in the checksum code I don't know that the frequency of the error necessarily teaches us anything about the frequency of hardware corruption either. And more to the point it seems what people are asking for in all those lamentations is how they can convince MySQL to continue and ignore the corruption. A typical response was "We slightly modified innochecksum and added option -f that means if the checksum of a page is wrong, rewrite it in the InnoDB page header." Which begs the question... -- greg
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: >> * Peter Geoghegan (p...@heroku.com) wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> > > As it is, there are backup solutions which *do* check the checksum when >> > > backing up PG. This is no longer, thankfully, some hypothetical thing, >> > > but something which really exists and will hopefully keep users from >> > > losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> No, why would it? The page has either been written out by PG to the OS, >> in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. Yeah. This is also why backups force full page writes on even if they're turned off in general. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost wrote: > Sadly, without having them enabled by default, there's not a huge corpus > of example cases to draw from. > > There have been a few examples already posted about corruption failures > with PG, but one can't say with certainty that they would have been > caught sooner if checksums had been enabled. I don't know how comparable it is to our checksum technology, but MySQL seems to have some kind of checksums on table data, and you can find public emails, blogs etc lamenting corrupted databases by searching Google for the string "InnoDB: uncompressed page, stored checksum in field1" (that's the start of a longer error message that includes actual and expected checksums). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Checksums by default?
Peter, * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > > I understand that my experience with storage devices is unusually > > narrow compared to everyone else here. That's why I remain neutral on > > the high level question of whether or not we ought to enable checksums > > by default. I'll ask other hackers to answer what may seem like a very > > naive question, while bearing what I just said in mind. The question > > is: Have you ever actually seen a checksum failure in production? And, > > if so, how helpful was it? > > I'm surprised that nobody has answered my question yet. > > I'm not claiming that not actually seeing any corruption in the wild > due to a failing checksum invalidates any argument. I *do* think that > data points like this can be helpful, though. Sadly, without having them enabled by default, there's not a huge corpus of example cases to draw from. There have been a few examples already posted about corruption failures with PG, but one can't say with certainty that they would have been caught sooner if checksums had been enabled. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > I understand that my experience with storage devices is unusually > narrow compared to everyone else here. That's why I remain neutral on > the high level question of whether or not we ought to enable checksums > by default. I'll ask other hackers to answer what may seem like a very > naive question, while bearing what I just said in mind. The question > is: Have you ever actually seen a checksum failure in production? And, > if so, how helpful was it? I'm surprised that nobody has answered my question yet. I'm not claiming that not actually seeing any corruption in the wild due to a failing checksum invalidates any argument. I *do* think that data points like this can be helpful, though. -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > > I hope to discuss it further after we have the ability to turn it off > > easily. > > I think we should have the ability to flip it in BOTH directions easily. Presumably you imply this to mean "before we enable it by default." I'm not sure that I can agree with that, but we haven't got it in either direction yet, so it's not terribly interesting to discuss that particular "what if." > It sounds to me like you are misleading users about the positives and > negatives of checksums, which then causes them to be shocked that they > are not the default. I don't try to claim that they are without downsides or performance impacts, if that's the implication here. > > [ more unsolicited bragging about an unspecified backup tool, presumably still > > pgbackrest ] It was explicitly to counter the claim that there aren't things out there which are working to actively check the checksums. > > I'd rather walk into an engagement where the user is saying "yeah, we > > enabled checksums and it caught this corruption issue" than having to > > break the bad news, which I've had to do over and over, that their > > existing system hasn't got checksums enabled. This isn't hypothetical, > > it's what I run into regularly with entirely reasonable and skilled > > engineers who have been deploying PG. > Maybe you should just stop telling them and use the time thus freed up > to work on improving the checksum feature. I'm working to improve the usefulness of our checksum feature in a way which will produce practical and much more immediate results than anything I could do today in PG. That said, I do plan to also support working on checksums as I'm able to. At the moment, that's supporting Magnus' thread about enabling them by default.
I'd be a bit surprised if he was trying to force a change on PG because he thinks it's going to improve things for pgbackrest, but if so, I'm not going to complain when it seems like an entirely sensible and good change which will benefit PG's users too. Even better would be if we had an independent tool to check checksums endorsed by the PG community, but that won't happen for a release cycle. I'd also be extremely happy if the other backup tools out there grew the ability to check checksums in PG pages; frankly, I hope that adding it to pgbackrest will push them to do so. > I'm skeptical of this whole discussion because you seem to be filled > with unalloyed confidence that checksums have little performance > impact and will do wonderful things to prevent data loss, whereas I > think they have significant performance impact and will only very > slightly help to prevent data loss. I admit that they'll have a significant performance impact in some environments, but I think the vast majority of installations won't see anything different, while some of them may be saved by it, including, as likely as not, a number of actual corruption issues that have been brought up on these lists in the past few days, simply because reports were asked for. > I admit that the idea of having > pgbackrest verify checksums while backing up seems like it could > greatly improve the chances of checksums being useful, but I'm not > going to endorse changing PostgreSQL's default for pgbackrest's > benefit. I'm glad to hear that you generally endorse the idea of having a backup tool verify checksums. I'd love it if all of them did and I'm not going to apologize for, as far as I'm aware, being the first to even make an effort in that direction. > It's got to be to the benefit of PostgreSQL users broadly, > not just the subset of those people who use one particular backup > tool. 
Hopefully, other backup solutions will add similar capability, and perhaps someone will also write an independent tool, and eventually those will get out in released versions, and maybe PG will grow a tool to check checksums too, but I can't make other tool authors implement it, nor can I make other committers work on it and while I'm doing what I can, as I'm sure you understand, we all have a lot of different hats. > Also, the massive hit that will probably occur on > high-concurrency OLTP workloads larger than shared_buffers is going to > be hard to justify for any amount of backup security. I think that > problem's got to be solved or at least mitigated before we think about > changing this. I realize that not everyone would set the bar that > high, but I see far too many customers with exactly that workload to > dismiss it lightly. I have a sneaking suspicion that the customers which you get directly involved with tend to be at a different level than the majority of PG users which exist out in the wild (I can't say that it's really any different for me). I don't think that's a bad thing, but I do think users at all levels deserve consideration and not just those running close to the limits of their gear. Thanks! Ste
Re: [HACKERS] Checksums by default?
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > That would be enough. It should also be rare enough that there would > not be that many pages to track when looking at records from the > backup start position to minimum recovery point. It could be also > simpler, though more time-consuming, to just let a backup recover up > to the minimum recovery point (recovery_target = 'immediate'), and > then run the checksum sanity checks. There are other checks usually > needed on a backup anyway like being sure that index pages are in good > shape even with a correct checksum, etc. Believe me, I'm all for *all* of that. > But here I am really high-jacking the thread, so I'll stop.. If you have further thoughts, I'm all ears. This is all relatively new, and I don't expect to have all of the answers or solutions. Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > > * Peter Geoghegan (p...@heroku.com) wrote: > > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > > As it is, there are backup solutions which *do* check the checksum when > > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > > but something which really exists and will hopefully keep users from > > > > losing data. > > > > > > Wouldn't that have issues with torn pages? > > > > No, why would it? The page has either been written out by PG to the OS, > > in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. > > You simply can't reliably verify checksums without replaying WAL (or > creating a manual version of replay, as in checking the WAL for a FPW). Looking through the WAL isn't any surprise and is something we've been planning to do for other reasons anyway. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier >> wrote: >> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >>> As it is, there are backup solutions which *do* check the checksum when >> >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >>> but something which really exists and will hopefully keep users from >> >>> losing data. >> >> >> >> Wouldn't that have issues with torn pages? >> > >> > Why? What do you foresee here? I would think such backup solutions are >> > careful enough to ensure correctly the durability of pages so as they >> > are not partially written. >> >> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup >> tool from overlapping with a write(fd, buf, 8192) performed by the >> backend. > > As Michael mentioned, that'd depend on if things are atomic from a > user's perspective at certain sizes (perhaps 4k, which wouldn't be too > surprising, but may also be system-dependent), in which case verifying > that the page is in the WAL would be sufficient. That would be enough. It should also be rare enough that there would not be that many pages to track when looking at records from the backup start position to minimum recovery point. It could be also simpler, though more time-consuming, to just let a backup recover up to the minimum recovery point (recovery_target = 'immediate'), and then run the checksum sanity checks. There are other checks usually needed on a backup anyway like being sure that index pages are in good shape even with a correct checksum, etc. But here I am really high-jacking the thread, so I'll stop.. -- Michael
Re: [HACKERS] Checksums by default?
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > * Peter Geoghegan (p...@heroku.com) wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > As it is, there are backup solutions which *do* check the checksum when > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > but something which really exists and will hopefully keep users from > > > losing data. > > > > Wouldn't that have issues with torn pages? > > No, why would it? The page has either been written out by PG to the OS, > in which case the backup s/w will see the new page, or it hasn't been. Uh. Writes aren't atomic on that granularity. That means you very well *can* see a torn page (in linux you can e.g. on 4KB os page boundaries of a 8KB postgres page). Just read a page while it's being written out. You simply can't reliably verify checksums without replaying WAL (or creating a manual version of replay, as in checking the WAL for a FPW). > This isn't like a case where only half the page made it to the disk > because of a system failure though; everything is online and working > properly during an online backup. I don't think that really changes anything. Greetings, Andres Freund
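The race Andres describes is easy to demonstrate: if the kernel exposes a write in 4kB pieces, a concurrent reader can observe half of each version of an 8kB page, and the resulting image matches neither version's checksum. A toy simulation of that torn read (sha256 is just a stand-in here for the real page checksum algorithm; the effect is the same for any digest):

```python
import hashlib

BLCKSZ = 8192    # PostgreSQL page size
OS_PAGE = 4096   # granularity at which the kernel may expose a write

def page_hash(page):
    # stand-in for the real checksum; any digest shows the effect
    return hashlib.sha256(page).hexdigest()

old_version = bytes([0xAA]) * BLCKSZ
new_version = bytes([0xBB]) * BLCKSZ

# A reader racing the writer can see the first OS page already updated
# while the second half still holds the old contents:
torn_read = new_version[:OS_PAGE] + old_version[OS_PAGE:]

# The torn image verifies against neither version -> a false positive
# for any external checksum checker that reads files directly.
assert page_hash(torn_read) != page_hash(old_version)
assert page_hash(torn_read) != page_hash(new_version)
```

This is why external verification needs either WAL replay (the full-page image overwrites the torn block) or an LSN-based filter to skip recently written pages.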
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier > wrote: > > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >>> As it is, there are backup solutions which *do* check the checksum when > >>> backing up PG. This is no longer, thankfully, some hypothetical thing, > >>> but something which really exists and will hopefully keep users from > >>> losing data. > >> > >> Wouldn't that have issues with torn pages? > > > > Why? What do you foresee here? I would think such backup solutions are > > careful enough to ensure correctly the durability of pages so as they > > are not partially written. > > Well, you'd have to keep a read(fd, buf, 8192) performed by the backup > tool from overlapping with a write(fd, buf, 8192) performed by the > backend. As Michael mentioned, that'd depend on if things are atomic from a user's perspective at certain sizes (perhaps 4k, which wouldn't be too surprising, but may also be system-dependent), in which case verifying that the page is in the WAL would be sufficient. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. I believe his concern was that the backup sw might see a partially-updated page when it reads the file while PG is writing it. In other words, would the kernel return some intermediate state of data while an fwrite() is in progress. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > As it is, there are backup solutions which *do* check the checksum when > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > but something which really exists and will hopefully keep users from > > losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. Our testing has not turned up any issues as yet. That said, it's relatively new and I wouldn't be surprised if we need to do some adjustments in that area, which might be system-dependent even. We could certainly check the WAL for the page that had a checksum error (we currently simply report them, though don't throw away a prior backup if we detect one). This isn't like a case where only half the page made it to the disk because of a system failure though; everything is online and working properly during an online backup. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund wrote: > On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >> As it is, there are backup solutions which *do* check the checksum when >> >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >> but something which really exists and will hopefully keep users from >> >> losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> Why? What do you foresee here? I would think such backup solutions are >> careful enough to ensure correctly the durability of pages so as they >> are not partially written. > > That means you have to replay enough WAL to get into a consistent > state... Ah, OK, I get the point. Yes, checking this field on raw backups would be a problem, except if the page size matched the kernel's 4k one. -- Michael
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >>> As it is, there are backup solutions which *do* check the checksum when >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >>> but something which really exists and will hopefully keep users from >>> losing data. >> >> Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. Well, you'd have to keep a read(fd, buf, 8192) performed by the backup tool from overlapping with a write(fd, buf, 8192) performed by the backend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > I hope to discuss it further after we have the ability to turn it off > easily. I think we should have the ability to flip it in BOTH directions easily. >> Second, really hard to enable is a relative term. I accept that >> enabling checksums is not a pleasant process. Right now, you'd have >> to do a dump/restore, or use logical replication to replicate the data >> to a new cluster and then switch over. On the other hand, if >> checksums are really a critical feature, how are people getting to the >> point where they've got a mission-critical production system and only >> then discovering that they want to enable checksums? > > I truely do wish everyone would come talk to me before building out a > database. Perhaps that's been your experience, in which case, I envy > you, but I tend to get a reaction more along the lines of "wait, what do > you mean I had to pass some option to initdb to enable checksum?!?!". > The fact that we've got a WAL implementation and clearly understand > fsync requirements, why full page writes make sense, and that our WAL > has its own CRCs which isn't possible to disable, tends to lead people > to think we really know what we're doing and that we care a lot about > their data. It sounds to me like you are misleading users about the positives and negatives of checksums, which then causes them to be shocked that they are not the default. > As I have said, I don't believe it has to be on for everyone. For the second time, I didn't say that. But the default has a powerful influence on behavior. If it didn't, you wouldn't be trying to get it changed. > [ unsolicited bragging about an unspecified backup tool, presumably > pgbackrest ] Great. > Presently, last I checked at least, the database system doesn't fall > over and die if a single page's checksum fails. This is another thing that I never said. 
> [ more unsolicited bragging an unspecified backup tool, presumably still > pgbackrest ] Swell. >> I'm not trying to downplay the usefulness of checksums *in a certain >> context*. It's a good feature, and I'm glad we have it. But I think >> you're somewhat inflating the utility of it while discounting the very >> real costs. > > The costs for checksums don't bother me any more than the costs for WAL > or WAL CRCs or full page writes. Obviously. But I think they should. Frankly, I think the costs for full page writes should bother the heck out of all of us, but the solution isn't to shut them off any more than it is to enable checksums despite the cost. It's to find a way to reduce the costs. > They may not be required on every > system, but they're certainly required on more than 'zero' entirely > reasonable systems which people deploy in their production environments. Nobody said otherwise. > I'd rather walk into an engagement where the user is saying "yeah, we > enabled checksums and it caught this corruption issue" than having to > break the bad news, which I've had to do over and over, that their > existing system hasn't got checksums enabled. This isn't hypothetical, > it's what I run into regularly with entirely reasonable and skilled > engineers who have been deploying PG. Maybe you should just stop telling them and use the time thus freed up to work on improving the checksum feature. I'm skeptical of this whole discussion because you seem to be filled with unalloyed confidence that checksums have little performance impact and will do wonderful things to prevent data loss, whereas I think they have significant performance impact and will only very slightly help to prevent data loss. I admit that the idea of having pgbackrest verify checksums while backing up seems like it could greatly improve the chances of checksums being useful, but I'm not going to endorse changing PostgreSQL's default for pgbackrest's benefit. 
It's got to be to the benefit of PostgreSQL users broadly, not just the subset of those people who use one particular backup tool. Also, the massive hit that will probably occur on high-concurrency OLTP workloads larger than shared_buffers is going to be hard to justify for any amount of backup security. I think that problem's got to be solved or at least mitigated before we think about changing this. I realize that not everyone would set the bar that high, but I see far too many customers with exactly that workload to dismiss it lightly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. That means you have to replay enough WAL to get into a consistent state... - Andres
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? Why? What do you foresee here? I would think such backup solutions are careful enough to correctly ensure the durability of pages so that they are not partially written. -- Michael
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > As it is, there are backup solutions which *do* check the checksum when > backing up PG. This is no longer, thankfully, some hypothetical thing, > but something which really exists and will hopefully keep users from > losing data. Wouldn't that have issues with torn pages? -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: > > Yet, our default is to have them disabled and *really* hard to enable. > > First of all, that could be fixed by further development. I'm certainly all for doing so, but I don't agree that it necessarily is required before we flip the default. That said, if the way to get checksums enabled by default is providing a relatively easy way to turn them off, then that's something which I'll do what I can to help work towards. In other words, I'm not going to continue to argue, given the various opinions of the group, that we should just flip it tomorrow. I hope to discuss it further after we have the ability to turn it off easily. > Second, really hard to enable is a relative term. I accept that > enabling checksums is not a pleasant process. Right now, you'd have > to do a dump/restore, or use logical replication to replicate the data > to a new cluster and then switch over. On the other hand, if > checksums are really a critical feature, how are people getting to the > point where they've got a mission-critical production system and only > then discovering that they want to enable checksums? I truly do wish everyone would come talk to me before building out a database. Perhaps that's been your experience, in which case, I envy you, but I tend to get a reaction more along the lines of "wait, what do you mean I had to pass some option to initdb to enable checksum?!?!". The fact that we've got a WAL implementation and clearly understand fsync requirements, why full page writes make sense, and that our WAL has its own CRCs which aren't possible to disable, tends to lead people to think we really know what we're doing and that we care a lot about their data.
> > I agree that it's unfortunate that we haven't put more effort into > > fixing that- I'm all for it, but it's disappointing to see that people > > are not in favor of changing the default as I believe it would both help > > our users and encourage more development of the feature. > > I think it would help some users and hurt others. I do agree that it > would encourage more development of the feature -- almost of > necessity. In particular, I bet it would spur development of an > efficient way of turning checksums off -- but I'd rather see us > approach it from the other direction: let's develop an efficient way > of turning the feature on and off FIRST. Deciding that the feature > has to be on for everyone because turning it on later is too hard for > the people who later decide they want it is letting the tail wag the > dog. As I have said, I don't believe it has to be on for everyone. > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: If your backup system is checking the checksums when backing up PG, which I think every backup system *should* be doing, then guess what? You've got a backup which you can go back to immediately, possibly with the ability to restore all of the data from WAL. That won't always be the case, naturally, but it's a much better position than simply having a system which continues to degrade until you've actually reached the "you're screwed" level because PG will no longer read a page or perhaps can't even start up, *and* you no longer have any backups. As it is, there are backup solutions which *do* check the checksum when backing up PG. 
This is no longer, thankfully, some hypothetical thing, but something which really exists and will hopefully keep users from losing data. > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". And we have options for that. > And then you wonder what the point of having > the feature turned on is, especially if it's costly. It's almost an > attractive nuisance at that point - nobody wants to be the user that > turns off checksums because they sound good on paper, but when you > actually have a problem an awful lot of people are NOT going to want > to try to restore from backup and maybe lose recent transactions. > They're going to want to ignore the checksum failures. That's kind of > awful. Presently, last I checked at least, the database system doesn't fall over and die if a single page's checksum fails. I agree entirely that we want the system to fail gracefully (unless the user instructs us otherwise, perhaps because they have a redundant system that they can flip to immediately). > Pe
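For reference, the "options" for limping past a failed page that exist today are developer settings, settable in postgresql.conf or per-session by a superuser; both can silently propagate corruption and are meant strictly for salvage work:

```ini
# Salvage-only settings -- never leave these on in normal operation.
ignore_checksum_failure = on   # report checksum mismatches as WARNING, not ERROR
zero_damaged_pages = on        # zero out pages with damaged headers and continue
```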
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:23 PM, Robert Haas wrote: > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: > > ERROR: you're screwed > > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". That's also how I tend to think about it. I understand that my experience with storage devices is unusually narrow compared to everyone else here. That's why I remain neutral on the high level question of whether or not we ought to enable checksums by default. I'll ask other hackers to answer what may seem like a very naive question, while bearing what I just said in mind. The question is: Have you ever actually seen a checksum failure in production? And, if so, how helpful was it? I myself have not, despite the fact that Heroku uses checksums wherever possible, and has the technical means to detect problems like this across the entire fleet of customer databases. Not even once. This is not what I would have expected myself several years ago. -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: >> Sure. If the database runs fast enough with checksums enabled, >> there's basically no reason to have them turned off. The issue is >> when it doesn't. > > I don't believe we're talking about forcing every user to have checksums > enabled. We are discussing the default. I never said otherwise. > Would you say that most user's databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. I don't have statistics on that, but I'd certainly agree that it's over 90%. However, I estimate that the percentage of people who wouldn't be helped by checksums is also over 90%. I don't think it's easy to say whether there are more people who would benefit from checksums than would be hurt by the performance penalty or vice versa. My own feeling is the second, but I understand that yours is the first. > Yet, our default is to have them disabled and *really* hard to enable. First of all, that could be fixed by further development. Second, really hard to enable is a relative term. I accept that enabling checksums is not a pleasant process. Right now, you'd have to do a dump/restore, or use logical replication to replicate the data to a new cluster and then switch over. On the other hand, if checksums are really a critical feature, how are people getting to the point where they've got a mission-critical production system and only then discovering that they want to enable checksums? If you tell somebody "we have an optional feature called checksums and you should really use it" and they respond "well, I'd like to, but I already put my system into critical production use and it's not worth it to me to take downtime to get them enabled", that sounds to me like the feature is nice-to-have, not absolutely essential.
When something is essential, you find a way to get it done, whether it's painful or not, because that's what essential means. And if checksums are not essential, then they shouldn't be enabled by default unless they're very cheap -- and I think we already know that's not true in all workloads. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think it would help some users and hurt others. I do agree that it would encourage more development of the feature -- almost of necessity. In particular, I bet it would spur development of an efficient way of turning checksums off -- but I'd rather see us approach it from the other direction: let's develop an efficient way of turning the feature on and off FIRST. Deciding that the feature has to be on for everyone because turning it on later is too hard for the people who later decide they want it is letting the tail wag the dog. Also, I think that one of the big problems with the way checksums work is that you don't find problems with your archived data until it's too late. Suppose that in February bits get flipped in a block. You don't access the data until July[1]. Well, it's nice to have the system tell you that the data is corrupted, but what are you going to do about it? By that point, all of your backups are probably corrupted. So it's basically: ERROR: you're screwed It's nice to know that (maybe?) but without a recovery strategy a whole lot of people who get that message are going to immediately start asking "How do I ignore the fact that I'm screwed and try to read the data anyway?". And then you wonder what the point of having the feature turned on is, especially if it's costly. 
It's almost an attractive nuisance at that point - nobody wants to be the user that turns off checksums because they sound good on paper, but when you actually have a problem an awful lot of people are NOT going to want to try to restore from backup and maybe lose recent transactions. They're going to want to ignore the checksum failures. That's kind of awful. Peter's comments upthread get at this: "We need to invest in corruption detection/verification tools that are run on an as-needed basis." Exactly. If we could verify that our data is good before throwing away our old backups, that'd be good. If we could verify that our indexes were structurally sane, that would be superior to anything checksums can ever give us because it catches not only storage failures but also software failures within PostgreSQL itself and user malfeasance above the PostgreSQL layer (e.g. redefining the supposedly-immutable function to give different answers) and damage inflicted inadvertently by environmental changes (e.g. upgrading glibc and having strcoll() change its mind). If we could verify that every XID and MXID in the heap points to a clog or multixact record that still exists, that'
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 8:18 PM, Robert Haas wrote: > Also, it's not as if there are no other ways of checking whether your > disks are failing. SMART, for example, is supposed to tell you about > incipient hardware failures before PostgreSQL ever sees a bit flip. > Surely an average user would love to get a heads-up that their > hardware is failing even when that hardware is not being used to power > PostgreSQL, yet many people don't bother to configure SMART (or > similar proprietary systems provided by individual vendors). You really can't rely on SMART to tell you about hardware failures. 1 in 4 drives fail completely with 0 SMART indication [1]. And for the 1 in 1000 annual checksum failure rate, other indicators except system restarts only had a weak correlation [2]. And this is without filesystem and other OS bugs that SMART knows nothing about. My view may be biased by mostly seeing the cases where things have already gone wrong, but I recommend that support clients turn checksums on unless it's known that write IO is going to be an issue. Especially because I know that if it turns out to be a problem I can go in and quickly hack together a tool to help them turn it off. I do agree that to change the PostgreSQL default, at least some tool to turn it off online should be included. [1] https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/ [2] https://www.usenix.org/legacy/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf Regards, Ants Aasma
Re: [HACKERS] Checksums by default?
On 01/25/2017 11:41 AM, Tom Lane wrote: Stephen Frost writes: Would you say that most user's databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. I agree that it's unfortunate that we haven't put more effort into fixing that- I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. +1 Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own.
Re: [HACKERS] Checksums by default?
Stephen Frost writes: > Would you say that most user's databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. regards, tom lane
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > > Trying to force those people to use checksums is just masterminding; > > they've made their own decision that it's not worth bothering with. > > When something goes wrong, WE still care about distinguishing hardware > > failure from PostgreSQL failure. Our pride is on the line. But the > > customer often doesn't. The DBA isn't the same person as the > > operating system guy, and the operating system guy isn't going to > > listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. Agreed. Thanks! Stephen
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > > I'm not completely grokking your second paragraph, but I would think that an > > average user would love got get a heads-up that their hardware is failing. > > Sure. If the database runs fast enough with checksums enabled, > there's basically no reason to have them turned off. The issue is > when it doesn't. I don't believe we're talking about forcing every user to have checksums enabled. We are discussing the default. Would you say that most users' databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. Yet, our default is to have them disabled and *really* hard to enable. I agree that it's unfortunate that we haven't put more effort into fixing that - I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:37 PM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: >> Trying to force those people to use checksums is just masterminding; >> they've made their own decision that it's not worth bothering with. >> When something goes wrong, WE still care about distinguishing hardware >> failure from PostgreSQL failure. Our pride is on the line. But the >> customer often doesn't. The DBA isn't the same person as the >> operating system guy, and the operating system guy isn't going to >> listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. +1, but the trick is (a) figuring out exactly what to develop and (b) finding the time to develop it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > Trying to force those people to use checksums is just masterminding; > they've made their own decision that it's not worth bothering with. > When something goes wrong, WE still care about distinguishing hardware > failure from PostgreSQL failure. Our pride is on the line. But the > customer often doesn't. The DBA isn't the same person as the > operating system guy, and the operating system guy isn't going to > listen to the DBA even if the DBA complains of checksum failures. We need to invest in corruption detection/verification tools that are run on an as-needed basis. They are available to users of every other major database system. -- Peter Geoghegan
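The core of such an as-needed verification tool is simple enough to sketch: walk each relation segment in block-sized chunks and report the blocks that a checksum routine rejects. An illustrative Python skeleton, where the `verify` callback is a placeholder rather than PostgreSQL's actual FNV-based page checksum (which also mixes the block number into the sum); it should only be run against a cleanly shut-down cluster to avoid false positives from in-flight writes:

```python
from typing import Callable, List

BLCKSZ = 8192  # default PostgreSQL block size

def scan_relation_file(path: str,
                       verify: Callable[[bytes, int], bool]) -> List[int]:
    """Return the block numbers in one relation segment that fail
    verification. `verify(page, blkno)` stands in for the real page
    checksum; a trailing partial page (which shouldn't occur in a
    healthy cluster) is skipped here rather than reported."""
    bad = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            if len(page) == BLCKSZ and not verify(page, blkno):
                bad.append(blkno)
            blkno += 1
    return bad
```

A production tool would also need to walk the data directory to find the segments, skip non-relation files, and distinguish never-checksummed (all-zero) pages.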
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > I'm not completely grokking your second paragraph, but I would think that an > average user would love got get a heads-up that their hardware is failing. Sure. If the database runs fast enough with checksums enabled, there's basically no reason to have them turned off. The issue is when it doesn't. Also, it's not as if there are no other ways of checking whether your disks are failing. SMART, for example, is supposed to tell you about incipient hardware failures before PostgreSQL ever sees a bit flip. Surely an average user would love to get a heads-up that their hardware is failing even when that hardware is not being used to power PostgreSQL, yet many people don't bother to configure SMART (or similar proprietary systems provided by individual vendors). Trying to force those people to use checksums is just masterminding; they've made their own decision that it's not worth bothering with. When something goes wrong, WE still care about distinguishing hardware failure from PostgreSQL failure. Our pride is on the line. But the customer often doesn't. The DBA isn't the same person as the operating system guy, and the operating system guy isn't going to listen to the DBA even if the DBA complains of checksum failures. Or the customer has 100 things on the same piece of hardware and PostgreSQL is the only one that failed; or alternatively they all failed around the same time; either way the culprit is obvious. Or the remedy is to restore from backup[1] whether the problem is hardware or software and regardless of whose software is to blame. Or their storage cost a million dollars and is a year old and they simply won't believe that it's failing. Or their storage cost a hundred dollars and is 8 years old and they're looking for an excuse to replace it whether it's responsible for the problem du jour or not. 
I think it's great that we have a checksum feature and I think it's great for people who want to use it and are willing to pay the cost of it to turn it on. I don't accept the argument that all of our users, or even most of them, fall into that category. I also think it's disappointing that there's such a vigorous argument for changing the default when so little follow-on development has gone into this feature. If we had put any real effort into making this easier to turn on and off, for example, the default value would be less important, because people could change it more easily. But nobody's making that effort. I suggest that the people who think this is a super-high-value feature should be willing to put some real work into improving it instead of trying to force it down everybody's throat as-is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Alternatively, sometimes the remedy is to wish they had a usable backup while frantically running pg_resetxlog.
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund wrote: > On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote: >> Is it time to enable checksums by default, and give initdb a switch to turn >> it off instead? > > -1 - the WAL overhead is quite massive, and in contrast to the other > GUCs recently changed you can't just switch this around. I agree. I bet that if somebody does the test suggested by Amit downthread, it'll turn out that the performance is just awful. And those cases are common. I think the people saying "well, the overhead is worth it" must be people whose systems (or whose customers' systems) aren't processing continuous heavy OLTP workloads. If you've got a data warehousing workload, checksums are probably pretty cheap. If you've got a low-velocity OLTP workload, or an OLTP workload that fits in shared_buffers, it's probably bearable. But if you've got 8GB of shared_buffers and 100GB of data, and you've got 100 or so backends continuously doing random updates, I think checksums are going to nail you to the wall. And EnterpriseDB, at least, has lots of customers that do exactly that sort of thing. Having said that, I've certainly run into situations where I speculated that a customer had a hardware problem and they speculated that we had given them buggy database software. In a pretty significant number of cases, the customer turned out to be right; for example, some of those people were suffering from multixact bugs that resulted in unexplainable corruption. Now, would it have been useful to know that checksums were passing (suggesting a PostgreSQL problem) rather than failing (suggesting an OS problem)? Yes, that would have been great. I could have given those customers better support. On the other hand, I think I've run into MORE cases where the customer was desperately seeking options to improve write performance, which remains a pretty significant problem for PostgreSQL.
I can't see taking a significant hit in that area for my convenience in understanding what's going on in data corruption situations. The write performance penalty is paid by everybody all the time, whereas data corruption is a rare event even among support cases. And even when you do have corruption, whether or not the data corruption is accompanied by a checksum failure is only ONE extra bit of useful data. A failure doesn't guarantee a hardware problem; it could be caused by a faulty backup procedure, like forgetting to run pg_start_backup(). The lack of a failure doesn't guarantee a software problem; it could be caused by a faulty backup procedure, like using an OS-level snapshot facility that isn't exactly simultaneous across tablespaces. What you really need to do when a customer has corruption is figure out why they have corruption, and the leading cause by far is neither the hardware nor the software but some kind of user error. Checksums are at best a very modest assist in figuring out whether an error has been made and if so of what type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 1/24/17 10:30 AM, Joshua D. Drake wrote: Tom is correct here. They are not a net win for the average user. We tend to forget that although we collectively have a lot of enterprise installs where this does matter, we collectively do not equal near the level of average user installs. From an advocacy perspective, the average user install is the one that we tend most because that tending (in theory) will grow something that is more fruitful e.g. the enterprise install over time because we constantly and consistently provided a reasonable and expected experience to the average user. I'm not completely grokking your second paragraph, but I would think that an average user would love to get a heads-up that their hardware is failing. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
On 01/21/2017 09:09 AM, Tom Lane wrote: Stephen Frost writes: As for checksums, I do see value in them and I'm pretty sure that the author of that particular feature did as well, or we wouldn't even have it as an option. You seem to be of the opinion that we might as well just rip all of that code and work out as being useless. Not at all; I just think that it's not clear that they are a net win for the average user. Tom is correct here. They are not a net win for the average user. We tend to forget that although we collectively have a lot of enterprise installs where this does matter, we collectively do not equal near the level of average user installs. From an advocacy perspective, the average user install is the one that we tend most because that tending (in theory) will grow something that is more fruitful, e.g. the enterprise install over time, because we constantly and consistently provided a reasonable and expected experience to the average user. Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own.
Re: [HACKERS] Checksums by default?
On 21.01.2017 19:35, Tom Lane wrote: Andres Freund writes: Sure, it might be easy, but we don't have it. Personally I think checksums just aren't even ready for prime time. If we had: - ability to switch on/off at runtime (early patches for that have IIRC been posted) - *builtin* tooling to check checksums for everything - *builtin* tooling to compute checksums after changing setting - configurable background sweeps for checksums Yeah, and there's a bunch of usability tooling that we don't have, centered around "what do you do after you get a checksum error?". AFAIK there's no way to check or clear such an error; but without such tools, I'm afraid that checksums are as much of a foot-gun as a benefit. I wanted to raise the same issue. A "something is broken" flag is fine to avoid more things getting broken. But if you can't repair them, it's not very useful. Since I'm a heavy user of ZFS: it has checksums, and if you enable shadow copies or use a RAID, checksums are helpful, since they allow you to recover from the problems. I personally would prefer to enable checksums manually and then get the possibility to repair damage. Manually, because this would at least double the needed space. Greetings, Torsten
Re: [HACKERS] Checksums by default?
On 21.01.2017 19:37, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost writes: Because I see having checksums as, frankly, something we always should have had (as most other databases do, for good reason...) and because they will hopefully prevent data loss. I'm willing to give up a fair bit to minimize the risk of losing data. To be perfectly blunt, that's just magical thinking. Checksums don't prevent data loss in any way, shape, or form. In fact, they can *cause* data loss, or at least make it harder for you to retrieve your data, in the event of bugs causing false-positive checksum failures. This is not a new argument, at least to me, and I don't agree with it. I don't agree either. Yes, statistically it is more likely that checksums cause data loss. The IO is greater, therefore the disk has more to do and breaks faster. But the same is true for RAID: adding more disks increases the odds of a disk failure. So: yes, if you use checksums on a single disk, it's more likely to cause problems. But if you manage it right (like ZFS, for example) it's an overall gain. What checksums can do for you, perhaps, is notify you in a reasonably timely fashion if you've already lost data due to storage-subsystem problems. But in a pretty high percentage of cases, that fact would be extremely obvious anyway, because of visible data corruption. Exactly, and that awareness will allow a user to prevent further data loss or corruption. Slow corruption over time is a very much known and accepted real-world case that people do experience, as well as bit flipping enough for someone to write a not-that-old blog post about it: https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1 A really nice property of checksums on pages is that they also tell you what data you *didn't* lose, which can be extremely valuable. Indeed!
Greetings, Torsten
Re: [HACKERS] Checksums by default?
Greetings, * Ants Aasma (ants.aa...@eesti.ee) wrote: > On Tue, Jan 24, 2017 at 4:07 AM, Tom Lane wrote: > > Peter Geoghegan writes: > >> I thought that checksums went in in part because we thought that there > >> was some chance that they'd find bugs in Postgres. > > > > Not really. AFAICS the only point is to catch storage-system malfeasance. > > This matches my understanding. Actual physical media errors are caught > by lower level checksums/error correction codes, and memory errors are > caught by ECC RAM. Not everyone runs with ECC, sadly. > Checksums do very little for PostgreSQL bugs, which > leaves only filesystem and storage firmware bugs. However the latter > are still reasonably common faults. Agreed, but in addition to filesystem and storage firmware bugs, virtualization systems can have bugs as well, and if those bugs hit the kernel's cache (which is actually the more likely case - that's what the VM system is going to think it can monkey with, as long as it works with the kernel) then you can have cases which PG's checksums would likely catch, since we check the checksum when we read from the kernel's read cache, and calculate the checksum before we push the page to the kernel's write cache. > I have seen multiple cases where, > after reviewing the corruption with a hex editor, the only reasonable > conclusion was a bug in the storage system. Data shifted around by > non-page size amounts, non-page aligned extents that are zeroed out, > etc. Right, I've seen similar kinds of things happening in memory of virtualized systems; things like random chunks of memory suddenly being zeroed. > Unfortunately none of those customers had checksums turned on at > the time. I feel that reacting to such errors with a non-cryptic and > easily debuggable checksum error is much better than erroring out with > huge memory allocations, crashing or returning bogus data. Timely > reaction to data corruption is really important for minimizing data > loss. Agreed.
In addition to that, in larger environments where there are multiple databases involved for the explicit purpose of fail-over, a system which is going south because of bad memory or storage could be detected and pulled out, potentially with zero data loss. Of course, to minimize data loss, it'd be extremely important for the fail-over system to identify a checksum error more-or-less immediately and take the bad node out. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Tue, Jan 24, 2017 at 4:07 AM, Tom Lane wrote: > Peter Geoghegan writes: >> I thought that checksums went in in part because we thought that there >> was some chance that they'd find bugs in Postgres. > > Not really. AFAICS the only point is to catch storage-system malfeasance. This matches my understanding. Actual physical media errors are caught by lower level checksums/error correction codes, and memory errors are caught by ECC RAM. Checksums do very little for PostgreSQL bugs, which leaves only filesystem and storage firmware bugs. However the latter are still reasonably common faults. I have seen multiple cases where, after reviewing the corruption with a hex editor, the only reasonable conclusion was a bug in the storage system. Data shifted around by non-page size amounts, non-page aligned extents that are zeroed out, etc. Unfortunately none of those customers had checksums turned on at the time. I feel that reacting to such errors with a non-cryptic and easily debuggable checksum error is much better than erroring out with huge memory allocations, crashing or returning bogus data. Timely reaction to data corruption is really important for minimizing data loss. Regards, Ants Aasma
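The failure modes Ants describes (zeroed extents, data shifted to the wrong block) can be illustrated with a toy verify-on-read sketch. This is not PostgreSQL's actual FNV-based `pg_checksum_page()` algorithm; all names here are invented for illustration. Mixing the block number into the checksum is what lets a page written to the wrong block be caught:

```python
import hashlib

PAGE_SIZE = 8192

def page_checksum(payload: bytes, blkno: int) -> int:
    # Toy stand-in for a page checksum: mix in the block number so a
    # page that lands in the wrong block also fails verification.
    h = hashlib.blake2b(payload, digest_size=2,
                        salt=blkno.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")

def write_page(storage: dict, blkno: int, payload: bytes) -> None:
    assert len(payload) == PAGE_SIZE - 2
    csum = page_checksum(payload, blkno)
    storage[blkno] = csum.to_bytes(2, "little") + payload

def read_page(storage: dict, blkno: int) -> bytes:
    raw = storage[blkno]
    csum = int.from_bytes(raw[:2], "little")
    payload = raw[2:]
    if page_checksum(payload, blkno) != csum:
        # A clear, attributable error instead of crashing on bogus data.
        raise IOError(f"invalid page checksum in block {blkno}")
    return payload

storage = {}
write_page(storage, 0, b"x" * (PAGE_SIZE - 2))
read_page(storage, 0)              # verifies cleanly

# Simulate a storage bug: the extent is silently zeroed out.
storage[0] = b"\x00" * PAGE_SIZE   # read_page(storage, 0) would now raise
```

Note that a real cluster additionally treats an all-zero page as a valid new page, which this sketch does not model.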
Re: [HACKERS] Checksums by default?
On 2017-01-23 21:11:37 -0600, Merlin Moncure wrote: > End user data damage ought to be prevented at all costs IMO. Really, really, really not. We should do a lot, but if that'd be the only priority we'd enable all the expensive as shit stuff and be so slow that there'd be no users. We'd never add new scalability/performance features, because they'll initially have more bugs / increase complexity. Etc. Andres
Re: [HACKERS] Checksums by default?
Merlin Moncure writes: > Hm, but at least in some cases wouldn't it protect people from further > damage? End user data damage ought to be prevented at all costs IMO. Well ... not directly. Disallowing you from accessing busted block A doesn't in itself prevent the same thing from happening to block B. The argument seems to be that checksum failure complaints might prompt users to, say, replace a failing disk drive before it goes dead completely. But I think there's a whole lot of wishful thinking in that, particularly when it comes to the sort of low-information users who would actually be affected by a change in the default checksum setting. regards, tom lane
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 8:07 PM, Tom Lane wrote: > Peter Geoghegan writes: >> I thought that checksums went in in part because we thought that there >> was some chance that they'd find bugs in Postgres. > > Not really. AFAICS the only point is to catch storage-system malfeasance. > > It's barely possible that checksumming would help detect cases where > we'd written data meant for block A into block B, but I don't rate > that as being significantly more probable than bugs in the checksum > code itself. Also, if that case did happen, the checksum code might > "detect" it in some sense, but it would be remarkably unhelpful at > identifying the actual cause. Hm, but at least in some cases wouldn't it protect people from further damage? End user data damage ought to be prevented at all costs IMO. merlin
Re: [HACKERS] Checksums by default?
On 2017-01-23 21:40:53 -0500, Stephen Frost wrote: > Perhaps I'm missing something here, but with checksums enabled, a hint > bit update is going to dirty the page (and we're going to write it into > the WAL and write it out to the heap), no? No. We only WAL log hint bits the first time a page is modified after a checkpoint. It's quite likely that you'll set hint bits in the same checkpoint cycle as the row has been modified last (necessitating the hint bit change). So we can't just pessimize this. I'm a bit confused about the amount of technically wrong arguments in this thread. Greetings, Andres Freund
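Andres's rule can be modeled with a simplified sketch. The names and LSN arithmetic below are invented for illustration (the real decision lives in the buffer-manager hint-bit path and compares the page LSN with the checkpoint redo pointer): with checksums on, a hint-bit-only change must emit a full-page WAL image only if the page has not already been WAL-logged since the last checkpoint, because a torn write of the changed page would otherwise break the checksum.

```python
class Page:
    def __init__(self):
        self.lsn = 0        # LSN of the last WAL record that touched this page

redo_ptr = 100              # start of the current checkpoint cycle
wal = []                    # the WAL stream, as (record_type, lsn) tuples

def wal_log_fpi(page):
    # Emit a full-page image and advance the page's LSN past the redo pointer.
    lsn = redo_ptr + len(wal) + 1
    wal.append(("FPI", lsn))
    page.lsn = lsn

def set_hint_bit(page):
    # With checksums enabled, the FIRST hint-bit change after a checkpoint
    # needs an FPI; later changes in the same cycle are already covered.
    if page.lsn < redo_ptr:
        wal_log_fpi(page)
    # ... actually set the bit; the buffer is now dirty ...

p = Page()
set_hint_bit(p)   # first change this cycle: emits one FPI
set_hint_bit(p)   # page.lsn >= redo_ptr now: no additional WAL
```

This is why a hint-bit set shortly after a data modification usually costs nothing extra: the page was already WAL-logged in the same checkpoint cycle.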
Re: [HACKERS] Checksums by default?
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> But we don't maintain the checksum of a page while it sits in shared >> buffers. Trying to do so would break, eg, concurrent hint-bit updates. > Hence why I said 'clean' pages.. When we write out a page, we copy it into private memory and compute the checksum there, right? We don't copy the checksum back into the page's shared-buffer image, and if we did, that would defeat the point because we would've had to maintain exclusive lock on the buffer or else the checksum might be out of date. So I don't see how this works without throwing away a lot of carefully-designed behavior. regards, tom lane
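The write path Tom describes can be sketched in a few lines. This is a toy model, not the real bufmgr code; `checksum_of` stands in for `pg_checksum_page()` and the algorithm itself is not the point:

```python
import hashlib

def checksum_of(payload) -> bytes:
    # Stand-in for pg_checksum_page(); any deterministic digest works here.
    return hashlib.blake2b(bytes(payload), digest_size=2).digest()

def write_out(shared_page: bytearray, storage: list) -> None:
    # Copy into private memory first: the checksum is stamped only on the
    # copy, so concurrent hint-bit setters (who hold no exclusive lock) can
    # keep scribbling on the shared image while we compute it.
    private = bytearray(shared_page)
    private[0:2] = checksum_of(private[2:])
    storage.append(bytes(private))
    # Deliberately do NOT copy the checksum back into shared_page: it would
    # be stale the moment anyone set another hint bit.

shared = bytearray(8192)   # page image in shared buffers; no valid checksum
disk = []
write_out(shared, disk)
```

The consequence is exactly Tom's point: the shared-buffer image never carries a valid checksum, only the written-out copy does.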
Re: [HACKERS] Checksums by default?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Jim Nasby writes: > > On 1/23/17 7:47 PM, Stephen Frost wrote: > >> It might be interesting to consider checking them in 'clean' pages in > >> shared_buffers in a background process, as that, presumably, *would* > >> detect shared buffers corruption. > > > Hmm... that would be interesting. Assuming the necessary functions are > > exposed it presumably wouldn't be difficult to do that in an extension, > > as a bgworker. > > But we don't maintain the checksum of a page while it sits in shared > buffers. Trying to do so would break, eg, concurrent hint-bit updates. Hence why I said 'clean' pages.. Perhaps I'm missing something here, but with checksums enabled, a hint bit update is going to dirty the page (and we're going to write it into the WAL and write it out to the heap), no? We'd have to accept that checking the checksum on a page would require a read lock on each page as it goes through, I imagine, though we could do something like check if the page is clean, obtain a read lock, then check if it's still clean and if so calculate the checksum and then let the lock go, then at least we're avoiding trying to lock dirty pages. Thanks! Stephen
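The check-lock-recheck protocol Stephen proposes can be sketched as follows. This models only the locking dance, under the (contested, per Tom's reply) assumption that a clean buffer carries a checksum that is still valid; all names are invented:

```python
import threading

class Buffer:
    def __init__(self, page, checksum):
        self.page = page
        self.checksum = checksum       # as computed at the last write-out
        self.dirty = False
        self.lock = threading.Lock()   # stand-in for the buffer content lock

def verify_clean_buffer(buf, compute_checksum):
    """Return True/False for a verified clean buffer, None if skipped."""
    # Cheap unlocked pre-check: never bother locking a dirty buffer.
    if buf.dirty:
        return None
    with buf.lock:
        # Re-check under the lock: the buffer may have been dirtied
        # between the pre-check and lock acquisition.
        if buf.dirty:
            return None
        return compute_checksum(buf.page) == buf.checksum
```

A bgworker would loop over shared buffers calling this, treating `False` as evidence of in-memory corruption and `None` as "try again later".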
Re: [HACKERS] Checksums by default?
On 1/23/17 8:24 PM, Tom Lane wrote: Jim Nasby writes: On 1/23/17 7:47 PM, Stephen Frost wrote: It might be interesting to consider checking them in 'clean' pages in shared_buffers in a background process, as that, presumably, *would* detect shared buffers corruption. Hmm... that would be interesting. Assuming the necessary functions are exposed it presumably wouldn't be difficult to do that in an extension, as a bgworker. But we don't maintain the checksum of a page while it sits in shared buffers. Trying to do so would break, eg, concurrent hint-bit updates. Hrm, I thought the checksum would be valid if the buffer is marked clean? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] Checksums by default?
Jim Nasby writes: > On 1/23/17 7:47 PM, Stephen Frost wrote: >> It might be interesting to consider checking them in 'clean' pages in >> shared_buffers in a background process, as that, presumably, *would* >> detect shared buffers corruption. > Hmm... that would be interesting. Assuming the necessary functions are > exposed it presumably wouldn't be difficult to do that in an extension, > as a bgworker. But we don't maintain the checksum of a page while it sits in shared buffers. Trying to do so would break, eg, concurrent hint-bit updates. regards, tom lane
Re: [HACKERS] Checksums by default?
On 1/23/17 7:47 PM, Stephen Frost wrote: It might be interesting to consider checking them in 'clean' pages in shared_buffers in a background process, as that, presumably, *would* detect shared buffers corruption. Hmm... that would be interesting. Assuming the necessary functions are exposed it presumably wouldn't be difficult to do that in an extension, as a bgworker. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] Checksums by default?
Peter Geoghegan writes: > I thought that checksums went in in part because we thought that there > was some chance that they'd find bugs in Postgres. Not really. AFAICS the only point is to catch storage-system malfeasance. It's barely possible that checksumming would help detect cases where we'd written data meant for block A into block B, but I don't rate that as being significantly more probable than bugs in the checksum code itself. Also, if that case did happen, the checksum code might "detect" it in some sense, but it would be remarkably unhelpful at identifying the actual cause. regards, tom lane
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 6:01 PM, Tom Lane wrote: > Maybe this is a terminology problem. I'm taking "false positive" to mean > "checksum reports a failure, but in fact there is no observable data > corruption". Depending on why the false positive occurred, that might > help alert you to underlying storage problems, but it isn't helping you > with respect to being able to access your perfectly valid data. It was a terminology problem. Thank you for the clarification. -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
Peter Geoghegan writes: > Perhaps I've missed the point entirely, but, I have to ask: How could > there ever be false positives? Bugs. For example, checksum is computed while somebody else is setting a hint bit in the page, so that what is written out is completely valid except that the checksum doesn't match. (I realize that that specific scenario should be impossible given our implementation, but I hope you aren't going to claim that bugs in the checksum code are impossible.) Maybe this is a terminology problem. I'm taking "false positive" to mean "checksum reports a failure, but in fact there is no observable data corruption". Depending on why the false positive occurred, that might help alert you to underlying storage problems, but it isn't helping you with respect to being able to access your perfectly valid data. regards, tom lane
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 5:47 PM, Stephen Frost wrote: >> Perhaps I've missed the point entirely, but, I have to ask: How could >> there ever be false positives? With checksums, false positives are >> simply not allowed. Therefore, there cannot be a false positive, >> unless we define checksums as a mechanism that should only find >> problems that originate somewhere at or below the filesystem. We >> clearly have not done that, so ISTM that checksums could legitimately >> find bugs in the checksum code. I am not being facetious. > > I'm not sure I'm following your question here. A false positive would > be a case where the checksum code throws an error on a page whose > checksum is correct, or where the checksum has failed but nothing is > actually wrong/different on the page. I thought that checksums went in in part because we thought that there was some chance that they'd find bugs in Postgres. I was under the impression that that was at least a secondary goal of checksums. > As for the purpose of checksums, it's exactly to identify cases where > the page has been changed since we wrote it out, due to corruption in > the kernel, filesystem, storage system, etc. As we only check them when > we read in a page and calculate them when we go to write the page out, > they aren't helpful for shared_buffers corruption, generally speaking. I'd have guessed that they might catch a bug in recovery itself, even when the filesystem maintains the guarantees Postgres requires. In any case, it seems exceedingly unlikely that the checksum code itself would fail. -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
On 24/01/17 02:39, Michael Paquier wrote: > On Tue, Jan 24, 2017 at 10:26 AM, Stephen Frost wrote: >> * Tom Lane (t...@sss.pgh.pa.us) wrote: > I don't recall ever seeing a checksum failure on a Heroku Postgres > database, >> >> Not sure how this part of that sentence was missed: >> >> - >> ... even though they were enabled as soon as the feature became >> available. >> - >> >> Which would seem to me to say "the code's been running for a long time >> on a *lot* of systems without throwing a false positive or surfacing a >> bug." > > I am reading that similarly to what Tom is seeing: enabling it has > proved to Heroku that it did not catch problems in years, meaning that > the performance cost induced by enabling it has paid nothing in > practice, except the insurance to catch problems should they > happen. > +1 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Mon, Jan 23, 2017 at 5:26 PM, Stephen Frost wrote: > > Not sure how this part of that sentence was missed: > > > > - > > ... even though they were enabled as soon as the feature became > > available. > > - > > > > Which would seem to me to say "the code's been running for a long time > > on a *lot* of systems without throwing a false positive or surfacing a > > bug." > > I think you've both understood what I said correctly. Note that I > remain neutral on the question of whether or not checksums should be > enabled by default. > > Perhaps I've missed the point entirely, but, I have to ask: How could > there ever be false positives? With checksums, false positives are > simply not allowed. Therefore, there cannot be a false positive, > unless we define checksums as a mechanism that should only find > problems that originate somewhere at or below the filesystem. We > clearly have not done that, so ISTM that checksums could legitimately > find bugs in the checksum code. I am not being facetious. I'm not sure I'm following your question here. A false positive would be a case where the checksum code throws an error on a page whose checksum is correct, or where the checksum has failed but nothing is actually wrong/different on the page. As for the purpose of checksums, it's exactly to identify cases where the page has been changed since we wrote it out, due to corruption in the kernel, filesystem, storage system, etc. As we only check them when we read in a page and calculate them when we go to write the page out, they aren't helpful for shared_buffers corruption, generally speaking. It might be interesting to consider checking them in 'clean' pages in shared_buffers in a background process, as that, presumably, *would* detect shared buffers corruption. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Tue, Jan 24, 2017 at 10:26 AM, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> >> I don't recall ever seeing a checksum failure on a Heroku Postgres >> >> database, > > Not sure how this part of that sentence was missed: > > - > ... even though they were enabled as soon as the feature became > available. > - > > Which would seem to me to say "the code's been running for a long time > on a *lot* of systems without throwing a false positive or surfacing a > bug." I am reading that similarly to what Tom is seeing: enabling it has proved to Heroku that it did not catch problems in years, meaning that the performance cost induced by enabling it has paid nothing in practice, except the insurance to catch problems should they happen. > Given your up-thread concerns that enabling checksums could lead to > false positives and might surface bugs, that's pretty good indication > that those concerns are unfounded. FWIW, I have seen failures that could have been hardware failures, or where enabling checksums may have helped, but they were on pre-9.3 instances. Now there is a performance penalty in enabling them, but it solely depends on the page eviction from shared buffers, which is pretty high for some load patterns I work with; still, even with that, we saw only a 1~2% throughput penalty in measurements. Perhaps not everybody would like to pay this price. FWIW, we are thinking about paying it, as VMs are more sensitive to vmdk-like classes of bugs. Seeing this thread, -hackers likely are fine with the cost induced. -- Michael
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 5:26 PM, Stephen Frost wrote: > Not sure how this part of that sentence was missed: > > - > ... even though they were enabled as soon as the feature became > available. > - > > Which would seem to me to say "the code's been running for a long time > on a *lot* of systems without throwing a false positive or surfacing a > bug." I think you've both understood what I said correctly. Note that I remain neutral on the question of whether or not checksums should be enabled by default. Perhaps I've missed the point entirely, but, I have to ask: How could there ever be false positives? With checksums, false positives are simply not allowed. Therefore, there cannot be a false positive, unless we define checksums as a mechanism that should only find problems that originate somewhere at or below the filesystem. We clearly have not done that, so ISTM that checksums could legitimately find bugs in the checksum code. I am not being facetious. -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
On 1/23/17 7:15 PM, Tom Lane wrote: Uhm, Peter G just said that Heroku enables this on all their databases and have yet to see a false-positive report or an issue with having it enabled. That, plus the reports and evidence we've seen in the past couple days, look like a pretty ringing endorsement for having them. You must have read a different Peter G than I did. What I read was I don't recall ever seeing a checksum failure on a Heroku Postgres database, which did not sound like an endorsement to me. Well, it is pretty good evidence that there are no bugs and that false positives aren't a problem. As I mentioned earlier, my bet is that any significantly large cloud provider has a ton of things going on behind the scenes to prevent oddball (as in non-repeating) errors. When you've got 1M+ servers even small probability bugs can become really big problems. In any case, how can we go about collecting data that checksums help? We certainly know people suffer data corruption. We can only guess at how many of those incidents would be caught by checksums. I don't see how we can get data on that unless we get a lot more users running checksums. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] Checksums by default?
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: > On 1/23/17 6:55 PM, Stephen Frost wrote: > >* Jim Nasby (jim.na...@bluetreble.com) wrote: > >>As others have mentioned, right now practically no one enables this, > >>so we've got zero data on how useful it might actually be. > >Uhm, Peter G just said that Heroku enables this on all their databases > >and have yet to see a false-positive report or an issue with having it > >enabled. > > > >That, plus the reports and evidence we've seen in the past couple days, > >look like a pretty ringing endorsement for having them. > > > >I'll ping the RDS crowd and see if they'll tell me what they're doing > >and what their thoughts are on it. > > Oh, I read the thread as "there's no data to support checksums are > useful", There's been multiple reports on this thread that corruption does happen. Sure, it'd be nice if we had a report of it happening with checksums enabled and where checksums caught it, but I don't see any basis for an argument that they wouldn't ever catch real-world bit-flipping corruption. > IIRC Grant's mentioned in one of his presentations that they enable > checksums, but getting more explicit info would be good. Frankly, my recollection is that they wouldn't use PG until it had page-level checksums, and that they run it on all of their instances, but I'd like to get confirmation of that, if I can, and also hear if they've got examples of the checksums we have catching real issues. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Jim Nasby (jim.na...@bluetreble.com) wrote: > >> As others have mentioned, right now practically no one enables this, > >> so we've got zero data on how useful it might actually be. > > > Uhm, Peter G just said that Heroku enables this on all their databases > > and have yet to see a false-positive report or an issue with having it > > enabled. > > > That, plus the reports and evidence we've seen in the past couple days, > > look like a pretty ringing endorsement for having them. > > You must have read a different Peter G than I did. What I read was > > >> I don't recall ever seeing a checksum failure on a Heroku Postgres > >> database, Not sure how this part of that sentence was missed: - ... even though they were enabled as soon as the feature became available. - Which would seem to me to say "the code's been running for a long time on a *lot* of systems without throwing a false positive or surfacing a bug." Given your up-thread concerns that enabling checksums could lead to false positives and might surface bugs, that's pretty good indication that those concerns are unfounded. In addition, it shows that big hosting providers were anxious to get the feature and enabled it immediately for their users, while we debate if it might be useful for *our* users. I certainly don't believe that Heroku or Amazon have all the right answers for everything, but I do think we should consider that they enabled checksums immediately, along with the other consultants on this thread who have said the same. Lastly, I've already pointed out that there were 2 cases recently reported on IRC of corruption on reasonably modern gear, with a third comment following that up from Merlin. These notions that corruption doesn't happen today, or that we would have heard about it if it had, also look unfounded from my perspective. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On 1/23/17 6:55 PM, Stephen Frost wrote:
> * Jim Nasby (jim.na...@bluetreble.com) wrote:
>> As others have mentioned, right now practically no one enables this,
>> so we've got zero data on how useful it might actually be.
>
> Uhm, Peter G just said that Heroku enables this on all their databases
> and have yet to see a false-positive report or an issue with having it
> enabled.
>
> That, plus the reports and evidence we've seen in the past couple days,
> look like a pretty ringing endorsement for having them.
>
> I'll ping the RDS crowd and see if they'll tell me what they're doing
> and what their thoughts are on it.

Oh, I read the thread as "there's no data to support checksums are useful", not "there's no data to support there's little risk of bugs or false-positives". I certainly agree that Heroku is a good test of both of those.

IIRC Grant's mentioned in one of his presentations that they enable checksums, but getting more explicit info would be good.

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
Stephen Frost writes:
> * Jim Nasby (jim.na...@bluetreble.com) wrote:
>> As others have mentioned, right now practically no one enables this,
>> so we've got zero data on how useful it might actually be.
>
> Uhm, Peter G just said that Heroku enables this on all their databases
> and have yet to see a false-positive report or an issue with having it
> enabled.
>
> That, plus the reports and evidence we've seen in the past couple days,
> look like a pretty ringing endorsement for having them.

You must have read a different Peter G than I did. What I read was

>> I don't recall ever seeing a checksum failure on a Heroku Postgres
>> database,

which did not sound like an endorsement to me.

regards, tom lane
Re: [HACKERS] Checksums by default?
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
> As others have mentioned, right now practically no one enables this,
> so we've got zero data on how useful it might actually be.

Uhm, Peter G just said that Heroku enables this on all their databases and have yet to see a false-positive report or an issue with having it enabled.

That, plus the reports and evidence we've seen in the past couple days, look like a pretty ringing endorsement for having them.

I'll ping the RDS crowd and see if they'll tell me what they're doing and what their thoughts are on it.

Thanks!

Stephen
Re: [HACKERS] Checksums by default?
On 1/23/17 6:14 PM, Peter Geoghegan wrote:
> In practice, Postgres checksums do *not* seem to catch problems.
> That's been my experience, at least.

For someone running on a bunch of AWS hardware that doesn't really surprise me. Presumably, anyone operating at that scale would be quickly overwhelmed if odd hardware errors were even remotely common. (Note that odd errors aren't the same as an outright failure.) Where I'd expect this to help is with anyone running a moderate-sized data center that doesn't have the kind of monitoring resources a cloud provider does.

As for collecting data, I don't really know what more data we can get. We get data corruption reports on a fairly regular basis. I think it's a very safe bet that CRCs would identify somewhere between 20% and 80% of those cases. Maybe that number could be better refined, but that's still going to be guesswork.

As others have mentioned, right now practically no one enables this, so we've got zero data on how useful it might actually be. If the patch to make this a GUC goes through then at least we could tell people that have experienced corruption to enable this. That might provide some data, though the horse is already well out of the barn by then.

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] Checksums by default?
On 1/23/17 1:30 AM, Amit Kapila wrote:
> On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>> That being said, I'm ready to do some benchmarking on this, so that we
>> have at least some numbers to argue about. Can we agree on a set of
>> workloads that we want to benchmark in the first round?
>
> I think if we can get data for pgbench read-write workload when data
> doesn't fit in shared buffers but fits in RAM, that can give us some
> indication. We can try by varying the ratio of shared buffers w.r.t.
> data. This should exercise the checksum code both when buffers are
> evicted and at next read. I think it also makes sense to check the
> WAL data size for each of those runs.

I tried testing this (and thought I sent an email about it, but don't see it now :/). Unfortunately, on my laptop I wasn't getting terribly consistent runs; I was seeing +/- ~8% TPS. Sometimes checksums appeared to add ~10% overhead, but it was hard to tell. If someone has a more stable (as in, dedicated) setup, testing would be useful.

BTW, I ran the test with small (default 128MB) shared_buffers, scale 50 (800MB database), synchronous_commit = off, checkpoint_timeout = 1min, to try and significantly increase the rate of buffers being written out.

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
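Jim's point about noise can be made concrete with a little arithmetic. With +/- 8% run-to-run variation, a ~10% apparent overhead sits inside the overlap of the two noise intervals, so a single pair of runs can't attribute the difference to checksums. The TPS figures below are made-up numbers for illustration only:

```python
# Hypothetical illustration: +/- 8% run-to-run noise vs. a ~10% effect.
base_tps = 1000.0       # hypothetical TPS, checksums off
checksum_tps = 900.0    # hypothetical TPS, checksums on (~10% lower)
noise = 0.08            # observed +/- 8% run-to-run variation

base_low = base_tps * (1 - noise)            # worst plausible "off" run
checksum_high = checksum_tps * (1 + noise)   # best plausible "on" run

# If the intervals overlap, one pair of runs can't distinguish
# checksum overhead from noise.
intervals_overlap = checksum_high > base_low
print(intervals_overlap)
```

This is why averaging many runs (or a dedicated machine with less variance) is needed before the ~10% figure can be trusted.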
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 12:35 PM, Tom Lane wrote:
> Andres Freund writes:
>> Sure, it might be easy, but we don't have it. Personally I think
>> checksums just aren't even ready for prime time. If we had:
>> - ability to switch on/off at runtime (early patches for that have IIRC
>>   been posted)
>> - *builtin* tooling to check checksums for everything
>> - *builtin* tooling to compute checksums after changing setting
>> - configurable background sweeps for checksums
>
> Yeah, and there's a bunch of usability tooling that we don't have,
> centered around "what do you do after you get a checksum error?".
> AFAIK there's no way to check or clear such an error; but without
> such tools, I'm afraid that checksums are as much of a foot-gun
> as a benefit.

I see your point here, but they sure saved my ass with that pl/sh issue. So I'm inclined to lightly disagree; there are good arguments either way.

merlin
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 9:09 AM, Tom Lane wrote:
> Not at all; I just think that it's not clear that they are a net win
> for the average user, and so I'm unconvinced that turning them on by
> default is a good idea. I could be convinced otherwise by suitable
> evidence. What I'm objecting to is turning them on without making
> any effort to collect such evidence.

+1

One insight Jim Gray has in the classic paper "Why Do Computers Stop and What Can Be Done About It?" [1] is that fault-tolerant hardware is table stakes, and so most failures are related to operator error, and to a lesser extent software bugs. The paper is about 30 years old.

I don't recall ever seeing a checksum failure on a Heroku Postgres database, even though they were enabled as soon as the feature became available. I have seen a few corruption problems brought to light by amcheck, though, all of which were due to bugs in software.

Apparently, before I joined Heroku there were real reliability problems with the storage subsystem that Heroku Postgres runs on (it's a pluggable storage service from a popular cloud provider -- the "pluggable" functionality would have made it fairly novel at the time). These problems were something that the Heroku Postgres team dealt with about 6 years ago. However, anecdotal evidence suggests that the reliability of the same storage system *vastly* improved roughly a year or two later. We still occasionally lose drives, but drives seem to fail fast in a fashion that lets us recover without data loss easily.

In practice, Postgres checksums do *not* seem to catch problems. That's been my experience, at least. Obviously every additional check helps, and it may be something we can do without any appreciable downside. I'd like to see a benchmark.
[1] http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf

-- Peter Geoghegan
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 6:57 PM, Tomas Vondra wrote:
> On 01/23/2017 01:40 PM, Amit Kapila wrote:
>> On Mon, Jan 23, 2017 at 3:56 PM, Tomas Vondra wrote:
>>> I'm not sure I understand. Why wouldn't faster storage help? It's only
>>> a matter of generating enough dirty buffers (that get evicted from
>>> shared buffers) to saturate the storage.
>>
>> When the page gets evicted from shared buffers, it is just pushed to
>> the kernel; the real write to disk won't happen until the kernel feels
>> like it. They are written to storage later when a checkpoint occurs.
>> So, now if we have a fast storage subsystem then it can improve the
>> writes from kernel to disk, but not sure how much that can help in
>> improving TPS.
>
> I don't think that's quite true. If the pages are evicted by bgwriter,
> since 9.6 there's a flush every 512kB.

Right, but backend_flush_after is zero by default.

> This will also flush data written by backends, of course. But even
> without the flushing, the OS does not wait with the flush until the
> very last moment - that'd be a huge I/O spike. Instead, the OS will
> write the dirty data to disk after 30 seconds, or after accumulating
> some predefined amount of dirty data.

This is the reason I said it might or might not help. I think there is no point in having too much discussion on this point; if you have access to a fast storage system, then go ahead and perform the tests on it, and if not, then we can also try without that.

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Checksums by default?
On 01/23/2017 01:40 PM, Amit Kapila wrote:
> On Mon, Jan 23, 2017 at 3:56 PM, Tomas Vondra wrote:
>> I'm not sure I understand. Why wouldn't faster storage help? It's only
>> a matter of generating enough dirty buffers (that get evicted from
>> shared buffers) to saturate the storage.
>
> When the page gets evicted from shared buffers, it is just pushed to
> the kernel; the real write to disk won't happen until the kernel feels
> like it. They are written to storage later when a checkpoint occurs.
> So, now if we have a fast storage subsystem then it can improve the
> writes from kernel to disk, but not sure how much that can help in
> improving TPS.

I don't think that's quite true. If the pages are evicted by bgwriter, since 9.6 there's a flush every 512kB. This will also flush data written by backends, of course. But even without the flushing, the OS does not wait with the flush until the very last moment - that'd be a huge I/O spike. Instead, the OS will write the dirty data to disk after 30 seconds, or after accumulating some predefined amount of dirty data.

So the system will generally get into a "stable state" where it writes about the same amount of data to disk on average.

regards

-- Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
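Tomas's "stable state" observation can be sketched with a toy writeback model: pages are dirtied at a constant rate and the kernel flushes any page older than ~30 seconds (the Linux default dirty-expiry behaviour). After the initial 30-second ramp-up, the average write rate converges to the dirty rate, regardless of the flush delay. The rates below are arbitrary, for illustration:

```python
# Toy model of delayed OS writeback: flush pages older than EXPIRE_AFTER
# seconds, and observe that the long-run write rate tracks the dirty rate.
DIRTY_RATE = 100      # pages dirtied per second (hypothetical)
EXPIRE_AFTER = 30     # seconds before the kernel flushes a dirty page
SECONDS = 300

pending = []          # ages of dirty pages not yet written
written = 0

for t in range(SECONDS):
    pending.extend([0] * DIRTY_RATE)           # newly dirtied pages
    pending = [age + 1 for age in pending]     # everything ages one second
    expired = [age for age in pending if age >= EXPIRE_AFTER]
    written += len(expired)                    # kernel flushes expired pages
    pending = [age for age in pending if age < EXPIRE_AFTER]

avg_write_rate = written / SECONDS
print(avg_write_rate)  # approaches DIRTY_RATE as SECONDS grows
```

The flush delay only shifts *when* the I/O happens; it cannot reduce the steady-state volume, which is the reason faster storage mostly just absorbs the same writes sooner.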
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 3:56 PM, Tomas Vondra wrote:
> On 01/23/2017 09:57 AM, Amit Kapila wrote:
>> Yeah, I think that can be helpful with respect to WAL, but for data,
>> if we are considering the case where everything fits in RAM, then
>> faster storage might or might not help.
>
> I'm not sure I understand. Why wouldn't faster storage help? It's only
> a matter of generating enough dirty buffers (that get evicted from
> shared buffers) to saturate the storage.

When the page gets evicted from shared buffers, it is just pushed to the kernel; the real write to disk won't happen until the kernel feels like it. They are written to storage later when a checkpoint occurs. So, now if we have a fast storage subsystem then it can improve the writes from kernel to disk, but not sure how much that can help in improving TPS.

-- With Regards, Amit Kapila
Re: [HACKERS] Checksums by default?
On 01/23/2017 09:57 AM, Amit Kapila wrote:
> On Mon, Jan 23, 2017 at 1:18 PM, Tomas Vondra wrote:
>> Yes, I'm thinking that's pretty much the worst case for OLTP-like
>> workload, because it has to evict buffers from shared buffers,
>> generating a continuous stream of writes. Doing that on good storage
>> (e.g. PCI-e SSD or possibly tmpfs) will further limit the storage
>> overhead, making the time spent computing checksums much more
>> significant. Makes sense?
>
> Yeah, I think that can be helpful with respect to WAL, but for data,
> if we are considering the case where everything fits in RAM, then
> faster storage might or might not help.

I'm not sure I understand. Why wouldn't faster storage help? It's only a matter of generating enough dirty buffers (that get evicted from shared buffers) to saturate the storage. With some storage you'll hit that at 100 MB/s, with PCI-e it might be more like 1 GB/s.

Of course, if the main bottleneck is somewhere else (e.g. hitting 100% CPU utilization before putting any pressure on storage), that's not going to make much difference. Or perhaps I missed something important?

regards

-- Tomas Vondra
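The 100 MB/s vs. 1 GB/s point translates directly into CPU cost: the faster storage drains evicted buffers, the more pages per second must be checksummed. A back-of-the-envelope sketch, where the per-page checksum cost is a purely hypothetical figure (not a measurement of the actual algorithm):

```python
# Fraction of one CPU core spent checksumming evicted 8kB pages,
# as a function of how fast the storage absorbs writes.
PAGE_SIZE = 8192              # bytes per PostgreSQL heap page
CHECKSUM_NS_PER_PAGE = 2000   # hypothetical: 2 microseconds per page

def checksum_cpu_fraction(write_mb_per_s):
    """Fraction of one core spent on checksums at the given write rate."""
    pages_per_s = write_mb_per_s * 1024 * 1024 / PAGE_SIZE
    return pages_per_s * CHECKSUM_NS_PER_PAGE * 1e-9

print(checksum_cpu_fraction(100))    # slow storage
print(checksum_cpu_fraction(1000))   # PCI-e SSD class
```

At 100 MB/s this hypothetical cost is a few percent of one core; at 1 GB/s it is ten times that, which is why fast storage makes the checksum computation "much more significant" in the profile.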
Re: [HACKERS] Checksums by default?
On Mon, Jan 23, 2017 at 1:18 PM, Tomas Vondra wrote:
> On 01/23/2017 08:30 AM, Amit Kapila wrote:
>> I think if we can get data for pgbench read-write workload when data
>> doesn't fit in shared buffers but fits in RAM, that can give us some
>> indication. We can try by varying the ratio of shared buffers w.r.t.
>> data. This should exercise the checksum code both when buffers are
>> evicted and at next read. I think it also makes sense to check the
>> WAL data size for each of those runs.
>
> Yes, I'm thinking that's pretty much the worst case for OLTP-like
> workload, because it has to evict buffers from shared buffers,
> generating a continuous stream of writes. Doing that on good storage
> (e.g. PCI-e SSD or possibly tmpfs) will further limit the storage
> overhead, making the time spent computing checksums much more
> significant. Makes sense?

Yeah, I think that can be helpful with respect to WAL, but for data, if we are considering the case where everything fits in RAM, then faster storage might or might not help.

-- With Regards, Amit Kapila
Re: [HACKERS] Checksums by default?
On 01/23/2017 08:30 AM, Amit Kapila wrote:
> On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>> That being said, I'm ready to do some benchmarking on this, so that we
>> have at least some numbers to argue about. Can we agree on a set of
>> workloads that we want to benchmark in the first round?
>
> I think if we can get data for pgbench read-write workload when data
> doesn't fit in shared buffers but fits in RAM, that can give us some
> indication. We can try by varying the ratio of shared buffers w.r.t.
> data. This should exercise the checksum code both when buffers are
> evicted and at next read. I think it also makes sense to check the
> WAL data size for each of those runs.

Yes, I'm thinking that's pretty much the worst case for an OLTP-like workload, because it has to evict buffers from shared buffers, generating a continuous stream of writes. Doing that on good storage (e.g. PCI-e SSD or possibly tmpfs) will further limit the storage overhead, making the time spent computing checksums much more significant. Makes sense?

What about other types of workload? I think we should not look just at write-heavy workloads - I wonder what the overhead of verifying the checksums is in read-only workloads (again, with data that fits into RAM). What about large data loads simulating OLAP, and exports (e.g. pg_dump)?

That leaves us with 4 workload types, I guess:

1) read-write OLTP (shared buffers < data < RAM)
2) read-only OLTP (shared buffers < data < RAM)
3) large data loads (COPY)
4) large data exports (pg_dump)

Anything else?

The other question is of course hardware - IIRC there are differences between CPUs. I do have a new E5-2620 v4, but perhaps it'd be good to also do some testing on a Power machine, or an older Intel CPU.

regards

-- Tomas Vondra
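The four workload types above pair naturally with standard tools. The sketch below just prints a plausible command matrix rather than running anything; the scale factors, client counts, and database name are illustrative guesses, and the checksum axis corresponds to initializing the cluster with or without initdb's `-k` (`--data-checksums`) switch:

```python
# Dry-run sketch of the proposed benchmark matrix: four workloads,
# each run against a cluster initialized with and without checksums.
workloads = {
    "read-write OLTP":   "pgbench -M prepared -c 16 -j 4 -T 300 bench",
    "read-only OLTP":    "pgbench -M prepared -S -c 16 -j 4 -T 300 bench",
    "bulk load (COPY)":  "psql -c \"\\copy big FROM 'big.csv' (FORMAT csv)\" bench",
    "export (pg_dump)":  "pg_dump -f /dev/null bench",
}

plan = []
for checksums in ("off", "on"):   # initdb vs. initdb -k
    for name, cmd in workloads.items():
        plan.append(f"[checksums {checksums}] {name}: {cmd}")

for line in plan:
    print(line)
```

Recording TPS (or wall-clock time for the load/export cases) plus WAL volume for each of the eight cells would cover both Amit's and Tomas's concerns in one sweep.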
Re: [HACKERS] Checksums by default?
On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
> That being said, I'm ready to do some benchmarking on this, so that we
> have at least some numbers to argue about. Can we agree on a set of
> workloads that we want to benchmark in the first round?

I think if we can get data for pgbench read-write workload when data doesn't fit in shared buffers but fits in RAM, that can give us some indication. We can try by varying the ratio of shared buffers w.r.t. data. This should exercise the checksum code both when buffers are evicted and at next read. I think it also makes sense to check the WAL data size for each of those runs.

-- With Regards, Amit Kapila
Re: [HACKERS] Checksums by default?
> On 21 Jan 2017, at 18:18, Petr Jelinek wrote:
>
> On 21/01/17 11:39, Magnus Hagander wrote:
>> Is it time to enable checksums by default, and give initdb a switch to
>> turn it off instead?

+1

> I'd like to see a benchmark first, both in terms of CPU and in terms of
> produced WAL (= network traffic), given that it turns on logging of hint
> bits.

Here are the results of my testing for 9.4 in December 2014. The benchmark was done on a production-like use case where all the data fits in memory (~20 GB) but doesn't fit into shared_buffers (it was intentionally small - 128 MB on the stress stand), so that shared_blk_read / shared_blk_hit = 1/4. The workload was typical OLTP but with mostly write queries (80% writes, 20% reads). Here are the numbers of WAL segments written during the test:

defaults        263
wal_log_hints   410
checksums       367

I couldn't find the answer to why WAL write amplification is even worse for wal_log_hints than for checksums, but I checked several times and it always reproduced. As for CPU, I couldn't see the overhead [1], and perf top showed me less than 2% in calculating the CRC.

For all new DBs we now enable checksums at initdb time, and several dozen of our shards use checksums now. I don't see any performance difference for them compared with non-checksummed clusters. And we have already had one case when we caught data corruption with checksums.

[1] https://yadi.sk/i/VAiWjv6t3AQCs2?lang=en

-- May the force be with you… https://simply.name
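The reported segment counts can be turned into amplification factors directly (assuming the default 16 MB WAL segment size, so absolute volumes follow too):

```python
# WAL amplification from the segment counts reported above,
# assuming the default 16 MB segment size.
wal_segments = {"defaults": 263, "wal_log_hints": 410, "checksums": 367}

baseline = wal_segments["defaults"]
ratios = {name: count / baseline for name, count in wal_segments.items()}
volumes_mb = {name: count * 16 for name, count in wal_segments.items()}

for name in wal_segments:
    print(f"{name}: {wal_segments[name]} segments "
          f"({volumes_mb[name]} MB), {ratios[name]:.2f}x baseline")
```

So wal_log_hints alone costs roughly 1.56x the baseline WAL volume here, while full checksums cost about 1.40x, which is the counter-intuitive ordering the author notes.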
Re: [HACKERS] Checksums by default?
From: Magnus Hagander
> Is it time to enable checksums by default, and give initdb a switch to
> turn it off instead?
>
> I keep running into situations where people haven't enabled it, because
> (a) they didn't know about it, or (b) their packaging system ran initdb
> for them so they didn't even know they could. And of course they usually
> figure this out once the db has enough data and traffic that the only way
> to fix it is to set up something like slony/bucardo/pglogical and a whole
> new server to deal with it.. (Which is something that would also be good
> to fix -- but having the default changed would be useful as well)

+10

I was wondering why the community had decided to turn it off by default. IIRC, the reason was that the performance overhead was 20-30% when the entire data directory was placed on tmpfs, but that's not as important as having data protection by default.

Regards,
Takayuki Tsunakawa
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 8:16 PM, Ants Aasma wrote: > On Sat, Jan 21, 2017 at 7:39 PM, Petr Jelinek > wrote: > > So in summary "postgresql.conf options are easy to change" while "initdb > > options are hard to change", I can see this argument used both for > > enabling or disabling checksums by default. As I said I would be less > > worried if it was easy to turn off, but we are not there afaik. And even > > then I'd still want benchmark first. > > Adding support for disabling checksums is almost trivial as it only > requires flipping a value in the control file. And I have somewhere > sitting around a similarly simple tool for turning on checksums while > the database is offline. FWIW, based on customers and fellow > conference goers I have talked to most would gladly take the > performance hit, but not the downtime to turn it on on an existing > database. > This is exactly the scenario I've been exposed to over and over again. If it can be turned on/off online, then the default matters a lot less. But it has to be online. Yes, you can set up a replica (which today requires third party products like slony, bucardo or pglogical -- at least we'll hopefully have pglogical fully in 10, but it's still a very expensive way to fix the problem). If we can make it cheap and easy to turn them off, that makes a change of the default a lot cheaper. Did you have a tool for that sitting around as well? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Checksums by default?
On 01/21/2017 04:18 PM, Petr Jelinek wrote:
> On 21/01/17 11:39, Magnus Hagander wrote:
>> Is it time to enable checksums by default, and give initdb a switch to
>> turn it off instead?
>
> I'd like to see benchmark first, both in terms of CPU and in terms of
> produced WAL (=network traffic) given that it turns on logging of hint
> bits.

... and those hint bits may easily trigger full-page writes, resulting in significant write amplification.

regards

-- Tomas Vondra