Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/17/17 11:29, Michael Paquier wrote:
> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Is that an aesthetic 'rather', or is there a technical advantage you
have in mind?

I also began by looking at how to stop AdvanceXLInsertBuffer()
initializing headers and taking locks when neither is needed. But
Heikki's just-rezero-them suggestion has a definite simplicity
advantage: it can be implemented entirely with a tight group of lines
added to CopyXLogRecordToWAL, as opposed to modifying
AdvanceXLInsertBuffer in several distinct places, adding a parameter,
and changing its call sites.

There's a technical appeal to making the changes in
AdvanceXLInsertBuffer (who wants to do unnecessary initialization and
locking?), but the amount of unnecessary work that can be avoided is
proportional to the number of unused pages at switch time, meaning it
is largest when the system is least busy, and may be of little
practical concern.

Moreover, optimizing AdvanceXLInsertBuffer would expose one more
complication: some of the empty pages about to be written out may have
been initialized opportunistically in earlier calls to
AdvanceXLInsertBuffer, so those already have populated headers and
would need re-zeroing anyway. And not necessarily just an
insignificant few of them: if XLOGChooseNumBuffers chose the maximum,
it could even be all of them. That could also be handled by yet
another conditional within AdvanceXLInsertBuffer, but with all of that
in view, it may simply be simpler to have one loop in
CopyXLogRecordToWAL zero them all, and leave AdvanceXLInsertBuffer
alone, so no complexity is added when it is called from other sites
that are arguably hotter. Zeroing SizeOfXLogShortPHD bytes doesn't
cost a whole lot.
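To make the "just re-zero the headers" idea concrete, here is a
minimal sketch in Python rather than the actual C; the names mirror
xlog.c but the constants (XLOG_BLCKSZ, the short-header size) are
illustrative assumptions, not the real build values:

```python
# Hypothetical simulation of re-zeroing page headers in the
# CopyXLogRecordToWAL tail-filling loop. Toy constants: real values
# depend on the build (XLOG_BLCKSZ, SizeOfXLogShortPHD).
XLOG_BLCKSZ = 8192
SIZE_OF_XLOG_SHORT_PHD = 24  # assumed short page header size

def fill_segment_tail(buf: bytearray, curr_pos: int, end_pos: int) -> None:
    """After a log switch, walk the remaining pages and zero the header
    bytes that AdvanceXLInsertBuffer() initialized, leaving every byte
    of the tail zero."""
    while curr_pos < end_pos:
        # The page body is already zero; only the first
        # SizeOfXLogShortPHD bytes hold nonzero header data.
        buf[curr_pos:curr_pos + SIZE_OF_XLOG_SHORT_PHD] = \
            bytes(SIZE_OF_XLOG_SHORT_PHD)
        curr_pos += XLOG_BLCKSZ

# Toy "segment tail": three pages whose headers were initialized.
segment = bytearray(3 * XLOG_BLCKSZ)
for page in range(3):
    start = page * XLOG_BLCKSZ
    segment[start:start + SIZE_OF_XLOG_SHORT_PHD] = \
        b'\x01' * SIZE_OF_XLOG_SHORT_PHD

fill_segment_tail(segment, 0, len(segment))
assert segment == bytearray(len(segment))  # tail is now all zeros
```

The point of the sketch is only that the extra work is one small
memset per empty page, touching SizeOfXLogShortPHD bytes, not the
whole page.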
-Chap

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Thu, Jul 6, 2017 at 3:48 PM, Heikki Linnakangas wrote:
> On 07/03/2017 06:30 PM, Chapman Flack wrote:
>> Although it's moot in the straightforward approach of re-zeroing in
>> the loop, it would still help my understanding of the system to know
>> if there is some subtlety that would have broken what I proposed
>> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
>> would tell it not only to skip initializing headers, but also to
>> skip the WaitXLogInsertionsToFinish() check ... because I have
>> the entire region reserved and I hold all the writer slots
>> at that moment, it seems safe to assure AdvanceXLInsertBuffer()
>> that there are no outstanding writes to wait for.
>
> Yeah, I suppose that would work, too.

FWIW, I would rather see any optimization done in
AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
the WAL page header after its data has been initialized by
AdvanceXLInsertBuffer() once. That's too late for 10, but you still
have time for a patch to be integrated in 11.
--
Michael
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Fri, Jun 23, 2017 at 6:08 AM, Chapman Flack wrote:
> Well, gzip was doing pretty well; it could get a 16 MB segment file down
> to under 27 kB, or less than 14 bytes for each of 2000 pages, when a page
> header is what, 20 bytes, it looks like? I'm not sure how much better
> I'd expect a (non-custom) compression scheme to do. The real difference
> comes between compressing (even well) a large unchanged area, versus being
> able to recognize (again with a non-custom tool) that the whole area is
> unchanged.

Have you tried lz4 as well for your cases? It performs faster than
gzip at minimum compression and compresses less, but I am really
wondering whether it actually performs better on almost-zero pages.
--
Michael
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/03/2017 06:30 PM, Chapman Flack wrote:
> On 07/03/2017 09:39 AM, Heikki Linnakangas wrote:
>> Hmm. That's not the problem, though. Imagine that instead of the loop
>> above, you do just:
>>
>>     WALInsertLockUpdateInsertingAt(CurrPos);
>>     AdvanceXLInsertBuffer(EndPos, false);
>>
>> AdvanceXLInsertBuffer() will call XLogWrite(), to flush out any pages
>> before EndPos, to make room in the wal_buffers for the new pages.
>> Before doing that, it will call WaitXLogInsertionsToFinish()
>
> Although it's moot in the straightforward approach of re-zeroing in
> the loop, it would still help my understanding of the system to know
> if there is some subtlety that would have broken what I proposed
> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
> would tell it not only to skip initializing headers, but also to
> skip the WaitXLogInsertionsToFinish() check ... because I have
> the entire region reserved and I hold all the writer slots
> at that moment, it seems safe to assure AdvanceXLInsertBuffer()
> that there are no outstanding writes to wait for.

Yeah, I suppose that would work, too.

- Heikki
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/03/2017 09:39 AM, Heikki Linnakangas wrote:
> Hmm. That's not the problem, though. Imagine that instead of the loop
> above, you do just:
>
>     WALInsertLockUpdateInsertingAt(CurrPos);
>     AdvanceXLInsertBuffer(EndPos, false);
>
> AdvanceXLInsertBuffer() will call XLogWrite(), to flush out any pages
> before EndPos, to make room in the wal_buffers for the new pages. Before
> doing that, it will call WaitXLogInsertionsToFinish()

Although it's moot in the straightforward approach of re-zeroing in
the loop, it would still help my understanding of the system to know
if there is some subtlety that would have broken what I proposed
earlier, which was an extra flag to AdvanceXLInsertBuffer() that would
tell it not only to skip initializing headers, but also to skip the
WaitXLogInsertionsToFinish() check ... because I have the entire
region reserved and I hold all the writer slots at that moment, it
seems safe to assure AdvanceXLInsertBuffer() that there are no
outstanding writes to wait for.

I suppose it's true there's not much performance to gain; it would
save a few pairs of lock operations per empty page to be written, but
then, the more empty pages there are at the time of a log switch, the
less busy the system is, so the less it matters.

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/03/2017 09:39 AM, Heikki Linnakangas wrote:
> The most straightforward solution would be to just clear each page with
> memset() in the loop. It's a bit wasteful to clear the page again, just
> after AdvanceXLInsertBuffer() has initialized it, but this isn't
> performance-critical.

And in that straightforward approach, I imagine it would suffice to
memset just the length of a (short) page header; the page content is
already zeroed, and there isn't going to be a switch at the very start
of a segment, so a long header won't be encountered ... will it?

-Chap
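The short-vs-long header reasoning above can be sketched as follows;
this is a hedged illustration, and the two header sizes are assumed
values (the real SizeOfXLogShortPHD / SizeOfXLogLongPHD depend on the
struct layout), but the structural point is that only the first page
of a segment carries the long header:

```python
# Hypothetical model: the long header (with extra fields such as
# xlp_sysid) appears only on the first page of a segment; every other
# page gets the short header. Sizes here are assumptions.
XLOG_BLCKSZ = 8192
XLOG_SEG_SIZE = 16 * 1024 * 1024
SHORT_PHD = 24   # assumed short-header size
LONG_PHD = 40    # assumed long-header size

def header_size(page_offset_in_segment: int) -> int:
    """Only the first page of a segment carries the long header."""
    return LONG_PHD if page_offset_in_segment == 0 else SHORT_PHD

# A switch record occupies space somewhere in the segment, so every
# tail page filled out after it sits at a nonzero offset; under this
# model, all of them have short headers.
tail_pages = range(XLOG_BLCKSZ, XLOG_SEG_SIZE, XLOG_BLCKSZ)
assert all(header_size(off) == SHORT_PHD for off in tail_pages)
```

Which is exactly why zeroing SizeOfXLogShortPHD bytes per page would
suffice, if the "will it?" question has the expected answer.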
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 06/26/2017 04:20 AM, Chapman Flack wrote:
> I notice CopyXLogRecordToWAL contains this loop (in the case where
> the record being copied is a switch):
>
>     while (CurrPos < EndPos)
>     {
>         /* initialize the next page (if not initialized already) */
>         WALInsertLockUpdateInsertingAt(CurrPos);
>         AdvanceXLInsertBuffer(CurrPos, false);
>         CurrPos += XLOG_BLCKSZ;
>     }
>
> in which it calls, one page at a time, AdvanceXLInsertBuffer, which
> contains its own loop able to do a sequence of pages. A comment
> explains why:
>
>     /*
>      * We do this one page at a time, to make sure we don't deadlock
>      * against ourselves if wal_buffers < XLOG_SEG_SIZE.
>      */
>
> I want to make sure I understand what the deadlock potential is in
> this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
> before writing any dirty buffer, and we do hold insertion slot locks
> (all of 'em, in the case of a log switch, because that makes
> XlogInsertRecord call WALInsertLockAcquireExclusive instead of just
> WALInsertLockAcquire for other record types). Does not the fact we
> hold all the insertion slots exclude the possibility that any dirty
> buffer (preceding the one we're touching) needs to be checked for
> in-flight insertions?

Hmm. That's not the problem, though. Imagine that instead of the loop
above, you do just:

    WALInsertLockUpdateInsertingAt(CurrPos);
    AdvanceXLInsertBuffer(EndPos, false);

AdvanceXLInsertBuffer() will call XLogWrite(), to flush out any pages
before EndPos, to make room in the wal_buffers for the new pages.
Before doing that, it will call WaitXLogInsertionsToFinish() to wait
for any insertions to those pages to be completed. But the backend
itself is advertising the insertion position CurrPos, and it will
therefore wait for itself, forever.

> I've been thinking along the lines of another parameter to
> AdvanceXLInsertBuffer to indicate when the caller is exactly this
> loop filling out the tail after a log switch (originally, to avoid
> filling in page headers). It now seems to me that, if
> AdvanceXLInsertBuffer has that information, it could also be safe for
> it to skip the WaitXLogInsertionsToFinish in that case. Would that
> eliminate the deadlock potential, and allow the loop in
> CopyXLogRecordToWAL to be replaced with a single call to
> AdvanceXLInsertBuffer and a single WALInsertLockUpdateInsertingAt?
> Or have I overlooked some other subtlety?

The most straightforward solution would be to just clear each page
with memset() in the loop. It's a bit wasteful to clear the page
again, just after AdvanceXLInsertBuffer() has initialized it, but this
isn't performance-critical.

- Heikki
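The self-deadlock Heikki describes can be modeled very roughly as a
wait condition; this is an editorial sketch, not the real locking code,
and the function name and shape are invented for illustration:

```python
# Hypothetical model of the deadlock: XLogWrite() may not evict a page
# until every advertised insertion position has passed it, and one of
# those advertised positions is our own.
XLOG_BLCKSZ = 8192

def would_wait_forever(advertised_pos, flush_upto):
    """WaitXLogInsertionsToFinish(upto) blocks while some backend
    advertises an insertion position earlier than `upto`. If that
    backend is *us*, nobody will ever advance it: deadlock."""
    our_pos = advertised_pos["self"]
    return our_pos < flush_upto  # we would be waiting on ourselves

# One page at a time (the actual loop): we only ask to make room up to
# the page we are about to touch, never past our advertised position.
assert not would_wait_forever({"self": 3 * XLOG_BLCKSZ},
                              flush_upto=3 * XLOG_BLCKSZ)

# Single call for the whole tail: asks for room up to EndPos, beyond
# our advertised CurrPos, so we would wait on ourselves forever.
assert would_wait_forever({"self": 3 * XLOG_BLCKSZ},
                          flush_upto=16 * 1024 * 1024)
```

The model only captures why advancing page by page, updating the
advertised position as we go, sidesteps the wait.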
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 06/25/17 21:20, Chapman Flack wrote:
> I want to make sure I understand what the deadlock potential is
> in this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
> ...
> Does not the fact we hold all the insertion slots exclude the possibility
> that any dirty buffer (preceding the one we're touching) needs to be
> checked for in-flight insertions?

[in the filling-out-the-log-tail case only]

Anyone? Or have I not even achieved 'wrong' yet?

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
I notice CopyXLogRecordToWAL contains this loop (in the case where the
record being copied is a switch):

    while (CurrPos < EndPos)
    {
        /* initialize the next page (if not initialized already) */
        WALInsertLockUpdateInsertingAt(CurrPos);
        AdvanceXLInsertBuffer(CurrPos, false);
        CurrPos += XLOG_BLCKSZ;
    }

in which it calls, one page at a time, AdvanceXLInsertBuffer, which
contains its own loop able to do a sequence of pages. A comment
explains why:

    /*
     * We do this one page at a time, to make sure we don't deadlock
     * against ourselves if wal_buffers < XLOG_SEG_SIZE.
     */

I want to make sure I understand what the deadlock potential is in
this case. AdvanceXLInsertBuffer will call WaitXLogInsertionsToFinish
before writing any dirty buffer, and we do hold insertion slot locks
(all of 'em, in the case of a log switch, because that makes
XlogInsertRecord call WALInsertLockAcquireExclusive instead of just
WALInsertLockAcquire for other record types). Does not the fact we
hold all the insertion slots exclude the possibility that any dirty
buffer (preceding the one we're touching) needs to be checked for
in-flight insertions?

I've been thinking along the lines of another parameter to
AdvanceXLInsertBuffer to indicate when the caller is exactly this loop
filling out the tail after a log switch (originally, to avoid filling
in page headers). It now seems to me that, if AdvanceXLInsertBuffer
has that information, it could also be safe for it to skip the
WaitXLogInsertionsToFinish in that case. Would that eliminate the
deadlock potential, and allow the loop in CopyXLogRecordToWAL to be
replaced with a single call to AdvanceXLInsertBuffer and a single
WALInsertLockUpdateInsertingAt? Or have I overlooked some other
subtlety?

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 06/21/17 04:51, Heikki Linnakangas wrote:
> (I'm cleaning up my inbox, hence the delayed reply)

I had almost completely forgotten ever bringing it up. :)

> When I wrote that code, I don't remember if I realized that we're
> initializing the page headers, or if I thought that it's good enough
> even if we do. I guess I didn't realize it, because a comment would've
> been in order if it was intentional.
>
> So +1 on fixing that, a patch would be welcome.

Ok, that sounds like something I could take a whack at. Overall,
xlog.c is a bit daunting, but this particular detail seems fairly
approachable.

> In the meanwhile, have you tried using a different compression
> program? Something else than gzip might do a better job at the almost
> zero pages.

Well, gzip was doing pretty well; it could get a 16 MB segment file
down to under 27 kB, or less than 14 bytes for each of 2000 pages,
when a page header is what, 20 bytes, it looks like? I'm not sure how
much better I'd expect a (non-custom) compression scheme to do. The
real difference comes between compressing (even well) a large
unchanged area, versus being able to recognize (again with a
non-custom tool) that the whole area is unchanged.

-Chap
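The effect being discussed is easy to reproduce with a toy layout; the
following sketch is not real WAL, just a 16 MB buffer with a small
varying field every XLOG_BLCKSZ bytes standing in for the initialized
page headers:

```python
# Hedged illustration: a fully zero "segment tail" versus one whose
# pages carry a varying 8-byte xlp_pageaddr-like field. The headered
# version is measurably less compressible with gzip.
import gzip
import struct

XLOG_BLCKSZ = 8192
SEG_SIZE = 16 * 1024 * 1024

# Purely zero tail: gzip collapses it to almost nothing.
zeros = bytes(SEG_SIZE)

# Tail whose pages carry initialized headers (modeled as just the
# varying page-address field) -- the current on-disk behavior.
headered = bytearray(SEG_SIZE)
for page in range(SEG_SIZE // XLOG_BLCKSZ):
    start = page * XLOG_BLCKSZ
    headered[start:start + 8] = struct.pack("<Q", start)

z = len(gzip.compress(zeros, 9))
h = len(gzip.compress(bytes(headered), 9))
print(z, h)  # the headered tail compresses worse
assert z < h
```

The exact sizes will differ from the 27 kB figure quoted above, since
real headers contain more than one field, but the direction of the
effect is the same.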
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
(I'm cleaning up my inbox, hence the delayed reply)

On 08/02/2016 10:51 PM, Robert Haas wrote:
> On Tue, Aug 2, 2016 at 2:33 PM, Bruce Momjian wrote:
>> On Tue, Jul 26, 2016 at 05:42:43PM -0400, Chapman Flack wrote:
>>> Even so, I'd be curious whether it would break anything to have
>>> xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero pages
>>> written to fill out a segment. At least until it's felt that
>>> archive_timeout has been so decidedly obsoleted by streaming
>>> replication that it is removed, and the log-tail zeroing code with it.
>>>
>>> That at least would eliminate the risk of anyone else repeating my
>>> astonishment. :) I had read that 9.4 added built-in log-zeroing code,
>>> and my first reaction was "cool! that may make the compression
>>> technique we're using unnecessary, but certainly can't make it worse"
>>> only to discover that it did, by ~ 300x, becoming now 3x *worse* than
>>> plain gzip, which itself is ~ 100x worse than what we had.
>>
>> My guess is that the bytes are there to detect problems where a
>> 512-byte disk sector is zeroed by a disk failure. I don't see us
>> changing that for the use-case you have described.
>
> Is there actually any code that makes such a check? I'm inclined to
> doubt that was the motivation, though admittedly we're both speculating
> about the contents of Heikki's brain, a tricky proposition on a good
> day.

Given that we used to just leave them as garbage, it seems pretty safe
to zero them out now.

It's kind of nice that all the XLOG pages have valid page headers. One
way to think of the WAL switch record is that it's a very large WAL
record that just happens to consume the rest of the WAL segment.
Except that it's not actually represented like that; the xl_tot_len
field of an XLOG switch record does not include the zeroed-out
portion. Instead, there's special handling in the reader code, which
skips to the end of the segment when it sees a switch record. So that
point is moot.

When I wrote that code, I don't remember if I realized that we're
initializing the page headers, or if I thought that it's good enough
even if we do. I guess I didn't realize it, because a comment would've
been in order if it was intentional.

So +1 on fixing that; a patch would be welcome. In the meanwhile, have
you tried using a different compression program? Something other than
gzip might do a better job on the almost-zero pages.

- Heikki
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Tue, Aug 2, 2016 at 2:33 PM, Bruce Momjian wrote:
> On Tue, Jul 26, 2016 at 05:42:43PM -0400, Chapman Flack wrote:
>> Even so, I'd be curious whether it would break anything to have
>> xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero
>> pages written to fill out a segment. At least until it's felt
>> that archive_timeout has been so decidedly obsoleted by streaming
>> replication that it is removed, and the log-tail zeroing code
>> with it.
>>
>> That at least would eliminate the risk of anyone else repeating
>> my astonishment. :) I had read that 9.4 added built-in log-zeroing
>> code, and my first reaction was "cool! that may make the compression
>> technique we're using unnecessary, but certainly can't make it worse"
>> only to discover that it did, by ~ 300x, becoming now 3x *worse* than
>> plain gzip, which itself is ~ 100x worse than what we had.
>
> My guess is that the bytes are there to detect problems where a
> 512-byte disk sector is zeroed by a disk failure. I don't see us
> changing that for the use-case you have described.

Is there actually any code that makes such a check? I'm inclined to
doubt that was the motivation, though admittedly we're both
speculating about the contents of Heikki's brain, a tricky proposition
on a good day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 08/02/2016 02:33 PM, Bruce Momjian wrote:
> My guess is that the bytes are there to detect problems where
> a 512-byte disk sector is zeroed by a disk failure.

Does that seem plausible? (a) There is only one such header for every
16 512-byte disk sectors, so it only affords a 6% chance of detecting
a zeroed sector, and (b) the header contains other non-zero values in
fields other than xlp_pageaddr, so the use of a fixed value for _that
field_ in zeroed tail blocks would not prevent (or even reduce the 6%
probability of) detecting a sector zeroed by a defect.

-Chap
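The arithmetic behind point (a) can be checked in one line, assuming
the usual build defaults (XLOG_BLCKSZ of 8 kB, 512-byte sectors):

```python
# Quick check of the 6%-detection-chance figure in (a) above.
XLOG_BLCKSZ = 8192  # assumed default page size
SECTOR = 512

sectors_per_page = XLOG_BLCKSZ // SECTOR
detection_chance = 1 / sectors_per_page

assert sectors_per_page == 16
assert round(detection_chance * 100) == 6  # "a 6% chance" (6.25% exactly)
```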
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Tue, Jul 26, 2016 at 05:42:43PM -0400, Chapman Flack wrote:
> Even so, I'd be curious whether it would break anything to have
> xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero
> pages written to fill out a segment. At least until it's felt
> that archive_timeout has been so decidedly obsoleted by streaming
> replication that it is removed, and the log-tail zeroing code
> with it.
>
> That at least would eliminate the risk of anyone else repeating
> my astonishment. :) I had read that 9.4 added built-in log-zeroing
> code, and my first reaction was "cool! that may make the compression
> technique we're using unnecessary, but certainly can't make it worse"
> only to discover that it did, by ~ 300x, becoming now 3x *worse* than
> plain gzip, which itself is ~ 100x worse than what we had.

My guess is that the bytes are there to detect problems where a
512-byte disk sector is zeroed by a disk failure. I don't see us
changing that for the use-case you have described.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                 Ancient Roman grave inscription     +
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/26/16 20:01, Michael Paquier wrote:
> On Tue, Jul 26, 2016 at 9:48 PM, Amit Kapila wrote:
>> Does anybody else see the use case reported by Chapman important
>> enough that we try to have some solution for it in-core?
>
> The lack of updates in the pg_lesslog project is a sign that it is not
> that much used. It does not seem a good idea to bring in-core a tool
> not used that much by users.

Effectively, it already was brought in-core, in commit 9a20a9b. Only,
that change had an unintended consequence that *limits*
compressibility - and it would not have that consequence if it were
changed to simply set xlp_pageaddr to InvalidXLogRecPtr in the dummy
zero pages written to fill out a segment.

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Tue, Jul 26, 2016 at 9:48 PM, Amit Kapila wrote:
> Does anybody else see the use case reported by Chapman important
> enough that we try to have some solution for it in-core?

The lack of updates in the pg_lesslog project is a sign that it is not
that much used. It does not seem a good idea to bring in-core a tool
not much used by users.
--
Michael
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/26/2016 04:21 PM, Robert Haas wrote:
> I'm kind of curious WHY you are using archiving and forcing regular
> segment switches rather than just using streaming replication.
> ... AFAIK, streaming replication
> essentially obsoleted that use case. You can just dribble the
> individual bytes over the wire a few at a time to the standby or, with
> pg_receivexlog, to an archive location. If it takes 6 months to fill
> up a WAL segment, you don't care: you'll always have all the bytes

Part of it is just the legacy situation: at the moment, the offsite
host is of a different architecture and hasn't got PostgreSQL
installed (but it's easily ssh'd to for delivering compressed WAL
segments). We could change that down the road, and pg_receivexlog
would work for getting the bytes over there. My focus for the moment
was just on migrating a cluster to 9.5 without changing the
surrounding arrangements all at once. Seeing how much worse our
compression ratio will be, though, maybe I need to revisit that plan.

Even so, I'd be curious whether it would break anything to have
xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero pages
written to fill out a segment. At least until it's felt that
archive_timeout has been so decidedly obsoleted by streaming
replication that it is removed, and the log-tail zeroing code with it.

That at least would eliminate the risk of anyone else repeating my
astonishment. :) I had read that 9.4 added built-in log-zeroing code,
and my first reaction was "cool! that may make the compression
technique we're using unnecessary, but certainly can't make it worse"
only to discover that it did, by ~ 300x, becoming now 3x *worse* than
plain gzip, which itself is ~ 100x worse than what we had.

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Fri, Jul 22, 2016 at 6:02 PM, Chapman Flack wrote:
> At $work, we have a usually-low-activity PG database, so that almost
> always the used fraction of each 16 MB WAL segment is far smaller
> than 16 MB, and so it's a big win for archived-WAL storage space
> if an archive-command can be written that compresses those files
> effectively.

I'm kind of curious WHY you are using archiving and forcing regular
segment switches rather than just using streaming replication.
Pre-9.0, use of archive_timeout was routine, since there was no other
way to ensure that the data ended up someplace other than your primary
with reasonable regularity. But, AFAIK, streaming replication
essentially obsoleted that use case. You can just dribble the
individual bytes over the wire a few at a time to the standby or, with
pg_receivexlog, to an archive location. If it takes 6 months to fill
up a WAL segment, you don't care: you'll always have all the bytes
that were generated more than a fraction of a second before the master
melted into a heap of slag.

I'm not saying you don't have a good reason for doing what you are
doing, just that I cannot think of one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/26/2016 08:48 AM, Amit Kapila wrote:
> general, if you have a very low WAL activity, then the final size of
> compressed WAL shouldn't be much even if you use gzip. It seems your

9.5 pg_xlog, low-activity test cluster (segment switches forced only
by checkpoint timeouts), compressed with gzip -9:

    $ for i in 0*; do echo -n "$i " && gzip -9 <$i | wc -c; done
    000100010042 27072
    000100010043 27075
    000100010044 27077
    000100010045 27073
    000100010046 27075

Log from live pre-9.4 cluster, low-activity time of day, delta
compression using rsync:

    2016-07-26 03:54:02 EDT (walship) INFO: using 2.39s user, 0.4s system,
    9.11s on wall: 231 byte 000100460029_000100460021_fwd ...
    2016-07-26 04:54:01 EDT (walship) INFO: using 2.47s user, 0.4s system,
    8.43s on wall: 232 byte 00010046002A_000100460022_fwd ...
    2016-07-26 05:54:02 EDT (walship) INFO: using 2.56s user, 0.29s system,
    9.44s on wall: 230 byte 00010046002B_000100460023_fwd

So when I say "factor of 100", I'm understating slightly. (Those
timings, for the curious, include sending a copy offsite via ssh.)

> everything zero. Now, it might be possible to selectively initialize
> the fields that doesn't harm the methodology for archive you are using
> considering there is no other impact of same in code. However, it

Indeed, it is only the one header field that duplicates the low-order
part of the (hex) file name that breaks delta compression, because it
has always been incremented even when nothing else is different, and
it's scattered 2000 times through the file. Would it break anything
for *that* to be zero in dummy blocks?

-Chap
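Why the one varying header field defeats delta compression can be shown
with a toy model; this is a hedged sketch (an 8-byte xlp_pageaddr-like
field once per page, not real WAL), but it captures that two
otherwise-identical segment tails differ on every single page:

```python
# Two "segment tails" identical except for the per-page address field:
# a byte-level delta finds a small change on every page (~2000 scattered
# diffs) instead of one unchanged 16 MB region.
import struct

XLOG_BLCKSZ = 8192
SEG_SIZE = 16 * 1024 * 1024
PAGES = SEG_SIZE // XLOG_BLCKSZ  # 2048 pages (the "2000" in the thread)

def tail_with_headers(segment_start: int) -> bytes:
    buf = bytearray(SEG_SIZE)
    for page in range(PAGES):
        addr = segment_start + page * XLOG_BLCKSZ  # xlp_pageaddr-like
        start = page * XLOG_BLCKSZ
        buf[start:start + 8] = struct.pack("<Q", addr)
    return bytes(buf)

a = tail_with_headers(0 * SEG_SIZE)
b = tail_with_headers(1 * SEG_SIZE)

# Count the pages on which the two segments differ at all.
diff_pages = sum(
    a[p * XLOG_BLCKSZ:(p + 1) * XLOG_BLCKSZ] !=
    b[p * XLOG_BLCKSZ:(p + 1) * XLOG_BLCKSZ]
    for p in range(PAGES)
)
assert diff_pages == PAGES  # every page differs -> no delta savings
```

With the field zeroed in dummy blocks, the two tails would be
byte-identical and an rsync-style delta would be nearly free.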
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Tue, Jul 26, 2016 at 9:02 AM, Chapman Flack wrote:
> On 07/25/16 22:09, Michael Paquier wrote:
>> This is over-complicating things for little gain. The new behavior of
>> filling in with zeros the tail of a segment makes things far better
>> when using gzip in archive_command.
>
> Then how about filling with actual zeros, instead of with mostly-zeros
> as is currently done? That would work just as well for gzip, and would
> not sacrifice the ability to do 100x better than gzip.

There is a flag XLP_BKP_REMOVABLE for the purpose of ignoring empty
blocks; any external tool(s) relying on it could break if we make
everything zero. Now, it might be possible to selectively initialize
only the fields that don't harm the archive methodology you are using,
considering there is no other impact of doing so in the code. However,
that doesn't look like a neat way to implement the requirement.

In general, if you have very low WAL activity, then the final size of
the compressed WAL shouldn't be much even if you use gzip. It seems
your main concern is that the size of the WAL, though not high, is
more than what you were getting earlier for your archived data. I
think that is a legitimate concern, but I don't see many options apart
from providing some selective way to not initialize everything in WAL
page headers, or having some tool like pg_lesslog that can be shipped
as a contrib module. I am not sure whether your use case is important
enough to proceed with one of those options, or whether to consider
some other approach. Does anybody else see the use case reported by
Chapman as important enough that we try to have some solution for it
in-core?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/25/16 22:09, Michael Paquier wrote:
> This is over-complicating things for little gain. The new behavior of
> filling in with zeros the tail of a segment makes things far better
> when using gzip in archive_command.

Then how about filling with actual zeros, instead of with mostly-zeros
as is currently done? That would work just as well for gzip, and would
not sacrifice the ability to do 100x better than gzip.

-Chap
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Mon, Jul 25, 2016 at 11:21 PM, Chapman Flack wrote:
> The impression that leaves is of tools that relied too heavily
> on internal format knowledge to be viable outside of core, which
> have had at least periods of incompatibility with newer PG versions,
> and whose current status, if indeed any are current, isn't easy
> to find out.

The WAL format has gone through a lot of changes in 9.4 as well. 9.3
introduced xlogreader.c, which is what *any* client trying to read WAL
into an understandable format should use.

> And that, I assume, was also the motivation to put the zeroing
> in AdvanceXLInsertBuffer, eliminating the need for one narrow,
> specialized tool like pg{_clear,_compress,less}log{,tail}, so
> the job can be done with ubiquitous, bog standard (and therefore
> *very* exhaustively tested) tools like gzip.

Exactly, and honestly this has been a huge win in making such segments
more compressible.

> Even so, it still seems to me that a cheaper solution is a %e
> substitution in archive_command: just *tell* the command where
> the valid bytes end. Accomplishes the same thing as ~ 16 MB
> of otherwise-unnecessary I/O at the time of archiving each
> lightly-used segment.
>
> Then the actual zeroing could be suppressed to save I/O, maybe
> with a GUC variable, or maybe just when archive_command is seen
> to contain a %e. Commands that don't have a %e continue to work
> and compress effectively because of the zeroing.

This is over-complicating things for little gain. The new behavior of
filling in with zeros the tail of a segment makes things far better
when using gzip in archive_command.
--
Michael
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On 07/23/2016 08:25 AM, Amit Kapila wrote:
> On Sat, Jul 23, 2016 at 3:32 AM, Chapman Flack wrote:
>>
>> Would it then be possible to go back to the old behavior (or make
>> it selectable) of not overwriting the full 16 MB every time?
>>
>
> I don't see going back to the old behaviour as an improvement, because,
> as you pointed out above, it helps to improve the compression
> ratio of WAL files for tools like gzip, and it doesn't seem advisable
> to lose that capability. I think providing an option to select that
> behaviour could be one choice, but the use case seems narrow to me
> considering there are tools (pglesslog) to clear the tail. Do you
> find any problems with that tool which make you think that it is not
> reliable?

It was a year or so ago when I was surveying tools that attempted to do that. I had found pg_clearxlogtail, and I'm sure I also found pglesslog / pg_compresslog ... my notes from then simply refer to "contrib efforts like pg_clearxlogtail" and observed either a dearth of recent search results for them, or a predominance of results of the form "how do I get this to compile for PG x.x?"

pg_compresslog is mentioned in a section, Compressed Archive Logs, of the PG 9.1 manual:

https://www.postgresql.org/docs/9.1/static/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS

That section is absent from the docs in any version > 9.1.

The impression that leaves is of tools that relied too heavily on internal format knowledge to be viable outside of core, which have had at least periods of incompatibility with newer PG versions, and whose current status, if indeed any are current, isn't easy to find out.

It seems a bit risky (to me, anyway) to base a backup strategy on having a tool in the pipeline that depends so heavily on internal format knowledge, can become uncompilable between PG releases, and isn't part of core and officially supported.
And that, I assume, was also the motivation to put the zeroing in AdvanceXLInsertBuffer, eliminating the need for one narrow, specialized tool like pg{_clear,_compress,less}log{,tail}, so the job can be done with ubiquitous, bog standard (and therefore *very* exhaustively tested) tools like gzip.

So it's just kind of unfortunate that there used to be a *further* factor of 100 (nothing to sneeze at) possible using rsync (another non-PG-specific, ubiquitous, exhaustively tested tool), but a trivial feature of the new behavior has broken that. Factors of 100 are enough to change the sorts of things you think about, like possibly retaining years-long unbroken histories of transactions in WAL.

What would happen if the overwriting of the log tail were really done with just zeros, as the commit message implied, rather than zeros with initialized headers? Could the log-reading code handle that gracefully? That would support all forms of non-PG-specific, ubiquitous tools used for compression; it would not break the rsync approach.

Even so, it still seems to me that a cheaper solution is a %e substitution in archive_command: just *tell* the command where the valid bytes end. That accomplishes the same thing as ~16 MB of otherwise-unnecessary I/O at the time of archiving each lightly-used segment.

Then the actual zeroing could be suppressed to save I/O, maybe with a GUC variable, or maybe just when archive_command is seen to contain a %e. Commands that don't have a %e would continue to work and compress effectively because of the zeroing.

-Chap
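P.S. For concreteness, a %e-aware archive_command might look something like this. This is purely hypothetical: no %e substitution exists today, and the destination path is illustrative.

```
# Hypothetical postgresql.conf entry: %e would expand to the byte offset
# just past the XLOG_SWITCH record, so only the valid prefix of the
# 16 MB segment is read and compressed. %p and %f are the existing
# path/filename substitutions.
archive_command = 'head -c %e %p | gzip > /mnt/server/archivedir/%f.gz'
```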
Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
On Sat, Jul 23, 2016 at 3:32 AM, Chapman Flack wrote:
>
> Would it then be possible to go back to the old behavior (or make
> it selectable) of not overwriting the full 16 MB every time?
>

I don't see going back to the old behaviour as an improvement, because, as you pointed out above, it helps to improve the compression ratio of WAL files for tools like gzip, and it doesn't seem advisable to lose that capability. I think providing an option to select that behaviour could be one choice, but the use case seems narrow to me considering there are tools (pglesslog) to clear the tail. Do you find any problems with that tool which make you think that it is not reliable?

> Or did the 9.4 changes also change enough other logic that stuff
> would now break if that isn't done?
>

The changes related to this seem to be isolated (mainly in CopyXLogRecordToWAL()) and don't look to impact other parts of the system, although some more analysis is needed to confirm that; but the point of making it optional doesn't seem convincing to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Teaser: a change made in 9.4 to simplify WAL segment compression made it easier to compress a low-activity-period WAL segment from 16 MB to about 27 kB ... but much harder to do better than that, as I was previously doing (about two orders of magnitude better).

At $work, we have a usually-low-activity PG database, so that almost always the used fraction of each 16 MB WAL segment is far smaller than 16 MB, and so it's a big win for archived-WAL storage space if an archive_command can be written that compresses those files effectively. Our database was also running on a pre-9.4 version, and I'm currently migrating to 9.5.3. As I understand it, 9.4 was where commit 9a20a9b landed, which changed what happens in the unwritten 'tail' of log segments.

In my understanding, before 9.4, the 'tail' of any log segment on disk just wasn't written, and so (as segment recycling simply involves renaming a file that held some earlier segment), the remaining content was simply whatever had been there before recycling. That was never a problem for recovery (which could tell when it reached the end of real data), but was not well compressible with a generic tool like gzip. Specialized tools like pg_clearxlogtail existed, but had to know too much about the internal format, and ended up unmaintained and therefore difficult to trust.

The change in 9.4 included this, from the commit message:

    This has one user-visible change: switching to a new WAL segment with
    pg_switch_xlog() now fills the remaining unused portion of the segment
    with zeros.

... thus making the segments easily compressible with bog standard tools. So I can just point gzip at one of our WAL segments from a light-activity period and it goes from 16 MB down to about 27 kB. Nice, right?

But why does it break my earlier approach, which was doing about two orders of magnitude better, getting low-activity WAL segments down to 200 to 300 *bytes*? (Seriously: my last solid year of archived WAL is contained in a 613 MB zip file.)
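(For reference, the gzip side of that comparison is just the stock continuous-archiving recipe; the destination path here is illustrative:)

```
# Compress each completed segment as it is archived; with the 9.4+
# zero-filled tail, light-activity segments shrink from 16 MB to ~27 kB.
archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz'
```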
That approach was based on using rsync (also bog standard) to tease apart the changed and unchanged bits of the newly-archived segment and the last-seen content of the file with the same i-number. You would expect that to work just as well when the tail is always zeros as it was working before, right?

What's breaking it now is the tiny bit of fine print that's in the code comment for AdvanceXLInsertBuffer but not in the commit message above:

    * ... Any new pages are initialized to zeros, with pages headers
    * initialized properly.

That innocuous "headers initialized" means that the tail of the file is *almost* all zeros, but every 8 kB there is a tiny header, and in each tiny header there is *one byte* that differs from its value in the pre-recycle content at the same i-node, because that one byte in each header reflects the WAL segment number.

Before the 9.4 change, I see there were still headers there, and they did contain a byte matching the segment number, but in the unwritten portion it of course matched the pre-recycle segment number, and rsync easily detected the whole unchanged tail of the file. Now there is one changed byte every 8 kB, and the rsync output, instead of being 100x better than vanilla gzip, is about 3x worse.

Taking a step back, isn't overwriting the whole unused tail of each 16 MB segment really just an I/O-intensive way of communicating to the archive_command where the valid data ends? Could that not be done more efficiently by adding another code, say %e, in archive_command, that would be substituted by the offset of the end of the XLOG_SWITCH record? That way, however archive_command is implemented, it could simply know how much of the file to copy.

Would it then be possible to go back to the old behavior (or make it selectable) of not overwriting the full 16 MB every time? Or did the 9.4 changes also change enough other logic that stuff would now break if that isn't done?
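To make the arithmetic concrete, here is a small stand-alone sketch (mine, not PostgreSQL code; the 0x41/0x42 marker bytes are just stand-ins for the header byte that encodes the segment number) counting how many bytes differ between a recycled segment's pre-recycle tail and its freshly "zeroed" tail:

```python
# Model a 16 MB WAL segment as 8 kB pages. In the unused tail,
# everything is zero except one page-header byte per page that
# reflects the segment number; recycling the file for the next
# segment changes that byte on every page.
SEG_SIZE = 16 * 1024 * 1024
PAGE_SIZE = 8192

old_tail = bytearray(SEG_SIZE)
new_tail = bytearray(SEG_SIZE)
for off in range(0, SEG_SIZE, PAGE_SIZE):
    old_tail[off] = 0x41  # stand-in: header byte under the old segment number
    new_tail[off] = 0x42  # stand-in: same byte after the segment switch

changed = sum(a != b for a, b in zip(old_tail, new_tail))
print(changed)  # one changed byte per 8 kB page: 2048 in all
```

The point is that with one changed byte in *every* 8 kB block, a block-based delta tool like rsync can never match a whole block against the previous contents, so the delta balloons even though 99.99% of the bytes are identical.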
-Chap