Re: [HACKERS] WAL format changes break the suppression of do-nothing checkpoints.
On 03/31/2015 09:09 PM, Jeff Janes wrote: On Tue, Mar 31, 2015 at 6:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote: On 03/30/2015 09:01 PM, Jeff Janes wrote: commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs saying If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed, presumably by accident. It seems that this part is no longer true when it should be true: if (curInsert == ControlFile-checkPoint + MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint) is now 96, but the amount by which curInsert gets advanced is still 104, like it was before the commit. Hmm. Wasn't this a bit broken before too, when the checkpoint record crosses a page boundary? But once the next one it was written, it wouldn't across a boundary anymore. So you would get one extra checkpoint, rather than an infinite stream of them. So a bit broken, but just a bit. Yeah. It's not serious. It occurs to me that that's actually nice from a robustness point of view: if something goes badly wrong and corrupt, you're more likely to be able to recover the last checkpoint record if it doesn't cross a page boundary. And more importantly, a segment boundary. OTOH, when you create a spurious checkpoint, you lose the previous checkpoint location. A good compromise is probably to not skip the checkpoint if the last one crossed a segment boundary. (Committed that way) Instead of trying to calculate where the checkpoint record ends, I think we could check that the prev-pointer points to the last checkpoint record. Wouldn't doing that involve reading WAL while not in recovery? No. We keep track of the beginning position of the last inserted record at all times, because when the next record is inserted, we have to put that in the xl_prev field. Committed a patch to do that. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes break the suppression of do-nothing checkpoints.
On 03/30/2015 09:01 PM, Jeff Janes wrote: commit 2c03216d831160bedd72d45f7 has invalidated the part of the docs saying If no WAL has been written since the previous checkpoint, new checkpoints will be skipped even if checkpoint_timeout has passed, presumably by accident. It seems that this part is no longer true when it should be true: if (curInsert == ControlFile-checkPoint + MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint)) MAXALIGN(SizeOfXLogRecord + sizeof(CheckPoint) is now 96, but the amount by which curInsert gets advanced is still 104, like it was before the commit. Hmm. Wasn't this a bit broken before too, when the checkpoint record crosses a page boundary? Instead of trying to calculate where the checkpoint record ends, I think we could check that the prev-pointer points to the last checkpoint record. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On fre, 2012-06-15 at 00:01 +0300, Heikki Linnakangas wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. A thought on this. There were some concerns that this would silently break tools that pretend to have detailed knowledge of WAL file numbering and this previous behavior of skipping the FF files. We could address this by fixing the overall file naming from something like 000108D000FD 000108D000FE 000108D000FF 000108D1 to 000108D0FD00 000108D0FE00 000108D0FF00 000108D1 which represents the new true WAL stream numbering as opposed to the old two-part numbering. Thus, any tool that thinks it knows how the WAL files are sequenced will break very obviously, but any tool that just looks for 24 hexadecimal digits will be fine. I wonder if any tools in the former category would also break if one changes XLOG_SEG_SIZE. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Thu, Jun 14, 2012 at 10:01 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: This has the advantage that you can calculate the CRC for all the other fields before acquiring WALInsertLock. For xl_prev, you need to know where exactly the record is inserted, so it's handy that it's the last field before CRC. It may be late to mention this but fwiw you don't need to reorder the fields to do this. CRC has the property that you can easily adjust it for any changes to the data covered by it. Regardless of where the xl_prev link is you can calculate the CRC as if xl_prev is 0 and then once you get the lock add in the correct xl_prev. This is an argument in favour of using CRC over other checksums for which that would be hard or impossible. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Ok, committed all the WAL format changes now. This breaks pg_resetxlog -l at all. When I ran pg_resetxlog -l 0x01,0x01,0x01 in the HEAD, I got the following error message though the same command successfully completed in 9.1. pg_resetxlog: invalid argument for option -l Try pg_resetxlog --help for more information. I think the attached patch needs to be applied. Regards, -- Fujii Masao resetxlog_bugfix_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Ok, committed all the WAL format changes now. I found the typo. In walsender.c -reply.write.xlogid, reply.write.xrecoff, -reply.flush.xlogid, reply.flush.xrecoff, -reply.apply.xlogid, reply.apply.xrecoff); +(uint32) (reply.write 32), (uint32) reply.write, +(uint32) (reply.flush 32), (uint32) reply.flush, +(uint32) (reply.apply 32), (uint32) reply.apply); should be . The attached patch fixes this typo. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Tue, Jun 26, 2012 at 2:53 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jun 25, 2012 at 1:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Ok, committed all the WAL format changes now. I found the typo. In walsender.c - reply.write.xlogid, reply.write.xrecoff, - reply.flush.xlogid, reply.flush.xrecoff, - reply.apply.xlogid, reply.apply.xrecoff); + (uint32) (reply.write 32), (uint32) reply.write, + (uint32) (reply.flush 32), (uint32) reply.flush, + (uint32) (reply.apply 32), (uint32) reply.apply); should be . The attached patch fixes this typo. Oh, I forgot to attach the patch.. Here is the patch. Regards, -- Fujii Masao walsender_typo_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masao masao.fu...@gmail.com wrote: should be . The attached patch fixes this typo. Oh, I forgot to attach the patch.. Here is the patch. I committed both of the patches you posted to this thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 25.06.2012 21:01, Robert Haas wrote: On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masaomasao.fu...@gmail.com wrote: should be. The attached patch fixes this typo. Oh, I forgot to attach the patch.. Here is the patch. I committed both of the patches you posted to this thread. Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL file name, instead of comma-separated tli, xlogid, segno arguments. The latter is a bit meaningless now that we don't use the xlogid+segno combination anywhere else. Alvaro pointed out that pg_upgrade was broken by the change in pg_resetxlog -n output - I changed that too to print the First log segment after reset information as a WAL file name, instead of logid+segno. Another option would be to print the 64-bit segment number, but I think that's worse, because the 64-bit segment number is harder to associate with a physical WAL file. So I think we should change pg_resetxlog -l option to take a WAL file name as argument, and fix pg_upgrade accordingly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
Excerpts from Heikki Linnakangas's message of lun jun 25 20:09:34 -0400 2012: On 25.06.2012 21:01, Robert Haas wrote: On Mon, Jun 25, 2012 at 1:57 PM, Fujii Masaomasao.fu...@gmail.com wrote: should be. The attached patch fixes this typo. Oh, I forgot to attach the patch.. Here is the patch. I committed both of the patches you posted to this thread. Thanks Robert. I was thinking that pg_resetxlog -l would accept a WAL file name, instead of comma-separated tli, xlogid, segno arguments. The latter is a bit meaningless now that we don't use the xlogid+segno combination anywhere else. Alvaro pointed out that pg_upgrade was broken by the change in pg_resetxlog -n output - I changed that too to print the First log segment after reset information as a WAL file name, instead of logid+segno. Another option would be to print the 64-bit segment number, but I think that's worse, because the 64-bit segment number is harder to associate with a physical WAL file. So I think we should change pg_resetxlog -l option to take a WAL file name as argument, and fix pg_upgrade accordingly. The only thing pg_upgrade does with the tli/logid/segno combo, AFAICT, is pass it back to pg_resetxlog -l, so this plan seems reasonable. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: So I think we should change pg_resetxlog -l option to take a WAL file name as argument, and fix pg_upgrade accordingly. Seems reasonable I guess. It's really specifying a starting WAL location, but only to file granularity, so treating the argument as a file name is sort of a type cheat but seems convenient. If we do it that way, we'd better validate that the argument is a legal WAL file name, so as to catch any cases where somebody tries to do it old-style. BTW, does pg_resetxlog's logic for setting the default -l value (from scanning pg_xlog to find the largest existing file name) still work? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: So I think we should change pg_resetxlog -l option to take a WAL file name as argument, and fix pg_upgrade accordingly. Seems reasonable I guess. It's really specifying a starting WAL location, but only to file granularity, so treating the argument as a file name is sort of a type cheat but seems convenient. If we do it that way, we'd better validate that the argument is a legal WAL file name, so as to catch any cases where somebody tries to do it old-style. BTW, does pg_resetxlog's logic for setting the default -l value (from scanning pg_xlog to find the largest existing file name) still work? It finds the segment number for largest existing file name from pg_xlog and then compare it with input provided by the user for -l Option, if input is greater it will use the input to set in control file. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
Ok, committed all the WAL format changes now. On 19.06.2012 18:57, Robert Haas wrote: Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. That would make it easier to write a client that works across different architectures, in= 9.3. The contents of the WAL would of course be architecture-dependent, but it would be nice if pg_receivexlog and similar tools could nevertheless be architecture-independent. I share Andres' question about how we're doing this already. I think if we're going to break this, I'd rather do it in 9.3 than 5 years from now. At this point it's just a minor annoyance, but it'll probably get worse as people write more tools that understand WAL. I didn't touch the replication protocol yet, but I think we should do it some time during 9.3. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 24 June 2012 17:24, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Ok, committed all the WAL format changes now. Nice! -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Tue, Jun 19, 2012 at 5:57 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. +1. The LSNs on pages are still stored in the old format, to avoid changing the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control file and WAL are changed, however, so an initdb (or at least pg_resetxlog) is required. Seems fine. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. That would make it easier to write a client that works across different architectures, in = 9.3. The contents of the WAL would of course be architecture-dependent, but it would be nice if pg_receivexlog and similar tools could nevertheless be architecture-independent. I share Andres' question about how we're doing this already. I think if we're going to break this, I'd rather do it in 9.3 than 5 years from now. At this point it's just a minor annoyance, but it'll probably get worse as people write more tools that understand WAL. If we are looking at breaking it, and we are especially concerned about something like pg_receivexlog... Is it something we could/should change in the protocl *now* for 9.2, to make it non-broken in any released version? As in, can we extract just the protocol change and backpatch that to 9.2beta? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Wed, Jun 20, 2012 at 8:19 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jun 19, 2012 at 5:57 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. +1. The LSNs on pages are still stored in the old format, to avoid changing the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control file and WAL are changed, however, so an initdb (or at least pg_resetxlog) is required. Seems fine. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. That would make it easier to write a client that works across different architectures, in = 9.3. The contents of the WAL would of course be architecture-dependent, but it would be nice if pg_receivexlog and similar tools could nevertheless be architecture-independent. I share Andres' question about how we're doing this already. I think if we're going to break this, I'd rather do it in 9.3 than 5 years from now. At this point it's just a minor annoyance, but it'll probably get worse as people write more tools that understand WAL. If we are looking at breaking it, and we are especially concerned about something like pg_receivexlog... Is it something we could/should change in the protocl *now* for 9.2, to make it non-broken in any released version? As in, can we extract just the protocol change and backpatch that to 9.2beta? pg_receivexlog in 9.2 cannot handle correctly the WAL location FF (which was skipped in 9.2 or before). For example, pg_receivexlog calls XLByteAdvance() which always skips FF. So even if we change the protocol, ISTM pg_receivexlog in 9.2 cannot work well with the server in 9.3 which might send FF. No? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 20.06.2012 20:43, Fujii Masao wrote: On Wed, Jun 20, 2012 at 8:19 PM, Magnus Hagandermag...@hagander.net wrote: On Tue, Jun 19, 2012 at 5:57 PM, Robert Haasrobertmh...@gmail.com wrote: On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. +1. The LSNs on pages are still stored in the old format, to avoid changing the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control file and WAL are changed, however, so an initdb (or at least pg_resetxlog) is required. Seems fine. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. That would make it easier to write a client that works across different architectures, in= 9.3. The contents of the WAL would of course be architecture-dependent, but it would be nice if pg_receivexlog and similar tools could nevertheless be architecture-independent. I share Andres' question about how we're doing this already. I think if we're going to break this, I'd rather do it in 9.3 than 5 years from now. At this point it's just a minor annoyance, but it'll probably get worse as people write more tools that understand WAL. If we are looking at breaking it, and we are especially concerned about something like pg_receivexlog... Is it something we could/should change in the protocl *now* for 9.2, to make it non-broken in any released version? As in, can we extract just the protocol change and backpatch that to 9.2beta? pg_receivexlog in 9.2 cannot handle correctly the WAL location FF (which was skipped in 9.2 or before). For example, pg_receivexlog calls XLByteAdvance() which always skips FF. So even if we change the protocol, ISTM pg_receivexlog in 9.2 cannot work well with the server in 9.3 which might send FF. No? Yeah, you can't use pg_receivexlog from 9.2 against a 9.3 server. We can't really promise compatibility when using an older client against a newer server, but we can try to be backwards-compatible in the other direction. I'm thinking of using a 9.3 pg_receivexlog against a 9.2 server. But I guess Robert is right and we shouldn't worry about backwards-compatibility at this point. Instead, let's try to get the protocol right, so that we can more easily provide backwards-compatibility in the future. Like, using a 9.4 pg_receivexlog against a 9.3 server. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote: On 18.06.2012 21:08, Heikki Linnakangas wrote: On 18.06.2012 21:00, Robert Haas wrote: On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. I wondered that, too. There may be a good reason for keeping it split up that way, but we at least oughta think about it a bit. The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure if having two representations which just have a constant factor inbetween really makes sense. The LSNs on pages are still stored in the old format, to avoid changing the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control file and WAL are changed, however, so an initdb (or at least pg_resetxlog) is required. Sounds sensible. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. The replication protocol uses pq_sendint for integers which should take care of converting to big endian already. I don't think anything but the wal itself is otherwise transported in a binary fashion? So I don't think there is any such architecture dependency in the protocol currently? I don't really see a point in keeping around a backward-compatible representation just for the sake of running such tools on multiple versions. I might not be pragmatic enough, but: Why would you want to do that *at the moment*? Many of the other tools are already version specific, so... When the protocol starts to be used by more tools, maybe, but imo were not there yet. But then its not hard to convert to the old representation for that. I kept the %X/%X representation in error messages etc. I'm quite used to that output, so reluctant to change it, although it's a bit silly now that it represents just 64-bit value. Using UINT64_FORMAT would also make the messages harder to translate. No opinion on that. Its easier to see for me whether two values are exactly the same or very similar with the 64bit representation but its harder to gauge bigger differences. So ... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. +1. The LSNs on pages are still stored in the old format, to avoid changing the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control file and WAL are changed, however, so an initdb (or at least pg_resetxlog) is required. Seems fine. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. That would make it easier to write a client that works across different architectures, in = 9.3. The contents of the WAL would of course be architecture-dependent, but it would be nice if pg_receivexlog and similar tools could nevertheless be architecture-independent. I share Andres' question about how we're doing this already. I think if we're going to break this, I'd rather do it in 9.3 than 5 years from now. At this point it's just a minor annoyance, but it'll probably get worse as people write more tools that understand WAL. I kept the %X/%X representation in error messages etc. I'm quite used to that output, so reluctant to change it, although it's a bit silly now that it represents just 64-bit value. Using UINT64_FORMAT would also make the messages harder to translate. I could go either way on this one, but I have no problem with the way you did it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 19.06.2012 18:46, Andres Freund wrote: On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure if having two representations which just have a constant factor inbetween really makes sense. I wasn't planning to, it didn't even occur to me that we might be able to get rid of XLogSegNo to be honest. There's places that deal whole segments, rather than with specific byte positions in the WAL, so I think XLogSegNo makes more sense in that context. Take XLogArchiveNotifySeg(), for example. It notifies the archiver that a given segment is ready for archiving, so we pass an XLogSegNo to identify that segment as an argument. I suppose we could pass an XLogRecPtr that points to the beginning of the segment instead, but it doesn't really feel like an improvement to me. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. The replication protocol uses pq_sendint for integers which should take care of converting to big endian already. I don't think anything but the wal itself is otherwise transported in a binary fashion? So I don't think there is any such architecture dependency in the protocol currently? We only use pg_sendint() for the few values exchanged in the handshake before we start replicating, but once we begin, we just send structs around. For example, in ProcessStandbyReplyMessage(): static void ProcessStandbyReplyMessage(void) { StandbyReplyMessage reply; pq_copymsgbytes(reply_message, (char *) reply, sizeof(StandbyReplyMessage)); ... After that, we just the fields in the reply struct like in any other struct. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
Hi, On Wednesday, June 20, 2012 12:24:54 AM Heikki Linnakangas wrote: On 19.06.2012 18:46, Andres Freund wrote: On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote: Well, that was easier than I thought. Attached is a patch to make XLogRecPtr a uint64, on top of my other WAL format patches. I think we should go ahead with this. Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure if having two representations which just have a constant factor inbetween really makes sense. I wasn't planning to, it didn't even occur to me that we might be able to get rid of XLogSegNo to be honest. There's places that deal whole segments, rather than with specific byte positions in the WAL, so I think XLogSegNo makes more sense in that context. Take XLogArchiveNotifySeg(), for example. It notifies the archiver that a given segment is ready for archiving, so we pass an XLogSegNo to identify that segment as an argument. I suppose we could pass an XLogRecPtr that points to the beginning of the segment instead, but it doesn't really feel like an improvement to me. I am not sure its a win either, was just wondering because they now are that similar. Should we keep the old representation in the replication protocol messages? That would make it simpler to write a client that works with different server versions (like pg_receivexlog). Or, while we're at it, perhaps we should mandate network-byte order for all the integer and XLogRecPtr fields in the replication protocol. The replication protocol uses pq_sendint for integers which should take care of converting to big endian already. I don't think anything but the wal itself is otherwise transported in a binary fashion? So I don't think there is any such architecture dependency in the protocol currently? We only use pg_sendint() for the few values exchanged in the handshake before we start replicating, but once we begin, we just send structs around. For example, in ProcessStandbyReplyMessage(): static void ProcessStandbyReplyMessage(void) { StandbyReplyMessage reply; pq_copymsgbytes(reply_message, (char *) reply, sizeof(StandbyReplyMessage)); ... After that, we just the fields in the reply struct like in any other struct. Yes, forgot that, true. I guess the best fix would be to actually send normal messages instead of CopyData ones? Much more to type though... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Thu, Jun 14, 2012 at 5:58 PM, Andres Freund and...@2ndquadrant.com wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. I wondered that, too. There may be a good reason for keeping it split up that way, but we at least oughta think about it a bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 18.06.2012 21:00, Robert Haas wrote: On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. I wondered that, too. There may be a good reason for keeping it split up that way, but we at least oughta think about it a bit. The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote: On 18.06.2012 21:00, Robert Haas wrote: On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. I wondered that, too. There may be a good reason for keeping it split up that way, but we at least oughta think about it a bit. The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. I am out of my depth here, not having read any of the relevant code, but couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without marking the page dirty? There is the valid argument that you would loose some information when pages with hint bits are written out again, but on the other hand you would also gain the information that it was a hint-bit write... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Mon, Jun 18, 2012 at 2:08 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 18.06.2012 21:00, Robert Haas wrote: On Thu, Jun 14, 2012 at 5:58 PM, Andres Freundand...@2ndquadrant.com wrote: 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. I wondered that, too. There may be a good reason for keeping it split up that way, but we at least oughta think about it a bit. The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. Ugh. Well, that's a good reason for thinking twice before changing it, if not abandoning the idea altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 18.06.2012 21:13, Andres Freund wrote: On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote: The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. I am out of my depth here, not having read any of the relevant code, but couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without marking the page dirty? There is the valid argument that you would loose some information when pages with hint bits are written out again, but on the other hand you would also gain the information that it was a hint-bit write... Sorry, I don't understand that. Where would you replace the LSN from disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) ) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote: On 18.06.2012 21:13, Andres Freund wrote: On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote: The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. I am out of my depth here, not having read any of the relevant code, but couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without marking the page dirty? There is the valid argument that you would loose some information when pages with hint bits are written out again, but on the other hand you would also gain the information that it was a hint-bit write... Sorry, I don't understand that. Where would you replace the LSN from disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) ) In ReadBuffer_common or such, after reading a page from disk and verifying that the page has a valid header it seems to be possible to replace pd_lsn *in memory* by InvalidXLogRecPtr without marking the page dirty. If the page isn't modified the lsn on disk won't be changed so you don't loose debugging information in that case. If will be zero if it has been written by a hint-bit write and thats arguable a win. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On 18.06.2012 21:45, Andres Freund wrote: On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote: On 18.06.2012 21:13, Andres Freund wrote: On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote: The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. I am out of my depth here, not having read any of the relevant code, but couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without marking the page dirty? There is the valid argument that you would loose some information when pages with hint bits are written out again, but on the other hand you would also gain the information that it was a hint-bit write... Sorry, I don't understand that. Where would you replace the LSN from disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) ) In ReadBuffer_common or such, after reading a page from disk and verifying that the page has a valid header it seems to be possible to replace pd_lsn *in memory* by InvalidXLogRecPtr without marking the page dirty. If the page isn't modified the lsn on disk won't be changed so you don't loose debugging information in that case. If will be zero if it has been written by a hint-bit write and thats arguable a win. We use the LSN to decide whether a full-page image need to be xlogged if the page is modified. If you reset LSN every time you read in a page, you'll be doing full page writes every time a page is read from disk and modified, whether or not it's the first time the page is modified after the last checkpoint. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Monday, June 18, 2012 09:19:48 PM Heikki Linnakangas wrote: On 18.06.2012 21:45, Andres Freund wrote: On Monday, June 18, 2012 08:32:54 PM Heikki Linnakangas wrote: On 18.06.2012 21:13, Andres Freund wrote: On Monday, June 18, 2012 08:08:14 PM Heikki Linnakangas wrote: The page header contains an XLogRecPtr (LSN), so if we change it we'll have to deal with pg_upgrade. I guess we could still keep XLogRecPtr around as the on-disk representation, and convert between the 64-bit integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out - many xlog calculations would admittedly be simpler if it was an uint64. I am out of my depth here, not having read any of the relevant code, but couldn't we simply replace the lsn from disk with InvalidXLogRecPtr without marking the page dirty? There is the valid argument that you would loose some information when pages with hint bits are written out again, but on the other hand you would also gain the information that it was a hint-bit write... Sorry, I don't understand that. Where would you replace the LSN from disk with InvalidXLogRecPtr ? (and no, it probably won't work ;-) ) In ReadBuffer_common or such, after reading a page from disk and verifying that the page has a valid header it seems to be possible to replace pd_lsn *in memory* by InvalidXLogRecPtr without marking the page dirty. If the page isn't modified the lsn on disk won't be changed so you don't loose debugging information in that case. If will be zero if it has been written by a hint-bit write and thats arguable a win. We use the LSN to decide whether a full-page image need to be xlogged if the page is modified. If you reset LSN every time you read in a page, you'll be doing full page writes every time a page is read from disk and modified, whether or not it's the first time the page is modified after the last checkpoint. Yea, I somehow disregarded that hint-bit writes would make a problem with full page writes in that case. Normal writes wouldn't be a problem... This should be fixable but the result would be too ugly. So its either converting the on-disk representation or keeping the duplicated representation. pd_lsn seems to be well enough abstracted, doing the conversion in PageSet/GetLSN seems to be easy enough. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote: As I threatened earlier (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.co m), here are three patches that change the WAL format. The goal is to change the format so that when you're inserting a WAL record of a given size, you know exactly how much space it requires in the WAL. I fear the patches need rebasing after the pgindent run... Even before that (60801944fa105252b48ea5688d47dfc05c695042) it only applies with offsets? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format changes
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote: As I threatened earlier (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.co m), here are three patches that change the WAL format. The goal is to change the format so that when you're inserting a WAL record of a given size, you know exactly how much space it requires in the WAL. 1. Use a 64-bit segment number, instead of the log/seg combination. And don't waste the last segment on each logical 4 GB log file. The concept of a logical log file is now completely gone. XLogRecPtr is unchanged, but it should now be understood as a plain 64-bit value, just split into two 32-bit integers for historical reasons. On disk, this means that there will be log files ending in FF, those were skipped before. Whats the reason for keeping that awkward split now? There aren't that many users of xlogid/xcrecoff and many of those would be better served by using helper macros. API compatibility isn't a great argument either as code manually playing around with those needs to be checked anyway. I think there might be some code around that does XLogRecPtr addition manuall and such. 2. Always include the xl_rem_len field, used for continuation records, in the xlog page header. A continuation log record only contained that one field, it's now included straight in the page header, so the concept of a continuation record doesn't exist anymore. Because of alignment, this wastes 4 bytes on every page that contains continued data from a previous record, and 8 bytes on pages that don't. That's not very much, and the next step will buy that back: 3. Allow WAL record header to be split across pages. Per Tom's suggestion, move xl_tot_len to be the first field in XLogRecord, so that even if the header is split, xl_tot_len is always on the first page. xl_crc is moved to be the last field, and xl_prev is the second to last. This has the advantage that you can calculate the CRC for all the other fields before acquiring WALInsertLock. For xl_prev, you need to know where exactly the record is inserted, so it's handy that it's the last field before CRC. This patch doesn't try to take advantage of that, however, and I'm not sure if that makes any difference once I finish the patch to make XLogInsert scale better, which is the ultimate goal of all this. Those are the three patches I'd like to get committed in this commitfest. To see where all this is leading to, I've included a rough WIP version of the XLogInsert scaling patch. This version is quite different from the one I posted in spring, it takes advantage of the WAL format changes, and I'm also experimenting with a different method of tracking how far each WAL insertion has progressed. But more on that later. (Note to self: remember to bump XLOG_PAGE_MAGIC) Will review. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers