Re: [HACKERS] Reducing size of WAL record headers
On 1/10/13 6:14 PM, Simon Riggs wrote:
> On 10 January 2013 20:13, Tom Lane t...@sss.pgh.pa.us wrote:
>> Bruce Momjian br...@momjian.us writes:
>>> On Wed, Jan 9, 2013 at 05:06:49PM -0500, Tom Lane wrote:
>>>> Let's wait till we see where the logical rep stuff ends up before we worry about saving 4 bytes per WAL record.
>>>
>>> Well, we have wal_level to control the amount of WAL traffic.
>>
>> That's entirely irrelevant. The point here is that we'll need more bits to identify what any particular record is, unless we make a decision that we'll have physically separate streams for logical replication info, which doesn't sound terribly attractive; and in any case no such decision has been made yet, AFAIK.
>
> You were right to say that this is less important than logical replication. I don't need any more reason than that to stop talking about it.
>
> I have a patch for this, but as yet no way to submit it while at the same time saying put this at the back of the queue.

Anything ever come of this?

--
Jim C. Nasby, Data Architect j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing size of WAL record headers
On Wed, Jan 9, 2013 at 05:06:49PM -0500, Tom Lane wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> Overall, the WAL record is MAXALIGN'd, so with 8 byte alignment we waste 4 bytes per record. Or put another way, if we could reduce record header by 4 bytes, we would actually reduce it by 8 bytes per record. So looking for ways to do that seems like a good idea.
>
> I think this is extremely premature, in view of the ongoing discussions about shoehorning logical replication and other kinds of data into the WAL stream. It seems quite likely that we'll end up eating some of that padding space to support those features. So whacking a lot of code around in service of squeezing the existing padding out could very easily end up being wasted work, in fact counterproductive if it degrades either code readability or robustness.
>
> Let's wait till we see where the logical rep stuff ends up before we worry about saving 4 bytes per WAL record.

Well, we have wal_level to control the amount of WAL traffic. It is hard to imagine we are going to want to ship logical WAL information by default, so most people will not be using logical WAL and would see a benefit from an optimized WAL stream? What percentage is 8-bytes in a typical WAL record?

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] Reducing size of WAL record headers
Bruce Momjian br...@momjian.us writes:
> On Wed, Jan 9, 2013 at 05:06:49PM -0500, Tom Lane wrote:
>> Let's wait till we see where the logical rep stuff ends up before we worry about saving 4 bytes per WAL record.
>
> Well, we have wal_level to control the amount of WAL traffic.

That's entirely irrelevant. The point here is that we'll need more bits to identify what any particular record is, unless we make a decision that we'll have physically separate streams for logical replication info, which doesn't sound terribly attractive; and in any case no such decision has been made yet, AFAIK.

regards, tom lane
Re: [HACKERS] Reducing size of WAL record headers
On 10 January 2013 20:13, Tom Lane t...@sss.pgh.pa.us wrote:
> Bruce Momjian br...@momjian.us writes:
>> On Wed, Jan 9, 2013 at 05:06:49PM -0500, Tom Lane wrote:
>>> Let's wait till we see where the logical rep stuff ends up before we worry about saving 4 bytes per WAL record.
>>
>> Well, we have wal_level to control the amount of WAL traffic.
>
> That's entirely irrelevant. The point here is that we'll need more bits to identify what any particular record is, unless we make a decision that we'll have physically separate streams for logical replication info, which doesn't sound terribly attractive; and in any case no such decision has been made yet, AFAIK.

You were right to say that this is less important than logical replication. I don't need any more reason than that to stop talking about it.

I have a patch for this, but as yet no way to submit it while at the same time saying put this at the back of the queue.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Reducing size of WAL record headers
On 09.01.2013 22:36, Simon Riggs wrote:
> Overall, the WAL record is MAXALIGN'd, so with 8 byte alignment we waste 4 bytes per record. Or put another way, if we could reduce record header by 4 bytes, we would actually reduce it by 8 bytes per record. So looking for ways to do that seems like a good idea.

Agreed.

> The WAL record header starts with xl_tot_len, a 4 byte field. There is also another field, xl_len. The difference is that xl_tot_len includes the header, xl_len and any backup blocks. Since the header is fixed, the only time xl_tot_len != SizeOfXLogRecord + xl_len is when we have backup blocks.
>
> We can re-arrange the record layout so that we remove xl_tot_len and add another (maxaligned) 4 byte field (--> 8 bytes) after the record header, xl_bkpblock_len that only exists if we have backup blocks. This will then save 8 bytes from every record that doesn't have backup blocks, and be the same as now with backup blocks.

Here's a better idea: Let's keep xl_tot_len as it is, but move xl_len to the very end of the WAL record, after all the backup blocks. If there are no backup blocks, xl_len is omitted. Seems more robust to keep xl_tot_len, so that you require less math to figure out where one record ends and where the next one begins.

> Forcing the XLogRecord header to be all on one page makes the format more robust and simplifies the code that copes with header wrapping.

-1 on that. That would essentially revert the changes I made earlier. The purpose of allowing the header to be wrapped was that you could easily calculate ahead of time exactly how much space a WAL record takes. My motivation for that was the XLogInsert scaling patch. Now, I admit I haven't had a chance to work further on that patch, so we're not gaining much from the format change at the moment. Nevertheless, I don't want us to get back to the situation that you sometimes need to add padding to the end of a WAL page.

My suggestion above to keep xl_tot_len and remove xl_len from XLogRecord doesn't have a problem with crossing page boundaries.

- Heikki
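The alignment arithmetic behind "reduce the header by 4 bytes, save 8" can be sketched as follows. This is a standalone illustration, not the PostgreSQL macro itself: `sketch_maxalign` and `SKETCH_MAXIMUM_ALIGNOF` are hypothetical names, and the 28-byte unpadded header size is an assumption chosen to match the "we waste 4 bytes per record" statement under 8-byte alignment.

```c
#include <stddef.h>

/* Assumption: maximum alignment is 8 bytes, the case discussed in the thread. */
#define SKETCH_MAXIMUM_ALIGNOF 8

/*
 * Round len up to the next multiple of SKETCH_MAXIMUM_ALIGNOF.
 * Works because the alignment is a power of two.
 */
size_t
sketch_maxalign(size_t len)
{
    return (len + (SKETCH_MAXIMUM_ALIGNOF - 1)) &
           ~((size_t) (SKETCH_MAXIMUM_ALIGNOF - 1));
}

/* Number of padding bytes the rounding adds to a header of len bytes. */
size_t
sketch_padding(size_t len)
{
    return sketch_maxalign(len) - len;
}
```

With an assumed 28-byte header, `sketch_maxalign(28)` is 32, so 4 bytes are padding. Shrinking the header to 24 bytes needs no padding at all, so the aligned size drops from 32 to 24: a 4-byte reduction in content removes 8 bytes from every record.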
Re: [HACKERS] Reducing size of WAL record headers
On Wed, Jan 9, 2013 at 10:54:33PM +0200, Heikki Linnakangas wrote:
> On 09.01.2013 22:36, Simon Riggs wrote:
>> Overall, the WAL record is MAXALIGN'd, so with 8 byte alignment we waste 4 bytes per record. Or put another way, if we could reduce record header by 4 bytes, we would actually reduce it by 8 bytes per record. So looking for ways to do that seems like a good idea.
>
> Agreed.
>
>> The WAL record header starts with xl_tot_len, a 4 byte field. There is also another field, xl_len. The difference is that xl_tot_len includes the header, xl_len and any backup blocks. Since the header is fixed, the only time xl_tot_len != SizeOfXLogRecord + xl_len is when we have backup blocks.
>>
>> We can re-arrange the record layout so that we remove xl_tot_len and add another (maxaligned) 4 byte field (--> 8 bytes) after the record header, xl_bkpblock_len that only exists if we have backup blocks. This will then save 8 bytes from every record that doesn't have backup blocks, and be the same as now with backup blocks.
>
> Here's a better idea: Let's keep xl_tot_len as it is, but move xl_len to the very end of the WAL record, after all the backup blocks. If there are no backup blocks, xl_len is omitted. Seems more robust to keep xl_tot_len, so that you require less math to figure out where one record ends and where the next one begins.

OK, crazy idea, but can we just record xl_len as a difference against xl_tot_len, and shorten the xl_len field?

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
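The relationship between xl_tot_len and xl_len described above, and the "store only the difference" idea, can be sketched as below. This is a simplified illustration: the function names are hypothetical, the 32-byte header size is an assumption, and any alignment padding inside the record body is ignored.

```c
#include <stdint.h>

/* Assumption: fixed, already-aligned header size; the real value may differ. */
#define SKETCH_SIZE_OF_XLOG_RECORD 32u

/* Total record length = fixed header + rmgr data + all backup blocks. */
uint32_t
sketch_total_len(uint32_t xl_len, uint32_t bkp_len)
{
    return SKETCH_SIZE_OF_XLOG_RECORD + xl_len + bkp_len;
}

/*
 * If the header keeps xl_tot_len and stores only the difference
 * (which works out to the backup block length), xl_len can be recovered.
 */
uint32_t
sketch_recover_xl_len(uint32_t xl_tot_len, uint32_t bkp_len)
{
    return xl_tot_len - SKETCH_SIZE_OF_XLOG_RECORD - bkp_len;
}
```

When there are no backup blocks the stored difference is zero, which is the common case the thread wants to make cheaper.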
Re: [HACKERS] Reducing size of WAL record headers
On 09.01.2013 22:59, Bruce Momjian wrote:
> On Wed, Jan 9, 2013 at 10:54:33PM +0200, Heikki Linnakangas wrote:
>> On 09.01.2013 22:36, Simon Riggs wrote:
>>> The WAL record header starts with xl_tot_len, a 4 byte field. There is also another field, xl_len. The difference is that xl_tot_len includes the header, xl_len and any backup blocks. Since the header is fixed, the only time xl_tot_len != SizeOfXLogRecord + xl_len is when we have backup blocks.
>>>
>>> We can re-arrange the record layout so that we remove xl_tot_len and add another (maxaligned) 4 byte field (--> 8 bytes) after the record header, xl_bkpblock_len that only exists if we have backup blocks. This will then save 8 bytes from every record that doesn't have backup blocks, and be the same as now with backup blocks.
>>
>> Here's a better idea: Let's keep xl_tot_len as it is, but move xl_len to the very end of the WAL record, after all the backup blocks. If there are no backup blocks, xl_len is omitted. Seems more robust to keep xl_tot_len, so that you require less math to figure out where one record ends and where the next one begins.
>
> OK, crazy idea, but can we just record xl_len as a difference against xl_tot_len, and shorten the xl_len field?

Hmm, so it would essentially be the length of all the backup blocks. Perhaps rename it to xl_bkpblk_len.

However, that would cap the total size of backup blocks to 64k. Which would not be enough with 32k BLCKSZ.

- Heikki
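The 64k cap can be checked with a little arithmetic. The sketch below is an illustration only: `sketch_bkp_len_fits_uint16` is a hypothetical helper, the number of backup blocks per record is left as a parameter rather than taken from the source tree, and per-block headers are ignored (they would only make the total larger).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if nblocks full page images of blcksz bytes each could be
 * described by a 16-bit length field (maximum value 65535).
 */
bool
sketch_bkp_len_fits_uint16(uint32_t blcksz, uint32_t nblocks)
{
    uint64_t total = (uint64_t) blcksz * nblocks;

    return total <= UINT16_MAX;
}
```

Two full 32k page images already total 65536 bytes, one more than a 16-bit field can hold, which is the overflow being pointed out here.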
Re: [HACKERS] Reducing size of WAL record headers
On 9 January 2013 21:02, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>> OK, crazy idea, but can we just record xl_len as a difference against xl_tot_len, and shorten the xl_len field?
>
> Hmm, so it would essentially be the length of all the backup blocks. Perhaps rename it to xl_bkpblk_len.
>
> However, that would cap the total size of backup blocks to 64k. Which would not be enough with 32k BLCKSZ.

Since that requires a recompile anyway, why not make XLogRecord smaller only for 16k BLCKSZ or less?

Problem if we do that is that xl_len is used extensively in _redo routines, so it's a much more invasive patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Reducing size of WAL record headers
On 9 January 2013 20:54, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Here's a better idea: Let's keep xl_tot_len as it is, but move xl_len to the very end of the WAL record, after all the backup blocks. If there are no backup blocks, xl_len is omitted. Seems more robust to keep xl_tot_len, so that you require less math to figure out where one record ends and where the next one begins.

OK, I avoided tampering with xl_len cos it's so widely used. Will look.

>> Forcing the XLogRecord header to be all on one page makes the format more robust and simplifies the code that copes with header wrapping.
>
> -1 on that. That would essentially revert the changes I made earlier.

OK, idea dropped.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Reducing size of WAL record headers
On Wed, Jan 9, 2013 at 09:15:16PM +, Simon Riggs wrote:
> On 9 January 2013 21:02, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>> OK, crazy idea, but can we just record xl_len as a difference against xl_tot_len, and shorten the xl_len field?
>>
>> Hmm, so it would essentially be the length of all the backup blocks. Perhaps rename it to xl_bkpblk_len.
>>
>> However, that would cap the total size of backup blocks to 64k. Which would not be enough with 32k BLCKSZ.
>
> Since that requires a recompile anyway, why not make XLogRecord smaller only for 16k BLCKSZ or less?
>
> Problem if we do that is that xl_len is used extensively in _redo routines, so it's a much more invasive patch.

I would just make it int16 on <=16k block size, and int32 on >16k blocks.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
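A compile-time switch along the lines suggested above might look like the following. This is a sketch under stated assumptions, not a proposed patch: the typedef name `sketch_bkpblk_len_t` is hypothetical, an unsigned type is used where the mail says int16/int32, and BLCKSZ is given a fallback value here only so the fragment compiles on its own.

```c
#include <stdint.h>

/* Assumption: fallback so this fragment is standalone; a real build sets BLCKSZ. */
#ifndef BLCKSZ
#define BLCKSZ 8192
#endif

/* Pick the width of the backup-block length field from the block size. */
#if BLCKSZ <= 16384
typedef uint16_t sketch_bkpblk_len_t;   /* 2 bytes for block sizes up to 16k */
#else
typedef uint32_t sketch_bkpblk_len_t;   /* 4 bytes for larger block sizes */
#endif

/* Size in bytes of the selected length field. */
unsigned
sketch_bkpblk_len_width(void)
{
    return (unsigned) sizeof(sketch_bkpblk_len_t);
}
```

Because BLCKSZ is fixed at build time, the choice costs nothing at run time, but every _redo routine that reads the field would need to use the typedef, which is the invasiveness concern raised earlier in the thread.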
Re: [HACKERS] Reducing size of WAL record headers
Simon Riggs si...@2ndquadrant.com writes:
> Overall, the WAL record is MAXALIGN'd, so with 8 byte alignment we waste 4 bytes per record. Or put another way, if we could reduce record header by 4 bytes, we would actually reduce it by 8 bytes per record. So looking for ways to do that seems like a good idea.

I think this is extremely premature, in view of the ongoing discussions about shoehorning logical replication and other kinds of data into the WAL stream. It seems quite likely that we'll end up eating some of that padding space to support those features. So whacking a lot of code around in service of squeezing the existing padding out could very easily end up being wasted work, in fact counterproductive if it degrades either code readability or robustness.

Let's wait till we see where the logical rep stuff ends up before we worry about saving 4 bytes per WAL record.

regards, tom lane