Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-03-02 18:45:18 +0100, Andres Freund wrote: On 2015-03-02 19:23:56 +0200, Heikki Linnakangas wrote: Pass the prepared XID as yet another argument to XactEmitCommitRecord, and have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the record too. It might even make sense to handle the prepared XID like all the other optional fields and add an xinfo flag for it. That's what I mean with non simple. Not a fan of teaching xact.c even more about twophase's dealings than it already knows. I made an effort to show how horrible it would look like. Turns out it's not that bad ;). Far from pretty, especially in xlog.c, but acceptable. I think treating them more similar actually might open the road for reducing the difference further at some point. The attached patch does pretty much what you suggested above and tries to address all the point previously made. I plan to push this fairly soon; unless somebody has further input. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 7557d0afa860c37a1992240b54e2bf3710296fc7 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 15 Mar 2015 14:42:09 +0100 Subject: [PATCH] Merge the various forms of transaction commit abort records. Since 465883b0a two versions of commit records have existed. A compact version that was used when no cache invalidations, smgr unlinks and similar were needed, and a full version that could deal with all that. Additionally the full version was embedded into twophase commit records. That resulted in a measurable reduction in the size of the logged WAL in some workloads. But more recently additions like logical decoding, which e.g. needs information about the database something was executed on, made it applicable in fewer situations. The static split generally made it hard to expand the commit record, because concerns over the size made it hard to add anything to the compact version. Additionally it's not particularly pretty to have twophase.c insert RM_XACT records. Rejigger things so that the commit and abort records only have one form each, including the twophase equivalents. The presence of the various optional (in the sense of not being in every record) pieces is indicated by a bits in the 'xinfo' flag. That flag previously was not included in compact commit records. To prevent an increase in size due to its presence, it's only included if necessary; signalled by a bit in the xl_info bits available for xact.c, similar to heapam.c's XLOG_HEAP_OPMASK/XLOG_HEAP_INIT_PAGE. Twophase commit/aborts are now the same as their normal counterparts. The original transaction's xid is included in an optional data field. This means that commit records generally are smaller, except in the case of a transaction with subtransactions, but no other special cases; the increase there is four bytes, which seems acceptable given that the more common case of not having subtransactions shrank. The savings are especially measurable for twophase commits, which previously always used the full version; but will in practice only infrequently have required that. The motivation for this work are not the space savings and and deduplication though; it's that it makes it easier to extend commit records with additional information. That's just a few lines of code now; without impacting the common case where that information is not needed. Discussion: 20150220152150.gd4...@awork2.anarazel.de, 235610.92468.qm%40web29004.mail.ird.yahoo.com Reviewed-By: Heikki Linnakangas, Simon Riggs --- src/backend/access/rmgrdesc/xactdesc.c | 245 +++- src/backend/access/transam/twophase.c| 59 +--- src/backend/access/transam/xact.c| 476 +++ src/backend/access/transam/xlog.c| 126 src/backend/replication/logical/decode.c | 133 +++-- src/include/access/xact.h| 215 ++ src/include/access/xlog_internal.h | 2 +- 7 files changed, 749 insertions(+), 507 deletions(-) diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c index 3e87978..b036b6d 100644 --- a/src/backend/access/rmgrdesc/xactdesc.c +++ b/src/backend/access/rmgrdesc/xactdesc.c @@ -14,53 +14,188 @@ */ #include postgres.h +#include access/transam.h #include access/xact.h #include catalog/catalog.h #include storage/sinval.h #include utils/timestamp.h +/* + * Parse the WAL format of a xact commit and abort records into a easier to + * understand format. + * + * This routines are in xactdesc.c because they're accessed in backend (when + * replaying WAL) and frontend (pg_xlogdump) code. This file is the only xact + * specific one shared between both. They're complicated enough that + * duplication would be bothersome. + */ + +void +ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-03-15 16:30:16 +0100, Andres Freund wrote: The attached patch does pretty much what you suggested above and tries to address all the point previously made. I plan to push this fairly soon; unless somebody has further input. Pushed, after fixing two typos in decode.c I introduced while cleaning up. This might not yet be perfect, but there seems little point in waiting further. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-03-02 19:11:15 +0200, Heikki Linnakangas wrote: Come to think of it, it would be cleaner anyway to move the XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then those structs don't need to be static either. That was my first thought as well - but it doesn't easily work because of the way commit/abort records are embedded into twophase.c. I couldn't come with a simple way to change that, and anythign non simple imo defeats the purpose. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-02-25 12:10:42 +0100, Andres Freund wrote: On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you could pass an xl_xact_parsed/abort_commit struct to them, instead of the individual fields? You could then also avoid the static variables inside it, passing pointers to that struct to XLogRegisterData() instead. Hm, that's an idea. And rename it to xaxt_commit/abort_data? So, I just tried this, and it doesn't really seem to come out as a net positive. More code at the callsites and more complex code in the *Emit* routines. It's impossible to use [FLEXIBLE_ARRAY_MEMBER] employing datatypes in xact_commit_data while emitting because there obviously are several chunks needing it. And avoid using it would make for a slightly less clear format. So unless you feel strongly about it, don't think so, I'll keep the statics, even if they're not particularly pretty. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 03/02/2015 06:51 PM, Andres Freund wrote: On 2015-02-25 12:10:42 +0100, Andres Freund wrote: On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you could pass an xl_xact_parsed/abort_commit struct to them, instead of the individual fields? You could then also avoid the static variables inside it, passing pointers to that struct to XLogRegisterData() instead. Hm, that's an idea. And rename it to xaxt_commit/abort_data? So, I just tried this, and it doesn't really seem to come out as a net positive. More code at the callsites and more complex code in the *Emit* routines. It's impossible to use [FLEXIBLE_ARRAY_MEMBER] employing datatypes in xact_commit_data while emitting because there obviously are several chunks needing it. And avoid using it would make for a slightly less clear format. So unless you feel strongly about it, don't think so, I'll keep the statics, even if they're not particularly pretty. Come to think of it, it would be cleaner anyway to move the XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then those structs don't need to be static either. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 03/02/2015 07:14 PM, Andres Freund wrote: On 2015-03-02 19:11:15 +0200, Heikki Linnakangas wrote: Come to think of it, it would be cleaner anyway to move the XLogBeginInsert() and XLogInsert() calls inside XactEmitCommitRecord. Then those structs don't need to be static either. That was my first thought as well - but it doesn't easily work because of the way commit/abort records are embedded into twophase.c. I couldn't come with a simple way to change that, and anythign non simple imo defeats the purpose. Pass the prepared XID as yet another argument to XactEmitCommitRecord, and have XactEmitCommitRecord emit the xl_xact_commit_prepared part of the record too. It might even make sense to handle the prepared XID like all the other optional fields and add an xinfo flag for it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On Fri, Feb 27, 2015 at 7:10 AM, Michael Paquier michael.paqu...@gmail.com wrote: No, no. I meant that it is good the way your patch does it in xactdesc.c, where both frontend and backend can reach it. Agreed, that seems much better than duplicating it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 2015-02-27 16:26:08 +0900, Michael Paquier wrote: On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: On 02/20/2015 05:21 PM, Andres Freund wrote: There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. pg_xlogdump outputs subxacts and such; I don't forsee other usages. Sure, we could copy the code around, but I think that's worse than having it in xactdesc.c. Needs a comment explaining why it's there if I haven't added one already. FWIW, I think they would live better in frontend code for client applications. What do you mean with that? You mean you'd rather see a copy in pg_xlogdump somewhere? How would you trigger that being used instead of the normal description routine? +/* free opcode 0x70 */ + +#define XLOG_XACT_OPMASK 0x70 There is a contradiction here, 0x70 is not free. The above is a mask, not an opcode - so I don't think it's a contraction. I'll expand on the commentary on the next version. Thanks for having a look! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On Fri, Feb 27, 2015 at 6:49 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-27 16:26:08 +0900, Michael Paquier wrote: On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: On 02/20/2015 05:21 PM, Andres Freund wrote: There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. pg_xlogdump outputs subxacts and such; I don't forsee other usages. Sure, we could copy the code around, but I think that's worse than having it in xactdesc.c. Needs a comment explaining why it's there if I haven't added one already. FWIW, I think they would live better in frontend code for client applications. What do you mean with that? You mean you'd rather see a copy in pg_xlogdump somewhere? How would you trigger that being used instead of the normal description routine? No, no. I meant that it is good the way your patch does it in xactdesc.c, where both frontend and backend can reach it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On Wed, Feb 25, 2015 at 8:10 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: On 02/20/2015 05:21 PM, Andres Freund wrote: There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. pg_xlogdump outputs subxacts and such; I don't forsee other usages. Sure, we could copy the code around, but I think that's worse than having it in xactdesc.c. Needs a comment explaining why it's there if I haven't added one already. FWIW, I think they would live better in frontend code for client applications. That's a nice patch. +1 for merging them. Here are a couple of comments: +/* Parse the WAL format of a xact abort into a easier to understand format. */ +void +ParseCommitRecord(uint8 info, xl_xact_commit *xlrec, xl_xact_parsed_commit *parsed) I think that you mean here of an xact commit, not abort. + * Emit, but don't insert, a abort record. s/a abort/an abort/ XactEmitAbortRecord has some problems with tabs replaced by 4 spaces at a couple of places. +/* free opcode 0x70 */ + +#define XLOG_XACT_OPMASK 0x70 There is a contradiction here, 0x70 is not free. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 24 February 2015 at 18:51, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/20/2015 05:21 PM, Andres Freund wrote: In the attached patch I've merged compact/noncompact code, made aborts use similar logic to avoid including useless bytes and used both for the 2pc equivalents. +1 for this approach in general. +1 also The compact commit record idea wasn't useful enough to stick with, especially in the light of logical rep, possible future extensions to the commit record and the prevalence of subtransactions. XactEmitCommitRecord and XactEmitAbortRecord are somewhat hard to understand. More comments or clearer design please. The record format itself needs more explanation than I see in the patch. Might be worth documenting when and how those fields would be populated also (my bad). It might be useful to have the optional field descriptions be named with a _opt suffix to make it clearer. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
Hi, On 2015-02-24 20:51:42 +0200, Heikki Linnakangas wrote: On 02/20/2015 05:21 PM, Andres Freund wrote: To avoid using more space in the compact case the 'xinfo' field indicating the presence of further data is only included when a byte in the xl_info flag is set (similar to what heap rmgr does). That means that transactions without subtransactions and other things are four bytes smaller than before, but ones with a subtransaction but no other complications 4 byte larger. database info, nsubxacts, nrels, nmsgs et al are also only included if required. I think that's a overall win, even disregarding wal_level = logical. xinfo is a bit underdocumented now. Yea, the whole thing needs a bit more comments. I really wanted some feedback before sinking even more time into this... The comment in xl_xact_commit should mention that it's a uint32. But actually, why is it 32 bits wide? The first reason is that it's essentially just the old 'xinfo' field moved to a different place ;). I think I'll add a xl_xact_flags or so, and document it there. Only 6 bits are used, so it would in a single byte. I guess that's because of alignment: now that it's 32 bits wide, that ensures that all the other structs are 4 byte aligned. That should be documented. Alternatively, store them unaligned to save the few bytes and use memcpy. But yes, I didn't think about changing it because of alignment. Given that commit and abort records can be fairly large, it seems better to waste three bytes in a few scenarios (not the minimal one!) than allocate another large chunk of memory for the copied data. Even in the current code there's some rather underdocumented assumptions about alignment; namely that RelFileNode+, TransactionId+ always result in a acceptable alignment for the data after them. That happens to be the case, but is worth a comment and/or a MAXALIGN. There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. pg_xlogdump outputs subxacts and such; I don't forsee other usages. Sure, we could copy the code around, but I think that's worse than having it in xactdesc.c. Needs a comment explaining why it's there if I haven't added one already. Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you could pass an xl_xact_parsed/abort_commit struct to them, instead of the individual fields? You could then also avoid the static variables inside it, passing pointers to that struct to XLogRegisterData() instead. Hm, that's an idea. And rename it to xaxt_commit/abort_data? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Merge compact/non compact commits, make aborts dynamically sized
On 02/20/2015 05:21 PM, Andres Freund wrote: In the attached patch I've merged compact/noncompact code, made aborts use similar logic to avoid including useless bytes and used both for the 2pc equivalents. +1 for this approach in general. To avoid using more space in the compact case the 'xinfo' field indicating the presence of further data is only included when a byte in the xl_info flag is set (similar to what heap rmgr does). That means that transactions without subtransactions and other things are four bytes smaller than before, but ones with a subtransaction but no other complications 4 byte larger. database info, nsubxacts, nrels, nmsgs et al are also only included if required. I think that's a overall win, even disregarding wal_level = logical. xinfo is a bit underdocumented now. The comment in xl_xact_commit should mention that it's a uint32. But actually, why is it 32 bits wide? Only 6 bits are used, so it would in a single byte. I guess that's because of alignment: now that it's 32 bits wide, that ensures that all the other structs are 4 byte aligned. That should be documented. Alternatively, store them unaligned to save the few bytes and use memcpy. There's one bit that I'm not so sure about though: To avoid duplication I've added Parse(Commit/Abort)Record(), but unfortunately that has to be available both in front and backend code - so it's currently living in xactdesc.c. I think we can live with that, but it's certainly not pretty. Yeah, that's ugly. Why does frontend code need that? The old format isn't exactly trivial for frontend code to decode either. Regarding XactEmitCommitRecord and XactEmitAbortRecord, I wonder if you could pass an xl_xact_parsed/abort_commit struct to them, instead of the individual fields? You could then also avoid the static variables inside it, passing pointers to that struct to XLogRegisterData() instead. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers