On Thu, Apr 25, 2019 at 7:15 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > +static UndoBuffers def_buffers[MAX_UNDO_BUFFERS]; > > +static int buffer_idx; > > > > This is a file-global variable with a very generic name that is very > > similar to a local variable name used by multiple functions within the > > file (bufidx) and no comments. Ugh. > > > Comment added.
The variable name is still bad, and the comment isn't very helpful either. First, you can't tell by looking at the name that it has anything to do with the undo_buffers variable, because undo_buffers and buffer_idx are not obviously related names. Second, it's not an index; it's a count. A count tells you how many of something you have; an index tells you which one of those you are presently thinking about. Third, the undo_buffer array is itself poorly named, because it's not an array of all the undo buffers in the world or anything like that, but rather an array of undo buffers for some particular purpose. "static UndoBuffers *undo_buffer" is about as helpful as "int integer" and I hope I don't need to explain why that isn't usually a good thing to write. Maybe prepared_undo_buffer for the array and nprepared_undo_buffer for the count, or something like that. > I think the simple solution will be that inside UndoRecordIsValid > function we can directly check UndoLogIsDiscarded if oldest_data is > not yet initialized, I think we don't need to release the discard lock > for that. So I have changed it like that. Can we instead eliminate the special case? It seems like the if (log->oldest_data == InvalidUndoRecPtr) case will be taken very rarely, so if it's buggy, we might not notice. > Right, from uur_prevxid we would know that for which xid's undo we are > looking for but without having uur_xid in undo record it self how we > would know which undo record is inserted by the xid we are looking > for? Because in zheap while following the undo chain and if slot got > switch, then there is possibility (because of slot reuse) that we > might get some other transaction's undo record for the same zheap > tuple, but we want to traverse back as we want to find the record > inserted by uur_prevxid. So we need uur_xid as well to tell who is > inserter of this undo record? It seems desirable to me to make this the caller's problem. When we are processing undo a transaction at a time, we'll know the XID because it will be available from the transaction header. If a system like zheap maintains a pointer to an undo record someplace in the middle of a transaction, it can also store the XID if it needs it. The thing is, the zheap code almost works that way already. Transaction slots within a page store both the undo record pointer and the XID. The only case where zheap doesn't store the undo record pointer and the XID is when a slot switch occurs, but that could be changed. If we moved urec_xid into UndoRecordTransaction, we'd save 4 bytes per undo record across the board. When zheap emits undo records for updates or deletes, they would need store an UndoRecPtr (8 bytes) + FullTransactionId (8 bytes) in the payload unless the previous change to that TID is all-visible or the previous change to that TID was made by the same transaction. Also, zheap would no longer need to store the slot number in the payload in any case, because this would substitute for that (and permit more efficient lookups, to boot). So the overall impact on zheap update and delete records would be somewhere between -4 bytes (when we save the space used by XID and incur no other cost) and +12 bytes (when we lose the XID but gain the UndoRecPtr + FullTransactionId). That worst case could be further optimized. For example, instead of storing a FullTransactionId, zheap could store the difference between the XID to which the current record pertains (which in this model the caller is required to know) and the XID of whoever last modified the tuple. That difference certainly can't exceed 4 billion (or even 2 billion) so 4 bytes is enough. That reduces the worst-case difference to +8 bytes. Probably zheap could use payloads with some kind of variable-length encoding and squeeze out even more in common cases, but I'm not sure that's necessary or worth the complexity. Let's also give uur_blkprev its own UREC_INFO_* constant and omit it when this is the first time we're touching this block in this transaction and thus the value is InvalidUndoRecPtr. In the pretty-common case where a transaction updates one tuple on the page and never comes back, this - together with the optimization in the previous paragraph - will cause zheap to come out even on undo, because it'll save 4 bytes by omitting urec_xid and 8 bytes by omitting uur_blkrev, and it'll lose 8 bytes storing an UndoRecPtr in the payload and 4 bytes storing an XID-difference. Even with those changes, zheap's update and delete could still come out a little behind on undo volume if hitting many tuples on the same page, because for every tuple they hit after the first, we'll still need the UndoRecPtr for the previous change to that page (uur_blkprev) and we'll also have the UndoRecPtr extracted from the tuple's previous slot, store in the payload. So we'll end up +8 bytes in this case. I think that's acceptable, because it often won't happen, it's hardly catastrophic if it does, and saving 4 bytes on every insert, and on every update or delete where the old undo is already discarded is pretty sweet. > Yeah so we can directly jump to the header which is not yet completely > read but if any of the header is partially read then we need to > maintain some kind of partial read variable otherwise from 'already > read' we wouldn't be able to know how many bytes of the header got > read in last call unless we calculate that from uur_info or maintain > the partial_read in context like I have done in the new version. Right, we need to know the bytes already read for the next header. > Note: > - I think the ucontext->stage are same for the insert and DECODE can > we just declare only > one enum and give some generic name e.g. UNDO_PROCESS_STAGE_HEADER ? I agree. Maybe UNDO_PACK_STAGE_WHATEVER or, more briefly, UNDO_PACK_WHATEVER. > - In SkipInsertingUndoData also I have to go through all the stages so > that if we find some valid > block then stage is right for inserting the partial record? Do you > think I could have avoided that? Hmm, I didn't foresee that, but no, I don't think you can avoid that. That problem wouldn't occur before we added the stage stuff, since we'd just go through all the stages every time and each one would know its own size and do nothing if that number of bytes had already been passed, but with this design there seems to be no way around it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company