On 09/13/2017 07:53 AM, Andrey Borodin wrote:
>> * I see there are conditions like this:
>>
>>     if (xlogreader->blocks[nblock].forknum == MAIN_FORKNUM)
>>
>> Why is it enough to restrict the block-tracking code to the main
>> fork? Aren't we interested in all relation forks?
> fsm, vm and others are small enough to take them
>

That seems like an optimization specific to your backup solution, not
necessarily to others and/or to other possible use cases.

>> I guess you'll have to explain what the implementation of the hooks
>> is supposed to do, and why these locations for hook calls are the
>> right ones. It's damn impossible to validate the patch without that
>> information.
>>
>> Assuming you still plan to use the hook approach ...
> Yes, I still think hooking is a good idea, but you are right - I need
> a prototype first. I'll mark the patch as Returned with feedback
> before the prototype implementation.
>

OK

>>>> There are no arguments fed to this hook, so modules would not be
>>>> able to analyze things in this context, except shared memory and
>>>> process state?
>>>>
>>>> Those hooks are put in hot code paths, and could impact
>>>> performance of WAL insertion itself.
>>> I do not think sending a few bytes to a cached array is comparable
>>> to the disk write of an XLog record. Checking the func ptr is even
>>> cheaper with correct branch prediction.
>>
>> That seems somewhat suspicious, for two reasons. Firstly, I believe
>> we only insert the XLOG records into the WAL buffer here, so why
>> should there be any disk write involved? Or do you mean the final
>> commit?
> Yes, I mean finally we will be waiting for the disk. A hundred empty
> ptr checks are negligible in comparison with the disk.

Aren't we doing these calls while holding XLog locks? IIRC there was
quite a significant performance improvement after Heikki reduced the
amount of code executed while holding the locks.
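Just to illustrate which part of the cost I mean - the usual hook
convention elsewhere in the backend is a plain function pointer that
stays NULL until some module installs a callback, so the no-consumer
overhead really is a single well-predicted test. A minimal standalone
sketch of that pattern (wal_record_hook and insert_record are names I
made up for the example, not what the patch calls anything):

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical hook, following the usual convention (compare
     * ExecutorStart_hook): a function pointer that stays NULL unless
     * a loaded module sets it. */
    typedef void (*wal_record_hook_type) (const void *record, size_t len);
    static wal_record_hook_type wal_record_hook = NULL;

    static void
    insert_record(const void *record, size_t len)
    {
        /* ... the record is copied into the WAL buffer here ... */

        /* With no consumer this is one well-predicted NULL test.
         * But in XLogInsertRecord() this spot may still be under the
         * WAL insertion locks, so the hook body itself has to stay
         * cheap. */
        if (wal_record_hook)
            wal_record_hook(record, len);
    }

    int
    main(void)
    {
        char        rec[16] = {0};

        insert_record(rec, sizeof(rec));
        printf("record inserted, no hook installed\n");
        return 0;
    }

So I agree the pointer check itself is cheap - my worry is only about
what the installed hook then does at that point, while the locks are
still held.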
>> But more importantly, doesn't this kind of information require some
>> durability guarantees? I mean, if it gets lost during server crashes
>> or restarts, doesn't that mean the incremental backups might miss
>> some buffers? I'd guess the hooks will have to do some sort of I/O
>> to achieve that, no?
> We need durability only on the level of one segment. If we do not
> have the info from a segment we can just rescan it.
> If we send the segment to S3 as one file, we are sure of its
> integrity. But this IO can be async.
>
> PTRACK in its turn switches bits in the forks' buffers, which are
> written by the checkpointer and... well... recovered during recovery.
> By the usual WAL replay of recovery.

But how do you do that from the hooks, if they only store the data into
a buffer in memory? Let's say you insert ~8MB of WAL into a segment,
and then the system crashes and reboots. How do you know you have
incomplete information from the WAL segment?

Although, that's probably what wal_switch_hook() might do - sync the
data whenever the WAL segment is switched. Right? (See the sketch at
the end of this mail.)

>> From this POV, the idea to collect this information on the backup
>> system (WAL archive) by pre-processing the arriving WAL segments
>> seems like the most promising one. It moves the work to another
>> system, the backup system can make it as durable as the WAL
>> segments, etc.
>
> Well, in some not so rare cases users encrypt backups and send them
> to S3. And there is no system with CPUs that can handle that WAL
> parsing. Currently, I'm considering mocking up a prototype for wal-g,
> which works exactly this way.

Why couldn't there be a system with enough CPU power? Sure, if you want
to do this, you'll need a more powerful system, but regular CPUs can do
more than 1GB/s in AES-256-GCM thanks to AES-NI. Or you could do it on
the database as part of archive_command, before the encryption, of
course.
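To make the per-segment durability idea a bit more concrete, here is a
minimal standalone sketch of what such a pair of hooks might do, under
my assumptions about the design - track_block(), flush_segment_map()
and the .blockmap file are all hypothetical names, nothing from the
patch:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical per-segment block map: the record-level hook
     * appends (rel, fork, block) triples to an in-memory array, and
     * the segment-switch hook writes and fsyncs one map file per WAL
     * segment. */
    typedef struct BlockRef
    {
        unsigned    relnode;
        int         forknum;
        unsigned    blkno;
    } BlockRef;

    static BlockRef refs[1024];
    static int      nrefs = 0;

    /* What the record-level hook would do: a cheap in-memory append. */
    static void
    track_block(unsigned relnode, int forknum, unsigned blkno)
    {
        if (nrefs < 1024)
            refs[nrefs++] = (BlockRef) {relnode, forknum, blkno};
    }

    /* What wal_switch_hook() would do: make the collected map durable,
     * once, when the segment is complete.  This is off the hot path,
     * and could even be done asynchronously. */
    static void
    flush_segment_map(const char *segname)
    {
        char        path[64];
        int         fd;

        snprintf(path, sizeof(path), "%s.blockmap", segname);
        fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0)
        {
            perror("open");
            return;
        }
        if (write(fd, refs, nrefs * sizeof(BlockRef)) < 0)
            perror("write");
        fsync(fd);
        close(fd);
        nrefs = 0;
    }

    int
    main(void)
    {
        track_block(16384, 0 /* MAIN_FORKNUM */, 42);
        track_block(16384, 1 /* FSM_FORKNUM */, 7);
        flush_segment_map("000000010000000000000001");
        return 0;
    }

If the crash happens before flush_segment_map() runs, the map file for
that one segment is simply missing and the backup tool rescans the
segment - which is exactly the "durability only on the level of one
segment" property you describe.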
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services