On Thu, 2011-07-28 at 17:10 +0200, Hannu Krosing wrote:
> On Thu, 2011-07-28 at 10:45 -0400, Tom Lane wrote:
> > Hannu Krosing <ha...@2ndquadrant.com> writes:
> > > On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote:
> > >> I'm confused by this, because I don't think any of this can be done
> > >> when we insert the commit record into the WAL stream.
> >
> > > The update to stored snapshot needs to happen at the moment when the WAL
> > > record is considered to be "on stable storage", so the "current
> > > snapshot" update presumably can be done by the same process which forces
> > > it to stable storage, with the same contention pattern that applies to
> > > writing WAL records, no ?
> >
> > No. There is no reason to tie this to fsyncing WAL. For purposes of
> > other currently-running transactions, the commit can be considered to
> > occur at the instant the commit record is inserted into WAL buffers.
> > If we crash before that makes it to disk, no problem, because nothing
> > those other transactions did will have made it to disk either.
>
> Agreed. Actually figured it out right after pushing send :)
>
> > The
> > advantage of defining it that way is you don't have weirdly different
> > behaviors for sync and async transactions.
>
> My main point was, that we already do synchronization when writing wal,
> why not piggyback on this to also update latest snapshot .
So the basic design could be a "sparse snapshot" consisting of xmin,
xmax and running_txids[numbackends], where each backend manages its own
slot in running_txids: it sets a txid when acquiring one and nulls it
at commit, possibly advancing xmin if xmin == mytxid.

As the xmin update requires a full scan of running_txids, it is also a
good time to update xmax. There is no need to advance xmax when
"inserting" your next txid, so you don't need to lock anything at
insert time; the valid xmax is still computed when getting the
snapshot.

Hmm, probably no need to store xmin and xmax at all.

It needs some further analysis to figure out whether doing it this way
without any locks can produce any relevantly bad snapshots. Maybe you
still need one spinlock + memcpy of running_txids to local memory to
get a snapshot.

Also, as the running_txids array is global, it may need to be made even
sparser to minimise cache-line collisions. This needs to be a tuning
decision between cache conflicts and the speed of memcpy.

> --
> -------
> Hannu Krosing
> PostgreSQL (Infinite) Scalability and Performance Consultant
> PG Admin Book: http://www.2ndQuadrant.com/books/
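
To make the sparse snapshot idea above a bit more concrete, here is a
rough single-process sketch in Python. It is only an illustration, not
PostgreSQL code: the class and method names, the INVALID_TXID marker and
the simple next_txid counter are all made up for this example, and a
threading.Lock stands in for the "one spinlock + memcpy" step. Following
the "probably no need to store xmin and xmax" thought, both are computed
when the snapshot is taken.

```python
import threading
from dataclasses import dataclass
from typing import Tuple

INVALID_TXID = 0  # hypothetical marker for an empty ("nulled") slot


@dataclass(frozen=True)
class Snapshot:
    xmin: int                 # oldest txid still running (or xmax if none)
    xmax: int                 # first txid not yet assigned at snapshot time
    running: Tuple[int, ...]  # txids that were in progress

    def considers_finished(self, txid: int) -> bool:
        """True if txid had already ended when the snapshot was taken."""
        if txid >= self.xmax:
            return False
        return txid not in self.running


class SparseSnapshotArray:
    """One slot per backend; each backend writes only its own slot."""

    def __init__(self, num_backends: int) -> None:
        self.running_txids = [INVALID_TXID] * num_backends
        self.next_txid = 1  # assumption: a plain counter hands out txids
        self._copy_lock = threading.Lock()  # stand-in for the spinlock

    def begin(self, backend_id: int) -> int:
        # No lock is taken here, mirroring the "no lock at insert time" idea.
        txid = self.next_txid
        self.next_txid += 1
        self.running_txids[backend_id] = txid
        return txid

    def commit(self, backend_id: int) -> None:
        # Null our own slot; nothing else is touched.
        self.running_txids[backend_id] = INVALID_TXID

    def get_snapshot(self) -> Snapshot:
        # Copy the shared array to local memory, then work on the copy.
        # Writers do not take this lock, so it only serializes readers.
        with self._copy_lock:
            local = list(self.running_txids)
            xmax = self.next_txid
        running = tuple(sorted(t for t in local if t != INVALID_TXID))
        xmin = running[0] if running else xmax
        return Snapshot(xmin=xmin, xmax=xmax, running=running)


if __name__ == "__main__":
    arr = SparseSnapshotArray(num_backends=4)
    arr.begin(0)
    arr.begin(1)
    arr.begin(2)
    arr.commit(1)
    print(arr.get_snapshot())
```

Note that in this sketch a snapshot taken between the two steps of
begin() (counter already advanced, slot not yet written) would miss the
new txid. That is the sort of case the "further analysis" would have to
rule out or handle.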