On Thu, Jul 19, 2012 at 10:09 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Seems a bit complex, but it might be worth it.  Keep in mind that I
>> eventually want to be able to make an unlogged table logged or vice
>> versa, which will probably entail unlinking just the init fork (for
>> the logged -> unlogged direction).
>
> Well, as far as that goes, I don't see a reason why you couldn't unlink
> the init fork immediately on commit.  The checkpointer should not have
> to be involved at all --- there's no reason to send it a FORGET FSYNC
> request either, because there shouldn't be any outstanding writes
> against an init fork, no?

Well, it gets written when it gets created.  Some of those writes go
through shared_buffers.

> But having said that, this does serve as an example that we might
> someday want the flexibility to kill individual forks.  I was
> intending to kill smgrdounlinkfork altogether, but I'll refrain.

If you want to remove it, it's OK with me.  We can always put it back
later if it's needed.  We have an SCM that allows us to revert
patches.  :-)

> What about checking just the immediately previous entry?  This would
> at least fix the problem for bulk-load situations, and the cost ought
> to be about negligible compared to acquiring the LWLock.

Well, two things:

1. If a single bulk load is the ONLY activity on the system, or more
generally if only one segment in the system is being heavily written,
then that would reduce the number of entries that get added to the
queue; but if you're doing two bulk loads on different tables at the
same time, it might not do much.  From Greg Smith's previous comments
on this topic, I understand that having two or three entries
alternating in the queue is a fairly common pattern.

2. You say "fix the problem," but I'm not exactly clear what problem
you think this fixes.  It's true that the compaction code is a lot
slower than an ordinary queue insertion, but I think it doesn't happen
often enough to matter, and when it does happen the system is
generally I/O-bound anyway, so who cares?  One possible argument in
favor of doing something along these lines is that it would reduce the
amount of data the checkpointer has to copy while holding the lock,
thus causing less disruption for other processes trying to insert into
the request queue.  But I don't know whether that effect is
significant enough to matter.
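For concreteness, what I understand you to be proposing is roughly the
following.  This is only a sketch: the struct, the function, and the
queue variables are invented names for illustration, not the real
checkpointer shared-memory structures, and the caller is assumed to
already hold the lock protecting the queue.

    #include "postgres.h"
    #include "storage/block.h"
    #include "storage/relfilenode.h"

    /* hypothetical stand-in for a queued fsync request */
    typedef struct
    {
        RelFileNode rnode;      /* relation being fsync'd */
        ForkNumber  forknum;    /* which fork of the relation */
        BlockNumber segno;      /* segment number */
    } FsyncRequestSketch;

    /*
     * Append a request to the queue unless it is identical to the most
     * recently added entry.  Returns true if the request was skipped
     * as a duplicate.  Caller must hold the queue lock.
     */
    static bool
    EnqueueFsyncRequest(FsyncRequestSketch *queue, int *num_requests,
                        const FsyncRequestSketch *req)
    {
        if (*num_requests > 0)
        {
            const FsyncRequestSketch *prev = &queue[*num_requests - 1];

            if (RelFileNodeEquals(prev->rnode, req->rnode) &&
                prev->forknum == req->forknum &&
                prev->segno == req->segno)
                return true;    /* same as previous entry; skip it */
        }
        queue[(*num_requests)++] = *req;
        return false;
    }

That check is certainly cheap; but per point (1), two interleaved bulk
loads defeat it almost entirely, because the previous entry keeps
alternating between the two relations.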
> I have also been wondering about de-duping on the backend side, but
> the problem is that if a backend remembers its last few requests,
> it doesn't know when that cache has to be cleared because of a new
> checkpoint cycle starting.  We could advertise the current cycle
> number in shared memory, but you'd still need to take a lock to
> read it.  (If we had memory fence primitives it could be a bit
> cheaper, but I dunno how much.)

Well, we do have those, as of 9.2.  They're not being used for anything
yet, but I've been looking for an opportunity to put them into use.
sinvaladt.c's msgnumLock is an obvious candidate, but the 9.2 changes
to reduce the impact of sinval synchronization work sufficiently well
that I haven't been motivated to tinker with it any further.  Maybe it
would be worth doing just to exercise that code, though.  Or maybe we
can use them here.  But after some thought I can't see exactly how we'd
do it.  Memory barriers prevent a value from being prefetched too early
or written back to main memory too late, relative to other memory
operations by the same process, but the definition of "too early" and
"too late" is not quite clear to me here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
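P.S. To make the shared-cycle-counter idea concrete, here is a minimal
sketch using the 9.2 barrier primitives from storage/barrier.h.  Every
structure and function name below is invented for illustration, it
assumes only the checkpointer ever advances the counter, and whether
these two barriers actually provide the ordering we'd need is exactly
the question I can't answer above.

    #include "postgres.h"
    #include "storage/barrier.h"

    /* hypothetical fragment of checkpointer shared state */
    typedef struct
    {
        volatile uint32 fsync_cycle_ctr;  /* advanced by checkpointer only */
        /* ... other shared state would go here ... */
    } CkptShmemSketch;

    static CkptShmemSketch *ckptShmem;    /* assumed set up at startup */

    /*
     * Checkpointer: start a new fsync-request cycle.  The write barrier
     * is intended to make all queue manipulation for the old cycle
     * visible before the new cycle number becomes visible.
     */
    static void
    AdvanceFsyncCycle(void)
    {
        pg_write_barrier();
        ckptShmem->fsync_cycle_ctr++;
    }

    /*
     * Backend: test whether its locally remembered requests still
     * belong to the current cycle, without taking any lock.  The read
     * barrier keeps later loads of shared state from being reordered
     * ahead of the counter read --- but since the remembered requests
     * are backend-local, it's unclear that this ordering buys us what
     * we need, which is the "too early"/"too late" problem above.
     */
    static bool
    BackendCacheStillValid(uint32 cached_cycle)
    {
        uint32 current = ckptShmem->fsync_cycle_ctr;

        pg_read_barrier();
        return current == cached_cycle;
    }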