On Tuesday, May 01, 2012 05:06:11 PM Robert Haas wrote:
> On Tue, May 1, 2012 at 10:31 AM, Andres Freund <and...@anarazel.de> wrote:
> >> efficient than our current method - I'm guessing that it actually
> >> writes the updated metadata back to disk, where write() does not (this
> >> makes one wonder how safe it is to count on write to have the behavior
> >> we need here in the first place).
> >
> > Currently the write() doesn't need to be crashsafe because it will be
> > repeated on crash-recovery and a checkpoint will fsync the file.
>
> That's not what I'm worried about.  If the write() succeeds and then a
> subsequent close() on the filehandle reports an ENOSPC condition that
> means the write didn't really write after all, I am concerned that we
> might not handle that cleanly.

Hm. While write() might not write its state to disk, I don't think that
implies that the *in-memory* state is inconsistent.
POSIX doesn't allow ENOSPC for close() as far as I can see.

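To make the failure points in this exchange concrete, here is a minimal
sketch in Python (not PostgreSQL code) that checks write, fsync and close
separately and records at which step an error surfaced. The helper name
write_checked and the file name are assumptions made up for illustration.
Which errno values a given filesystem reports at close() is
platform-specific, and the sketch does not try to settle that.

```python
import os
import tempfile


def write_checked(path, data):
    """Write data to path, fsync, close; return a list of (step, errno) failures."""
    failures = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    step = "write"
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        step = "fsync"
        os.fsync(fd)  # force the data to stable storage
    except OSError as exc:
        failures.append((step, exc.errno))
    finally:
        try:
            os.close(fd)  # close can report an error of its own
        except OSError as exc:
            failures.append(("close", exc.errno))
    return failures


if __name__ == "__main__":
    # Hypothetical usage: write one 8kB block of zeroes to a scratch file.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_checked(os.path.join(tmp, "segment"), b"\0" * 8192))
```

An empty result means none of the three steps reported an error. A caller
that only checks write() would miss anything reported by the later steps.
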
> > I don't really see why it would need to compare in the 8kb case. What
> > reason would there be to further extend in that small increments?
>
> In previous discussions, the concern has been that holding the
> relation extension lock across a multi-block extension would cause
> latency spikes for both the process doing the extensions and any other
> concurrent processes that need the lock.  Obviously if it were
> possible to extend by 64kB in the same time it takes to extend by 8kB
> that would be awesome, but if it takes eight times longer then things
> don't look so good.

Yes, sure.

> > There is the question whether this should be done in the background
> > though, so the relation extension lock is never hit in anything
> > time-critical...
>
> Yeah, although I'm fuzzy on how and whether that can be made to work,
> which is not to say that it can't.

The biggest problem I see is knowing when to trigger the extension of which
file without scanning files all the time.

Using some limited-size shm queue of {reltblspc, relfilenode} of
to-be-extended files plus a latch is the first thing I can think of. Every
time a backend initializes a page with offset % EXTEND_SIZE == 0 it adds that
table to the queue. The background writer extends the file by
EXTEND_SIZE * 2 if necessary. If the queue overflows, all files are checked.
Or the backends extend themselves again...

EXTEND_SIZE should probably scale with the table size up to 64MB or so...

> It might also be interesting to provide a mechanism to pre-extend a
> relation to a certain number of blocks, though if we did that we'd
> have to make sure that autovac got the memo not to truncate those
> pages away again.

Hm. I have to say I don't really see a big need to do this if the size of
preallocation is adaptive to the file size. Sounds like it would add too many
complications for little benefit.

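The queue idea above can be sketched as a small single-process simulation in
Python. This is not PostgreSQL code. The class and function names, the
dictionary-based bookkeeping, and the 1/8 scaling rule are all assumptions
for illustration, and threading.Event merely stands in for a latch. "Extends
by EXTEND_SIZE * 2 if necessary" is read here as "keep at least
2 * EXTEND_SIZE blocks allocated beyond the blocks in use", which is one
possible interpretation.

```python
from collections import deque
import threading

BLOCK_SIZE = 8192                    # default PostgreSQL block size
MAX_EXTEND_BYTES = 64 * 1024 * 1024  # "up to 64MB or so"


def extend_size_blocks(rel_blocks):
    """Hypothetical scaling rule: 1/8 of the relation, clamped to [8 blocks, 64MB]."""
    max_blocks = MAX_EXTEND_BYTES // BLOCK_SIZE
    return max(8, min(rel_blocks // 8, max_blocks))


class ExtensionQueue:
    """Bounded queue of (reltblspc, relfilenode) pairs awaiting extension."""

    def __init__(self, capacity, extend_size):
        self.capacity = capacity
        self.extend_size = extend_size   # in blocks
        self.pending = deque()
        self.overflowed = False
        self.latch = threading.Event()   # stand-in for a latch

    def note_page_init(self, rel, block_no):
        """Backend side: called after initializing block_no of rel."""
        if block_no % self.extend_size != 0:
            return
        if rel in self.pending:
            pass                         # already queued
        elif len(self.pending) >= self.capacity:
            self.overflowed = True       # writer must check all files
        else:
            self.pending.append(rel)
        self.latch.set()                 # wake the background process

    def drain(self, used, allocated):
        """Background side: used/allocated map rel -> block counts.

        Returns the relations that were extended.
        """
        todo = list(used) if self.overflowed else list(self.pending)
        self.pending.clear()
        self.overflowed = False
        self.latch.clear()
        extended = []
        for rel in todo:
            target = used[rel] + 2 * self.extend_size
            if allocated[rel] < target:  # extend only "if necessary"
                allocated[rel] = target
                extended.append(rel)
        return extended
```

In this sketch the overflow flag is what keeps the queue small and
fixed-size: a full queue loses individual entries but never loses the fact
that work is pending, at the cost of one full scan on the next drain.
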
Andres