Robert Haas <robertmh...@gmail.com> writes:
> It would be nice if the Linux guys would fix this problem for us, but
> I'm not sure whether they will.  For those who may be curious, the
> problem is in generic_file_llseek() in fs/read-write.c.  On a platform
> with 8-byte atomic reads, it seems like it ought to be very possible
> to read inode->i_size without taking a spinlock.  A little Googling
> around suggests that some patches along these lines have been proposed
> and - for reasons that I don't fully understand - rejected.  That now
> seems unfortunate.  Barring a kernel-level fix, we could try to
> implement our own cache to work around this problem.  However, any
> such cache would need to be darn cheap to check and update (since we
> can't assume that relation extension is an infrequent event) and must
> somehow avoid having the same sort of mutex contention that's killing
> the kernel in this workload.
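For anyone following along, the size probe at issue boils down to seeking to the end of the file and dividing by the block size. Here is a minimal sketch in Python (for brevity, this is not the backend code); BLCKSZ and nblocks_by_seek are names made up for the illustration, and 8192 is assumed as the block size:

```python
import os

BLCKSZ = 8192  # assumed block size (8kB), hypothetical constant for this sketch


def nblocks_by_seek(fd):
    """Estimate a file's size in whole blocks by seeking to its end.

    Each call is one lseek(fd, 0, SEEK_END) system call, which is the
    operation that hits the contended kernel path described above.
    """
    end = os.lseek(fd, 0, os.SEEK_END)
    return end // BLCKSZ  # a trailing partial block is not counted
```

Every caller that wants the size pays for one such system call, so the cost scales with how often the size is asked for, not with how often the file actually grows.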

What about making relation extension much less frequent?  It's been
talked about here before: instead of extending 8kB at a time, we could
(and should) extend by much larger chunks.  I would go as far as
preallocating the whole next segment (1GB) in the background as soon as
the current one is more than half full, or some such policy.

Then you have the problem that you can't really use lseek() anymore to
guesstimate a relation's size, but Tom said in this thread that the
planner certainly doesn't need something that accurate.  Maybe reltuples
would do?  If not, perhaps its accuracy could be adjusted?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
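
A rough sketch of the extension policy suggested above, in Python for brevity. All names here (SEGMENT_SIZE, should_preallocate_next, extend_in_chunk) are hypothetical, the 8kB block size and 1GB segment size are taken from the discussion, and nothing below reflects how the backend actually extends relations:

```python
import os

BLCKSZ = 8192            # assumed block size (8kB)
SEGMENT_SIZE = 1 << 30   # assumed segment size (1GB)


def should_preallocate_next(used_bytes, segment_size=SEGMENT_SIZE):
    """Policy check: is the current segment more than half full?"""
    return used_bytes * 2 > segment_size


def extend_in_chunk(fd, chunk_bytes):
    """Grow the file by chunk_bytes in a single step; return the new size.

    Prefers posix_fallocate() so the blocks are really reserved. Where
    that call is missing or fails, falls back to ftruncate(), which only
    sets the length and may leave the new range sparse.
    """
    old_size = os.fstat(fd).st_size
    new_size = old_size + chunk_bytes
    try:
        os.posix_fallocate(fd, old_size, chunk_bytes)
    except (AttributeError, OSError):
        os.ftruncate(fd, new_size)
    return new_size
```

With something like this, one call to extend_in_chunk(fd, 128 * BLCKSZ) would replace 128 single-block extensions, at the price that the file length no longer tells you how many blocks are in use.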