Robert,

For all our not understanding each other, we seem to be in violent
agreement. ;)
* Robert Haas (robertmh...@gmail.com) wrote:
> I think you might be confused, or else I'm confused, because I don't
> believe we have any such thing as an extent lock.

The relation extension lock is what I was referring to.  Apologies for
any confusion there.

> What we do have is a relation extension lock, but the size of the
> segment on disk has nothing to do with that: there's only one for the
> whole relation, and you hold it when adding a block to the relation.

Yes, which is farrr too small.  I'm certainly aware that the segments
on disk are dealt with in the storage layer- currently.  My proposal
was to consider how we might change that, a bit, to allow improved
throughput when there are multiple writers.

Consider this, for example- when we block on the relation extension
lock, rather than sit and wait or continue to compete with the other
backends, simply tell the storage layer to give us a dedicated file to
work with.  Once we're ready to commit, move that file into place as
the next segment (through some command to the storage layer), using an
atomic operation to ensure that it either works and doesn't overwrite
anything, or fails and we try again by moving the segment number up.
We would need to work out, at the storage layer, how to handle cases
where the file is less than 1G and realize that we should just skip
over those blocks on disk as being known-to-be-empty.  Those blocks
would also then be put in the free space map and used by later
processes which need to find somewhere to put new data, etc.

> But that having been said, it just so happens that I was recently
> playing around with ways of trying to fix the relation extension
> bottleneck.  One thing I tried was: every time a particular backend
> extends the relation, it extends the relation by more than 1 block at
> a time before releasing the relation extension lock.

Right, exactly.  One idea that I was discussing w/ Greg was to do this
using some log(relation-size) approach or similar.
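The "move it into place atomically, or bump the segment number and
retry" step above could be sketched roughly like so.  This is a minimal
illustration only, not PostgreSQL code- the function and path names are
hypothetical, and it leans on link(2) failing with EEXIST rather than
overwriting an existing target:

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: atomically publish a privately built file as
 * segment "segno" of a relation whose segments are named
 * "<relpath>.<segno>".  link(2) refuses to overwrite an existing
 * target (EEXIST), so if another backend claimed that segment number
 * first, we simply try the next one.  Returns the segment number
 * actually claimed, or -1 on a real error.
 */
static int
publish_segment(const char *private_path, const char *relpath, int segno)
{
	char		segpath[1024];

	for (;;)
	{
		snprintf(segpath, sizeof(segpath), "%s.%d", relpath, segno);

		if (link(private_path, segpath) == 0)
		{
			unlink(private_path);	/* drop the private name */
			return segno;
		}
		if (errno != EEXIST)
			return -1;			/* real failure, give up */
		segno++;				/* lost the race; bump and retry */
	}
}
```

The point of using link() rather than rename() here is exactly the
no-overwrite guarantee: two backends racing for the same segment number
can't clobber each other, one of them just retries at segno+1.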
> This does help... but at least in my tests, extending by 2 blocks
> instead of 1 was the big winner, and after that you didn't get much
> further relief.

How many concurrent writers did you have, and what kind of filesystem
was backing this?  Was it a temp filesystem where writes are
essentially to memory, making the relation extension lock much more
contentious?

> Another thing I tried was pre-extending the relation to the estimated
> final size.  That worked a lot better, and might be worth doing (e.g.
> ALTER TABLE zorp SET MINIMUM SIZE 1GB) but a less manual solution
> would be preferable if we can come up with one.

Slightly confused here- above you said that '2' was way better than
'1', but you implied that "more than 2 wasn't really much better"- yet
"wayyy more than 2 is much better"?  Did I follow that right?  I can
certainly understand such a case, just want to understand it and make
sure it's what you meant.  What "small-number" options did you try?

> After that, I ran out of time for investigation.

Too bad!  Thanks much for the work in this area- it'd really help our
data warehouse users, in particular, if we could improve this.

	Thanks!

		Stephen
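For what it's worth, the log(relation-size) idea mentioned above could
look something like this- again purely a sketch, with an arbitrary
defensive cap that is my assumption and not any existing PostgreSQL
setting:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: decide how many blocks to add when extending a
 * relation, growing roughly with log2 of the current size so that
 * large, hot relations get extended in bigger chunks while small ones
 * still grow one block at a time.  The cap of 512 blocks is a purely
 * defensive, made-up bound.
 */
static int
extension_amount(size_t current_blocks)
{
	int			amount = 1;

	while (current_blocks > 1)
	{
		current_blocks >>= 1;
		amount++;				/* one extra block per doubling */
	}
	return (amount < 512) ? amount : 512;
}
```

So a 1-block relation still extends by a single block, while a
1024-block (8MB) relation would extend by 11 blocks at a time under
this particular policy.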