On Sat, Apr 7, 2018 at 8:27 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> More below, but here's an idea #5: decide InnoDB has the right idea, and go
> to using a single massive blob file, or a few giant blobs.
>
> We have a storage abstraction that makes this way, way less painful than it
> should be.
>
> We can virtualize relfilenodes into storage extents in relatively few big
> files. We could use sparse regions to make the addressing more convenient,
> but that makes copying and backup painful, so I'd rather not.
>
> Even one file per tablespace for persistent relation heaps, another for
> indexes, another for each fork type.

I'm not sure that we can do that now, since it would break the new
"Optimize btree insertions for common case of increasing values"
optimization. (I did mention this before it went in.)

I've asked Pavan to at least add a note to the nbtree README that
explains the high level theory behind the optimization, as part of
post-commit clean-up. I'll ask him to say something about how it might
affect extent-based storage, too.

--
Peter Geoghegan