On Mon, Jul 18, 2016 at 10:19 AM, Greg Stark <st...@mit.edu> wrote:
> On Sun, Jul 17, 2016 at 1:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Jul 13, 2016 at 4:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I wonder whether we could compromise by reducing the minimum "standard
>>> chunk header" to be just a pointer to owning context, with the other
>>> fields becoming specific to particular mcxt implementations.
>>
>> I think that would be worth doing. It's not perfect, and the extra 8
>> (or 4) bytes per chunk certainly do matter.
>
> I wonder if we could go further. If we don't imagine having a very
> large number of allocators then we could just ask each one in turn if
> this block is one of theirs and which context it came from. That would
> allow an allocator that just allocated everything in a contiguous
> block to recognize pointers and return the memory context just by the
> range the pointer lies in.
>
> There could be optimizations like if the leading points point to a
> structure with a decently large magic number then assume it's a valid
> header to avoid the cost of checking with lots of allocators. But I'm
> imagining that the list of allocators in use concurrently will be
> fairly small so it might not even be necessary.
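For illustration, the range-based ownership check described in the quote might look roughly like the sketch below. It assumes each allocator hands out chunks from a single contiguous region; the type and function names (RegionAllocator, find_owner) are hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: each allocator carves all of its chunks out of one
 * contiguous region, so ownership is decided by address range alone,
 * with no chunk header at all. */
typedef struct RegionAllocator
{
    const char *name;
    uintptr_t   start;          /* first byte of the region */
    uintptr_t   end;            /* one past the last byte */
} RegionAllocator;

/* Ask each registered allocator in turn whether ptr lies in its region.
 * This is linear in the number of allocators, so it is only cheap if,
 * as assumed above, few allocators are in use concurrently. */
static const RegionAllocator *
find_owner(const RegionAllocator *allocators, size_t n, const void *ptr)
{
    uintptr_t   addr = (uintptr_t) ptr;

    for (size_t i = 0; i < n; i++)
    {
        assert(allocators[i].start <= allocators[i].end);
        if (addr >= allocators[i].start && addr < allocators[i].end)
            return &allocators[i];
    }
    return NULL;                /* not owned by any registered allocator */
}
```

Even in this simplest form, every free pays for a loop of comparisons before it can dispatch, which is the cost the reply below weighs against a single header load.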
Well, if we were going to do this, it would make more sense to have a central registry of blocks that is shared across all memory context types; when you see a pointer, you just look it up in the chunk map and find the containing context that way. I actually did something like this when I was working on sb_alloc (see archives): it looked up whether the chunk was of the new type and, if not, assumed it came from aset.c. To do this, it used a three-level trie keyed on the pointer address (like Google's tcmalloc), with, IIRC, some optimizations for the case where we repeatedly free from the same chunk. That slowed down pfree noticeably, though. The problem is that it's pretty well impossible to do something that is as cheap as fetching a memory context pointer from the chunk header, chasing a pointer or two from there, and then jumping. That's just really cheap.

The test case I used previously was an external sort, which does lots of retail pfrees. Now that we've mostly abandoned replacement selection, there will be many fewer pfrees while building runs, I think, but still quite a few while merging runs. Now it might be the case that if allocation is fast enough and we save a bunch of memory, spending a few additional cycles freeing things is no big deal. It might also be the case that this is problematic in a few cases but that we can eliminate those cases. It's likely to take some work, though.

Anyway, my point is really that doing what Tom proposes would be a good first step and probably would not involve much risk. And with that change, any new allocator that has some extrinsic way of determining chunk sizes can save 8 bytes per chunk, which is certainly meaningful for any context that does lots of allocations.
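A minimal sketch of what Tom's proposal could look like, assuming a header that holds nothing but the owning-context pointer, with size bookkeeping left to each implementation. All names here (SketchContext, MinimalChunkHeader, sketch_pfree) are hypothetical, not PostgreSQL's actual structures. The example context is backed by malloc, which tracks chunk sizes itself and so serves as the "extrinsic way" of knowing sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct SketchContext SketchContext;

/* Per-implementation method table, reached through the context. */
typedef struct SketchContextMethods
{
    void       *(*alloc) (SketchContext *context, size_t size);
    void        (*free_chunk) (SketchContext *context, void *pointer);
} SketchContextMethods;

struct SketchContext
{
    const SketchContextMethods *methods;
    size_t      nchunks;        /* live chunks, for illustration only */
};

/* The only field every implementation must provide: 8 bytes on a
 * 64-bit build. Size and other fields are implementation-specific. */
typedef struct MinimalChunkHeader
{
    SketchContext *context;
} MinimalChunkHeader;

/* malloc already knows each chunk's size, so no size field is needed.
 * (A real allocator must also guarantee MAXALIGN alignment of the
 * returned pointer; this sketch relies on malloc plus the header.) */
static void *
malloc_ctx_alloc(SketchContext *context, size_t size)
{
    MinimalChunkHeader *hdr = malloc(sizeof(MinimalChunkHeader) + size);

    if (hdr == NULL)
        return NULL;
    hdr->context = context;
    context->nchunks++;
    return hdr + 1;             /* user memory starts after the header */
}

static void
malloc_ctx_free(SketchContext *context, void *pointer)
{
    MinimalChunkHeader *hdr = (MinimalChunkHeader *) pointer - 1;

    assert(hdr->context == context);
    context->nchunks--;
    free(hdr);
}

static const SketchContextMethods malloc_ctx_methods = {
    malloc_ctx_alloc, malloc_ctx_free
};

/* pfree analogue: one load for the context, a pointer chase to the
 * method table, then an indirect jump. */
static void
sketch_pfree(void *pointer)
{
    SketchContext *context = ((MinimalChunkHeader *) pointer - 1)->context;

    context->methods->free_chunk(context, pointer);
}
```

The dispatch path in sketch_pfree is the "fetch a context pointer from the chunk header, chase a pointer or two, then jump" sequence described above, which is hard for any address-lookup scheme to beat.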
If somebody writes a new allocator that has good performance characteristics and uses 400MB of memory on some test where aset.c would have used 535MB, and we can compute that the same test with no chunk headers at all would use only 346MB, then we can talk about how to get there. Doing things stepwise is good.

One random thought is that we could have a memory allocator altogether separate from palloc/pfree. For example, suppose we introduce qalloc/qfree. A chunk of memory allocated with qalloc can only be freed with qfree, not pfree. In this world, you're completely freed from the requirement to care about chunk headers in any form. The downside, of course, is that you can't let qalloc'd tuples escape into, say, the executor, because it only knows how to pfree things, not how to qfree them. But there are plenty of places where you can easily see that allocations won't escape the module where they are performed; in fact, there are many cases where it would be easy to pass the memory context *as an argument to the free function*, which would presumably be quite advantageous for any allocator that doesn't store a pointer to the context in the chunk header.

I'm sure there will be some reluctance to go down these kinds of paths, because it is undeniably convenient from a programmer's perspective to be able to just say pfree and forget the details, but I believe there will be some cases, especially for contexts that hold lots of allocations, where the performance gains from these kinds of techniques are quite measurable on macrobenchmarks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers