On Wed, Sep 18, 2024 at 8:55 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Sep 19, 2024 at 6:46 AM David Rowley <dgrowle...@gmail.com> wrote:
> >
> > On Thu, 19 Sept 2024 at 11:54, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > I've done some benchmark tests for three different code bases with
> > > different test cases. In short, reducing the generation memory context
> > > block size to 8kB seems to be promising; it mitigates the problem
> > > while keeping a similar performance.
> >
> > Did you try any sizes between 8KB and 8MB? 1000x reduction seems
> > quite large a jump. There is additional overhead from having more
> > blocks. It means more malloc() work and more free() work when deleting
> > a context. It would be nice to see some numbers with all powers of 2
> > between 8KB and 8MB. I imagine the returns are diminishing as the
> > block size is reduced further.
>
> Good idea.
Agreed. I've done further benchmark tests while changing the memory
block size from 8kB to 8MB. I measured the execution time of logical
decoding of one transaction that inserted 10M rows. I set
logical_decoding_work_mem large enough to avoid spilling. In this
scenario, we allocate many memory chunks while decoding the
transaction, resulting in more malloc() calls with smaller memory
block sizes. Here are the results (an average of 3 executions):

8kB:   19747.870 ms
16kB:  19780.025 ms
32kB:  19760.575 ms
64kB:  19772.387 ms
128kB: 19825.385 ms
256kB: 19781.118 ms
512kB: 19808.138 ms
1MB:   19757.640 ms
2MB:   19801.429 ms
4MB:   19673.996 ms
8MB:   19643.547 ms

Interestingly, there were no noticeable differences in execution time.
I've checked the number of allocated memory blocks in each case, and
more blocks are allocated with smaller block sizes. For example, when
the logical decoding used the maximum amount of memory (about 1.5GB),
we allocated about 80k blocks in the 8kB block size case and 80 blocks
in the 8MB case. The results could differ in other environments.

I've attached the patch that I used to change the memory block size
via a GUC. It would be appreciated if someone else could run a similar
test to see the differences.

> > Another alternative idea would be to defragment transactions with a
> > large number of changes after they grow to some predefined size.
> > Defragmentation would just be a matter of performing
> > palloc/memcpy/pfree for each change. If that's all done at once, all
> > the changes for that transaction would be contiguous in memory. If
> > you're smart about what the trigger point is for performing the
> > defragmentation then I imagine there's not much risk of performance
> > regression for the general case.
> > For example, you might only want to
> > trigger it when MemoryContextMemAllocated() for the generation context
> > exceeds logical_decoding_work_mem by some factor and only do it for
> > transactions where the size of the changes exceeds some threshold.
>
> Interesting idea.
>
> After collecting the changes that exceed 'logical_decoding_work_mem',
> one can choose to stream the transaction and free the changes to avoid
> hitting this problem, however, we can use that or some other constant
> to decide the point of defragmentation. The other point we need to
> think about with this idea is whether we actually need any
> defragmentation at all. This will depend on whether there are
> concurrent transactions being decoded. This would require benchmarking
> to see the performance impact.

The fact that we're using rb->size and txn->size to choose the
transactions to evict could make this idea less attractive. Even if we
defragment transactions, rb->size and txn->size don't change.
Therefore, it doesn't mean we can avoid evicting transactions when
logical_decoding_work_mem is full; it just means the amount of
allocated memory might have been reduced.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
rb_mem_block_size.patch
Description: Binary data