Hi,

On 2023-05-24 21:56:22 +0200, Tomas Vondra wrote:
> >> The really hard thing was determining what causes the memory leak - the
> >> simple instrumentation doesn't help with that at all. It tells you there
> >> might be a leak, but you don't know where the allocations came from.
> >>
> >> What I ended up doing is a simple gdb script that sets breakpoints on
> >> all palloc/pfree variants, and prints info (including the backtrace) for
> >> each call on ExecutorState. And then a script that aggregates those to
> >> identify which backtraces allocated most chunks that were not freed.
> >
> > FWIW, for things like this I found "heaptrack" to be extremely helpful.
> >
> > E.g. for a reproducer of the problem here, it gave me the attached "flame
> > graph" of the peak memory usage - after attaching to a running backend
> > and running an UPDATE triggering the leak.
> >
> > Because the view I screenshotted shows the stacks contributing to peak
> > memory usage, it works nicely to find "temporary leaks", even if memory
> > is actually freed after all etc.
> >
>
> That's a nice visualization, but isn't that useful only once you
> determine there's a memory leak? Which I think is the hard problem.
So is your gdb approach, unless I am misunderstanding? The view I
screenshotted shows the "peak" allocated memory: if you have a potential
leak, you can see where most of the allocated memory was allocated, which
at least gives you a good idea of where to look for the problem in more
detail.


> >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a
> >>> shorter lived memory context instead? Otherwise we'll just end up with
> >>> the same problem in a few years.
> >>>
> >>
> >> I agree using a shorter lived memory context would be more elegant, and
> >> more in line with how we do things. But it's not clear to me why we'd
> >> end up with the same problem in a few years with what the patch does.
> >
> > Because it sets up the pattern of manual memory management and continues
> > to run the relevant code within a query-lifetime context.
> >
>
> Oh, you mean someone might add new allocations to this code (or into one
> of the functions executed from it), and that'd leak again? Yeah, true.

Yes. It's certainly not obvious this far down that we are called in a
semi-long-lived memory context.

Greetings,

Andres Freund
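PS: To make the "shorter lived memory context" idea a bit more concrete,
here is a minimal C sketch of the usual pattern, rather than anything from
the patch under discussion - the context name, process_one_batch() and
do_batch_work() are hypothetical stand-ins for the leak-prone code path:

    /*
     * Sketch only: instead of manually pfree'ing individual chunks in the
     * long-lived per-query context (ExecutorState), run the leak-prone
     * code in a short-lived child context and drop the whole context
     * afterwards.  do_batch_work() and the context name are made up.
     */
    #include "postgres.h"

    #include "nodes/execnodes.h"
    #include "utils/memutils.h"

    extern void do_batch_work(void);    /* hypothetical leak-prone code path */

    static void
    process_one_batch(EState *estate)
    {
        /* child of the per-query context, so it's also cleaned up on error */
        MemoryContext batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
                                                        "batch work context",
                                                        ALLOCSET_DEFAULT_SIZES);
        MemoryContext oldcxt = MemoryContextSwitchTo(batch_cxt);

        do_batch_work();        /* any palloc() here lands in batch_cxt */

        MemoryContextSwitchTo(oldcxt);

        /* free everything allocated above in one go - no per-chunk pfree */
        MemoryContextDelete(batch_cxt);
    }

With that structure, any allocation someone later adds inside
do_batch_work() is released when the context is deleted, which is what
addresses the "same problem in a few years" concern about keeping manual
memory management in a query-lifetime context.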