Mat Arye <m...@timescale.com> writes: > I found a memory leak in plpython3u. It happens when converting the > argument for an spi plan execution (via plpy.execute() or similar). I've > attached a sql file to create two plpython3u functions to reproduce the > issue (memory_leak_test.sql).
I see the leak all right, and your patch does seem to fix it, so I believe your diagnosis that PLy_output_convert() is what's leaking. > I believe the issue is in the call to `PLy_output_convert` inside > `PLy_spi_execute_plan`. I've attached a patch that reduces the memory usage > to ~70MB from ~720 MB when using test_mem. I based what I did on what > `PLy_input_convert` does. But I am not an expert in this part of the code > so I am not sure the patch is correct. In particular, I don't fully grasp > the comment in `PLy_input_convert` about why the scratch is reset before > not after the conversion cycle and what that has to do with python > refcounts. A review by an expert would be appreciated. The corresponding concern here would be that some pass-by-reference conversion result Datum could come back verbatim as an output of SPI_execute_plan, and we mustn't reset the scratch context again before we're done with using the Datum. Sadly, I don't think it will work if that happens, because what we do as soon as SPI_execute_plan finishes is to call PLy_spi_execute_fetch_result, and that calls PLy_input_from_tuple, which is one of the things that will zap the scratch context. So this doesn't feel very safe ... and indeed it falls over in plpython's existing regression tests. However, I don't really see why we need to use that scratch context. PLy_spi_execute_plan runs a subtransaction, so we could perfectly well use the subtransaction's CurTransactionContext. (IOW, this wouldn't even be an issue if PLy_spi_subtransaction_begin didn't forcibly switch back to oldcontext.) A couple of other points: * PLy_spi_execute_plan already has an outer-level variable named oldcontext, so this coding will draw complaints about a shadowed local variable in compilers that are picky about that. But I don't think you need the inner oldcontext variable anyway, since you know perfectly well that the previous context is the outer oldcontext. Just switch back to that after using CurTransactionContext. * Looking around at other callers of PLy_output_convert, it seems likely that PLy_cursor_plan() has this same problem. (I think we're okay though with the calls from PLy_exec_function and PLy_modify_tuple, since those could only happen right before return from the plpython function; there's no path to iterate them enough times to create a meaningful leak.) regards, tom lane