On Thu, Jan 09, 2020 at 07:40:12PM -0500, Tom Lane wrote:
> I wrote:
> ReorderBuffer: 223302560 total in 26995 blocks; 7056 free (3 chunks); 223295504 used
> The test case is only inserting 50K fairly-short rows, so this seems
> like an unreasonable amount of memory to be consuming for that; and
> even if you think it's reasonable, it clearly isn't going to scale
> to large production transactions.
>
> Now, the good news is that v11 and later get through
> 006_logical_decoding.pl just fine under the same restriction.
> So we did something in v11 to fix this excessive memory consumption.
> However, unless we're willing to back-port whatever that was, this
> test case is clearly consuming excessive resources for the v10 branch.
>
> I dug around a little in the git history for backend/replication/logical/,
> and while I find several commit messages mentioning memory leaks and
> faulty spill logic, they all claim to have been back-patched as far
> as 9.4.
>
> It seems reasonably likely to me that this result is telling us about
> an actual bug, i.e., faulty back-patching of one or more of those fixes
> into v10 and perhaps earlier branches.
>
> I don't know this code well enough to take point on looking for the
> problem, though.

Well, one thing we did in 11 is the introduction of the Generation context.
In 10 we're still stashing all tuple data into the main AllocSet. I
wonder if backporting a4ccc1cef5a04cc054af83bc4582a045d5232cb3 and a
couple of follow-up fixes would make the issue go away.
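
For a sense of scale, here is a back-of-the-envelope check on the
ReorderBuffer figures quoted above. It is only a sketch: it assumes "50K"
means exactly 50,000 rows and that essentially all of the context holds
row data, neither of which is stated in the report.

```python
# Figures copied from the quoted ReorderBuffer memory-context line.
total_bytes = 223302560
blocks = 26995
used_bytes = 223295504

# Assumption: "50K" rows is taken as exactly 50,000.
rows = 50_000

bytes_per_row = used_bytes / rows      # average memory used per inserted row
bytes_per_block = total_bytes / blocks  # average size of one allocated block
rows_per_block = rows / blocks          # how many rows share one block

print(f"{bytes_per_row:.0f} bytes used per row")
print(f"{bytes_per_block:.0f} bytes per block")
print(f"{rows_per_block:.2f} rows per block")
```

Under those assumptions, each "fairly-short" row accounts for several kB
of the context, and fewer than two rows share a block on average. That
fits the complaint that the consumption is out of proportion to the
data, though it does not say where the overhead comes from.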
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services