On 1/6/25 22:07, Jim Nasby wrote:
> On Jan 2, 2025, at 9:21 PM, Tomas Vondra <to...@vondra.me> wrote:
>>
>>> That said, I do think a workload manager would be more effective than
>>> trying to limit total connections.
>>
>> The "workload management" concept is so abstract I find it very
>> difficult to discuss without much more detail about how it would
>> actually work / be implemented.
>>
>> I believe implementing some rudimentary "global" memory accounting
>> would not be *that* hard (possibly along the lines of the patches
>> early in this thread), and adding some sort of dynamic connection
>> limit would not be much harder, I think. But then comes the hard part
>> of actually doing the "workload management" part, which seems pretty
>> comparable to what a QoS / scheduler needs to do. With all the weird
>> corner cases.
>
> I’ve been saying “workload management” for lack of a better term, but my
> initial suggestion upthread was to simply stop allowing new transactions
> to start if global work_mem consumption exceeded some threshold. That’s
> simplistic enough that I wouldn’t really consider it “workload
> management”. Maybe “deferred execution” would be a better name. The only
> other thing it’d need is a timeout on how long a new transaction could
> sit in limbo.
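If I understand correctly, the proposal boils down to a check like the
one below at transaction start. This is a minimal sketch, nothing more --
every symbol in it is hypothetical, and it assumes some shared counter of
work_mem actually in use is maintained elsewhere:

#include "postgres.h"
#include "miscadmin.h"
#include "port/atomics.h"
#include "utils/timestamp.h"

/* all hypothetical: the counter lives in shared memory, the rest are GUCs */
static pg_atomic_uint64 *global_work_mem_used;
static uint64 global_work_mem_limit;    /* bytes */
static int  deferred_start_timeout_ms;

static void
MaybeDeferTransactionStart(void)
{
    TimestampTz start = GetCurrentTimestamp();

    while (pg_atomic_read_u64(global_work_mem_used) > global_work_mem_limit)
    {
        if (TimestampDifferenceExceeds(start, GetCurrentTimestamp(),
                                       deferred_start_timeout_ms))
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
                     errmsg("timed out waiting for global work_mem headroom")));

        /* naive polling; a condition variable would be the obvious upgrade */
        pg_usleep(10000L);
        CHECK_FOR_INTERRUPTS();
    }
}

But even in this trivial form the whole question is the waiting policy,
and that's exactly where it falls apart: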
How would that not be a huge DoS vector? If a low-priority task can
allocate a chunk of memory and block/delay execution of everything else,
that seems potentially disastrous. I believe there is a reason why
workload management systems tend to be fairly complex, both to implement
and to manage, requiring stuff like resource pools, workload classes, etc.

> I agree that, no matter what, everything being proposed would rely on
> having metrics on actual work_mem consumption. That would definitely be
> a good feature on its own. I’m thinking of adding “work_mem_bytes” and
> “work_mem_operations” to pg_stat_activity (where “work_mem_operations”
> would tell you how many different things were using work_mem in the
> backend).
>
> Incidentally, something related to this that I’ve seen is backend memory
> consumption slowly growing over time. Unbounded growth of relcache and
> friends was presumably the biggest contributor. There’s an argument to
> be made for a view dedicated to tracking per-backend memory stats, with
> additional info about things contributing to idle memory consumption.

True, but I don't quite see how a global memory limit would help with any
of that. In fact, relcache seems exactly like the thing to limit at the
backend level.

regards

--
Tomas Vondra
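PS: To make the accounting part a bit more concrete, here's roughly the
shape I'd expect it to take -- again purely illustrative, all names
hypothetical. Each backend charges its work_mem allocations to a
per-backend slot in shared memory; the per-slot values are what the
proposed pg_stat_activity columns would expose, and summing the slots
gives the global number a limit (like the sketch earlier in this mail)
would check:

#include "postgres.h"
#include "port/atomics.h"

typedef struct WorkMemBackendStats
{
    pg_atomic_uint64 work_mem_bytes;      /* bytes currently charged */
    pg_atomic_uint32 work_mem_operations; /* sorts, hashes, etc. in flight */
} WorkMemBackendStats;

/* one slot per backend in shared memory, assigned at backend start */
static WorkMemBackendStats *MyWorkMemStats;

/* called when an executor node acquires / gives back work_mem */
static void
work_mem_charge(uint64 nbytes)
{
    pg_atomic_fetch_add_u64(&MyWorkMemStats->work_mem_bytes, nbytes);
    pg_atomic_fetch_add_u32(&MyWorkMemStats->work_mem_operations, 1);
}

static void
work_mem_release(uint64 nbytes)
{
    pg_atomic_fetch_sub_u64(&MyWorkMemStats->work_mem_bytes, nbytes);
    pg_atomic_fetch_sub_u32(&MyWorkMemStats->work_mem_operations, 1);
}

Note this says nothing about the relcache growth Jim mentions -- that's
per-backend memory the accounting would merely make visible, not limit.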