On Sun, Aug 13, 2017 at 9:59 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Andres Freund <and...@anarazel.de> writes: >> You both are obviously right. Another point of potential concern could >> be that we'd pretyt much fully rely on dsm/dht's being available, for >> the server to function correctly. Are we ok with that? Right now >> postgres still works perfectly well, leaving parallelism aside, with >> dynamic_shared_memory_type = none. > > In principle we could keep the existing mechanism as a fallback. > Whether that's worth the trouble is debatable. The current code > in initdb believes that every platform has some type of DSM support > (see choose_dsm_implementation). Nobody's complained about that, > and it certainly works on every buildfarm animal. So for all we know, > dynamic_shared_memory_type = none is broken already.
Actually, now that you mention it, I think it *is* broken already, or more to the point, if you configure that value, the autovacuum launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro added. When I just tested it, the AV launcher somehow ended up waiting for AutovacuumLock and I had to SIGQUIT the server to shut it down. That's actually not really entirely the fault of dynamic_shared_memory_type = none, though, because the code in autovacuum.c does this: AutoVacuumDSA = dsa_create(AutovacuumLock->tranche); /* make sure it doesn't go away even if we do */ dsa_pin(AutoVacuumDSA); dsa_pin_mapping(AutoVacuumDSA); Now, that's actually really broken because if dsa_create() throws an error of any kind, you're going to have already assigned the value to AutoVacuumDSA, but you will not have pinned the DSA or the DSA mapping. There's evidently some additional bug here because I'd sorta expect this code to just go into an infinite loop in this case, failing over and over trying to reattach the segment, but evidently something even worse happening - perhaps the ERROR isn't releasing AutovacuumLock. Anyway, this code has multiple error handling defects and IMHO it's pretty questionable whether DSA should have been used here at all. Allowing the number of autovacuum work items to grow without bound would not be a good design - and if we've got a bound somewhere, why not just put this in the main shared memory segment? Leaving all that aside, when the DSM facility was introduced in 9.4, I was afraid that it would break things for people and that they'd demand a way to turn it off, and I included "none" as an option for that purpose. Well, it did break a few things initially but those seem to have mostly been fixed during the 9.4 cycle itself. I'm not aware of any subsequent problem reports caused by having DSM enabled (pointers welcome!) and given that we now have parallel query relying on DSM, people are much less likely to want to just give up on having DSM available. If somebody has a system where no other form of shared memory, works dynamic_shared_memory_type = mmap is still a thing, so the use case for "none" seems very thin indeed. I'd vote for just ripping it out in v11. >> a) It'd be quite useful to avoid needing a whole cluster's stats in >> memory. Even if $subject would save memory, I'm hesitant committing >> to something requiring all stats to be in memory forever. As a first >> step it seems reasonable to e.g. not require state for all databases >> to be in memory. The first time per-database stats are required, it >> could be "paged in". Could even be more aggressive and do that on a >> per-table level and just have smaller placeholder entries for >> non-accessed tables, but that seems more work. > > Huh? As long as we don't lock the shared memory into RAM, which on most > platforms we couldn't do without being root anyway, ISTM the kernel should > do just fine at paging out little-used areas of the stats. Let's not > reinvent that wheel until there's demonstrable proof of need. I think some systems aren't able to page out data stored in shared memory segments, and it would probably be pretty awful for performance if they did (there's a reason large sorts need to switch to using temp files ... and the same rationale seems to apply here). That having been said, I don't see a need to change everything at once. If we moved the stats collector data from frequently-rewritten files to shared memory, we'd already be saving a lot of I/O and possibly memory utilization. If somebody then wanted to refine that further by spilling rarely used data out to disk and reloading it on demand, that could be done as a follow-on patch. But I think that would only be needed for people with *really* large numbers of tables, and it would only help them if most of those tables were barely ever touched. >> b) I think our tendency to dump all stats whenever we crash isn't really >> tenable, given how autovacuum etc are tied to them. > > Eh, maybe. I don't think crashes are really so common on production > systems. As developers we have an inflated view of their frequency ;-) Without taking a position on the point under debate, AFAIK it wouldn't be technically complex either under our current architecture or the proposed new one to dump out a new permanent stats file every 10 minutes or so. So if there is an issue here I think it might not be very hard to fix, whatever else we do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers