On Sun, Aug 13, 2017 at 05:56:56PM -0700, Andres Freund wrote:
> Hi,
>
> Since we're getting a bit into the weeds of a different topic, and since
> I think it's an interesting feature, I'm detaching this into a separate
> thread.
>
> On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> > >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> > >>> I think it'd be pretty interesting to look at replacing parts of the
> > >>> stats collector machinery with something DHT-based.
>
> > > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund <and...@anarazel.de> wrote:
> > >> That seems to involve a lot more than this though, given that currently
> > >> the stats collector data doesn't entirely have to be in memory. I've
> > >> seen sites with a lot of databases with quite some per-database stats
> > >> data. Don't think we can just require that to be in memory :(
> >
> > Robert Haas <robertmh...@gmail.com> writes:
> > > Hmm. I'm not sure it wouldn't end up being *less* memory. Don't we
> > > end up caching 1 copy of it per backend, at least for the database to
> > > which that backend is connected? Accessing a shared copy would avoid
> > > that sort of thing.
> >
> > Yeah ... the collector itself has got all that in memory anyway.
> > We do need to think about synchronization issues if we make that
> > memory globally available, but I find it hard to see how that would
> > lead to more memory consumption overall than what happens now.
>
> You both are obviously right. Another point of potential concern could
> be that we'd pretty much fully rely on dsm/dht's being available for
> the server to function correctly. Are we ok with that? Right now
> postgres still works perfectly well, leaving parallelism aside, with
> dynamic_shared_memory_type = none.
>
> What are your thoughts about how to actually implement this? It seems
> we'd have to do something like:
>
> 1) Keep the current per-backend & per-transaction state in each
>    backend.
>    That allows both to throw away the information and avoids
>    increasing contention quite noticeably.
>
> 2) Some plain shared memory with metadata. A set of shared hashtables
>    for per-database, per-relation contents.
>
> 3) Individual database/relation entries are either individual atomics
>    (we don't rely on consistency anyway), or seqcount (like
>    st_changecount) based.
>
> 4) Instead of sending stats at transaction end, copy them into a
>    "pending" entry. Nontransactional contents can be moved to
>    the pending entry more frequently.
>
> 5) Occasionally, try to flush the pending array into the global hash.
>    The lookup in the table would be protected by something
>    LWLockConditionalAcquire() based, to avoid blocking - don't want to
>    introduce chokepoints due to commonly used tables and such. Updating
>    the actual stats can happen without the partition locks being held.
>
> I think there are two other relevant points here:
>
> a) It'd be quite useful to avoid needing a whole cluster's stats in
>    memory. Even if $subject would save memory, I'm hesitant to commit
>    to something requiring all stats to be in memory forever. As a first
>    step it seems reasonable to e.g. not require state for all databases
>    to be in memory. The first time per-database stats are required, they
>    could be "paged in". We could even be more aggressive and do that on a
>    per-table level and just have smaller placeholder entries for
>    non-accessed tables, but that seems like more work.
>
>    On the other hand, autovacuum is likely going to make that approach
>    useless anyway, given it's probably going to access otherwise unneeded
>    stats regularly.
>
> b) I think our tendency to dump all stats whenever we crash isn't really
>    tenable, given how autovacuum etc. are tied to them. We should think
>    about ways to avoid that if we're going to do a major rewrite of the
>    stats stuff, which this certainly sounds like.
>
> If there weren't HS to worry about, these two points kinda sound like
> the data should be persisted into an actual table, rather than some
> weird other storage format. But HS seems to make that untenable.
>
> Greetings,
>
> Andres Freund
As I recall, David Gould (hi!) had run across a case where there were
thousands of tables and the stats file became a pretty serious
bottleneck. There might even be a design, or even code, to address it.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers