Since we're getting a bit into the weeds of a different topic, and since
I think it's an interesting feature, I'm detaching this into a separate
thread.

On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> >>> I think it'd be pretty interesting to look at replacing parts of the
> >>> stats collector machinery with something DHT-based.
> > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund <and...@anarazel.de> wrote:
> >> That seems to involve a lot more than this though, given that currently
> >> the stats collector data doesn't entirely have to be in memory. I've
> >> seen sites with a lot of databases with quite some per-database stats
> >> data. Don't think we can just require that to be in memory :(
> Robert Haas <robertmh...@gmail.com> writes:
> > Hmm.  I'm not sure it wouldn't end up being *less* memory.  Don't we
> > end up caching 1 copy of it per backend, at least for the database to
> > which that backend is connected?  Accessing a shared copy would avoid
> > that sort of thing.
> Yeah ... the collector itself has got all that in memory anyway.
> We do need to think about synchronization issues if we make that
> memory globally available, but I find it hard to see how that would
> lead to more memory consumption overall than what happens now.

You both are obviously right.  Another point of potential concern is
that we'd pretty much fully rely on dsm/dht being available for the
server to function correctly. Are we ok with that? Right now postgres
still works perfectly well, leaving parallelism aside, with
dynamic_shared_memory_type = none.

What are your thoughts about how to actually implement this? It seems
we'd have to do something like:

1) Keep the current per-backend & per-transaction state in each
   backend. That both allows the information to be thrown away on
   rollback and avoids noticeably increasing contention.

2) Some plain shared memory with metadata.  A set of shared hashtables
   for per-database and per-relation contents.

3) Individual database/relation entries are either individual atomics
   (we don't rely on consistency anyway), or seqcount (like
   st_changecount) based.

4) Instead of sending stats at transaction end, copy them into a
   "pending" entry.  Nontransactional contents can be moved to
   the pending entry more frequently.

5) Occasionally, try to flush the pending array into the global hash.
   The lookup in the table would be protected by something
   LWLockConditionalAcquire() based, to avoid blocking - don't want to
   introduce chokepoints due to commonly used tables and such.  Updating
   the actual stats can happen without the partition locks being held.
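
   As a very rough standalone illustration of points 3)-5) -- not actual
   backend code; names like StatsEntry, PendingStats and
   stats_flush_pending() are made up here, and an atomic_flag trylock
   stands in for LWLockConditionalAcquire() on a partition lock -- the
   seqcount-style update plus non-blocking flush could look something
   like:

```c
/*
 * Hypothetical sketch only: a st_changecount-style seqcount protecting a
 * shared stats entry, plus a flush that gives up instead of blocking when
 * the (stand-in) partition lock is busy.  C11 atomics; single writer per
 * entry assumed, readers are lockless and retry on a torn read.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared-memory entry for one relation, readable without locks. */
typedef struct StatsEntry
{
    atomic_uint changecount;    /* odd while an update is in progress */
    uint64_t    tuples_inserted;
    uint64_t    tuples_updated;
} StatsEntry;

/* Per-backend pending counters, filled at transaction end (point 4). */
typedef struct PendingStats
{
    uint64_t    tuples_inserted;
    uint64_t    tuples_updated;
} PendingStats;

/* Writer side of the seqcount: bump to odd, modify, bump back to even. */
static void
stats_entry_update(StatsEntry *e, const PendingStats *p)
{
    atomic_fetch_add_explicit(&e->changecount, 1, memory_order_acq_rel);
    e->tuples_inserted += p->tuples_inserted;
    e->tuples_updated += p->tuples_updated;
    atomic_fetch_add_explicit(&e->changecount, 1, memory_order_release);
}

/* Reader side: retry until a stable, even changecount is observed. */
static void
stats_entry_read(StatsEntry *e, uint64_t *ins, uint64_t *upd)
{
    unsigned    before, after;

    do
    {
        before = atomic_load_explicit(&e->changecount, memory_order_acquire);
        *ins = e->tuples_inserted;
        *upd = e->tuples_updated;
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&e->changecount, memory_order_relaxed);
    } while (before != after || (before & 1));
}

/*
 * Point 5: try to flush pending stats without blocking.  The trylock only
 * covers what would be the hash lookup; the entry itself is updated after
 * releasing it, via the seqcount above.  Returns false if the lock was
 * busy, leaving the pending data around for a later attempt.
 */
static bool
stats_flush_pending(atomic_flag *partition_lock,
                    StatsEntry *e, PendingStats *pending)
{
    if (atomic_flag_test_and_set_explicit(partition_lock,
                                          memory_order_acquire))
        return false;           /* busy: don't block, retry later */

    /* ... hash lookup of the entry would happen here, under the lock ... */
    atomic_flag_clear_explicit(partition_lock, memory_order_release);

    /* update the entry without the partition lock held */
    stats_entry_update(e, pending);
    pending->tuples_inserted = 0;
    pending->tuples_updated = 0;
    return true;
}
```

   The reader loop is what keeps individual entries cheap: readers never
   take a lock, they just retry the copy in the (rare) case a writer was
   mid-update.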

I think there are two other relevant points here:

a) It'd be quite useful to avoid needing a whole cluster's stats in
   memory. Even if $subject would save memory, I'm hesitant to commit
   to something requiring all stats to be in memory forever. As a first
   step it seems reasonable to e.g. not require state for all databases
   to be in memory. The first time per-database stats are required, they
   could be "paged in". We could even be more aggressive and do that on
   a per-table level, and just have smaller placeholder entries for
   non-accessed tables, but that seems like more work.

   On the other hand, autovacuum is likely going to make that approach
   useless anyway, given it's probably going to access otherwise
   unneeded stats regularly.

b) I think our tendency to dump all stats whenever we crash isn't
   really tenable, given how autovacuum etc. are tied to them. We
   should think about ways to avoid that if we're going to do a major
   rewrite of the stats stuff, which this certainly sounds like.
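
   The "paged in on first access" idea from a) could look roughly like
   the following -- again purely a hypothetical sketch, with made-up
   names (DbStatsEntry, load_db_stats_from_disk(), ...); the real thing
   would sit behind the shared hashtable lookup and read from the
   persisted stats file:

```c
/*
 * Illustrative lazy page-in of per-database stats: the hashtable only
 * keeps a small placeholder until the entry is actually needed, then
 * faults the contents in from persistent storage on first access.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct DbStatsEntry
{
    uint32_t    dboid;
    bool        loaded;         /* false => only a placeholder so far */
    uint64_t    xact_commit;    /* valid only once loaded */
} DbStatsEntry;

/* Stand-in for reading one database's stats from the persisted file. */
static uint64_t
load_db_stats_from_disk(uint32_t dboid)
{
    (void) dboid;
    return 42;                  /* dummy value, illustration only */
}

/* Return the entry, faulting its contents in on first access. */
static DbStatsEntry *
db_stats_get(DbStatsEntry *entry)
{
    if (!entry->loaded)
    {
        entry->xact_commit = load_db_stats_from_disk(entry->dboid);
        entry->loaded = true;
    }
    return entry;
}
```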

If there weren't hot standby to worry about, these two points kinda
sound like the data should be persisted into an actual table, rather
than some weird other storage format. But HS seems to make that
untenable.


Andres Freund

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)