On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote: > No problem at all! (I re-explained because I'm not always 100% sure that my > explanations are crystal clear ;-) )
We've discussed a bit this patch offline, but after studying the patch I doubt that this part is a good idea: + /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */ + PgStatShared_IO io; } PgStat_ShmemControl; We are going to be in trouble if we introduce a second member in this routine that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl relies on the fact that all its members after deterministic offset positions in this structure. So this lacks flexibility. This choice is caused by the fact that we don't exactly know the number of backends because that's controlled by the PGC_POSTMASTER GUC max_connections so the size of the structure would be undefined. There is a parallel with replication slot statistics here, where we save the replication slot data in the dshash based on their index number in shmem. Hence, wouldn't it be better to do like replication slot stats, where we use the dshash and a key based on the procnum of each backend or auxiliary process (ProcNumber in procnumber.h)? If at restart max_connections is lower than what was previously used, we could discard entries that would not fit anymore into the charts. This is probably not something that happens often, so I'm not really worried about forcing the removal of these stats depending on how the upper-bound of ProcNumber evolves. So, using a new kind of ID and making this kind variable-numbered may ease the implementation quite a bit, while avoiding any extensibility issues with the shmem portion of the patch if these are fixed-numbered. The reporting of these stats comes down to having a parallel with pgstat_count_io_op_time(), but to make sure that the stats are split by connection slot number rather than the current split of pg_stat_io. All its callers are in localbuf.c, bufmgr.c and md.c, living with some duplication in the code paths to gather the stats may be OK. pg_stat_get_my_io() is based on a O(n^3). IOOBJECT_NUM_TYPES is fortunately low, still that's annoying. This would rely on the fact that we would use the ProcNumber for the dshash key, and this information is not provided in pg_stat_activity. Perhaps we should add this information in pg_stat_activity so as it would be easily possible to do joins with a SQL function that returns a SRF with all the stats associated with a given connection slot (auxiliary or backend process)? That would be a separate patch. Perhaps that's even something that has popped up for the work with threading (did not follow this part closely, TBH)? The active PIDs of the live sessions are not stored in the active stats, why not? Perhaps that's useless anyway if we expose the ProcNumbers in pg_stat_activity and make the stats available with a single function taking in input a ProcNumber. Just mentioning an option to consider. -- Michael
signature.asc
Description: PGP signature