> Hi, > > Over the last weeks we've been fighting again with pg_stat_statements > issues, specifically with the issue of having too many unique entries, > and the corresponding problem of large query text files and LWLock > wait events related to pg_stat_statements.
Yeah, LWLock:pg_stat_statements related to garbage collection can be really bad when large files must be recreated with live query texts, or if this GC must occur often (in the case of high query entry churn). A high dealloc count is already a good indicator of the need for GC, but most people, in my experience, do not realize there is a GC process, since it is not documented or exposed. A high dealloc is already a good indicator of the need for gc, but most people, in my experience, don't realize there is a gc process, since it's not documented and not exposed. > I think we can improve debugging for such situations by adding two > more columns to pg_stat_statements_info that expose information > already tracked: > > 1) "gc_count", showing when the pg_stat_statements query garbage > collection cycles occur (which can correlate with > LWLock:pg_stat_statements) > > 2) "query_file_size" which tells us the extent of the query text file, > so we can fine-tune when we query the texts from pg_stat_statements in > monitoring scripts (i.e. query it less frequently if the query text > file is very large). I do agree that having such additional information, with proper documentation, is a good idea. However, I do wonder if we should hold off on adding any of this info in 19 because of the point you make below, which could completely change the information we need to expose. Adding this information in 19 and then removing it for 20 may not be worthwhile. > I've had a patch to improve this prepared for a previous cycle, but > wasn't sure it was still needed because of the discussion re: keeping > query texts in shared memory. But since it looks like that won't > change for 19 (though I'm hoping to contribute more to improving that > in the PG 20 cycle), see attached for consideration. 19 has 4ba012a8ed, which allows us to serialize and deserialize query texts stored in, for example, DSA, with a dsa_pointer tracked by the entry of a custom stats kind. I was also planning on continuing this work for 20, and getting 4ba012a8ed was an important prerequisite for this. What do you think? [1] [https://www.postgresql.org/message-id/caa5rz0s9sdou+z6veojchwk+kdetktatc-ky9fq9z6bjddu...@mail.gmail.com] -- Sami Imseih Amazon Web Services (AWS)
