On 15.01.2025 12:47, Ilia Evdokimov wrote:
On 06.01.2025 18:57, Andrey M. Borodin wrote:
1. This code seems a little different from your patch. It is trying to avoid engaging the PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<".

    xact_is_sampled = log_xact_sample_rate != 0 &&
        (log_xact_sample_rate == 1 ||
         pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate);

Sorry for the delayed reply. Andrey was right about this suggestion: first, it makes the code more readable for others, and second, it avoids engaging the PRNG at the edge values of 0.0 and 1.0. I've attached patch v11 with these changes.
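
For reference, a minimal sketch of how the same pattern could look inside pg_stat_statements, assuming a hypothetical pgss_sample_rate GUC and helper function (the actual names in v11 may differ):

    #include "postgres.h"
    #include "common/pg_prng.h"

    /* Hypothetical GUC name; fraction of queries to track, 0.0..1.0. */
    static double pgss_sample_rate = 1.0;

    /*
     * Decide whether the current query should be tracked.  The edge-value
     * checks skip the PRNG entirely at 0.0 and 1.0, and "<=" mirrors the
     * existing log_xact_sample_rate code quoted above.
     */
    static bool
    query_is_sampled(void)
    {
        return pgss_sample_rate != 0.0 &&
            (pgss_sample_rate == 1.0 ||
             pg_prng_double(&pg_global_prng_state) <= pgss_sample_rate);
    }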

Patch looks fine. Thank you!

On 14.01.2025 15:00, Ilia Evdokimov wrote:

Alena, Sami – I apologize for not including you in the previous email. If you're interested in this approach, I'm open to any suggestions.

[0]: https://www.postgresql.org/message-id/1b13d748-5e98-479c-9222-3253a734a038%40tantorlabs.com

This is a difficult question. I tend to agree with Alexander Korotkov's proposal to add a filter that records only those queries whose statistics satisfy the configured conditions. However, I don't believe query execution time is a sufficient metric for this purpose. It is too unstable and influenced by many external factors, such as system load. For instance, background processes such as vacuum, the checkpointer, or the background writer could be running at the same time.

Additionally, a query may take a long time to execute simply because another large query is consuming most of the system's resources at the same time. In such cases, the long execution time of the query that triggered the filter may not indicate anything remarkable. Furthermore, statistics for resource-intensive queries may appear normal if the query simply processed a large volume of data, for example when it involves a Cartesian product or a full join.

Therefore, I think the idea of using filters is more promising, especially if multiple filters are implemented. For example, we could add filters for buffer usage (pages read or modified), differences in cardinality (predicted vs. actual), username, application name, and other criteria. These filters would help reduce the volume of queries tracked by the pg_stat_statements extension. However, there might be challenges in keeping this state up to date, given the volatile and unstable nature of the system.
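
To make this concrete, here is a rough sketch of what such a filter check might look like at the point where pg_stat_statements decides whether to store a query. The GUC names and thresholds are purely illustrative; nothing like this exists yet:

    #include "postgres.h"

    /* Hypothetical filter GUCs; purely illustrative. */
    static int    pgss_filter_min_shared_blks = 0;  /* pages read or dirtied */
    static double pgss_filter_card_error = 0.0;     /* actual/estimated ratio */

    /*
     * Sketch: return true if a finished query passes the configured
     * filters and should be stored by pg_stat_statements.
     */
    static bool
    query_passes_filters(int64 shared_blks_accessed,
                         double estimated_rows, double actual_rows)
    {
        /* Buffer-usage filter: skip queries that touched few pages. */
        if (shared_blks_accessed < pgss_filter_min_shared_blks)
            return false;

        /*
         * Cardinality filter: keep only queries whose row estimate was
         * off by at least the configured factor, in either direction.
         */
        if (pgss_filter_card_error > 1.0 && estimated_rows > 0.0)
        {
            double ratio = actual_rows / estimated_rows;

            if (ratio < pgss_filter_card_error &&
                ratio > 1.0 / pgss_filter_card_error)
                return false;
        }

        return true;
    }

Filters on username or application name could be checked the same way against the backend's current role and application_name.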

Your proposed parameter could also reduce the volume of queries for which statistics need to be saved, but it is too unpredictable for analysis, and it is unclear how the resulting data should be interpreted. For instance, a query that genuinely impacted performance might not be recorded by pg_stat_statements simply due to randomness, while a small, insignificant query could be selected instead. Analyzing such statistics might lead to misleading conclusions.

--
Regards,
Alena Rybakina
Postgres Professional
