> > I took a look at this today out of interest, > > Thanks! > > > so, instead of calling IS_INJECTION_POINT_ATTACHED macro which is > > I think that not calling IS_INJECTION_POINT_ATTACHED() but only relying on > "ifdef USE_INJECTION_POINTS" would set the tiny timeout value to the entire > test > suit.
Yes, you are right about that. To Michael's earlier comments: > I don't find the design of this patch appealing, and my mind points > towards two pieces of it: > 1) The new requirement related to pgstat_schedule_anytime_update() > that a stats kind needs to call to enable a timeout. This partially > doubles with pgstat_report_fixed. And I suspect that this extra set > of requirements, introducing a new level of complexity for in-core > stats kinds as well as extension developers, would be the source of > more bugs. I don't see how for fixed stats, adding a pgstat_schedule_anytime_update() call for such kinds will be too complex or more error prone. it looks quite straightforward as this is done before pgstat_report_fixed is set to true. Also, because processes like wal-sender can loop forever, as mentioned here [1], a timeout at stats_flush_interval seems like a straightforward way to deal with this problem. For variable-length statistics, perhaps we can do things a bit differently than what is currently proposed. 0005 requires a relation anytime stat update to call pgstat_schedule_anytime_update(). This is done this way because it allows long-running queries to update their stats every stats_flush_interval using a timeout. But maybe what we should be doing for variable-numbered stats is to schedule an anytime update whenever a "transaction goes idle". This way, unlike the current state of things where we are only updating relation stats at the end of a transaction, we are now updating relation stats at the end of SQL execution, and within a transaction. Basically, we will continue scheduling anytime updates every stats_flush_interval for fixed stats (xlog, bgwriter, etc.), but for variable stats, we only update anytime stats after SQL execution. This is better than what we have now, where stats are only updated at the end of the transaction. The timeout will only be needed to schedule an update for fixed stats. For variable stats, we can use GetCurrentStatementStartTimestamp, which is the timestamp of the last query executed, to throttle pgstat_flush_pending_entries(). We can also flush variable number stats whenever we flush fixed number stats, in case we enter a long idle-in-transaction state after a few quick back-to-back queries. > 2) The timeout requirement itself, relying on a timeout threshold > controlled by a backend-side configuration. Perhaps we may not need the stats_flush_inteval and just force a 10 second timeout for fixed stats. > With that in mind, wouldn't it be simpler if we introduced an API that > could be used from client applications instead, in a model similar > what we do for procsignal.c/h? One such example is > LOG_MEMORY_CONTEXT, where we have a SQL function that is able to tell > to a backend that it needs to do something. I could see various > benefits to this approach, because it gives more flexibility with the > timing of the stats flushes, which may not be a backend-side only > policy: > - Use a cron bgworker in the backend, that scans pg_stat_activity, for > example for long-running transactions based on a threshold. > - Do the same periodic scan of pg_stat_activity, but from a client > application. I find it odd to ask applications/clients to trigger a flush. I am not saying that we should not offer such an API, especially if someone want to flush stats more frequently than stats_flush_interval, but there should be the ability for core to handle this automatically outside of the transaction boundaries. One comment about the current test. I think there is a bug that was missed in the earlier review. For the var_anytime_update, we need to have an escape before the pipe. Also, we should set stats_fetch_consistency=none in that test. -like($result, qr/^entry2|2|/m, +like($result, qr/^entry2\|2\|/m, Otherwise, the test is returning a false-positive. [1] https://www.postgresql.org/message-id/erpzwxoptqhuptdrtehqydzjapvroumkhh7lc6poclbhe7jk7l%40l3yfsq5q4pw7 -- Sami Imseih Amazon Web Services (AWS)
