Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 09/05/2011 07:52 PM, Tomas Vondra wrote: If your logging criteria for the write phase was display a message any time more than 30 seconds have passed since last seeing one, that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I still am not sure what should be a reasonable value or how to determine it. What happens when the checkpoint_timeout is increased, there's more shared_buffers etc.? What about using (checkpoint_timeout/10) for the time-based checkpoints and 30s for the other checkpoints? That may work fine. Maybe implement it like that, and see if the amount of logging detail is reasonable in a couple of test scenarios. I think combining the two makes the most sense: log when=30 seconds have passed since the last message, and there's been=10% more progress made. (Maybe do the progress check before the time one, to cut down on Is this is a good idea? The case when the timeout expires and not much data was written is interesting, and this would not log it. But OTOH this would nicely solve the issue with time-based checkpoints and a fixed threshold. One thing I am trying to avoid here is needing to check the system clock after every buffer write. I also consider it useful to put an upper bound on how many of these messages will appear even in the verbose mode. This deals with both those problems. Yes, there is a potential problem with this idea. Let's say checkpoint writes degrade to where they take an hour. In that case, you won't see the first progress report until 6 minutes (10%) have gone by with this implementation. I don't see a good way to resolve that without violating one of the other priorities I listed above though. You'll have to poll the system clock constantly and will end up creating a lot of log entries if you don't do a check against the % progress first. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Greg Smith g...@2ndquadrant.com writes: One thing I am trying to avoid here is needing to check the system clock after every buffer write. On machines where gettimeofday is slow (and last I heard there were still lots of them), any such thing would be a disaster performance-wise. I'm still afraid to add more gettimeofday's into the query parse/plan/execute code path, even though it would greatly ease the problem of figuring out whether re-planning is worthwhile. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Tom Lane t...@sss.pgh.pa.us writes: On machines where gettimeofday is slow (and last I heard there were still lots of them), any such thing would be a disaster performance-wise. I'm still afraid to add more gettimeofday's into the query parse/plan/execute code path, even though it would greatly ease the problem of figuring out whether re-planning is worthwhile. Excuse my ignorance here, but is SIGALARM less of a problem? Then you could ask the system for an alarm next second and count the alarms rather than poll the clock. We don't need high precision in both those cases I guess. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 22:04, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 2 Z?? 2011, 21:23, Tom Lane wrote: Well, to be blunt, putting stuff into the postmaster log is entirely the wrong way to satify a requirement like that. If you want to expose progress information, it should be exposed via something dynamic like pg_stat_activity. What could be useful to log is statistics that people might want to aggregate later, and I don't immediately see a reason why such stats couldn't be logged just once at end of each checkpoint cycle. The problem with pg_stat_activity is that it provides just 'current state', no history. If you don't sample that often enough, you may completely miss the checkpoint (and thus you'll have no info about it, unless you enable log_checkpoints and check the log). And it's imposible to use if you need info about something that happened in the past. And AFAIK it does not show processes running timed checkpoints for example. Your requirements seem sufficiently slippery that I don't think you've thought them through very well. As far as I can see, the proposed patch will mostly result in bloating the postmaster log with repetitive information of next to no value. I can see the reason for wanting to know what the system is doing right now, and I can see the reason for wanting aggregatable statistics so that you can tell over time whether your settings need to be adjusted. I don't see the value in a lot of 10% done log entries --- there is essentially no historical value in such, IMO, because they don't correspond to any user-level activity. (Which is what distinguishes this from, say, log_connections or log_statements.) The fact that you can't figure out a reasonable frequency for making the entries is a symptom of the design being wrong at its core. No, I probably haven't thought through all the requirements - but that's exactly the feedback I expect from a commit fest. Originally I've built the patch to fix a single use-case, where the 'log each 10%' approach made perfect sense. I'm the first one to admin this is not a good approach for the other use-cases and that the frequency should be figured out in a different way. Anyway I don't think that a piece of information that does not correspond to any user-level activity is useless. Checkpoints are often a very intrusively and significantly influence the users - that's why I believe any help to tune them is desirable. The ability to see what the system is doing right now and aggregatable statistics are fine, but in many cases they're useless (e.g. when investigating past events). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 3 Září 2011, 8:19, Greg Smith wrote: If you're expanding log_checkpoints to an enum, for that to handle what I think everybody might ever want (for what checkpoints do now at least), I'd find that more useful if it happened like this instead: log_checkpoints = {off, on, write, sync, verbose} I don't think you should change the semantics of off/on, which will avoid breaking existing postgresql.conf files and resources that suggest tuning advice. write can toggle on what you're adding; sync should control whether the DEBUG1 messages showing the individual file names in the sync phase appear; and verbose can include both. Thanks, those are definitely good ideas extending the original patch and making it much more useful I guess. As far as a heuristic for making this less chatty when there's nothing exciting happening goes, I think something based on how much time has passed would be the best one. In your use case, I would guess you don't really care whether a message appears every n%. If I understand you correctly now, you would mainly care about getting enough log detail to know 1) when things are running really slow, or b) often enough that the margin of error in your benchmark results from unaccounted checkpoint writes is acceptable. In both of those cases, I'd think a time-based threshold would be appropriate, and that also deals with the time-based checkpoints, too. Yes, the time-based threshold seems like the right solution. If your logging criteria for the write phase was display a message any time more than 30 seconds have passed since last seeing one, that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I still am not sure what should be a reasonable value or how to determine it. What happens when the checkpoint_timeout is increased, there's more shared_buffers etc.? What about using (checkpoint_timeout/10) for the time-based checkpoints and 30s for the other checkpoints? I think combining the two makes the most sense: log when =30 seconds have passed since the last message, and there's been =10% more progress made. (Maybe do the progress check before the time one, to cut down on Is this is a good idea? The case when the timeout expires and not much data was written is interesting, and this would not log it. But OTOH this would nicely solve the issue with time-based checkpoints and a fixed threshold. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Mon, Sep 5, 2011 at 7:52 PM, Tomas Vondra t...@fuzzy.cz wrote: If your logging criteria for the write phase was display a message any time more than 30 seconds have passed since last seeing one, that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I still am not sure what should be a reasonable value or how to determine it. What happens when the checkpoint_timeout is increased, there's more shared_buffers etc.? What about using (checkpoint_timeout/10) for the time-based checkpoints and 30s for the other checkpoints? I think the idea here is that we only need to log a message often enough that the admin who is sitting there watching this won't get too impatient waiting for the next one. As that's not a function of checkpoint_timeout, I don't see much value in conditioning this on that. +1 for the suggestion of 30s intervals - that seems infrequent enough not to result in too much log spam, but sufficiently frequent that anyone who is concerned about checkpoint progress won't have to wait terribly long to find out how things are going. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 09/02/2011 11:10 AM, Tomas Vondra wrote: My 'ideal' solution would be either to add another GUC (to turn this on/off) or allow log_checkpoints to have three values log_checkpoints = {off, normal, detailed} where 'normal' provides the current output and 'detail' produces this much verbose output. If this is going to be acceptable, that's likely the only path it could happen by and still meet what you're looking for. I will just again stress that the part you're working on instrumenting better right now is not actually where larger systems really run into the most problems here, based on what I've seen. I added a series of log messages to 9.1 at DEBUG1, aimed at tracking the sync phase. That's where I see many more checkpoint issues than in the write one. On Linux in particular, it's almost impossible for the write phase to be more of a problem than the sync one. So the logging you're adding here I don't ever expect to turn on. But I wouldn't argue against an option to handle the logging use-case you're concerned about. Letting people observe for themselves and decide which of the phases is more interesting to their workload seems appropriate. Then users have options for what to log, no matter which type of problem they run into. If you're expanding log_checkpoints to an enum, for that to handle what I think everybody might ever want (for what checkpoints do now at least), I'd find that more useful if it happened like this instead: log_checkpoints = {off, on, write, sync, verbose} I don't think you should change the semantics of off/on, which will avoid breaking existing postgresql.conf files and resources that suggest tuning advice. write can toggle on what you're adding; sync should control whether the DEBUG1 messages showing the individual file names in the sync phase appear; and verbose can include both. As far as a heuristic for making this less chatty when there's nothing exciting happening goes, I think something based on how much time has passed would be the best one. In your use case, I would guess you don't really care whether a message appears every n%. If I understand you correctly now, you would mainly care about getting enough log detail to know 1) when things are running really slow, or b) often enough that the margin of error in your benchmark results from unaccounted checkpoint writes is acceptable. In both of those cases, I'd think a time-based threshold would be appropriate, and that also deals with the time-based checkpoints, too. If your logging criteria for the write phase was display a message any time more than 30 seconds have passed since last seeing one, that would give you only a few lines of output in a boring, normal checkpoint--certainly less than the 9 in-progress samples you're outputting now, at 10% intervals. But in the pathological situations where writes are super slow, your log data would become correspondingly denser, which is exactly what you want in that situation. I think combining the two makes the most sense: log when =30 seconds have passed since the last message, and there's been =10% more progress made. (Maybe do the progress check before the time one, to cut down on gettimeofday() calls) That would give you 4 in-progress reports during a standard 2.5 minute write phase, and in cases where the checkpoints are taking a long time you'd get as many as 9. That's pretty close to auto-tuning the amount of log output, so the amount of it is roughly proportional to how likely the information it's logging will be interesting. We certainly don't want to add yet another GUC just to control the frequency. I don't think it will be too hard to put two hard-coded thresholds in and do good enough for just about everyone though. I would probably prefer setting those thresholds to 60 seconds/20% instead. That might not be detailed enough for you though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote: I've prepared a significantly simplified version of the patch. The two main changes are (a) it does not update the pg_stat_bgwriter anymore, it just prints an info to the server log (b) a new GUC is not required, it's driven by the log_checkpoints The comment still refers to the checkpoint_update_limit. This version will log at least 10 'checkpoint status' lines (at 10%, 20%, 30%, ...) and whenever 5 seconds since the last log elapses. The time is not checked for each buffer but for 128 buffers. So if the checkpoint is very slow, you'll get a message every 5 seconds, if it's fast you'll get 10 messages. I would personally find this very annoying. If I read it correctly, anybody with a database with no problem at all but that has log_checkpoints on suddenly got at least 10 times as many messages? I generally try to advise my clients to *not* log excessively because then they will end up not bothering to read the logs... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 9:47, Magnus Hagander wrote: On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote: I've prepared a significantly simplified version of the patch. The two main changes are (a) it does not update the pg_stat_bgwriter anymore, it just prints an info to the server log (b) a new GUC is not required, it's driven by the log_checkpoints The comment still refers to the checkpoint_update_limit. OK, I'll fix that. This version will log at least 10 'checkpoint status' lines (at 10%, 20%, 30%, ...) and whenever 5 seconds since the last log elapses. The time is not checked for each buffer but for 128 buffers. So if the checkpoint is very slow, you'll get a message every 5 seconds, if it's fast you'll get 10 messages. I would personally find this very annoying. If I read it correctly, anybody with a database with no problem at all but that has log_checkpoints on suddenly got at least 10 times as many messages? I generally try to advise my clients to *not* log excessively because then they will end up not bothering to read the logs... What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Fri, Sep 2, 2011 at 11:01, Tomas Vondra t...@fuzzy.cz wrote: On 2 Září 2011, 9:47, Magnus Hagander wrote: On Thu, Sep 1, 2011 at 21:59, Tomas Vondra t...@fuzzy.cz wrote: I've prepared a significantly simplified version of the patch. The two main changes are (a) it does not update the pg_stat_bgwriter anymore, it just prints an info to the server log (b) a new GUC is not required, it's driven by the log_checkpoints The comment still refers to the checkpoint_update_limit. OK, I'll fix that. This version will log at least 10 'checkpoint status' lines (at 10%, 20%, 30%, ...) and whenever 5 seconds since the last log elapses. The time is not checked for each buffer but for 128 buffers. So if the checkpoint is very slow, you'll get a message every 5 seconds, if it's fast you'll get 10 messages. I would personally find this very annoying. If I read it correctly, anybody with a database with no problem at all but that has log_checkpoints on suddenly got at least 10 times as many messages? I generally try to advise my clients to *not* log excessively because then they will end up not bothering to read the logs... What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. I guess if it's at a DEBUG level it won't annoy anybody who doesn't need it. Not sure if NOTICE is low enough.. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 12:45, Magnus Hagander wrote: On Fri, Sep 2, 2011 at 11:01, Tomas Vondra t...@fuzzy.cz wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. I guess if it's at a DEBUG level it won't annoy anybody who doesn't need it. Not sure if NOTICE is low enough.. I've changed the level to NOTICE. I guess I could put that to info, but the debug levels seem too low to me. Is there a rule of a thumb about where to put messages? Tomasdiff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c new file mode 100644 index 4c7cfb0..a9bdd89 *** a/src/backend/storage/buffer/bufmgr.c --- b/src/backend/storage/buffer/bufmgr.c *** BufferSync(int flags) *** 1175,1180 --- 1175,1193 int num_to_write; int num_written; int mask = BM_DIRTY; + + int num_since_update; + + longcurr_secs, + total_secs; + int curr_usecs, + total_usecs; + float curr_time, + total_time; + + TimestampTz startTimestamp, lastTimestamp; + + int log_interval, check_interval; /* Make sure we can handle the pin inside SyncOneBuffer */ ResourceOwnerEnlargeBuffers(CurrentResourceOwner); *** BufferSync(int flags) *** 1238,1243 --- 1251,1265 buf_id = StrategySyncStart(NULL, NULL); num_to_scan = NBuffers; num_written = 0; + num_since_update = 0; + + startTimestamp = GetCurrentTimestamp(); + lastTimestamp = startTimestamp; + + /* check the progress each 128 buffers or 10% */ + log_interval = (int)round(num_to_write/10); + check_interval = ((log_interval 128) (log_interval 0)) ? log_interval : 128; + while (num_to_scan-- 0) { volatile BufferDesc *bufHdr = BufferDescriptors[buf_id]; *** BufferSync(int flags) *** 1261,1266 --- 1283,1334 TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); BgWriterStats.m_buf_written_checkpoints++; num_written++; + num_since_update++; + + /* +* Every time we write enough buffers we check the time since +* last log and whenever we reach 5 seconds or 10% of the buffers +* to write, we put a checkpoint status message. +* +* The log message contains info about total number of buffers to +* write, how many buffers are already written, average and current +* write speed and an estimate remaining time. +*/ + if ((log_checkpoints) (num_written % check_interval == 0)) { + + TimestampDifference(lastTimestamp, + GetCurrentTimestamp(), + curr_secs, curr_usecs); + + /* if at least 5 seconds elapsed since the last log, log */ + if ((curr_secs = 5) || (num_since_update = log_interval)) { + + TimestampDifference(startTimestamp, + GetCurrentTimestamp(), + total_secs, total_usecs); + + curr_time = curr_secs + (float)curr_usecs / 100; + total_time = total_secs + (float)total_usecs / 100; + + elog(NOTICE, checkpoint status: wrote %d buffers of %d (%.1f%%) in %.1f s; + average %.1f MB/s (%d buffers, %ld.%03d s), + current %.1f MB/s (%d buffers, %ld.%03d s), + remaining %.1f s, + num_written, num_to_write, ((float) num_written * 100 / num_to_write), +
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Frankly, logging as verbose as this is proposed to be is something that I can't imagine anybody wanting at all, especially not in production environments. DEBUG3 or somewhere around there seems fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 15:44, Peter Eisentraut wrote: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). In the first patch I've proposed a new GUC (used to set how often the info should be logged or disable it), but Josh Berkus pointed out that I should get rid of it if I can. Which is what I've done in the following patches. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Sep 2, 2011 5:02 PM, Tomas Vondra t...@fuzzy.cz wrote: On 2 Září 2011, 15:44, Peter Eisentraut wrote: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). In the first patch I've proposed a new GUC (used to set how often the info should be logged or disable it), but Josh Berkus pointed out that I should get rid of it if I can. Which is what I've done in the following patches. Well, josh doesn't speak for everybody ;-) Maybe one way could be to change log_checkpoints into an enum of off, on, debug (values open for bikeshedding of course) /Magnus
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 16:13, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Frankly, logging as verbose as this is proposed to be is something that I can't imagine anybody wanting at all, especially not in production environments. DEBUG3 or somewhere around there seems fine. Yes, it can be a bit verbose, but I don't think it's that verbose. I'm one of those this information about checkpoint progress may be a valuable info - even on production systems - that's why I don't want to put that into the debug levels. It should write about 10 lines for each checkpoint (altough I've just realized the current patch won't do that for the timed checkpoints - it'll probably log significantly more). My 'ideal' solution would be either to add another GUC (to turn this on/off) or allow log_checkpoints to have three values log_checkpoints = {off, normal, detailed} where 'normal' provides the current output and 'detail' produces this much verbose output. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 17:08, Magnus Hagander wrote: On Sep 2, 2011 5:02 PM, Tomas Vondra t...@fuzzy.cz wrote: On 2 Září 2011, 15:44, Peter Eisentraut wrote: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). In the first patch I've proposed a new GUC (used to set how often the info should be logged or disable it), but Josh Berkus pointed out that I should get rid of it if I can. Which is what I've done in the following patches. Well, josh doesn't speak for everybody ;-) Sure, but I think the effort not to have a zillion of GUC makes sense. Maybe one way could be to change log_checkpoints into an enum of off, on, debug (values open for bikeshedding of course) Yes, that's actually one of the solutions I'd prefer. Not sure why I rejected it ... Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
.Tomas Vondra t...@fuzzy.cz wrote: On 2 Zá*í 2011, 15:44, Peter Eisentraut wrote: Changing the log level is not the appropriate solution. Make it a configuration parameter. Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). Maybe?: log_checkpoints = 'verbose' It seems to me that one of the most likely uses of this is for investigating performance issues in production. Like with log_statement, you might want to turn it on briefly and examine results. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Sep 2, 2011 5:18 PM, Jaime Casanova ja...@2ndquadrant.com wrote: On Fri, Sep 2, 2011 at 10:10 AM, Tomas Vondra t...@fuzzy.cz wrote: My 'ideal' solution would be either to add another GUC (to turn this on/off) or allow log_checkpoints to have three values log_checkpoints = {off, normal, detailed} +1 on make it an enum but i prefer values {on, off, verbose} where on will log just the same as now Yeah, keeping on and off working the same way as they do today is a feature on itself. /Magnus
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Fri, Sep 2, 2011 at 10:10 AM, Tomas Vondra t...@fuzzy.cz wrote: My 'ideal' solution would be either to add another GUC (to turn this on/off) or allow log_checkpoints to have three values log_checkpoints = {off, normal, detailed} +1 on make it an enum but i prefer values {on, off, verbose} where on will log just the same as now -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Thu, Sep 1, 2011 at 3:59 PM, Tomas Vondra t...@fuzzy.cz wrote: I've prepared a significantly simplified version of the patch. The two main changes are (a) it does not update the pg_stat_bgwriter anymore, it just prints an info to the server log (b) a new GUC is not required, it's driven by the log_checkpoints This version will log at least 10 'checkpoint status' lines (at 10%, 20%, 30%, ...) and whenever 5 seconds since the last log elapses. The time is not checked for each buffer but for 128 buffers. So if the checkpoint is very slow, you'll get a message every 5 seconds, if it's fast you'll get 10 messages. This seems like a strange heuristic. I understand the value of emitting a progress report every N seconds, but why would you want a report at 10%, 20%, 30% even if it hasn't been 5 seconds yet? I don't need ten progress messages on a checkpoint that only takes three seconds to complete. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote: On 2 Září 2011, 15:44, Peter Eisentraut wrote: On fre, 2011-09-02 at 11:01 +0200, Tomas Vondra wrote: What about logging it with a lower level, e.g. NOTICE instead of the current LOG? If that's not a solution then a new GUC is needed I guess. Changing the log level is not the appropriate solution. Make it a configuration parameter. Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). If a user is forced to change the log level to get at one particular piece of information, they will then also turn on countless other log events on that level, which is annoying. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote: Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). If a user is forced to change the log level to get at one particular piece of information, they will then also turn on countless other log events on that level, which is annoying. Yeah, if we're going to have this at all, some form of GUC control over it seems necessary. I'm still disturbed by the verbosity of the proposed output though. Couldn't we collapse the information into a single log entry per checkpoint cycle? Perhaps that would be sufficient to just let the log_checkpoints setting be used as-is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 20:46, Peter Eisentraut wrote: On fre, 2011-09-02 at 17:13 +0200, Tomas Vondra wrote: Sure, but I think the effort not to have a zillion of GUC makes sense. Well, I'll be the first to agree that reducing complexity in configuration and tuning settings is worth aiming for. But for logging, I'd rather have more settings, theoretically up to one for each possible message. That doesn't increase complexity, as long as it has linear behavior. It's debatable whether that means a new log_something parameter for each situation, or just a single parameter containing some kind of list, or something else, but that's a different problem. OK, good point. The effort to make this work without a GUC obviously is a dead end, and the idea to turn log_checkpoints into an enum seems reasonable. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 20:48, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-02 at 17:02 +0200, Tomas Vondra wrote: Why is it inappropriate solution? There's a log_checkpoints GUC that drives it and you can either get basic info (summary of the checkpoint) or detailed log (with a lower log level). If a user is forced to change the log level to get at one particular piece of information, they will then also turn on countless other log events on that level, which is annoying. Yeah, if we're going to have this at all, some form of GUC control over it seems necessary. I'm still disturbed by the verbosity of the proposed output though. Couldn't we collapse the information into a single log entry per checkpoint cycle? Perhaps that would be sufficient to just let the log_checkpoints setting be used as-is. Yes, the GUC seems inevitable. I'm still working on the verbosity for different checpoint typees, but I think 5 seconds for xlog and 10% for timed checkpoint is a reasonable setting. I'm not sure what you mean by collapsing the info into a single log entry? That would mean I'd have to wait till the checkpoint completes, and one of the reasons for this patch was to get info about progress while the checkpoint is running. For example I'd like to have this information in cases when the checkpoint never finishes - for example when performing automated benchmarks. I sometimes just kill the database (I need to rebuild it from scratch for the next run) so I don't get any checkpoint message at all. Waiting for the checkpoint would significantly increase the time for each run and thus for the whole benchmark. With one run that probably is not a problem, with 360 runs each minute makes a big difference. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Tomas Vondra t...@fuzzy.cz writes: On 2 ZáÅà 2011, 20:48, Tom Lane wrote: Yeah, if we're going to have this at all, some form of GUC control over it seems necessary. I'm still disturbed by the verbosity of the proposed output though. Couldn't we collapse the information into a single log entry per checkpoint cycle? Perhaps that would be sufficient to just let the log_checkpoints setting be used as-is. I'm not sure what you mean by collapsing the info into a single log entry? That would mean I'd have to wait till the checkpoint completes, and one of the reasons for this patch was to get info about progress while the checkpoint is running. Well, to be blunt, putting stuff into the postmaster log is entirely the wrong way to satify a requirement like that. If you want to expose progress information, it should be exposed via something dynamic like pg_stat_activity. What could be useful to log is statistics that people might want to aggregate later, and I don't immediately see a reason why such stats couldn't be logged just once at end of each checkpoint cycle. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 2 Září 2011, 21:23, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 2 Z?? 2011, 20:48, Tom Lane wrote: Yeah, if we're going to have this at all, some form of GUC control over it seems necessary. I'm still disturbed by the verbosity of the proposed output though. Couldn't we collapse the information into a single log entry per checkpoint cycle? Perhaps that would be sufficient to just let the log_checkpoints setting be used as-is. I'm not sure what you mean by collapsing the info into a single log entry? That would mean I'd have to wait till the checkpoint completes, and one of the reasons for this patch was to get info about progress while the checkpoint is running. Well, to be blunt, putting stuff into the postmaster log is entirely the wrong way to satify a requirement like that. If you want to expose progress information, it should be exposed via something dynamic like pg_stat_activity. What could be useful to log is statistics that people might want to aggregate later, and I don't immediately see a reason why such stats couldn't be logged just once at end of each checkpoint cycle. The problem with pg_stat_activity is that it provides just 'current state', no history. If you don't sample that often enough, you may completely miss the checkpoint (and thus you'll have no info about it, unless you enable log_checkpoints and check the log). And it's imposible to use if you need info about something that happened in the past. And AFAIK it does not show processes running timed checkpoints for example. So IMHO it's unusable for what I'm trying to achieve. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Tomas Vondra t...@fuzzy.cz writes: On 2 ZáÅà 2011, 21:23, Tom Lane wrote: Well, to be blunt, putting stuff into the postmaster log is entirely the wrong way to satify a requirement like that. If you want to expose progress information, it should be exposed via something dynamic like pg_stat_activity. What could be useful to log is statistics that people might want to aggregate later, and I don't immediately see a reason why such stats couldn't be logged just once at end of each checkpoint cycle. The problem with pg_stat_activity is that it provides just 'current state', no history. If you don't sample that often enough, you may completely miss the checkpoint (and thus you'll have no info about it, unless you enable log_checkpoints and check the log). And it's imposible to use if you need info about something that happened in the past. And AFAIK it does not show processes running timed checkpoints for example. Your requirements seem sufficiently slippery that I don't think you've thought them through very well. As far as I can see, the proposed patch will mostly result in bloating the postmaster log with repetitive information of next to no value. I can see the reason for wanting to know what the system is doing right now, and I can see the reason for wanting aggregatable statistics so that you can tell over time whether your settings need to be adjusted. I don't see the value in a lot of 10% done log entries --- there is essentially no historical value in such, IMO, because they don't correspond to any user-level activity. (Which is what distinguishes this from, say, log_connections or log_statements.) The fact that you can't figure out a reasonable frequency for making the entries is a symptom of the design being wrong at its core. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
I've prepared a significantly simplified version of the patch. The two main changes are (a) it does not update the pg_stat_bgwriter anymore, it just prints an info to the server log (b) a new GUC is not required, it's driven by the log_checkpoints This version will log at least 10 'checkpoint status' lines (at 10%, 20%, 30%, ...) and whenever 5 seconds since the last log elapses. The time is not checked for each buffer but for 128 buffers. So if the checkpoint is very slow, you'll get a message every 5 seconds, if it's fast you'll get 10 messages. Tomasdiff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c new file mode 100644 index 4c7cfb0..e60d304 *** a/src/backend/storage/buffer/bufmgr.c --- b/src/backend/storage/buffer/bufmgr.c *** BufferSync(int flags) *** 1175,1180 --- 1175,1193 int num_to_write; int num_written; int mask = BM_DIRTY; + + int num_since_update; + + longcurr_secs, + total_secs; + int curr_usecs, + total_usecs; + float curr_time, + total_time; + + TimestampTz startTimestamp, lastTimestamp; + + int log_interval, check_interval; /* Make sure we can handle the pin inside SyncOneBuffer */ ResourceOwnerEnlargeBuffers(CurrentResourceOwner); *** BufferSync(int flags) *** 1238,1243 --- 1251,1265 buf_id = StrategySyncStart(NULL, NULL); num_to_scan = NBuffers; num_written = 0; + num_since_update = 0; + + startTimestamp = GetCurrentTimestamp(); + lastTimestamp = startTimestamp; + + /* check the progress each 128 buffers or 10% */ + log_interval = (int)round(num_to_write/10); + check_interval = ((log_interval 128) (log_interval 0)) ? log_interval : 128; + while (num_to_scan-- 0) { volatile BufferDesc *bufHdr = BufferDescriptors[buf_id]; *** BufferSync(int flags) *** 1261,1266 --- 1283,1334 TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); BgWriterStats.m_buf_written_checkpoints++; num_written++; + num_since_update++; + + /* +* Every time we write enough buffers (checkpoint_update_limit), +* we log a checkpoint status message and update the bgwriter +* stats (so that the pg_stat_bgwriter table may be updated). +* +* The log message contains info about total number of buffers to +* write, how many buffers are already written, average and current +* write speed and an estimate remaining time. +*/ + if ((log_checkpoints) (num_written % check_interval == 0)) { + + TimestampDifference(lastTimestamp, + GetCurrentTimestamp(), + curr_secs, curr_usecs); + + /* if at least 5 seconds elapsed since the last log, log */ + if ((curr_secs = 5) || (num_since_update = log_interval)) { + + TimestampDifference(startTimestamp, + GetCurrentTimestamp(), + total_secs, total_usecs); + + curr_time = curr_secs + (float)curr_usecs / 100; + total_time = total_secs + (float)total_usecs / 100; + + elog(LOG, checkpoint status: wrote %d buffers of %d (%.1f%%) in %.1f s; + average %.1f MB/s (%d buffers, %ld.%03d s), + current %.1f MB/s (%d buffers, %ld.%03d s), + remaining %.1f s, + num_written, num_to_write, ((float) num_written * 100 / num_to_write), +
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 29 Srpen 2011, 7:47, Noah Misch wrote: On Sat, Aug 27, 2011 at 03:57:16PM +0200, Tomas Vondra wrote: On 27 Srpen 2011, 6:01, Noah Misch wrote: Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT? I already do that, but it really does not solve the issue. It just aligns the first expected 'timed' checkpoint, it does not solve the problem with in-progress checkpoints unless the runs behave exactly the same (and that's the boring case). To clarify, run that command _after_ the 10-minute pgbench run. It blocks until completion of both the in-progress checkpoint, if any, and the requested checkpoint. I don't think this is going to help when there is a checkpoint in progress, as the new CHECKPOINT will wait till the already running one completes. It might force it to finish faster, but I still can't say how much data was written when the pgbench was running. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Sat, Aug 27, 2011 at 03:57:16PM +0200, Tomas Vondra wrote: On 27 Srpen 2011, 6:01, Noah Misch wrote: On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote: 1. collect pg_stat_bgwriter stats 2. run pgbench for 10 minutes 3. collect pg_stat_bgwriter stats (to compute difference with (1)) 4. kill the postmaster The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT? I already do that, but it really does not solve the issue. It just aligns the first expected 'timed' checkpoint, it does not solve the problem with in-progress checkpoints unless the runs behave exactly the same (and that's the boring case). To clarify, run that command _after_ the 10-minute pgbench run. It blocks until completion of both the in-progress checkpoint, if any, and the requested checkpoint. -- Noah Mischhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/27/2011 12:01 AM, Noah Misch wrote: On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote: 1. collect pg_stat_bgwriter stats 2. run pgbench for 10 minutes 3. collect pg_stat_bgwriter stats (to compute difference with (1)) 4. kill the postmaster The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT? That's what I do in pgbench-tools, and it helps a lot. It makes it easier to identify when the checkpoint kicks in if you know it's approximately the same time after each test run begins, given similar testing parameters. That said, it's hard to eliminate all of the edge conditions here. For example, imagine that you're consuming WAL files such that you hit checkpoint_segments every 4 minutes. In a 10 minute test run, a checkpoint will start at 4:00 and finish at around 6:00 (with checkpoint_completion_target=0.5). The next will start at 8:00 and should finish at around 10:00--right at the end of when the test ends. Given the variation that sync timing and rounding issues in the write phase adds to things, you can expect that some test runs will include stats from 2 checkpoints, while others will end the test just before the second one finishes. It does throw the numbers off a bit. To avoid this when it pops up, I normally aim to push up to where there are =4 checkpoints per test run, just so whether I get n or n-1 of them doesn't impact results as much. But that normally takes doubling the length of the test to 20 minutes. As it will often take me days of test time to plow through exploring just a couple of parameters, I'm sympathetic to Tomas trying to improve accuracy here without having to run for quite so long. There's few people who have this problem to worry about though, it's a common issue with benchmarking but not many other contexts. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 27 Srpen 2011, 6:01, Noah Misch wrote: On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote: 1. collect pg_stat_bgwriter stats 2. run pgbench for 10 minutes 3. collect pg_stat_bgwriter stats (to compute difference with (1)) 4. kill the postmaster The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT? I already do that, but it really does not solve the issue. It just aligns the first expected 'timed' checkpoint, it does not solve the problem with in-progress checkpoints unless the runs behave exactly the same (and that's the boring case). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/25/2011 04:57 PM, Tomas Vondra wrote: (b) sends bgwriter stats (so that the buffers_checkpoint is updated) The idea behind only updating the stats in one chunk, at the end, is that it makes one specific thing easier to do. Let's say you're running a monitoring system that is grabbing snapshots of pg_stat_bgwriter periodically. If you want to figure out how much work a checkpoint did, you only need two points of data to compute that right now. Whenever you see either of the checkpoint count numbers increase, you just subtract off the previous sample; now you've got a delta for how many buffers that checkpoint wrote out. You can derive the information about the buffer counts involved that appears in the logs quite easily this way. The intent was to make that possible to do, so that people can figure this out without needing to parse the log data. Spreading out the updates defeats that idea. It also makes it possible to see the buffer writes more in real-time, as they happen. You can make a case for both approaches having their use cases; the above is just summarizing the logic behind why it's done the way it is right now. I don't think many people are actually doing things with this to the level where their tool will care. The most popular consumer of pg_stat_bgwriter data I see is Munin graphing changes, and I don't think it will care either way. Giving people the option of doing it the other way is a reasonable idea, but I'm not sure there's enough use case there to justify adding a GUC just for that. My next goal here is to eliminate checkpoint_segments, not to add yet another tunable extremely few users would ever touch. As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. If anything, I spend more time summarizing the data that's already there, rather than wanting to break them down. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current once every ~5 minutes is way more info than they want. I have all this log parsing code and things that look at pg_stat_bgwriter to collect that data and produce higher level reports. And lots of it would break if any of this patch is added and people turn it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify that some downstream tool disruption is inevitable. If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead. This problem is harder than it looks, mainly because the way the OS caches writes here makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out, and when they actually get written to disk, they are not the same event. The actual write is often during the sync phase, and not being able to tracking that beast is where I see the most problems at. The write phase, the easier part to instrument in the database, that is pretty boring. That's why the last extra logging I added here focused on adding visibility to the sync activity instead. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Thu, Aug 25, 2011 at 22:57, Tomas Vondra t...@fuzzy.cz wrote: Hello, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: (a) logs a checkpoint status message into the server log, with info about total number of buffers to write, number of already written buffers, current and average write speed and estimate of remaining time (b) sends bgwriter stats (so that the buffers_checkpoint is updated) I believe this will make checkpoint tuning easier, especially with large shared bufferers and large when there's other write activity (so that it's difficult to see checkpoint I/O). The default value (0) means this continuous logging is disabled. This seems like the wrong thing to write to the log. It's really only useful *during* the checkpoint run, isn't it? If so, I think it should go in a pg_stat view. In theory, this could be the progress view or progress field talked about around Gregs previous patch - or it could just be modifying the commandstring in pg_stat_activity. Either way, it should be updated in shared memory in that case (like current query is), and not sent with a message to the collector. IMHO, of course ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 9:35, Greg Smith wrote: On 08/25/2011 04:57 PM, Tomas Vondra wrote: (b) sends bgwriter stats (so that the buffers_checkpoint is updated) As for throwing more log data out, I'm not sure what new analysis you're thinking of that it allows. I/O gets increasingly spiky as you zoom in on it; averaging over a shorter period can easily end up providing less insight about trends. If anything, I spend more time summarizing the data that's already there, rather than wanting to break them down. It's already providing way too much detail for most people. Customers tell me they don't care to see checkpoint stats unless they're across a day or more of sampling, so even the current once every ~5 minutes is way more info than they want. I have all this log parsing code and things that look at pg_stat_bgwriter to collect that data and produce higher level reports. And lots of it would break if any of this patch is added and people turn it on. I imagine other log/stat parsing programs might suffer issues too. That's your other hurdle for change here: the new analysis techniques have to be useful enough to justify that some downstream tool disruption is inevitable. I was aware that by continuously updating pg_stat_bgwriter, the data won't be synchronized (i.e. the buffers_checkpoint counters will change while the number of requested/timed checkpoints remain the same). But does that really break the tools that process the data? When you're working with summarized data, the result should be more or less the same as the difference will be smoothed out by averaging etc. You can always see just one in progress checkpoint, so if you get 24 checkpoints a day, the difference will be 1/24 of a checkpoint. Yes, it's a difference. A really crazy workaround would be to change checkpoints_requested / checkpoints_timed to double, and use that to indicate current progress of the checkpoint. So for example 10.54 would mean 10 checkpoints completed, one checkpoint in progress, already written 54% of blocks. But yes, that's a bit crazy. If you have an idea for how to use this extra data for something useful, let's talk about what that is and see if it's possible to build it in instead. This problem is harder than it looks, mainly because the way the OS caches writes here makes trying to derive hard numbers from what the background writer is doing impossible. When the database writes things out, and when they actually get written to disk, they are not the same event. The actual write is often during the sync phase, and not being able to tracking that beast is where I see the most problems at. The write phase, the easier part to instrument in the database, that is pretty boring. That's why the last extra logging I added here focused on adding visibility to the sync activity instead. Hmmm, let me explain what led me to this patch - right now I'm doing a comparison of filesystems with various block sizes (both fs and db blocks). I've realized that the db block size significantly influences frequency of checkpoints and amount of data to write, so I'm collecting data from pg_stat_bgwriter too. The benchmark goes like this 1. collect pg_stat_bgwriter stats 2. run pgbench for 10 minutes 3. collect pg_stat_bgwriter stats (to compute difference with (1)) 4. kill the postmaster The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. I can't infer this from other data (e.g. iostat), because that does allow me what I/O is caused by the checkpoint. Yes, this does not consider sync timing, but in my case that's not a big issue (the page cache is rather small so the data are actually forced to the drive soon). I could probably live with keeping the current pg_stat_bgwriter logic (i.e. updating just once) and writing checkpoint status just to the log. I don't think that should break any current tools that parse logs, because the message is completely different (prefixed with 'checkpoint status') so any reasonably written tool should be OK. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 08/26/2011 03:54 AM, Magnus Hagander wrote: In theory, this could be the progress view or progress field talked about around Gregs previous patch - or it could just be modifying the commandstring in pg_stat_activity. Right. The whole progress indicator idea is hard to do for queries in general. But there's enough of these other progress indicator ideas around now that it may be worth putting a standard way to handle them in here. It sounds like that would be sufficient to address the area Tomas is trying to instrument better. I badly want a progress indicator on CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the build process it's on. That's turned into a major headache recently. If we run with the idea of just allowing backends to publish a progress text string, I think this one maps into a similar space as the autovacuum one. Publishing how many seconds the operation has been running for may be reasonable too. Whether the overhead of the timing calls necessary to compute that will be high or not depends on the refresh rate of the progress info. My suggestion before was to name these as key=value pairs for easy parsing; here's three examples now: autovacumm: pgbench_accounts h=182701 m=301515 d=321345 s=62.231 (cache hits, cache misses, dirty writes, seconds) background writer: checkpoint b=511 t=3072 s=5.321 (buffers written, total, seconds) create index concurrently: pgbench_accounts p=1 b=62 t=6213 s=81.232 (phase, blocks processed, total block estimate, seconds) I think that the idea of making this easily human readable is optimistic, because it will make all these strings big enough to start mattering. Given that, we almost have to assume the only consumers of this data will be able to interpret it using the documentation. I'd be happy with just the minimal data set in each case, not including any statistics you can easily derive from the values given (like the MB/s readings). Adding that figure in particular to more of the log messages would be nice though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 9:54, Magnus Hagander wrote: This seems like the wrong thing to write to the log. It's really only useful *during* the checkpoint run, isn't it? If so, I think it should go in a pg_stat view. In theory, this could be the progress view or progress field talked about around Gregs previous patch - or it could just be modifying the commandstring in pg_stat_activity. Either way, it should be updated in shared memory in that case (like current query is), and not sent with a message to the collector. I personally find it handy for example when I need to find out why performance degraded at a certain point in the past. Now I can see there was a checkpoint (thanks to log_checkpoints=on), but I don't know any details about it's progress. I already collect info from the pg_stat_bgwriter, that's why I thought I could update it more often. The log file is a natural destination for such information, IMHO. I see that as an extension to the current checkpoint messages that are written to the log. The commandstring approach is probably fine for interactive work, but not very handy when you need to analyze something that already happened. Unless you collected the data, of course. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 19:17, Greg Smith wrote: On 08/26/2011 03:54 AM, Magnus Hagander wrote: In theory, this could be the progress view or progress field talked about around Gregs previous patch - or it could just be modifying the commandstring in pg_stat_activity. Right. The whole progress indicator idea is hard to do for queries in general. But there's enough of these other progress indicator ideas around now that it may be worth putting a standard way to handle them in here. It sounds like that would be sufficient to address the area Tomas is trying to instrument better. I badly want a progress indicator on CREATE INDEX CONCURRENTLY too, to at least let me know what phase of the build process it's on. That's turned into a major headache recently. If we run with the idea of just allowing backends to publish a progress text string, I think this one maps into a similar space as the autovacuum one. Publishing how many seconds the operation has been running for may be reasonable too. Whether the overhead of the timing calls necessary to compute that will be high or not depends on the refresh rate of the progress info. My suggestion before was to name these as key=value pairs for easy parsing; here's three examples now: autovacumm: pgbench_accounts h=182701 m=301515 d=321345 s=62.231 (cache hits, cache misses, dirty writes, seconds) background writer: checkpoint b=511 t=3072 s=5.321 (buffers written, total, seconds) create index concurrently: pgbench_accounts p=1 b=62 t=6213 s=81.232 (phase, blocks processed, total block estimate, seconds) I think that the idea of making this easily human readable is optimistic, because it will make all these strings big enough to start mattering. Given that, we almost have to assume the only consumers of this data will be able to interpret it using the documentation. I'd be happy with just the minimal data set in each case, not including any statistics you can easily derive from the values given (like the MB/s readings). Adding that figure in particular to more of the log messages would be nice though. I'm a bit confused - are you talking about updating process title or about writing the info to log? The process title is probably fine for watching the progress interactively, but it really does not solve what I need. I need to be able to infer the progress for past events, so I'd have to sample the 'ps ax' output regularly. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote: Hmmm, let me explain what led me to this patch - right now I'm doing a comparison of filesystems with various block sizes (both fs and db blocks). I've realized that the db block size significantly influences frequency of checkpoints and amount of data to write, so I'm collecting data from pg_stat_bgwriter too. The benchmark goes like this 1. collect pg_stat_bgwriter stats 2. run pgbench for 10 minutes 3. collect pg_stat_bgwriter stats (to compute difference with (1)) 4. kill the postmaster The problem is that when checkpoint stats are collected, there might be a checkpoint in progress and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent. Could you remove this hazard by adding a step 2a. psql -c CHECKPOINT? -- Noah Mischhttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PATCH: regular logging of checkpoint progress
Hello, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: (a) logs a checkpoint status message into the server log, with info about total number of buffers to write, number of already written buffers, current and average write speed and estimate of remaining time (b) sends bgwriter stats (so that the buffers_checkpoint is updated) I believe this will make checkpoint tuning easier, especially with large shared bufferers and large when there's other write activity (so that it's difficult to see checkpoint I/O). The default value (0) means this continuous logging is disabled. Tomasdiff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index 67e722f..64d84b0 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** SET ENABLE_SEQSCAN TO OFF; *** 1863,1868 --- 1863,1885 /listitem /varlistentry + varlistentry id=guc-checkpoint-update-limit xreflabel=checkpoint_update_limit + termvarnamecheckpoint_update_limit/varname (typeinteger/type)/term + indexterm +primaryvarnamecheckpoint_update_limit/ configuration parameter/primary + /indexterm + listitem +para + Number of buffers written during a checkpoint between logging a status + (with total number of buffers to write, number of already written buffers, + average/current write and estimate of the remaining time) and updates bgwriter + stats at the same time. The default value 0 disables the continuous updates so + the stats are updated only at the end of the checkpoint. This parameter can only + be set in the filenamepostgresql.conf/ file or on the server command line. +/para + /listitem + /varlistentry + /variablelist /sect2 sect2 id=runtime-config-wal-archiving diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c new file mode 100644 index 4c7cfb0..b24ec93 *** a/src/backend/storage/buffer/bufmgr.c --- b/src/backend/storage/buffer/bufmgr.c *** *** 66,71 --- 66,72 bool zero_damaged_pages = false; int bgwriter_lru_maxpages = 100; doublebgwriter_lru_multiplier = 2.0; + int checkpoint_update_limit = 0; /* * How many buffers PrefetchBuffer callers should try to stay ahead of their *** BufferSync(int flags) *** 1175,1180 --- 1176,1192 int num_to_write; int num_written; int mask = BM_DIRTY; + + int num_since_update; + + longcurr_secs, + total_secs; + int curr_usecs, + total_usecs; + float curr_time, + total_time; + + TimestampTz startTimestamp, lastTimestamp; /* Make sure we can handle the pin inside SyncOneBuffer */ ResourceOwnerEnlargeBuffers(CurrentResourceOwner); *** BufferSync(int flags) *** 1238,1243 --- 1250,1260 buf_id = StrategySyncStart(NULL, NULL); num_to_scan = NBuffers; num_written = 0; + num_since_update = 0; + + startTimestamp = GetCurrentTimestamp(); + lastTimestamp = startTimestamp; + while (num_to_scan-- 0) { volatile BufferDesc *bufHdr = BufferDescriptors[buf_id]; *** BufferSync(int flags) *** 1261,1266 --- 1278,1327 TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); BgWriterStats.m_buf_written_checkpoints++; num_written++; + num_since_update++; + + /* +* Every time we write enough buffers (checkpoint_update_limit), +* we log a checkpoint status message and update the bgwriter +* stats (so that the pg_stat_bgwriter table may be updated). +* +* The log message contains info about total number of buffers to +* write, how many buffers are already written, average and current +* write speed and an estimate remaining time. +*/ + if ((checkpoint_update_limit 0) (num_since_update = checkpoint_update_limit)) + { + +
Re: [HACKERS] PATCH: regular logging of checkpoint progress
Tomas, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: I'd rather not have a new GUC if we can avoid it. What about just making this some reasonable value (like 1000 buffers) if log_checkpoints = on? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 0:18, Josh Berkus wrote: Tomas, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: I'd rather not have a new GUC if we can avoid it. What about just making this some reasonable value (like 1000 buffers) if log_checkpoints = on? I was thinking about that too, but I think no value can fit all cases because the systems may have very different I/O subsystems. With one 7.2k drive I usually get about 25MB/s on average, with big arrays / good controllers / fast drives you can write much faster. So either the value will be too low (and the log will be infested with those messages) or too high (so it won't be updated very often). And it depends on the number of shared buffers too. I thought it could be something like 5% of shared buffers but it really does not solve the issues. So I think there should be a new GUC. OTOH, it's probably a good idea to log this only when the log_checkpoints is enabled as those who tune checkpoints will enable log_checkpoints in the first case. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On 26 Srpen 2011, 0:39, Tomas Vondra wrote: On 26 Srpen 2011, 0:18, Josh Berkus wrote: Tomas, I'd like to propose a small patch that allows better checkpoint progress monitoring. The patch is quite simple - it adds a new integer GUC checkpoint_update_limit and every time checkpoint writes this number of buffers, it does two things: I'd rather not have a new GUC if we can avoid it. What about just making this some reasonable value (like 1000 buffers) if log_checkpoints = on? I was thinking about that too, but I think no value can fit all cases because the systems may have very different I/O subsystems. With one 7.2k drive I usually get about 25MB/s on average, with big arrays / good controllers / fast drives you can write much faster. So either the value will be too low (and the log will be infested with those messages) or too high (so it won't be updated very often). Hmmm, maybe we could use time instead of number of buffers? Something like every 5 seconds, log the checkpoint progress instead of every time 1000 buffers is written ... That should work on systems regardless of I/O performance. But although using time instead of number of buffers seems like a good idea, I don't think it eliminates the need for a new GUC. (a) Even with time limit, I find it useful to be able to set the limits differently. (b) In some cases it may be useful to enable just basic (current behaviour) checkpoint logging using log_checkpoints, so using it for this new patch may not be a good idea. Although this could be fixed by allowing three values no/basic/detailed instead of just true/false. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers