Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Tom Lane t...@sss.pgh.pa.us wrote: 2. I do not understand the stuff with propagating counts into the top instrumentation node. That seems like it's going to double-count those counts. In any case it is 100% inconsistent to propagate only buffer counts that way and not any other resource usage. I think you should drop the TopInstrument variable and the logic that propagates counts up. It is required by contrib/pg_stat_statements. EXPLAIN wants per-node accumulation, but pg_stat_statements wants the total number. Is it enough to add a PG_TRY block to standard_ExecutorRun() to cleanup TopInstrument on error? I'm working on your other comments, but I cannot remove TopInstrument for pg_state_statements. I considerd other approaches, but all of them require node-dependent routines; for example, adding a function to walk through a plan tree and accumulate instrumentations in it at pg_stat_statements. But it is hard to be maintained on executor nodes changes. Are there any better idea? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes: Tom Lane t...@sss.pgh.pa.us wrote: 2. I do not understand the stuff with propagating counts into the top instrumentation node. It is required by contrib/pg_stat_statements. EXPLAIN wants per-node accumulation, but pg_stat_statements wants the total number. Well, you need to find another way or risk getting the patch rejected altogether. Those global variables are the weakest part of the whole design, and I'm not going to commit a patch that destabilizes the entire system for the sake of a debatable requirement of a contrib module. If you went with the alternative definition I suggested (don't reset the static counters, so that every node includes its children's counts) then the behavior you want would fall out automatically. Or, at the price of running both resettable and non-resettable static counters, you could have pg_stat_statements obtain totals that are independent of any particular instrumentation node. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
On Wed, Oct 14, 2009 at 9:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes: Tom Lane t...@sss.pgh.pa.us wrote: 2. I do not understand the stuff with propagating counts into the top instrumentation node. It is required by contrib/pg_stat_statements. EXPLAIN wants per-node accumulation, but pg_stat_statements wants the total number. Well, you need to find another way or risk getting the patch rejected altogether. Those global variables are the weakest part of the whole design, and I'm not going to commit a patch that destabilizes the entire system for the sake of a debatable requirement of a contrib module. If you went with the alternative definition I suggested (don't reset the static counters, so that every node includes its children's counts) then the behavior you want would fall out automatically. Or, at the price of running both resettable and non-resettable static counters, you could have pg_stat_statements obtain totals that are independent of any particular instrumentation node. I am marking this patch as Returned with Feedback. I hope that it will be resubmitted for a future CommitFest, because I think this could be pretty interesting feature. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Robert Haas robertmh...@gmail.com wrote: Well, you need to find another way or risk getting the patch rejected altogether. ?Those global variables are the weakest part of the whole design, and I'm not going to commit a patch that destabilizes the entire system for the sake of a debatable requirement of a contrib module. I am marking this patch as Returned with Feedback. I hope that it will be resubmitted for a future CommitFest, because I think this could be pretty interesting feature. Ok, I'll reconsider them and re-submit patches for the next commitfest. Maybe I need to split the patch into EXPLAIN-part and contrib-part. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
2009/10/14 Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp: Robert Haas robertmh...@gmail.com wrote: Well, you need to find another way or risk getting the patch rejected altogether. ?Those global variables are the weakest part of the whole design, and I'm not going to commit a patch that destabilizes the entire system for the sake of a debatable requirement of a contrib module. I am marking this patch as Returned with Feedback. I hope that it will be resubmitted for a future CommitFest, because I think this could be pretty interesting feature. Ok, I'll reconsider them and re-submit patches for the next commitfest. Maybe I need to split the patch into EXPLAIN-part and contrib-part. My (limited) experience is that it's usually better to get something incremental committed, even if it's not what you really want. You can always take another crack at the remaining issues later, but if the whole patch gets shot down then you are out of luck. In this case, I think that the auto_explain changes out to be part of the same patch as the core EXPLAIN changes, but if the pg_stat_statement stuff is severable it might make sense to push that off until later. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Robert Haas robertmh...@gmail.com wrote: My (limited) experience is that it's usually better to get something incremental committed, even if it's not what you really want. You can always take another crack at the remaining issues later, but if the whole patch gets shot down then you are out of luck. Yeah, that makes sense. But the partial change should also be a long-term solution ;-). It is hard to determine whether the partial change is a good solution until the whole features works as expected (at least partially). Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
On Wed, Oct 14, 2009 at 9:38 PM, Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote: Robert Haas robertmh...@gmail.com wrote: My (limited) experience is that it's usually better to get something incremental committed, even if it's not what you really want. You can always take another crack at the remaining issues later, but if the whole patch gets shot down then you are out of luck. Yeah, that makes sense. But the partial change should also be a long-term solution ;-). It is hard to determine whether the partial change is a good solution until the whole features works as expected (at least partially). Well, that's an indication that you've chosen too small a piece. But I don't really believe that a change that affects only core EXPLAIN and auto_explain is too small a piece to be independently useful. If it is, the whole feature is probably badly conceived in the first place... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes: Here is an update version of buffer usage patch. I started to look at this patch, and I have a few comments: 1. I was expecting this patch to get rid of ShowBufferUsage() and friends altogether, instead of adding yet more static counters to support them. Isn't that stuff pretty well superseded by having EXPLAIN support? 2. I do not understand the stuff with propagating counts into the top instrumentation node. That seems like it's going to double-count those counts. In any case it is 100% inconsistent to propagate only buffer counts that way and not any other resource usage. I think you should drop the TopInstrument variable and the logic that propagates counts up. 3. I don't believe that you've sufficiently considered the problem of restoring the previous value of CurrentInstrument after an error. It is not at all adequate to do it in postgres.c; consider subtransactions for example. However, so far as I can see that variable is useless anyway. Couldn't you just drop both that and the prev link? (If you keep TopInstrument then the same objection applies to it.) 4. I don't believe this counting scheme works, except in the special case where all buffer access happens in leaf plan nodes (which might be enough if it weren't for Sort, Materialize, Hash, etc). It looks to me like counts will be transferred into the instrumentation node for the next plan node to stop execution, which could be a descendant of the node that really ought to get charged. You could deal with #4 by having the low-level I/O routines accumulate counts directly into *CurrentInstrument and not have static I/O counters at all, but then you'd have to contend with fixing #3 properly instead of just eliminating that global variable. It might be better to add a start field to struct Instrumentation for each counter, and do something like this: * StartNode copies static counter into start field * StopNode computes delta = static counter - start field, then adds delta to node's count and resets counter to start The reason for the reset is so that the I/O isn't double counted by parent nodes. If you wanted buffer I/O to be charged to the node causing it *and* to all parent nodes, which would be more consistent with the way we charge CPU time, then don't do the reset. Offhand though that seems to me like it'd be more surprising than useful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
On Wed, Sep 30, 2009 at 10:40 PM, Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote: Euler Taveira de Oliveira eu...@timbira.com wrote: But there are some confusions in postgres; ReadBufferCount and BufferHitCount are used for get and hit, but heap_blks_read and heap_blks_hit are used for read and hit in pg_statio_all_tables. I see. :( I fixed the confusions of get, hit and read in your patch. long num_hit = ReadBufferCount + ReadLocalBufferCount; long num_read = num_hit - BufferHitCount - LocalBufferHitCount; should be long num_get = ReadBufferCount + ReadLocalBufferCount; long num_hit = BufferHitCount + LocalBufferHitCount; long num_read = num_get - num_hit; ReadBufferCount means number of buffer access :( Patch attached. I took a look at this today and I have a couple of comments. The basic functionality looks useful, but I think the terminology is too terse. Specific commens: 1. In the EXPLAIN output, I think that the buffers information should be output on its own line, rather than appended to the line that already contains costs and execution times. The current output doesn't include the word buffers or blocks anywhere, which seems to me to be a critical flaw. I would suggest something like Blocks Read: %ld Hit: %ld Temp Read: %ld\n. See the way we handle output of sort type and space usage, for example. 2. Similarly, in pg_stat_statements, the Counters structure could easily use the same names for the structure members that we already use in e.g. pg_stat_database - blks_hit, blks_read, and, say, blks_temp_read. In fact I tend to think we should stick with blocks rather than buffers overall, for consistency with what the system does elsewhere. 3. With respect to the doc changes in explain.sgml, we consistently use disk rather than disc in the documentation; but it may not be necessary to use that word at all, and I think the paragraph can be tightened up a bit: Include information on the number of blocks read, the number of those that are hits (already in shared buffers and do not need to be read in), and the number of those that are reads on temporary, backend-local buffers. This parameter requires that the literalANALYZE/literal parameter also be used. This parameter defaults to literalFALSE/literal. 4. Instrumentation stack is broken doesn't seem terribly helpful in understanding what has gone wrong. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Robert Haas robertmh...@gmail.com wrote: 1. I would suggest something like Blocks Read: %ld Hit: %ld Temp Read: %ld\n. See the way we handle output of sort type and space usage, for example. I have some questions: * Did you use single space and double spaces in your example intentionally? * Should we use lower cases here? * Can I use temp instead of Temp Read to shorten the name? 2. Similarly, in pg_stat_statements, the Counters structure could easily use the same names for the structure members that we already use in e.g. pg_stat_database - blks_hit, blks_read, and, say, blks_temp_read. In fact I tend to think we should stick with blocks rather than buffers overall, for consistency with what the system does elsewhere. I agree to rename them into blks_*, but EXPLAIN (blocks) might be misleading; EXPLAIN (buffer) can be interpreted as buffer usage, but normally we don't call it block usage. My suggestion is: * EXPLAIN (buffers) prints (blocks read: %ld hit: %ld temp: %ld) * auto_explain.log_buffers are not changed * pg_stat_statements uses blks_hit and blks_read 4. Instrumentation stack is broken doesn't seem terribly helpful in understanding what has gone wrong. This message is only for hackers and should not occur. Assert() might be ok instead. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
On Sun, Oct 4, 2009 at 11:22 PM, Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote: Robert Haas robertmh...@gmail.com wrote: 1. I would suggest something like Blocks Read: %ld Hit: %ld Temp Read: %ld\n. See the way we handle output of sort type and space usage, for example. I have some questions: * Did you use single space and double spaces in your example intentionally? No, that was unintentional. * Should we use lower cases here? No. We don't anywhere else in explain.c. * Can I use temp instead of Temp Read to shorten the name? I can't tell what that means without reading the source code. I think clarity should take precedence over brevity. 2. Similarly, in pg_stat_statements, the Counters structure could easily use the same names for the structure members that we already use in e.g. pg_stat_database - blks_hit, blks_read, and, say, blks_temp_read. In fact I tend to think we should stick with blocks rather than buffers overall, for consistency with what the system does elsewhere. I agree to rename them into blks_*, but EXPLAIN (blocks) might be misleading; EXPLAIN (buffer) can be interpreted as buffer usage, but normally we don't call it block usage. My suggestion is: * EXPLAIN (buffers) prints (blocks read: %ld hit: %ld temp: %ld) * auto_explain.log_buffers are not changed * pg_stat_statements uses blks_hit and blks_read I agree. 4. Instrumentation stack is broken doesn't seem terribly helpful in understanding what has gone wrong. This message is only for hackers and should not occur. Assert() might be ok instead. Hmm, I think I like the idea of an Assert(). Logging a cryptic message at DEBUG2 doesn't seem sufficient for a can't-happen condition that probably indicates a serious bug in the code. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Here is an update version of buffer usage patch. * All buffers_* and bufs_* are renamed to blks_*. * 'disc' = 'disk' in documentation * Replace debug-log to Assert(). * Fix a bug in ResetLocalBufferUsage(). log_xxx_stats had not worked. Robert Haas robertmh...@gmail.com wrote: ?* Can I use temp instead of Temp Read to shorten the name? I can't tell what that means without reading the source code. I think clarity should take precedence over brevity. I used temp_blks_read because we have idx_blks_read in pg_statio_xxx. =# \d pg_stat_statements View public.pg_stat_statements Column | Type | Modifiers +--+--- userid | oid | dbid | oid | query | text | calls | bigint | total_time | double precision | rows | bigint | blks_hit | bigint | blks_read | bigint | temp_blks_read | bigint | =# SET work_mem = '1MB'; =# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pgbench_accounts ORDER BY bid; QUERY PLAN --- Sort (cost=21913.32..22163.33 rows=15 width=97) (actual time=81.345..99.054 rows=10 loops=1) Sort Key: bid Sort Method: external sort Disk: 10472kB Blocks Hit: 0 Read: 0 Temp Read: 1309 - Seq Scan on pgbench_accounts (cost=0.00..2667.05 rows=15 width=97) (actual time=0.018..23.129 rows=10 loops=1) Blocks Hit: 74 Read: 1694 Temp Read: 0 Total runtime: 105.238 ms (7 rows) =# SET work_mem = '18MB'; =# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pgbench_accounts ORDER BY bid; QUERY PLAN --- Sort (cost=10972.32..11222.33 rows=15 width=97) (actual time=35.437..43.069 rows=10 loops=1) Sort Key: bid Sort Method: quicksort Memory: 17916kB Blocks Hit: 0 Read: 0 Temp Read: 0 - Seq Scan on pgbench_accounts (cost=0.00..2667.05 rows=15 width=97) (actual time=0.028..15.030 rows=10 loops=1) Blocks Hit: 32 Read: 1635 Temp Read: 0 Total runtime: 52.026 ms (7 rows) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center buffer_usage_20091005.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Itagaki Takahiro escreveu: I fixed the confusions of get, hit and read in your patch. Works for me. Will mark it ready for a committer. PS BTW, your patch (20091001112006.9c36.52131...@oss.ntt.co.jp) doesn't seem to be in archives.p.o. though I've received a copy from the server. -- Euler Taveira de Oliveira http://www.timbira.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Euler Taveira de Oliveira wrote: Itagaki Takahiro escreveu: I fixed the confusions of get, hit and read in your patch. Works for me. Will mark it ready for a committer. PS BTW, your patch (20091001112006.9c36.52131...@oss.ntt.co.jp) doesn't seem to be in archives.p.o. though I've received a copy from the server. That's indeed very strange -- I have it locally and I wasn't CCed, so Majordomo must have delivered it. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Alvaro Herrera wrote: Euler Taveira de Oliveira wrote: Itagaki Takahiro escreveu: I fixed the confusions of get, hit and read in your patch. Works for me. Will mark it ready for a committer. PS BTW, your patch (20091001112006.9c36.52131...@oss.ntt.co.jp) doesn't seem to be in archives.p.o. though I've received a copy from the server. That's indeed very strange -- I have it locally and I wasn't CCed, so Majordomo must have delivered it. Something was wrong with last month's archive. For some reason it had 96000 files in that directory. I have rerun mhonarc on it and it has normalized now (~2100 files). -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Euler Taveira de Oliveira eu...@timbira.com wrote: But there are some confusions in postgres; ReadBufferCount and BufferHitCount are used for get and hit, but heap_blks_read and heap_blks_hit are used for read and hit in pg_statio_all_tables. I see. :( I fixed the confusions of get, hit and read in your patch. longnum_hit = ReadBufferCount + ReadLocalBufferCount; longnum_read = num_hit - BufferHitCount - LocalBufferHitCount; should be longnum_get = ReadBufferCount + ReadLocalBufferCount; longnum_hit = BufferHitCount + LocalBufferHitCount; longnum_read = num_get - num_hit; ReadBufferCount means number of buffer access :( Patch attached. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center buffer_usage_20091001.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Hi Itagaki-san, I'm reviewing your patch. Your patch is in good shape. It applies cleanly. All of the things are built as intended (including the two contrib modules). It doesn't include docs but I wrote it. Basically, I produced another patch (that are attached) correcting some minor gripes; docs are included too. The comments are in-line. static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; + static bool auto_explain_log_buffer = false; Rename it to auto_explain_log_buffers. That's because I renamed the option for plural form. See above. es.verbose = auto_explain_log_verbose; + es.buffer = auto_explain_log_buffer; Change this check to look at es.analyze too. So the es.buffers will only be enabled iif the es.analyze is enabled too. + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, return type must be a row type); + per_query_ctx = rsinfo-econtext-ecxt_per_query_memory; oldcontext = MemoryContextSwitchTo(per_query_ctx); ! tupdesc = CreateTupleDescCopy(tupdesc); Out of curiosity, any reason for this change? else if (strcmp(opt-defname, costs) == 0) es.costs = defGetBoolean(opt); + else if (strcmp(opt-defname, buffer) == 0) + es.buffer = defGetBoolean(opt); I decided to change buffer to buffers. That's because we already have costs and the statistics will not be about one kind of buffer so plural form sounds more natural. + if (es-format == EXPLAIN_FORMAT_TEXT) + { + appendStringInfo(es-str, (gets=%ld reads=%ld temp=%ld), + num_gets, num_reads, num_temp); Rename gets and reads to hit and read. Maybe we could prefix it with buf_ or something else. Rename num_gets and num_reads to num_hit and num_read. The later terminology is used all over the code. + } + else + { + ExplainPropertyLong(Buffer Gets, num_gets, es); + ExplainPropertyLong(Buffer Reads, num_reads, es); + ExplainPropertyLong(Buffer Temp, num_temp, es); I didn't like these terminologies. I came up with Hit Buffers, Read Buffers, and Temp Buffers. I confess that I don't like the last ones. Read Buffers? We're reading from disk blocks. Read Blocks? Read Disk Blocks? Read Data Blocks? Temp Buffers? It could be temporary sort files, hash files (?), or temporary relations. Hit Local Buffers? Local Buffers? Hit Temp Buffers? #include parser/parsetree.h + #include storage/buf_internals.h It's not used. Removed. + CurrentInstrument = instr-prev; + } + else + elog(WARNING, Instrumentation stack is broken); WARNING? I changed it to DEBUG2 and return immediately (as it does some lines of code above). + /* for log_[parser|planner|executor|statement]_stats */ + static long GlobalReadBufferCount; + static long GlobalReadLocalBufferCount; + static long GlobalBufferHitCount; + static long GlobalLocalBufferHitCount; + static long GlobalBufferFlushCount; + static long GlobalLocalBufferFlushCount; + static long GlobalBufFileReadCount; + static long GlobalBufFileWriteCount; + I'm not sure if this is the right place for these counters. Maybe we should put it in buf_init.c. Ideas? boolcosts; /* print costs */ + boolbuffer; /* print buffer stats */ Rename it to buffers. + /* Buffer usage */ + longbuffer_gets;/* # of buffer reads */ + longbuffer_reads; /* # of disk reads */ + longbuffer_temp;/* # of temp file reads */ Rename them to buffers_hit, buffers_read, and buffers_temp. I didn't test this changes with big queries because I don't have some at hand right now. Also, I didn't notice any slowdowns caused by the patch. Comments? If none, it is ready for a committer. -- Euler Taveira de Oliveira http://www.timbira.com/ buffer_usage-20090928.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Euler Taveira de Oliveira eu...@timbira.com wrote: I'm reviewing your patch. Your patch is in good shape. It applies cleanly. All of the things are built as intended (including the two contrib modules). It doesn't include docs but I wrote it. Basically, I produced another patch (that are attached) correcting some minor gripes; docs are included too. The comments are in-line. Thanks. Except choice of words, can I think the basic architecture of the patch is ok? The codes to handle global variables like ReadBufferCount and GlobalReadBufferCount could be rewrite in cleaner way if we could drop supports of log_{parser|planner|executor|statement}_stats. + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, return type must be a row type); + per_query_ctx = rsinfo-econtext-ecxt_per_query_memory; oldcontext = MemoryContextSwitchTo(per_query_ctx); ! tupdesc = CreateTupleDescCopy(tupdesc); Out of curiosity, any reason for this change? That's because the new code is cleaner, I think. Since the result tuple type is defined in OUT parameters, we don't have to re-define the result with CreateTemplateTupleDesc(). + appendStringInfo(es-str, (gets=%ld reads=%ld temp=%ld), + num_gets, num_reads, num_temp); Rename gets and reads to hit and read. Maybe we could prefix it with buf_ or something else. Rename num_gets and num_reads to num_hit and num_read. The later terminology is used all over the code. We should define the meanings of get and hit before rename them. I'd like to propose the meanings as following: * get : number of page access (= hit + read) * hit : number of cache read (no disk read) * read : number of disk read (= number of read() calls) But there are some confusions in postgres; ReadBufferCount and BufferHitCount are used for get and hit, but heap_blks_read and heap_blks_hit are used for read and hit in pg_statio_all_tables. Can I rename ReadBufferCount to BufferGetCount? And which values should we show in EXPLAIN and pg_stat_statements? (two of the three are enough) I didn't like these terminologies. I came up with Hit Buffers, Read Buffers, and Temp Buffers. I confess that I don't like the last ones. Read Buffers? We're reading from disk blocks. Read Blocks? Read Disk Blocks? Read Data Blocks? Temp Buffers? It could be temporary sort files, hash files (?), or temporary relations. Hit Local Buffers? Local Buffers? Hit Temp Buffers? I borrowed the terms Buffer Gets and Buffer Reads from STATSPACK report in Oracle Database. But I'm willing to rename them if appropriate. http://www.oracle.com/apps_benchmark/doc/awrrpt_20090325b_900.html#600 Presently Temp Buffers contains temporary sort files, hash files, and materialized executor plan. Local buffer statistics for temp tables are not included here but merged with shared buffer statistics. Are there any better way to group them? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Itagaki Takahiro escreveu: Thanks. Except choice of words, can I think the basic architecture of the patch is ok? The codes to handle global variables like ReadBufferCount and GlobalReadBufferCount could be rewrite in cleaner way if we could drop supports of log_{parser|planner|executor|statement}_stats. Yes, it is. I'm afraid someone is relying on that piece of code. We can ask people if it is ok to deprecated it; but it should be removed after 2 releases or so. BTW, Isn't it make sense to move the Global* variables to buf_init.c, is it? We should define the meanings of get and hit before rename them. I'd like to propose the meanings as following: * get : number of page access (= hit + read) * hit : number of cache read (no disk read) * read : number of disk read (= number of read() calls) +1. But there are some confusions in postgres; ReadBufferCount and BufferHitCount are used for get and hit, but heap_blks_read and heap_blks_hit are used for read and hit in pg_statio_all_tables. I see. :( Can I rename ReadBufferCount to BufferGetCount? And which values should we show in EXPLAIN and pg_stat_statements? (two of the three are enough) Do you want to include number of page access in the stats? If not, we don't need to rename the variables for now (maybe a separate patch?). And IMHO we should include hit and read because get is just a simple math. I borrowed the terms Buffer Gets and Buffer Reads from STATSPACK report in Oracle Database. But I'm willing to rename them if appropriate. http://www.oracle.com/apps_benchmark/doc/awrrpt_20090325b_900.html#600 Hmm... I don't have a strong opinion about those names as I said. So if you want to revert the old names... Presently Temp Buffers contains temporary sort files, hash files, and materialized executor plan. Local buffer statistics for temp tables are not included here but merged with shared buffer statistics. Are there any better way to group them? Are you sure? Looking at ReadBuffer_common() function I see an 'if (isLocalBuf)' test. As I said your patch is in good shape and if we solve those terminologies, it is done for a committer. Would you care to submit another patch if you want to do some modifications? -- Euler Taveira de Oliveira http://www.timbira.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements (review)
Euler Taveira de Oliveira eu...@timbira.com writes: Itagaki Takahiro escreveu: Thanks. Except choice of words, can I think the basic architecture of the patch is ok? The codes to handle global variables like ReadBufferCount and GlobalReadBufferCount could be rewrite in cleaner way if we could drop supports of log_{parser|planner|executor|statement}_stats. Yes, it is. I'm afraid someone is relying on that piece of code. If we have a better substitute, I think there'd be nothing wrong with removing those features. They were never anything but pretty crufty anyway, and they are *not* a compatibility issue because applications have no direct way to access those stats. However, you'd have to be sure that the substitute covers all the use-cases for the old stats ... which strikes me as a lot more territory than this patch has proposed to cover. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Buffer usage in EXPLAIN and pg_stat_statements
Here is a proposal to add buffer usage information to EXPLAIN and contrib/pg_stat_statements. We can retrieve new values 'gets', 'reads' and 'temp': - gets : total number of buffer pool access - reads : total number of data file access - temp : total number of temp file access (sort) In EXPLAIN, we can use EXPLAIN (ANALYZE, BUFFER) ... syntax. Each executor node shows buffer usage only in it; parent nodes don't contain buffer usages in their sub nodes. In pg_stat_statements, new 3 columns are added to the view. We can determine queries that consume I/O bandwidth using SELECT * FROM pg_stat_statements ORDER BY reads DESC. We will find out bad queries easily using those buffer and disk access information in addition to duration statistics. I implementd this feature using an instrumentation stack. A global variable CurrentInstrument points top of the stack, and each Instrumentation are linked with newly added 'prev' field. The stack must be reset even on error because each innstrumentation might have been deallocated already. I added codes to reset stack in main loop of backend for the purpose. TopInstrument is a special node that sums up all of the child nodes. It tracks QueryDesc.totaltime and used in pg_stat_statements. There might be another idea that walking around on planstate tree and gathering all counters in the module, but very complex codes are needed. I chose a simple way. I'll write documentations if this design is accepted. Comments welcome. Output samples are below: [EXPLAIN] =# EXPLAIN (ANALYZE, BUFFER) SELECT * FROM pgbench_accounts ORDER BY bid; QUERY PLAN -- Sort (cost=...) (actual ...) (gets=0 reads=0 temp=1309) Sort Key: bid Sort Method: external sort Disk: 10472kB - Seq Scan on pgbench_accounts (cost=...) (actual ...) (gets=1798 reads=1644 temp=0) Total runtime: 75.867 ms [contrib/pg_stat_statements] =# SELECT query, gets, reads FROM pg_stat_statements ORDER BY gets DESC LIMIT 4; query | gets | reads --+---+--- UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2; | 58628 | 1 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2; | 26999 | 1929 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2; | 25474 | 1 SELECT abalance FROM pgbench_accounts WHERE aid = $1;| 19950 | 0 (4 rows) Regards, --- ITAGAKI Takahiro NTT Open Source Software Center buffer_usage-20090817.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buffer usage in EXPLAIN and pg_stat_statements
2009/8/17 Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp: Here is a proposal to add buffer usage information to EXPLAIN and contrib/pg_stat_statements. We can retrieve new values 'gets', 'reads' and 'temp': - gets : total number of buffer pool access - reads : total number of data file access - temp : total number of temp file access (sort) In EXPLAIN, we can use EXPLAIN (ANALYZE, BUFFER) ... syntax. Each executor node shows buffer usage only in it; parent nodes don't contain buffer usages in their sub nodes. In pg_stat_statements, new 3 columns are added to the view. We can determine queries that consume I/O bandwidth using SELECT * FROM pg_stat_statements ORDER BY reads DESC. We will find out bad queries easily using those buffer and disk access information in addition to duration statistics. I implementd this feature using an instrumentation stack. A global variable CurrentInstrument points top of the stack, and each Instrumentation are linked with newly added 'prev' field. The stack must be reset even on error because each innstrumentation might have been deallocated already. I added codes to reset stack in main loop of backend for the purpose. TopInstrument is a special node that sums up all of the child nodes. It tracks QueryDesc.totaltime and used in pg_stat_statements. There might be another idea that walking around on planstate tree and gathering all counters in the module, but very complex codes are needed. I chose a simple way. I'll write documentations if this design is accepted. Comments welcome. Output samples are below: [EXPLAIN] =# EXPLAIN (ANALYZE, BUFFER) SELECT * FROM pgbench_accounts ORDER BY bid; QUERY PLAN -- Sort (cost=...) (actual ...) (gets=0 reads=0 temp=1309) Sort Key: bid Sort Method: external sort Disk: 10472kB - Seq Scan on pgbench_accounts (cost=...) (actual ...) (gets=1798 reads=1644 temp=0) Total runtime: 75.867 ms [contrib/pg_stat_statements] =# SELECT query, gets, reads FROM pg_stat_statements ORDER BY gets DESC LIMIT 4; query | gets | reads --+---+--- UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2; | 58628 | 1 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2; | 26999 | 1929 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2; | 25474 | 1 SELECT abalance FROM pgbench_accounts WHERE aid = $1; | 19950 | 0 (4 rows) It's should be nice Pavel Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers