Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
I noticed a minor omission in the patch as committed. Attached patch
corrects this.

-- 
Peter Geoghegan

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** generate_normalized_query
*** 2726,2732 ****
  		if (tok_len < 0)
  			continue;			/* ignore any duplicates */
  
! 		/* Copy next chunk, or as much as will fit */
  		len_to_wrt = off - last_off;
  		len_to_wrt -= last_tok_len;
--- 2726,2732 ----
  		if (tok_len < 0)
  			continue;			/* ignore any duplicates */
  
! 		/* Copy next chunk */
  		len_to_wrt = off - last_off;
  		len_to_wrt -= last_tok_len;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> I noticed a minor omission in the patch as committed. Attached patch
> corrects this.

Duh.  Thanks.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Meh.  This line of argument seems to reduce to "we don't need to worry
>> about performance of this code path because it won't be reached often".

> I think I may have over-elaborated, giving you the false impression
> that this was something I felt strongly about. I'm glad that the
> overhead has been shown to be quite low, and I think that lexing
> without the lock held will be fine.

OK.  Committed after a couple of small further revisions.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Mon, Jan 27, 2014 at 12:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> OK.  Committed after a couple of small further revisions.

Great. Thank you.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> On Sat, Jan 25, 2014 at 2:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Well, it's fairly expensive to generate that text, in the case of a
>> large/complex statement.  It's possible that continuing to hold the lock
>> is nonetheless the right thing to do because release+reacquire is also
>> expensive; but there is no proof of that AFAIK, and I believe that
>> release+reacquire is not likely to be expensive unless the lock is
>> heavily contended, which would be exactly the conditions under which
>> keeping it wouldn't be such a good idea anyway.

> My reason for only acquiring the shared lock once, and performing text
> normalization with it held, was that in practice most query texts are
> not all that expensive to lex/normalize, and the cost of holding the
> lock for the vast majority of sessions (that won't experience
> contention) is nil.

Meh.  This line of argument seems to reduce to "we don't need to worry
about performance of this code path because it won't be reached often".
I don't particularly buy that; it may be true in some usage patterns, but
probably not all.  And if you *do* buy it, it can be turned around to say
that we needn't worry about the costs of an extra locking cycle, anyway.

However, the only thing that is really going to settle this argument is
data, so here's a simple test case.  I made a script that just consisted
of many repetitions of

	\dd
	SELECT pg_stat_statements_reset();

and ran it in psql while tracing the system with perf.  (\dd produces a
query that is middlingly complex, but certainly not a patch on those I've
seen produced by many real applications.  The point of the reset is to
force us through the stats-entry-creation code path each time.)

The hits above 1%, in a non-assert-enabled backend:

    13.92%    34576  postmaster  [kernel.kallsyms]  [k] 0x8103ed21
     8.64%    21645  postmaster  postgres           [.] AllocSetAlloc
     5.09%    12502  postmaster  postgres           [.] SearchCatCache
     2.37%     6025  postmaster  postgres           [.] MemoryContextStrdup
     2.22%     5624  postmaster  libc-2.12.so       [.] __strcmp_sse42
     1.93%     4860  postmaster  postgres           [.] core_yylex
     1.84%     4501  postmaster  postgres           [.] base_yyparse
     1.83%     4601  postmaster  postgres           [.] ScanKeywordLookup
     1.54%     3893  postmaster  postgres           [.] copyObject
     1.54%     3849  postmaster  postgres           [.] MemoryContextAllocZeroAligned
     1.30%     3237  postmaster  postgres           [.] expression_tree_walker
     1.23%     3069  postmaster  postgres           [.] palloc
     1.15%     2903  postmaster  libc-2.12.so       [.] memcpy
     1.04%     2566  postmaster  postgres           [.] hash_uint32

So 3.76% of the entire runtime is going into core_yylex+ScanKeywordLookup,
and presumably half of that can be blamed on generate_normalized_query.
On the other hand, the earliest mention of lock-related functions in the
profile is

     0.17%      411  postmaster  postgres           [.] LWLockAcquire

and LWLockRelease is down here:

     0.11%      264  postmaster  postgres           [.] LWLockRelease

So on this example, the *total* cost of LWLocks, across the entire
backend, is something like 15% of the cost of generate_normalized_query.
This seems to me to be reasonably strong evidence that we should worry
about that cost, and that releasing/reacquiring the pgss LWLock is
trivial by comparison.  Of course, if the lock were contended then the
locking cost would go up substantially --- but that would also be a
scenario in which it'd be important not to hold the lock unnecessarily.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Meh.  This line of argument seems to reduce to "we don't need to worry
> about performance of this code path because it won't be reached often".

I think I may have over-elaborated, giving you the false impression
that this was something I felt strongly about. I'm glad that the
overhead has been shown to be quite low, and I think that lexing
without the lock held will be fine.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> BTW, have you got any sort of test scenario you could share for this
>> purpose?  I'm sure I could build something, but if you've already done
>> it ...

> I simply ran the standard regression tests, and then straced a backend
> as it executed the pgss-calling query. I'm not sure that I've understood
> your question. As previously noted, my approach to testing this patch
> involved variations of: running make installcheck-parallel

Do the regression tests fail for you when doing this?  What I see is
a duplicate occurrence of an escape_string_warning bleat:

*** /home/postgres/pgsql/src/test/regress/expected/plpgsql.out	Fri Jan  3 17:07:46 2014
--- /home/postgres/pgsql/src/test/regress/results/plpgsql.out	Sat Jan 25 13:37:20 2014
***************
*** 4568,4573 ****
--- 4568,4579 ----
  HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
  QUERY:  SELECT 'foo\\bar\041baz'
  CONTEXT:  PL/pgSQL function strtest() line 4 at RETURN
+ WARNING:  nonstandard use of \\ in a string literal
+ LINE 1: SELECT 'foo\\bar\041baz'
+                ^
+ HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
+ QUERY:  SELECT 'foo\\bar\041baz'
+ CONTEXT:  PL/pgSQL function strtest() line 4 at RETURN
     strtest   
  -------------
   foo\bar!baz

which seems to happen because generate_normalized_query() reruns the
core lexer on the given statement, and if you got a warning the first
time, you'll get another one.

It seems this only happens with rather small values of
pg_stat_statements.max, else there's already a "RETURN text-constant"
query in the hashtable so we don't need to do reparsing right here.
(I'm testing with max = 100 to exercise the garbage collection code.)

I'm not sure this is a big deal for real-world use, because surely by
now everyone has either fixed their escape-string issues or disabled
the warning.  But it's annoying for testing.  The big picture of course
is that having this warning issued this way is broken anyhow, and has
been since day one, so it's not entirely pg_stat_statements' fault.
I'm just wondering if it's worth taking the trouble to turn off the
warning when reparsing.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sat, Jan 25, 2014 at 11:04 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Do the regression tests fail for you when doing this?  What I see is
> a duplicate occurrence of an escape_string_warning bleat, which seems
> to happen because generate_normalized_query() reruns the core lexer on
> the given statement, and if you got a warning the first time, you'll
> get another one.

Oh, yes, I noticed that and reached the same conclusion. Sorry, I
probably should have mentioned this pro-actively.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sat, Jan 25, 2014 at 1:11 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I've chosen to handle failures to load query text data by just
> returning NULL for that query text, which seems reasonable given the
> new mindset that the query text is auxiliary data less important than
> the actual counts.

I guess that's okay.

> I've not done much more than smoke-testing on this; I'm thinking about
> hot-wiring the code to sometimes force it through the error paths,
> just to make sure they act as expected.  I've also not done any real
> performance testing.

I was never too worried about the impact on performance, because any
price that must be paid is paid only when new entries are created,
which is typically a very rare event that was already expensive due to
requiring an exclusive lock. This patch will make it much rarer in
practice by increasing the pg_stat_statements.max default, and thus
reducing the impact of such exclusive locks on *all* sessions, not just
those executing less popular queries.

I don't know about you, but the reason that I didn't performance test
this is that it's really hard to think of a representative scenario
where it could possibly matter, even though it seems like there might
be one.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Why do you think it's better to release the shared lock while
generating a normalized query text, only to acquire it once more? I'm
not suggesting that it's the wrong thing to do. I'm curious about the
reasoning around assessing the costs.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> Why do you think it's better to release the shared lock while
> generating a normalized query text, only to acquire it once more? I'm
> not suggesting that it's the wrong thing to do. I'm curious about the
> reasoning around assessing the costs.

Well, it's fairly expensive to generate that text, in the case of a
large/complex statement.  It's possible that continuing to hold the lock
is nonetheless the right thing to do because release+reacquire is also
expensive; but there is no proof of that AFAIK, and I believe that
release+reacquire is not likely to be expensive unless the lock is
heavily contended, which would be exactly the conditions under which
keeping it wouldn't be such a good idea anyway.  So I'd prefer to leave
it doing what it did before, until there's some concrete evidence that
keeping the lock is a better idea.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sat, Jan 25, 2014 at 2:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Well, it's fairly expensive to generate that text, in the case of a
> large/complex statement.  It's possible that continuing to hold the lock
> is nonetheless the right thing to do because release+reacquire is also
> expensive; but there is no proof of that AFAIK, and I believe that
> release+reacquire is not likely to be expensive unless the lock is
> heavily contended, which would be exactly the conditions under which
> keeping it wouldn't be such a good idea anyway.

My reason for only acquiring the shared lock once, and performing text
normalization with it held, was that in practice most query texts are
not all that expensive to lex/normalize, and the cost of holding the
lock for the vast majority of sessions (that won't experience
contention) is nil. Acquiring the shared lock twice for something like
that struck me as unjustified. I thought it was closer to the original
coding to lex with the lock, because of course the original coding will
only ever acquire the shared lock once. The original lex-with-no-lock
coding is only justified by "well, might as well".

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> Actually, I think the whole thing is rather badly designed, as warm
>> cache or no you're still proposing to do thousands of kernel calls
>> while holding a potentially contended LWLock.  I suggest what we do is
>> (1) read in the whole file, (2) acquire the lock, (3) re-read the whole
>> file in the same buffer, (4) work from the buffer.

> fseeko() and fread() are part of the standard library, not any kernel.
> I don't believe that what I've done in qtext_fetch() implies thousands
> of kernel calls, or thousands of context switches.

I ran an strace on a pg_stat_statements backend with a full ~5,000
entries. It seems that glibc is actually content to always make lseek()
and read() syscalls in the event of an fseek(), even though it does
maintain its own buffer and could in principle have a lot more gumption
about that [1]. I agree that we should copy everything into a buffer in
one go.

I think that making the LWLocking duration as brief as possible isn't
critically important. Automated tools will only be interested in new
query texts (implying possible I/O while share locking) as they observe
new entries. But new entries are the only thing worth considering that
may potentially block on that shared lock (although not as they write
out their query texts, which almost always just requires a shared
lock). New entries ought to be a very rare occurrence compared to the
execution of queries - certainly, on the busy production systems I use
pg_stat_statements on, it may be weeks before a new query is observed
(uh, with the possible exception of *my* ad-hoc pg_stat_statements
queries).

On my laptop, "explain analyze select * from pg_stat_statements"
indicates a total runtime of ~9ms with 5,000 regression test entries,
even with the (still modest) overhead of doing all those read()/lseek()
calls. So we're talking about a small impact on a new entry, which will
only be affected once, and which was always going to have to write out
its query text to file anyway. On top of all this, in general it's very
improbable that any given new entry will be affected at all, because
two unlikely things need to happen at the same time for that to be
possible.

The most important aspect of keeping the overhead of pg_stat_statements
low is that there not be too much cache pressure. Obviously the huge
reduction in its shared memory footprint that this patch allows helps
with that.

[1] http://www.pixelbeat.org/programming/stdio_buffering/

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> BTW shouldn't there be an fflush() in qtext_store?

> I don't think so, no. Whenever qtext_store() callers don't fflush()
> before releasing their exclusive lock, they FreeFile() before doing so
> instead. In the case of pgss_shmem_startup() there is no lock held
> anyway, because it doesn't matter, for the same reason it doesn't
> matter that we're modifying shmem structures in a way that ordinarily
> necessitates an exclusive lock. Why fflush() every qtext_store() call
> if there is expected to be more than one, as is the case for several
> callers?

Having said all that, it occurs to me now that I could have been
smarter about where I fflush(). I'm pretty sure pgss_store() should
have two fflush() calls. The first can occur with just a shared LWLock
held, before the lock strength promotion interim (you probably noticed
that I no longer generate a normalized query text in that interim, for
reasons that are obvious). The second fflush() may occur only in the
very unlikely event of a garbage collection necessitating a new
qtext_store() call with an exclusive lock held, a few lines after the
first fflush(). There is no need to fflush() with the exclusive lock
held the vast majority of the time.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <p...@heroku.com> wrote:
>>> Actually, I think the whole thing is rather badly designed, as warm
>>> cache or no you're still proposing to do thousands of kernel calls
>>> while holding a potentially contended LWLock.  I suggest what we do is
>>> (1) read in the whole file, (2) acquire the lock, (3) re-read the
>>> whole file in the same buffer, (4) work from the buffer.

>> fseeko() and fread() are part of the standard library, not any kernel.
>> I don't believe that what I've done in qtext_fetch() implies thousands
>> of kernel calls, or thousands of context switches.

> I ran an strace on a pg_stat_statements backend with a full ~5,000
> entries. It seems that glibc is actually content to always make lseek()
> and read() syscalls in the event of an fseek(), even though it does
> maintain its own buffer and could in principle have a lot more gumption
> about that [1]. I agree that we should copy everything into a buffer in
> one go.

Yeah, that was what I thought might happen.  Even if glibc were smarter,
a lot of other libc implementations aren't.  Also, ISTM that traversal
of the hash table will generally lead to more-or-less-random access into
the text file.  So unless you assume that libc has mmap'd the whole
file, it'd have to do a lot of kernel calls anyway to get different
pages of the file into its buffer at different times.

I think that "read the whole file" is probably going to net out not any
more complex as far as our code goes, and it'll be making a lot fewer
assumptions about how smart the libc level is and what's happening at
the kernel level.  I'll have a go at coding it, anyway.

> I think that making the LWLocking duration as brief as possible isn't
> critically important.

Maybe, but I don't like the idea that calling pg_stat_statements()
frequently could result in random glitches in response time for other
queries.

> Automated tools will only be interested in new query texts (implying
> possible I/O while share locking) as they observe new entries.

This argument would be more plausible if there were a way to fetch the
text for a previously-observed entry without invoking
pg_stat_statements().  I'm surprised you've not submitted a patch to
add such a function.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> I ran an strace on a pg_stat_statements backend with a full ~5,000
> entries.

BTW, have you got any sort of test scenario you could share for this
purpose?  I'm sure I could build something, but if you've already done
it ...

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> BTW, have you got any sort of test scenario you could share for this
> purpose?  I'm sure I could build something, but if you've already done
> it ...

I simply ran the standard regression tests, and then straced a backend
as it executed the pgss-calling query. I'm not sure that I've
understood your question. As previously noted, my approach to testing
this patch involved variations of:

* running make installcheck-parallel

* concurrently, running pgbench with multiple clients each calling
  pg_stat_statements() and pg_stat_statements_reset(). The latter was
  sometimes rigged to do a direct garbage collection to stress test
  things.

My expectation was that the sanity checking done everywhere would
complain if there were any race conditions or other bugs. This was
pretty effective as a smoke test during development.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan <p...@heroku.com> writes:
> On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> BTW, have you got any sort of test scenario you could share for this
>> purpose?  I'm sure I could build something, but if you've already done
>> it ...

> I simply ran the standard regression tests, and then straced a backend
> as it executed the pgss-calling query. I'm not sure that I've
> understood your question.

Hm, are there 5K distinct queries in the regression tests?  I'd have
thought they weren't enough to fill a large pgss hash.

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Wed, Jan 22, 2014 at 1:38 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Hm, are there 5K distinct queries in the regression tests?  I'd have
> thought they weren't enough to fill a large pgss hash.

Well, yes. They're a really bad case for pg_stat_statements, in that
almost all distinct queries are only executed once or twice, and yet
there are enough of them. I think this is generally useful for testing.

-- 
Peter Geoghegan
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
BTW, what is the value of using posix_fadvise() in warm_fcache?  ISTM
that if you're worried about waiting for I/O while holding the LWLock
(as I concur you should be), this coding is entirely inadequate for
preventing that, as the whole point of posix_fadvise is that it won't
wait around for the I/O to happen.  It strains credulity to the
breaking point to imagine that the kernel will have gotten around to
reading the file in the small number of nanoseconds that will elapse
before you start needing the data.  Unless of course the file is
already in buffers, but then the whole thing was a waste of code.
I think we should flush the posix_fadvise() code path as being neither
useful nor portable, and just do the actual read() as in the other path
(which btw is lacking an essential fseek).

Actually, I think the whole thing is rather badly designed, as warm
cache or no you're still proposing to do thousands of kernel calls
while holding a potentially contended LWLock.  I suggest what we do is
(1) read in the whole file, (2) acquire the lock, (3) re-read the whole
file in the same buffer, (4) work from the buffer.

It might be possible to optimize the re-read away in the common case
that the file didn't actually change since we started to read it; we'd
need to put a bit more reporting into qtext_store probably, so that
it's possible to tell if any space is reserved but still unwritten.

BTW shouldn't there be an fflush() in qtext_store?

We could also eliminate some palloc/pfree cycles by using the string
directly in the buffer (at the small cost of needing to include the
strings' trailing nulls in the file).

			regards, tom lane
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Tue, Jan 21, 2014 at 6:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> BTW, what is the value of using posix_fadvise() in warm_fcache?  ISTM
> that if you're worried about waiting for I/O while holding the LWLock
> (as I concur you should be), this coding is entirely inadequate for
> preventing that, as the whole point of posix_fadvise is that it won't
> wait around for the I/O to happen.  It strains credulity to the
> breaking point to imagine that the kernel will have gotten around to
> reading the file in the small number of nanoseconds that will elapse
> before you start needing the data.  Unless of course the file is
> already in buffers, but then the whole thing was a waste of code.

Not necessarily. Under Linux for example, POSIX_FADV_SEQUENTIAL sets
the readahead window to the default size for the backing device. That
seems like a useful thing, though I will admit that it's hard to
quantify how useful here. If it is useful, and furthermore if it's more
useful than just reading the file into a temp buffer (which, I admit,
is not clear), then why not do it where we can? Adopting support for
the non-portable posix_fadvise is a sunk cost.

> I think we should flush the posix_fadvise() code path as being neither
> useful nor portable, and just do the actual read() as in the other path

Perhaps.

> Actually, I think the whole thing is rather badly designed, as warm
> cache or no you're still proposing to do thousands of kernel calls
> while holding a potentially contended LWLock.  I suggest what we do is
> (1) read in the whole file, (2) acquire the lock, (3) re-read the whole
> file in the same buffer, (4) work from the buffer.

fseeko() and fread() are part of the standard library, not any kernel.
I don't believe that what I've done in qtext_fetch() implies thousands
of kernel calls, or thousands of context switches. Now, using a buffer
not managed by libc might be better, but I should note what holding
that LWLock actually represents. As you know, it's a shared lock that
does not block the updating of existing entries. It will only block the
creation of entirely new ones. That ought to be very rare. You may
recall that when I worked with you on pg_stat_statements in the past, I
characterized the rate of growth as being logarithmic. Once the system
has been running for half an hour, new entries become very infrequent
indeed. Admittedly, I may be failing to consider a worst case there,
but in the average case this works fine. Encouraging automated tools to
lazily retrieve query texts is a big part of why I thought this might
be the least worst thing.

It's not obvious to me that always avoiding blocking new entries while
performing physical I/O is important enough to warrant the additional
complexity of copying to a buffer first, and mostly working off that.
Also, you must consider that there may have been a garbage collection
in the interim, so you have to get the shared lock twice (once to check
the garbage collection count, and later to guard against your buffer
having been invalidated by a garbage collection) instead of just
getting the shared lock once. I discussed this with Pavel, actually.
Getting the shared lock twice on every pg_stat_statements() call
doesn't seem very good either, especially since you'll have to open the
file twice, *with* the shared lock held the second time around, when
your scheme misses new entries. Your scheme won't work out under
exactly the same circumstances that mine doesn't do too well in, which
is when there is a new entry to consider. When that doesn't happen,
holding a shared lock is not that important.

> It might be possible to optimize the re-read away in the common case
> that the file didn't actually change since we started to read it; we'd
> need to put a bit more reporting into qtext_store probably, so that
> it's possible to tell if any space is reserved but still unwritten.

So, to be clear, you'd still be reading from the file if that proved
necessary? I think you'll have to, because it seems useful that
pg_stat_statements offers a limited form of consistency, in that you
only see entries at a point in time, even if the figures themselves are
not exactly consistent. Maybe I've missed something, but it's not
obvious to me that you're any better off as compared to just using a
FILE/stream.

> BTW shouldn't there be an fflush() in qtext_store?

I don't think so, no. Whenever qtext_store() callers don't fflush()
before releasing their exclusive lock, they FreeFile() before doing so
instead. In the case of pgss_shmem_startup() there is no lock held
anyway, because it doesn't matter, for the same reason it doesn't
matter that we're modifying shmem structures in a way that ordinarily
necessitates an exclusive lock. Why fflush() every qtext_store() call
if there is expected to be more than one, as is the case for several
callers? FreeFile() calls fclose(). fclose() is described by the
standard as flushing its stream. It is my understanding that calling
fflush() on a stream immediately prior to
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan p...@heroku.com wrote: Not necessarily. Under Linux for example, POSIX_FADV_SEQUENTIAL sets the readahead window to the default size for the backing device

Excuse me; I meant to say "POSIX_FADV_SEQUENTIAL doubles this size" [the default size for the backing device]. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan p...@heroku.com writes: On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: The patch doesn't apply cleanly against the master due to recent change of pg_stat_statements. Could you update the patch? Revision is attached, including changes recently discussed on other thread [1] to allow avoiding sending query text where that's not desirable and increase in default pg_stat_statements.max to 5000. I see this is marked as ready for committer. Where does it stand in relation to the other long-running thread about calls under-estimation propagation? I was surprised to find that there isn't any CommitFest entry linked to that thread, so I'm wondering if that proposal is abandoned or what. If it's not, is committing this going to blow up that patch? BTW, I'm also thinking that the detected_version kluge is about ready to collapse of its own weight, or at least is clearly going to break in future. What we need to do is embed the API version in the C name of the pg_stat_statements function instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: I see this is marked as ready for committer. Where does it stand in relation to the other long-running thread about calls under-estimation propagation? I was surprised to find that there isn't any CommitFest entry linked to that thread, so I'm wondering if that proposal is abandoned or what. If it's not, is committing this going to blow up that patch? I believe that proposal was withdrawn. I think the conclusion there was that we should just expose queryid and be done with it. In a way, exposing the queryid enabled this work, because it provides an identifier that can be used instead of sending large query texts each call. BTW, I'm also thinking that the detected_version kluge is about ready to collapse of its own weight, or at least is clearly going to break in future. What we need to do is embed the API version in the C name of the pg_stat_statements function instead. I agree that it isn't scalable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan p...@heroku.com writes: On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, I'm also thinking that the detected_version kluge is about ready to collapse of its own weight, or at least is clearly going to break in future. What we need to do is embed the API version in the C name of the pg_stat_statements function instead. I agree that it isn't scalable. Yeah. It was more or less tolerable as long as we were only using it in connection with identifying the set of output columns, but using it to identify the expected input arguments too seems darn rickety. I'll refactor so there are separate C call points for each supported API version. (Well, I guess it's too late to separate 1.0 and 1.1, but we can make 1.2 and future versions separate.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: The patch doesn't apply cleanly against the master due to recent change of pg_stat_statements. Could you update the patch?

Revision is attached, including changes recently discussed on the other thread [1] to allow avoiding sending query texts where that's not desirable, and an increase in the default pg_stat_statements.max to 5000. I've found myself tweaking things a bit more here and there. I've added additional clarification around why we do an in-place garbage collection (i.e. why we don't just write a new file and atomically rename it -- grep for "over-engineer" to find what I mean). I also fully documented the pg_stat_statements function. If we want external tool authors to use it to avoid sending query texts, they have to know about it in the first place. I now use the pg_stat_tmp directory for my temp file (so pg_stat_statements behaves more like the statistics collector). The stats_temp_directory GUC is not used, for reasons that a patch comment explains.

I've fuzz-tested this throughout with the make installcheck-parallel regression tests. I found it useful to run that in one terminal, while running pgbench with multiple clients in another. The pgbench script looks something like:

select * from pg_stat_statements;
select pg_stat_statements_reset();

pg_stat_statements_reset() was temporarily rigged to perform a garbage collection (and not a reset) for the benefit of this stress-test. The function pg_stat_statements(), the garbage collection function and pgss_store() will complain loudly if they notice anything out of the ordinary. I was not able to demonstrate any problems through this technique (though I think it focused my thinking on the right areas during development).
Clearly, missed subtleties around managing the external file while locking in order to ensure correct behavior (that no session can observe something it shouldn't, or miss something it should) will be something that a committer will want to pay particular attention to. I go to some lengths to avoid doing this with only a shared lock.

FYI, without hacking things up, it's quite hard to make external query text garbage collection occur - need_gc_qtexts() will almost always say no. It will only automatically happen once, with pg_stat_statements.max set to 1000, when the standard regression tests are run. This is a really extreme workload in terms of causing pg_stat_statements churn, which shows just how unlikely garbage collection is in the real world. Still, it ought to be totally robust when it happens.

In passing, I did this:

/*
 * Rename file into place, to atomically commit to serializing.
 */

Because at this juncture, there ought to be no file to replace (not since Magnus had pg_stat_statements not keep the main global file around for as long as the server is running, in the style of the statistics collector).

Thanks

[1] http://www.postgresql.org/message-id/cam3swzsvff-vbnc8sc-0cpwzxvqh49reybchb95t95fcrgs...@mail.gmail.com

-- Peter Geoghegan pg_stat_statements_ext_text.v5.2013_12_09.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Mon, Dec 9, 2013 at 7:31 PM, Peter Geoghegan p...@heroku.com wrote: I go to some lengths to avoid doing this with only a shared lock.

I should have said: I go to great lengths to do this with only a shared lock, and not an exclusive one (see the gc_count stuff). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sun, Nov 17, 2013 at 1:15 PM, Peter Geoghegan p...@heroku.com wrote: On Sat, Nov 16, 2013 at 4:36 PM, Peter Geoghegan p...@heroku.com wrote: I'll post a revision with fixes for those. Another concern is that I see some race conditions that only occur when the pg_stat_statements cache is under a lot of pressure with a lot of concurrency, which weren't revealed by my original testing. I'm working on a fix for that. Revision attached.

The patch doesn't apply cleanly against the master due to recent change of pg_stat_statements. Could you update the patch?

Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
With a pg_stat_statements.max of 1,000, on my system with the patch applied the additional amount of shared memory required is only 192KiB. That compares with about 1.17MiB for the same setting in master's version of pg_stat_statements. Surely with this huge reduction we should revisit that default - I think that a default setting of 5,000 for pg_stat_statements.max makes more sense.

entry_dealloc() requires an exclusive lock, which locks all queries out of simply recording their costs. With a lot of cache pressure this could be really expensive. In my opinion that risk alone justifies the change. Without adding another GUC, we might even go so far as to have a mechanism like checkpoint_warning warn when entry_dealloc() calls occur too frequently...

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Fri, Nov 15, 2013 at 8:51 AM, Peter Eisentraut pete...@gmx.net wrote: Compiler warnings with fortify settings:

I'll post a revision with fixes for those. Another concern is that I see some race conditions that only occur when the pg_stat_statements cache is under a lot of pressure with a lot of concurrency, which weren't revealed by my original testing. I'm working on a fix for that.

-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Sat, Nov 16, 2013 at 4:36 PM, Peter Geoghegan p...@heroku.com wrote: I'll post a revision with fixes for those. Another concern is that I see some race conditions that only occur when the pg_stat_statements cache is under a lot of pressure with a lot of concurrency, which weren't revealed by my original testing. I'm working on a fix for that.

Revision attached. -- Peter Geoghegan pg_stat_statements_ext_text.v2.2013_11_16.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On 11/14/13, 3:17 AM, Peter Geoghegan wrote: Attached patch allows pg_stat_statements to store arbitrarily long query texts, using an external, transparently managed lookaside file.

Compiler warnings with fortify settings:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pg_stat_statements.o pg_stat_statements.c -MMD -MP -MF .deps/pg_stat_statements.Po

pg_stat_statements.c: In function ‘entry_reset’:
pg_stat_statements.c:1827:11: warning: ignoring return value of ‘ftruncate’, declared with attribute warn_unused_result [-Wunused-result]
pg_stat_statements.c: In function ‘garbage_collect_query_texts’:
pg_stat_statements.c:1779:11: warning: ignoring return value of ‘ftruncate’, declared with attribute warn_unused_result [-Wunused-result]

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers