Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 3 March 2012 20:22, Jeff Janes jeff.ja...@gmail.com wrote: Add it all up, and instead of pre-reading 32 consecutive 8K blocks, it pre-reads only about 1 or 2 consecutive ones on the final merge. Now some of those could be salvaged by the kernel keeping track of multiple interleaved read ahead opportunities, but in my hands vmstat shows a lot of IO wait and shows reads that seem to be closer to random IO than large read-ahead. If it used truly efficient read ahead, CPU would probably be limiting. Can you suggest a benchmark that will usefully exercise this patch? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding probes for smgr
On 28 July 2012 17:15, Tom Lane t...@sss.pgh.pa.us wrote: IMV smgr is pretty vestigial. I wouldn't recommend loading more functionality onto that layer, because it's as likely as not that we'll just get rid of it someday. Agreed. I recently found myself reading a paper written by Stonebraker back in the Berkeley days: http://dislab2.hufs.ac.kr/dislab/seminar/2007/ERL-M87-06.pdf This paper appears to have been published in about 1988, and it shows. It's fairly obvious from reading the opening paragraph that the original rationale for the design of the storage manager doesn't hold these days. Of course, it's also obvious from reading the code, since for example there is only one storage manager module. This state of affairs sort of reminds me of mcxt.c. The struct MemoryContextData is described as an abstract type that can have multiple implementations, despite the fact that since 2000 (and perhaps earlier), the underlying type is invariably AllocSetContext. I never investigated if that indirection still needs to exist, but I suspect that it too is a candidate for refactoring. Do you agree? Incidentally, you might consider refreshing these remarks in the smgr README: In Berkeley Postgres each relation was tagged with the ID of the storage manager to use for it. This is gone. It would be more reasonable to associate storage managers with tablespaces (a feature not present as this text is being written, but one likely to emerge soon).
[HACKERS] Help me develop new commit_delay advice
Many of you will be aware that the behaviour of commit_delay was recently changed. Now, the delay only occurs within the group commit leader backend, and not within each and every backend committing a transaction: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f11e8be3e812cdbbc139c1b4e49141378b118dee

For those of you who didn't follow this development, I should point out that I wrote a blog post that described the idea, which will serve as a useful summary: http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html

I made what may turn out to be a useful observation during the development of the patch, which was that for both the tpc-b.sql and insert.sql pgbench-tools scripts, a commit_delay of half of my wal_sync_method's reported raw sync time looked optimal. I use Linux, so my wal_sync_method happened to have been fdatasync. I measured this using pg_test_fsync.

The devel docs still say of commit_delay and commit_siblings: Good values for these parameters are not yet clear; experimentation is encouraged. This has been the case since Postgres 7.1 (i.e. it has never been clear what good values were - the folk wisdom was actually that commit_delay should always be set to 0). I hope to be able to formulate some folk wisdom about setting commit_delay from 9.3 on, that may go on to be accepted as an official recommendation within the docs.

I am rather curious as to what experimentation shows optimal values for commit_delay to be for a representative cross-section of hardware. In particular, I'd like to see if setting commit_delay to half of raw sync time appears to be optimal for both insert.sql and tpc-b.sql workloads across different types of hardware with different sync times. Now, it may be sort of questionable to take those workloads as general proxies for performance, not least since they will literally give Postgres as many *completely uniform* transactions as it can handle.
However, it is hard to think of another, better general proxy for performance that is likely to be accepted as such, and will allow us to reason about setting commit_delay. While I am not completely confident that we can formulate a widely useful, simple piece of advice, I am encouraged by the fact that a commit_delay of 4,000 worked very well for both tpc-b.sql and insert.sql workloads on my laptop, beating out settings of 3,000 and 5,000 on each benchmark.

I am also encouraged by the fact that in some cases, including both the insert.sql and tpc-b.sql cases that I've already described elsewhere, there is actually no downside to setting commit_delay - transaction throughput naturally improves, but transaction latency is actually improved a bit too (or at least the average and worst cases). This is presumably due to the amelioration of resource contention (from greater commit batching) more than compensating for the obvious downside of adding a delay.

It would be useful, for a start, if I had numbers for a battery-backed write cache. I don't have access to one right now though, nor do I have access to any more interesting hardware, which is one reason why I'm asking for help with this.

I like to run sync prior to running pg_test_fsync, just in case:

[peter@peterlaptop pg_test_fsync]$ sync

I then interpret the following output:

[peter@peterlaptop pg_test_fsync]$ pg_test_fsync
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync      112.940 ops/sec
        fdatasync          114.035 ops/sec
        fsync               21.291 ops/sec
*** SNIP ***

So if I can perform 114.035 8KiB sync operations per second, that's an average of about 1 per 8.77 milliseconds, or 8770 microseconds to put it in the units that commit_delay speaks.
It is my hope that we will find that when this number is halved, we will arrive at a figure that is worth recommending as a generally useful setting for commit_delay for the system. I guess I could gain some additional insight by simply changing my wal_sync_method, but I'd find it more interesting to look at organic setups with faster (not slower) sync times than my system's fdatasync.

For those who are able to help me here, I'd like to see pgbench-tools workloads for both tpc-b.sql and insert.sql with incrementing values of commit_delay (increments of, say, 1000 microseconds, perhaps with less granularity where it isn't needed), from 0 to $(1.5 times raw sync speed) microseconds.

Thanks
Re: [HACKERS] Help me develop new commit_delay advice
On 1 August 2012 15:14, Amit Kapila amit.kap...@huawei.com wrote: I shall look into this aspect also (setting commit_delay based on raw sync). You also suggest if you want to run the test with different configuration. Well, I was specifically interested in testing if half of raw sync time was a widely useful setting, across a variety of different, though representative I/O subsystems. Unfortunately, without some context about raw sync speed to go along with your numbers, I cannot advance or disprove that idea. It would also have been nice to see a baseline number of 0 too, to get an idea of how effective commit_delay may now be. However, that's secondary.
Re: [HACKERS] Help me develop new commit_delay advice
On 29 July 2012 16:39, Peter Geoghegan pe...@2ndquadrant.com wrote: Many of you will be aware that the behaviour of commit_delay was recently changed. Now, the delay only occurs within the group commit leader backend, and not within each and every backend committing a transaction: I've moved this to the pgsql-performance list. Please continue the discussion there.
Re: [HACKERS] WIP: pg_pretty_query
On 7 August 2012 20:01, Andrew Dunstan and...@dunslane.net wrote: On 08/07/2012 02:14 PM, Tom Lane wrote: * As per some of the complaints already registered in this thread, ruleutils.c is not designed with the goal of being a pretty-printer. Its primary charter is to support pg_dump by regurgitating rules/views in an unambiguous form, which does not necessarily look very similar to what was entered. An example of a transformation that probably nobody would want in a typical pretty-printing context is expansion of SELECT * lists. But again, there is really no way to turn that off. Another aspect that seems pretty undesirable for pretty-printing is loss of any comments embedded in the query text. I'm very much not in favor of trying to make ruleutils serve two masters, but that's the game we will be playing if we accept this patch. +1. A pretty-printer that makes the query to be cleaned-up actually undergo parse-analysis seems misconceived to me. This is a tool that begs to be written in something like Python, as a satellite project, with much greater flexibility in the format of the SQL that it outputs. One of the challenges is to have a pretty printer that is kept in sync with the dialect that's supported. Anything that doesn't use the backend's parser seems to me to be guaranteed to get out of sync very quickly. I'm not convinced of that. Consider the example of cscope, a popular tool for browsing C code. Its parser was written to be fuzzy, so it's actually perfectly usable for C++ and Java, even though that isn't actually supported, IIRC. Now, I'll grant you that that isn't a perfectly analogous situation, but it is similar in some ways. If we add a new clause to select and that bit is generically pretty-printed, is that really so bad? I have a hard time believing that a well written fuzzy pretty-printer for Postgres wouldn't deliver the vast majority of the benefits of an in-core approach, at a small fraction of the effort. 
You'd also be able to pretty-print plpgsql function definitions (a particularly compelling case for this kind of tool), which any approach based on the backend's grammar will never be able to do (except, perhaps, if you were to do something even more complicated).
[HACKERS] Re: [COMMITTERS] pgsql: Revert commit_delay change; just add comment that we don't hav
On 15 August 2012 05:15, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 14 August 2012 21:26, Bruce Momjian br...@momjian.us wrote: Revert commit_delay change; just add comment that we don't have a microsecond specification. I think that if we eventually decide to change the name of commit_delay for 9.3 (you previously suggested that that might be revisited), it will be reasonable to have the new GUC in units of milliseconds. Well, the reason why it's like that at all is the thought that values of less than 1 millisecond might be useful. Are we prepared to suppose that that is not and never will be true? I think that the need for sub-millisecond granularity had more to do with historic quirks of our implementation. Slight tweaks accidentally greatly improved throughput, if only for the synthetic benchmark in question. I personally have not seen any need for a sub-millisecond granularity when experimenting with the setting on 9.3-devel. However, I am not sure that sub-millisecond granularity could never be of any use, in squeezing the last small increase in throughput made possible by commit_delay. Importantly, feedback as the GUC is tuned is far more predictable than it was with the prior implementation, so this does seem quite unimportant. Why does commit_delay have to be an integer? Can't we devise a way of manipulating it in units of milliseconds, but have the internal representation be a double, as with pg_stat_statements' total_time column? That would allow very fine tuning of the delay. As I've outlined, I'm not sure that it's worth supporting such fine-tuning with the new implementation.
[HACKERS] Re: [COMMITTERS] pgsql: Revert commit_delay change; just add comment that we don't hav
On 15 August 2012 14:39, Tom Lane t...@sss.pgh.pa.us wrote: If you wanted to re-implement all the guc.c logic for supporting unit-ified values such that it would also work with floats, we could do that. It seems like way more mechanism than the problem is worth however. Fair enough. I'm not quite comfortable recommending a switch to milliseconds if that implies a loss of sub-millisecond granularity. I know that someone is going to point out that in some particular benchmark, they can get another relatively modest increase in throughput (perhaps 2%-3%) by splitting the difference between two adjoining millisecond integer values. In that scenario, I'd be tempted to point out that that increase is quite unlikely to carry over to real-world benefits, because the setting is then right on the cusp of where increasing commit_delay stops helping throughput and starts hurting it. The improvement is likely to get lost in the noise in the context of a real-world application, where for example the actual cost of an fsync is more variable. I'm just not sure that that's the right attitude.
[HACKERS] The pgrminclude problem
I found a tool that Google authored that attempts to solve the same problems as pgrminclude does in our codebase. It's called include what you use, and is based on Clang. The project is hosted here: http://code.google.com/p/include-what-you-use/ I'm not suggesting that we should start using this tool instead of pgrminclude, because it has enough caveats of its own, and is mostly written with Google's C++ codebase in mind. However, it's worth being aware of. There is a useful analysis of the general problem in the README - Why IWYU is Difficult (you can probably just skip the extensive analysis of C++ templates as they relate to the problem there). The tool is authored by Craig Silverstein, Google's director of technology. If he believes that IWYU is a difficult problem, well, it probably is.
Re: [HACKERS] The pgrminclude problem
On 16 August 2012 16:56, Bruce Momjian br...@momjian.us wrote: Good to know. We only use pgrminclude every five years or so, and Tom isn't even keen on that. Yeah. Even if this could be made to work well, we'd still have to do something like get an absolute consensus from all build farm animals, if we expected to have an absolutely trustworthy list. I don't think pgrminclude is a bad idea. I just think that it should only be used to guide the efforts of a human to remove superfluous #includes, which is how it is used anyway.
Re: [HACKERS] tuplesort memory usage: grow_memtuples
, AllocSetGetChunkSpace) return header size too. We were getting away with that before with the doubling strategy, but now we have to factor that header's size into allowedMem. I don't think my revision is satisfactory in what it does here. It isn't obvious what a better-principled implementation would do though. Does anyone have any other ideas?

I think this patch (or at least your observation about I/O waits within vmstat) may point to a more fundamental issue with our sort code: Why are we not using asynchronous I/O in our implementation? There are anecdotal reports of other RDBMS implementations doing far better than we do here, and I believe asynchronous I/O, pipelining, and other such optimisations have a lot to do with that. It's something I'd hoped to find the time to look at in detail, but probably won't in the 9.3 cycle.

One of the more obvious ways of optimising an external sort is to use asynchronous I/O so that one run of data can be sorted or merged while other runs are being read from or written to disk. Our current implementation seems naive about this. There are some interesting details about how this is exposed by POSIX here: http://www.gnu.org/software/libc/manual/html_node/Asynchronous-I_002fO.html

It's already anticipated that we might take advantage of libaio for the benefit of FilePrefetch() (see its accompanying comments - it uses posix_fadvise itself - effective_io_concurrency must be greater than 0 for this to ever be called). It perhaps could be considered parallel low-hanging fruit in that it allows us to offer limited though useful backend parallelism without first resolving thorny issues around what abstraction we might use, or how we might eventually make backends thread-safe. AIO supports registering signal callbacks (a SIGPOLL handler can be called), which seems relatively uncontroversial. Platform support for AIO might be a bit lacking, but then you can say the same about posix_fadvise.
We don't assume that poll(2) is available, but we already use it where it is within the latch code. Besides, in-kernel support can be emulated if POSIX threads are available, which I believe would make this broadly useful on unix-like platforms.

Attachment: sortmem_grow-v2.patch
Re: [HACKERS] sha1, sha2 functions into core?
On 20 August 2012 21:26, Joshua D. Drake j...@commandprompt.com wrote: On 08/20/2012 01:21 PM, Josh Berkus wrote: I don't think US export regulations are the only issue. Some other countries (mostly the usual suspects) forbid the use of crypto software. If we build more crypto functions into the core we make it harder to use Postgres legally in those places. I fail to see how that is our problem. We shouldn't make the software less useful because of those places. Agreed. The idea of some secret policeman urging the use of MySQL because it doesn't have a built-in SHA-1 cryptographic hash function seems extremely far-fetched to me. The BitTorrent protocol uses SHA-1 to validate chunks, and it has been variously estimated that 10% - 50% of all internet traffic is BitTorrent traffic. SHA-1 is also integral to the way that git makes content effectively tamper-proof: http://www.youtube.com/watch?v=4XpnKHJAok8#t=56m
Re: [HACKERS] TODO
On 23 August 2012 20:21, Bruce Momjian br...@momjian.us wrote: On Tue, Aug 21, 2012 at 12:16:11PM +0900, Tatsuo Ishii wrote: I found this in https://wiki.postgresql.org/wiki/Todo : Improve ability to display optimizer analysis using OPTIMIZER_DEBUG What does this actually mean? Add GUC switch to enable optimizer debug on/off? More fancy/useful info should be printed? If so, what kind of information is required? Well, right now, OPTIMIZER_DEBUG lets you see what plans were considered and removed. I was thinking that information should be available without a special compiled binary. That seems like an unreasonable burden on what OPTIMIZER_DEBUG can do. If you can't build Postgres from source, something that is relatively low barrier, you have little hope of sensibly interpreting low-level output from the planner. However, I could get behind having it as a configure option, which it is currently not. It would probably also make sense to expose SELECTIVITY_DEBUG and GEQO_DEBUG.
Re: [HACKERS] wal_buffers
On 19 February 2012 05:24, Robert Haas robertmh...@gmail.com wrote: I have attached tps scatterplots. The obvious conclusion appears to be that, with only 16MB of wal_buffers, the buffer wraps around with some regularity: we can't insert more WAL because the buffer we need to use still contains WAL that hasn't yet been fsync'd, leading to long stalls. More buffer space ameliorates the problem. Incidentally, I wondered if we could further improve group commit performance by implementing commit_delay with a WaitLatch call, and setting the latch in the event of WAL buffers wraparound (or rather, a queued wraparound request - a segment switch needs WALWriteLock, which the group commit leader holds for a relatively long time during the delay). I'm not really sure how significant a win this might be, though. There could be other types of contention, which could be considerably more significant. I'll try and take a look at it next week.
Re: [HACKERS] effective_io_concurrency
On 30 August 2012 20:28, Robert Haas robertmh...@gmail.com wrote: On Sat, Jul 28, 2012 at 4:09 PM, Jeff Janes jeff.ja...@gmail.com wrote: But it might be better yet to make ordinary index scans benefit from effective_io_concurrency, but even if/when that gets done it would probably still be worthwhile to make the planner understand the benefit. That sounds good too, but separate. Indeed. The original effective_io_concurrency commit message said: ***SNIP*** (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) ...seems like a pity that this debate never reached a useful conclusion. Just how helpful is effective_io_concurrency? Did someone produce a benchmark at some point?
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Hi, I'm experiencing the case when bitmap scan is ~ 70 times slower than index scan which seems to be caused by 1) very big table 2) some hash search logic (hash_search_with_hash_value). Here is the explain analyze of the query with bitmap scans allowed:

wsdb= explain analyze select * from test as t, crts.data as d1 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;

QUERY PLAN
---
Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual time=27.512..66620.231 rows=1 loops=1)
  ->  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68) (actual time=27.511..66616.807 rows=1 loops=1)
        ->  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28) (actual time=0.010..4.685 rows=11456 loops=1)
        ->  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
              Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
              ->  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0) (actual time=5.777..5.777 rows=0 loops=11456)
                    ->  Bitmap Index Scan on data_mjd_idx  (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241 loops=11456)
                          Index Cond: (mjd = t.mjd)
                    ->  Bitmap Index Scan on data_objid_idx  (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248 loops=11456)
                          Index Cond: (objid = t.objid)
Total runtime: 66622.026 ms
(11 rows)

Here is the output when bitmap scans are disabled:

QUERY PLAN
---
Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual time=0.082..906.876 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual time=0.081..905.683 rows=1 loops=1)
        Join Filter: (t.mjd = d1.mjd)
        ->  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28) (actual time=0.009..1.679 rows=11456 loops=1)
        ->  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
              Index Cond: (objid = t.objid)
Total runtime: 907.462 ms

When
the bitmap scans are enabled, the prof of postgres shows:

47.10%  postmaster  postgres  [.] hash_search_with_hash_value
11.06%  postmaster  postgres  [.] hash_seq_search
 6.95%  postmaster  postgres  [.] hash_any
 5.17%  postmaster  postgres  [.] _bt_checkkeys
 4.07%  postmaster  postgres  [.] tbm_add_tuples
 3.41%  postmaster  postgres  [.] hash_search

And the last note is that the crts.data table which is being bitmap scanned is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan code is somehow unprepared to combine two bitmaps for such a big table, and this leads to the terrible performance.

I think that this kind of question is better suited to the pgsql-performance list. Granted, it was presented as a bug report (though they're generally sent to -bugs rather than -hackers), but I don't think that this is a valid bug. One obvious red flag from your query plans is that the row count estimate in the Bitmap Index Scan node is off by a few orders of magnitude. Did you try performing an ANALYZE to see if that helped? It may also be helpful to show pg_stats entries for both the data.mjd and test.mjd columns. You may find, prior to doing an ANALYZE, that there are no entries for one of those tables. Turning off the enable_* planner options in production is generally discouraged. Certainly, you'd be crazy to do that on a server-wide basis.
Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value
On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote: After looking at them, I think I understand the reason -- the number of n_distinct for crts.data is terribly wrong. In reality it should be ~ 130 millions. I already faced this problem at certain point when doing group by objid and PG was exhausting all the memory, because of that wrong estimate. But I know that it's a known hard problem to estimate n_distinct. So probably that's the main reason of a problem... That's why we support altering that value with an ALTER TABLE ... ALTER COLUMN DDL statement. You might at least consider increasing the statistics target for the column first though, to see if that will make the n_distinct value better accord with reality.
Re: [HACKERS] Is this non-volatile pointer access OK?
On 3 September 2012 08:10, Daniel Farina dan...@heroku.com wrote: http://doxygen.postgresql.org/xlog_8c_source.html#l08197 On line 8197 of xlog.c:

08194     /* Get a local copy of the last safe checkpoint record. */
08195     SpinLockAcquire(&xlogctl->info_lck);
08196     lastCheckPointRecPtr = xlogctl->lastCheckPointRecPtr;
08197     memcpy(&lastCheckPoint, &XLogCtl->lastCheckPoint, sizeof(CheckPoint));
08198     SpinLockRelease(&xlogctl->info_lck);

Note the use of capital XLogCtl->lastCheckPoint, which is not the volatile pointer. That looks like a bug to me. Come to think of it, the whole convention of using a lower-case variant of the original pointer variable name seems like a foot-gun, given the harmful and indeed very subtle consequences of making this error. I count 98 SpinLockAcquire() call sites (of which only a minority use this convention, which is mostly within xlog.c, I think). Is it worth instituting an alternative convention to make this kind of misuse more obvious? This has gone unnoticed since February 2009.
[HACKERS] pg_test_fsync output and commit_delay
I propose that we try and develop better commit_delay advice, to make it easier to set the parameter in a way that actually helps performance. I have been researching a way to make commit_delay adaptive, though have yet to develop any further insight that is worth sharing. This may change. The attached simple patch alters the output produced by pg_test_fsync, so that we also see the average sync time per op. The pg_test_fsync docs have had minor alterations too. Hopefully, another doc patch will follow that builds upon this, and actually firmly recommends taking a particular course of action when setting commit_delay - my previous observations about what helped throughput, though supported by Greg Smith, have not been scrutinised enough just yet, I feel. For now, the equivocal wording of my doc alterations (that raw wal_sync_method file sync time might *somehow* be useful here) seems appropriate.

Attachment: pg_test_fsync.v1.2012_09_08.patch
Re: [HACKERS] Closing CF 2012-06
On 14 September 2012 18:47, Noah Misch n...@leadboat.com wrote: According to the CommitFest site, 2012-06 is still in progress with four unresolved patches. With CF 2012-09 scheduled to start tomorrow, let's get it closed. I propose changing two patches from Waiting on Author to Returned with Feedback and moving the other two to CF 2012-09. That seems reasonable. I've marked both patches that I was listed as a reviewer for (the patches that were previously Waiting on Author) as Returned with Feedback. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 1 October 2012 08:57, Magnus Hagander mag...@hagander.net wrote: I know there was discussions about it earlier, and it wasn't done with an argument of it not being stable between releases (IIRC). I think we can live with that drawback, assuming of course that we document this properly. Well, I'll point out once again that the argument about its stability is invalid, because we serialise the entries to disk. If a point release changes the representation of the query tree such that the hash values won't match, then we have no recourse but to bump pg_stat_statements version number, and invalidate all existing entries. I credit our users with the intelligence to not jump to any rash conclusions about the hash if directly exposed, such as assuming that it has any particular degree of stability with respect to the queries that are fingerprinted, or any degree greater than the self-evident bare minimum. I'm pretty sure that the stability among point releases in the face of potential minor changes to query tree representation thing was something that I imagined as a reason for the proposal being rejected, when I tried to read between the lines of a flat rejection. Perhaps I should have asked for clarification on that point. Now that I think about it, I'm pretty sure that the need to bump catversion, as a result of any change in the way dumping the query tree struct into stored rules needs to happen, will preclude that problem in practice. I've now run into multiple customer installations where it would be very useful to have. The usecase is mainly storing snapshots of the pg_stat_statements output over time and analyzing those. Weird things happen for example when the query text is the same, but the hash is different (which can happen for example when a table is dropped and recreated). And even without that, in order to do anything useful with it, you end up hashing the query text anyway - so using the already existing hash would be easier and more useful. 
Yes, these are all arguments that I'm familiar with :-) . I've always thought of pg_stat_statements as a low-level statistical view that people would naturally want to store snapshots of for analysis, in much the same way as many do now with things like pg_stat_bgwriter using tools like Munin. Who wouldn't want to know what queries were running a half an hour ago when the database server seemed slower than usual, for example? Such tools should naturally have access to the same candidate key for entries, rather than adding a subtle impedance mismatch by using the string. That reminds me - when are you writing the pg_stat_statements Postgres plugin for Munin? I was disappointed that my proposal was shot down, despite the fact that I independently raised it on list at least twice, and pushed as hard as I felt that I could at the time. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE optimization fence on the todo list?
On 1 October 2012 14:05, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Sep 19, 2012 at 3:05 PM, Daniel Browning d...@kavod.com wrote: I'm wondering if there are any technical/standards constraints that are behind the fencing behavior. If there aren't any, maybe an opt-in keyword might do the trick -- WITH UNBOXED foo AS (..)? I may be mistaken, but I think that the fence is described in the SQL standard. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_malloc() versus malloc(0)
On 1 October 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote: 1. Teach pg_malloc not to complain if result == NULL and size == 0. +1 to that proposal. 2. Before the malloc call, have it replace size == 0 with size = 1. I don't like that proposal on purely aesthetic grounds. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 1 October 2012 15:22, Magnus Hagander mag...@hagander.net wrote: On Mon, Oct 1, 2012 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Worse than that: it could change across a minor version update. These are internal data structures we're hashing, and we've been known to have to change them for bug-fix purposes. As Peter pointed out, when we do that we have to change the file format anyway, and wipe the statistics. So changing that is something we'd have to *note*. I think the need to not break stored rules pins down the actual representation of the Query struct in release branches. I'm not prepared to say that it does so absolutely, since of course certain logic could be altered in a way that resulted in different values within the struct or its children, but I'm curious as to when this actually occurred in the past. Discussion about exposing this value aside, it'd obviously be a bit disappointing if we had to invalidate existing statistics in a point release. Still, that situation isn't made any worse by exposing the value, and in fact doing so could help users understand the issues involved. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 1 October 2012 17:12, Stephen Frost sfr...@snowman.net wrote: Peter, all, * Peter Geoghegan (pe...@2ndquadrant.com) wrote: Well, I'll point out once again that the argument about its stability is invalid, because we serialise the entries to disk. If a point release changes the representation of the query tree such that the hash values won't match, then we have no recourse but to bump pg_stat_statements version number, and invalidate all existing entries. What if we simply included the pg_stat_statements version number in what's shown to the user as the 'hash'? ver#.hash ? That won't really help matters. There'd still be duplicate entries, from before and after the change, even if we make it immediately obvious which is which. The only reasonable solution in that scenario is to bump PGSS_FILE_HEADER, which will cause all existing entries to be invalidated. This is a hypothetical scenario, and concerns about it are totally orthogonal to the actual question of whether or not the queryid should be exposed... -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 1 October 2012 18:05, Stephen Frost sfr...@snowman.net wrote: * Peter Geoghegan (pe...@2ndquadrant.com) wrote: That won't really help matters. There'd still be duplicate entries, from before and after the change, even if we make it immediately obvious which is which. The only reasonable solution in that scenario is to bump PGSS_FILE_HEADER, which will cause all existing entries to be invalidated. You're going to have to help me here, 'cause I don't see how there can be duplicates if we include the PGSS_FILE_HEADER as part of the hash, unless we're planning to keep PGSS_FILE_HEADER constant while we change what the hash value is for a given query, yet that goes against the assumptions that were laid out, aiui. Well, they wouldn't be duplicates if you think that the fact that one query was executed before some point release and another after ought to differentiate queries. I do not. If there's a change that results in a given query X no longer hashing to a value A, we need to change PGSS_FILE_HEADER to invalidate statistics which were collected for value A (or else we risk an independent query Y hashing to value A and ending up with completely invalid stats..). Provided we apply that consistently and don't reuse PGSS_FILE_HEADER values along the way, a combination of PGSS_FILE_HEADER and the hash value for a given query should be unique over time. We do need to document that the hash value for a given query may change.. By invalidate, I mean that when we go to open the saved file, if the header doesn't match, the file is considered corrupt, and we simply log that the file could not be read, before unlinking it. This would be necessary in the unlikely event of there being some substantive change in the representation of query trees in a point release. I am not aware of any precedent for this, though Tom said that there was one. Tom: Could you please describe the precedent you had in mind? I would like to better understand this risk. 
I don't want to get too hung up on what we'd do if this problem actually occurred, because that isn't what this thread is about. Exposing the hash is a completely unrelated question, except that to do so might make pg_stat_statements (including its limitations) better understood. In my opinion, the various log parsing tools have taught people to think about this in the wrong way - the query string is just a representation of a query (and an imperfect one at that). There are other, similar tools that exist in proprietary databases. They expose a hash value, which is subject to exactly the same caveats as our own. They explicitly encourage the type of aggregation by third-party tools that I anticipate will happen with pg_stat_statements. I want to facilitate this, but I'm confident that the use of (dbid, userid, querytext) as a candidate key by these tools is going to cause confusion, in subtle ways. By using the hash value, those tools are exactly as well off as pg_stat_statements is, which is to say, as well off as possible. I simply do not understand objections to the proposal. Have I missed something? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 2 October 2012 18:16, Tom Lane t...@sss.pgh.pa.us wrote: 1. Why isn't something like md5() on the reported query text an equally good solution for users who want a query hash? Because that does not uniquely identify the entry. The very first thing that the docs say on search_path is Qualified names are tedious to write, and it's often best not to wire a particular schema name into applications anyway. Presumably, the reason it's best not to wire schema names into apps is because it might be useful to modify search_path in a way that dynamically made the same queries in some application reference what are technically distinct relations. If anyone does this, and it seems likely that many do for various reasons, they will be out of luck when using some kind of pg_stat_statements aggregation. This was the behaviour that I intended for pg_stat_statements all along, and I think it's better than a solution that matches query strings. 2. If people are going to accumulate stats on queries over a long period of time, is a 32-bit hash really good enough for the purpose? If I'm doing the math right, the chance of collision is already greater than 1% at 10,000 queries, and rises to about 70% for 100,000 queries; see http://en.wikipedia.org/wiki/Birthday_paradox We discussed this issue and decided it was okay for pg_stat_statements's internal hash table, but it's not at all clear to me that it's sensible to use 32-bit hashes for external accumulation of query stats. Well, forgive me for pointing this out, but I did propose that the hash be a 64-bit value (which would have necessitated adapting hash_any() to produce 64-bit values), but you rejected the proposal. I arrived at the same probability for a collision as you did and posted it to the list, in discussion shortly after the normalisation stuff was committed. 
A more sensible way of assessing the risk of a collision would be to try to come up with the probability of a collision that someone actually ends up caring about, which is considerably less than the 1% for 10,000 entries. I'm not being glib - people are very used to the idea that aggregating information on query costs is a lossy process. Prior to 9.2, the only way execution costs could reasonably be measured at the query granularity on a busy system was to set log_min_duration_statement to something like 1 second. I am also unconvinced by the idea that aggregating historical data (with the same hash value) in a separate application is likely to make the collision situation appreciably worse. People are going to be using something like an RRD circular buffer to aggregate the information, and I can't see anyone caring about detailed information that is more than a couple of weeks in the past. The point of aggregation isn't to store more queries, it's to construct time-series data from snapshots. Besides, do most applications really even have more than 10,000 distinct queries? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 2 October 2012 17:58, Stephen Frost sfr...@snowman.net wrote: Right, and that's all I'm trying to address here- how do we provide a value for a given query which can be relied upon by outside sources, even in the face of a point release which changes what our internal hash value for a given query is. I don't know of a way. Presumably, we'd hope to avoid this, and would look for alternatives to anything that would necessitate bumping PGSS_FILE_HEADER, while not going so far as to let pg_stat_statements contort things in the core system. If I was aware of a case where this would have come up had pg_stat_statements fingerprinting been around at the time, perhaps I could give a better answer than that. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Instead, I think it makes sense to assign a number -- arbitrarily, but uniquely -- to the generation of a new row in pg_stat_statements, and, on the flip side, whenever a row is retired its number should be eliminated, practically, for-ever. This way re-introductions between two samplings of pg_stat_statements cannot be confused for a contiguously maintained statistic on a query. This argument seems sensible to me. Is there any use-case where the proposed counter wouldn't do what people wished to do with an exposed hash value? Yes. The hash could be used to aggregate query execution costs across entire WAL-based replication clusters. I'm not opposed to Daniel's suggestion, though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
Do you think you can follow through on this soon, Robert? I don't believe that there are any outstanding issues. I'm not going to make an issue of the fact that strxfrm() hasn't been taken advantage of. If you could please post a new revision, with the suggested alterations (that you agree with), I'd be happy to take another look. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Disable _FORTIFY_SOURCE with ICC
On 5 October 2012 04:37, Tom Lane t...@sss.pgh.pa.us wrote: A bit later: testing on an F17 box (glibc-2.15-56.fc17.x86_64) confirms Peter G's complaint, and also confirms that putting the above into port/linux.h (instead of the template file) fixes the problem. Don't know how to disable it on ICC, but I suppose there's some way to test for ICC via an #if. I was using F17 + glibc 2.15 too. http://www.acsu.buffalo.edu/~charngda/icc.html indicates that ICC defines a macro __ICC, which evaluates to an integer constant representing the ICC version numbers. I guess that that particular source isn't very trustworthy, but that's what we have a build farm for. Small patch that hopefully fixes everything for everyone is attached. However, when I build at -O2 with the system GCC (on the same F17 box with no changes to CFLAGS; just ./configure --with-libxml --prefix=/home/peter/pgsql --enable-depend) this patch (when applied to today's master) sees the following warnings raised:

zic.c: In function ‘writezone’:
zic.c:1752:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1753:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1754:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1755:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1756:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1757:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1758:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1759:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1760:3: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1772:4: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c:1785:4: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c: In function ‘puttzcode64’:
zic.c:1514:2: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
zic.c: In function ‘puttzcode’:
zic.c:1505:2: warning: ignoring return value of ‘fwrite’, declared with attribute warn_unused_result [-Wunused-result]
In file included from gram.y:13502:0:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16247:23: warning: unused variable ‘yyg’ [-Wunused-variable]

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services fortify_source_fix.2012_10_08.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
On 20 August 2012 14:09, Pavel Stehule pavel.steh...@gmail.com wrote: (eelog-2012-08-20.diff) This patch has the following reference to relerror.c:

*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
***************
*** 394,397 **** typedef struct StdRdOptions
--- 394,402 ----
  extern void RelationIncrementReferenceCount(Relation rel);
  extern void RelationDecrementReferenceCount(Relation rel);
+ /* routines in utils/error/relerror.c */

Indeed, the new C file has been separately added to a makefile:

! OBJS = assert.o elog.o relerror.o

However, the new file itself does not appear in this patch. Therefore, the code does not compile. Please post a revision with the new file included. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 8 October 2012 16:30, Robert Haas robertmh...@gmail.com wrote: I don't have any plans to work on this further. I think all of the criticism that has been leveled at this patch is 100% bogus, and I greatly dislike the changes that have been proposed. That may not be fair, but it's how I feel, and in light of that feeling it does not make sense for me to pursue this. Sorry. If it was the case that you were only 50% of the way to getting something committable, I guess I'd understand; this is, after all, a volunteer effort, and you are of course free to pursue or not pursue whatever you like. It's more like 95% though, so I have a hard time understanding this attitude. The only criticism that I imagine that you could be referring to is my criticism (which was seconded by Tom) concerning the approach to sizing a buffer that is used as state between successive calls to the sortsupport routine. This was a fairly trivial point even within the context of this patch. It was regrettable that it got so out of hand, but that certainly wasn't my intention. I must say that I find this disheartening as a reviewer, particularly since it comes from someone who reviews a lot of patches (including most of my own), and has often complained about the difficulties that reviewers face. Is it really so hard for you to accept a criticism from me? I didn't expect you to fully accept my criticism; I only expected you to take it into consideration, and possibly let it go for the sake of progress. For what it's worth, I have high regard for your work generally. In all sincerity, this appears to me to be a slight, and I find it upsetting, both on a personal level, and because it creates a concern about being able to work with you in the future, which is certainly something that I've benefited from in the past, and hope to continue to benefit from. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sortsupport for text
On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote: Hey, if me deciding I don't want to work on a patch any more is going to make you feel slighted, then you're out of luck. The archives are littered with people who have decided to stop working on things because the consensus position on list was different than the approach that they personally favored, and I have as much right to do that as anyone else. Sure you do. However, I doubt very many of those who gave up did so over so trivial an issue as to how to grow a buffer somewhere, and those that did usually did not go on to become major contributors, and certainly not committers. The buffer thing is, as far as I know, the single solitary point of contention with this patch. We're talking about 2 lines of code. To give up now is just petulant. There is no other way of looking at it. It is not as if anyone has phrased this as a maybe-we-should sort of argument; there have been quite definite arguments on both sides over apparently strongly-held positions. I had hoped that this was going to be a quick and non-controversial win, but 74 email messages later it has become clear that it will be neither of those things. Many of those 74 emails concerned my completely unrelated digression into exploiting strxfrm(); we spent a ridiculously long time discussing how to size this buffer, but it still wasn't anything like 74 messages. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Disable _FORTIFY_SOURCE with ICC
On 8 October 2012 14:39, Peter Geoghegan pe...@2ndquadrant.com wrote: Small patch that hopefully fixes everything for everyone is attached. Will someone take a look at this, please? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 3 October 2012 19:54, Peter Geoghegan pe...@2ndquadrant.com wrote: On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote: This argument seems sensible to me. Is there any use-case where the proposed counter wouldn't do what people wished to do with an exposed hash value? Yes. The hash could be used to aggregate query execution costs across entire WAL-based replication clusters. I'm not opposed to Daniel's suggestion, though. Could we please try and reach a consensus here? If you're still dead set against exposing the hash value, I think that just following what Daniel has suggested is a fair compromise. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 8/8] Introduce wal decoding via catalog timetravel
On 15 September 2012 01:39, Andres Freund and...@2ndquadrant.com wrote: (0008-Introduce-wal-decoding-via-catalog-timetravel.patch) This patch is the 8th of 8 in a patch series that covers different aspects of the bi-directional replication feature planned for PostgreSQL 9.3. For those that are unfamiliar with the BDR project, a useful summary is available in an e-mail sent to this list by Simon Riggs back in April [1]. I should also point out that Andres has made available a design document that discusses aspects of this patch in particular, in another thread [2]. That document, High level design for logical replication in Postgres, emphasizes source data generation in particular: generalising how PostgreSQL's WAL stream is generated to represent the changes it describes logically, without pointers to physical heap tuples and so forth (data generation as a concept is described in detail in an earlier design document [3]). This particular patch can be thought of as a response to the earlier discussion [4] surrounding how to solve the problem of keeping system catalogs consistent during WAL replay on followers: catalog time travel is now used, rather than maintaining a synchronized catalog at the decoding end. Andres' git tree (xlog-decoding-rebasing-cf2 branch) [5] provides additional useful comments in commit messages (he rebases things such that each commit represents a distinct piece of functionality/patch for review). This patch is not strictly speaking an atomic unit. It is necessary to apply all 8 patches in order to get the code to compile. However, it is approximately an atomic unit, representing a subdivision of the entire BDR patch for which it is manageable and logical to write a discrete review. This is idiomatic use of git-format-patch, but it is unusual enough within our community for me to feel the need to note these facts. I briefly discussed this patch with Andres off-list. 
His feeling is that the review process ought to focus on the design of WAL decoding, including how it fits within the larger set of replication features proposed for 9.3. There are a number of known omissions in this patch. Andres has listed some of these above, and edge-cases and so on are noted next to XXX and FIXME comments in the patch itself. I am inclined to agree with Andres' view that we should attempt to solidify community support for this prototype patch's design, or some variant, before fixing the edge-cases and working towards committable code. I will try my best to proceed on that basis. What follows is an initial overview of the patch (or at least my understanding of the patch, which you may need to correct), and some points of concern. * applycache module which reassembles transactions from a stream of interspersed changes This is what the design doc [2] refers to as 4.5. TX reassembly. This functionality is concentrated in applycache.c. As [2] notes, the reassembly component was previously coined ApplyCache because it was proposed to run on replication consumers just before applying changes. This is not the case anymore. It is still called that way in the source of the patch recently submitted. The purpose of ApplyCache/transaction reassembly is to reassemble interlaced records, and organise them by XID, so that the consumer client code sees only streams (well, lists) of records split by XID. I meant to avoid talking about anything other than the bigger picture for now, but I must ask: Why the frequent use of malloc(), particularly within applycache.c? The obvious answer is that it's rough code and that that will change, but that still doesn't comport with my idea about how rough Postgres code should look, so I have to wonder if there's a better reason. applycache.c has an acute paucity of comments, which makes it really hard to review well. [2] doesn't have all that much to say about it either. 
I'm not going to comment much on this here, except to say that I think that the file should be renamed to reassembly.c or something like that, to reflect its well-specified purpose, and not how it might be used. Any cache really belongs in src/backend/utils/cache/ anyway. Applycache is presumably where you're going to want to spill transaction streams to disk, eventually. That seems like a prerequisite to commit. By the way, I see that you're doing this here:

+ /* most callers don't need snapshot.h */
+ typedef struct SnapshotData *Snapshot;

Tom, Alvaro and I had a discussion about whether or not this was an acceptable way to reduce build dependencies back in July [8] – I lost that one. You're relying on a GNU extension/C11 feature here (that is, typedef redefinition). If you find it intensely irritating that you cannot do this while following the standard to the letter, you are not alone. * snapbuilder which builds catalog snapshots so that tuples from wal can be understood This component analyses xlog and builds a special kind of Snapshot. This has been compared to the
Re: [HACKERS] tuplesort memory usage: grow_memtuples
Do you intend to follow through with this, Jeff? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
, documented exceptions to my previous all-or-nothing proviso, such as that table names won't be available in the event of domain constraints, so be it. I'm going to suggest you add a note to both the docs and errcodes.h regarding all this in your next revision. People need to be adding these fields for all errcodes that *require* them going forward. If, in the future, ERRCODE_UNIQUE_VIOLATION errors, for example, cannot supply the name of the constraint that was violated, then that is, almost by definition, the wrong errcode to use. [1] http://archives.postgresql.org/message-id/CAEYLb_VJK+AZe6fO_s0Md0ge5D=rendf7wg+g4nxn8mhkq4...@mail.gmail.com [2] cafj8prdttdvosvjt8pp08mq_lw2haomwxvrudoylhk9xf7k...@mail.gmail.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Deprecating RULES
On 11 October 2012 20:28, Simon Riggs si...@2ndquadrant.com wrote: Not many RULE-lovers out there, once you've tried to use them. Allowing RULEs complicates various things and can make security more difficult. What exactly do they make more difficult? Are you particularly concerned with the overhead that rules impose when developing certain types of features? If so, since there's going to be a legacy compatibility mode for a long time, I don't know that deprecating rules will buy you much in the next 3 - 4 releases. For 9.3, I suggest we create a DDL trigger by default which prevents RULEs and throws an ERROR that explains they are now deprecated. Anybody that really cares can delete this and use them. Sometime in future, we hard code it, barring complaints. Well, rules have been around since the Berkeley days [1]. I don't think that anyone, including Tom, is willing to argue that user-defined rules are not a tar-pit (except perhaps ON SELECT DO INSTEAD SELECT rules - which are exactly equivalent to views anyway). Personally, I'd like to see them removed too. However, in order to be able to get behind your proposal, I'd like to see a reasonably developed cost/benefit analysis. People do use user-defined rules. For example, the xTuple open source ERP package uses ON INSERT DO INSTEAD rules [2]. [1] http://db.cs.berkeley.edu/papers/ERL-M89-82.pdf [2] http://www.xtuple.org/ApiDevelopment -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
offer us? What have I said that is fundamentally incompatible with how things work in this patch right now? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 14 October 2012 09:19, Simon Riggs si...@2ndquadrant.com wrote: This is a very useful optimisation, for both the low and the high end. I suppose it's possible to not think of it as an optimisation at all. Rather, it could be considered a way of making tuplesort really use the work_mem allotted to it, or at least use it more effectively, so that DBAs don't have to oversize work_mem. You're getting the same behaviour as when you oversized work_mem, except that you don't run the risk of thrashing due to excessive paging if ever there is a sort big enough to have that effect. The current coding allows full use of memory iff the memory usage is an exact power of 2, so this patch will allow an average of 50% and as much as 100% gain in memory for sorts. We need to be careful to note this as a change in the release notes, since users may well have set work_mem to 2x what was actually needed to get it to work - that means most users will see a sudden leap in actual memory usage, which could lead to swapping in edge cases. Yes, that's probably true. I notice that cost_sort doesn't take into account the space used for sorting other than tuple space so apparently requires no changes with this patch. ISTM that cost_sort should be less optimistic about memory efficiency than it is. Perhaps. I don't have an intuitive sense of what is and is not worth modelling in the optimiser, so I can't really comment here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Deprecating RULES
On 15 October 2012 11:41, Greg Stark st...@mit.edu wrote: On Mon, Oct 15, 2012 at 8:00 AM, Simon Riggs si...@2ndquadrant.com wrote: Please can anyone show me the SQL for a rule that cannot be written as a view or a trigger? I do not believe such a thing exists and I will provide free beer to the first person that can prove me wrong. Being written as a view doesn't help you because views use rules. I repeat, the very fact that we need rules to implement views proves rules are necessary for some purposes. Well, the usual way that this proposal is phrased is that user-defined rules should be deprecated. Granted, that wasn't the case on this occasion, but it has been on many other occasions. It's not as if there isn't a clear separation between what we all agree are good rules (that is, ON SELECT DO INSTEAD SELECT rules, which views are technically very simple wrappers of) and bad rules (that is, everything else). Humorous aside: I saw this comment within view.c, that dates from the Postgres95 days at the latest (but is probably older still):

 * This update consists of adding two new entries IN THE BEGINNING
 * of the range table (otherwise the rule system will die a slow,
 * horrible and painful death, and we do not want that now, do we?)

I'm not sure that the authors' remarks about not wanting that should be taken at face value... -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Deprecating RULES
On 15 October 2012 00:30, Greg Stark st...@mit.edu wrote: In fact it's not a very good analogy because the situation is *precisely* the same -- rules *are* macros and manipulate the raw SQL before it's run and the reason they can't be replaced by triggers is because, like functions, triggers happen after the code is compiled and run. I quite like this analogy, because it nicely illustrates the problems with rules. C, and the C preprocessor, are essentially the same now as they were in the early 1970s. I think that *an emphasis* on a preprocessing stage of translation is a fairly discredited idea (though there are some sensible uses, particularly where alternatives are not available). C99 introduced inline functions, probably in no small part because it is quite obvious that they are often superior to macros. Consider the two most successful programming languages that were obviously influenced by C: Java and C++. The first doesn't have a preprocessor, and the second strongly encourages using numerous alternatives to macros where possible, which is almost always. Maybe you don't like this analogy, because you consider C to be a systems programming language, and as such think it is only right and proper that programmers should be given enough rope to hang themselves. Perhaps you're right, but the same surely cannot be said for SQL. The original appeal of SQL was that it was supposedly possible for non-programmers to write it. Clearly deprecating rules implies some loss of functionality - there is no exact, drop-in equivalent to something that magically rewrites SQL that isn't equally baroque and problematic. If that's the bar, then detractors of rules should stop wasting their breath, because the bar has been set impossibly high. On a *practical* level triggers are complete replacements for user-defined rules. 
All that it takes to be able to *always* say that one language feature is not equivalent to another, and on that basis the other should not be deprecated, is a sufficient degree of pedantry (not that I'm implying that you or anyone else was being pedantic, or that concerns raised should not be heeded). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash id in pg_stat_statements
On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Instead, I think it makes sense to assign a number -- arbitrarily, but uniquely -- to the generation of a new row in pg_stat_statements, and, on the flip side, whenever a row is retired its number should be eliminated, practically, for-ever. This way re-introductions between two samplings of pg_stat_statements cannot be confused for a contiguously maintained statistic on a query. This argument seems sensible to me. Daniel: Could you please submit the patch that you were working on that does this to the next commitfest? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
On 15 October 2012 19:19, Bruce Momjian br...@momjian.us wrote: I think Robert is right that if Slony can't use the API, it is unlikely any other replication system could use it. I don't accept that. Clearly there is a circular dependency, and someone has to go first - why should the Slony guys invest in adopting this technology if it is going to necessitate using a forked Postgres with an uncertain future? That would be (with respect to the Slony guys) a commercial risk that is fairly heavily concentrated with Afilias. So, if you're going to attach as a condition to its acceptance that the Slony guys be able to use it immediately (because can integrate really means will integrate, right?), you're attaching it to a rather arbitrary condition that has nothing much to do with the technical merit of the patches proposed. The fact of the matter is that Slony was originally designed with a somewhat different set of constraints to those that exist today, so I don't doubt that this is something that they're going to need to integrate over time, probably in a separate release branch, to get the upsides of in-core logical replication, along with the great flexibility that Slony currently offers (and that Afilias undoubtedly depend upon today). Another way of putting this is that Postgres should go first because we will get huge benefits even if only one of the trigger-based logical replication systems adopts the technology. Though I hope and expect that the Slony guys will be able to work with what we're doing, surely a logical replication system with all the benefits implied by being logical, but with only some subset of Slony's functionality is still going to be of great benefit. My view is that the only reasonable approach is to build something solid, well-integrated and generic, in core. I'd certainly like to hear what the Slony guys have to say here, though. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 14 October 2012 09:19, Simon Riggs si...@2ndquadrant.com wrote: This is a very useful optimisation, for both the low and the high end. Well, I'm about ready to mark this one ready for committer. There is this outstanding issue in my revision of August 17th, though:

+ /*
+  * XXX: This feels quite brittle; is there a better principled approach,
+  * that does not violate modularity?
+  */
+ newmemtupsize = (int) floor(oldmemtupsize * allowedMem / memNowUsed);
+ state->fin_growth = true;

I suppose that I should just recognise that this *is* nothing more than a heuristic, and leave it at that. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 16 October 2012 14:24, Simon Riggs si...@2ndquadrant.com wrote: If you describe in detail that it is a heuristic and why that is proposed over other approaches that should be sufficient for future generations to read and understand. I've done so, in the attached revision. Things have been simplified somewhat too. The same basic strategy for sizing the tuplesort memtuples array also exists in tuplestore. I wonder if we should repeat this there? I suppose that that could follow later. The patch will now be marked ready for committer. Does this need doc changes, in light of what is arguably a behavioural difference? You only mentioned release notes. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services sortmem_grow-v3.patch Description: Binary data
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 16 October 2012 22:18, Greg Stark st...@mit.edu wrote: That's assuming my committer bits haven't lapsed and people are ok with me stepping back into things? I personally have no objections. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows
On 17 October 2012 14:53, Merlin Moncure mmonc...@gmail.com wrote: Is that defined in the standard? RETURNING isn't even defined in the standard. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecating RULES
On 17 October 2012 18:50, Andrew Dunstan and...@dunslane.net wrote: I don't know how many times I have to say this: people are not listening. Tom has already given a case for it upthread: Triggers necessarily operate on a row-at-a-time basis. In theory, for at least some bulk operations, a rule could greatly outperform a trigger. It's difficult to walk away from that - unless somebody can prove that the advantage doesn't ever accrue in practice. Fair point. I'm just not sure that that is a good enough reason to not deprecate rules. I mean, if experienced hackers cannot figure out if that's actually a useful facet of rules, what hope is there for anyone else? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deprecating RULES
On 17 October 2012 18:46, Greg Stark st...@mit.edu wrote: I would suggest something like Warning: RULES are tricky to use correctly. They rewrite the original query into a new query before it is run and it is very hard to correctly anticipate and rewrite every possible input query into the desired result. There are also unexpected interactions with other components when RULES do something unexpected such as rewrite a single query to return two result sets. +1 to that sort of wording. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
On 16 October 2012 15:26, Jan Wieck janwi...@yahoo.com wrote: This means that the transition time from the existing, trigger based approach to the new WAL based mechanism will see both technologies in parallel, which is no small thing to support. So, you're talking about a shim between the two in order to usefully support inter-version replication, or are you just thinking about making a clean break in compatibility for Postgres versions prior to 9.3 in a new release branch? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
On 18 October 2012 16:18, Christopher Browne cbbro...@gmail.com wrote: A shim adds complexity, but retains the upgrade across versions use case, and reduces the need to keep supporting elder versions of Slony. Right. Upgrading across major versions is likely to continue to remain a very important use-case for Slony. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On 16 October 2012 21:47, Peter Geoghegan pe...@2ndquadrant.com wrote: The same basic strategy for sizing the tuplesort memtuples array also exists in tuplestore. I wonder if we should repeat this there? I suppose that that could follow later. Incidentally, the basis of this remark is commit 2689abf0, where Tom decided to keep the two in sync. That's a precedent for what we need to do here, I suppose. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] assertion failure w/extended query protocol
On 19 October 2012 17:19, Robert Haas robertmh...@gmail.com wrote: Rushabh Lathia of the EnterpriseDB development team and I have been doing some testing of the extended query protocol and have found a case where it causes an assertion failure. Here's how to reproduce: 1. Apply the attached patch to teach psql how to use the extended query protocol. Compile, install. 2. Start the modified psql and do this:

\set PROTOCOL extended
PREPARE stmt as select 1;
CREATE TEMPORARY TABLE tmptbl AS EXECUTE stmt;

The result is:

TRAP: FailedAssertion("!(qry->commandType != CMD_UTILITY)", File: "utility.c", Line: 1516)

I'm reasonably confident that commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a caused this breakage. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] assertion failure w/extended query protocol
On 19 October 2012 19:01, Andres Freund and...@2ndquadrant.com wrote: Btw, do you plan to submit that psql patch at some point? I repeatedly wished to be able to use the extended protocol without writing code or misusing pgbench exactly to test stuff like this. +1 -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] First draft of snapshot snapshot building design document
-subxip, snapshot->subxcnt))
+	{
+		CommandId	cmin = HeapTupleHeaderGetRawCommandId(tuple);
+
+		/* no support for that yet */
+		if (tuple->t_infomask & HEAP_COMBOCID)
+		{
+			elog(WARNING, "combocids not yet supported");
+			return false;
+		}
+		if (cmin >= snapshot->curcid)
+			return false;	/* inserted after scan started */
+	}

Above, you aren't taking this into account (code from HeapTupleHeaderGetCmax()):

/* We do not store cmax when locking a tuple */
Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));

Sure, you're only interested in cmin, so this doesn't look like it applies (isn't this just a sanity check?), but actually, based on this it seems to me that the current sane representation of cmin needs to be obtained in a more concurrency aware fashion - having the backend local data-structures that originate the tuple isn't even good enough. The paucity of other callers to HeapTupleHeaderGetRawCommandId() seems to support this. Apart from contrib/pageinspect, there is only this one other direct caller, within heap_getsysattr():

/*
 * cmin and cmax are now both aliases for the same field, which
 * can in fact also be a combo command id. XXX perhaps we should
 * return the real cmin or cmax if possible, that is if we are
 * inside the originating transaction?
 */
result = CommandIdGetDatum(HeapTupleHeaderGetRawCommandId(tup->t_data));

So these few direct call-sites that inspect CommandId outside of their originating backend are all non-critical observers of the CommandId, that are roughly speaking allowed to be wrong. Callers to the higher level abstractions (HeapTupleHeaderGetCmin()/HeapTupleHeaderGetCmax()) know that the CommandId must be a cmin or cmax respectively, because they have as their contract that TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)) and TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tup)) respectively. 
I guess this can all happen when you write that XLOG_HEAP2_NEW_COMBOCID record within the originating transaction (that is, when your xmin is that of the tuple in the originating transaction, you are indeed dealing with a cmin), but these are hazards that you need to consider in addition to the more obvious ComboCid hazard. I have an idea that the HeapTupleHeaderGetRawCommandId(tuple) call in your code could well be bogus even when (t_infomask & HEAP_COMBOCID) == 0. I look forward to seeing your revision that addresses the issue with sinval messages. [1] http://archives.postgresql.org/message-id/201209221900.53190.and...@2ndquadrant.com [2] http://archives.postgresql.org/message-id/CAEYLb_XZ-k_vRpBP9TW=_wufdsusosp1yir1xg7l_4rmg5b...@mail.gmail.com [3] http://archives.postgresql.org/message-id/201209150233.25616.and...@2ndquadrant.com [4] http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-and...@2ndquadrant.com [5] http://archives.postgresql.org/message-id/201206131327.24092.and...@2ndquadrant.com [6] http://archives.postgresql.org/message-id/201210161330.37967.and...@2ndquadrant.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Deprecating RULES
On 19 October 2012 22:03, Josh Berkus j...@agliodbs.com wrote: Unless the easiest way to implement MERGE is to extend RULEs. FWIW, I'd say that's probably about the hardest possible way to implement MERGE, assuming that we prioritise providing robust UPSERT support, as I strongly feel we should. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
I think that we're both going to be busy next week, since we're both attending pgconf.eu. For that reason, I would like to spend some time tomorrow to get something in shape, that I can mark ready for committer. I'd like to get this patch committed during this commitfest. You are welcome to do this work instead. I want to avoid a redundant effort. Let me know if you think that that's a good idea. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
On 21 October 2012 16:59, Kevin Grittner kgri...@mail.com wrote: I don't know about anyone else, but I could live with that. Me too. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] enhanced error fields
On 24 October 2012 23:29, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Let me know if you think that that's a good idea. I guess you didn't get around to it. I did get some work on this done, which does change things somewhat. In particular, I think that the need to have so many new fields is questionable, and so have removed some. I will get around to this, and will incorporate those ideas. The errrel() calls with index relations are not sane, but that's just an oversight. The next revision will actually do this:

+ Assert(table->rd_rel->relkind == RELKIND_RELATION);

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Should select 'nan'::float = 'nan'::float; return false as per IEEE 754
On 28 October 2012 09:43, Hannu Krosing ha...@2ndquadrant.com wrote:

test=# select 'NaN'::float = 'NaN'::float as must_be_false;
 must_be_false
---------------
 t
(1 row)

I think that PostgreSQL's behaviour of comparing two NaN-s as equal is wrong and we should follow the IEEE 754 spec here. The reason that PostgreSQL does this is that in order for float to be a btree indexable type, its semantics must follow the reflexive law. This and other requirements of btree operator classes are described under src/backend/access/nbtree/README. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Logical to physical page mapping
On 28 October 2012 22:35, Jan Wieck janwi...@yahoo.com wrote: The amount of WAL generated with full_page_writes=on is quite substantial. For pgbench, for example, the ratio is 20:1. Meaning with full_page_writes you write 20x the amount you do without. Sure, but as always, pgbench pessimises everything by having writes follow a uniform distribution, which is completely unrepresentative of the natural world. This will literally maximise the number of pristine full page images that need to be included. The actual frequency of checkpoints is a crucial factor here too, and you didn't mention anything about that. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
On 29 October 2012 20:14, Christian Kruse cjk+postg...@defunct.ch wrote: created a patch which implements MAP_HUGETLB for sysv shared memory segments (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I added error handling, huge page size detection and a GUC variable. I have a few initial observations on this. * I think you should be making the new GUC PGC_INTERNAL on platforms where MAP_HUGETLB is not defined or available. See also, effective_io_concurrency. This gives sane error handling. * Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do the same thing as HUGE_TLB_TRY, contrary to your documentation:

+ if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)

* I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather than OR'ing it in after the fact. We already build the flags that comprise PG_MMAP_FLAGS using another set of intermediary flags, based on platform considerations, so what you're doing here is just inconsistent. Don't wrap the mmap() code in ifdefs, and instead rely on the GUC being available on all platforms, with setting the GUC causing an error on unsupported platforms, in the style of effective_io_concurrency, as established in commit 3b07182e613a0e63ff662e3dc7f820f010923ec7 . Build a PG_MAP_HUGETLB intermediary flag on all platforms. * Apparently you're supposed to do this for the benefit of Itanic [1]:

/* Only ia64 requires this */
#ifdef __ia64__
#define ADDR (void *)(0x8000UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
#else
#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif

* You aren't following our style guide with respect to error messages [2]. 
[1] https://www.kernel.org/doc/Documentation/vm/map_hugetlb.c [2] http://www.postgresql.org/docs/devel/static/error-style-guide.html -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index only scans wiki page
On 13 November 2012 01:03, Jeff Janes jeff.ja...@gmail.com wrote: https://wiki.postgresql.org/wiki/Index-only_scans This page is seriously out of date. It suggests they are not yet implemented, but only being talked about. Attached is a piece I wrote on the feature. That might form the basis of a new wiki page. Feel free to incorporate this material as you see fit. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services index_only_scans.rst Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index only scans wiki page
On 13 November 2012 16:37, Robert Haas robertmh...@gmail.com wrote: I found this an interesting read. As one of the people who worked on the feature, I'm sort of curious whether people have any experience yet with how this actually shakes out in the field. Are you (or is anyone) aware of positive/negative field experiences with this feature? Unfortunately, I don't think that I have any original insight about the problems with index-only scans in the field right now. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 22 March 2012 17:19, Tom Lane t...@sss.pgh.pa.us wrote: I'm inclined to think that the most useful behavior is to teach the rewriter to copy queryId from the original query into all the Queries generated by rewrite. Then, all rules fired by a source query would be lumped into that query for tracking purposes. This might not be the ideal behavior either, but I don't see a better solution. +1. This behaviour seems fairly sane. The lumping together of DO ALSO and DO INSTEAD operations was a simple oversight. This seems to boil down to what you want to have happen with queries created/executed inside functions, which is something I don't recall being discussed. Uh, well, pg_stat_statements is clearly supposed to monitor execution of queries from within functions - there is a GUC, pg_stat_statements.track, which can be set to 'all' to track nested queries. That being the case, we should clearly be fingerprinting those query trees too. The fact that we'll fingerprint these queries even though we usually don't care about them doesn't seem like a problem, since in practice the vast majority will be prepared. Either way, I think we'd be a lot better advised to define a single hook post_parse_analysis_hook and make the core code responsible for calling it at the appropriate places, rather than supposing that the contrib module knows exactly which core functions ought to be the places to do it. I agree. Since you haven't mentioned the removal of parse-analysis Const location alterations, I take it that you do not object to that, which is something I'm glad of. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 22 March 2012 19:07, Tom Lane t...@sss.pgh.pa.us wrote: Will you adjust the patch for the other issues? Sure. I'll take a look at it now. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 23 January 2012 02:08, Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote: I'm finding the backend_writes column pretty unfortunate. The only use I know of for it is to determine if the bgwriter is lagging behind. Yet it doesn't serve even this purpose because it lumps together the backend writes due to lagging background writes, and the backend writes by design due to the use of a buffer access strategy during bulk inserts. +1 for separating those. I decided to have a go myself. Attached patch breaks out strategy allocations in pg_stat_bgwriter, but not strategy writes. My thinking is that this may serve to approximate non-BAS_NORMAL writes, with the considerable advantage of not requiring that I work backwards from some block to its strategy when backend-local syncing (yes, syncing, not writing) a buffer, in order to work out which strategy object references the buffer. The bookkeeping that that would likely entail seems to make it infeasible. Incidentally, it seems Postgres doesn't currently record backend writes when the buffer doesn't go on to be sync'd. That seems problematic to me, or at the very least a misrepresentation, since temporary tables will be written out by the backend, for example. Not sure if it's worth fixing, though I've added a comment to that effect at the site where backend_writes is bumped. I have corrected the md.c bug. This previously would have prevented the sync_files (number of relation segments synced) value from being valid in non-log_checkpoints configurations. I'm not currently confident that the strategy_alloc field is a very useful proxy for a strategy_backend_writes field. 
I think that rather than bumping the strategy allocation count analogously to the way the overall count is bumped (within StrategyGetBuffer()), I should have bumped earlier within BufferAlloc() so that it'd count if the buffer was requested with non-BAS_NORMAL strategy but was found in shared_buffers (so control never got as far as StrategyGetBuffer() ). That might make the value more closely line-up to the would-be value of a strategy_backend_writes column. What do you think? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml new file mode 100644 index 840e54a..f714cb8 *** a/doc/src/sgml/monitoring.sgml --- b/doc/src/sgml/monitoring.sgml *** SELECT pg_stat_get_backend_pid(s.backend *** 806,811 --- 806,818 the functionpg_stat_get_buf_alloc/function function./entry /row row + entrybuffers_strat_alloc/entry + entrytypebigint/type/entry + entryNumber of buffers allocated with non-default buffer access strategy. + This value can also be returned by directly calling + the functionpg_stat_get_buf_strat_alloc/function function./entry + /row + row entrystats_reset/entry entrytypebigint/type/entry entryThe last time these statistics were reset. *** SELECT pg_stat_get_backend_pid(s.backend *** 1703,1708 --- 1710,1741 /entry /row + row + entryliteralfunctionpg_stat_get_bgwriter_write_time()/function/literal/entry + entrytypebigint/type/entry + entry + Total amount of time that has been spent in the part of checkpoint + processing where files are written to disk, in milliseconds. + /entry + /row + + row + entryliteralfunctionpg_stat_get_bgwriter_sync_time()/function/literal/entry + entrytypebigint/type/entry + entry + Total amount of time that has been spent in the part of checkpoint + processing where files are synchronized to disk, in milliseconds. 
+ /entry + /row + + row + entryliteralfunctionpg_stat_get_bgwriter_sync_files()/function/literal/entry + entrytypebigint/type/entry + entry + Total number of files that have been synchronized to disk during + checkpoint processing. + /entry + /row row entryliteralfunctionpg_stat_get_wal_senders()/function/literal/entry diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c new file mode 100644 index ff7f521..e481be3 *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** LogCheckpointStart(int flags, bool resta *** 7492,7501 } /* ! * Log end of a checkpoint. */ static void ! LogCheckpointEnd(bool restartpoint) { long write_secs, sync_secs, --- 7492,7501 } /* ! * Time and potentially log the end of a checkpoint. */ static void ! TimeCheckpointEnd(bool restartpoint) { long write_secs, sync_secs, *** LogCheckpointEnd(bool restartpoint) *** 7511,7519 CheckpointStats.ckpt_end_t = GetCurrentTimestamp(); - TimestampDifference
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 22 March 2012 22:05, Andrew Dunstan and...@dunslane.net wrote: On 03/22/2012 05:49 PM, Bruce Momjian wrote: Robert Haas and I are disappointed by this change. I liked the fact that I could post nice-looking SQL queries without having to use my capslock key (which I use as a second control key). Any chance of reverting this change? Should it be governed by a setting? Perhaps, but I find the behaviour that was introduced by Peter's patch to be a preferable default. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Uppercase tab completion keywords in psql?
On 23 March 2012 15:13, Andrew Dunstan and...@dunslane.net wrote: Upper casing SQL keywords is a common style, which is used in lots of our code (e.g. regression tests, psql queries, pg_dump). I think the default should match what is in effect our house style, and what we have historically done. The code doesn't give preferential treatment to lower-case code - it merely puts it on an even footing. I would agree with your position if the change assumed that the user always wanted to use lower-case SQL, but it does not. Rather, it intelligently infers what the user wants. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote: I've committed the core-backend parts of this, just to get them out of the way. Have yet to look at the pg_stat_statements code itself. Thanks. I'm glad that we have that out of the way. I ended up choosing not to apply that bit. I remain of the opinion that this behavior is fundamentally inconsistent with the general rules for assigning parse locations to analyzed constructs, and I see no reason to propagate that inconsistency further than we absolutely have to. Fair enough. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
[ also for the archives' sake ] On 27 March 2012 22:05, Tom Lane t...@sss.pgh.pa.us wrote: Well, testing function pointers for null is certainly OK --- note that all our hook function call sites do that. It's true that testing for equality to a particular function's name can fail on some platforms because of jump table hacks. I was actually talking about stylistic iffiness. This seems contrary to the standard, which states: (ISO C 99 section 6.5.9.6) Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (...) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space. However, the fly in the ointment is IA-64 (Itanic), which apparently at least at one stage had broken function pointer comparisons, at least when code was built using some version(s) of GCC. I found it a bit difficult to square your contention that performing function pointer comparisons against function addresses was what sounded like undefined behaviour, and yet neither GCC nor Clang complained. However, in light of what I've learned about IA-64, I can certainly see why we as a project would avoid the practice. Source: http://gcc.gnu.org/ml/gcc/2003-06/msg01283.html -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote: Have yet to look at the pg_stat_statements code itself. I merged upstream changes with the intention of providing a new patch for you to review. I found a problem that I'd guess was introduced by commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, Restructure SELECT INTO's parsetree representation into CreateTableAsStmt. This has nothing to do with my patch in particular. I noticed that my tests broke, on queries like select * into orders_recent FROM orders WHERE orderdate = '2002-01-01';. Since commands like that are now utility statements, I thought it best to just hash the query string directly, along with all other utility statements - richer functionality would be unlikely to be missed all that much, and that's a fairly clean demarcation that I don't want to make blurry. In the existing pg_stat_statements code in HEAD, there are 2 pgss_store call sites - one in pgss_ProcessUtility, and the other in pgss_ExecutorFinish. There is an implicit assumption in the extant code (and my patch too) that there will be exactly one pgss_store call per query execution. However, that assumption appears to now fall down, as illustrated by the GDB session below. What's more, our new hook is called twice, which is arguably redundant. (gdb) break pgss_parse_analyze Breakpoint 1 at 0x7fbd17b96790: file pg_stat_statements.c, line 640. (gdb) break pgss_ProcessUtility Breakpoint 2 at 0x7fbd17b962b4: file pg_stat_statements.c, line 1710. (gdb) break pgss_ExecutorEnd Breakpoint 3 at 0x7fbd17b9618c: file pg_stat_statements.c, line 1674. (gdb) c Continuing. I execute the command select * into orders_recent FROM orders WHERE orderdate = '2002-01-01'; Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0, post_analysis_tree=0x2474010) at pg_stat_statements.c:640 640 if (post_analysis_tree-commandType != CMD_UTILITY) (gdb) c Continuing. 
Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0, post_analysis_tree=0x2474010) at pg_stat_statements.c:640 640 if (post_analysis_tree-commandType != CMD_UTILITY) (gdb) c Continuing. Breakpoint 2, pgss_ProcessUtility (parsetree=0x2473cd8, queryString=0x2472a88 select * into orders_recent FROM orders WHERE orderdate = '2002-01-01';, params=0x0, isTopLevel=1 '\001', dest=0x2474278, completionTag=0x7fff74e481e0 ) at pg_stat_statements.c:1710 1710if (pgss_track_utility pgss_enabled()) (gdb) c Continuing. Breakpoint 3, pgss_ExecutorEnd (queryDesc=0x24c9660) at pg_stat_statements.c:1674 1674if (queryDesc-totaltime pgss_enabled()) (gdb) c Continuing. What do you think we should do about this? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 28 March 2012 15:25, Tom Lane t...@sss.pgh.pa.us wrote: That's been an issue right along for cases such as EXPLAIN and EXECUTE, I believe. Possible, since I didn't have test coverage for either of those 2 commands. Perhaps the right thing is to consider such executor calls as nested statements --- that is, the ProcessUtility hook ought to bump the nesting depth too. That makes a lot of sense, but it might spoil things for the pg_stat_statements.track = 'top' + pg_stat_statements.track_utility = 'on' case. At the very least, it's a POLA violation, to the extent that if you were going to do this, you might mandate that nested statements be tracked along with utility statements (probably while defaulting to having both off, which would be a change). Since you've already removed the intoClause chunk, I'm not sure how far underway the review effort is - would you like me to produce a new revision, or is that unnecessary? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 28 March 2012 15:57, Tom Lane t...@sss.pgh.pa.us wrote: Is there any actual benefit in providing the pg_stat_statements.string_key GUC? It looks to me more like something that was thrown in because it was easy than because anybody would want it. I'd just as soon leave it out and avoid the incremental API complexity increase. (While on that subject, I see no documentation updates in the patch...) Personally, I don't care for it, and I'm sure most users wouldn't either, but I thought that someone somewhere might be relying on the existing behaviour. I will produce a doc-patch. It would have been premature to do so until quite recently. Also, I'm not terribly happy with the sticky entries hack. The way you had it set up with a 1e10 bias for a sticky entry was completely unreasonable IMO, because if the later pgss_store call never happens (which is quite possible if the statement contains an error detected during planning or execution), that entry is basically never going to age out, and will just uselessly consume a precious table slot for a long time. In the extreme, somebody could effectively disable query tracking by filling the hashtable with variants of SELECT 1/0. The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK to maybe 10 or so, but I wonder whether there's some less klugy way to get the result in the first place. I thought about keeping the canonicalized query string around locally in the backend rather than having the early pgss_store call at all, but am not sure it's worth the complexity of an additional local hashtable or some such to hold such pending entries. I was troubled by that too, and had considered various ways of at least polishing the kludge. Maybe a better approach would be to start with a usage of 1e10 (or something rather high, anyway), but apply a much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky entries only? 
That way, in earlier calls to entry_dealloc(), the sticky entries, easily identifiable as having 0 calls, are almost impossible to evict, but after a relatively small number of entry_dealloc() calls they become more readily evictable. This seems better than simply having some much lower usage that is only a few times the value of USAGE_INIT. Let's suppose we set sticky entries to have a usage value of 10. If all other queries have more than 10 calls, which is not unlikely (under the current usage format, 1.0 usage = 1 call, at least until entry_dealloc() dampens usage), then when we entry_dealloc(), the sticky entry might as well have a usage of 1, and has no way of increasing its usage short of becoming a real entry. On the other hand, with the multiplier trick, how close the sticky entry is to eviction is, importantly, far more strongly influenced by the number of entry_dealloc() calls, which in turn is influenced by churn in the system, rather than being largely influenced by how the magic sticky usage value happens to compare to the usage values of some random set of real entries on some random database. If entries really are precious, then the sticky entry is freed much sooner. If not, then why not allow the sticky entry to stick around pending its execution/promotion to a real entry? It would probably be pretty inexpensive to maintain what is currently the largest usage value in the hash table at entry_dealloc() time - that would likely be far more suitable than 1e10, and might even work well. We could perhaps cut that in half every entry_dealloc(). -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 00:14, Tom Lane t...@sss.pgh.pa.us wrote: I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value of 3.0, which is the largest value that stays below 10% wastage. We can twiddle that logic later, so if you want to experiment with an alternate decay rule, feel free. I think I may well end up doing so when I get a chance. This seems like the kind of problem that will be solved only when we get some practical experience (i.e. use the tool on something closer to a production system than the regression tests). doc-patch is attached. I'm not sure if I got the balance right - it may be on the verbose side. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services pg_stat_statements_norm_docs.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 02:09, Tom Lane t...@sss.pgh.pa.us wrote: Thanks. I've committed the patch along with the docs, after rather heavy editorialization. Thank you. 1. What to do with EXPLAIN, SELECT INTO, etc. We had talked about tweaking the behavior of statement nesting and some other possibilities. I think clearly this could use improvement but I'm not sure just how. (Note: I left out the part of your docs patch that attempted to explain the current behavior, since I think we should fix it not document it.) Yeah, this is kind of unsatisfactory. Nobody would expect the module to behave this way. On the other hand, users probably aren't hugely interested in this information. I'm still kind of attached to the idea of exposing the hash value in the view. It could be handy in replication situations, to be able to aggregate statistics across the cluster (assuming you're using streaming replication and not a trigger based system). You need a relatively stable identifier to be able to do that. You've already sort-of promised to not break the format in point releases, because it is serialised to disk, and may have to persist for months or years. Also, it will drive home the reality of what's going on in situations like this (from the docs): In some cases, queries with visibly different texts might get merged into a single pg_stat_statements entry. Normally this will happen only for semantically equivalent queries, but there is a small chance of hash collisions causing unrelated queries to be merged into one entry. (This cannot happen for queries belonging to different users or databases, however.) Since the hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings. 2. Whether and how to adjust the aging-out of sticky entries. 
This seems like a research project, but the code impact should be quite localized. As I said, I'll try and give it some thought, and do some experiments. BTW, I eventually concluded that the parameterization testing we were worried about before was a red herring. As committed, the patch tries to store a normalized string if it found any deletable constants, full stop. This seems to me to be correct behavior because the presence of constants is exactly what makes the string normalizable, and such constants *will* be ignored in the hash calculation no matter whether there are other parameters or not. Yeah, that aspect of not canonicalising parametrised entries had bothered me. I guess we're better off gracefully handling the problem across the board, rather than attempting to band-aid the problem up by just not having speculative hashtable entries in cases where they arguably are not so useful. Avoiding canonicalising those constants was somewhat misleading. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 28 March 2012 15:23, Robert Haas robertmh...@gmail.com wrote: At any rate, I strongly agree that counting the number of strategy allocations is not really a viable proxy for counting the number of backend writes. You can't know how many of those actually got dirtied. Sure. Since any backend write is necessarily the result of that backend trying to allocate a buffer, I think maybe we should just count whether the number of times it was trying to allocate a buffer *using a BAS* vs. the number of times it was trying to allocate a buffer *not using a BAS*. That is, decide whether or not it's a strategy write not based on whether the buffer came in via a strategy allocation, but rather based on whether it's going out as a result of a strategy allocation. I'm not quite sure I understand what you mean here. Are you suggesting that I produce a revision that bumps beside FlushBuffer() in BufferAlloc(), as a dirty page is evicted/written, while breaking the figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both figures be presented as separate columns within pg_stat_bgwriter? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new group commit behavior not helping?
On 1 April 2012 01:10, Robert Haas robertmh...@gmail.com wrote: Hoping to demonstrate the wonders of our new group commit code, I ran some benchmarks on the IBM POWER7 machine with synchronous_commit = on. But, it didn't come out much better than 9.1. pgbench, scale factor 300, median of 3 30-minute test runs, # clients = #threads, shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit = on, checkpoint_segments = 300, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, wal_writer_delay = 20ms. Why the low value for wal_writer_delay? master: 01 tps = 118.968446 (including connections establishing) 02 tps = 120.666865 (including connections establishing) 04 tps = 209.624437 (including connections establishing) 08 tps = 377.387029 (including connections establishing) 16 tps = 695.172899 (including connections establishing) 32 tps = 1318.468375 (including connections establishing) REL9_1_STABLE: 01 tps = 117.037056 (including connections establishing) 02 tps = 119.393871 (including connections establishing) 04 tps = 205.958750 (including connections establishing) 08 tps = 365.464735 (including connections establishing) 16 tps = 673.379394 (including connections establishing) 32 tps = 1101.324865 (including connections establishing) (presumably s/tps/clients/ was intended here) Is this expected behavior? Is this not the case where it's supposed to help? I thought Peter G. posted results showing a huge improvement on this kind of workload, and I thought Heikki reproduced them on a different server, so I'm confused why I can't. The exact benchmark that I ran was the update.sql pgbench-tools benchmark, on my laptop. The idea was to produce a sympathetic benchmark with a workload that was maximally commit-bound. Heikki reproduced similar numbers on his laptop, iirc. Presumably the default TPC-B-like transaction test has been used here. You didn't mention what kind of disks this server has - I'm not sure if that information is available elsewhere. 
That could be highly pertinent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new group commit behavior not helping?
On 1 April 2012 06:41, Robert Haas robertmh...@gmail.com wrote: There seem to be two relevant differences between your test and mine: (1) your test is just a single insert per transaction, whereas mine is pgbench's usual update, select, update, update, insert and (2) it seems that, to really see the benefit of this patch, you need to pound the server with a very large number of clients. On this test, 250 clients was the sweet spot. *refers to original early January benchmark* While the graph that I produced was about the same shape as yours, the underlying hardware was quite different, and indeed with my benchmark group commit's benefits are more apparent earlier - at 32 clients, throughput has more than doubled compared to pre-group-commit Postgres, which has already just about plateaued. I did include hdparm information for the disk that my benchmark was performed on at the time. While write-caching was not disabled, I would expect that the commit speed of my laptop - which has a fairly unremarkable 7200RPM disk - is slower than that of the 10K RPM SAS disks that you used. A formal benchmark of respective raw commit speeds may shed more light on this. Why did I even bother with such a sympathetic benchmark, when a benchmark on a large server could have been performed instead? Well, the reality is that many of our users have a commit speed comparable to my laptop's. In particular, the increasing prevalence of cloud-type deployments makes group commit a timely feature. If you wanted to demonstrate the wonders of group commit, I'd take that particular tone. I'm sure that if you re-ran this benchmark with a battery-backed cache, you would observe a much smaller though still very apparent benefit, but if you wanted to make the feature sound appealing to traditional enterprise users that are using a BBU, a good line would be this is what will save your bacon the day that your procedures fail and your BBU battery dies. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] invalid search_path complaints
On 3 April 2012 19:16, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 3, 2012 at 12:05 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 3, 2012 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: I would say that's an improvement. Do you think it isn't? It seems like a log spam hazard at high connection rates. [ shrug... ] Failing to report a problem is a problem, too. By your argument any error message anywhere should be removed because it might be a log spam hazard at high reporting rates. Surely there's a way to throttle it appropriately? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 3 April 2012 12:11, Robert Haas robertmh...@gmail.com wrote: Well, for 9.2, I am asking that you rip all that stuff out of the patch altogether so we can focus on the stuff that was in the original patch. Given how we're pushed for time, I'm inclined to agree that that is the best course of action. Attached revision does not attempt to do anything with strategy writes/allocations. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services bgwriter_stats_2012_04_05.patch Description: Binary data
Re: [HACKERS] foreign key locks, 2nd attempt
On 25 March 2012 09:17, Simon Riggs si...@2ndquadrant.com wrote: The main thing we're waiting on is the performance tests to confirm the lack of regression. I have extensively benchmarked the latest revision of the patch (tpc-b.sql), which I pulled from Alvaro's github account. The benchmark was of the feature branch's then-current HEAD, Don't follow update chains unless caller requests it. I've had to split these numbers out into two separate reports. Incidentally, at some future point I hope that pgbench-tools can handle testing across feature branches, initdb'ing and suchlike automatically and as needed. That's something that's likely to happen sooner rather than later. The server used was kindly supplied by the University of Oregon open source lab. Server (it's virtualised, but apparently this is purely for sandboxing purposes and the virtualisation technology is rather good):

IBM,8231-E2B POWER7 processor (8 cores)
Fedora 16
8GB RAM
Dedicated RAID1 disks. Exact configuration unknown.

postgresql.conf (this information is available when you drill down into each test too, fwiw):

max_connections = 200
shared_buffers = 2GB
checkpoint_segments = 30
checkpoint_completion_target = 0.8
effective_cache_size = 6GB

Reports:
http://results_fklocks.staticloud.com/
http://results_master_for_fks.staticloud.com/

Executive summary: There is a clear regression of less than 10%. There also appears to be a new source of contention at higher client counts. I realise that the likely upshot of this, and of other concerns that are generally held at this late stage, is that this patch will not make it into 9.2. For what it's worth, that comes as a big disappointment to me. I would like to thank both Alvaro and Noah for their hard work here. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 29 March 2012 21:05, Tom Lane t...@sss.pgh.pa.us wrote: Barring objections I'll go fix this, and then this patch can be considered closed except for possible future tweaking of the sticky-entry decay rule. Attached patch fixes a bug, and tweaks sticky-entry decay. The extant code bumps usage (though not call counts) in two hooks (pgss_post_parse_analyze() and pgss_ExecutorEnd()), so prepared queries will always have about half the usage of an equivalent simple query, which is clearly not desirable. With the proposed patch, usage should be similar to calls until the first call of entry_dealloc(), rather than usually having a value that's about twice as high. With the patch, a run of pgbench with and without -M prepared results in a usage of calls + 1 for each query from both runs. The approach I've taken with decay is to maintain a server-wide median usage value (well, a convenient approximation), which is assigned to sticky entries. This makes it hard to evict the entries in the first couple of calls to entry_dealloc(). On the other hand, if there really is contention for entries, it will soon become really easy to evict sticky entries, because we use a much more aggressive multiplier of 0.5 for their decay. I rather conservatively assume an initial median usage of 10, which is a very low value considering the use of the multiplier trick. In any case, in the real world it won't take too long for entry_dealloc() to be called and set the median value, if in fact it actually matters. You described entries as precious. This isn't quite the full picture; while you might think that pg_stat_statements will malthusianistically burn through pretty much as many entries as you care to give it, I believe that in the real world, the rate at which the module burns through them would frequently look logarithmic. 
In other words, after an entry_dealloc() call the hashtable is 95% full, but it might take rather a long time to reach 100% again - the first 5% is consumed dramatically faster than the last. The user might not actually care if you need to cache a sticky value for a few hours in one of their slots, as you run an epic reporting query, even though the hashtable is over 95% full. The idea is to avoid evicting a sticky entry just because there happened to be an infrequent entry_dealloc() at the wrong time, and the least marginal of the most marginal 5% of non-sticky entries (that is, the 5% up for eviction) happened to have a call count/usage of higher than the magic value of 3, which I find quite plausible. If I apply your test for dead sticky entries after the regression tests (serial schedule) were run, my approach compares very favourably (granted, presumably usage values were double-counted for your test, making our results less than completely comparable). For the purposes of this experiment, I've just commented out if (calls == 0) continue; within the pg_stat_statements() function, obviously:

postgres=# select calls = 0, count(*) from pg_stat_statements() group by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 959
-[ RECORD 2 ]-
?column? | t
count    | 3     --- this includes the above query itself

postgres=# select calls = 0, count(*) from pg_stat_statements() group by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 960   --- now it's counted here...
-[ RECORD 2 ]-
?column? | t
count    | 2     --- ...not here

I've also attached some elogs, in their original chronological order, that trace the median usage when recorded at entry_dealloc() for the regression tests. As you'd expect given that this is the regression tests, the median is very low, consistently between 1.9 and 2.5. An additional factor that makes this work well is that the standard deviation is low, and as such it is much easier to evict sticky entries, which is what you want here. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services

DEBUG: cur_med_usage: 1.98
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.940598
DEBUG: cur_med_usage: 1.921192
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.882960
DEBUG: cur_med_usage: 1.882960
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.901980
DEBUG: cur_med_usage: 1.921192
DEBUG: cur_med_usage: 1.921192
DEBUG: cur_med_usage: 1.921192
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.960200
DEBUG: cur_med_usage: 1.98
DEBUG: cur_med_usage: 1.98
DEBUG: cur_med_usage: 1.98
DEBUG: cur_med_usage: 1.98
DEBUG: cur_med_usage
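[Editor's note: the decay scheme described in the message above can be modelled in a few lines. This is a simplified Python sketch - the real module is C, and the constants and names here are illustrative only.]

```python
STICKY_DECREASE = 0.5    # aggressive decay for never-executed (sticky) entries
NORMAL_DECREASE = 0.99   # gentle decay for entries with real calls
ASSUMED_MEDIAN = 10.0    # conservative initial guess before any dealloc

class Entry:
    def __init__(self, usage):
        self.calls = 0       # calls == 0 marks the entry as sticky
        self.usage = usage   # sticky entries start at the running median

class StatsTable:
    def __init__(self):
        self.cur_median_usage = ASSUMED_MEDIAN
        self.entries = {}

    def get_entry(self, query_id):
        # A new entry is seeded with the median usage, so a single
        # unlucky dealloc cannot evict it before its first execution.
        if query_id not in self.entries:
            self.entries[query_id] = Entry(self.cur_median_usage)
        return self.entries[query_id]

    def dealloc(self):
        # Decay everything; sticky entries decay much faster, so under
        # real contention they are soon the cheapest to evict. Then
        # refresh the approximate median from the survivors.
        for e in self.entries.values():
            e.usage *= STICKY_DECREASE if e.calls == 0 else NORMAL_DECREASE
        usages = sorted(e.usage for e in self.entries.values())
        if usages:
            self.cur_median_usage = usages[len(usages) // 2]
```

After a handful of dealloc rounds a still-unexecuted entry's usage has halved repeatedly, while executed entries barely move - which is the intended asymmetry.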
Re: [HACKERS] Last gasp
On 7 April 2012 22:20, Tom Lane t...@sss.pgh.pa.us wrote: In short, the idea of strongly calendar-driven releases looks more and more attractive to me the more times we go through this process. If your patch isn't ready on date X, then it's not getting into this release; but there'll be another bus coming along before long. Stretching out release cycles to get in those last few neat features just increases the pressure for more of the same, because people don't know how long it will be to the next release. I hope that that policy will not be applied without some degree of discrimination. I understand that this is an open-source project, and on a certain level it has to be that simple, because we don't have any other way of making contributors more conscientious about scheduling considerations, except of course for giving them a nudge. That said, all features are not equal, and clearly some are of particular strategic importance to the project, and as such may be worth holding up the release for, subject to certain parameters. Which of these features are worth holding the release up for is highly dependent on their state at the time of the final commitfest deadline. Which features are of strategic importance is of course a matter of opinion, but that's the kind of thing that we have a steering committee for, right? Perhaps a good compromise would be to formalise the process through which patches get a second chance in the final commitfest. I don't think that it will prove workable to have a final deadline that is adhered to blindly. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Last gasp
On 7 April 2012 23:51, Tom Lane t...@sss.pgh.pa.us wrote: As for the steering committee, core tries hard not to make technical decisions for the project; it's better to let those emerge by consensus. I don't really consider this to be an engineering problem, which is precisely why I bring up the role of core here. Holding up a release for a feature has costs (the release is delayed, which in some way hurts us, as our users don't have a new release for that much longer, etc) and benefits (when the release is available, Postgres is that much better one release sooner). Naturally, the maturity of the existing code, as judged by the community, would be heavily weighed in this process, but the wider importance of the feature, and how much of an advantage having it earlier provides, would also be weighed. Open source products (as distinct from projects) often have this sort of cost-benefit analysis made by individuals who are somewhat removed from the engineering decisions themselves. While I'm certainly not advocating that we switch to that model - in particular, I am not advocating anything less than full transparency - surely there is something to be said for some aspects of it, mostly to break the sorts of deadlocks that can sometimes occur. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 8 April 2012 20:51, Tom Lane t...@sss.pgh.pa.us wrote: Applied with some cosmetic adjustments. Thanks. Having taken another look at the code, I wonder if we wouldn't have been better off just fastpathing out of pgss_store in the first call (in a pair of calls made by a backend as part of an execution of some non-prepared query) iff there is already an entry in the hashtable - after all, we're now going to the trouble of acquiring the spinlock just to increment the usage for the entry by 0 (likewise, every other field), which is obviously superfluous. I apologise for not having spotted this before submitting my last patch. I have attached a patch with the modifications described. This is more than a micro-optimisation, since it will cut the number of spinlock acquisitions approximately in half for non-prepared queries. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services pg_stat_statements_optimization_2012_04_08.patch Description: Binary data
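[Editor's note: the shape of the fastpath described above, modelled in Python with an ordinary lock standing in for the per-entry spinlock. This is a sketch of the idea only, not the actual patch; the names and structure are illustrative.]

```python
import threading

class PgssSketch:
    """Toy model: pgss_store is invoked twice per simple query - once
    after parse analysis (nothing to accumulate yet) and once at
    executor end (counters to add)."""

    def __init__(self):
        self.lock = threading.Lock()   # stand-in for the per-entry spinlock
        self.entries = {}
        self.acquisitions = 0          # instrumentation for this example

    def store(self, query_id, calls=0, total_time=0.0):
        # Proposed fastpath: when there is nothing to accumulate and the
        # entry already exists, skip the lock entirely.
        if calls == 0 and query_id in self.entries:
            return
        with self.lock:
            self.acquisitions += 1
            e = self.entries.setdefault(query_id, {"calls": 0, "total_time": 0.0})
            e["calls"] += calls
            e["total_time"] += total_time

s = PgssSketch()
s.store("q1")                            # first hook call: creates the entry
s.store("q1", calls=1, total_time=0.5)   # executor-end call: accumulates
s.store("q1")                            # later first-hook calls skip the lock
```

After the first execution, each repeat of the query costs one lock acquisition instead of two - roughly the halving of spinlock traffic claimed in the message.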
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 10 April 2012 14:33, Robert Haas robertmh...@gmail.com wrote: So, should we make the new columns exposed by pg_stat_statements use milliseconds, so that the block read/write timings are everywhere in milliseconds, or should we keep them as a float8, so that all the times exposed by pg_stat_statements use float8? I believe that we should keep them as float8, on the basis that a user is more likely to generalise from total_time's format (when writing a script to query the view, or whatever) than from that of pg_stat_database. A part of me would like to change the view definitions so that all the columns are strongly typed (i.e. all these values would be intervals). I realise that that isn't practical though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Last gasp
On 10 April 2012 15:26, Kevin Grittner kevin.gritt...@wicourts.gov wrote: A patch on which the author is continuing to work even in the absence of review should be considered a WIP want feedback submission; it should not be allowed to constitute a placeholder for inclusion in the release. To be fair, I doubt that anyone actually believes that. If they did, they wouldn't have to pay attention very long to receive a rude awakening. <onlyhalfkidding>Perhaps we should have a concept of feature months -- so that when we look at holding up a release with 20 features for two months so that one more feature can make it in, the cost side of the equation is 40 feature-months, and the benefit is 10 feature-months. (Remember, you can't count the added feature as though it's there for a year before the next release if it holds the release up.)</onlyhalfkidding> I am broadly in favour of assessing the value of features in the same way that a commercial organisation might - the more adoption a feature spurs, the more valuable it is, and the more hesitant we should be to bump it (though other factors are also very important). I take this idea seriously, or at the very least share the mentality of the idea - I'm just not sure that we can formalise it, or that we should. I also think that we should try and reward good will. I think that we generally do so, but an automatic cut-off date seems contrary to that. The law of unintended consequences might see us lower our standards to commit something to meet the deadline, that would otherwise not be immediately committed. We're only human, and it would be foolish to assume that committers don't feel that kind of pressure. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
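[Editor's note: Kevin's half-kidding feature-month arithmetic is easy to make concrete. A toy calculation only, not a proposal.]

```python
def feature_months(features_delayed, delay_months, months_early):
    """Cost of holding a release: every already-finished feature waits
    delay_months extra. Benefit: the one added feature ships
    months_early sooner than the following release would have."""
    cost = features_delayed * delay_months
    benefit = 1 * months_early
    return cost, benefit

# Kevin's example: hold a 20-feature release two months for one more
# feature, with roughly a year (minus the delay) gained for it.
cost, benefit = feature_months(20, 2, 10)  # -> (40, 10)
```

On this accounting the hold-up loses 30 feature-months, which is the point of the half-joke: the delayed features' waiting time usually dwarfs the early arrival of the one addition.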
Re: [HACKERS] To Do wiki
On 10 April 2012 16:40, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 10.04.2012 18:31, Jeff Janes wrote: If I do select count(distinct bid) from pgbench_accounts I get many calls to btint4fastcmp, but if I do create index on pgbench_accounts (bid) I instead get many calls to btint4cmp. If the index build is using SortSupport, shouldn't it also be calling btint4fastcmp like the distinct does? Oh, sorry, you're right. I stand corrected. There is an impedance mismatch between tuplesort_begin_heap and tuplesort_begin_index_btree that prevented this from being done with the initial commit. Strangely, the SortSupport commit message didn't comment on this. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Last gasp
On 10 April 2012 16:51, Robert Haas robertmh...@gmail.com wrote: When these things are pointed out to the people who are doing them, the response is often either (a) this feature is so important we're all going to die if it's not in the release how can you even think about bouncing it or (b) I'm not really still hacking on it these are all just minor changes. It's surprisingly easy to hoodwink even experienced contributors into thinking that your patch is really, really almost done, honest, it just needs a couple more tweaks when in fact it's nowhere close. I try not to attribute to bad faith what can be explained by incurable optimism, so maybe we just have a lot of incurable optimism. But it's doing nobody any good. I think that you may be missing the greater point here. The people that do this are kind of like the defectors in the prisoner's dilemma - at a certain point, some people cannot resist the temptation to push their own patch forward at the expense of others by asserting dubiously that it's ready-for-committer, or maybe they really do incorrectly believe it to be so, or maybe, unlike you, they understand that term to mean I've done as much as I can, as has my reviewer, or whatever. To play devil's advocate, that might be an anti-social act, but at a certain point, who wants to be the last honest sap? Besides, isn't everyone's crime no crime at all? ISTM that this is symptomatic of the wider problem of a dire shortage of committer resources. 100% of my non-doc patches so far have been committed by 3 people. I would really like to see us figure out a way of making more hackers committers, perhaps subject to certain conditions that don't currently exist for committers. You might find that given commit bits, some people will take their responsibilities as a reviewer far more seriously. Maybe you don't think that any of the likely candidates are quite ready for that responsibility, but you must admit that it's a serious problem. 
-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Last gasp
On 10 April 2012 18:28, Robert Haas robertmh...@gmail.com wrote: If we accept your argument that some people simply cannot help themselves, then the only solution is to make it cease to be a prisoner's dilemma, and that can only be done by changing the incentives, which presumably means handing down punishments to people who push their own patches forward at the expense of others. Unless we care to choose a benevolent dictator, I don't see any way to accomplish that. Well, I was really pointing out that people are somewhat forced into a corner by the current state of affairs, because committers are not typically able to look at anything in sufficient detail that isn't ready for committer, particularly as we approach crunch-time - their time is simply too precious. By not marking the patch ready for committer, they are basically asking for their patch to be passed over, and they may be incapable of bridging the chasm between what really is their best effort, and what you'd consider to be the ready-for-committer gold standard. Some people cannot exclusively dedicate their time to their patch, or lack sufficient experience to meet that standard. It's feasible to think that we might be able to streamline the process of booting patches that are not close to committable at the start of a CommitFest, and especially at the start of the final CommitFest. For example, limiting patches to a small number of days in the Waiting on Author state would help a great deal. But the more general problem of people arguing that *their* patch is the special one without which the earth will cease to revolve about its axis, or that it's ready when it's really not, is more difficult to solve. How would you propose we deal with that problem? As I've already said, I think that needs to be decided impartially, ideally by people who are removed from the engineering process. I don't mean that we'd get a marketing person to make those decisions - far from it. 
I just mean that some separation of powers can be a good thing in some circumstances. I don't agree with that. I think that there are a few people who don't now have commit bits who should be given them - in particular, Fujii Masao and Kevin Grittner, both of whom have been doing consistently excellent work for several years. I agree with you about both individuals. I hope that this happens sooner rather than later. Giving more people the ability to commit stuff will neither force them to devote time to it nor make them qualified to do it if they aren't already. One major component of being qualified, is, of course, knowing what you don't know, and the risk of being left with egg on your face turns out to be a pretty effective way of preventing new committers from being too eager. Giving more people bits has a cost: in general, I'd expect it to result in a higher bug-to-line ratio when code is committed. However, not doing so has an opportunity cost: less code is committed, which may, on balance, result in an inferior release than what we could have had. Maybe you think that we have the balance perfectly right, and you are of course perfectly entitled to that view, as well as being perfectly entitled to having your opinion more heavily weighed than mine, but I'd like to see a dialogue about it at some point. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote: On 04/10/2012 12:27 PM, Tom Lane wrote: I am doing more sophisticated things with it, so I'll celebrate this as my opportunity to say I did something you didn't see coming for 2012. This is why I requested that we expose the query_id hash value - I believe that it will be generally useful in clustering situations. It would be nice to have a persistent identifier. While we're discussing revising pg_stat_statement's interface, are you still opposed to exposing that value, Tom? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - you're basically throwing the benefit of all of that intelligent normalisation out of the window, because it's pretty close to free to expose it. What if a developer tweaks an alias in the application for clarity? Also, as you point out, it has additional utility in advertising when a collision has happened, and setting the user's expectations appropriately. I assume that collisions are very rare, but when they do happen, this gives you a fighting chance of noticing them. As Tom points out, the query hash will vary according to platform specific characteristics, including endianness, and will require OIDs are the same on every node. However, it is still going to be useful in clusters that use streaming replication, though not a third party trigger based replication system for example, because streaming replication does of course require that those factors (and rather a lot more) will be identical across the cluster anyway. Realistically, I'd expect a large majority of people interested in this feature to only want to use it with streaming replication anyway. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
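[Editor's note: on how rare very rare is - assuming the fingerprint is 32 bits wide (an assumption about the hash width, not stated in the thread), the usual birthday-bound approximation gives a feel for the odds:]

```python
import math

def collision_probability(n_entries, hash_bits=32):
    """Birthday-bound approximation for the probability of at least one
    collision among n_entries values drawn uniformly at random from a
    hash_bits-bit space: 1 - exp(-n(n-1) / 2N)."""
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-n_entries * (n_entries - 1) / (2.0 * space))

# With a well-filled hashtable of 10,000 distinct queries, the chance of
# any collision at all is on the order of one percent.
p = collision_probability(10_000)
```

So collisions are plausible enough over a fleet of busy servers that being able to notice one - which exposing the value makes possible - has real value.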
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote: If people need something like that, couldn't they create it by hashing the normalized query text with an arbitrary algorithm? That supposes that the normalised query text is perfectly stable. It may well not be, particularly for things like ad-hoc queries or queries generated by ORMs, across database clusters and over long periods of time - Indeed, but the hash value isn't stable either given those sorts of assumptions, so I'm not convinced that there's any advantage there. Isn't it? The hash captures the true meaning of the query, while having the database server's platform as a usually irrelevant artefact. Another thing that I forgot to mention is client encoding - it may well be fairly inconvenient to have to use the same algorithm to hash the query string across applications. You also have to hash the query string yourself again and again, which is expensive to do from Python or something, and is often inconvenient - differences beyond track_activity_query_size bytes (default: 1024) are not recognised. Using an SQL code beautifier where a single byte varies now breaks everything, which developers don't expect at all (we've trained them not to), so in many ways you're back to the same limitations as classic pg_stat_statements if you attempt to aggregate queries over time and across machines, which is a very real use case. It's probably pretty annoying to have to get your Python app to use the same hash function as your Java app or whatever, unless you want to use something heavyweight like a cryptographic hash function. By doing it within Postgres, you avoid those headaches. 
I'm not asking you to very loudly proclaim that it should be used like this - just expose it, accurately document it, and I'm quite confident that it will be widely used and relied upon by those that are reasonably well informed, and understand its limitations, which are really quite straightforward. What I think people would actually like to know, if they're in a situation where distinct query texts are getting hashed to the same thing, is *which* different texts got hashed to the same thing. But there's no good way to expose that given the pg_stat_statements infrastructure, and exposing the hash value doesn't help. Apart from detecting the case where we get a straightforward collision, I don't expect that that would be useful. The whole point is that the user doesn't care about the difference, and I think we've specified a practical, widely useful standard for when queries should be considered equivalent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
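[Editor's note: the fragility of hashing raw query text on the client side, argued above, is easy to demonstrate. A sketch; md5 is an arbitrary stand-in for whatever algorithm an application might pick.]

```python
import hashlib

def client_fingerprint(query_text):
    # Hash the raw text, as an application would have to do without a
    # server-exposed fingerprint.
    return hashlib.md5(query_text.encode("utf-8")).hexdigest()

q1 = "SELECT a.id FROM accounts a WHERE a.id = $1"
q2 = "SELECT acc.id FROM accounts acc WHERE acc.id = $1"  # alias renamed

# The client-side hashes differ, even though pg_stat_statements'
# fingerprint - computed from the post-parse-analysis tree, where the
# alias spelling is irrelevant - would treat these as the same query.
same = client_fingerprint(q1) == client_fingerprint(q2)
```

Any cosmetic edit - an alias rename, whitespace from a beautifier, a comment - forks the client-side fingerprint, which is exactly the aggregation breakage the server-side value avoids.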
Re: [HACKERS] Last gasp
On 11 April 2012 02:14, Robert Haas robertmh...@gmail.com wrote: My perception of what's going on here is dramatically different from yours. I don't think there was any overflow of submissions for 9.2. That is just not true. See the attached graph (couldn't produce one with better resolution at short notice) - I've just eyeballed the graph, but it looks like an upward trend to me. I think I submitted my first patch in late April 2011. Now that Andres has joined 2ndQuadrant, it seems likely that his output will increase by a large amount too. I think that Noah began contributing less than 2 years ago. It certainly seems to me like we're getting more submissions that are more complex...this image was generated by the tool gitstat, and as such does not include figures for the patches that we were unable to commit, though there also appears to be an upward trend there. For the most part I would describe it as a slow and boring release cycle, with the usual spike in half-baked submissions right near the end, except this release they were less baked than usual, which is why most of them didn't go in. That is uncalled for. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services attachment: lines_of_code.png
Re: [HACKERS] Last gasp
On 11 April 2012 03:26, Tom Lane t...@sss.pgh.pa.us wrote: [ scratches head... ] That's supposed to be total lines of code in our source tree? What's the big drop in late 2009, then? I had wondered about that myself - all I can tell you is that I used the tool as advertised, without any adornments. This particular tool is a bit misleading, because it counts lines of code as lines of checked-in text, which can include things that are very verbose without necessarily being what we really think of as code - it doesn't make any effort to discriminate against non-code, nor does it expose the option of doing so. I chose it at short notice simply because it produces graphs. I do intend to take a look at this problem in more detail, and get better statistics on changes to our codebase - it's a tricky proposition, though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services