Re: [HACKERS] Assertion failure in REL9_5_STABLE
On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja wrote:
> On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
>> BTW this is not a regression, right?  It's been broken all along.  Or am
>> I mistaken?
>
> You're probably right.  I just hadn't realized I could run our app
> against 9.5 with --enable-cassert until last week.

Just wondering... If you revert 1f9534b4 and/or b33e81cb, do you still
see a problem?
--
Michael
Re: [HACKERS] new autovacuum criterion for visible pages
On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes wrote:
> I wanted to create a new relopt named something like
> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
> vacuum a table once less than a certain fraction of the relation's
> pages are marked allvisible.

Interesting idea.

> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
> themselves only updated by vacuum/analyze.  In the absence of manual
> vacuum or analyze, this means that if the new criterion uses those
> fields, it could only kick in after an autoanalyze has already been
> done, which means that autovacuum_vacuum_pagevisible_factor could not
> meaningfully be set lower than autovacuum_analyze_scale_factor.
>
> Should relallvisible be moved/copied from pg_class to
> pg_stat_all_tables, so that it is maintained by the stats collector?
> Or should the autovacuum worker just walk the vm of every table with a
> defined autovacuum_vacuum_pagevisible_factor each time it is launched
> to get an up-to-date count that way?

relation_needs_vacanalyze has access to Form_pg_class, so it is not a
problem to use the value of relallvisible there to decide if a
vacuum/analyze should be run.

> 2) Should there be a guc in addition to the relopt?  I can't think of
> a reason why I would want to set this globally, so I'm happy with just
> a relopt.  If it were set globally, it would sure increase the cost
> of scanning the vm for each table once each naptime.

Having a GUC is useful to enforce the default behavior for tables that
do not have this parameter directly set with ALTER TABLE.

> 3) Should there be an autovacuum_vacuum_pagevisible_threshold?  The
> other settings have both a factor and a threshold.  I've never
> understood what the point of the threshold settings is, but presumably
> there is a point to them.  Does that reason also apply to keeping the
> vm tuned up?

Having both a threshold and a scale factor would make the most sense to
me.  It may be difficult for the average user to tune those parameters
as a number of relation pages.  An alternative would be to define those
values in kB -- for example, 32MB worth of pages marked all-visible.
--
Michael
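For illustration, the knobs under discussion might end up looking like
this -- both the relopt and the matching GUC are hypothetical names from
this thread, not anything that exists today:

    -- per-table relopt (proposed name):
    ALTER TABLE usertable SET (autovacuum_vacuum_pagevisible_factor = 0.8);

    -- a GUC would then supply the default for tables without the relopt,
    -- e.g. in postgresql.conf:
    --   autovacuum_vacuum_pagevisible_factor = 0.9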
Re: [HACKERS] regression test for extended query protocol
On Thu, Aug 11, 2016 at 5:33 AM, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> On Fri, Aug 5, 2016 at 12:21 AM, Alvaro Herrera wrote:
>>> If somebody had some spare time to devote to this, I would suggest to
>>> implement something in core that can be used to specify a list of
>>> commands to run, and a list of files-to-be-saved-in-bf-log emitted by
>>> each command.  We could have one such base file in the core repo that
>>> specifies some tests to run (such as "make -C src/test/recovery check"),
>>> and an additional file can be given by buildfarm animals to run custom
>>> tests, without having to write BF modules for each thing.  With that,
>>> pgsql committers could simply add a new test to be run by all buildfarm
>>> animals by adding it to the list in core.
>>
>> Are you thinking of using a special makefile target to run those
>> commands, say in src/test?  At the end we are going to need to patch
>> the buildfarm client code at least once, but that would be worth it in
>> the long term.
>
> Sure.  Some time ago I proposed something like a JSON file, something
> like
>
>     test_foo => {
>         "command" : "make -C src/test/blah check",
>         "save_output" : [ "src/test/blah/server.log", "src/test/blah/regress*.log" ]
>     }
>
> As I recall, Andrew said that he didn't like JSON very much but that the
> idea made sense to him.

As long as the buildfarm client has native support to parse that, that
sounds good.  I think that we had better add a TODO item now; I won't be
able to tackle that in the short term.
--
Michael
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Thu, Aug 11, 2016 at 2:58 PM, Venkata Balaji N wrote:
> On Tue, Aug 9, 2016 at 1:02 AM, Robert Haas wrote:
>> On Mon, Aug 8, 2016 at 10:10 AM, Rahila Syed wrote:
>>> Thank you for inputs everyone.
>>>
>>> The opinions on this thread can be classified into following
>>> 1. Commit
>
> This makes more sense, as the user who is doing it would realise that
> the transaction has been left open.
>
>> Alternatively, I also think it would be sensible to issue an immediate
>> COMMIT when the autocommit setting is changed from off to on.  That
>> was my first reaction.
>
> Issuing a commit would indicate that open transactions will be
> committed, which is not a good idea in my opinion.  If the user is
> issuing AUTOCOMMIT = ON, then it means all the transactions initiated
> after issuing this must be committed, whereas it is committing the
> previously pending transactions as well.

My apologies for the confusing statement; to correct myself: I meant
that, by setting autocommit=on, committing all the previously open
transactions (transactions opened when autocommit=off) may not be a good
idea.  What the user means by autocommit=on is that all the subsequent
transactions must be committed.

Regards,
Venkata B N

Fujitsu Australia
Re: [HACKERS] multivariate statistics (v19)
On Thu, Aug 11, 2016 at 3:34 AM, Tomas Vondra wrote:
> On 08/10/2016 02:23 PM, Michael Paquier wrote:
>> On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote:
>>> The idea is that the syntax should work even for statistics built on
>>> multiple tables, e.g. to provide better statistics for joins.  That's
>>> why the schema may be specified (as each table might be in a different
>>> schema), and so on.
>>
>> So you mean that the same statistics could be shared between tables?
>> But as this is visibly not a concept introduced yet in this set of
>> patches, why not just cut it off for now to simplify the whole?  If
>> there is no schema-related field in pg_mv_statistics we could still
>> add it later if it proves to be useful.
>
> Yes, I think creating statistics on multiple tables is one of the
> possible future directions.  One of the previous patch versions
> introduced ALTER TABLE ... ADD STATISTICS syntax, but that ran into
> issues in gram.y, and given the multi-table possibilities the CREATE
> STATISTICS seems like a much better idea anyway.
>
> But I guess you're right we may make this a bit more strict now, and
> relax it in the future if needed.  For example as we only support
> single-table statistics at this point, we may remove the schema and
> always create the statistics in the schema of the table.

This would simplify the code a bit, so I'd suggest removing that from
the first shot.  If there is demand for it, keeping the infrastructure
open for this extension is what we had better do.

> But I don't think we should make the statistics names unique only
> within a table (instead of within the schema).

They could be made unique using (name, table_oid, column_list).

There is a lot of mumbo-jumbo regarding the way dependencies are stored,
mainly serialize_mv_dependencies and deserialize_mv_dependencies, which
convert them between bytea and dependency trees.  That's not cool and
not portable, because pg_mv_statistic represents that as pure bytea.  I
would suggest creating a generic data type that does those operations,
named something like pg_dependency_tree, and then using that in those
new catalogs.  pg_node_tree is a precedent of such a thing.  New
features could as well make use of this new data type if we are able to
design it in a way generic enough, so that would be a base patch that
the current 0002 applies on top of.

>>> Interesting idea, haven't thought about that.  So are you suggesting
>>> to add a data type for each statistics type (dependencies, MCV,
>>> histogram, ...)?
>>
>> Yes, it would be something like that.  It would actually perhaps be
>> better to have one single data type, and be able to switch between
>> each model easily instead of putting byteas in the catalog.
>
> Hmmm, not sure about that.  For example what about combinations of
> statistics - e.g. when we have an MCV list on the most common values
> and a histogram on the rest?  Should we store both as a single value,
> or would that be in two separate values, or what?

The same statistics can combine two different things; whether to use
different columns may depend on how readable things get...  Btw, for the
format we could get inspired by pg_node_tree, with a pg_stat_tree:

    {HISTOGRAM :arg {BUCKET :index 0 :minvals ... }}
    {DEPENDENCY :arg {:elt "a => c" ...} ... }
    {MVC :arg {:index 0 :values {0,0} ... } ... }

Please consider that a tentative idea to make things more friendly.
Others may have a different opinion on the matter.
--
Michael
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Mon, Aug 8, 2016 at 9:56 PM, Bruce Momjian wrote:
> On Mon, Aug 8, 2016 at 06:34:46PM +0530, Amit Kapila wrote:
>> I think here the expensive part would be the recheck for the cases
>> where the index value is changed to a different value (a value which
>> doesn't exist in the WARM chain).  You anyway have to add the new entry
>> (key,TID) in the index, but each time traversing the WARM chain would
>> be an additional effort.  For cases where there are just two index
>> entries and one of them is being updated, it might regress as compared
>> to what we do now.
>
> Yes, I think the all-increment or all-decrement WARM chain is going to
> be the right approach.

Probably.  It will mitigate the cost in some cases, but there will still
be a cost in comparisons, especially when the index is on char/varchar
columns.  Also, I think it will reduce the chance of performing a WARM
update in certain cases, considering we can create a WARM tuple only
when it follows the order.  OTOH, if we store pointers in the index to
intermediate tuples, we won't face such issues.  Yes, there will be some
delay in cleanup of intermediate line pointers; however, I think we can
clear those once we mark the corresponding index tuples as dead in the
next scan on the corresponding index.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Tue, Aug 9, 2016 at 1:02 AM, Robert Haas wrote:
> On Mon, Aug 8, 2016 at 10:10 AM, Rahila Syed wrote:
>> Thank you for inputs everyone.
>>
>> The opinions on this thread can be classified into following
>> 1. Commit
>> 2. Rollback
>> 3. Error
>> 4. Warning
>>
>> As per opinion upthread, issuing implicit commit immediately after
>> switching autocommit to ON can be unsafe if it was not desired.  While
>> I agree that it's difficult to judge the user's intention here, if we
>> were to base it on some assumption, the closest would be implicit
>> COMMIT in my opinion.  There is a higher likelihood of a user being
>> happy with issuing a commit when setting autocommit ON than with a
>> transaction being rolled back.  Also there are quite some interfaces
>> which provide this.
>>
>> As mentioned upthread, issuing a warning on switching back to
>> autocommit will not be effective inside a script.  It won't allow
>> subsequent commands to be committed, as "set autocommit to ON" is not
>> committed.  Scripts will have to be rerun with changes, which will
>> impact user friendliness.
>>
>> While I agree that issuing an ERROR and rolling back the transaction
>> ranks higher in safe behaviour, it is not as common (according to
>> instances stated upthread) as immediately committing any open
>> transaction when switching back to autocommit.
>
> I think I like the option of having psql issue an error.  On the
> server side, the transaction would still be open, but the user would
> receive a psql error message and the autocommit setting would not be
> changed.  So the user could type COMMIT or ROLLBACK manually and then
> retry changing the value of the setting.

This makes more sense, as the user who is doing it would realise that
the transaction has been left open.

> Alternatively, I also think it would be sensible to issue an immediate
> COMMIT when the autocommit setting is changed from off to on.  That
> was my first reaction.

Issuing a commit would indicate that open transactions will be
committed, which is not a good idea in my opinion.  If the user is
issuing AUTOCOMMIT = ON, then it means all the transactions initiated
after issuing this must be committed, whereas it is committing the
previously pending transactions as well.

> Aborting the server-side transaction - with or without notice -
> doesn't seem very reasonable.

Agreed.  Traditionally, open transactions in the database must be left
open until the user issues a COMMIT or ROLLBACK.  If the session is
changed or killed, then the transaction must be rolled back.

Regards,
Venkata B N

Fujitsu Australia
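A minimal psql session sketching the surprising behavior this thread is
about (assuming a table t exists):

    \set AUTOCOMMIT off
    INSERT INTO t VALUES (1);   -- psql opens an implicit transaction here
    \set AUTOCOMMIT on          -- today: the setting flips, but the transaction stays open
    INSERT INTO t VALUES (2);   -- still runs inside the transaction opened above
    ROLLBACK;                   -- discards both rows, which few users expect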
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Thank you for your information.  Here is the result:

After insertions:

ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
(1 row)

After rebuild:

ycsb=# select * from pgstatindex('usertable_pkey');
 version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
       2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
(1 row)

It seems that the rebuild reduces the number of internal and leaf pages
and makes the leaf pages denser.

On Wed, Aug 10, 2016 at 6:47 PM, Lukas Fittl wrote:
> On Wed, Aug 10, 2016 at 4:24 PM, Kisung Kim wrote:
>> When I used the index bloating estimation script in
>> https://github.com/ioguix/pgsql-bloat-estimation,
>> the result is as follows:
>
> Regardless of the issue at hand, it might make sense to verify these
> statistics using pgstattuple - those bloat estimation queries can be
> wildly off at times.
>
> See also https://www.postgresql.org/docs/9.5/static/pgstattuple.html as
> well as https://www.keithf4.com/checking-for-postgresql-bloat/
>
> Best,
> Lukas

--
Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192
Re: [HACKERS] phrase search TS_phrase_execute code readability patch
On Tue, Aug 9, 2016 at 3:35 PM, David G. Johnston wrote:
> I don't follow why LposStart is needed so I removed it...

That doesn't seem very reasonable.

> Not compiled or in any way tested...

Please do not bother submitting patches that you aren't prepared to
compile and test.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
On Wed, Aug 10, 2016 at 6:14 PM, Tom Lane wrote:
> Kevin Grittner writes:
>> That one seems like it should either be at the AM level or not
>> included at all.  Where it would be interesting to know is if you
>> are a hacker looking for an AM to enhance with support, or (when
>> there is more than just btree supported, so it is not so easy to
>> remember) if you are a DBA investigating a high rate of
>> serialization failures and want to know whether indexes of a
>> certain type have index-relation predicate locking granularity or
>> something more fine-grained.  The latter use case seems plausible
>> once there is more of a mix of support among the AMs.
>
> TBH, that line of thought impresses me not at all, because I do not see
> a reason for SQL queries to need to see internal behaviors of AMs, and
> especially not at levels as crude as boolean properties of entire AMs,
> because that's just about guaranteed to become a lie (or at least not
> enough of the truth) in a year or two.  If you are a DBA wanting to know
> how fine-grained the locking is in a particular index type, you really
> need to read the source code or ask a hacker.
>
> We have been bit by the "representation not good enough to describe
> actual behavior" problem *repeatedly* over the years that pg_am had all
> this detail.  First it was amstrategies and amsupport, which have never
> usefully described the set of valid proc/op strategy numbers for any
> index type more complicated than btree.  Then there was amorderstrategy,
> which we got rid of in favor of amcanorder, and later added
> amcanbackward to that (not to mention amcanorderbyop).  And amconcurrent,
> which went away for reasons I don't recall.  Then we added amstorage,
> which later had to be supplemented with amkeytype, and still isn't a
> very accurate guide to what's actually in an index.  amcanreturn
> actually was a boolean rather than a function for awhile (though it
> looks like we never shipped a release with that definition).  There's
> still a lot of stuff with obviously limited life expectancy, like
> amoptionalkey, which is at best a really crude guide to what are valid
> index qualifications; someday that will likely have to go away in favor
> of a "check proposed index qual for supportability" AM callback.
>
> So I don't think I'm being unreasonable in wanting to minimize, not
> maximize, the amount of info exposed through this interface.  There is
> enough history to make me pretty sure that a lot of things that might be
> simple boolean properties today are going to be less simple tomorrow,
> and then we'll be stuck having to invent arbitrary definitions for what
> the property-test function is going to return for those.  And if there
> are any queries out there that are depending on simplistic
> interpretations of those property flags, they'll be broken in some
> respect no matter what we do.
>
> In short, I do not see a good reason to expose ampredlocks at the SQL
> level, and I think there needs to be a darn good reason to expose any of
> this stuff, not just "maybe some DBA will think he needs to query this".

I don't think you're being unreasonable, but I don't agree with your
approach.  I think that we should expose everything we reasonably can,
and if we have to change it later then it will be a backward
compatibility break.  Making it unqueryable in the hopes that people
won't try to query it is futile.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 10:23 PM, Tom Lane wrote:
>> Great.  Are you handling creating the new branch?
>
> Yeah, I guess.  I've made most of them lately.

I'll do it if you want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 10:35:44PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
>> It sounds like you are saying that the branch is to happen before the
>> pgindent run.  Am I missing something?
>
> I read it as pgindent then branch.

OK, good.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription  +
Re: [HACKERS] new pgindent run before branch?
Bruce Momjian writes:
> It sounds like you are saying that the branch is to happen before the
> pgindent run.  Am I missing something?

I read it as pgindent then branch.

regards, tom lane
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 10:31:35PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
>> I am confused --- are you pgindenting just HEAD and not 9.6?  Why?
>
> The point is to pgindent while they're still the same.  So I read this:
>
>>> +1, I was planning to do that myself.
>
>> Great.  Are you handling creating the new branch?
>
> Yeah, I guess.  I've made most of them lately.
>
>> If so, it probably makes sense for you to do this just beforehand.
>
> OK.

It sounds like you are saying that the branch is to happen before the
pgindent run.  Am I missing something?

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription  +
Re: [HACKERS] new pgindent run before branch?
Bruce Momjian writes:
> I am confused --- are you pgindenting just HEAD and not 9.6?  Why?

The point is to pgindent while they're still the same.

regards, tom lane
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 10:23:14PM -0400, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, Aug 10, 2016 at 5:04 PM, Tom Lane wrote:
>>> Robert Haas writes:
>>>> Would anyone mind too much if I refreshed typedefs.list and
>>>> re-indented the whole tree before we branch?
>>>
>>> +1, I was planning to do that myself.
>>
>> Great.  Are you handling creating the new branch?
>
> Yeah, I guess.  I've made most of them lately.
>
>> If so, it probably makes sense for you to do this just beforehand.
>
> OK.

I am confused --- are you pgindenting just HEAD and not 9.6?  Why?

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription  +
Re: [HACKERS] new pgindent run before branch?
Robert Haas writes:
> On Wed, Aug 10, 2016 at 5:04 PM, Tom Lane wrote:
>> Robert Haas writes:
>>> Would anyone mind too much if I refreshed typedefs.list and
>>> re-indented the whole tree before we branch?
>>
>> +1, I was planning to do that myself.
>
> Great.  Are you handling creating the new branch?

Yeah, I guess.  I've made most of them lately.

> If so, it probably makes sense for you to do this just beforehand.

OK.

regards, tom lane
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 5:04 PM, Tom Lane wrote:
> Robert Haas writes:
>> Would anyone mind too much if I refreshed typedefs.list and
>> re-indented the whole tree before we branch?
>
> +1, I was planning to do that myself.

Great.  Are you handling creating the new branch?  If so, it probably
makes sense for you to do this just beforehand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
On Wed, Aug 10, 2016 at 4:24 PM, Kisung Kim wrote:
> When I used the index bloating estimation script in
> https://github.com/ioguix/pgsql-bloat-estimation,
> the result is as follows:

Regardless of the issue at hand, it might make sense to verify these
statistics using pgstattuple - those bloat estimation queries can be
wildly off at times.

See also https://www.postgresql.org/docs/9.5/static/pgstattuple.html as
well as https://www.keithf4.com/checking-for-postgresql-bloat/

Best,
Lukas

--
Lukas Fittl
Skype: lfittl
Phone: +1 415 321 0630
Re: [HACKERS] Set log_line_prefix and application name in test drivers
On 8/10/16 5:18 PM, Tom Lane wrote:
> Or in short: I don't want to be seeing one prefix format in some
> buildfarm logs and a different format in others.

Sure.  My patch has

    log_line_prefix = '%t [%p]: [%l] %qapp=%a '

which is modeled after the pgfouine recommendation, which is I believe a
wide-spread convention, and it also vaguely follows syslog customs.

The buildfarm client has

    log_line_prefix = '%m [%c:%l] '

which is very similar, but the lack of the PID makes it unsuitable for
the purposes that I have set out, and there is no obvious place to put
additional information such as %a.

%m vs. %t is obviously a minor issue that I will gladly adjust, but
besides that I prefer to stick with my version.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
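For a side-by-side feel of the two prefixes, the same statement would be
logged roughly as follows (timestamp, PID, and session ID are invented
for illustration):

    # log_line_prefix = '%t [%p]: [%l] %qapp=%a '
    2016-08-10 17:20:01 EDT [12345]: [3] app=psql LOG:  statement: SELECT 1;

    # log_line_prefix = '%m [%c:%l] '
    2016-08-10 17:20:01.123 EDT [57ab8e21.3039:3] LOG:  statement: SELECT 1;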
Re: Improved ICU patch - WAS: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
On Wed, Aug 10, 2016 at 1:42 PM, Palle Girgensohn wrote:
> They've been used for the FreeBSD ports since 2005, and have served us
> well.  I have of course updated them regularly.  In this latest
> version, I've removed support for other encodings besides UTF-8,
> mostly since I don't know how to test them, but also, I see little
> point in supporting anything else using ICU.

Looks like you're not using the ICU equivalent of strxfrm().  While 9.5
is not the release that introduced its use, it did expand it
significantly.  I think you need to fix this, even though it isn't
actually used to sort text at present, since presumably FreeBSD builds
of 9.5 don't TRUST_STRXFRM.  Since you're using ICU, though, you could
reasonably trust the ICU equivalent of strxfrm(), so that's a missed
opportunity.  (See the wiki page on the abbreviated keys issue [1] if
you don't know what I'm talking about.)

Shouldn't you really have a strxfrm() wrapper, used across the board,
including for callers outside of varlena.c?  convert_string_datum() has
been calling strxfrm() for many releases now.  These calls are still
used in FreeBSD builds, I would think, which seems like a bug that is
not dodged by simply not defining TRUST_STRXFRM.  Doesn't its assumption
that strxfrm() matches the ordering used elsewhere simply fail to hold
on FreeBSD builds?

[1] https://wiki.postgresql.org/wiki/Abbreviated_keys_glibc_issue

--
Peter Geoghegan
[HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Hi,

I've run YCSB (Yahoo! Cloud Serving Benchmark) on PostgreSQL and MongoDB
with WiredTiger, and I found some interesting results and some issues
(maybe) with the btree index of PostgreSQL.

Here are my experiments and results.  YCSB is a document-store
benchmark, and I built the following schema in PG:

    CREATE TABLE usertable (
        YCSB_KEY varchar(30) primary key,
        FIELDS jsonb
    );

The benchmark generated JSON documents with an average length of 300
bytes and loaded 100M rows into PG and Mongo.

First, I found a size difference between PG and Mongo (I configured
Mongo not to use any compression for either storage or index):

    MongoDB index size:    2.1 GB
    PostgreSQL index size: 5.5 GB

When I used the index bloat estimation script in
https://github.com/ioguix/pgsql-bloat-estimation, the result was as
follows:

 current_database | schemaname |  tblname  |    idxname     | real_size  | extra_size |   extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na
------------------+------------+-----------+----------------+------------+------------+------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable | usertable_pkey | 5488852992 | 2673713152 | 48.7116917850949 |         90 | 2362122240 |  43.0348971532448 | f

It says that the bloat_ratio is 43 for the index.  So I rebuilt the
index and the result changed:

 current_database | schemaname |  tblname  |    idxname     | real_size  | extra_size |   extra_ratio    | fillfactor | bloat_size |    bloat_ratio    | is_na
------------------+------------+-----------+----------------+------------+------------+------------------+------------+------------+-------------------+-------
 ycsb             | public     | usertable | usertable_pkey | 3154264064 |  339132416 | 10.7515543758863 |         90 |   27533312 | 0.872891788428275 | f

I am curious about the results:

1) Why was the index bloated even though rows were only inserted, not
removed or updated?

2) And then why is the bloat removed when the index is rebuilt?

I guess that the splits of btree nodes during row insertion caused the
bloat, but I don't know the exact reasons.  And given that MongoDB's
index size is much smaller than PG's after handling the same workload
(only inserts), is there any chance to improve PG's index behavior?

Thank you very much.

--
Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192
Re: [HACKERS] Assertion failure in REL9_5_STABLE
On 2016-08-11 12:09 AM, Alvaro Herrera wrote:
> BTW this is not a regression, right?  It's been broken all along.  Or
> am I mistaken?

You're probably right.  I just hadn't realized I could run our app
against 9.5 with --enable-cassert until last week.

.m
Re: [HACKERS] Assertion failure in REL9_5_STABLE
On 2016-08-10 11:01 PM, Alvaro Herrera wrote:
> Oh, I see ... so there's an update chain, and you're updating an
> earlier tuple.  But the later tuple has a multixact and one of the
> members is the current transaction.  I wonder how you can lock a tuple
> that's not the latest, where that update chain was produced by your own
> transaction.  I suppose this is because of plpgsql's use of cursors.

There's a rolled back subtransaction that also did some magic on the
rows, AFAICT.  I can try and spend some time producing a smaller test
case over the weekend.  No hurry since this missed today's point release
anyway.

.m
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes:
> So these properties (I've changed all the names here, suggestions
> welcome for better ones) I think should be testable on the AM, each
> with an example of why:
>
>   can_order
>   can_unique
>   can_multi_col
>   can_exclude

Check, flags that indicate what you can put in CREATE INDEX obviously
have to be testable on the AM.

I wonder though whether this behavior of can_order should be distinct
from the notion of "can produce ordered scan output"?  Maybe that's
silly.  Or maybe can_order needs to be something you test at the opclass
level not the AM level --- I can sort of imagine an index type in which
some opclasses support ordering and others don't.  Certainly that
behavior is possible today for amcanorderbyop.

> (One possible refinement here could be to invert the sense of all of
> these, making them no_whatever, so that "false" and "null" could be
> treated the same by clients.  Or would that be too confusing?)

Hmm?  I think true as the "has capability" case is fine from that
perspective; null would be taken as "doesn't work".

> These could be limited to being testable only on a specified index, and
> not AM-wide:

That would require adding a third function, but maybe we should just do
that.  In a lot of cases you'd rather not have to worry about which AM
underlies a given index, so a simple index_has_property(regclass, text)
function would be nice.  (That means can_order ought to be supported in
the per-index function even if you don't believe that it'd ever be
opclass-specific, or in the per-column function if you do.)

>   can_backward

As above, that could conceivably need to be per-column.

>   clusterable

Call this can_cluster, maybe?  Or just cluster?

>   index_scan
>   bitmap_scan
>   optional_key (? maybe)
>   predicate_locks (? maybe)

As noted in my response to Kevin, I don't like the last two.  They are
internal properties and it's hard to see them being of much use to
applications even if they weren't subject to change.

> And these for individual columns:
>
>   can_return
>   search_array (? maybe)
>   search_nulls (? maybe)
>   operator_orderable (or distance_orderable? what's a good name?)

distance_orderable seems not bad.

>   orderable
>   asc
>   desc
>   nulls_first
>   nulls_last

OK.

> A question: implementing can_return as a per-column property looks like
> it requires actually opening the index rel, rather than just consulting
> the syscache the way that most pg_get_* functions do.  Should it always
> open it, or only for properties that need it?

Probably only if needed, on performance grounds, and because opening the
rel adds to chances of failure.

regards, tom lane
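To make the shape of the proposal concrete, calls might look like this
-- the function names and property strings are only the ones floated in
this thread, not a shipped API:

    -- the AM-independent, per-index test suggested above:
    SELECT index_has_property('some_index'::regclass, 'can_order');

    -- a per-column variant would be needed for properties like can_return:
    SELECT index_column_has_property('some_index'::regclass, 1, 'can_return');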
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Kevin Grittner writes:
> That one seems like it should either be at the AM level or not
> included at all.  Where it would be interesting to know is if you
> are a hacker looking for an AM to enhance with support, or (when
> there is more than just btree supported, so it is not so easy to
> remember) if you are a DBA investigating a high rate of
> serialization failures and want to know whether indexes of a
> certain type have index-relation predicate locking granularity or
> something more fine-grained.  The latter use case seems plausible
> once there is more of a mix of support among the AMs.

TBH, that line of thought impresses me not at all, because I do not see
a reason for SQL queries to need to see internal behaviors of AMs, and
especially not at levels as crude as boolean properties of entire AMs,
because that's just about guaranteed to become a lie (or at least not
enough of the truth) in a year or two.  If you are a DBA wanting to know
how fine-grained the locking is in a particular index type, you really
need to read the source code or ask a hacker.

We have been bit by the "representation not good enough to describe
actual behavior" problem *repeatedly* over the years that pg_am had all
this detail.  First it was amstrategies and amsupport, which have never
usefully described the set of valid proc/op strategy numbers for any
index type more complicated than btree.  Then there was amorderstrategy,
which we got rid of in favor of amcanorder, and later added
amcanbackward to that (not to mention amcanorderbyop).  And amconcurrent,
which went away for reasons I don't recall.  Then we added amstorage,
which later had to be supplemented with amkeytype, and still isn't a
very accurate guide to what's actually in an index.  amcanreturn
actually was a boolean rather than a function for awhile (though it
looks like we never shipped a release with that definition).  There's
still a lot of stuff with obviously limited life expectancy, like
amoptionalkey, which is at best a really crude guide to what are valid
index qualifications; someday that will likely have to go away in favor
of a "check proposed index qual for supportability" AM callback.

So I don't think I'm being unreasonable in wanting to minimize, not
maximize, the amount of info exposed through this interface.  There is
enough history to make me pretty sure that a lot of things that might be
simple boolean properties today are going to be less simple tomorrow,
and then we'll be stuck having to invent arbitrary definitions for what
the property-test function is going to return for those.  And if there
are any queries out there that are depending on simplistic
interpretations of those property flags, they'll be broken in some
respect no matter what we do.

In short, I do not see a good reason to expose ampredlocks at the SQL
level, and I think there needs to be a darn good reason to expose any of
this stuff, not just "maybe some DBA will think he needs to query this".

regards, tom lane
Re: [HACKERS] Assertion failure in REL9_5_STABLE
BTW this is not a regression, right?  It's been broken all along.  Or am
I mistaken?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
> I think that last part is a very important property; my intuition is
> that dividing up the work between cooperating processes in a way that
> should come out equal will often fail to do so, either due to the
> operating system scheduler or due to some data being cached and other
> data not being cached or due to the comparator running faster on some
> data than other data or due to NUMA effects that make some processes
> run faster than others or due to any number of other causes.  So I
> think that algorithms that allocate the work dynamically are going to
> greatly outperform those that use a division of labor which is fixed
> at the beginning of a computation phase.

I agree that dynamic sampling has big advantages.  Our quicksort
implementation does dynamic sampling, of course.

You need to be strict about partition boundaries: they may not be drawn
at a point in the key space that is not precisely defined, and in
general there can be no ambiguity about what bucket a tuple can end up
in ahead of time.  In other words, you cannot carelessly allow equal
tuples to go on either side of an equal boundary key.  The reason for
this restriction is that otherwise, stuff breaks when you later attempt
to "align" boundaries across sort operations that are performed in
parallel.  I don't think you can introduce an artificial B-Tree style
tie-breaker condition to avoid the problem, because that will slow
things right down (B&M quicksort does really well with many equal keys).

When you have one really common value, load balancing for partitioning
just isn't going to do very well.  My point is that there will be a
somewhat unpleasant worst case that will need to be accepted.  It's not
practical to go to the trouble of preventing it entirely.  So, the
comparison with quicksort works on a couple of levels.

--
Peter Geoghegan
Re: [HACKERS] per-statement-level INSTEAD OF triggers
> It might be more useful after we get the infrastructure that Kevin's
> been working on to allow collecting all the updates into a tuplestore
> that could be passed to a statement-level trigger.  Right now I tend to
> agree that there's little point.

Maybe this could be used to re-implement FOREIGN KEYs.  Never-ending
bulk DELETEs caused by a lack of indexes on foreign key columns bite
novice users quite often in my experience.
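A minimal sketch of that foot-gun, with invented table names:

    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child (parent_id int REFERENCES parent (id));  -- no index on parent_id

    -- each parent row deleted forces the RI trigger to scan child
    -- sequentially, so on a large child table this can run "forever":
    DELETE FROM parent WHERE id < 100000;

    -- the usual fix:
    CREATE INDEX ON child (parent_id);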
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Peter Eisentraut writes:
> On 8/9/16 12:16 PM, Tom Lane wrote:
>> Peter Eisentraut writes:
>>> Here is a small patch that sets log_line_prefix and application name
>>> in pg_regress and the TAP tests, to make analyzing the server log
>>> output easier.
>>
>> How would this interact with the buildfarm's existing policies
>> on setting log_line_prefix?
>
> AFAICT, that only applies if the build farm client runs initdb itself,
> that is, for the installcheck parts.

Well, I guess the subtext of my question was whether we shouldn't try to
align this with the buildfarm's choices, or vice versa.  Andrew made
some different choices than you have done here, and it seems like we
ought to strive for a meeting of the minds on what's appropriate.

Or in short: I don't want to be seeing one prefix format in some
buildfarm logs and a different format in others.

regards, tom lane
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Wed, Aug 10, 2016 at 12:08 PM, Claudio Freire wrote:
> I think it's a great design, but for that, per-worker final tapes have
> to always be random-access.

Thanks.  I don't think I need to live with the randomAccess restriction,
because I can be clever about reading only the first tuple on each
logtape.c block initially.  Much later, when the binary search gets down
to seeking within a single block, everything in the block can be read at
once into memory, and we can take the binary search to that other
representation.  This latter part only needs to happen once or twice per
partition boundary per worker.

> I'm not hugely familiar with the code, but IIUC there's some penalty
> to making them random-access, right?

Yeah, there is.  For one thing, you have to store the length of the
tuple twice, to support incremental seeking in both directions.  For
another, you cannot perform the final merge on-the-fly; you must produce
a serialized tape as output, which is used subsequently to support
random seeks.  There is no penalty when you manage to do the sort in
memory, though (not that that has anything to do with parallel sort).

--
Peter Geoghegan
Re: [HACKERS] new pgindent run before branch?
Robert Haas writes:
> Would anyone mind too much if I refreshed typedefs.list and
> re-indented the whole tree before we branch?

+1, I was planning to do that myself.

regards, tom lane
Re: [HACKERS] Assertion failure in REL9_5_STABLE
Marko Tiikkaja wrote:
> On 2016-08-10 19:28, Alvaro Herrera wrote:
>> Umm.  AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
>> already calling HeapTupleHeaderGetCmax() (which obviously hasn't
>> caught the same assertion).  Something is odd there ...
>
> HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case.
> HeapTupleSelfUpdated comes from here (line 4749):
>
>     /* if there are updates, follow the update chain */
>     if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
>     {
>         HTSU_Result res;
>
>         res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
>                                       GetCurrentTransactionId(),
>                                       mode);
>         if (res != HeapTupleMayBeUpdated)
>         {
>             result = res;
>             /* recovery code expects to have buffer lock held */
>             LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
>             goto failed;
>         }
>     }

Oh, I see ... so there's an update chain, and you're updating an earlier
tuple.  But the later tuple has a multixact and one of the members is
the current transaction.  I wonder how you can lock a tuple that's not
the latest, where that update chain was produced by your own
transaction.  I suppose this is because of plpgsql's use of cursors.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
> My view on this - currently anyway - is that we shouldn't conflate the
> tuplesort with the subsequent index generation, but that we should try
> to use parallelism within the tuplesort itself to the greatest extent
> possible.  If there is a single output stream that the leader uses to
> generate the final index, then none of the above problems arise.  They
> only arise if you've got multiple processes actually writing to the
> index.

I'm not sure if you're agreeing with my contention about parallel CREATE
INDEX not being a good target for partitioning here.  Are you?

You can get some idea of how much a separate pass over the concatenated
outputs would hurt by using the test GUC with my patch applied (the one
that artificially forces randomAccess by B-Tree tuplesort callers).

>> Suggested partitioning algorithm
>>
>> The basic idea I have in mind is that we create runs in workers in the
>> same way that the parallel CREATE INDEX patch does (one output run per
>> worker).  However, rather than merging in the leader, we use a
>> splitting algorithm to determine partition boundaries on-the-fly.  The
>> logical tape stuff then does a series of binary searches to find those
>> exact split points within each worker's "final" tape.  Each worker
>> reports the boundary points of its original materialized output run in
>> shared memory.  Then, the leader instructs workers to "redistribute"
>> slices of their final runs among each other, by changing the tapeset
>> metadata to reflect that each worker has nworker input tapes with
>> redrawn offsets into a unified BufFile.  Workers immediately begin
>> their own private on-the-fly merges.
>
> Yeah, this is pretty cool.  You end up with the final merge segmented
> into N submerges producing nonoverlapping ranges.  So you could have
> the leader perform submerge 0 itself, and while it's doing that the
> other workers can perform submerges 1..N.  By the time the leader
> finishes submerge 0, the remaining submerges will likely be complete
> and after that the leader can just read the outputs of those submerges
> one after another and it has basically no additional work to do.

Again, I'm a little puzzled by your remarks here.  Surely the really
great case for parallel sort with partitioning is the case where minimal
further IPC between workers remains?  So, while "the leader can just
read the outputs of those submerges", ideally it will be reading as
little as possible from workers.  For example, it's ideal when the
workers were able to determine that their particular range in the
parallel merge join has very few tuples to return, having also
"synchronized" their range within two underlying relations.
(Importantly, the merge join "synchronization" can begin per worker when
the tuplesort.c on-the-fly merge begins and returns its first tuple --
that is, it can begin very soon.)  In short, partitioning when sorting
is as much about avoiding a serial dependency for the entire query tree
as it is about taking advantage of available CPU cores and disk
spindles.  That is my understanding, at any rate.

While all this speculation about choice of algorithm is fun,
realistically I'm not going to write the patch for a rainy day (nor for
parallel CREATE INDEX, at least until we become very comfortable with
all the issues I raise, which could never happen).  I'd be happy to
consider helping you improve parallel query by providing infrastructure
like this, but I need someone else to write the client of the
infrastructure (e.g. a parallel merge join patch), or to at least agree
to meet me half way with an interdependent prototype of their own.  It's
going to be messy, and we'll have to do a bit of stumbling to get to a
good place.  I can sign up to that if I'm not the only one that has to
stumble.

Remember how I said we should work on the merging bottleneck indirectly?
I'm currently experimenting with having merging use sifting down to
replace the root in the heap.  This is very loosely based on the Jeremy
Harris patch from 2014, I suppose.  Anyway, this can be far, far faster,
with perhaps none of the downsides that we saw in the context of
building an initial replacement selection heap, because we have more
control over the distribution of input (tapes have sorted tuples), and
because this heap is so tiny and cache-efficient to begin with.  This
does really well in the event of clustering of values, which is a common
case, but also helps with totally random initial input.

I need to do some more research before posting a patch, but right now I
can see that it makes merging presorted numeric values more than 2x
faster.  And that's with 8 tapes, on my very I/O bound laptop.  I bet
that the benefits would also be large for text (temporal locality is
improved, and so strcoll() comparison caching is more effective).
Serial merging still needs work, it seems.

--
Peter Geoghegan
Re: [HACKERS] Assertion failure in REL9_5_STABLE
On 2016-08-10 19:28, Alvaro Herrera wrote:
> Umm.  AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
> already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
> the same assertion).  Something is odd there ...

HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case.
HeapTupleSelfUpdated comes from here (line 4749):

    /* if there are updates, follow the update chain */
    if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
    {
        HTSU_Result res;

        res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
                                      GetCurrentTransactionId(),
                                      mode);
        if (res != HeapTupleMayBeUpdated)
        {
            result = res;
            /* recovery code expects to have buffer lock held */
            LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
            goto failed;
        }
    }

> Can you share the test case?

Not at this time, sorry.

.m
Re: [HACKERS] new pgindent run before branch?
On Wed, Aug 10, 2016 at 04:02:27PM -0400, Robert Haas wrote:
> Hi,
>
> Would anyone mind too much if I refreshed typedefs.list and
> re-indented the whole tree before we branch?  There's not too much
> churn at the moment, and I feel like it would be nice to start the
> cycle in as clean a state as possible.
>
> Current results of this attached.

Sure, seems like a good idea.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                     Ancient Roman grave inscription  +
Improved ICU patch - WAS: [HACKERS] Implementing full UTF-8 support (aka supporting 0x00)
> 4 aug. 2016 kl. 02:40 skrev Bruce Momjian:
>
> On Thu, Aug 4, 2016 at 08:22:25AM +0800, Craig Ringer wrote:
>> Yep, it does.  But we've made little to no progress on integration of
>> ICU support and AFAIK nobody's working on it right now.
>
> Uh, this email from July says Peter Eisentraut will submit it in
> September :-)
>
> https://www.postgresql.org/message-id/2b833706-1133-1e11-39d9-4fa228892...@2ndquadrant.com

Cool.

I have brushed up my decade+ old patches [1] for ICU, so they now have
support for COLLATE on columns:

    https://github.com/girgen/postgres/

in branches icu/XXX, where XXX is master or REL9_X_STABLE.

They've been used for the FreeBSD ports since 2005, and have served us
well.  I have of course updated them regularly.  In this latest version,
I've removed support for other encodings besides UTF-8, mostly since I
don't know how to test them, but also, I see little point in supporting
anything else using ICU.

I have one question for someone with knowledge about Turkish (Devrim?).
This is the diff from the regression tests, when running

    $ gmake check EXTRA_TESTS=collate.linux.utf8 LANG=sv_SE.UTF-8

*** src/test/regress/expected/collate.linux.utf8.out
--- src/test/regress/results/collate.linux.utf8.out
***************
*** 373,379 ****
  SELECT 'Türkiye' COLLATE "tr_TR" ~* 'KI' AS "false";
   false
  -------
!  f
  (1 row)

  SELECT 'bıt' ~* 'BIT' COLLATE "en_US" AS "false";
--- 373,379 ----
  SELECT 'Türkiye' COLLATE "tr_TR" ~* 'KI' AS "false";
   false
  -------
!  t
  (1 row)

  SELECT 'bıt' ~* 'BIT' COLLATE "en_US" AS "false";
***************
*** 385,391 ****
  SELECT 'bıt' ~* 'BIT' COLLATE "tr_TR" AS "true";
   true
  ------
!  t
  (1 row)

  -- The following actually exercises the selectivity estimation for ~*.
--- 385,391 ----
  SELECT 'bıt' ~* 'BIT' COLLATE "tr_TR" AS "true";
   true
  ------
!  f
  (1 row)

  -- The following actually exercises the selectivity estimation for ~*.

The Linux locale behaves differently from ICU for the above (corner?)
cases.  Any idea whether one is more correct than the other?  It seems
unclear to me.  Perhaps it depends on whether the case-insensitive match
is done using lower(both) or upper(both)?  I haven't investigated this
yet.  @Devrim, is one more correct than the other?

As Thomas points out, using ucol_strcoll it is quick, since no copying
is needed.  I will get some benchmarks soon.

Palle

[1] https://people.freebsd.org/~girgen/postgresql-icu/README.html
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 12:07 AM, Tsunakawa, Takayuki wrote: > As another idea, we can stand on the middle ground. Interestingly, MySQL > also enables their event monitoring (Performance Schema) by default, but not > all events are collected. I guess highly encountered events are not > collected by default to minimize the overhead. Yes, I think that's a sensible approach. I can't see enabling by default a feature that significantly regresses performance. We work too hard to improve performance to throw very much of it away for any one feature, even a feature that a lot of people like. What I really like about what got committed to 9.6 is that it's so cheap we should be able to use for lots of other things - latch events, network I/O, disk I/O, etc. without hurting performance at all. But if we start timing those events, it's going to be really expensive. Even just counting them or keeping a history will cost a lot more than just publishing them while they're active, which is what we're doing now. > BTW, I remember EnterpriseDB has a wait event monitoring feature. Is it > disabled by default? What was the overhead? Timed events in Advanced Server are disabled by default. I haven't actually tested the overhead myself and I don't remember exactly what the numbers were the last time someone else did, but I think if you turned edb_timed_statistics on, it's pretty expensive. If we can agree on something sensible here, I imagine we'll get rid of that feature in Advanced Server in favor of whatever the community settles on. But if the community agrees to turn on something by default that costs a measurable percentage in performance, I predict that Advanced Server 10 will ship with a different default for that feature than PostgreSQL 10. Personally, I think too much of this thread (and previous threads) is devoted to arguing about whether it's OK to make performance worse, and by how much we'd be willing to make it worse. What I think we ought to be talking about is how to design a feature that produces the most useful data for the least performance cost possible, like by avoiding measuring wait times for events that are very frequent or waits that are very short. Or, maybe we could have a background process that updates a timestamp in shared memory every millisecond, and other processes can read that value instead of making a system call. I think on Linux systems with fast clocks the operating system basically does something like that for you, but there might be other systems where it helps. Of course, it could also skew the results if the system is so overloaded that the clock-updater process gets descheduled for a lengthy period of time. Anyway, I disagree with the idea that this feature is stalled or blocked in some way. I (and quite a few other people, though not everyone) oppose making performance significantly worse in the default configuration. I oppose that regardless of whether it is a hypothetical patch for this feature that causes the problem or whether it is a hypothetical patch for some other feature that causes the problem. I am not otherwise opposed to more work in this area; in fact, I'm rather in favor of it. But you can count on me to argue against pretty much everything that causes a performance regression, whatever the reason. Virtually every release, at least one developer proposes some patch that slows the server down by "only" 1-2%. 
If we'd accepted all of the patches that were shot down because of such impacts, we'd have lost a very big chunk of performance between the time I started working on PostgreSQL and now. As it is, our single-threaded performance seems to have regressed noticeably since 9.1: http://bonesmoses.org/2016/01/08/pg-phriday-how-far-weve-come/ I think that's awful. But if we'd accepted all of those patches that cost "only" one or two percentage points, it would probably be -15% or -25% rather than -4.4%. I think that if we want to really be successful as a project, we need to make that number go UP, not down. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
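To make the timestamp-publishing idea concrete, here is a minimal standalone sketch - plain C with pthreads, not PostgreSQL code; in the server this would presumably be a background worker writing into shared memory:

    /*
     * Standalone sketch: one thread publishes a coarse timestamp every
     * millisecond; readers load it instead of making their own clock calls.
     * Build with: cc coarse_clock.c -pthread
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static _Atomic uint64_t coarse_now_us;  /* stands in for shared memory */

    static uint64_t read_clock_us(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    }

    static void *clock_updater(void *arg)
    {
        (void) arg;
        for (;;)
        {
            atomic_store_explicit(&coarse_now_us, read_clock_us(),
                                  memory_order_relaxed);
            usleep(1000);       /* ~1ms resolution, one syscall per tick */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, clock_updater, NULL);
        usleep(2000);
        /* a "backend" samples the published time: no syscall on this path */
        printf("coarse time: %llu us\n",
               (unsigned long long) atomic_load_explicit(&coarse_now_us,
                                                         memory_order_relaxed));
        return 0;
    }

The caveat noted above applies verbatim: the published value goes stale whenever the updater thread is descheduled.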
[HACKERS] new autovacuum criterion for visible pages
I wanted to create a new relopt named something like autovacuum_vacuum_pagevisible_factor which would cause autovacuum to vacuum a table once less than a certain fraction of the relation's pages are marked allvisible. I wanted some feedback on some things.

1) One issue is that pg_class.relpages and pg_class.relallvisible are themselves only updated by vacuum/analyze. In the absence of manual vacuum or analyze, this means that if the new criterion uses those fields, it could only kick in after an autoanalyze has already been done, which means that autovacuum_vacuum_pagevisible_factor could not meaningfully be set lower than autovacuum_analyze_scale_factor.

Should relallvisible be moved/copied from pg_class to pg_stat_all_tables, so that it is maintained by the stats collector? Or should the autovacuum worker just walk the vm of every table with a defined autovacuum_vacuum_pagevisible_factor each time it is launched to get an up-to-date count that way?

2) Should there be a guc in addition to the relopt? I can't think of a reason why I would want to set this globally, so I'm happy with just a relopt. If it were set globally, it would sure increase the cost of scanning the vm for each table once each naptime.

3) Should there be an autovacuum_vacuum_pagevisible_threshold? The other settings have both a factor and a threshold. I've never understood what the point of the threshold settings is, but presumably there is a point to them. Does that reason also apply to keeping the vm tuned up?

Cheers, Jeff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
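To sketch what the trigger condition itself might look like, here is a toy model only; the real check would live in relation_needs_vacanalyze and read the same counters from Form_pg_class, and autovacuum_vacuum_pagevisible_factor is of course hypothetical:

    /*
     * Toy model of the proposed trigger; relpages/relallvisible mirror the
     * pg_class fields of the same names.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static bool
    needs_vacuum_for_vm(int relpages, int relallvisible, double pagevisible_factor)
    {
        if (pagevisible_factor <= 0 || relpages <= 0)
            return false;       /* reloption unset, or nothing to judge */
        return (double) relallvisible / relpages < pagevisible_factor;
    }

    int main(void)
    {
        /* 700 of 1000 pages all-visible, factor 0.9 -> vacuum */
        printf("%d\n", needs_vacuum_for_vm(1000, 700, 0.9));
        /* 950 of 1000 all-visible -> leave the table alone */
        printf("%d\n", needs_vacuum_for_vm(1000, 950, 0.9));
        return 0;
    }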
Re: [HACKERS] Slowness of extended protocol
Shay> it's important to note that query parsing and rewriting isn't an "inevitable evil".

OK, I stand corrected. ADO.NET has a raw mode in the API. That's interesting. Let's say "lots of heavily used languages do have their own notion of bind placeholders". And for the rest, it is still not that hard to prepare on the fly.

Shay> As Tom said, if an application can benefit from preparing, the developer has the responsibility (and also the best knowledge) to manage preparation, not the driver. Magical behavior under the hood causes surprises, hard-to-diagnose bugs etc.

Why do you use C#, then? Aren't you supposed to love machine code as the least magical thing? Even x86 does not describe "the exact way the code should be executed". All CPUs shuffle instructions around to make the code run faster.

Shay> As Tom said, if an application can benefit from preparing, the developer has the responsibility

Does the developer have the responsibility to choose between "index scan" and "table seq scan"? So your "developer has the responsibility" is built on sand. Even SQL execution on PostgreSQL is "magical behavior under the hood". Does that mean we should drop the optimizer and implement a "fully static hint-based execution mode"? I do not buy that.

My experience shows that people are very bad at predicting where the performance problem will be. For 80% (or even 99%) of the cases, they just do not think about whether a particular statement should be server-prepared or not. They have absolutely no idea how many resources it would take, and so on. ORMs have no notion of "this query must be server-prepared, while that one must not be". And so on. It is somewhat insane to assume people would write naked SQL. Of course they will use ORMs and the like, so they just would not be able to pass that additional parameter saying whether a particular query out of thousands should be server-prepared or not.

Vladimir> "cached plan cannot change result type" -- PostgreSQL just fails to execute the server-prepared statement if a table was altered.
Shay> How exactly do users cope with this in pgjdbc? Do they have some special API to flush (i.e. deallocate) prepared statements which they're supposed to use after a DDL?

First of all, pgjdbc does report those problems to -hackers. Unfortunately, it is still "not implemented". Then a workaround was made on the pgjdbc side. Here's the relevant pgjdbc fix: https://github.com/pgjdbc/pgjdbc/pull/451

It analyzes the error, and if it finds "not_implemented from RevalidateCachedQuery", it realizes it should re-prepare. Unfortunately, there is no dedicated error code, but at least there's a routine name. On top of that, each pgjdbc commit is tested against the HEAD PostgreSQL revision, so if the routine gets renamed for some reason, we'll know right away. There will be some more parsing to cover the "DEALLOCATE ALL" case.

Shay> it has been listed many times - pgbouncer

Let's stop discussing the pgbouncer issue here? It has absolutely nothing to do with pgsql-hackers. Thanks.

Vladimir
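For the curious, the same trick is easy to demonstrate with libpq: execute a named prepared statement and, if it fails with the error coming out of RevalidateCachedQuery, deallocate, re-prepare, and retry once. This is a hedged sketch, not a proposal; matching on the reported source routine is exactly the heuristic pgjdbc uses, since there is no dedicated error code, and it assumes the statement name is a trusted identifier:

    /*
     * Sketch: retry a server-prepared statement once if the server reports
     * a failure originating in RevalidateCachedQuery (e.g. after DDL on the
     * underlying table).
     */
    #include <stdio.h>
    #include <string.h>
    #include <libpq-fe.h>

    static PGresult *
    exec_with_reprepare(PGconn *conn, const char *name, const char *sql)
    {
        PGresult *res = PQexecPrepared(conn, name, 0, NULL, NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_FATAL_ERROR)
        {
            const char *func = PQresultErrorField(res, PG_DIAG_SOURCE_FUNCTION);

            if (func != NULL && strcmp(func, "RevalidateCachedQuery") == 0)
            {
                char deallocate[128];

                snprintf(deallocate, sizeof(deallocate), "DEALLOCATE %s", name);
                PQclear(res);
                PQclear(PQexec(conn, deallocate));          /* drop old plan */
                PQclear(PQprepare(conn, name, sql, 0, NULL)); /* re-prepare */
                res = PQexecPrepared(conn, name, 0, NULL, NULL, NULL, 0);
            }
        }
        return res;
    }

As with the pgjdbc fix, the routine name is the only hook available; a dedicated error code would make this far less fragile.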
Re: [HACKERS] regression test for extended query protocol
Michael Paquier wrote:
> On Fri, Aug 5, 2016 at 12:21 AM, Alvaro Herrera
> wrote:
> > If somebody had some spare time to devote to this, I would suggest to
> > implement something in core that can be used to specify a list of
> > commands to run, and a list of files-to-be-saved-in-bf-log emitted by
> > each command. We could have one such base file in the core repo that
> > specifies some tests to run (such as "make -C src/test/recovery check"),
> > and an additional file can be given by buildfarm animals to run custom
> > tests, without having to write BF modules for each thing. With that,
> > pgsql committers could simply add a new test to be run by all buildfarm
> > animals by adding it to the list in core.
>
> Do you think about using a special makefile target to run those
> commands, say in src/test? At the end we are going to need to patch
> the buildfarm client code at least once, at least that would be worth
> it in the long term..

Sure. Some time ago I proposed something like a JSON file, something like

test_foo => {
  "command" : "make -C src/test/blah check",
  "save_output" : [ "src/test/blah/server.log", "src/test/blah/regress*.log" ]
}

as I recall, Andrew said that he didn't like JSON very much but that the idea made sense to him.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] new pgindent run before branch?
Hi, Would anyone mind too much if I refreshed typedefs.list and re-indented the whole tree before we branch? There's not too much churn at the moment, and I feel like it would be nice to start the cycle in as clean a state as possible. Current results of this attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Wed, Aug 10, 2016 at 4:37 PM, Simon Riggs wrote:
> On 10 August 2016 at 03:45, Pavan Deolasee wrote:
>>
>> On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire
>> wrote:
>>>
>>> On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee
>>> wrote:
>>>
>>> > Some heuristics and limits on amount of work done to detect duplicate
>>> > index
>>> > entries will help too.
>>>
>>> I think I prefer a more thorough approach.
>>>
>>> Increment/decrement may get very complicated with custom opclasses,
>>> for instance. A column-change bitmap won't know how to handle
>>> functional indexes, etc.
>>>
>>> What I intend to try is to modify btree to allow efficient search of a
>>> key-ctid pair, by adding the ctid to the sort order.
>>
>> Yes, that's a good medium term plan. And this is kind of independent from
>> the core idea.
>
> +1 That seems like a good idea. It would allow us to produce a bitmap
> scan in block-sorted order.

Well, not really. Only equal-key runs will be block-sorted, since the sort order would be (key, block, offset).

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
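To spell out why only equal-key runs come back block-sorted, here is a toy model of the proposed sort order - plain C; the real thing would use the opclass comparator and ItemPointerCompare rather than these stand-ins:

    /*
     * Toy (key, block, offset) comparator: the heap TID only breaks ties,
     * so block order is only guaranteed within runs of equal keys.
     */
    #include <stdint.h>

    typedef struct
    {
        int32_t  key;       /* stand-in for the opclass-compared key */
        uint32_t block;     /* heap block number, as in ItemPointerData */
        uint16_t offset;    /* line pointer within the block */
    } IndexEntryToy;

    static int
    key_ctid_cmp(const IndexEntryToy *a, const IndexEntryToy *b)
    {
        if (a->key != b->key)
            return a->key < b->key ? -1 : 1;
        if (a->block != b->block)
            return a->block < b->block ? -1 : 1;
        if (a->offset != b->offset)
            return a->offset < b->offset ? -1 : 1;
        return 0;
    }

With this total order, a specific (key, ctid) pair can be re-found by a plain binary search, which is the efficient lookup Claudio describes; but across different keys the TIDs are in no particular block order.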
Re: [HACKERS] Slowness of extended protocol
Some comments... For the record, I do find implicit/transparent driver-level query preparation interesting and potentially useful, and have opened https://github.com/npgsql/npgsql/issues/1237 to think about it - mostly based on arguments on this thread. One big question mark I have is whether this behavior should be opt-out as in pgjdbc, rather than opt-in.

Like others in this thread (I think), I prefer my libraries and drivers very simple, following my exact API calls to the letter and doing minimal magic under the hood. As Tom said, if an application can benefit from preparing, the developer has the responsibility (and also the best knowledge) to manage preparation, not the driver. Magical behavior under the hood causes surprises, hard-to-diagnose bugs etc.

One interesting case where implicit preparation is unbeatable, though, is when users aren't coding against the driver directly but are using some layer, e.g. an ORM. For example, the Entity Framework ORM simply does not prepare statements (see https://github.com/aspnet/EntityFramework/issues/5459 which I opened). This is one reason I find it a valuable feature to have, although probably as opt-in and not opt-out.

Regarding driver-level SQL parsing, Vladimir wrote:

> Unfortunately, client-side SQL parsing is inevitable evil (see below on '?'), so any sane driver would cache queries any way. The ones that do not have query cache will perform badly anyway.

First, it's a very different thing to have a query cache in the driver, and to implicitly prepare statements. Second, I personally hate the idea that drivers parse SQL and rewrite queries. It's true that in many languages database APIs have "standardized" parameter placeholders, and therefore drivers have no choice but to rewrite. ADO.NET (the .NET database API) actually does not do this - parameter placeholders are completely database-dependent (see https://msdn.microsoft.com/en-us/library/yy6y35y8%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396). Npgsql does rewrite queries to provide named parameters, but the next version is going to have a "raw query" mode in which no parsing/rewriting happens whatsoever. It simply takes your string, packs it into a Parse (or Query), and sends it off. This is not necessarily going to be the main way users use Npgsql, but it's important to note that query parsing and rewriting isn't an "inevitable evil".

Vladimir wrote:

> This new message would be slower than proper query cache, so why should we all spend time on a half-baked solution?

The cases where the prepared statement path doesn't work have already been listed many times - pgbouncer (and other pools), drivers which don't support persisting prepared statements across pooled connection open/close, and people (like many in this conversation) who don't appreciate their drivers doing magic under the hood. This is why I think both proposals are great - there's nothing wrong with query caching (or more precisely, implicit statement preparation by the driver), especially if it's opt-in, but that doesn't mean we shouldn't optimize things for people who don't, can't or won't go for that.

> To be fair, implementing a cache is a trivial thing when compared with hard-coding binary/text formats for all the datatypes in each and every language.
> Remember, each driver has to implement its own set of procedures to input/output values in text/binary format, and that is a way harder than implementing the cache we are talking about.
I agree with that, but it's not a question of how easy it is to implement implicit preparation. It's a question of whether driver developers choose to do this, based on other considerations as well - many people here think it's not the job of the driver to decide for you whether to prepare or not, for example. It has nothing to do with implementation complexity. > "cached plan cannot change result type" -- PostgreSQL just fails to execute the server-prepared statement if a table was altered. That alone seems to be a good reason to make implicit preparation opt-in at best. How exactly do users cope with this in pgjdbc? Do they have some special API to flush (i.e. deallocate) prepared statements which they're supposed to use after a DDL? Do you look at the command tag in the CommandComplete to know whether a command was DDL or not?
Re: [HACKERS] Heap WARM Tuples - Design Draft
On 10 August 2016 at 03:45, Pavan Deolasee wrote:
>
> On Tue, Aug 9, 2016 at 12:06 AM, Claudio Freire
> wrote:
>>
>> On Mon, Aug 8, 2016 at 2:52 PM, Pavan Deolasee
>> wrote:
>>
>> > Some heuristics and limits on amount of work done to detect duplicate
>> > index
>> > entries will help too.
>>
>> I think I prefer a more thorough approach.
>>
>> Increment/decrement may get very complicated with custom opclasses,
>> for instance. A column-change bitmap won't know how to handle
>> functional indexes, etc.
>>
>> What I intend to try is to modify btree to allow efficient search of a
>> key-ctid pair, by adding the ctid to the sort order.
>
> Yes, that's a good medium term plan. And this is kind of independent from
> the core idea.

+1 That seems like a good idea. It would allow us to produce a bitmap scan in block-sorted order.

> So I'll go ahead and write a patch that implements the core
> idea and some basic optimizations.

+1

> We can then try different approaches such
> as tracking changed columns, tracking increment/decrement and teaching btree
> to skip duplicate (key, CTID) entries.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set log_line_prefix and application name in test drivers
On 8/9/16 12:16 PM, Tom Lane wrote: > Peter Eisentraut writes: >> > Here is a small patch that sets log_line_prefix and application name in >> > pg_regress and the TAP tests, to make analyzing the server log output >> > easier. > How would this interact with the buildfarm's existing policies > on setting log_line_prefix? AFAICT, that only applies if the build farm client runs initdb itself, that is, for the installcheck parts. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Mon, Aug 8, 2016 at 4:44 PM, Peter Geoghegan wrote:
> The basic idea I have in mind is that we create runs in workers in the
> same way that the parallel CREATE INDEX patch does (one output run per
> worker). However, rather than merging in the leader, we use a
> splitting algorithm to determine partition boundaries on-the-fly. The
> logical tape stuff then does a series of binary searches to find those
> exact split points within each worker's "final" tape. Each worker
> reports the boundary points of its original materialized output run in
> shared memory. Then, the leader instructs workers to "redistribute"
> slices of their final runs among each other, by changing the tapeset
> metadata to reflect that each worker has nworker input tapes with
> redrawn offsets into a unified BufFile. Workers immediately begin
> their own private on-the-fly merges.

I think it's a great design, but for that, per-worker final tapes always have to be random-access. I'm not hugely familiar with the code, but IIUC there's some penalty to making them random-access, right?

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Mon, Aug 8, 2016 at 3:44 PM, Peter Geoghegan wrote: > I don't think partitioning is urgent for CREATE INDEX, and may be > inappropriate for CREATE INDEX under any circumstances, because: > > * Possible problems with parallel infrastructure and writes. > * Unbalanced B-Trees (or the risk thereof). > * What I've come up with is minimally divergent from the existing > approach to tuplesorting. My view on this - currently anyway - is that we shouldn't conflate the tuplesort with the subsequent index generation, but that we should try to use parallelism within the tuplesort itself to the greatest extent possible. If there is a single output stream that the leader uses to generate the final index, then none of the above problems arise. They only arise if you've got multiple processes actually writing to the index. > Suggested partitioning algorithm > > > I think a hybrid partitioning + merging approach would work well for > us. The paper "Parallel Sorting on a Shared-Nothing Architecture using > Probabilistic Splitting" [3] has influenced my thinking here (this was > written by prominent researchers from the influential UW-Madison > Wisconsin database group). Currently, I have in mind something that is > closer to what they call exact splitting to what they call > probabilistic splitting, because I don't think it's going to be > generally possible to have good statistics on partition boundaries > immediately available (e.g., through something like their > probabilistic splitting sampling the relation ahead of time). > > The basic idea I have in mind is that we create runs in workers in the > same way that the parallel CREATE INDEX patch does (one output run per > worker). However, rather than merging in the leader, we use a > splitting algorithm to determine partition boundaries on-the-fly. The > logical tape stuff then does a series of binary searches to find those > exact split points within each worker's "final" tape. Each worker > reports the boundary points of its original materialized output run in > shared memory. Then, the leader instructs workers to "redistribute" > slices of their final runs among each other, by changing the tapeset > metadata to reflect that each worker has nworker input tapes with > redrawn offsets into a unified BufFile. Workers immediately begin > their own private on-the-fly merges. Yeah, this is pretty cool. You end up with the final merge segmented into N submerges producing nonoverlapping ranges. So you could have the leader perform submerge 0 itself, and while it's doing that the other workers can perform submerges 1..N. By the time the leader finishes submerge 0, the remaining submerges will likely be complete and after that the leader can just read the outputs of those submerges one after another and it has basically no additional work to do. It might be a good idea to divide the work into a number of submerges substantially greater than the number of workers. For example, suppose we expect between 1 and 4 workers, but we partition the work into 64 submerges. The leader claims submerge 0, which is only 1/64 of the total. By the time it finishes consuming those tuples, submerge 1 will likely be done. Hopefully, even if there are only 1 or 2 workers, they can keep ahead of the leader so that very little of the merging happens in the leader. Also, if some submerges go faster than others, the distribution of work among workers remains even, because the ones that go quicker will handle more of the submerges and the ones that go slower will handle fewer. 
I think that last part is a very important property; my intuition is that dividing up the work between cooperating processes in a way that should come out equal will often fail to do so, either due to the operating system scheduler or due to some data being cached and other data not being cached or due to the comparator running faster on some data than other data or due to NUMA effects that make some processes run faster than others or due to any number of other causes. So I think that algorithms that allocate the work dynamically are going to greatly outperform those that use a division of labor which is fixed at the beginning of a computation phase. > Clearly it's really hard to be sure that this is the right thing at > this point, but my intuition is that this is the way to go (while > avoiding anything like this for CREATE INDEX). I'd like to know how > others feel about it. The number of others weighing in on these topics is surely less than either of us would like, but hopefully we can find a way to make progress anyhow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
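The dynamic allocation described above is essentially a shared counter; here is a minimal standalone sketch - plain C with pthreads and illustrative names, not tied to the actual tuplesort code:

    /*
     * Workers grab the next unclaimed submerge until none remain, so faster
     * workers naturally take on more of the work.
     * Build with: cc submerge.c -pthread
     */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NUM_SUBMERGES 64
    #define NUM_WORKERS   4

    static atomic_uint next_submerge;   /* shared claim counter */

    static void do_submerge(unsigned i) /* placeholder for the real merge */
    {
        printf("merging range %u\n", i);
    }

    static void *worker(void *arg)
    {
        unsigned i;

        (void) arg;
        while ((i = atomic_fetch_add(&next_submerge, 1)) < NUM_SUBMERGES)
            do_submerge(i);
        return NULL;
    }

    int main(void)
    {
        pthread_t workers[NUM_WORKERS];

        for (int w = 0; w < NUM_WORKERS; w++)
            pthread_create(&workers[w], NULL, worker, NULL);
        for (int w = 0; w < NUM_WORKERS; w++)
            pthread_join(workers[w], NULL);
        return 0;
    }

A worker that runs slow simply claims fewer ranges; nothing has to predict the skew in advance, which is the property argued for above.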
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 8:26 PM, Greg Stark wrote: > On Wed, Aug 10, 2016 at 5:54 PM, Alexander Korotkov > wrote: > > Oh, I found that I underestimated complexity of async commit... :) > > Indeed. I think Tom's attitude was right even if the specific > conclusion was wrong. While I don't think removing async commit is > viable I think it would be a laudable goal if we can remove some of > the complication in it. I normally describe async commit as "just like > a normal commit but don't block on the commit" but it's actually a bit > more complicated. > > AIUI the additional complexity is that while async commits are visible > to everyone immediately other non-async transactions can be committed > but then be in limbo for a while before they are visible to others. So > other sessions will see the async commit "jump ahead" of any non-async > transactions even if those other transactions were committed first. > Any standbys will see the non-async transaction in the logs before the > async transaction and in a crash it's possible to lose the async > transaction even though it was visible but not the sync transaction > that wasn't. > > Complexity like this makes it hard to implement other features such as > CSNs. IIRC this already bit hot standby as well. I think it would be a > big improvement if we had a clear, well defined commit order that was > easy to explain and easy to reason about when new changes are being > made. > Heikki, Ants, Greg, thank you for the explanation. You restored order in my personal world. Now I see that introduction of own sequence of CSN which is not equal to LSN makes sense. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 12:26 PM, Greg Stark wrote: > Complexity like this makes it hard to implement other features such as > CSNs. IIRC this already bit hot standby as well. I think it would be a > big improvement if we had a clear, well defined commit order that was > easy to explain and easy to reason about when new changes are being > made. And here I was getting concerned that there was no mention of "apparent order of execution" for serializable transactions -- which does not necessarily match either the order of LSNs from commit records nor CSNs. The order in which transactions become visible is clearly a large factor in determining AOoE, but it is secondary to looking at whether a transaction modified data based on reading the "before" image of a data set modified by a concurrent transaction. I still think that our best bet for avoiding anomalies when using logical replication in complex environments is for logical replication to apply transactions in apparent order of execution. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On 08/10/2016 02:23 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: Patch 0001: there have been comments about that before, and you have put the checks on RestrictInfo in a couple of variables of pull_varnos_walker, so nothing to say from here. I don't follow. Are you suggesting 0001 is a reasonable fix, or that there's a proposed solution? I think that's reasonable. Well, to me the 0001 feels more like a temporary workaround rather than a proper solution. I just don't know how to deal with it so I've kept it for now. Pretty sure there will be complaints that adding RestrictInfo to the expression walkers is not a nice idea. >> ... The idea is that the syntax should work even for statistics built on multiple tables, e.g. to provide better statistics for joins. That's why the schema may be specified (as each table might be in different schema), and so on. So you mean that the same statistics could be shared between tables? But as this is visibly not a concept introduced yet in this set of patches, why not just cut it off for now to simplify the whole? If there is no schema-related field in pg_mv_statistics we could still add it later if it proves to be useful. Yes, I think creating statistics on multiple tables is one of the possible future directions. One of the previous patch versions introduced ALTER TABLE ... ADD STATISTICS syntax, but that ran into issues in gram.y, and given the multi-table possibilities the CREATE STATISTICS seems like a much better idea anyway. But I guess you're right we may make this a bit more strict now, and relax it in the future if needed. For example as we only support single-table statistics at this point, we may remove the schema and always create the statistics in the schema of the table. But I don't think we should make the statistics names unique only within a table (instead of within the schema). The difference between those two cases is that if we allow multi-table statistics in the future, we can simply allow specifying the schema and everything will work just fine. But it'd break the second case, as it might result in conflicts in existing schemas. I do realize this might be seen as a case of "future proofing" based on dubious predictions of how something might work, but OTOH this (schema inherited from table, unique within a schema) is pretty consistent with how this work for indexes. + /* Spurious noise in the patch. + /* check that at least some statistics were requested */ + if (!build_dependencies) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), +errmsg("no statistics type (dependencies) was requested"))); So, WITH (dependencies) is mandatory in any case. Why not just dropping it from the first cut then? Because the follow-up patches extend this to require at least one statistics type. So in 0004 it becomes if (!(build_dependencies || build_mcv)) and in 0005 it's if (!(build_dependencies || build_mcv || build_histogram)) We might drop it from 0002 (and assume build_dependencies=true), and then add the check in 0004. But it seems a bit pointless. This is a complicated set of patches. I'd think that we should try to simplify things as much as possible first, and the WITH clause is not mandatory to have as of 0002. OK, I can remove the WITH from the 0002 part. Not a big deal. Statistics definition reorder the columns by itself depending on their order. 
For example: create table aa (a int, b int); create statistics aas on aa(b, a) with (dependencies); \d aa "public.aas" (dependencies) ON (a, b) As this defines a correlation between multiple columns, isn't it wrong to assume that (b, a) and (a, b) are always the same correlation? I don't recall such properties as being always commutative (old memories, I suck at stats in general). [...reading README...] So this is caused by the implementation limitations that only limit the analysis between interactions of two columns. Still it seems incorrect to reorder the user-visible portion. I don't follow. If you talk about Pearson's correlation, that clearly does not depend on the order of columns - it's perfectly independent of that. If you talk about about correlation in the wider sense (i.e. arbitrary dependence between columns), that might depend - but I don't remember a single piece of the patch where this might be a problem. Yes, based on what is done in the patch that may not be a problem, but I am wondering if this is not restricting things too much. Let's keep the code as it is. If we run into this issue in the future, we can easily relax this - there's nothing depending on the ordering of attnums, IIRC. Also, which README states that we can only analyze interactions between two columns? That's pretty clearly not the case - the patch should handle dependencies between more columns without any problems. I have
Re: [HACKERS] pg_ctl promote wait
On 8/7/16 9:44 PM, Michael Paquier wrote: >>> This is not a good >>> >> idea, and the idea of putting a wait argument in get_controlfile does >>> >> not seem a good interface to me. I'd rather see get_controlfile be >>> >> extended with a flag saying no_error_on_failure and keep the wait >>> >> logic within pg_ctl. >> > >> > I guess we could write a wrapper function in pg_ctl that encapsulated >> > the wait logic. > That's what I would do. New patches, incorporating your suggestions. I moved some of the error handling out of get_controlfile() and back into the callers, because it was getting too weird that that function knew so much about the callers' intentions. That way we don't actually have to change the signature. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 04a724675769b2412b739c408abc136d17d52794 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 26 Jul 2016 11:39:43 -0400 Subject: [PATCH v2 1/5] Make command_like output more compact Consistently print the test name, not the full command, which can be quite lenghty and include temporary directory names and other distracting details. --- src/test/perl/TestLib.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index 649fd82..c7b3262 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -276,8 +276,8 @@ sub command_like my ($stdout, $stderr); print("# Running: " . join(" ", @{$cmd}) . "\n"); my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; - ok($result, "@$cmd exit code 0"); - is($stderr, '', "@$cmd no stderr"); + ok($result, "$test_name: exit code 0"); + is($stderr, '', "$test_name: no stderr"); like($stdout, $expected_stdout, "$test_name: matches"); } -- 2.9.2 From b7c04b3f6cbe7a37359509363da0701c84063113 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 26 Jul 2016 10:48:43 -0400 Subject: [PATCH v2 2/5] pg_ctl: Add tests for promote action --- src/bin/pg_ctl/t/003_promote.pl | 39 +++ src/test/perl/TestLib.pm| 11 +++ 2 files changed, 50 insertions(+) create mode 100644 src/bin/pg_ctl/t/003_promote.pl diff --git a/src/bin/pg_ctl/t/003_promote.pl b/src/bin/pg_ctl/t/003_promote.pl new file mode 100644 index 000..1461234 --- /dev/null +++ b/src/bin/pg_ctl/t/003_promote.pl @@ -0,0 +1,39 @@ +use strict; +use warnings; + +use PostgresNode; +use TestLib; +use Test::More tests => 9; + +my $tempdir = TestLib::tempdir; + +command_fails_like([ 'pg_ctl', '-D', "$tempdir/nonexistent", 'promote' ], + qr/directory .* does not exist/, + 'pg_ctl promote with nonexistent directory'); + +my $node_primary = get_new_node('primary'); +$node_primary->init(allows_streaming => 1); + +command_fails_like([ 'pg_ctl', '-D', $node_primary->data_dir, 'promote' ], + qr/PID file .* does not exist/, + 'pg_ctl promote of not running instance fails'); + +$node_primary->start; + +command_fails_like([ 'pg_ctl', '-D', $node_primary->data_dir, 'promote' ], + qr/not in standby mode/, + 'pg_ctl promote of primary instance fails'); + +my $node_standby = get_new_node('standby'); +$node_primary->backup('my_backup'); +$node_standby->init_from_backup($node_primary, 'my_backup', has_streaming => 1); +$node_standby->start; + +is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'), + 't', 'standby is in recovery'); + +command_ok([ 'pg_ctl', '-D', $node_standby->data_dir, 'promote' ], + 'pg_ctl promote of standby runs'); + +ok($node_standby->poll_query_until('postgres', 
'SELECT NOT pg_is_in_recovery()'), + 'promoted standby is not in recovery'); diff --git a/src/test/perl/TestLib.pm b/src/test/perl/TestLib.pm index c7b3262..51b533e 100644 --- a/src/test/perl/TestLib.pm +++ b/src/test/perl/TestLib.pm @@ -34,6 +34,7 @@ our @EXPORT = qw( program_version_ok program_options_handling_ok command_like + command_fails_like $windows_os ); @@ -281,4 +282,14 @@ sub command_like like($stdout, $expected_stdout, "$test_name: matches"); } +sub command_fails_like +{ + my ($cmd, $expected_stderr, $test_name) = @_; + my ($stdout, $stderr); + print("# Running: " . join(" ", @{$cmd}) . "\n"); + my $result = IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr; + ok(!$result, "$test_name: exit code not 0"); + like($stderr, $expected_stderr, "$test_name: matches"); +} + 1; -- 2.9.2 From cf77e8419bae21a84a2a30d260a70bac2c19498a Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 26 Jul 2016 11:23:43 -0400 Subject: [PATCH v2 3/5] pg_ctl: Detect current standby state from pg_control pg_ctl used to determine whether a server was in standby mode by looking for a recovery.conf file. With this change, it instead looks into pg_control, which is potentially more accurate. There are also occasional discussions about removing recovery.conf, so this r
Re: [HACKERS] Declarative partitioning - another take
On Wed, Aug 10, 2016 at 7:09 AM, Amit Langote wrote:
> Attached is the latest set of patches to implement declarative
> partitioning.

Cool. I would encourage you to give some thought to what is the least committable subset of these patches, and think about whether it can be reorganized to make that smaller. Based on the descriptions, it sounds to me like the least committable subset is probably all of 0001 - 0005 as one giant commit, and that's a lot of code. Maybe there's no real way to come up with something more compact than that, but it's worth some thought.

> 0001-Catalog-and-DDL-for-partition-key.patch

+ pg_partitioned

pg_partitioned seems like a slightly strange choice of name. We have no other catalog tables whose names end in "ed", so the use of a past participle here is novel. More generally, this patch seems to be suffering from a bit of terminological confusion: while the catalog table is called pg_partitioned, the relkind is RELKIND_PARTITION_REL. And get_relation_by_qualified_name() thinks that RELKIND_PARTITION_REL is a "table", but getRelationDescription thinks it's a "partitioned table". I think we need to get all these bits and pieces on the same page, or have some kind of consistent way of deciding what to do in each case. Maybe pg_partitioned_table, RELKIND_PARTITIONED_TABLE, etc. for the internal stuff, but just "table" in user-facing messages.

Alternatively, I wonder if we should switch to calling this a "partition root" rather than a "partitioned table". It's not the whole table, just the root. And it doesn't contain data - in fact it probably shouldn't even have storage - so calling it a "table" might be confusing. But if we're using CREATE TABLE to create it in the first place, then calling it something other than a table later on is also confusing. Hmm.

+ partexprs

There's a certain symmetry between this and what we do for indexes, but I'm wondering whether there's a use case for partitioning a table by an expression rather than a column value. I suppose if you've already done the work, there's no harm in supporting it.

+[ PARTITION BY {RANGE|LIST} ( { column_name | ( expression ) } [ opclass ] [, ...] )

The spacing isn't right here. Should say { RANGE | LIST } rather than {RANGE|LIST}. Similarly elsewhere.

+ thus created is called partitioned table. Key
+ consists of an ordered list of column names and/or expressions when
+ using the RANGE method, whereas only a single column or
+ expression can be specified when using the LIST method.

Why do we support range partitioning with multiple columns, but list partitioning only with a single column?

+ The type of a key column or an expresion must have an associated

Typo.

+
+ Currently, there are following limitations on definition of partitioned
+ tables: one cannot specify any UNIQUE, PRIMARY KEY, EXCLUDE and/or
+ FOREIGN KEY constraints.
+

But I presume you can define these constraints on the partitions; that's probably worth mentioning. I'd say something like this: "Partitioned tables do not support UNIQUE, PRIMARY, EXCLUDE, or FOREIGN KEY constraints; however, you can define these constraints on individual data partitions."

+    if (partexprs)
+        recordDependencyOnSingleRelExpr(&myself,
+                                        (Node *) partexprs,
+                                        RelationGetRelid(rel),
+                                        DEPENDENCY_NORMAL,
+                                        DEPENDENCY_IGNORE);

I don't think introducing a new DEPENDENCY_IGNORE type is a good idea here. Instead, you could just add an additional Boolean argument to recordDependencyOnSingleRelExpr. That seems less likely to create bugs in unrelated portions of the code.
+    /*
+     * Override specified inheritance option, if relation is a partitioned
+     * table and it's a target of INSERT.
+     */
+    if (alsoSource)
+        rte->inh |= relid_is_partitioned(rte->relid);

This seems extremely unlikely to be the right way to do this. We're just doing parse analysis here, so depending on properties of the table that are (or might be made to be) changeable is not good. It's also an extra catalog lookup whether the table is partitioned or not (and whether rte->inh is already true or not). Furthermore, it seems to go against the comment explaining what rte->inh is supposed to mean, which says:

 *    inh is TRUE for relation references that should be expanded to include
 *    inheritance children, if the rel has any.  This *must* be FALSE for
 *    RTEs other than RTE_RELATION entries.

I am not sure what problem you're trying to solve here, but I suspect this needs to be ripped out altogether or handled later, during planning. Similarly for the related change in addRangeTableEntry.

+    {PartitionedRelationId,        /* PARTEDRELID */
+        PartitionedRelidIndexId,
+        1,
+        {
+            Anum_pg_partit
Re: [HACKERS] multivariate statistics (v19)
On 08/10/2016 02:24 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm starting to think that the current implementation may not be the correct one (which is why I haven't written the SGML docs about this part of the patch series yet). Assume there's a table "t" with 3 columns (a, b, c), and that we're estimating query: SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3 but that we only have two statistics (a,b) and (b,c). The current patch does about this: P(a=1,b=2,c=3) = P(a=1,b=2) * P(c=3|b=2) i.e. it estimates the first two conditions using (a,b), and then estimates (c=3) using (b,c) with "b=2" as a condition. Now, this is very efficient, but it only works as long as the query contains conditions "connecting" the two statistics. So if we remove the "b=2" condition from the query, this stops working. This is trying to make the algorithm smarter than the user, which is something I'd think we could live without. In this case statistics on (a,c) or (a,b,c) are missing. And what if the user does not want to make use of stats for (a,c) because he only defined (a,b) and (b,c)? I don't think so. Obviously, if you have statistics covering all the conditions - great, we can't really do better than that. But there's a crucial relation between the number of dimensions of the statistics and accuracy of the statistics. Let's say you have statistics on 8 columns, and you split each dimension twice to build a histogram - that's 256 buckets right there, and we only get ~50% selectivity in each dimension (the actual histogram building algorithm is more complex, but you get the idea). I think it makes sense to pursue this, but I also think we can easily live with not having it in the first version that gets committed and doing it as follow-up patch. This patch is large and complicated enough. As this is not a mandatory piece to get a basic support, I'd suggest just to drop that for later. Which is why combining multiple statistics is in part 0006 and all the previous parts simply choose the single "best" statistics ;-) I'm perfectly fine with committing just the first few parts, and leaving 0006 for the next major version. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On 08/10/2016 03:29 PM, Ants Aasma wrote: On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm starting to think that the current implementation may not be the correct one (which is why I haven't written the SGML docs about this part of the patch series yet). While researching this topic a few years ago I came across a paper on this exact topic called "Consistently Estimating the Selectivity of Conjuncts of Predicates" [1]. While effective it seems to be quite heavy-weight, so would probably need support for tiered optimization. [1] https://courses.cs.washington.edu/courses/cse544/11wi/papers/markl-vldb-2005.pdf I think I've read that paper some time ago, and IIRC it's solving the same problem but in a very different way - instead of combining the statistics directly, it relies on the "partial" selectivities and then estimates the total selectivity using the maximum-entropy principle. I think it's a nice idea and it probably works fine in many cases, but it kinda throws away part of the information (that we could get by matching the statistics against each other directly). But I'll keep that paper in mind, and we can revisit this solution later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
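For reference, the maximum-entropy idea is compact enough to state here, roughly and glossing over the paper's details: among all joint predicate distributions consistent with the selectivities we do know, pick the one with maximum entropy,

    \max_{p} \; -\sum_{b} p_b \ln p_b
    \quad \text{subject to} \quad
    \sum_{b \in B_X} p_b = s_X \;\; \text{for each known predicate set } X,
    \qquad \sum_{b} p_b = 1

where the b range over the atoms (conjunctions of the predicates and their negations), B_X is the set of atoms in which all predicates of X hold, and s_X is the known selectivity of X. Every known single-column or multivariate selectivity becomes one more linear constraint, which is what makes the estimates consistent - and also why the approach gets heavy-weight as the number of atoms grows.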
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
> Michael Banck writes:
>> As I've been bitten by this problem recently, I thought I'd take a
>> look at editing the PostGIS extension SQL file to this end, but
>> contrary to the above, the @extschema@ feature only applies to
>> non-relocatable extensions, from src/backend/commands/extension.c:
>> * If it's not relocatable, substitute the target schema name for
>> * occurrences of @extschema@.
>> *
>> * For a relocatable extension, we needn't do this. There cannot be
>> * any need for @extschema@, else it wouldn't be relocatable.
>> I'm not sure that logic is sound - even if setting @extschema@
>> explicitly in the SQL functions bodies kills inlining (not sure about
>> that) or wouldn't help for other reasons, ISTM this should be
>> reconsidered in the light of the use case with materialized views
>> during restore.
>
> It's not simply a matter of allowing the substitution to occur while
> reading the extension script. "Relocatable" means that we support ALTER
> EXTENSION SET SCHEMA, which means moving all the extension's objects into
> some new schema. There's no good way to run around and find places where
> @extschema@ was replaced in order to change them to something else.
>
> Basically the point of @extschema@ is to support extensions that are
> relocatable at installation time, but not afterwards.
>
> regards, tom lane

FWIW, in the upcoming PostGIS 2.3 we have changed to make PostGIS non-relocatable, schema-qualifying internal calls. I took Tom's suggestion of just using @extschema@, which did mean we needed to not allow PostGIS to be relocatable anymore. A bit of a bummer.

Setting search_path on functions, aside from killing inlining, also killed performance in other ways, so that was a no-go. Not sure if that is a known issue or not, and I haven't determined under what circumstances setting search_path kills performance when index usage does not come into play. I'll take it as a known. Here is an example of such a case: https://trac.osgeo.org/postgis/ticket/3611

Now, getting to the fact that using @extschema@ means requiring the extension not to be relocatable: that was a bummer, and something we would need to deal with if we ever forced everyone to install PostGIS in a specific schema, so that other extensions that rely on us can just know where PostGIS is installed (or, as Steve Frost suggested, a way for extensions to specify the location of the extensions they depend on with a marker such as @extschema_postgis@), as we've got a bunch of extensions we are aware of relying on postgis already (pgrouting, postgis_sfcgal, postgis_topology, postgis_tiger_geocoder).

It would also be nice if the extension model had a way to allow extension authors the choice of handling the 'ALTER EXTENSION SET SCHEMA' event short of monkeying with event triggers. Yes, we really need an extension authors list to iron out and hear about these pain points. :)

Thanks, Regina

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Heap WARM Tuples - Design Draft
On Tue, Aug 9, 2016 at 11:39 PM, Jim Nasby wrote: > On 8/9/16 6:44 PM, Claudio Freire wrote: >> >> Since we can lookup all occurrences of k1=a index=0 and k2=a index=0, >> and in fact we probably did so already as part of the update logic > > > That's a change from what currently happens, right? > > The reason I think that's important is that dropping the assumption that we > can't safely re-find index entries from the heap opens up other > optimizations, ones that should be significantly simpler to implement. The > most obvious example being getting rid of full index scans in vacuum. While > that won't help with write amplification, it would reduce the cost of vacuum > enormously. Orders of magnitude wouldn't surprise me in the least. > > If that's indeed a prerequisite to WARM it would be great to get that > groundwork laid early so others could work on other optimizations it would > enable. I can do that. I've been prospecting the code to see what changes it would entail already. But it's still specific to btree, I'm not sure the same optimizations can be applied to GIN (maybe, if the posting list is sorted) or GIST (probably, since it's like a btree, but I don't know the code well enough). Certainly hash indexes won't support it. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
On Wed, Aug 10, 2016 at 12:31 PM, Andrew Gierth wrote:
> These could be limited to being testable only on a specified index, and
> not AM-wide:
> predicate_locks (? maybe)

That one seems like it should either be at the AM level or not included at all. The cases where it would be interesting to know are if you are a hacker looking for an AM to enhance with support, or (once more than just btree is supported, so it is not so easy to remember) if you are a DBA investigating a high rate of serialization failures who wants to know whether indexes of a certain type have index-relation predicate locking granularity or something more fine-grained. The latter use case seems plausible once there is more of a mix of support among the AMs.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
> "Tom" == Tom Lane writes: >> - this still has everything in amapi.c rather than creating any new >> files. Also, the regression tests are in create_index.sql for lack >> of any obviously better place. Tom> This more than doubles the size of amapi.c, so it has a definite Tom> feel of tail-wags-dog for me, even if that seemed like an Tom> appropriate place otherwise which it doesn't really. I think a Tom> new .c file under adt/ is the way to go, with extern declarations Tom> in builtins.h. Yeah, I'm fine with that. adt/amutils.c unless I see some better suggestion. Tom> Maybe we need a new regression test case file too. I don't much Tom> like adding this to create_index because (a) it doesn't Tom> particularly seem to match that file's purpose of setting up Tom> indexes on the standard regression test tables, and (b) that file Tom> is a bottleneck in parallel regression runs because it can't run Tom> in parallel with much else. Good point. I looked around to see if anything was testing pg_get_indexdef, thinking that that would be a good place, but it seems that pg_get_indexdef is tested only in incidental ways (in collate and rules, the latter of which tests it only with invalid input). I'll do the next version with a new file, unless a better idea shows up. >> Comments? Tom> Why did you go with "capability" rather than "property" in the Tom> exposed function names? The latter seems much closer to le mot Tom> juste, especially for things like asc/desc. The first version (which dealt only with AMs) went with "capability" because it was dealing with what the AM _could_ do rather than what was defined on any specific index. The second version added pg_index_column_has_property because that was the name that had been used in discussion. Changing them all to "property" would be more consistent I suppose. Tom> I'd personally cut the list of pg_am replacement properties way Tom> back, as I believe much of what you've got there is not actually Tom> of use to applications, and some of it is outright Tom> counterproductive. An example is exposing amcanreturn as an Tom> index-AM boolean. For AM-wide properties, it may be that they have to be considered "lossy" when tested against the AM oid rather than on an individual index or column - at the AM level, "false" might mean "this won't work" while "true" would mean "this might work sometimes, not guaranteed to work on every index". The documentation should probably indicate this. So these properties (I've changed all the names here, suggestions welcome for better ones) I think should be testable on the AM, each with an example of why: can_order - if this is false, an admin tool shouldn't try and put ASC or DESC in a CREATE INDEX can_unique - if this is false, an admin tool might, for example, want to not offer the user the option of CREATE UNIQUE INDEX with this AM can_multi_col - if this is false, an admin tool might want to allow the user to select only one column can_exclude - a new property that indicates whether the AM can be used for exclusion constraints; at present this matches "amgettuple" but that implementation detail should of course be hidden (One possible refinement here could be to invert the sense of all of these, making them no_whatever, so that "false" and "null" could be treated the same by clients. Or would that be too confusing?) These could be limited to being testable only on a specified index, and not AM-wide: can_backward clusterable index_scan bitmap_scan optional_key (? maybe) predicate_locks (? 
maybe) And these for individual columns: can_return search_array (? maybe) search_nulls (? maybe) operator_orderable (or distance_orderable? what's a good name?) orderable asc desc nulls_first nulls_last A question: implementing can_return as a per-column property looks like it requires actually opening the index rel, rather than just consulting the syscache the way that most pg_get_* functions do. Should it always open it, or only for properties that need it? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
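On that last question: here is a sketch of what the per-column probe might look like server-side, and it does indeed need to open the index rel. This assumes the 9.6-era IndexAmRoutine interface, where amcanreturn takes the index relation and a 1-based attribute number and may be left NULL by AMs that never support index-only scans; error handling and sanity checks (does the index exist, is attno in range) are omitted:

    /* Hedged sketch: answer can_return for one column of one index. */
    #include "postgres.h"
    #include "access/amapi.h"
    #include "access/genam.h"
    #include "storage/lockdefs.h"
    #include "utils/rel.h"

    static bool
    index_column_can_return(Oid indexoid, int attno)
    {
        Relation    irel = index_open(indexoid, AccessShareLock);
        bool        result = false;

        /* a NULL amcanreturn means the AM can never return index tuples */
        if (irel->rd_amroutine->amcanreturn != NULL)
            result = irel->rd_amroutine->amcanreturn(irel, attno);
        index_close(irel, AccessShareLock);
        return result;
    }

Whether to take the AccessShareLock on every property lookup, or only for the properties that really need the rel open, is exactly the open design question above.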
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 7:54 PM, Alexander Korotkov wrote:
> Oh, I found that I underestimated complexity of async commit... :)
>
> Do I understand right that async commit now works as follows?
> 1) An async transaction confirms commit before flushing WAL.
> 2) Other transactions see the effect of an async transaction only once its
> WAL is flushed.
> 3) In the session that just committed the async transaction, the effect of
> this transaction is visible immediately (before the WAL is flushed). Is that true?

Current code, simplified:

    XactLogCommitRecord()
    if (synchronous_commit)
        XLogFlush()
    ProcArrayEndTransaction()   // Become visible

The issue we are discussing is that with CSNs, the "become visible" step must occur in CSN order. If CSN == LSN, then async transactions that have their commit record after a sync record must wait for the sync record to flush before they can become visible. The simplest solution is to not require CSN == LSN and just assign a CSN value immediately before becoming visible.

Regards, Ants Aasma

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in REL9_5_STABLE
Marko Tiikkaja wrote: > Hi, > > Running one specific test from our application against REL9_5_STABLE > (current as of today) gives me this: > > #2 0x7effb59595be in ExceptionalCondition ( > conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0 > || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && > ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? > HeapTupleGetUpdateXid(tup) : ( (tup)-"..., > errorType=errorType@entry=0x7effb599b74b "FailedAssertion", > fileName=fileName@entry=0x7effb5b2796c "combocid.c", > lineNumber=lineNumber@entry=132) at assert.c:54 > #3 0x7effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at > combocid.c:131 > #4 0x7effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90, > tuple=tuple@entry=0x7fffcee73690, cid=346, > mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock, > follow_updates=follow_updates@entry=1 '\001', > buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at > heapam.c:4813 Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught the same assertion). Something is odd there ... Can you share the test case? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Stephen> While it may have good results in many cases, it's not accurate to say that using prepared statements will always be faster than not. There's no silver bullet.

<-- that is accurate, but it is useless for end-user applications

I've never claimed that a "server prepared statement" is a silver bullet. I just claim that a "PrepareBindExecuteDeallocate" message does have a justification from a performance point of view.

Stephen Frost> Dropping and recreating the prepared statement is how that particular issue is addressed.

Stephen,

The original problem is: "extended query execution is slow".
I recommend: "let's just add a query cache".
You mention: "that does not work, since one query in a year might get slower on the 6th execution".
I ask: "what should be done _instead_ of a query cache?"
You say: "the same query cache, but execute 5 times at most".

Does that mean you agree that a query cache is the way to go? I do not follow that. Of course, I could add "yet another debugger switch" to pgjdbc so it executes server-prepared statements 5 times at most, however I do consider this to be a PostgreSQL issue. I do not like hard-coding "5" into the driver logic. I do not like making "5 times maximum" the default mode.

Vladimir
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 5:54 PM, Alexander Korotkov wrote:
> Oh, I found that I underestimated the complexity of async commit... :)

Indeed. I think Tom's attitude was right even if the specific conclusion was wrong. While I don't think removing async commit is viable, I think it would be a laudable goal if we could remove some of the complication in it.

I normally describe async commit as "just like a normal commit, but don't block on the commit", but it's actually a bit more complicated. AIUI the additional complexity is that while async commits are visible to everyone immediately, other non-async transactions can be committed but then sit in limbo for a while before they are visible to others. So other sessions will see the async commit "jump ahead" of any non-async transactions, even if those other transactions were committed first. Any standby will see the non-async transaction in the logs before the async transaction, and in a crash it's possible to lose the async transaction (even though it was visible) but not the sync transaction (which wasn't yet visible).

Complexity like this makes it hard to implement other features such as CSNs. IIRC this already bit hot standby as well. I think it would be a big improvement if we had a clear, well-defined commit order that was easy to explain and easy to reason about when new changes are being made.

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 07:54 PM, Alexander Korotkov wrote:
> Do I understand right that async commit now works as follows?
> 1) An async transaction confirms commit before flushing WAL.

Yes.

> 2) Other transactions see the effects of an async transaction only when its WAL is flushed.

No. Other transactions also see the effects of the async transaction before it's flushed.

> 3) In the session which just committed an async transaction, the effects of that transaction are visible immediately (before WAL is flushed). Is that true?

Yes. (The same session is not a special case, per the previous point. All sessions see the async transaction as committed, even before it's flushed.)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
* Vladimir Sitnikov (sitnikov.vladi...@gmail.com) wrote:
> 3) "suddenly get slow the 6th time" is a PostgreSQL bug: it both fails to estimate cardinality properly and does not give the administrator a way to disable the feature (generic vs. specific plan).

Dropping and recreating the prepared statement is how that particular issue is addressed.

> A query cache does have very good results for overall web page times, and problems like the "6th execution" are not that common.

While it may have good results in many cases, it's not accurate to say that using prepared statements will always be faster than not.

Thanks!

Stephen
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas wrote:
> On 08/10/2016 05:51 PM, Tom Lane wrote:
>> Heikki Linnakangas writes:
>>> On 08/10/2016 05:09 PM, Tom Lane wrote:
>>>> Uh, what? That's not the semantics we have today, and I don't see why it's necessary or a good idea. Once the commit is in the WAL stream, any action taken on the basis of seeing the commit must be later in the WAL stream. So what's the problem?
>>>
>>> I was talking about synchronous commits in the above. A synchronous commit is not made visible to other transactions until the commit WAL record is flushed to disk.
>>
>> [ thinks for a bit... ] Oh, OK, that's because we don't treat a transaction as committed until its clog bit is set *and* it's not marked as running in the PGPROC array. And sync transactions will flush WAL in between.
>
> Right.
>
>> Still, having to invent CSNs seems like a huge loss for this design. Personally I'd give up async commit first. If we had only sync commit, the rule could be "xact LSN less than snapshot threshold and less than WAL flush position", and we'd not need CSNs. I know some people like async commit, but it's there because it was easy and cheap in our old design, not because it's the world's greatest feature and worth giving up performance for.
>
> I don't think that's a very popular opinion (I disagree, for one). Asynchronous commits are a huge performance boost for some applications. The alternative is fsync=off, and I don't want to see more people doing that. SSDs have made the penalty of an fsync much smaller, but it's still there.
>
> Hmm. There's one more possible way this could all work. Let's have CSN == LSN, also for asynchronous commits. A snapshot is the current insert position, but also make note of the current flush position, when you take a snapshot. Now, when you use the snapshot, if you ever see an XID that committed between the snapshot's insert position and the flush position, wait for the WAL to be flushed up to the snapshot's insert position at that point. With that scheme, an asynchronous commit could return to the application without waiting for a flush, but if someone actually looks at the changes the transaction made, then that transaction would have to wait. Furthermore, we could probably skip that waiting too, if the reading transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that runs with synchronous_commit=on, and reads data that was modified by an asynchronous transaction, would take a hit. But I think that would be acceptable.

Oh, I found that I underestimated the complexity of async commit... :)

Do I understand right that async commit now works as follows?
1) An async transaction confirms commit before flushing WAL.
2) Other transactions see the effects of an async transaction only when its WAL is flushed.
3) In the session which just committed an async transaction, the effects of that transaction are visible immediately (before WAL is flushed). Is that true?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Slowness of extended protocol
Stephen> I encourage you to look through the archives

The thing is, pl/pgsql suffers from exactly the same problem. pl/pgsql is not a typical language of choice (e.g. see the Tiobe index and the like), so the probability of running into "prepared statement issues" there was low. As more languages use server-prepared statements, the rate of such issues will naturally increase. JFYI: I did participate in those conversations, so I do not understand which particular point you are asking me to "look through" there.

Stephen Frost:
> And is the source of frequent complaints on various mailing lists along the lines of "why did my query suddenly get slow the 6th time it was run?!".

I claim the following:
1) People run into such problems with pl/pgsql as well. pl/pgsql does exactly the same server-prepared logic. So what? pl/pgsql does have a query cache, but other languages are forbidden from having one?
2) Those problematic queries are not that frequent.
3) "suddenly get slow the 6th time" is a PostgreSQL bug: it both fails to estimate cardinality properly and does not give the administrator a way to disable the feature (generic vs. specific plan).
4) Do you have a better solution? Of course, the planner is not perfect. Of course it will have issues with wrong cardinality estimates. So what? Should we completely abandon the optimizer? I do not think so.

A query cache does have very good results for overall web page times, and problems like the "6th execution" are not that common.

By the way, other common problems are:

"cached plan cannot change result type" -- PostgreSQL just fails to execute a server-prepared statement if a table was altered.
"prepared statement does not exist" -- an application might use "deallocate all" for some unknown reason, so the driver has to keep an eye on that.
"set search_path" vs. prepared statements -- a prepared statement binds by OIDs, so search_path changes should be accompanied by "deallocate all" or the like.

Vladimir
Re: [HACKERS] Assertion failure in REL9_5_STABLE
Marko Tiikkaja writes: > The failure is a number of levels down a call stack of PL/PgSQL > functions, but I can reproduce it at will by calling the function. I > unfortunately can't narrow it down to a smaller test case, but attached > is an xlogdump of the session. The query that finally breaks the > elephant's back is a SELECT .. FOR UPDATE from relid=54511. > Any ideas on how to debug this? If this test case used to pass, it'd be pretty useful to git bisect to find where it started to fail. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 6:09 PM, Heikki Linnakangas wrote:
> Hmm. There's one more possible way this could all work. Let's have CSN == LSN, also for asynchronous commits. A snapshot is the current insert position, but also make note of the current flush position, when you take a snapshot. Now, when you use the snapshot, if you ever see an XID that committed between the snapshot's insert position and the flush position, wait for the WAL to be flushed up to the snapshot's insert position at that point. With that scheme, an asynchronous commit could return to the application without waiting for a flush, but if someone actually looks at the changes the transaction made, then that transaction would have to wait. Furthermore, we could probably skip that waiting too, if the reading transaction is also using synchronous_commit=off.
>
> That's slightly different from the current behaviour. A transaction that runs with synchronous_commit=on, and reads data that was modified by an asynchronous transaction, would take a hit. But I think that would be acceptable.

My proposal of vector clocks would allow for pretty much exactly the current behavior. To simplify, there would be lastSyncCommitSeqNo and lastAsyncCommitSeqNo variables in ShmemVariableCache. Transaction commit would choose which one to update based on the synchronous_commit setting, and embed the value of that setting into the CSN log. Snapshots would contain both values; when checking CSN visibility, the looked-up synchronous_commit setting decides which value to compare against. Standbys replaying commit records would just update both values, resulting in transactions becoming visible in xlog order, as they do today. The scheme would also allow for inventing a new xlog record/replication message communicating visibility ordering.

However, I don't see why inventing a separate CSN concept is a large problem. Quite the opposite: unless there is a good reason that I'm missing, it seems better not to unnecessarily conflate commit record durability and transaction visibility ordering. Not having them tied together allows an external source to provide CSN values, allowing for interesting distributed transaction implementations, e.g. using a timestamp as the CSN, à la Google Spanner and the TrueTime API.

Regards, Ants Aasma

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
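Ants's two-counter idea, as a compact C sketch (illustrative names, not actual backend structures):

    /* Sketch of the two-counter visibility check described above.
     * Illustrative names only; not actual PostgreSQL code. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t CSN;

    typedef struct VariableCache
    {
        CSN lastSyncCommitSeqNo;    /* advanced by synchronous commits */
        CSN lastAsyncCommitSeqNo;   /* advanced by asynchronous commits */
    } VariableCache;

    typedef struct Snapshot
    {
        CSN syncSeqNo;              /* copies of both counters, taken */
        CSN asyncSeqNo;             /* together at snapshot time */
    } Snapshot;

    /* What the CSN log would store per committed xid: the CSN plus the
     * committer's synchronous_commit setting. */
    typedef struct CsnLogEntry
    {
        CSN  csn;
        bool was_async;
    } CsnLogEntry;

    bool
    csn_visible_in_snapshot(const CsnLogEntry *e, const Snapshot *s)
    {
        /* Compare against the counter matching the committer's setting:
         * async commits become visible immediately, while sync commits
         * only become visible once the sync counter -- advanced after
         * the WAL flush -- has passed them. */
        return e->was_async ? (e->csn <= s->asyncSeqNo)
                            : (e->csn <= s->syncSeqNo);
    }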
[HACKERS] Assertion failure in REL9_5_STABLE
Hi,

Running one specific test from our application against REL9_5_STABLE (current as of today) gives me this:

#2 0x7effb59595be in ExceptionalCondition (
    conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0 || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : ( (tup)-"...,
    errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
    fileName=fileName@entry=0x7effb5b2796c "combocid.c",
    lineNumber=lineNumber@entry=132) at assert.c:54
#3 0x7effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at combocid.c:131
#4 0x7effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90, tuple=tuple@entry=0x7fffcee73690, cid=346,
    mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock, follow_updates=follow_updates@entry=1 '\001',
    buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at heapam.c:4813
#5 0x7effb5753e82 in ExecLockRows (node=node@entry=0x7effb6cebb00) at nodeLockRows.c:179
#6 0x7effb573cbc8 in ExecProcNode (node=node@entry=0x7effb6cebb00) at execProcnode.c:516
#7 0x7effb5739432 in ExecutePlan (dest=0x7effb5dd8160, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
    operation=CMD_SELECT, planstate=0x7effb6cebb00, estate=0x7effb6ceb8f8) at execMain.c:1549
#8 standard_ExecutorRun (queryDesc=0x7effb6ae3b40, direction=<optimized out>, count=0) at execMain.c:337
#9 0x7effb57661db in _SPI_pquery (tcount=0, fire_triggers=1 '\001', queryDesc=0x7effb6ae3b40) at spi.c:2411

The failure is a number of levels down a call stack of PL/PgSQL functions, but I can reproduce it at will by calling the function. I unfortunately can't narrow it down to a smaller test case, but attached is an xlogdump of the session. The query that finally breaks the elephant's back is a SELECT .. FOR UPDATE from relid=54511.

Any ideas on how to debug this?
.m

rmgr: Standby      len (rec/tot):  24/50,   tx:    0, lsn: 0/2428, prev 0/23000108, desc: RUNNING_XACTS nextXid 8495 latestCompletedXid 8494 oldestRunningXid 8495
rmgr: Transaction  len (rec/tot):  12/38,   tx: 8496, lsn: 0/2460, prev 0/2428, desc: ASSIGNMENT xtop 8495: subxacts: 8496
rmgr: Heap         len (rec/tot):   7/1042, tx: 8496, lsn: 0/2488, prev 0/2460, desc: LOCK off 10: xid 8496 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/47473/50461 blk 0 FPW
rmgr: Heap         len (rec/tot):   7/8246, tx: 8496, lsn: 0/240004A0, prev 0/2488, desc: LOCK off 48: xid 8496 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/47473/51018 blk 81 FPW
rmgr: Heap         len (rec/tot):   7/53,   tx: 8496, lsn: 0/240024F0, prev 0/240004A0, desc: LOCK off 49: xid 8496 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 1663/47473/51018 blk 81
rmgr: Sequence     len (rec/tot): 158/204,  tx: 8496, lsn: 0/24002528, prev 0/240024F0, desc: LOG rel 1663/47473/49387, blkref #0: rel 1663/47473/49387 blk 0
rmgr: Sequence     len (rec/tot): 158/204,  tx:    0, lsn: 0/240025F8, prev 0/24002528, desc: LOG rel 1663/47473/49992, blkref #0: rel 1663/47473/49992 blk 0
rmgr: Heap         len (rec/tot):   3/4404, tx: 8497, lsn: 0/240026C8, prev 0/240025F8, desc: INSERT off 74, blkref #0: rel 1663/47473/49994 blk 18 FPW
rmgr: Btree        len (rec/tot):   2/6961, tx: 8497, lsn: 0/24003800, prev 0/240026C8, desc: INSERT_LEAF off 78, blkref #0: rel 1663/47473/59412 blk 2 FPW
rmgr: Btree        len (rec/tot):   2/6933, tx: 8497, lsn: 0/24005350, prev 0/24003800, desc: INSERT_LEAF off 342, blkref #0: rel 1663/47473/59414 blk 8 FPW
rmgr: Heap         len (rec/tot):   7/4910, tx: 8497, lsn: 0/24006E80, prev 0/24005350, desc: LOCK off 3: xid 8497 LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/47473/50005 blk 0 FPW
rmgr: Heap         len (rec/tot):   3/87,   tx: 8498, lsn: 0/240081C8, prev 0/24006E80, desc: INSERT off 75, blkref #0: rel 1663/47473/49994 blk 18
rmgr: Btree        len (rec/tot):   2/4273, tx: 8498, lsn: 0/24008220, prev 0/240081C8, desc: INSERT_LEAF off 76, blkref #0: rel 1663/47473/59412 blk 13 FPW
rmgr: Btree        len (rec/tot):   2/64,   tx: 8498, lsn: 0/240092D8, prev 0/24008220, desc: INSERT_LEAF off 343, blkref #0: rel 1663/47473/59414 blk 8
rmgr: Heap         len (rec/tot):   7/53,   tx: 8498, lsn: 0/24009318, prev 0/240092D8, desc: LOCK off 29: xid 8498 LOCK_ONLY KEYSHR_LOCK , blkref #0: rel 1663/47473/50005 blk 0
rmgr: Heap         len (rec/tot):   3/2397, tx: 8496, lsn: 0/24009350, prev 0/24009318, desc: INSERT off 16, blkref #0: rel 1663/47473/49389 blk 16 FPW
rmgr: Btree        len (rec/tot):   2/5773, tx: 8496, lsn: 0/24009CB0, prev 0/24009350, desc: INSERT_LEAF off 6, blkref #0: rel 1663/47473/49400 blk 1 FPW
rmgr: Btree        len (rec/tot):   2
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 09:04 AM, Stephen Frost wrote:
> * Joshua D. Drake (j...@commandprompt.com) wrote:
>> +1 for Robert here, removing async commit is a non-starter. It is PostgreSQL performance 101 that you disable synchronous commit unless you have a specific data/business requirement that needs it. Specifically because of how much faster Pg is with async commit.
>
> I agree that we don't want to get rid of async commit, but, for the archive, I wouldn't recommend using it unless you specifically understand and accept that trade-off, so I wouldn't lump it into a "PostgreSQL performance 101" group - that's increasing work_mem, shared_buffers, WAL size, etc. Accepting that you're going to lose *committed* transactions on a crash requires careful thought and consideration of what you're going to do when that happens, not the other way around.

Yes, Stephen, you are correct, which is why I said, "unless you have a specific data/business requirement that needs it".

Thanks!

jD

> Thanks!
>
> Stephen

--
Command Prompt, Inc.  http://the.postgres.company/  +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Robert> But that makes it the job of every driver to implement some sort of cache, which IMHO isn't a very reasonable position

Let's wait and see what Shay decides on implementing a query cache in npgsql. Here's the issue: https://github.com/npgsql/npgsql/issues/1237 He could change his mind when it comes to the need for a new "ParseBindExecuteDeallocate" message.

Robert> I think you should consider the possibility that those people know what they are talking about.

I do appreciate all the input; however, all the performance-related complaints I've seen in this thread can trivially be solved by adding a query cache at the driver level plus fixing pgbouncer's issue. Unfortunately, client-side SQL parsing is an inevitable evil (see below on '?'), so any sane driver would cache queries anyway. The ones that do not have a query cache will perform badly anyway.

As far as I can understand, the alternative solution is "add a ParseBindExecDeallocateInOneGo message to the protocol". This new message would require changes from all the drivers and all the pgbouncers. This new message would be slower than a proper query cache, so why should we all spend time on a half-baked solution? Of course, I might be missing some use cases; that is why I keep asking: please provide a close-to-production scenario that does require the new protocol message we are talking about. Note: examples (by Robert and Shay) like "a typical web application that fetches a single row from the DB and prints it to the browser" were already presented, and they are easily covered by the query cache.

To be fair, implementing a cache is a trivial thing when compared with hard-coding binary/text formats for all the datatypes in each and every language. Remember, each driver has to implement its own set of procedures to input/output values in text/binary format, and that is way harder than implementing the cache we are talking about. If only there were some ability to have a language-independent binary transfer format (protobuf, avro, kryo, whatever)...

Regarding the query cache: each language has its own notion of how bind variables are represented in SQL text. For instance, in the Go language (as well as in Java), bind placeholders are represented by the '?' character. PostgreSQL does not support that (it wants $1, $2, etc.), so a Go driver has to parse the SQL text on the driver side in order to convert it into a PostgreSQL-compatible flavor. This parser has to support comments, string literals, etc. It is only natural to have a cache there, so the driver does not repeat the same parsing logic again and again (typical applications are known to use the same query text multiple times).

Robert> When there's a common need that affects users of many different programming languages

You are right. No questions here. OK, what is the need? What problem does the "new message" solve? From what I see, there is no performance need to introduce a "ParseBindExecDeallocateInOneGo" message. The thread is performance-related, so I naturally object to spending everybody's time on implementing a useless feature.

Vladimir> Do you agree that the major part would be some hot queries, the rest will be much less frequently used ones (e.g. one-time queries)?
Robert> Sure, but I don't want the application to have to know about that

The application does not need to know that. It is like branch prediction in the CPU: the application does not need to know there is a branch predictor in the CPU. The same goes for the query cache.
The application should just continue to execute SQL in a sane manner, and the query cache will pick up the pattern (server-preparing the frequently used queries).

Vladimir
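As an illustration of the client-side '?' parsing described above, here is a deliberately simplified C sketch of the placeholder rewrite (it handles only plain '...' literals and "--" line comments; a real driver must also cover dollar quoting, E'' strings, block comments, doubled quotes inside literals, and more):

    /* Simplified sketch of a driver's placeholder rewrite: '?' -> $1, $2...
     * Only plain string literals and line comments are skipped here. */
    #include <stdio.h>

    static void
    rewrite_placeholders(const char *sql, char *out, size_t outsz)
    {
        int    n = 0;
        size_t o = 0;

        for (const char *p = sql; *p && o + 8 < outsz; p++)
        {
            if (*p == '\'')                      /* copy string literal as-is */
            {
                out[o++] = *p++;
                while (*p && *p != '\'' && o + 8 < outsz)
                    out[o++] = *p++;
                if (!*p) break;
                out[o++] = *p;                   /* closing quote */
            }
            else if (p[0] == '-' && p[1] == '-') /* copy line comment as-is */
            {
                while (*p && *p != '\n' && o + 8 < outsz)
                    out[o++] = *p++;
                if (!*p) break;
                out[o++] = *p;                   /* the newline */
            }
            else if (*p == '?')                  /* the actual rewrite */
                o += (size_t) snprintf(out + o, outsz - o, "$%d", ++n);
            else
                out[o++] = *p;
        }
        out[o] = '\0';
    }

    int
    main(void)
    {
        char buf[256];

        rewrite_placeholders("SELECT * FROM t WHERE a = ? AND b = '?'", buf, sizeof buf);
        puts(buf);    /* prints: SELECT * FROM t WHERE a = $1 AND b = '?' */
        return 0;
    }

Since this parse is repeated for every statement an application submits, caching its result keyed by the original SQL text is the natural next step, which is the query cache being discussed.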
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
Michael Banck writes: > As I've been bitten by this problem recently, I thought I'd take a look > at editing the PostGIS extension SQL file to this end, but contrary to > the above, the @extschema@ feature only applies to non-relocatable > extensions, from src/backend/commands/extension.c: > * If it's not relocatable, substitute the target schema name for > * occurrences of @extschema@. > * > * For a relocatable extension, we needn't do this. There cannot be > * any need for @extschema@, else it wouldn't be relocatable. > I'm not sure that logic is sound - even if setting @extschema@ > explicitly in the SQL functions bodies kills inlining (not sure about > that) or wouldn't help for other reasons, ISTM this should be > reconsidered in the light of the use case with materialized views during > restore. It's not simply a matter of allowing the substitution to occur while reading the extension script. "Relocatable" means that we support ALTER EXTENSION SET SCHEMA, which means moving all the extension's objects into some new schema. There's no good way to run around and find places where @extschema@ was replaced in order to change them to something else. Basically the point of @extschema@ is to support extensions that are relocatable at installation time, but not afterwards. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
* Vladimir Sitnikov (sitnikov.vladi...@gmail.com) wrote:
> It works completely transparently to the application, and it does use server-prepared statements even though the application builds a "brand new" SQL text every time.

And is the source of frequent complaints on various mailing lists along the lines of "why did my query suddenly get slow the 6th time it was run?!". I encourage you to look through the archives and read up on how and why that can happen.

Thanks!

Stephen
Re: [HACKERS] Proposal for CSN based snapshots
* Joshua D. Drake (j...@commandprompt.com) wrote:
> +1 for Robert here, removing async commit is a non-starter. It is PostgreSQL performance 101 that you disable synchronous commit unless you have a specific data/business requirement that needs it. Specifically because of how much faster Pg is with async commit.

I agree that we don't want to get rid of async commit, but, for the archive, I wouldn't recommend using it unless you specifically understand and accept that trade-off, so I wouldn't lump it into a "PostgreSQL performance 101" group - that's increasing work_mem, shared_buffers, WAL size, etc. Accepting that you're going to lose *committed* transactions on a crash requires careful thought and consideration of what you're going to do when that happens, not the other way around.

Thanks!

Stephen
Re: [HACKERS] Slowness of extended protocol
On Wed, Aug 10, 2016 at 11:50 AM, Tom Lane wrote: > Robert Haas writes: >> Sure, but I don't want the application to have to know about that, and >> I don't really think the driver should need to know about that either. >> Your point, as I understand it, is that sufficiently good query >> caching in the driver can ameliorate the problem, and I agree with >> that. > > I don't, actually. If a particular application has a query mix that gets > a good win from caching query plans, it should already be using prepared > statements. The fact that that's not a particularly popular thing to do > isn't simply because people are lazy, it's because they've found out that > it isn't worth the trouble for them. Putting query caching logic into > drivers isn't going to improve performance for such cases, and it could > very possibly make things worse. The driver is the place with the > absolute least leverage for implementing caching; it has no visibility > into either the application or the server. I didn't mean to imply, in any way, that it is an ideal solution - just that people use it successfully. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
On Thu, Mar 10, 2016 at 11:48:41AM -0500, Tom Lane wrote:
> If you're worried about preserving relocatability of an extension containing such functions, the @extschema@ feature might help.

As I've been bitten by this problem recently, I thought I'd take a look at editing the PostGIS extension SQL file to this end, but contrary to the above, the @extschema@ feature only applies to non-relocatable extensions, from src/backend/commands/extension.c:

     * If it's not relocatable, substitute the target schema name for
     * occurrences of @extschema@.
     *
     * For a relocatable extension, we needn't do this.  There cannot be
     * any need for @extschema@, else it wouldn't be relocatable.

I'm not sure that logic is sound - even if setting @extschema@ explicitly in the SQL function bodies kills inlining (not sure about that) or wouldn't help for other reasons, ISTM this should be reconsidered in light of the use case with materialized views during restore.

Best regards,

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 08:28 AM, Robert Haas wrote:
> On Wed, Aug 10, 2016 at 11:09 AM, Heikki Linnakangas wrote:
>>> Still, having to invent CSNs seems like a huge loss for this design. Personally I'd give up async commit first. If we had only sync commit, the rule could be "xact LSN less than snapshot threshold and less than WAL flush position", and we'd not need CSNs. I know some people like async commit, but it's there because it was easy and cheap in our old design, not because it's the world's greatest feature and worth giving up performance for.
>>
>> I don't think that's a very popular opinion (I disagree, for one). Asynchronous commits are a huge performance boost for some applications. The alternative is fsync=off, and I don't want to see more people doing that. SSDs have made the penalty of an fsync much smaller, but it's still there.
>
> Uh, yeah. Asynchronous commit can be 100 times faster on some realistic workloads. If we remove it, many people will have to decide between running with fsync=off and abandoning PostgreSQL altogether. That doesn't strike me as a remotely possible line of attack.

+1 for Robert here, removing async commit is a non-starter. It is PostgreSQL performance 101 that you disable synchronous commit unless you have a specific data/business requirement that needs it. Specifically because of how much faster Pg is with async commit.

Sincerely,

jD

--
Command Prompt, Inc.  http://the.postgres.company/  +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Slowness of extended protocol
Robert Haas writes: > Sure, but I don't want the application to have to know about that, and > I don't really think the driver should need to know about that either. > Your point, as I understand it, is that sufficiently good query > caching in the driver can ameliorate the problem, and I agree with > that. I don't, actually. If a particular application has a query mix that gets a good win from caching query plans, it should already be using prepared statements. The fact that that's not a particularly popular thing to do isn't simply because people are lazy, it's because they've found out that it isn't worth the trouble for them. Putting query caching logic into drivers isn't going to improve performance for such cases, and it could very possibly make things worse. The driver is the place with the absolute least leverage for implementing caching; it has no visibility into either the application or the server. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes:
> - this still has everything in amapi.c rather than creating any new files. Also, the regression tests are in create_index.sql for lack of any obviously better place.

This more than doubles the size of amapi.c, so it has a definite feel of tail-wags-dog for me, even if that seemed like an appropriate place otherwise, which it doesn't really. I think a new .c file under adt/ is the way to go, with extern declarations in builtins.h.

Maybe we need a new regression test case file too. I don't much like adding this to create_index because (a) it doesn't particularly seem to match that file's purpose of setting up indexes on the standard regression test tables, and (b) that file is a bottleneck in parallel regression runs because it can't run in parallel with much else.

> The list of column properties is:
> ordered - (same as "amcanorder" AM capability)
> ordered_asc
> ordered_desc
> ordered_nulls_first
> ordered_nulls_last
> If "ordered" is true then exactly one of _asc/_desc and exactly one of _nulls_first/_last will be true; if "ordered" is false then all the others will be false too.

Check, but do we need the "ordered_" prefixes on the last four? Seems like mostly useless typing. And the name space here is never going to be so large that it'd be confusing.

> Comments?

Why did you go with "capability" rather than "property" in the exposed function names? The latter seems much closer to le mot juste, especially for things like asc/desc.

I'd expect the property keywords to be recognized case-insensitively, as for example are the keywords in has_table_privilege() and its ilk. That being the case, you should be using pg_strcasecmp(). This stuff:

    if (namelen == 10 && memcmp(nameptr, "amcanorder", 10) == 0)

is ugly and hard to maintain as well as being unnecessarily non-user-friendly. Just convert the input to a C string and be done with it. It's not like saving a couple of nanoseconds is critical here.

I'd personally cut the list of pg_am replacement properties way back, as I believe much of what you've got there is not actually of use to applications, and some of it is outright counterproductive. An example is exposing amcanreturn as an index-AM boolean. That's just wrong these days. If you want that, it should be an index column property. And because what used to be an index-AM property no longer is in that case, I think it's a fine example of why we shouldn't be reporting more than the absolute minimum here. It would tie our hands with respect to making other improvements of that kind later. (This might be a good argument for moving as much as we can over to the index-column-property function, even if it can't be set per-column today. If there's a chance that it could be per-column in the future, let's not call it an AM property.)

Also, while I see the point of the amgettuple and amgetbitmap properties, I would call them something like "index_scan" and "bitmap_scan" because that's the user's-eye view of what is supported. I do not think we should slavishly tie the property names to the old, never-intended-as-user-facing column names. I'd drop the "am" prefix, for starters.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
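Tom's suggested lookup style, sketched in C with an illustrative property list (the property names follow his suggestions, not any committed API; strcasecmp() stands in here for the backend's pg_strcasecmp()):

    /* Sketch of a case-insensitive property-name lookup over a plain
     * C string, instead of length/memcmp fiddling.  Illustrative only. */
    #include <stddef.h>
    #include <stdio.h>
    #include <strings.h>

    typedef enum IndexAMProperty
    {
        AMPROP_UNKNOWN = 0,     /* caller maps this to SQL NULL */
        AMPROP_INDEX_SCAN,      /* "index_scan": has amgettuple */
        AMPROP_BITMAP_SCAN,     /* "bitmap_scan": has amgetbitmap */
        AMPROP_BACKWARD_SCAN,
        AMPROP_CLUSTERABLE
    } IndexAMProperty;

    static const struct { const char *name; IndexAMProperty prop; } prop_names[] =
    {
        { "index_scan",    AMPROP_INDEX_SCAN },
        { "bitmap_scan",   AMPROP_BITMAP_SCAN },
        { "backward_scan", AMPROP_BACKWARD_SCAN },
        { "clusterable",   AMPROP_CLUSTERABLE },
    };

    static IndexAMProperty
    lookup_prop_name(const char *name)
    {
        for (size_t i = 0; i < sizeof(prop_names) / sizeof(prop_names[0]); i++)
            if (strcasecmp(name, prop_names[i].name) == 0)
                return prop_names[i].prop;
        return AMPROP_UNKNOWN;
    }

    int
    main(void)
    {
        printf("%d\n", lookup_prop_name("Index_Scan"));   /* matched case-insensitively */
        printf("%d\n", lookup_prop_name("amgettuple"));   /* unknown -> 0 -> SQL NULL */
        return 0;
    }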
Re: [HACKERS] Slowness of extended protocol
On Tue, Aug 9, 2016 at 5:07 PM, Vladimir Sitnikov wrote: > I do not buy that "dynamically generated queries defeat server-prepared > usage" argument. It is just not true (see below). > > Do you mean "in language X, where X != Java it is impossible to implement a > query cache"? > That is just ridiculus. Well, multiple experienced users are telling you that this is a real problem. You certainly don't have to agree, but when a bunch of other people who are smart and experienced think that something is a problem, I think you should consider the possibility that those people know what they are talking about. > Do you agree that the major part would be some hot queries, the rest will be > much less frequently used ones (e.g. one time queries)? Sure, but I don't want the application to have to know about that, and I don't really think the driver should need to know about that either. Your point, as I understand it, is that sufficiently good query caching in the driver can ameliorate the problem, and I agree with that. But that makes it the job of every driver to implement some sort of cache, which IMHO isn't a very reasonable position. When there's a common need that affects users of many different programming languages, the server should make it easy to meet that need, not require every driver to implement query caching. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 11:09 AM, Heikki Linnakangas wrote: >> Still, having to invent CSNs seems like a huge loss for this design. >> Personally I'd give up async commit first. If we had only sync commit, >> the rule could be "xact LSN less than snapshot threshold and less than >> WAL flush position", and we'd not need CSNs. I know some people like >> async commit, but it's there because it was easy and cheap in our old >> design, not because it's the world's greatest feature and worth giving >> up performance for. > > I don't think that's a very popular opinion (I disagree, for one). > Asynchronous commits are a huge performance boost for some applications. The > alternative is fsync=off, and I don't want to see more people doing that. > SSDs have made the penalty of an fsync much smaller, but it's still there. Uh, yeah. Asynchronous commit can be 100 times faster on some realistic workloads. If we remove it, many people will have to decide between running with fsync=off and abandoning PostgreSQL altogether. That doesn't strike me as a remotely possible line of attack. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 05:51 PM, Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 08/10/2016 05:09 PM, Tom Lane wrote:
>>> Uh, what? That's not the semantics we have today, and I don't see why it's necessary or a good idea. Once the commit is in the WAL stream, any action taken on the basis of seeing the commit must be later in the WAL stream. So what's the problem?
>>
>> I was talking about synchronous commits in the above. A synchronous commit is not made visible to other transactions, until the commit WAL record is flushed to disk.
>
> [ thinks for a bit... ] Oh, OK, that's because we don't treat a transaction as committed until its clog bit is set *and* it's not marked as running in the PGPROC array. And sync transactions will flush WAL in between.

Right.

> Still, having to invent CSNs seems like a huge loss for this design. Personally I'd give up async commit first. If we had only sync commit, the rule could be "xact LSN less than snapshot threshold and less than WAL flush position", and we'd not need CSNs. I know some people like async commit, but it's there because it was easy and cheap in our old design, not because it's the world's greatest feature and worth giving up performance for.

I don't think that's a very popular opinion (I disagree, for one). Asynchronous commits are a huge performance boost for some applications. The alternative is fsync=off, and I don't want to see more people doing that. SSDs have made the penalty of an fsync much smaller, but it's still there.

Hmm. There's one more possible way this could all work. Let's have CSN == LSN, also for asynchronous commits. A snapshot is the current insert position, but also make note of the current flush position, when you take a snapshot. Now, when you use the snapshot, if you ever see an XID that committed between the snapshot's insert position and the flush position, wait for the WAL to be flushed up to the snapshot's insert position at that point. With that scheme, an asynchronous commit could return to the application without waiting for a flush, but if someone actually looks at the changes the transaction made, then that transaction would have to wait. Furthermore, we could probably skip that waiting too, if the reading transaction is also using synchronous_commit=off.

That's slightly different from the current behaviour. A transaction that runs with synchronous_commit=on, and reads data that was modified by an asynchronous transaction, would take a hit. But I think that would be acceptable.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
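Heikki's flush-wait rule, as a self-contained C sketch with illustrative names (a real implementation would look up an XID's commit LSN via a CSN log and call XLogFlush(); nothing here is backend code):

    /* Sketch of CSN == LSN visibility with a flush-wait for commits
     * that fall between the snapshot's flush and insert positions. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;

    typedef struct Snapshot
    {
        XLogRecPtr insert_pos;   /* WAL insert position at snapshot time */
        XLogRecPtr flush_pos;    /* WAL flush position at snapshot time */
    } Snapshot;

    static void
    wait_for_flush(XLogRecPtr upto)          /* XLogFlush() stand-in */
    {
        printf("flushing WAL up to %llu\n", (unsigned long long) upto);
    }

    /* 'csn' is the commit record's LSN of an already-committed xact. */
    static bool
    commit_visible(XLogRecPtr csn, const Snapshot *snap)
    {
        if (csn > snap->insert_pos)
            return false;                    /* committed after the snapshot */

        /*
         * Committed within the snapshot, but possibly not yet durable:
         * before letting the reader act on it, force the WAL out up to
         * the snapshot's insert position.  A reader running with
         * synchronous_commit=off could skip this wait.
         */
        if (csn > snap->flush_pos)
            wait_for_flush(snap->insert_pos);

        return true;
    }

    int
    main(void)
    {
        Snapshot snap = { .insert_pos = 500, .flush_pos = 300 };

        printf("%d\n", commit_visible(200, &snap));  /* visible, already durable */
        printf("%d\n", commit_visible(400, &snap));  /* visible after a flush-wait */
        printf("%d\n", commit_visible(600, &snap));  /* not visible */
        return 0;
    }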
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Updated patch. Changes: - returns NULL rather than "cache lookup failed" - added pg_index_column_has_property (incl. docs) - added regression tests Not changed / need consideration: - this still has everything in amapi.c rather than creating any new files. Also, the regression tests are in create_index.sql for lack of any obviously better place. The list of column properties is: ordered - (same as "amcanorder" AM capability) ordered_asc ordered_desc ordered_nulls_first ordered_nulls_last If "ordered" is true then exactly one of _asc/_desc and exactly one of _nulls_first/_last will be true; if "ordered" is false then all the others will be false too. The intended usage is something like CASE WHEN pg_index_column_has_property(idx, attno, 'ordered_asc') THEN 'ASC' WHEN pg_index_column_has_property(idx, attno, 'ordered_desc') THEN 'DESC' ELSE '' -- or NULL END Comments? -- Andrew (irc:RhodiumToad) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index ccb9b97..684f7b3 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -577,6 +577,89 @@ + +Capability information formerly stored in pg_am +is now available via the functions +pg_indexam_has_capability and +pg_index_has_capability +(see ). The following +boolean-valued capability names are currently supported: + + + + Capabilities + + + + + Name + Description + + + + + amcanorder + Does the access method support ordered scans sorted by the + indexed column's value? + + + amcanorderbyop + Does the access method support ordered scans sorted by the result + of an operator on the indexed column? + + + amcanbackward + Does the access method support backward scanning? + + + amcanunique + Does the access method support unique indexes? + + + amcanmulticol + Does the access method support multicolumn indexes? + + + amoptionalkey + Does the access method support a scan without any constraint + for the first index column? + + + amsearcharray + Does the access method support ScalarArrayOpExpr searches? + + + amsearchnulls + Does the access method support IS NULL/NOT NULL searches? + + + amstorage + Can index storage data type differ from column data type? + + + amclusterable + Can an index of this type be clustered on? + + + ampredlocks + Does an index of this type manage fine-grained predicate locks? + + + amgettuple + Does the access method provide an amgettuple function? + + + amgetbitmap + Does the access method provide an amgetbitmap function? + + + amcanreturn + Does the access method support index-only scans? 
+ + + + + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7830334..d2fe506 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -16290,6 +16290,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); +pg_indexam_has_capability + + + +pg_index_column_has_property + + + +pg_index_has_capability + + + pg_options_to_table @@ -16477,6 +16489,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); number of columns, pretty-printing is implied + pg_indexam_has_capability(am_oid, cap_name) + boolean + Test whether an index access method has a specified capability + + + pg_index_column_has_property(index_oid, column_no, prop_name) + boolean + Test whether an index column has a specified property + + + pg_index_has_capability(index_oid, cap_name) + boolean + Test whether the access method for the specified index has a specified capability + + pg_options_to_table(reloptions) setof record get the set of storage option name/value pairs @@ -16620,6 +16647,73 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); + pg_indexam_has_capability and + pg_index_has_capability return whether the specified + access method, or the access method for the specified index, advertises the + named capability. NULL is returned if the capability + name is not known; true if the capability is advertised, + false if it is not. Refer + to for capability names and their meanings. + + + + pg_index_column_has_property returns whether the + specified index column possesses the named property. + NULL is returned if the property name is not + known; true if the property is present, + false if it is not. Index column property names and the + matching clauses of CREATE INDEX are given in + . + + + + Index Column Prope
Re: [HACKERS] Proposal for CSN based snapshots
Heikki Linnakangas writes: > On 08/10/2016 05:09 PM, Tom Lane wrote: >> Uh, what? That's not the semantics we have today, and I don't see why >> it's necessary or a good idea. Once the commit is in the WAL stream, >> any action taken on the basis of seeing the commit must be later in >> the WAL stream. So what's the problem? > I was talking about synchronous commits in the above. A synchronous > commit is not made visible to other transactions, until the commit WAL > record is flushed to disk. [ thinks for a bit... ] Oh, OK, that's because we don't treat a transaction as committed until its clog bit is set *and* it's not marked as running in the PGPROC array. And sync transactions will flush WAL in between. Still, having to invent CSNs seems like a huge loss for this design. Personally I'd give up async commit first. If we had only sync commit, the rule could be "xact LSN less than snapshot threshold and less than WAL flush position", and we'd not need CSNs. I know some people like async commit, but it's there because it was easy and cheap in our old design, not because it's the world's greatest feature and worth giving up performance for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
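The rule Tom spells out above, reduced to a toy, self-contained C sketch (the stub predicates stand in for the real clog and proc array lookups; nothing here is backend code):

    /* Toy sketch of "committed = clog bit set AND gone from the proc array". */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;

    /* Stub: pretend xids up to 100 have their clog commit bit set. */
    static bool clog_says_committed(TransactionId xid) { return xid <= 100; }

    /* Stub: pretend xid 100 is still listed as running in the proc array
     * (e.g. a sync commit that has not finished its WAL flush yet). */
    static bool still_in_proc_array(TransactionId xid) { return xid == 100; }

    static bool
    xid_counts_as_committed(TransactionId xid)
    {
        /*
         * The committing backend writes its commit record, flushes WAL
         * (for synchronous commit), updates clog, and only then clears
         * its proc array entry -- so a sync commit stays invisible to
         * this test until its WAL is on disk.
         */
        return clog_says_committed(xid) && !still_in_proc_array(xid);
    }

    int
    main(void)
    {
        printf("%d\n", xid_counts_as_committed(99));   /* 1: fully committed */
        printf("%d\n", xid_counts_as_committed(100));  /* 0: still flushing */
        printf("%d\n", xid_counts_as_committed(101));  /* 0: no clog bit yet */
        return 0;
    }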
Re: [HACKERS] Wait events monitoring future development
On Wed, Aug 10, 2016 at 11:37:36PM +0900, Satoshi Nagayasu wrote: > Agreed. > > If people are facing with some difficult situation in terms of performance, > they may accept some (one-time) overhead to resolve the issue. > But if they don't have (recognize) any issue, they may not. > > That's one of the realities according to my experiences. Yes. Many people are arguing for specific defaults based on what _they_ would want, not what the average user would want. Sophisticated users will know about this and turn it on when desired. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait events monitoring future development
2016/08/10 23:22 "Bruce Momjian":
> On Wed, Aug 10, 2016 at 05:14:52PM +0300, Alexander Korotkov wrote:
>> On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote:
>>> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
>>>> I hope wait event monitoring will be on by default even if the overhead is not almost zero, because the data needs to be readily available for faster troubleshooting. IMO, the benefit would be worth even 10% overhead. If you disable it by default because of overhead, how can we convince users to enable it in production systems to solve some performance problem? I'm afraid severe users would say "we can't change any setting that might cause more trouble, so investigate the cause with existing information."
>>>
>>> If you want to know why people are against enabling this monitoring by default, above is the reason. What percentage of people do you think would be willing to take a 10% performance penalty for monitoring like this? I would bet very few, but the argument above doesn't seem to address the fact it is a small percentage.
>>
>> Just two notes from me:
>>
>> 1) A 10% overhead from monitoring wait events is just an idea without any proof so far.
>> 2) We already have functionality which trades insight into the database for far larger overhead. auto_explain.log_analyze = true can slow down queries *several-fold*. Do you think we should remove it?
>
> The point is not removing it, the point is whether auto_explain.log_analyze = true should be enabled by default, and I think no one wants to do that.

Agreed.

If people are facing a difficult situation in terms of performance, they may accept some (one-time) overhead to resolve the issue. But if they don't have (or don't recognize) any issue, they may not.

That's one of the realities, in my experience.

Regards,
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 05:09 PM, Tom Lane wrote:
> Heikki Linnakangas writes:
>> Imagine that you have a stream of normal, synchronous, commits. They get assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in that order.
>>
>> The way I described this scheme in the first emails on this thread, was to use the current WAL insertion position as the snapshot. That's not correct, though: you have to use the current WAL *flush* position as the snapshot. Otherwise you would see the results of a transaction that hasn't been flushed to disk yet, i.e. which might still get lost, if you pull the power plug before the flush happens. So you have to use the last flush position as the snapshot.
>
> Uh, what? That's not the semantics we have today, and I don't see why it's necessary or a good idea. Once the commit is in the WAL stream, any action taken on the basis of seeing the commit must be later in the WAL stream. So what's the problem?

I was talking about synchronous commits in the above. A synchronous commit is not made visible to other transactions until the commit WAL record is flushed to disk.

You could argue that that doesn't need to be so, because indeed any action taken on the basis of seeing the commit must be later in the WAL stream. But that's what asynchronous commits are for. For synchronous commits, we have a higher standard.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait events monitoring future development
On Wed, Aug 10, 2016 at 05:14:52PM +0300, Alexander Korotkov wrote:
> On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote:
>> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
>>> I hope wait event monitoring will be on by default even if the overhead is not almost zero, because the data needs to be readily available for faster troubleshooting. IMO, the benefit would be worth even 10% overhead. If you disable it by default because of overhead, how can we convince users to enable it in production systems to solve some performance problem? I'm afraid severe users would say "we can't change any setting that might cause more trouble, so investigate the cause with existing information."
>>
>> If you want to know why people are against enabling this monitoring by default, above is the reason. What percentage of people do you think would be willing to take a 10% performance penalty for monitoring like this? I would bet very few, but the argument above doesn't seem to address the fact it is a small percentage.
>
> Just two notes from me:
>
> 1) A 10% overhead from monitoring wait events is just an idea without any proof so far.
> 2) We already have functionality which trades insight into the database for far larger overhead. auto_explain.log_analyze = true can slow down queries *several-fold*. Do you think we should remove it?

The point is not removing it, the point is whether auto_explain.log_analyze = true should be enabled by default, and I think no one wants to do that.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 5:37 AM, Bruce Momjian wrote:
> On Tue, Aug 9, 2016 at 02:06:40AM +, Tsunakawa, Takayuki wrote:
>> I hope wait event monitoring will be on by default even if the overhead is not almost zero, because the data needs to be readily available for faster troubleshooting. IMO, the benefit would be worth even 10% overhead. If you disable it by default because of overhead, how can we convince users to enable it in production systems to solve some performance problem? I'm afraid severe users would say "we can't change any setting that might cause more trouble, so investigate the cause with existing information."
>
> If you want to know why people are against enabling this monitoring by default, above is the reason. What percentage of people do you think would be willing to take a 10% performance penalty for monitoring like this? I would bet very few, but the argument above doesn't seem to address the fact it is a small percentage.

Just two notes from me:

1) A 10% overhead from monitoring wait events is just an idea without any proof so far.
2) We already have functionality which trades insight into the database for far larger overhead. auto_explain.log_analyze = true can slow down queries *several-fold*. Do you think we should remove it?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Proposal for CSN based snapshots
Heikki Linnakangas writes: > Imagine that you have a stream of normal, synchronous, commits. They get > assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in > that order. > The way I described this scheme in the first emails on this thread, was > to use the current WAL insertion position as the snapshot. That's not > correct, though: you have to use the current WAL *flush* position as the > snapshot. Otherwise you would see the results of a transaction that > hasn't been flushed to disk yet, i.e. which might still get lost, if you > pull the power plug before the flush happens. So you have to use the > last flush position as the snapshot. Uh, what? That's not the semantics we have today, and I don't see why it's necessary or a good idea. Once the commit is in the WAL stream, any action taken on the basis of seeing the commit must be later in the WAL stream. So what's the problem? > Now, if you do an asynchronous commit, the effects of that should become > visible immediately, without waiting for the next flush. You can't do > that, if its CSN == LSN. This distinction is completely arbitrary, and unlike the way it works now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait events monitoring future development
On Tue, Aug 9, 2016 at 12:47 AM, Ilya Kosmodemiansky <ilya.kosmodemian...@postgresql-consulting.com> wrote:
> On Mon, Aug 8, 2016 at 7:03 PM, Bruce Momjian wrote:
>> It seems asking users to run pg_test_timing before deploying to check the overhead would be sufficient.
>
> I'm not sure. Time measurement for waits is slightly more complicated than time measurement for explain analyze: a good workload plus using gettimeofday in a straightforward manner can cause huge overhead.

What makes you think so? Both my thoughts and my observations are the opposite: it's way easier to get huge overhead from EXPLAIN ANALYZE than from measuring wait events. The current wait events are themselves quite heavyweight, involving syscalls, context switches and so on. In contrast, EXPLAIN ANALYZE calls gettimeofday for very cheap operations, like transferring a tuple from one executor node to another.

> That's why proper testing is important - if we can see a significant performance drop with, for example, large shared_buffers at the same concurrency, that shows gettimeofday is too expensive to use. Am I correct that we do not have such accurate tests now?

Do you think that large shared_buffers is a kind of stress test for wait event monitoring? If so, why?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
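The pg_test_timing suggestion quoted above, reduced to a crude stand-alone C sketch (clock_gettime() is used here as a stand-in for the backend's gettimeofday()-based instrumentation; the result is entirely machine-dependent):

    /* Crude per-call timer-cost measurement, in the spirit of pg_test_timing. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    int
    main(void)
    {
        const long      iters = 10 * 1000 * 1000;
        struct timespec t0, t1, tmp;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            clock_gettime(CLOCK_MONOTONIC, &tmp);   /* the call being measured */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per clock_gettime() call\n", ns / iters);
        return 0;
    }

If one timer call costs tens of nanoseconds while a typical wait event covers a syscall or context switch costing microseconds, per-wait timing overhead stays small; if the timer itself costs microseconds (e.g. no vDSO fast path), it does not.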
Re: [HACKERS] Proposal for CSN based snapshots
On 08/10/2016 04:34 PM, Alexander Korotkov wrote:
> On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas wrote:
>> I switched to using a separate counter for CSNs. CSN is no longer the same as the commit WAL record's LSN. While I liked the conceptual simplicity of CSN == LSN a lot, and the fact that the standby would see the same commit order as master, I couldn't figure out how to make async commits work.
>
> I don't get the async commits problem at first glance. AFAICS, the only difference between a sync commit and an async one is that an async commit doesn't wait for the WAL to be flushed. But an async commit still receives an LSN. Could you describe the problem in more detail?

Imagine that you have a stream of normal, synchronous, commits. They get assigned LSNs: 1, 2, 3, 4. They become visible to other transactions in that order.

The way I described this scheme in the first emails on this thread was to use the current WAL insertion position as the snapshot. That's not correct, though: you have to use the current WAL *flush* position as the snapshot. Otherwise you would see the results of a transaction that hasn't been flushed to disk yet, i.e. one which might still get lost if you pull the power plug before the flush happens. So you have to use the last flush position as the snapshot.

Now, if you do an asynchronous commit, the effects of that should become visible immediately, without waiting for the next flush. You can't do that if its CSN == LSN.

Perhaps you could make an exception for async commits, so that the CSN of an async commit is not the commit record's LSN, but the current WAL flush position, so that it becomes visible to others immediately. But then, how do you distinguish two async transactions that commit roughly at the same time, so that the flush position at the time of commit is the same for both? And now you've given up on the CSN == LSN property for async commits anyway.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Curing plpgsql's memory leaks for statement-lifespan values
2016-08-10 11:25 GMT+02:00 Pavel Stehule : > Hi > > 2016-07-27 16:49 GMT+02:00 Tom Lane : > >> Robert Haas writes: >> > On Mon, Jul 25, 2016 at 6:04 PM, Tom Lane wrote: >> >> I suppose that a fix based on putting PG_TRY blocks into all the >> affected >> >> functions might be simple enough that we'd risk back-patching it, but >> >> I don't really want to go that way. >> >> > try/catch blocks aren't completely free, either, and PL/pgsql is not >> > suffering from a deplorable excess of execution speed. >> >> BTW, just to annotate that a bit: I did some measurements and found out >> that on my Linux box, creating/deleting a memory context >> (AllocSetContextCreate + MemoryContextDelete) is somewhere around 10x >> more expensive than a PG_TRY block. This means that the PG_TRY approach >> would actually be faster for cases involving only a small number of >> statements-needing-local-storage within a single plpgsql function >> execution. However, the memory context creation cost is amortized across >> repeated executions of a statement, whereas of course PG_TRY won't be. >> We can roughly estimate that PG_TRY would lose any time we loop through >> the statement in question more than circa ten times. So I believe the >> way I want to do it will win speed-wise in cases where it matters, but >> it's not entirely an open-and-shut conclusion. >> >> Anyway, there are enough other reasons not to go the PG_TRY route. >> > > I did some synthetic benchmarks related to plpgsql speed - a bubble sort and > a loop with error handling - and I don't see any slowdown. > > Handling exceptions is a little bit faster with your patch: > > CREATE OR REPLACE FUNCTION public.loop_test(a integer) > RETURNS void > LANGUAGE plpgsql > AS $function$ > declare x int; > begin > for i in 1..a > loop > declare s text; > begin > s := 'AHOJ'; > x := (random()*1000)::int; > raise exception '*'; > exception when others then > x := 0; --raise notice 'handled'; > end; > end loop; > end; > $function$ > > head - 100K loops 640ms, patched 610ms > > Regards > Hi I am sending a review of this patch: 1. There was no problem with patching and compilation 2. All tests passed without any problem 3. The advantages and disadvantages were discussed in detail in this thread - the selected way is good + the code is a little bit reduced and cleaned up + the special context can be used in the future 4. New regression tests or documentation are not necessary for this patch 5. I didn't find any performance slowdown in the special tests - the impact on performance in real code should be insignificant I'll mark this patch as ready for committer Regards Pavel > > Pavel > > > >> regards, tom lane >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >> > >
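To restate Tom's arithmetic: if entering a PG_TRY block costs roughly c and creating plus deleting a context costs roughly 10c, a statement executed N times inside one function call pays N*c for PG_TRY but only about 10c once for a reusable context, so PG_TRY loses once N exceeds about ten. The two shapes being compared look roughly like this (a simplified sketch against backend headers, not code from the patch; in the real approach the context would be created once and reused, and the ALLOCSET_DEFAULT_SIZES spelling may differ across branches):

    #include "postgres.h"
    #include "utils/memutils.h"

    static void
    cleanup_via_context(void)
    {
        /* pay the ~10x creation cost, get trivially correct cleanup */
        MemoryContext tmp = AllocSetContextCreate(CurrentMemoryContext,
                                                  "statement workspace",
                                                  ALLOCSET_DEFAULT_SIZES);

        /* ... allocate statement-lifespan values in tmp ... */
        MemoryContextDelete(tmp);   /* frees everything allocated in tmp in one call */
    }

    static void
    cleanup_via_pg_try(void)
    {
        PG_TRY();
        {
            /* ... work, freeing statement-lifespan values on success ... */
        }
        PG_CATCH();
        {
            /* ... free the same values on error, then propagate it ... */
            PG_RE_THROW();
        }
        PG_END_TRY();
    }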
Re: [HACKERS] Proposal for CSN based snapshots
On Tue, Aug 9, 2016 at 3:16 PM, Heikki Linnakangas wrote: > (Reviving an old thread) > > I spent some time dusting off this old patch, to implement CSN snapshots. > Attached is a new patch, rebased over current master, and with tons of > comments etc. cleaned up. There's more work to be done here, I'm posting > this to let people know I'm working on this again. And to have a backup on > the 'net :-). > Great! It's very nice to see that you're working on this patch. After PGCon I had rebased your patch onto master, but you have already done much more. > I switched to using a separate counter for CSNs. CSN is no longer the same > as the commit WAL record's LSN. While I liked the conceptual simplicity of > CSN == LSN a lot, and the fact that the standby would see the same commit > order as master, I couldn't figure out how to make async commits work. > I didn't get the async commits problem at first glance. AFAICS, the difference between a sync commit and an async one is only that an async commit doesn't wait for the WAL to be flushed. But an async commit still receives an LSN. Could you describe it in more detail? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Proposal for CSN based snapshots
On Wed, Aug 10, 2016 at 2:10 PM, Heikki Linnakangas wrote: > Yeah, if the csnlog access turns out to be too expensive, we could do > something like this. In theory, you can always convert a CSN snapshot into > an old-style list of XIDs, by scanning the csnlog between the xmin and > xmax. That could be expensive if the distance between xmin and xmax is > large, of course. But as you said, you can have various hybrid forms, where > you use a list of XIDs of some range as a cache, for example. > > I'm hopeful that we can simply make the csnlog access fast enough, though. > Looking up an XID in a sorted array is O(log n), while looking up an XID in > the csnlog is O(1). That ignores all locking and different constant > factors, of course, but it's not a given that accessing the csnlog has to > be slower than a binary search of an XID array. FYI, I'm still a fan of the idea of rewriting the XID with the CSN in the tuple, the same way we write hint bits now. I'm going to make a prototype of this approach, which should be enough for performance measurements. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
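The conversion Heikki mentions can be pictured with a short sketch (all names are invented, and it ignores XID wraparound): walk a csnlog-style lookup once per XID between xmin and xmax, and collect every XID that the snapshot must still treat as in progress. This is the O(xmax - xmin) cost that makes it attractive only as a cache over a narrow range.

    #include <stdint.h>

    typedef uint32_t TransactionId;
    typedef uint64_t CommitSeqNo;

    #define InvalidCommitSeqNo ((CommitSeqNo) 0)

    /* stub standing in for the assumed O(1) csnlog lookup */
    static CommitSeqNo
    CSNLogGetCommitSeqNo(TransactionId xid)
    {
        (void) xid;
        return InvalidCommitSeqNo;      /* pretend nothing has committed yet */
    }

    /* Materialize an old-style "running XIDs" array from a CSN snapshot. */
    static int
    SnapshotToXidArray(TransactionId xmin, TransactionId xmax,
                       CommitSeqNo snapshot_csn,
                       TransactionId *xip, int max_xip)
    {
        int n = 0;

        for (TransactionId xid = xmin; xid < xmax && n < max_xip; xid++)
        {
            CommitSeqNo csn = CSNLogGetCommitSeqNo(xid);

            /* invisible: never committed, or committed after the snapshot */
            if (csn == InvalidCommitSeqNo || csn >= snapshot_csn)
                xip[n++] = xid;
        }
        return n;
    }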
Re: [HACKERS] multivariate statistics (v19)
On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: > 2) combining multiple statistics > > I think the ability to combine multivariate statistics (covering different > subsets of conditions) is important and useful, but I'm starting to think > that the current implementation may not be the correct one (which is why I > haven't written the SGML docs about this part of the patch series yet). While researching this a few years ago I came across a paper on exactly this problem, called "Consistently Estimating the Selectivity of Conjuncts of Predicates" [1]. While effective, it seems to be quite heavyweight, so it would probably need support for tiered optimization. [1] https://courses.cs.washington.edu/courses/cse544/11wi/papers/markl-vldb-2005.pdf Regards, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: > On 10/08/16 13:33, Tomas Vondra wrote: >> >> On 08/10/2016 06:41 AM, Michael Paquier wrote: >>> >>> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm starting to think that the current implementation may not be the correct one (which is why I haven't written the SGML docs about this part of the patch series yet). Assume there's a table "t" with 3 columns (a, b, c), and that we're estimating query: SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3 but that we only have two statistics (a,b) and (b,c). The current patch does about this: P(a=1,b=2,c=3) = P(a=1,b=2) * P(c=3|b=2) i.e. it estimates the first two conditions using (a,b), and then estimates (c=3) using (b,c) with "b=2" as a condition. Now, this is very efficient, but it only works as long as the query contains conditions "connecting" the two statistics. So if we remove the "b=2" condition from the query, this stops working. >>> >>> >>> This is trying to make the algorithm smarter than the user, which is >>> something I'd think we could live without. In this case statistics on >>> (a,c) or (a,b,c) are missing. And what if the user does not want to >>> make use of stats for (a,c) because he only defined (a,b) and (b,c)? >>> >> >> I don't think so. Obviously, if you have statistics covering all the >> conditions - great, we can't really do better than that. >> >> But there's a crucial relation between the number of dimensions of the >> statistics and accuracy of the statistics. Let's say you have statistics >> on 8 columns, and you split each dimension in two to build a histogram - >> that's 256 buckets right there, and we only get ~50% selectivity in each >> dimension (the actual histogram building algorithm is more complex, but >> you get the idea). > I think it makes sense to pursue this, but I also think we can easily live > with not having it in the first version that gets committed and doing it as > a follow-up patch. This patch is large and complicated enough. As this is not a mandatory piece for basic support, I'd suggest just dropping it for now and revisiting it later. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
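The decomposition quoted above is easy to state as code. A minimal illustration (not the patch's code; the function and parameter names are invented) of chaining two two-column estimates through their shared column, using P(c=3|b=2) = P(b=2,c=3) / P(b=2):

    /*
     * P(a=1, b=2, c=3) ~= P(a=1, b=2) * P(c=3 | b=2)
     *                   = sel_ab * (sel_bc / sel_b)
     */
    static double
    combine_selectivity(double sel_ab,  /* P(a=1, b=2), from stats on (a,b) */
                        double sel_bc,  /* P(b=2, c=3), from stats on (b,c) */
                        double sel_b)   /* P(b=2), the "connecting" condition */
    {
        if (sel_b <= 0.0)
            return 0.0;     /* no rows satisfy b=2, so the conjunction is empty */
        return sel_ab * (sel_bc / sel_b);
    }

Dropping the b=2 condition removes the shared term the chain conditions on, which is exactly why the current approach stops working without a "connecting" clause.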
Re: [HACKERS] multivariate statistics (v19)
On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: > On 08/10/2016 06:41 AM, Michael Paquier wrote: >> Patch 0001: there have been comments about that before, and you have >> put the checks on RestrictInfo in a couple of variables of >> pull_varnos_walker, so nothing to say from here. >> > > I don't follow. Are you suggesting 0001 is a reasonable fix, or that there's > a proposed solution? I think that's reasonable. >> Patch 0002: >> + >> + CREATE STATISTICS will create a new multivariate >> + statistics on the table. The statistics will be created in the in the >> + current database. The statistics will be owned by the user issuing >> + the command. >> + >> s/in the in the/in the/. >> >> + >> + Create table t1 with two functionally dependent >> columns, i.e. >> + knowledge of a value in the first column is sufficient for detemining >> the >> + value in the other column. Then functional dependencies are built on >> those >> + columns: >> s/detemining/determining/ >> >> + >> + If a schema name is given (for example, CREATE STATISTICS >> + myschema.mystat ...) then the statistics is created in the >> specified >> + schema. Otherwise it is created in the current schema. The name of >> + the table must be distinct from the name of any other statistics in >> the >> + same schema. >> + >> I would just assume that a statistics entry is located in the schema of the >> relation it depends on. So the thing that may be better to do is just: >> - Register the OID of the table a statistics depends on but not the >> schema. >> - Give up on those query extensions related to the schema. >> - Allow the same statistics name to be used for multiple tables. >> - Just fail if a statistics name is being reused on the table again. >> It may be better to complain about that even if the column list is >> different. >> - Register the dependency between the statistics and the table. > > The idea is that the syntax should work even for statistics built on > multiple tables, e.g. to provide better statistics for joins. That's why the > schema may be specified (as each table might be in a different schema), and so > on. So you mean that the same statistics could be shared between tables? But as this is visibly not a concept introduced yet in this set of patches, why not just cut it off for now to simplify the whole thing? If there is no schema-related field in pg_mv_statistics we could still add it later if it proves to be useful. >> + >> /* >> Spurious noise in the patch. >> >> + /* check that at least some statistics were requested */ >> + if (!build_dependencies) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> +errmsg("no statistics type (dependencies) was >> requested"))); >> So, WITH (dependencies) is mandatory in any case. Why not just >> drop it from the first cut then? > > > Because the follow-up patches extend this to require at least one statistics > type. So in 0004 it becomes > > if (!(build_dependencies || build_mcv)) > > and in 0005 it's > > if (!(build_dependencies || build_mcv || build_histogram)) > > We might drop it from 0002 (and assume build_dependencies=true), and then > add the check in 0004. But it seems a bit pointless. This is a complicated set of patches. I'd think that we should try to simplify things as much as possible first, and the WITH clause is not mandatory to have as of 0002. >> The statistics definition reorders the columns by itself. For example: >> create table aa (a int, b int); >> create statistics aas on aa(b, a) with (dependencies); >> \d aa >> "public.aas" (dependencies) ON (a, b) >> As this defines a correlation between multiple columns, isn't it wrong >> to assume that (b, a) and (a, b) are always the same correlation? I >> don't recall such properties as being always commutative (old >> memories, I suck at stats in general). [...reading README...] So this >> is caused by implementation limitations that restrict the >> analysis to interactions between two columns. Still it seems incorrect >> to reorder the user-visible portion. > > I don't follow. If you talk about Pearson's correlation, that clearly does > not depend on the order of columns - it's perfectly independent of that. If > you talk about correlation in the wider sense (i.e. arbitrary > dependence between columns), that might depend - but I don't remember a > single piece of the patch where this might be a problem. Yes, based on what is done in the patch that may not be a problem, but I am wondering if this is not restricting things too much. > Also, which README states that we can only analyze interactions between two > columns? That's pretty clearly not the case - the patch should handle > dependencies between more columns without any problems. I have noticed that the patch evaluates the full set of possible permutations of a column list; it seems to me though that, say, if we have three columns (a,
Re: [HACKERS] multivariate statistics (v19)
On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm starting to think that the current implementation may not be the correct one (which is why I haven't written the SGML docs about this part of the patch series yet). Assume there's a table "t" with 3 columns (a, b, c), and that we're estimating query: SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3 but that we only have two statistics (a,b) and (b,c). The current patch does about this: P(a=1,b=2,c=3) = P(a=1,b=2) * P(c=3|b=2) i.e. it estimates the first two conditions using (a,b), and then estimates (c=3) using (b,c) with "b=2" as a condition. Now, this is very efficient, but it only works as long as the query contains conditions "connecting" the two statistics. So if we remove the "b=2" condition from the query, this stops working. This is trying to make the algorithm smarter than the user, which is something I'd think we could live without. In this case statistics on (a,c) or (a,b,c) are missing. And what if the user does not want to make use of stats for (a,c) because he only defined (a,b) and (b,c)? I don't think so. Obviously, if you have statistics covering all the conditions - great, we can't really do better than that. But there's a crucial relation between the number of dimensions of the statistics and accuracy of the statistics. Let's say you have statistics on 8 columns, and you split each dimension in two to build a histogram - that's 256 buckets right there, and we only get ~50% selectivity in each dimension (the actual histogram building algorithm is more complex, but you get the idea). I think it makes sense to pursue this, but I also think we can easily live with not having it in the first version that gets committed and doing it as a follow-up patch. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
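Tomas's bucket arithmetic, worked out explicitly (illustrative only): splitting each of d dimensions s ways yields s^d buckets while resolving each individual dimension only to 1/s, so with d = 8 and s = 2 a histogram already has 256 buckets yet only 50% resolution per column.

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        int d = 8;      /* number of columns covered by the statistics */
        int s = 2;      /* splits per dimension */

        /* prints: 8 columns, 2-way splits: 256 buckets, 50% per-dimension resolution */
        printf("%d columns, %d-way splits: %.0f buckets, %.0f%% per-dimension resolution\n",
               d, s, pow(s, d), 100.0 / s);
        return 0;
    }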