[HACKERS] proposal: make NOTIFY list de-duplication optional
- new GUC in "Statement Behaviour" section, notify_duplicate_removal (default true)

Initial discussion in this thread:
http://www.postgresql.org/message-id/CAP_rwwmpzk9=sbjrztod05bdctyc43wnknu_m37dygvl4sa...@mail.gmail.com

Rationale: for some legitimate use cases, duplicate removal is not
required, and it gets O(N^2) cost on large COPY/insert transactions.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 392eb70..9fb5504 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6095,6 +6095,24 @@ SET XML OPTION { DOCUMENT | CONTENT };
+
+      notify_duplicate_removal (bool)
+
+      notify_duplicate_removal configuration parameter
+
+
+        Try to remove duplicate messages while processing NOTIFY. When
+        on (the default), the server will avoid duplicate messages
+        (with same channel and payload). Setting this variable to
+        off can increase performance in some situations - at the
+        cost of having duplicate messages in NOTIFY queue. See
+        for more information.
+
+
diff --git a/doc/src/sgml/ref/notify.sgml b/doc/src/sgml/ref/notify.sgml
index 4dd5608..86a9bed 100644
--- a/doc/src/sgml/ref/notify.sgml
+++ b/doc/src/sgml/ref/notify.sgml
@@ -95,16 +95,17 @@ NOTIFY channel [ ,
    If the same channel name is signaled multiple times from the same
-   transaction with identical payload strings, the
-   database server can decide to deliver a single notification only.
-   On the other hand, notifications with distinct payload strings will
-   always be delivered as distinct notifications. Similarly, notifications from
-   different transactions will never get folded into one notification.
-   Except for dropping later instances of duplicate notifications,
+   transaction with identical payload strings, and
+   notify_duplicate_removal is set to true, the database server
+   can decide to deliver a single notification only. On the other hand,
+   notifications with distinct payload strings will always be delivered as
+   distinct notifications. Similarly, notifications from different
+   transactions will never get folded into one notification. Except for
+   dropping later instances of duplicate notifications,
    NOTIFY guarantees that notifications from the
    same transaction get delivered in the order they were sent. It is also
-   guaranteed that messages from different transactions are delivered in
-   the order in which the transactions committed.
+   guaranteed that messages from different transactions are delivered in the
+   order in which the transactions committed.
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index c39ac3a..a7bc9f1 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -364,8 +364,9 @@ static bool amRegisteredListener = false;
 /* has this backend sent notifications in the current transaction? */
 static bool backendHasSentNotifications = false;
 
-/* GUC parameter */
+/* GUC parameters */
 bool Trace_notify = false;
+bool notify_duplicate_removal = true;
 
 /* local function prototypes */
 static bool asyncQueuePagePrecedes(int p, int q);
@@ -570,9 +571,12 @@ Async_Notify(const char *channel, const char *payload)
 				 errmsg("payload string too long")));
 	}
 
-	/* no point in making duplicate entries in the list ... */
-	if (AsyncExistsPendingNotify(channel, payload))
-		return;
+	if (notify_duplicate_removal)
+	{
+		/* check for duplicate entries in the list */
+		if (AsyncExistsPendingNotify(channel, payload))
+			return;
+	}
 
 	/*
 	 * The notification list needs to live until end of transaction, so store
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 83b8388..b737c29 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1617,6 +1617,15 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"notify_duplicate_removal", PGC_USERSET, CLIENT_CONN_STATEMENT,
+			gettext_noop("Remove duplicate messages during NOTIFY."),
+			NULL
+		},
+		&notify_duplicate_removal,
+		true,
+		NULL, NULL, NULL
+	},
 
 	/* End-of-list marker */
 	{
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 029114f..2831c1b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -536,6 +536,7 @@
 #xmloption = 'content'
 #gin_fuzzy_search_limit = 0
 #gin_pending_list_limit = 4MB
+#notify_duplicate_removal = on
 
 # - Locale and Formatting -
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index b4c13fa..c572691 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -23,6 +23,7 @@
 #define NUM_ASYNC_BUFFERS 8
 
 extern bool Trace_notify;
+extern bool notify_duplicate_removal;
 
 extern volatile sig_atomic_t
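[Editor's note: a minimal usage sketch, assuming the patch as posted; the
table and the per-row trigger that issues NOTIFY are hypothetical:]

    SET notify_duplicate_removal = off;  -- PGC_USERSET, so per-session is fine
    BEGIN;
    COPY measurements FROM '/tmp/measurements.csv' (FORMAT csv);
    COMMIT;
    RESET notify_duplicate_removal;

With the GUC off, Async_Notify() skips the O(N^2)
AsyncExistsPendingNotify() scan entirely, at the cost that listeners may
receive duplicate (channel, payload) pairs from the same transaction.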
Re: [HACKERS] Sanity checking for ./configure options?
On Wed, Feb 03, 2016 at 06:02:57PM -0600, Jim Nasby wrote:
> I just discovered that ./configure will happily accept '--with-pgport=' (I
> was actually doing =$PGPORT, and didn't realize $PGPORT was empty). What you
> end up with is a compile error in guc.c, with no idea why it's broken. Any
> reason not to have configure or at least make puke if pgport isn't valid?

That seems like a good idea.

I've been getting rejection to happen with phrases like

    --with-pgport=${PGPORT:?}

which while it looks a little odd, only adds 4 characters to each shell
variable.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
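[Editor's note: for example, in bash (the exact error text varies slightly
by shell), the ${VAR:?} expansion aborts immediately when the variable is
unset or empty instead of silently passing an empty port to configure:]

    $ PGPORT= ./configure --with-pgport=${PGPORT:?}
    bash: PGPORT: parameter null or not set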
Re: [HACKERS] silent data loss with ext4 / all current versions
On 02/04/2016 09:59 AM, Michael Paquier wrote:
> On Tue, Feb 2, 2016 at 9:59 AM, Andres Freund wrote:
>> On 2016-02-02 09:56:40 +0900, Michael Paquier wrote:
>>> And there is no actual risk of data loss
>>
>> Huh?
>
> More precise: what I mean here is that should an OS crash or a power
> failure happen, we would fall back to recovery at next restart, so we
> would not actually *lose* data.

Except that we actually can't perform the recovery properly because we may
not have the last WAL segment (or multiple segments), so we can't replay
the last batch of transactions. And we don't even notice that.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] First-draft release notes for next week's back-branch releases
On Sat, Feb 6, 2016 at 7:11 AM, Tom Lane wrote:
> I've done a first pass at next week's release notes; please review.
>
> Committed at
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7008e70d105b572821406744ce080771b74c06ab
> and should be visible in the website's devel-branch docs after the next
> guaibasaurus buildfarm run, due a couple hours from now.
>
> As usual, I just dumped all the notes into one branch's release-N.N.sgml
> file, and will sort out which ones apply to which branches later. I chose
> to put them into 9.4, though, not 9.5, since many of these issues are
> already fixed in 9.5.0 and will not need to appear in the 9.5.1 section.

+
+
+ Ensure that dynloader.h is included in the installed
+ header files in MSVC builds (Michael Paquier)
+
+

Bruce is the main author of this patch. I used what he did as a base to
build a version correct for MSVC.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL Audit Extension
All,

* Robert Haas (robertmh...@gmail.com) wrote:
> OK, I'll bite: I'm worried that this patch will be a maintenance
> burden. It's easy to imagine that changes to core will result in the
> necessity or at least desirability of changes to pgaudit, but I'm
> definitely not prepared to insist that future patch submitters have to
> understand this code and update it as things change.

I agree with this concern in general, and have since pgaudit was first
proposed for inclusion by Simon two years ago. Having auditing as an
extension is what makes this into an issue though. Were auditing baked
into core, it'd be far more straight-forward, much easier to maintain,
and would be updated as we add new core capabilities naturally. David's
comments about how pgaudit has to work are entirely accurate - everything
ends up having to be reconstructed in a very painful way.

> The set of things that the patch can audit is pretty arbitrary and not
> well tied into the core code.

This also speaks to the difficulties of having auditing implemented as an
extension.

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Yeah. Auditing strikes me as a fine example of something for which there
> is no *technical* reason to need to put it in core. It might need some
> more hooks than we have now, but that's no big deal.

I disagree with this, quite strongly. There are very good, technical,
reasons why auditing should be a core capability. Adding more hooks,
exposing various internal functions for use by extensions, etc, doesn't
change that any extension would have to be constantly updated as new
capabilities are added to core, and auditing, as a capability, would
continue to be limited by virtue of being implemented as an extension.

I don't mean to detract anything from what 2ndQ and David have done with
pgaudit (which is the only auditing solution of its kind that I'm aware
of, so I don't understand the "competing implementations" argument which
has been brought up at all - there's really only one). They've done just
as much as each version of PG has allowed them to do. It's a much better
solution than what we have today (which is basically just log_statement =
'all', which no one should ever consider to be a real auditing solution).

I've already stated that I'd be willing to take on responsibility for
maintaining it as an extension (included with PG or not), but it's not
the end-all, be-all of auditing for PG and we should be continuing to
look at implementing a full in-core auditing solution. I'm still of the
opinion that we should include pgaudit for the reason that it's a good
incremental step towards proper in-core auditing, but there's clearly no
consensus on that and it doesn't appear that further discussion is likely
to change that.

An in-core auditing solution would provide us with proper grammar
support, the ability to directly mark objects for auditing in the
catalog, allow us to much more easily maintain auditing capability over
time as a small incremental bit of work for each new feature (with proper
in-core infrastructure for it) and generally be a far better technical
solution. Leveraging the GRANT system is quite cute, and does work, but
it's certainly non-intuitive and is only because we've got no better way,
due to it being implemented as an extension.
Looking at pgaudit and the other approaches to auditing which have been developed (eg: applications which sit in front of PG and essentially have to reimplement large bits of PG to then audit the commands sent before passing them to PG, or hacks which try to make sense out of log files full of SQL statements) make it quite clear, in my view, that attempts to bolt-on auditing to PG result in a poorer solution, from a technical perspective, than what this project is known for and capable of. To make true progress towards that, however, we need to get past the thinking that auditing doesn't need to be in-core or that it should be a second-class citizen feature or that we don't need it in PG. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PostgreSQL Audit Extension
* Joe Conway (m...@joeconway.com) wrote: > On 02/05/2016 10:16 AM, Stephen Frost wrote: > > An in-core auditing solution would provide us with proper grammar > > support, ability to directly mark objects for auditing in the catalog, > > allow us to much more easily maintain auditing capability over time as > > a small incremental bit of work for each new feature (with proper > > in-core infrastructure for it) and generally be a far better technical > > solution. Leveraging the GRANT system is quite cute, and does work, but > > it's certainly non-intuitive and is only because we've got no better > > way, due to it being implemented as an extension. > > I think one additional item needed would be the ability for the audit > logs to be sent to a different location than the standard logs. Indeed, reworking the logging to be supportive of multiple destinations with tagging of the source, etc, has long been a desire of mine (and others), though that's largely independent of auditing itself. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Using quicksort for every external sort run
On Thu, Feb 4, 2016 at 6:14 AM, Peter Geoghegan wrote:
> The economics of using 4MB or even 20MB to sort 10GB of data are
> already preposterously bad for everyone that runs a database server,
> no matter how budget conscious they may be. I can reluctantly accept
> that we need to still use a heap with very low work_mem settings to
> avoid the risk of a regression (in the event of a strong correlation)
> on general principle, but I'm well justified in proposing "just don't
> do that" as the best practical advice.
>
> I thought I had your agreement on that point, Robert; is that actually
> the case?

Peter and I spent a few hours talking on Skype this morning about this
point and I believe we have agreed on an algorithm that I think will
address all of my concerns and hopefully also be acceptable to him.
Peter, please weigh in and let me know if I've gotten anything incorrect
here or if you think of other concerns afterwards.

The basic idea is that we will add a new GUC with a name like
replacement_sort_mem that will have a default value in the range of
20-30MB; or possibly we will hardcode this value, but for purposes of
this email I'm going to assume it's a GUC. If the value of work_mem or
maintenance_work_mem, whichever applies, is smaller than the value of
replacement_sort_mem, then the latter has no effect. However, if
replacement_sort_mem is the smaller value, then the amount of memory that
can be used for a heap with replacement selection is limited to
replacement_sort_mem: we can use more memory than that in total for the
sort, but the amount that can be used for a heap is restricted to that
value. The way we do this is explained in more detail below.

One thing I just thought of (after the call) is that it might be better
for this GUC to be in units of tuples rather than in units of memory;
it's not clear to me why the optimal heap size should be dependent on the
tuple size, so we could have a threshold like 300,000 tuples or whatever.
But that's a secondary issue and I might be wrong about it: the point is
that in order to have a chance of winning, a heap used for replacement
selection needs to be not very big at all by the standards of modern
hardware, so the plan is to limit it to a size at which it may have a
chance.

Here's how that will work, assuming Peter and I understand each other:

1. We start reading the input data. If we reach the end of the input
data before (maintenance_)work_mem is exhausted, then we can simply
quicksort the data and we're done. This is no different than what we
already do today.

2. If (maintenance_)work_mem fills up completely, we will quicksort all
of the data we have in memory. We will then regard the tail end of that
sorted data, in an amount governed by replacement_sort_mem, as a heap,
and use it to perform replacement selection until no tuples remain for
the current run. Meanwhile, the rest of the sorted data remains in memory
untouched. Logically, we're constructing a run of tuples which is split
between memory and disk: the head of the run (what fits in all of
(maintenance_)work_mem except for replacement_sort_mem) is in memory, and
the tail of the run is on disk.

3. If we reach the end of input before replacement selection runs out of
tuples for the current run, and if it finds no tuples for the next run
prior to that time, then we are done. All of the tuples form a single run
and we can return the tuples in memory first followed by the tuples on
disk.
This case is highly likely to be a huge win over what we have today,
because (a) some portion of the tuples were sorted via quicksort rather
than heapsort and that's faster, (b) the tuples that were sorted using a
heap were sorted using a small heap rather than a big one, and (c) we
only wrote out the minimal number of tuples to tape instead of, as we
would have done today, all of them.

4. If we reach this step, then replacement selection with a small heap
wasn't able to sort the input in a single run. We have a bunch of sorted
data in memory which is the head of the same run whose tail is already on
disk; we now spill all of these tuples to disk. That leaves only the
heapified tuples in memory. We just ignore the fact that they are a heap
and treat them as unsorted. We repeatedly do the following: read tuples
until work_mem is full, sort them, and dump the result to disk as a run.
When all runs have been created, we merge runs just as we do today.

This algorithm seems very likely to beat what we do today in practically
all cases. The benchmarking Peter and others have already done shows that
building runs with quicksort rather than replacement selection can often
win even if the larger number of tapes requires a multi-pass merge. The
only cases where it didn't seem to be a clear win involved data that was
already in sorted order, or very close to it. But with this algorithm,
presorted input is fine: we'll quicksort some of it (which is faster than
replacement
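[Editor's note: to make the replacement-selection half of the scheme
concrete, here is a toy, self-contained C sketch; integers stand in for
tuples, stdout stands in for a tape, and HEAP_SIZE stands in for
replacement_sort_mem. This is an illustration of the classic algorithm,
not tuplesort.c code: values smaller than the last value written are
deferred to the next run.]

    #include <stdio.h>
    #include <string.h>

    #define HEAP_SIZE 4             /* stands in for replacement_sort_mem */

    static int heap[HEAP_SIZE];
    static int nheap;

    static void
    sift_down(int i)
    {
        for (;;)
        {
            int l = 2 * i + 1, r = l + 1, s = i, tmp;

            if (l < nheap && heap[l] < heap[s]) s = l;
            if (r < nheap && heap[r] < heap[s]) s = r;
            if (s == i) break;
            tmp = heap[i]; heap[i] = heap[s]; heap[s] = tmp;
            i = s;
        }
    }

    int
    main(void)
    {
        int input[] = {3, 7, 1, 8, 2, 9, 4, 12, 5, 11, 6, 10};
        int n = sizeof(input) / sizeof(input[0]);
        int pending[HEAP_SIZE];     /* values deferred to the next run */
        int npending = 0, next = 0, i;

        while (nheap < HEAP_SIZE && next < n)   /* initial fill */
            heap[nheap++] = input[next++];
        for (i = nheap / 2 - 1; i >= 0; i--)
            sift_down(i);

        printf("run:");
        while (nheap > 0 || npending > 0)
        {
            int out;

            if (nheap == 0)         /* current run exhausted: start next */
            {
                printf("\nrun:");
                memcpy(heap, pending, npending * sizeof(int));
                nheap = npending;
                npending = 0;
                for (i = nheap / 2 - 1; i >= 0; i--)
                    sift_down(i);
            }
            out = heap[0];
            printf(" %d", out);     /* "write the heap minimum to tape" */
            if (next < n && input[next] >= out)
                heap[0] = input[next++];        /* fits the current run */
            else if (next < n)
            {
                pending[npending++] = input[next++];    /* next run */
                heap[0] = heap[--nheap];
            }
            else
                heap[0] = heap[--nheap];        /* input exhausted: drain */
            if (nheap > 0)
                sift_down(0);
        }
        printf("\n");
        return 0;
    }

Run against the sample input, the tiny heap emits two sorted runs; with
correlated input, a single run tends to absorb almost everything, which
is exactly the case the replacement_sort_mem fallback is meant to keep
fast.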
Re: [HACKERS] Development with Eclipse - Wrong error messages in IDE
> On Feb 3, 2016, at 2:38 AM, Peter Moser wrote:
>
> Has anyone had similar problems? Do I have to configure Eclipse to
> understand the PG_RMGR macro or is there another possibility to teach
> Eclipse these macros?

I just built 9.6 under Eclipse CDT to try this out and was able to open
e.g. heapam.c without any error markers.

I added PostgreSQL as a “Makefile Project with Existing Code” after
running ./configure from the command-line. After that, I built the
project from within Eclipse by adding the ‘all’ make target and running
it.

One setting I usually change: right-click the project, pick Properties,
then drill down through C/C++ General -> Preprocessor Include Paths. In
the Provider pane, there is an entry for “CDT GCC Build Output Parser”.
I’m not sure if this is strictly necessary, but I set my “Container to
keep discovered entries” setting to “File”. Basically, Eclipse scans the
make output for -I flags, then notes all the includes used to build each
file, so the static analyzer, etc. can have more accurate information (it
is crucial that the “Compiler command pattern” in this window be a regex
that will match the compiler binary you use, so if you have
/usr/local/bin/gcc, and “gcc” is the pattern, you are in for trouble).

After running the build, Eclipse should now know what includes are used
for each file and stop whining. If it ever seems to have problems, you
can kick it by running a clean target, then all, then picking “Project ->
C/C++ Index -> Rebuild” (I think).

-- 
Jason Petersen
Software Engineer | Citus Data
303.736.9255
ja...@citusdata.com

signature.asc
Description: Message signed with OpenPGP using GPGMail
Re: [HACKERS] PostgreSQL Audit Extension
On 02/05/2016 10:16 AM, Stephen Frost wrote: > An in-core auditing solution would provide us with proper grammar > support, ability to directly mark objects for auditing in the catalog, > allow us to much more easily maintain auditing capability over time as > a small incremental bit of work for each new feature (with proper > in-core infrastructure for it) and generally be a far better technical > solution. Leveraging the GRANT system is quite cute, and does work, but > it's certainly non-intuitive and is only because we've got no better > way, due to it being implemented as an extension. I think one additional item needed would be the ability for the audit logs to be sent to a different location than the standard logs. > To make true progress towards that, however, we need to get past > the thinking that auditing doesn't need to be in-core or that it should > be a second-class citizen feature or that we don't need it in PG. +1 Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Development with Eclipse - Wrong error messages in IDE
> Peter Moser wrote:
>> I have some strange error message inside Eclipse, that some symbols
>> cannot be found. I work with version 9.6 currently. For instance,
>>
>>   Symbol 'RM_HEAP_ID' could not be resolved   src/backend/access/heap/heapam.c
>>
>> It affects all occurrences of symbols that are defined in
>> src/include/access/rmgrlist.h. Eclipse just says "Syntax error" here.
>> However, the source code compiles and runs without any compile-time
>> error or warning. It is just an IDE problem I think, but it distracts
>> me from finding real bugs.
>
> Disclaimer: I've never used Eclipse.
>
> The problem is some perhaps-too-clever stuff we do to avoid repetitive
> declarations of things. The rmgr stuff uses a PG_RMGR macro, which is
> defined differently in src/backend/access/transam/rmgr.c and
> src/include/access/rmgr.h; the latter contains the actual enum
> definition. On the other hand Eclipse is trying to be too clever by
> processing the C files, but not actually getting it completely right
> (which is understandable, really). We have other similar cases, such as
> grammar keywords (kwlist.h).
>
> I'm afraid that you'd have to teach Eclipse to deal with such things
> (which might be tricky) or live with it.

Ok, thank you for the comment. I think I can live with it. Perhaps, when
I have some spare time, I'll give it a try to solve this "non-issue"...

Cheers,
Peter

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
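[Editor's note: for readers unfamiliar with the pattern being described,
here is a simplified sketch of the "X macro" trick; the real rmgrlist.h
entries carry more arguments, so this shows only the shape of it:]

    /* access/rmgrlist.h: a bare list of macro invocations, no guard */
    PG_RMGR(RM_XLOG_ID, "XLOG")
    PG_RMGR(RM_HEAP_ID, "Heap")

    /* access/rmgr.h: expand each entry to an enum member */
    #define PG_RMGR(symname, name) symname,
    typedef enum RmgrIds
    {
    #include "access/rmgrlist.h"
        RM_NEXT_ID
    } RmgrIds;
    #undef PG_RMGR

    /* backend/access/transam/rmgr.c: re-expand each entry to a table row */
    #define PG_RMGR(symname, name) { name },
    const RmgrData RmgrTable[RM_NEXT_ID] = {
    #include "access/rmgrlist.h"
    };
    #undef PG_RMGR

An indexer that preprocesses each file in isolation can easily miss the
rmgr.h expansion, which is why a symbol like RM_HEAP_ID appears
unresolved even though the compiler is perfectly happy.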
Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hello,

Please find attached updated patch.

> The point of having pgstat_report_progress_update_counter() is so that
> you can efficiently update a single counter without having to update
> everything, when only one counter has changed. But here you are
> calling this function a whole bunch of times in a row, which
> completely misses the point - if you are updating all the counters,
> it's more efficient to use an interface that does them all at once
> instead of one at a time.

pgstat_report_progress_update_counter() is now called at the appropriate
places in the attached patch.

> So I've spent a fair amount of time debugging really-long-running
> VACUUM processes with customers, and generally what I really want to
> know is: What block number are we at?

Agreed. The attached patch reports the current block number.

Regards,
Vinayak

Vacuum_progress_checker_v11.patch
Description: Vacuum_progress_checker_v11.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
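[Editor's note: a sketch of what such a single-counter update might look
like at the spot where vacuum advances to a new heap page; the counter
name below is hypothetical, only the function is the one under review:]

    /* in lazy_scan_heap()'s per-block loop: report only what changed */
    pgstat_report_progress_update_counter(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
                                          blkno);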
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier wrote:
> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier wrote:
>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas wrote:
>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier wrote:
>>>> Yes, please let's use the custom language, and let's not care of not
>>>> more than 1 level of nesting so as it is possible to represent
>>>> pg_stat_replication in a simple way for the user.
>>>
>>> "not" is used twice in this sentence in a way that renders me not able
>>> to be sure that I'm not understanding it not properly.
>>
>> 4 times here. Score beaten.
>>
>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> to only support configurations up to one level of nested objects, like
>> that:
>> 2[node1, node2, node3]
>> node1, 2[node2, node3], node3
>> In short, we could restrict things so as we cannot define a group of
>> nodes within an existing group.
>
> No, actually, that's stupid. Having up to two nested levels makes more
> sense, a quite common case for this feature being something like that:
> 2{node1,[node2,node3]}
> In short, sync confirmation is waited from node1 and (node2 or node3).
>
> Flattening groups of nodes with a new catalog will be necessary to
> ease the view of this data to users:
> - group name?
> - array of members with nodes/groups
> - group type: quorum or priority
> - number of items to wait for in this group

So, here are some thoughts to make that more user-friendly. I think that
the critical issue here is to properly flatten the meta data in the
custom language and represent it properly in a new catalog, without
messing up too much with the existing pg_stat_replication that people are
now used to for 5 releases since 9.0. So, I would think that we will need
to have a new catalog, say pg_stat_replication_groups, with the following
things:

- One line of this catalog represents the status of a group or of a
  single node.
- The status of a node/group is either sync or potential. If a node/group
  is specified more than once, it may be possible that it would be sync
  and potential depending on where it is defined, in which case setting
  its status to 'sync' has the most sense, if it is in sync state I guess.
- Move sync_priority and sync_state, actually an equivalent from
  pg_stat_replication, into this new catalog, because those represent the
  status of a node or group of nodes.
- Group name, and by that I think that we had perhaps better make
  mandatory the need to append a name to a quorum or priority group. The
  group at the highest level is forcibly named as 'top', 'main', or
  whatever if not directly specified by the user. If the entry is
  directly a node, use the application_name.
- Type of group, quorum or priority.
- Elements in this group; an element can be a group name or a node name,
  aka application_name. If the group is of type priority, the elements
  are listed in increasing order, so the elements with lower priority get
  first, etc. We could have one column listing explicitly a list of
  integers that map to the elements of a group, but it does not seem
  worth it; what users would like to know is what are the nodes that are
  prioritized. This covers the former 'priority' field of
  pg_stat_replication.

We may have a good idea of how to define a custom language, still we are
going to need to design a clean interface at catalog level more or less
close to what is written here.
If we can get a clean interface, the custom language implemented, and TAP tests that take advantage of this user interface to check the node/group statuses, I guess that we would be in good shape for this patch. Anyway that's not a small project, and perhaps I am over-complicating the whole thing. Thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
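[Editor's note: to make the flattening idea concrete, here is one purely
hypothetical rendering of the earlier example '2{node1,[node2,node3]}' in
such a catalog; names, columns and contents are illustrative only:]

     name | sync_type | wait_num |    member
    ------+-----------+----------+---------------
     main | quorum    |        2 | {node1,grp1}
     grp1 | priority  |        1 | {node2,node3}

Here "grp1" is an implicitly named priority group: sync is confirmed once
node1 and the highest-priority responsive member of grp1 (node2,
otherwise node3) have both acknowledged.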
[HACKERS] IF (NOT) EXISTS in psql-completion
Hello,

I considered how to make tab-completion robust against syntactical
noises, in other words, optional words in syntax. Typically "IF (NOT)
EXISTS", UNIQUE and TEMPORARY are words that don't affect further
completion. However, the current delimit-matching mechanism is not so
capable (or is complexity-prone) to live with such noises. I have
proposed to use regular expressions, or a simplified form of them, for
the robustness, but it was too complex to be applied.

This is another answer for the problem. Removal of such words on-the-fly
makes further matching more robust. Next, currently some CREATE xxx
subsyntaxes of CREATE SCHEMA are matched using TailMatching, but that
makes the options-removal operations difficult, since they need forward
matching. So I introduced two things to resolve these in this patch.

1. HEAD_SHIFT macro

It shifts the beginning index in previous_words for the *MatchN macros.
When the variable is 3 and previous_words is as follows,

  {"NOT", "IF", "CONCURRENTLY", "INDEX", "UNIQUE", "CREATE"}

Match3("CONCURRENTLY", "IF", "NOT") returns true. HeadMatches and
TailMatches work on the same basis. This allows us to match "CREATE xxx"
subsyntaxes of CREATE SCHEMA independently. The SHIFT_TO_LAST1() macro
finds the last appearance of the specified word and applies HEAD_SHIFT to
there if found.

2. MidMatchAndRemoveN() macros

These macros remove the specified words starting from the specified
number of words after the current beginning. Having head_shift = 0 and
the following previous_words,

  {"i_t1_a", "EXISTS", "IF", "INDEX", "DROP"}

MidMatchAndRemove2(2, "IF", "EXISTS") leaves the following
previous_words.

  {"i_t1_a", "INDEX", "DROP"}

Using these things, the patch as a whole does the following.

A. Allows "IF (NOT) EXISTS" almost everywhere it is allowed
syntactically. The boilerplate is like the following,

| else if (MatchesN(words before IF EXISTS))
|   COMPLETE_WITH_XXX(... "UNION SELECT 'IF EXISTS'");
| else if (HeadMatchesN(words before "IF EXISTS") &&
|          MidMatchAndRemoveM(N, words to be removed) &&
|          MatchesN(words before "IF EXISTS"))
|   COMPLETE_WITH_();

The first "else if" makes the suggestion for the 'thing' with "IF
EXISTS". The MidMatchAndRemoveM in the second "else if" actually removes
"IF EXISTS" if it matches, and the third MatchesN, and all matching
macros ever after, don't see the removed words. So they can make a
suggestion without seeing such noises. This looks a bit hackery but works
well in the current framework.

B. Masks the part in CREATE SCHEMA unrelated to the last CREATE
subsyntax currently focused on.

| else if (HeadMatches2("CREATE", "SCHEMA") &&
|          SHIFT_TO_LAST1("CREATE") &&
|          false) {} /* FALL THROUGH */

As the result of this, for a query like

  CREATE SCHEMA foo bar baz CREATE foe fee CREATE hoge hage

all the following part of psql_completion works as if the current query
is just "CREATE hoge hage".

Does anybody have suggestions, opinions or objections?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 82928996a0887882212d05aca3406a3bd3cf22e5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 5 Feb 2016 16:50:35 +0900
Subject: [PATCH] Suggest IF (NOT) EXISTS for tab-completion of psql

This patch lets psql suggest "IF (NOT) EXISTS". In addition, since this
patch introduces some mechanism for syntactical robustness, it allows
psql completion to omit some optional parts on matching.
---
 src/bin/psql/tab-complete.c | 559 ++++++++++++++++++++++++++-----------
 1 file changed, 463 insertions(+), 96 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f27120..d95698a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -656,6 +656,10 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"
 
+#define Query_for_list_of_rules \
+"SELECT pg_catalog.quote_ident(rulename) FROM pg_catalog.pg_rules "\
+" WHERE substring(pg_catalog.quote_ident(rulename),1,%d)='%s'"
+
 #define Query_for_list_of_grant_roles \
 " SELECT pg_catalog.quote_ident(rolname) "\
 "   FROM pg_catalog.pg_roles "\
@@ -763,6 +767,11 @@ static const SchemaQuery Query_for_list_of_matviews = {
 "SELECT pg_catalog.quote_ident(tmplname) FROM pg_catalog.pg_ts_template "\
 " WHERE substring(pg_catalog.quote_ident(tmplname),1,%d)='%s'"
 
+#define Query_for_list_of_triggers \
+"SELECT pg_catalog.quote_ident(tgname) FROM pg_catalog.pg_trigger "\
+" WHERE substring(pg_catalog.quote_ident(tgname),1,%d)='%s' AND "\
+"       NOT tgisinternal"
+
 #define Query_for_list_of_fdws \
 " SELECT pg_catalog.quote_ident(fdwname) "\
 "   FROM pg_catalog.pg_foreign_data_wrapper "\
@@ -906,7 +915,7 @@ static const pgsql_thing_t words_after_create[] = {
 	{"PARSER",
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/04 21:42, Ashutosh Bapat wrote:
>> * Is it safe to replace outerjoinpath with its fdw_outerpath the
>> following way? I think that if the join relation represented by
>> outerjoinpath has local conditions that can't be executed remotely, we
>> have to keep outerjoinpath in the path tree; we will otherwise fail to
>> execute the local conditions. No?
>>
>> +       /*
>> +        * If either inner or outer path is a ForeignPath corresponding to
>> +        * a pushed down join, replace it with the fdw_outerpath, so that we
>> +        * maintain path for EPQ checks built entirely of local join
>> +        * strategies.
>> +        */
>> +       if (IsA(joinpath->outerjoinpath, ForeignPath))
>> +       {
>> +               ForeignPath *foreign_path;
>> +               foreign_path = (ForeignPath *) joinpath->outerjoinpath;
>> +               if (foreign_path->path.parent->reloptkind == RELOPT_JOINREL)
>> +                       joinpath->outerjoinpath = foreign_path->fdw_outerpath;
>> +       }
>
> all the conditions (local and remote) should be part of fdw_outerpath
> as well, since that's the alternate local path, which should produce
> (when converted to the plan) the same result as the foreign path.
> fdw_outerpath should be a local path set when paths for
> outerjoinpath->parent was being created. Am I missing something?

I assumed by mistake that only the remote conditions were evaluated in a
plan created from each fdw_outerpath. Sorry for that. I think that is a
good idea!

Btw, IIUC, I think the patch fails to adjust the targetlist of the top
plan created that way, to output the fdw_scan_tlist, as discussed in [1]
(ie, I think the attached patch is needed, which is created on top of
your patch pg_fdw_join_v8.patch).

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/ca+tgmoba4mskgquicgt5ckbpqj-tmpqefht_wy49ndwa91w...@mail.gmail.com

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 1067,1072 **** postgresGetForeignPlan(PlannerInfo *root,
--- 1067,1076 ----
  		/* Build the list of columns to be fetched from the foreign server. */
  		fdw_scan_tlist = build_tlist_to_deparse(foreignrel);
+ 
+ 		/* Replace the targetlist of outer_plan with fdw_scan_tlist, if any */
+ 		if (outer_plan)
+ 			outer_plan->targetlist = fdw_scan_tlist;
  	}
  
  	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] First-draft release notes for next week's back-branch releases
I've done a first pass at next week's release notes; please review. Committed at http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7008e70d105b572821406744ce080771b74c06ab and should be visible in the website's devel-branch docs after the next guaibasaurus buildfarm run, due a couple hours from now. As usual, I just dumped all the notes into one branch's release-N.N.sgml file, and will sort out which ones apply to which branches later. I chose to put them into 9.4, though, not 9.5, since many of these issues are already fixed in 9.5.0 and will not need to appear in the 9.5.1 section. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Fri, Feb 5, 2016 at 4:23 AM, Ashutosh Bapat wrote:
> I have corrected this in this set of patches. Also, I have included the
> change to build the join relation description while constructing fpinfo in
> the main patch since that avoids repeated building of the same at a small
> cost of constructing relation name for base relations, which goes waste if
> that relation is not going to be part of any pushable join tree.
>
> Ran pgindent as well.

pg_fdw_join_v9.patch does not apply.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On Fri, Feb 5, 2016 at 9:09 AM, Ashutosh Bapat wrote:
>> Would it be nuts to set fdw_scan_tlist in all cases? Would the code
>> come out simpler than what we have now?
>
> PFA the patch that can be applied on v9 patches.
>
> If there is a whole-row reference for base relation, instead of adding
> that as an additional column deparseTargetList() creates a list of all
> the attributes of that foreign table. The whole-row reference gets
> constructed during projection. This saves some network bandwidth while
> transferring the data from the foreign server. If we build the target
> list for base relation, we should include Vars for all the columns (like
> deparseTargetList). Thus we borrow some code from deparseTargetList to
> get all the attributes of a relation. I included this logic in function
> build_tlist_from_attrs_used(), which is being called by
> build_tlist_to_deparse(). So, before calling deparseSelectStmtForRel()
> the callers can just call build_tlist_to_deparse() and pass the
> targetlist to deparseSelectStmtForRel() and use the same to be handed
> over to the core code as fdw_scan_tlist. build_tlist_to_deparse() also
> constructs retrieved_attrs list, so that doesn't need to be passed
> around in deparse routines.
>
> But we now have similar code in three places deparseTargetList(),
> deparseAnalyzeSql() and build_tlist_from_attrs_used(). So, I re-wrote
> deparseTargetList() (which is now used to deparse returning list) to
> call build_tlist_from_attrs_used() followed by
> deparseExplicitTargetList(). The later and its minion deparseVar
> requires a deparse_expr_cxt to be passed. deparse_expr_cxt has a member
> to store RelOptInfo of the relation being deparsed. The callers of
> deparseReturningList() do not have it and thus deparse_expr_cxt required
> changes, which in turn required changes in other places. All in all, a
> larger refactoring. It touches more places than necessary for foreign
> join push down. So, I think, we should try that as a separate
> refactoring work. We may just do the work described in the first
> paragraph for join pushdown, but we will then be left with duplicate
> code, which I don't think is worth the output.

Yeah, I'm not convinced this is actually simpler; at first look, it seems
like it is just moving the complexity around. I don't like the fact that
there are so many places where we have to do one thing for a baserel and
something totally different for a joinrel, which was the point of my
comment. But this seems to add one instance of that and remove one
instance of that, so I don't see how we're coming out better than a wash.
Maybe it's got more merit than I'm giving it credit for, and I'm just not
seeing it right at the moment...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
> Btw, IIUC, I think the patch fails to adjust the targetlist of the top
> plan created that way, to output the fdw_scan_tlist, as discussed in [1]
> (ie, I think the attached patch is needed, which is created on top of
> your patch pg_fdw_join_v8.patch).

fdw_scan_tlist represents the output fetched from the foreign server and
is not necessarily the output of the ForeignScan. The ForeignScan node's
output is represented by the tlist argument to:

1119         return make_foreignscan(tlist,
1120                                 local_exprs,
1121                                 scan_relid,
1122                                 params_list,
1123                                 fdw_private,
1124                                 fdw_scan_tlist,
1125                                 remote_exprs,
1126                                 outer_plan);

This tlist is built using build_path_tlist() for all join plans. IIUC,
all of them output the same targetlist. We don't need to make sure that
the targetlists match as long as we are using the targetlist passed in by
create_scan_plan(). Do you have a counter example?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] silent data loss with ext4 / all current versions
On Thu, Feb 4, 2016 at 2:34 PM, Michael Paquier wrote:
> On Thu, Feb 4, 2016 at 12:02 PM, Michael Paquier wrote:
>> On Tue, Feb 2, 2016 at 4:20 PM, Michael Paquier wrote:
>>> Not wrong, and this leads to the following:
>>> void rename_safe(const char *old, const char *new, bool isdir, int elevel);
>>> Controlling elevel is necessary per the multiple code paths that would
>>> use it. Some use ERROR, most of them FATAL, and a bit of WARNING. Does
>>> that look fine?
>>
>> After really coding it, I finished with the following thing:
>> +int
>> +rename_safe(const char *old, const char *new)
>>
>> There is no need to extend that for directories, well we could of
>> course but all the renames happen on files so I see no need to make
>> that more complicated. More refactoring of the other rename() calls
>> could be done as well by extending rename_safe() with a flag to fsync
>> the data within a critical section, particularly for the replication
>> slot code. I have let that out to not complicate more the patch.
>
> Andres just poked me (2m far from each other now) regarding the fact
> that we should fsync even after the link() calls when
> HAVE_WORKING_LINK is used. So we could lose some meta data here. Mea
> culpa. And the patch misses that.

So, attached is an updated patch that adds a new routine link_safe() to
ensure that a hard link is on-disk persistent. link() is called twice in
timeline.c and once in xlog.c, so those three code paths are impacted. I
noticed as well that my previous patch was sometimes doing palloc calls
in a critical section (oops), I fixed that on the way. Thoughts welcome.

-- 
Michael

xlog-fsync-v5.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
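[Editor's note: for context, a minimal POSIX sketch of the sequence a
routine like rename_safe() has to perform; this is the generic pattern,
not the patch's actual code: flush the file, rename it, then flush the
parent directory so the directory entry itself survives a crash.]

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    rename_durable(const char *oldpath, const char *newpath,
                   const char *dirpath)
    {
        int fd;

        /* make sure the file's contents are on disk under the old name */
        fd = open(oldpath, O_RDWR);
        if (fd < 0)
            return -1;
        if (fsync(fd) < 0)
        {
            close(fd);
            return -1;
        }
        close(fd);

        if (rename(oldpath, newpath) < 0)
            return -1;

        /* persist the rename itself: fsync the containing directory */
        fd = open(dirpath, O_RDONLY);
        if (fd < 0)
            return -1;
        if (fsync(fd) < 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }

Skipping the final directory fsync is exactly the ext4 failure mode the
thread describes: the rename may still be only in the directory's
in-memory state when power is lost, so recovery never sees the new name.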
Re: [HACKERS] Support for N synchronous standby servers - take 2
> We may have a good idea of how to define a custom language, still we > are going to need to design a clean interface at catalog level more or > less close to what is written here. If we can get a clean interface, > the custom language implemented, and TAP tests that take advantage of > this user interface to check the node/group statuses, I guess that we > would be in good shape for this patch. > > Anyway that's not a small project, and perhaps I am over-complicating > the whole thing. Yes. The more I look at this, the worse the idea of custom syntax looks. Yes, I realize there are drawbacks to using JSON, but this is worse. Further, there's a lot of horse-cart inversion here. This proposal involves letting the syntax for sync_list configuration determine the feature set for N-sync. That's backwards; we should decide the total list of features we want to support, and then adopt a syntax which will make it possible to have them. -- Josh Berkus Red Hat OSAS (opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Fri, Feb 5, 2016 at 12:19 PM, Masahiko Sawada wrote:
> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier wrote:
> I agree with adding a new system catalog so users can easily check
> replication status. And a group name will be needed for this.
> What about adding a group name with ":" immediately after a set of
> standbys, as follows?

This way is fine for me.

> 2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]
>
> Also, regarding sync replication according to configuration, the view
> I'm thinking of has the following definition.
>
> =# \d pg_synchronous_replication
>      Column      |  Type   | Modifiers
> ----------------+---------+-----------
>  name           | text    |
>  sync_type      | text    |
>  wait_num       | integer |
>  sync_priority  | integer |
>  sync_state     | text    |
>  member         | text[]  |
>  level          | integer |
>  write_location | pg_lsn  |
>  flush_location | pg_lsn  |
>  apply_location | pg_lsn  |
>
> - "name" : node name or group name, or "main" meaning top level node.

Check.

> - "sync_type" : 'priority' or 'quorum' for group node, otherwise NULL.

That would be one or the other.

> - "wait_num" : number of nodes/groups to wait for in this group.

Check. This is taken directly from the meta data.

> - "sync_priority" : priority of node/group in this group. "main" node
>   has "0".
>     - the standby in a quorum group always has priority 1.
>     - the standby in a priority group has priority according to
>       definition order.

This is a bit confusing if the same node or group is in multiple groups.
My previous suggestion was to list the elements of the group in
increasing order of priority. That's an important point.

> - "sync_state" : 'sync' or 'potential' or 'quorum'.
>     - the standby in a quorum group is always 'quorum'.
>     - the standby in a priority group is 'sync' / 'potential'.

potential and quorum are the same thing, no? The only difference is based
on the group type here.

> - "member" : array of members for group node, otherwise NULL.

This can be NULL only when the entry is a node.

> - "level" : nested level. "main" node is level 0.

Not sure this one is necessary.

> - "write/flush/apply_location" : group/node calculated LSN according
>   to configuration.

This does not need to be part of this catalog; that's a representation of
the data that is part of the WAL sender.

-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] count_nulls(VARIADIC "any")
On 2016-02-05 05:06, Tom Lane wrote:
> I wrote:
>> Pavel Stehule writes:
>>> [ num_nulls_v6.patch ]
>>
>> I started looking through this. It seems generally okay, but I'm not
>> very pleased with the function name "num_notnulls". I think it would be
>> better as "num_nonnulls", as I see Oleksandr suggested already.
>
> Not hearing any complaints, I pushed it with that change and some other
> cosmetic adjustments.

Thanks Tom and Pavel and everyone who provided feedback.

.m

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On Wed, Feb 3, 2016 at 3:31 PM, Etsuro Fujita wrote:
> On 2016/01/28 15:20, Rushabh Lathia wrote:
>> On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita wrote:
>>> On 2016/01/27 21:23, Rushabh Lathia wrote:
>>>> If I understood correctly, above documentation means, that if FDW
>>>> have DMLPushdown APIs that is enough. But in reality thats not the
>>>> case, we need ExecForeignInsert, ExecForeignUpdate, or
>>>> ExecForeignDelete in case DML is not pushable.
>>>>
>>>> And here fact is DMLPushdown APIs are optional for FDW, so that if
>>>> FDW don't have DMLPushdown APIs they can still very well perform the
>>>> DML operations using ExecForeignInsert, ExecForeignUpdate, or
>>>> ExecForeignDelete.
>>>>
>>>> So documentation should be like:
>>>>
>>>> If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
>>>> are assumed to be insertable, updatable, or deletable if the FDW
>>>> provides ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>>>> respectively,
>>>>
>>>> If FDW provides DMLPushdown APIs and the DML are pushable to the
>>>> foreign server, then FDW still needs ExecForeignInsert,
>>>> ExecForeignUpdate, or ExecForeignDelete for the non-pushable DML
>>>> operation.
>>>>
>>>> What's your opinion ?
>>>
>>> I agree that we should add this to the documentation, too.
>
> I added docs to the IsForeignRelUpdatable documentation. Also, a brief
> introductory remark has been added at the beginning of the DML pushdown
> APIs' documentation.
>
>>>> BTW, if I understand correctly, I think we should also modify
>>>> relation_is_updatable() accordingly. Am I right?
>>>
>>> Yep, we need to modify relation_is_updatable().
>
> I thought I'd modify that function in the same way as
> CheckValidResultRel(), but I noticed that we cannot do that, because we
> don't have any information on whether each update is pushed down to the
> remote server by PlanDMLPushdown, during relation_is_updatable().

Sorry I didn't get you here. Can't resultRelInfo->ri_FdwPushdown give
information about whether the update is pushdown-safe or not? What my
concern here is: let's say resultRelInfo->ri_FdwPushdown is marked as
true (PlanDMLPushdown returned true), but later CheckValidResultRel()
finds that the BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown
APIs are missing, and it will end up with an error.

What I think CheckValidResultRel() should do is: if
resultRelInfo->ri_FdwPushdown is true, then check for the DMLPushdown
APIs, and if it doesn't find those APIs, then check for the traditional
APIs (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). And
when it doesn't find either, then it should return an error.

I changed CheckValidResultRel(), where

1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and the
DMLPushdown APIs are missing, as the query can still perform the
operation with the traditional ExecForeign APIs. So in this situation
just mark resultRelInfo->ri_FdwPushdown as false.

(Wondering, can we add the checks for the DMLPushdown APIs into
PlanDMLPushdown as an additional check? Meaning PlanDMLPushdown should
return true only if the FDW provides the BeginDMLPushdown &
IterateDMLPushdown & EndDMLPushdown APIs? What do you say?)

2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and the
DMLPushdown APIs are present but the ExecForeign APIs are missing.

3) Throw an error if resultRelInfo->ri_FdwPushdown is false and the
ExecForeign APIs are missing.

Attached is the WIP patch; do share your thoughts.
(Needs to be applied on top of the v6 patch.)

> So, I left that function as-is. relation_is_updatable() is just used
> for display in the information_schema views, so ISTM that that function
> is fine as-is. (As for CheckValidResultRel(), I revised it so as to
> check the presence of DML pushdown APIs after checking the existing
> APIs if the given command will be pushed down. The reason is because we
> assume the presence of the existing APIs, anyway.)
>
> I revised other docs and some comments, mostly for consistency.
>
> Attached is an updated version of the patch, which has been created on
> top of the updated version of the bugfix patch posted by Robert in [1]
> (attached).
>
> Best regards,
> Etsuro Fujita
>
> [1] http://www.postgresql.org/message-id/ca+tgmoz40j2uc5ac1nxu03oj4crvolks15xx+ptfp-1u-8z...@mail.gmail.com

-- 
Rushabh Lathia

diff --git a/src/backend/executor/.execMain.c.swp b/src/backend/executor/.execMain.c.swp
index 6f54fd9..8a29cc6 100644
Binary files a/src/backend/executor/.execMain.c.swp and b/src/backend/executor/.execMain.c.swp differ
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index eb24051..04d90ea 100644
--- a/src/backend/executor/execMain.c
+++
Re: [HACKERS] Relation extension scalability
On Tue, Feb 2, 2016 at 9:19 PM, Andres Freund wrote:
> On 2016-01-28 16:53:08 +0530, Dilip Kumar wrote:
>> test_script:
>>
>> ./psql -d postgres -c "truncate table data"
>> ./psql -d postgres -c "checkpoint"
>> ./pgbench -f copy_script -T 120 -c$ -j$ postgres
>>
>> Shared Buffers: 48GB
>> Table: Unlogged Table
>> pgbench -c$ -j$ -f -M Prepared postgres
>>
>> Clients    Base    patch
>> 1          178     180
>> 2          337     338
>> 4          265     601
>> 8          167     805
>
> Could you also measure how this behaves for an INSERT instead of a COPY
> workload?

I think such a test will be useful.

> I'm doubtful that anything that does the victim buffer search while
> holding the extension lock will actually scale in a wide range of
> scenarios.

I think the problem with the victim buffer could be visible if the blocks
are dirty and it needs to write the dirty buffer, and especially as the
patch is written, where after acquiring the extension lock it again tries
to extend the relation without checking if it can get a page with space
from the FSM. It seems to me that we should re-check the availability of
a page, because while one backend is waiting on the extension lock,
another backend might have added pages. To re-check the availability we
might want to use something similar to LWLockAcquireOrWait() semantics as
used during WAL writing.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
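[Editor's note: a rough pseudocode sketch of that re-check loop. The FSM
and buffer calls (GetPageWithFreeSpace, ReadBuffer) are real APIs;
ConditionalLockOrWaitRelationForExtension is a hypothetical
heavyweight-lock analogue of LWLockAcquireOrWait, which returns true if
the lock was acquired immediately and false if it only waited:]

    for (;;)
    {
        BlockNumber blkno = GetPageWithFreeSpace(relation, spaceNeeded);

        if (blkno != InvalidBlockNumber)
            return ReadBuffer(relation, blkno); /* someone else extended */

        if (ConditionalLockOrWaitRelationForExtension(relation))
            break;      /* got the lock without sleeping: we must extend */

        /*
         * We slept while another backend held the extension lock; it has
         * probably added pages, so loop around and probe the FSM again.
         */
    }
    buffer = ReadBuffer(relation, P_NEW);   /* actually extend */
    UnlockRelationForExtension(relation, ExclusiveLock);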
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On Fri, Feb 5, 2016 at 4:46 PM, Rushabh Lathia wrote:
> [...]
> What I think CheckValidResultRel() should do is: if
> resultRelInfo->ri_FdwPushdown is true, then check for the DMLPushdown
> APIs, and if it doesn't find those APIs, then check for the traditional
> APIs (ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete). And
> when it doesn't find either, then it should return an error.
>
> I changed CheckValidResultRel(), where
>
> 1) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and the
> DMLPushdown APIs are missing, as the query can still perform the
> operation with the traditional ExecForeign APIs. So in this situation
> just mark resultRelInfo->ri_FdwPushdown as false.
>
> (Wondering, can we add the checks for the DMLPushdown APIs into
> PlanDMLPushdown as an additional check? Meaning PlanDMLPushdown should
> return true only if the FDW provides the BeginDMLPushdown &
> IterateDMLPushdown & EndDMLPushdown APIs? What do you say?)

On another thought, we should not give the responsibility of checking for
the APIs to the FDW. So maybe we should call PlanDMLPushdown only if the
BeginDMLPushdown & IterateDMLPushdown & EndDMLPushdown APIs are present
in the FDW. That means preparing the DMLPushdown plan only when all the
required APIs are available with the FDW. This will also reduce the
changes to CheckValidResultRel().

Thanks Ashutosh Bapat for the healthy discussion.

PFA patch.

> 2) Don't throw an error if resultRelInfo->ri_FdwPushdown is true and the
> DMLPushdown APIs are present but the ExecForeign APIs are missing.
>
> 3) Throw an error if resultRelInfo->ri_FdwPushdown is false and the
> ExecForeign APIs are missing.
>
> Attached is the WIP patch; do share your thoughts.
> (Needs to be applied on top of the v6 patch.)
>
>> So, I left that function as-is. relation_is_updatable() is just used
>> for display in the information_schema views, so ISTM that that
>> function is fine as-is. (As for CheckValidResultRel(), I revised it so
>> as to check the presence of DML pushdown APIs after checking the
>> existing APIs if the given command will be pushed down. The reason is
>> because we assume the presence of the existing APIs, anyway.)
>>
>> I revised other docs and some comments, mostly for consistency.
>>
>> Attached is an updated version of the patch,
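[Editor's note: in rough pseudocode, the planner-side gate being proposed
would look something like this; the callback names are the patch's under
review, not committed PostgreSQL code, and the exact signature is an
assumption:]

    /* only consider DML pushdown if the FDW can also execute it */
    if (fdwroutine->PlanDMLPushdown != NULL &&
        fdwroutine->BeginDMLPushdown != NULL &&
        fdwroutine->IterateDMLPushdown != NULL &&
        fdwroutine->EndDMLPushdown != NULL)
        pushed_down = fdwroutine->PlanDMLPushdown(root, plan,
                                                  resultRelation,
                                                  subplan_index);
    else
        pushed_down = false;    /* fall back to the per-row ExecForeign APIs */

With that gate in place, CheckValidResultRel() never sees
ri_FdwPushdown == true without the executor callbacks to back it up.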
Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
On Wed, Feb 3, 2016 at 11:31 AM, Robert Haas wrote: > Thanks for the review. I fixed the OID conflict, tweaked a few > comments, and committed this. Thanks. I noticed a tiny, preexisting typo in the string abbreviated key code. The attached patch fixes it. -- Peter Geoghegan diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 1a74e5e..5e7536a 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -2154,7 +2154,7 @@ varstr_abbrev_convert(Datum original, SortSupport ssup) len = bpchartruelen(authoritative_data, len); /* - * If we're using the C collation, use memcmp(), rather than strxfrm(), to + * If we're using the C collation, use memcpy(), rather than strxfrm(), to * abbreviate keys. The full comparator for the C locale is always * memcmp(). It would be incorrect to allow bytea callers (callers that * always force the C collation -- bytea isn't a collatable type, but this
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
Would it be nuts to set fdw_scan_tlist in all cases? Would the code > come out simpler than what we have now? > > PFA the patch that can be applied on the v9 patches. If there is a whole-row reference for a base relation, instead of adding that as an additional column, deparseTargetList() creates a list of all the attributes of that foreign table. The whole-row reference gets constructed during projection. This saves some network bandwidth while transferring the data from the foreign server. If we build the target list for a base relation, we should include Vars for all the columns (like deparseTargetList). Thus we borrow some code from deparseTargetList to get all the attributes of a relation. I included this logic in function build_tlist_from_attrs_used(), which is called by build_tlist_to_deparse(). So, before calling deparseSelectStmtForRel(), the callers can just call build_tlist_to_deparse(), pass the targetlist to deparseSelectStmtForRel(), and use the same targetlist to be handed over to the core code as fdw_scan_tlist. build_tlist_to_deparse() also constructs the retrieved_attrs list, so that doesn't need to be passed around in the deparse routines. But we now have similar code in three places: deparseTargetList(), deparseAnalyzeSql() and build_tlist_from_attrs_used(). So, I rewrote deparseTargetList() (which is now used to deparse the returning list) to call build_tlist_from_attrs_used() followed by deparseExplicitTargetList(). The latter and its minion deparseVar require a deparse_expr_cxt to be passed. deparse_expr_cxt has a member to store the RelOptInfo of the relation being deparsed. The callers of deparseReturningList() do not have it, and thus deparse_expr_cxt required changes, which in turn required changes in other places. All in all, a larger refactoring. It touches more places than necessary for foreign join pushdown. So, I think, we should try that as separate refactoring work. We may just do the work described in the first paragraph for join pushdown, but we will then be left with duplicate code, which I don't think is worth it.
-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index fb72f45..7a2a67b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -93,9 +93,10 @@ typedef struct foreign_loc_cxt typedef struct deparse_expr_cxt { PlannerInfo *root; /* global planner state */ - RelOptInfo *foreignrel; /* the foreign relation we are planning for */ + Relids relids; /* Relids for which to deparse */ StringInfo buf; /* output buffer to append to */ List **params_list; /* exprs that will become remote Params */ + bool is_joinrel; /* Are we deparsing for a join relation */ } deparse_expr_cxt; #define REL_ALIAS_PREFIX "r" @@ -122,8 +123,7 @@ static void deparseTargetList(StringInfo buf, Bitmapset *attrs_used, List **retrieved_attrs, bool qualify_col); -static void deparseExplicitTargetList(List *tlist, List **retrieved_attrs, - deparse_expr_cxt *context); +static void deparseExplicitTargetList(List *tlist, deparse_expr_cxt *context); static void deparseReturningList(StringInfo buf, PlannerInfo *root, Index rtindex, Relation rel, bool trig_after_row, @@ -151,13 +151,15 @@ static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod, deparse_expr_cxt *context); static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod, deparse_expr_cxt *context); -static void deparseSelectSql(List *tlist, List **retrieved_attrs, - deparse_expr_cxt *context); +static void deparseSelectSql(List *tlist, RelOptInfo *rel, + deparse_expr_cxt *context); static void deparseLockingClause(deparse_expr_cxt *context); static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context); static void appendConditions(List *exprs, deparse_expr_cxt *context); static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *joinrel, bool use_alias, List **params_list); +static List *build_tlist_from_attrs_used(Index rtindex, Bitmapset *attrs_used, + PlannerInfo *root, List **retrieved_attrs); /* @@ -715,26 +717,119 @@ deparse_type_name(Oid type_oid, int32 typemod) } /* + * Convert input bitmap representation of columns into targetlist of + * corresponding Var nodes. + * + * List of
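[Editor's note: the patch text breaks off above, but to make the described bitmap-to-Var conversion concrete, here is a rough sketch. It follows the usual convention that attrs_used stores attno - FirstLowInvalidHeapAttributeNumber; the function is illustrative only and omits the whole-row case, TargetEntry wrapping, and retrieved_attrs bookkeeping that the actual patch handles:]

#include "postgres.h"
#include "access/sysattr.h"
#include "access/tupdesc.h"
#include "nodes/bitmapset.h"
#include "nodes/makefuncs.h"
#include "nodes/pg_list.h"

/* Illustrative only: build Vars for every referenced, non-dropped column. */
static List *
tlist_vars_from_attrs_used(Index rtindex, Bitmapset *attrs_used,
                           TupleDesc tupdesc)
{
    List   *vars = NIL;
    int     i;

    for (i = 1; i <= tupdesc->natts; i++)
    {
        Form_pg_attribute attr = tupdesc->attrs[i - 1];

        /* skip dropped columns and columns the query never references */
        if (attr->attisdropped ||
            !bms_is_member(i - FirstLowInvalidHeapAttributeNumber,
                           attrs_used))
            continue;

        vars = lappend(vars,
                       makeVar(rtindex, i, attr->atttypid, attr->atttypmod,
                               attr->attcollation, 0));
    }
    return vars;
}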
[HACKERS] 9.6 Release Schedule
Per discussion at the Brussels developer meeting and within the release team, the high-level schedule for 9.6 will be:

Beta: May (before PGCon)
Release: September

-- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Comment typo in slot.c
On Thu, Feb 4, 2016 at 4:39 AM, Michael Paquier wrote: > I just bumped into the following typo in slot.c: > /* > * If we'd now fail - really unlikely - we wouldn't know > whether this slot > * would persist after an OS crash or not - so, force a restart. The > -* restart would try to fysnc this again till it works. > +* restart would try to fsync this again till it works. > */ > START_CRIT_SECTION(); Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Refactoring of LWLock tranches
On Thu, Feb 4, 2016 at 11:30 PM, Amit Kapila wrote: > On Fri, Feb 5, 2016 at 3:17 AM, Robert Haas wrote: >> >> On Thu, Feb 4, 2016 at 7:00 AM, Amit Kapila >> wrote: >> > [ new patch ] >> >> I've committed this after heavy rewriting. In particular, I merged >> two loops, used local variables to get rid of the very long and quite >> repetitive lines, and did various cleanup on the documentation and >> comments. >> > > I think there is a typo in the committed code; a patch to fix the same > is attached. Thanks! Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses
On Fri, Feb 5, 2016 at 6:14 AM, Peter Geoghegan wrote: > On Wed, Feb 3, 2016 at 11:31 AM, Robert Haas wrote: >> Thanks for the review. I fixed the OID conflict, tweaked a few >> comments, and committed this. > > Thanks. I noticed a tiny, preexisting typo in the string abbreviated > key code. The attached patch fixes it. Gosh, I must have looked at that line 10 times without seeing that mistake. Thanks, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/04 21:57, Ashutosh Bapat wrote: One more: I think the following in postgresGetForeignJoinPaths() is a good idea, but I think it's okay to just check whether root->rowMarks is non-NIL, because, since we have rowmarks for all base relations except the target, if we have root->parse->commandType==CMD_DELETE (or root->parse->commandType==CMD_UPDATE), then there would be at least one non-target base relation in the joinrel, which would have a rowmark. Sorry, I am unable to understand it fully. But what you are suggesting is that if there are root->rowMarks, then we are sure that there is at least one base relation apart from the target, which needs row locking. Even if we don't have one, changes in a row of the target relation after it was scanned can still result in firing an EPQ check, which would need the local plan to be executed; thus even if root->rowMarks is NIL, an EPQ check can fire and we will need an alternate local plan. Yeah, I think that is true, but if root->rowMarks==NIL, we won't have non-target foreign tables, and therefore postgresGetForeignJoinPaths() will never be called. No? Best regards, Etsuro Fujita
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier wrote: > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier > wrote: >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier >> wrote: >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas wrote: On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier wrote: > Yes, please let's use the custom language, and let's not care of not > more than 1 level of nesting so as it is possible to represent > pg_stat_replication in a simple way for the user. "not" is used twice in this sentence in a way that renders me not able to be sure that I'm not understanding it not properly. >>> >>> 4 times here. Score beaten. >>> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine >>> to only support configurations up to one level of nested objects, like these:

>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3

>>> In short, we could restrict things so that we cannot define a group of >>> nodes within an existing group. >> >> No, actually, that's stupid. Having up to two nested levels makes more >> sense; a quite common case for this feature would be something like this:

>> 2{node1,[node2,node3]}

>> In short, sync confirmation is awaited from node1 and (node2 or node3). >> >> Flattening groups of nodes with a new catalog will be necessary to >> ease the view of this data for users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group

> So, here are some thoughts to make that more user-friendly. I think > that the critical issue here is to properly flatten the metadata in > the custom language and represent it properly in a new catalog, > without messing up too much with the existing pg_stat_replication that > people are now used to after 5 releases since 9.0. So, I would think > that we will need to have a new catalog, say > pg_stat_replication_groups, with the following things:
> - One line of this catalog represents the status of a group or of a single node.
> - The status of a node/group is either sync or potential; if a node/group is specified more than once, it may be both sync and potential depending on where it is defined, in which case setting its status to 'sync' makes the most sense. If it is in sync state, I guess.
> - Move sync_priority and sync_state (actually an equivalent from pg_stat_replication) into this new catalog, because those represent the status of a node or group of nodes.
> - Group name, and by that I think that we had perhaps better make it mandatory to attach a name to a quorum or priority group. The group at the highest level is forcibly named 'top', 'main', or whatever if not directly specified by the user. If the entry is directly a node, use the application_name.
> - Type of group, quorum or priority.
> - Elements in this group; an element can be a group name or a node name, aka application_name. If the group is of type priority, the elements are listed in increasing order, so the elements with lower priority values come first, etc. We could have one column listing explicitly a list of integers that map to the elements of a group, but it does not seem worth it; what users would like to know is which nodes are prioritized. This covers the former 'priority' field of pg_stat_replication.

> We may have a good idea of how to define a custom language, but we > are still going to need to design a clean interface at the catalog level, more or > less close to what is written here.
If we can get a clean interface, > the custom language implemented, and TAP tests that take advantage of > this user interface to check the node/group statuses, I guess that we > would be in good shape for this patch. > > Anyway that's not a small project, and perhaps I am over-complicating > the whole thing. >

I agree with adding a new system catalog so that users can easily check replication status. And a group name will be needed for this. What about adding a group name with ":" immediately after a set of standbys, as follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]

Also, regarding sync replication according to this configuration, the view I'm thinking of has the following definition.

=# \d pg_synchronous_replication
     Column     |  Type   | Modifiers
----------------+---------+-----------
 name           | text    |
 sync_type      | text    |
 wait_num       | integer |
 sync_priority  | integer |
 sync_state     | text    |
 member         | text[]  |
 level          | integer |
 write_location | pg_lsn  |
 flush_location | pg_lsn  |
 apply_location | pg_lsn  |

- "name" : node name or group name, or "main" meaning the top-level node.
- "sync_type" : 'priority' or 'quorum' for group
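[Editor's note: as a way to pin down the semantics under discussion (a quorum group waits for wait_num of its members; priority only affects which standbys are chosen, not this check), here is a toy model of evaluating whether a nested group is satisfied. Everything below is an assumption-laden sketch of the proposal, not code from any patch:]

#include <stdbool.h>

typedef enum SyncKind { SYNC_NODE, SYNC_QUORUM, SYNC_PRIORITY } SyncKind;

/* Toy parse-tree node for something like "2[local, 2[london1, ...]:london]" */
typedef struct SyncElem
{
    SyncKind          kind;
    const char       *name;      /* node or group name */
    int               wait_num;  /* confirmations required from members */
    struct SyncElem **members;
    int               nmembers;
} SyncElem;

/*
 * A group is satisfied once wait_num of its members are satisfied; a
 * leaf is satisfied when that standby has confirmed.  Arbitrary nesting
 * falls out of the recursion.
 */
static bool
sync_satisfied(const SyncElem *e, bool (*confirmed)(const char *name))
{
    int i, n = 0;

    if (e->kind == SYNC_NODE)
        return confirmed(e->name);

    for (i = 0; i < e->nmembers; i++)
        if (sync_satisfied(e->members[i], confirmed))
            n++;
    return n >= e->wait_num;
}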
[HACKERS] Fix for OpenSSL error queue bug
Attached patch fixes an issue reported by William Felipe Welter about a year ago [1]. It is loosely based on his original patch. As Heikki goes into on that thread, the appropriate action seems to be to constantly reset the error queue, and to make sure that we ourselves clear the queue consistently. (Note that we might not have consistently called ERR_get_error() in the event of an OOM within SSLerrmessage(), for example.) I have not changed backend code in the patch, though; I felt that we had enough control of the general situation there for it to be unnecessary to lock everything down. The interface that OpenSSL exposes for all of this is very poorly thought out. It's not exactly clear how a client of OpenSSL can be a "good citizen" in handling the error queue. Correctly using the library is only ever described in terms of a very exact thing that must happen or must not happen. There is no overarching concept of how things fit together so that each OpenSSL client doesn't clobber the other. It's all rather impractical. Clearly, this patch needs careful review. [1] http://www.postgresql.org/message-id/flat/20150224030956.2529.83...@wrigleys.postgresql.org#20150224030956.2529.83...@wrigleys.postgresql.org -- Peter Geoghegan

From d5433bc562f792afd75ef8664729db9e6a60ee62 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Tue, 26 Jan 2016 15:11:15 -0800
Subject: [PATCH] Distrust external OpenSSL clients; clear err queue

OpenSSL has an unfortunate tendency to mix up per-session state error handling and per-thread OpenSSL error handling. This can cause problems when programs that link to libpq with OpenSSL enabled have some other use of OpenSSL; without care, one caller of OpenSSL may cause problems for the other caller. To fix, don't trust other users of OpenSSL to clear the per-thread error queue. Instead, clear the entire per-thread queue ahead of certain I/O operations (these I/O operations mostly need to call SSL_get_error() to check for success, which relies on the queue being empty). This is a bit aggressive, but it's pretty clear that the other callers have a very dubious claim to ownership of the per-thread queue; problem reports involving PHP scripts that use both PHP's OpenSSL extension and the pgsql PHP extension (libpq) are themselves evidence of this. Finally, be more careful about clearing our own error queue, so as to not cause these problems ourselves. It's possible that control previously did not always reach SSLerrmessage(), where ERR_get_error() was supposed to be called to clear the queue's earliest code. Make sure ERR_get_error() is always called, so as to spare other users of OpenSSL the possibility of similar problems caused by libpq (as opposed to problems caused by a third-party OpenSSL library like PHP's OpenSSL extension).
--- src/interfaces/libpq/fe-secure-openssl.c | 125 +-- 1 file changed, 104 insertions(+), 21 deletions(-) diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c index 133546b..270d184 100644 --- a/src/interfaces/libpq/fe-secure-openssl.c +++ b/src/interfaces/libpq/fe-secure-openssl.c @@ -70,7 +70,7 @@ static int verify_peer_name_matches_certificate_name(PGconn *conn, static void destroy_ssl_system(void); static int initialize_SSL(PGconn *conn); static PostgresPollingStatusType open_client_SSL(PGconn *); -static char *SSLerrmessage(void); +static char *SSLerrmessage(unsigned long errcode); static void SSLerrfree(char *buf); static int my_sock_read(BIO *h, char *buf, int size); @@ -152,7 +152,7 @@ pgtls_open_client(PGconn *conn) !SSL_set_app_data(conn->ssl, conn) || !my_SSL_set_fd(conn, conn->sock)) { - char *err = SSLerrmessage(); + char *err = SSLerrmessage(ERR_get_error()); printfPQExpBuffer(&conn->errorMessage, libpq_gettext("could not establish SSL connection: %s\n"), @@ -209,11 +209,37 @@ pgtls_read(PGconn *conn, void *ptr, size_t len) int result_errno = 0; char sebuf[256]; int err; + unsigned long errcode; rloop: SOCK_ERRNO_SET(0); + + /* + * Prepare to call SSL_get_error() by clearing thread's OpenSSL error + * queue. In general, the current thread's error queue must be empty + * before the TLS/SSL I/O operation is attempted, or SSL_get_error() + * will not work reliably. Since the possibility exists that other + * OpenSSL clients running in the same thread but not under our control + * will fail to call ERR_get_error() themselves (after their own I/O + * operations), pro-actively clear the per-thread error queue now. + */ + ERR_clear_error(); n = SSL_read(conn->ssl, ptr, len); err = SSL_get_error(conn->ssl, n); + + /* + * Other clients of OpenSSL may fail to call ERR_get_error(), but we + * always do (so as to not cause problems for OpenSSL clients that + * don't call ERR_clear_error() defensively, but are responsible enough + * to call ERR_get_error() to clear error
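[Editor's note: the shape of the fix is easier to see outside the diff. Below is a condensed sketch of the calling pattern the patch adopts; all OpenSSL calls shown are standard API, but the surrounding function is illustrative, not libpq code:]

#include <stdio.h>
#include <openssl/ssl.h>
#include <openssl/err.h>

static int
careful_ssl_read(SSL *ssl, void *buf, int len)
{
    int           n, err;
    unsigned long ecode;

    /*
     * Don't trust other in-process OpenSSL users to have drained the
     * per-thread queue; SSL_get_error() is only reliable if it's empty.
     */
    ERR_clear_error();

    n = SSL_read(ssl, buf, len);
    err = SSL_get_error(ssl, n);

    /*
     * Always consume our own entries too, so we don't become the
     * ill-behaved client for whoever calls into OpenSSL next.
     */
    if (err != SSL_ERROR_NONE)
        while ((ecode = ERR_get_error()) != 0)
            fprintf(stderr, "SSL error: %s\n", ERR_error_string(ecode, NULL));
    return n;
}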
[HACKERS] Explanation for bug #13908: hash joins are badly broken
I have sussed what's happening in bug #13908. Basically, commit 45f6240a8fa9d355 ("Pack tuples in a hash join batch densely, to save memory") broke things for the case where a hash join is using a skew table. The reason is that that commit only changed the storage of tuples going into the main hash table; tuples going into the skew table are still allocated with a palloc apiece, without being put into the "chunk" storage. Now, if we're loading the hash table and we find that we've exceeded the storage allowed for skew tuples, ExecHashRemoveNextSkewBucket wants to push some skew tuples back into the main hash table; and it believes that linking such tuples into the appropriate main hashbucket chain is sufficient for that. Which it was before the aforesaid commit, and still is in simple cases. However, if we later try to do ExecHashIncreaseNumBatches, that function contains new code that assumes that it can find all tuples in the main hashtable by scanning the "chunk" storage directly. Thus, the pushed-back tuples are not scanned and are neither re-entered into the hash table nor dumped into a batch file. So they won't get joined. It looks like ExecHashIncreaseNumBuckets, if it were to run after some executions of ExecHashRemoveNextSkewBucket, would break things in the same way. That's not what's happening in this particular test case, though. I'm of the opinion that this is a stop-ship bug for 9.5.1. Barring somebody producing a fix over the weekend, I will deal with it by reverting the aforementioned commit. regards, tom lane
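[Editor's note: to spell out the broken invariant, ExecHashIncreaseNumBatches assumes every live tuple is reachable by walking the chunk storage, while the pushed-back skew tuples are reachable only via bucket chains. A toy illustration of why a chunk walk misses separately allocated tuples follows; this is not PostgreSQL code and the names are invented:]

#include <stddef.h>

typedef struct Tuple
{
    struct Tuple *next_in_bucket;   /* hash bucket chain link */
    /* ... tuple data ... */
} Tuple;

typedef struct Chunk
{
    struct Chunk *next;
    int           ntuples;
    Tuple         tuples[64];       /* densely packed storage */
} Chunk;

/*
 * Visits only tuples that live inside chunks.  A tuple that was
 * allocated on its own and merely linked into a bucket chain (as the
 * pushed-back skew tuples are) is never visited here, so it would get
 * neither re-bucketed nor written to a batch file.
 */
static void
walk_chunked_tuples(Chunk *chunks, void (*visit)(Tuple *))
{
    Chunk *c;
    int    i;

    for (c = chunks; c != NULL; c = c->next)
        for (i = 0; i < c->ntuples; i++)
            visit(&c->tuples[i]);
}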
Re: [HACKERS] proposal: make NOTIFY list de-duplication optional
Robert Haas writes: > On Fri, Feb 5, 2016 at 10:17 AM, Filip Rembiałkowski > wrote: >> - new GUC in "Statement Behaviour" section, notify_duplicate_removal > I agree with what Merlin said about this: > http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yc10ahu1nfia1tbhsg+35ds-oeueuapo7...@mail.gmail.com Yeah, I agree that a GUC for this is quite unappetizing. One idea would be to build a hashtable to aid with duplicate detection (perhaps only once the pending-notify list gets long). Another thought is that it's already the case that duplicate detection is something of a "best effort" activity; note for example the comment in AsyncExistsPendingNotify pointing out that we don't collapse duplicates across subtransactions. Would it be acceptable to relax the standards a bit further? For example, if we only checked for duplicates among the last N notification list entries (for N say around 100), we'd probably cover just about all the useful cases, and the runtime would stay linear. The data structure isn't tremendously conducive to that, but it could be done. regards, tom lane
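[Editor's note: a minimal sketch of the last-N idea, assuming a most-recent-first linked list; async.c's actual pending-notifies structure differs, which is the "not tremendously conducive" caveat above:]

#include <stdbool.h>
#include <string.h>

#define DEDUP_WINDOW 100            /* "N say around 100" */

typedef struct Notification
{
    struct Notification *prev;      /* next-most-recent entry */
    const char *channel;
    const char *payload;
} Notification;

/*
 * Only the newest DEDUP_WINDOW entries are compared, so each NOTIFY
 * costs O(DEDUP_WINDOW) and the transaction total stays linear.
 * Duplicates older than the window are simply redelivered, which the
 * "best effort" semantics already permit.
 */
static bool
recently_pending(const Notification *newest,
                 const char *channel, const char *payload)
{
    const Notification *n;
    int  i;

    for (n = newest, i = 0; n != NULL && i < DEDUP_WINDOW; n = n->prev, i++)
        if (strcmp(n->channel, channel) == 0 &&
            strcmp(n->payload, payload) == 0)
            return true;
    return false;
}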
Re: [HACKERS] Review: GiST support for UUIDs
On 12/25/2015 03:23 AM, Ildus Kurbangaliev wrote: On Fri, 25 Dec 2015 13:34:25 +0300 Teodor Sigaev wrote: First, variables (high and low) should not be declared in the middle of code. Second, assert will fail if ua.v64[0] == ub.v64[0] and ua.v64[1] > ub.v64[1] although it's a possible and allowed case. Third, actually you multiply high value by 2^64 and low by 2^-64. It seems only one multiplication is needed. Thank you for the review. Fixed. Just wanted to follow up on this and see about getting it added to the next commitfest. It looks like Ildus's latest patch (btree_gist_uuid_5.patch) doesn't appear on the commitfest page here: https://commitfest.postgresql.org/7/332/ Also the last few emails of the thread don't show up, although you can read them here: http://www.postgresql.org/message-id/20151225142340.46e577dd@lp I'm not sure of the next step here. Do I click "Move to next CF" in the "Status" dropdown? Apologies for not being familiar with the protocol. I'd be sad to see this work just get ignored. Thanks, Paul
Re: [HACKERS] proposal: make NOTIFY list de-duplication optional
On Fri, Feb 5, 2016 at 10:17 AM, Filip Rembiałkowski wrote: > - new GUC in "Statement Behaviour" section, notify_duplicate_removal > (default true) > > Initial discussion in this thread: > http://www.postgresql.org/message-id/CAP_rwwmpzk9=sbjrztod05bdctyc43wnknu_m37dygvl4sa...@mail.gmail.com > > Rationale: for some legitimate use cases, duplicate removal is not required, > and it gets O(N^2) cost on large COPY/insert transactions. I agree with what Merlin said about this: http://www.postgresql.org/message-id/CAHyXU0yoHe8Qc=yc10ahu1nfia1tbhsg+35ds-oeueuapo7...@mail.gmail.com -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Mon, Sep 07, 2015 at 05:05:10PM +0100, Greg Stark wrote: > I feel like I remember hearing about this before but I can't find any > mention of it in my mail archives. It seems pretty simple to add > support for LLVM's Address Sanitizer (asan) by using the hooks we > already have for valgrind. Nice. > In fact I think this would actually be sufficient. I'm not sure what > the MEMPOOL valgrind stuff is though. I don't think it's relevant to > address sanitizer which only tracks references to free'd or > unallocated pointers. Documentation: http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools A Valgrind memory pool is the precise analog of a PostgreSQL memory context. The design of the PostgreSQL Valgrind support gave mcxt.c primary responsibility for validating and invalidating pointers, and mcxt.c uses MEMPOOL client requests to do so. While Valgrind could catch all the same errors with just VALGRIND_MAKE_MEM_{DEFINED,NOACCESS,UNDEFINED}, the MEMPOOL client requests improve the attribution of errors. You get this:

==00:00:14:08.472 471537== Address 0x8ecd848 is 24 bytes inside a block of size 25 client-defined
==00:00:14:08.472 471537==    at 0x7717FD: MemoryContextAlloc (mcxt.c:589)

Instead of this:

==00:00:02:29.587 18940== Address 0x6a1b51b is 83,835 bytes inside a block of size 262,144 alloc'd
==00:00:02:29.587 18940==    at 0x4C2353F: malloc (vg_replace_malloc.c:236)
==00:00:02:29.587 18940==    by 0x7FD12F: AllocSetAlloc (aset.c:792)
==00:00:02:29.587 18940==    by 0x7FEE9D: MemoryContextAlloc (mcxt.c:524)

> I don't even see any need offhand for a configure flag or autoconf > test. We could have a configure flag just to be consistent with > valgrind but it seems pointless. If you're compiling with asan I don't > see any reason to not use it. I'm building this to see if it works > now. I agree. A flag guards Valgrind client requests, because we'd otherwise have no idea whether the user plans to run the binary under Valgrind. For ASAN, all relevant decisions happen at build time. > --- a/src/include/utils/memdebug.h > +++ b/src/include/utils/memdebug.h > @@ -19,6 +19,27 @@ > > #ifdef USE_VALGRIND > #include <valgrind/memcheck.h> > + > +#elif __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__) > + > +#include <sanitizer/asan_interface.h> > + > +#define VALGRIND_MAKE_MEM_DEFINED(addr, size) \ > + ASAN_UNPOISON_MEMORY_REGION(addr, size) > + > +#define VALGRIND_MAKE_MEM_NOACCESS(addr, size) \ > + ASAN_POISON_MEMORY_REGION(addr, size) > + > +#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) \ > + ASAN_UNPOISON_MEMORY_REGION(addr, size) > + > +#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) do {} while (0) > +#define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed) do {} while (0) > +#define VALGRIND_DESTROY_MEMPOOL(context) do {} while (0) > +#define VALGRIND_MEMPOOL_ALLOC(context, addr, size) do {} while (0) > +#define VALGRIND_MEMPOOL_FREE(context, addr) do {} while (0) > +#define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size) do {} while (0) aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit VALGRIND_MAKE_MEM_NOACCESS(). #define those two accordingly. If ASAN has no equivalent of MEMPOOL client requests, it's fine to stub out the rest.
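[Editor's note: concretely, the two implicit-behavior macros could be completed along these lines. This is a sketch of the review suggestion, not committed code; lookup_chunk_size() is a hypothetical helper, needed because VALGRIND_MEMPOOL_FREE carries no size argument while ASAN_POISON_MEMORY_REGION requires one:]

/*
 * Sketch only: ASAN has no defined/undefined distinction, so the
 * "undefined" semantics of MEMPOOL_ALLOC map to a plain unpoison.
 */
#define VALGRIND_MEMPOOL_ALLOC(context, addr, size) \
    ASAN_UNPOISON_MEMORY_REGION(addr, size)

/*
 * Sketch only: lookup_chunk_size() is hypothetical; a real version
 * would have to recover the chunk's size from the memory context.
 */
#define VALGRIND_MEMPOOL_FREE(context, addr) \
    ASAN_POISON_MEMORY_REGION(addr, lookup_chunk_size(context, addr))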
Re: [HACKERS] Explanation for bug #13908: hash joins are badly broken
* Tom Lane (t...@sss.pgh.pa.us) wrote: > I'm of the opinion that this is a stop-ship bug for 9.5.1. Barring > somebody producing a fix over the weekend, I will deal with it by > reverting the aforementioned commit. Agreed. Thanks! Stephen