Re: [HACKERS]
On Wed, Oct 26, 2016 at 2:51 PM, Masahiko Sawada wrote:
> On Fri, Oct 21, 2016 at 2:38 PM, Ashutosh Bapat wrote:
>> On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas wrote:
>>> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote wrote:
>>>> However, when I briefly read the description in "Transaction Management in the R* Distributed Database Management System (C. Mohan et al)" [2], it seems that what Ashutosh is saying might be a correct way to proceed after all:
>>>
>>> I think Ashutosh is mostly right, but I think there's a lot of room to doubt whether the design of this patch is good enough that we should adopt it.
>>>
>>> Consider two possible designs. In design #1, the leader performs the commit locally and then tries to send COMMIT PREPARED to every standby server afterward, and only then acknowledges the commit to the client. In design #2, the leader performs the commit locally and then acknowledges the commit to the client at once, leaving the task of running COMMIT PREPARED to some background process. Design #2 involves a race condition, because it's possible that the background process might not complete COMMIT PREPARED on every node before the user submits the next query, and that query might then fail to see supposedly-committed changes. This can't happen in design #1. On the other hand, there's always the possibility that the leader's session is forcibly killed, even perhaps by pulling the plug. If the background process contemplated by design #2 is well-designed, it can recover and finish sending COMMIT PREPARED to each relevant server after the next restart. In design #1, that background process doesn't necessarily exist, so inevitably there is a possibility of orphaning prepared transactions on the remote servers, which is not good. Even if the DBA notices them, it won't be easy to figure out whether to commit them or roll them back.
>>>
>>> I think this thought experiment shows that, on the one hand, there is a point to waiting for commits on the foreign servers, because it can avoid the anomaly of not seeing the effects of your own commits. On the other hand, it's ridiculous to suppose that every case can be handled by waiting, because that just isn't true. You can't be sure that you'll be able to wait long enough for COMMIT PREPARED to complete, and even if that works out, you may not want to wait indefinitely for a dead server. Waiting for a ROLLBACK PREPARED has no value whatsoever unless the system design is such that failing to wait for it results in the ROLLBACK PREPARED never getting performed -- which is a pretty poor excuse.
>>>
>>> Moreover, there are good reasons to think that doing this kind of cleanup work in the post-commit hooks is never going to be acceptable. Generally, the post-commit hooks need to be no-fail, because it's too late to throw an ERROR. But there's very little hope that a connection to a remote server can be no-fail; anything that involves a network connection is, by definition, prone to failure. We can try to guarantee that every single bit of code that runs in the path that sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an ERROR, but that's going to be quite difficult to do: even palloc() can throw an error. And what about interrupts? We don't want to be stuck inside this code for a long time without any hope of the user recovering control of the session by pressing ^C, but of course the way that works is it throws an ERROR, which we can't handle here. We fixed a similar issue for synchronous replication in 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a WARNING in that case and then return control to the user. But if we do that here, all of the code that every FDW emits has to be aware of that rule and follow it, and it just adds to the list of ways that the user backend can escape this code without having cleaned up all of the prepared transactions on the remote side.
>>
>> Hmm, IIRC, my patch, and possibly the patch by Masahiko-san and Vinayak, tries to resolve prepared transactions in post-commit code. I agree with you here, that it should be avoided and the backend should take over the job of resolving transactions.
>>
>>> It seems to me that the only way to really make this feature robust is to have a background worker as part of the equation. The background worker launches at startup and looks around for local state that tells it whether there are any COMMIT PREPARED or ROLLBACK PREPARED operations pending that weren't completed during the last server lifetime, whether because of a local crash or remote unavailability. It attempts to complete
[HACKERS]
On Fri, Oct 21, 2016 at 2:38 PM, Ashutosh Bapat wrote:
> On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas wrote:
>> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote wrote:
>>> However, when I briefly read the description in "Transaction Management in the R* Distributed Database Management System (C. Mohan et al)" [2], it seems that what Ashutosh is saying might be a correct way to proceed after all:
>>
>> I think Ashutosh is mostly right, but I think there's a lot of room to doubt whether the design of this patch is good enough that we should adopt it.
>>
>> Consider two possible designs. In design #1, the leader performs the commit locally and then tries to send COMMIT PREPARED to every standby server afterward, and only then acknowledges the commit to the client. In design #2, the leader performs the commit locally and then acknowledges the commit to the client at once, leaving the task of running COMMIT PREPARED to some background process. Design #2 involves a race condition, because it's possible that the background process might not complete COMMIT PREPARED on every node before the user submits the next query, and that query might then fail to see supposedly-committed changes. This can't happen in design #1. On the other hand, there's always the possibility that the leader's session is forcibly killed, even perhaps by pulling the plug. If the background process contemplated by design #2 is well-designed, it can recover and finish sending COMMIT PREPARED to each relevant server after the next restart. In design #1, that background process doesn't necessarily exist, so inevitably there is a possibility of orphaning prepared transactions on the remote servers, which is not good. Even if the DBA notices them, it won't be easy to figure out whether to commit them or roll them back.
>>
>> I think this thought experiment shows that, on the one hand, there is a point to waiting for commits on the foreign servers, because it can avoid the anomaly of not seeing the effects of your own commits. On the other hand, it's ridiculous to suppose that every case can be handled by waiting, because that just isn't true. You can't be sure that you'll be able to wait long enough for COMMIT PREPARED to complete, and even if that works out, you may not want to wait indefinitely for a dead server. Waiting for a ROLLBACK PREPARED has no value whatsoever unless the system design is such that failing to wait for it results in the ROLLBACK PREPARED never getting performed -- which is a pretty poor excuse.
>>
>> Moreover, there are good reasons to think that doing this kind of cleanup work in the post-commit hooks is never going to be acceptable. Generally, the post-commit hooks need to be no-fail, because it's too late to throw an ERROR. But there's very little hope that a connection to a remote server can be no-fail; anything that involves a network connection is, by definition, prone to failure. We can try to guarantee that every single bit of code that runs in the path that sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an ERROR, but that's going to be quite difficult to do: even palloc() can throw an error. And what about interrupts? We don't want to be stuck inside this code for a long time without any hope of the user recovering control of the session by pressing ^C, but of course the way that works is it throws an ERROR, which we can't handle here. We fixed a similar issue for synchronous replication in 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a WARNING in that case and then return control to the user. But if we do that here, all of the code that every FDW emits has to be aware of that rule and follow it, and it just adds to the list of ways that the user backend can escape this code without having cleaned up all of the prepared transactions on the remote side.
>
> Hmm, IIRC, my patch, and possibly the patch by Masahiko-san and Vinayak, tries to resolve prepared transactions in post-commit code. I agree with you here, that it should be avoided and the backend should take over the job of resolving transactions.
>
>> It seems to me that the only way to really make this feature robust is to have a background worker as part of the equation. The background worker launches at startup and looks around for local state that tells it whether there are any COMMIT PREPARED or ROLLBACK PREPARED operations pending that weren't completed during the last server lifetime, whether because of a local crash or remote unavailability. It attempts to complete those and retries periodically. When a new transaction needs this type of coordination, it adds the necessary crash-proof state and then signals the
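A minimal sketch of the retry loop such a resolver worker might run, assuming hypothetical helpers read_pending_xacts(), try_commit_prepared() and mark_resolved() over whatever crash-proof local state the design ends up using; this illustrates the idea being discussed, not code from any of the posted patches:

    /* Sketch only: PendingXact and the helper functions are assumed,
     * not real APIs; WaitLatch/ResetLatch are the real 9.6 latch calls. */
    for (;;)
    {
        List     *pending = read_pending_xacts();  /* crash-proof local state */
        ListCell *lc;
        int       rc;

        foreach(lc, pending)
        {
            PendingXact *px = (PendingXact *) lfirst(lc);

            /* issue COMMIT PREPARED (or ROLLBACK PREPARED) on the remote */
            if (try_commit_prepared(px->servername, px->gid))
                mark_resolved(px);      /* durably forget it only on success */
            /* on failure, keep the entry and retry on the next cycle */
        }

        /* sleep until signalled or until the retry interval elapses */
        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       10 * 1000L);
        ResetLatch(MyLatch);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
    }

The key property is that an entry is only forgotten after the remote server has acknowledged the COMMIT PREPARED, so a crash at any point leaves the entry behind to be retried after restart.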
[HACKERS] [bug fix] Stats collector is not restarted on the standby
Hello,

If the stats collector is forcibly terminated on the standby in a streaming replication configuration, it won't be restarted until the standby is promoted to the primary. The attached patch restarts the stats collector on the standby.

FYI, when the stats collector is down, SELECTs against the statistics views get stale data with the following message:

LOG: using stale statistics instead of current ones because stats collector is not responding
STATEMENT: select * from pg_stat_user_tables

Regards
Takayuki Tsunakawa

stats_collector_not_restarted.patch
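The shape of the fix is presumably a guard like the following in postmaster.c; this is a sketch of the idea from the description above, not the contents of the attached patch (PgStatPID, pmState, PM_RUN, PM_HOT_STANDBY and pgstat_start() are real postmaster.c symbols, but the exact placement is an assumption):

    /* Sketch: restart the stats collector while in hot standby, too,
     * rather than only in normal running (PM_RUN). */
    if (PgStatPID == 0 &&
        (pmState == PM_RUN || pmState == PM_HOT_STANDBY))
        PgStatPID = pgstat_start();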
[HACKERS] Minor typo in reorderbuffer.c
Hi,

Attached patch fixes a typo in reorderbuffer.c: s/messsage/message/g

Regards,
Vinayak Pokale
NTT Open Source Software Center

typo-reorderbuffer-c.patch
Re: [HACKERS] Unused variable in postgres_fdw/deparse.c
On Wed, Oct 26, 2016 at 6:19 AM, Kyotaro HORIGUCHI wrote:
> Hello, the compiler complains about an unused variable when building
> postgres_fdw without assertions (without --enable-cassert).
>
> deparse.c: In function ‘deparseFromExpr’:
> deparse.c:1029:14: warning: unused variable ‘foreignrel’ [-Wunused-variable]
> RelOptInfo *foreignrel = context->foreignrel;
> ^~
>
> The attached patch removes it and moves the access into the assertion below it.

Thanks for the patch, and sorry for missing this in the review. The patch applies but seems to have trailing white space:

patch -p1 < /mnt/hgfs/tmp/postgre_fdw_delete_unused_var.patch
(Stripping trailing CRs from patch.)
patching file contrib/postgres_fdw/deparse.c

But that's removed by my "patch" command. It compiles cleanly and make check in contrib/postgres_fdw does not show any failure.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, 25 Oct 2016 22:30:48 -0500 "Karl O. Pinc" wrote:
> Since pg_log_file may contain only one line, and that line may be either the filename of the csv log file or the file name of the stderr file, it's impossible to tell whether that single file is in csv or stderr format. I suppose it might be possible based on file name suffix, but Unix expressly ignores file name extensions and it seems unwise to force dependence on them. Perhaps each line could begin with the "type" of the file, either 'csv' or 'stderr', followed by a space and the file name?
>
> In other words, as long as you're making the content of pg_log_file a data structure that contains more than just a single file name you may as well make that data structure something well-defined, easily parseable in shell, extensible, and informative.

While you're at it, it wouldn't hurt to provide another function that tells you the format of the file returned by pg_current_logfile(), since pg_current_logfile() called without arguments could return either a stderr formatted file or a csvlog formatted file. Or leave it for the future. Just a thought.

Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
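Under that proposal, pg_log_file might read something like the following (file names here are purely illustrative, not what the patch writes):

    stderr /var/log/postgresql/postgresql-2016-10-26_000000.log
    csvlog /var/log/postgresql/postgresql-2016-10-26_000000.csv

A shell consumer could then pick the line it wants by its leading tag rather than guessing from the file name suffix.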
Re: [HACKERS] Declarative partitioning - another take
On 2016/10/26 12:09, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote wrote:
>> On 2016/10/26 11:41, Amit Kapila wrote:
>>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote wrote:
>>>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned table is specified, but I thought this may be fine initially.
>>>
>>> Okay, I don't want to add anything to your existing work unless it is important. However, I think there should be some agreement on which of the restrictions are okay for the first version of the patch. This can avoid such questions in future from other reviewers.
>>
>> OK, so I assume you don't find this particular restriction problematic in the long term.
>
> I think you can keep it as you have in the patch. After posting your updated patches, please do send a list of restrictions which this patch is imposing based on the argument that for the first version they are not essential.

OK, agreed that it will be better to have all such restrictions and limitations of the first version listed in one place, rather than being scattered across different emails where they might have been mentioned and discussed. I will try to include such a list when posting the latest set of patches.

Thanks,
Amit
Re: [HACKERS] Patch to implement pg_current_logfile() function
Another new version of a doc patch to the v6 patch. More better English. *sigh*

Regards,
Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

patch_pg_current_logfile-v6.diff.patchv4
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Mon, Oct 24, 2016 at 12:25 PM, Michael Paquier wrote:
> On Mon, Oct 24, 2016 at 1:39 PM, Amit Kapila wrote:
>> I think what you are saying is not completely right, because we do update minRecoveryPoint when we don't perform a new restart point. When we perform a restart point, then it assumes that flushing the buffers will take care of updating minRecoveryPoint.
>
> Yep, minRecoveryPoint still gets updated when the last checkpoint record is the last restart point, to avoid a hot standby allowing read-only connections at an LSN point earlier than the last shutdown. Anyway, we can clearly reject 1. in the light of
> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
> when playing with different stop locations at recovery.

That point holds good only for cases when we try to update the minimum recovery point beyond what is required (like earlier we were thinking to update it unconditionally); however, what is being discussed here is to update only if it is not updated by the flush of buffers. I think that is okay, but I feel Kyotaro-San's fix is a good fix for the problem and we don't need to add some more code (an additional update of the control file) to fix the problem.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Tue, Oct 25, 2016 at 10:43 PM, Robert Haas wrote:
> On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila wrote:
>> You are right that it will include additional WAL than strictly necessary, but that can happen today as well because minrecoverypoint could be updated after you have established the restart point for do_pg_start_backup(). Do you want to say that even without the patch, in some cases we are including additional WAL in the backup than what is strictly necessary, so it is better to improve the situation?
>
> In that case, including additional WAL is *necessary*. If the minimum recovery point is updated after do_pg_start_backup(), pages bearing LSNs newer than the old minimum recovery point could conceivably have been copied as part of the backup,

I might be missing something here, but I think this theory doesn't hold true with respect to the current code, because we always update the minimum recovery point to the last record being replayed, not till the LSN of the page being flushed. This is done to avoid repeated updates of the control file. Considering that, it seems to me that the patch by Kyotaro-San is a reasonable fix, provided we update comments at all appropriate places. I can try to update the comments, if the proposed fix is acceptable to you.

> and therefore we need to replay up through the new minimum recovery point to guarantee consistency.
>
>>> I can think of two solutions that would be "tighter":
>>>
>>> 1. When performing a restartpoint, update the minimum recovery point to just beyond the checkpoint record. I think this can't hurt anyone who is actually restarting recovery on the same machine, because that's exactly the point where they are going to start recovery. A minimum recovery point which precedes the point at which they will start recovery is no better than one which is equal to the point at which they will start recovery.
>>
>> I think this will work but will cause an unnecessary control file update for the cases when the buffer flush will anyway do it. It can work if we try to do it only when required (minrecoverypoint is lesser than the last checkpoint) after the flush of buffers.
>
> Sure, that doesn't sound too hard to implement.

If you are inclined towards this solution, then I think what we need to do is to change the API UpdateMinRecoveryPoint() such that its second parameter can take three values: 0 means update minRecoveryPoint to the passed lsn if minRecoveryPoint < lsn; 1 means update minRecoveryPoint to the latest replayed point if minRecoveryPoint < lsn, same as currently false for *force*; 2 indicates the same behaviour as current *force* = true. Also we need to pass a currentTLI parameter (lastCheckPoint.ThisTimeLineID) to this API to update minRecoveryPointTLI. I have not tried this, but I think something on these lines should work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
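That three-way parameter might look roughly like this; the enum and its names are invented here for illustration, and only UpdateMinRecoveryPoint() itself exists today (with a boolean *force* second argument):

    /* Sketch: a three-valued replacement for the current 'bool force'. */
    typedef enum MinRecoveryUpdateMode
    {
        MIN_RECOVERY_UPDATE_TO_LSN,      /* 0: set to 'lsn' if it is ahead */
        MIN_RECOVERY_UPDATE_TO_REPLAYED, /* 1: today's force = false */
        MIN_RECOVERY_UPDATE_FORCE        /* 2: today's force = true */
    } MinRecoveryUpdateMode;

    static void UpdateMinRecoveryPoint(XLogRecPtr lsn,
                                       MinRecoveryUpdateMode mode,
                                       TimeLineID currentTLI);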
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, 25 Oct 2016 22:53:41 -0500 "Karl O. Pinc" wrote:
> On Tue, 25 Oct 2016 22:30:48 -0500 "Karl O. Pinc" wrote:
>> Hope to provide more feedback soon.

Er, attached is yet another doc patch to the v6 patch. Sorry about that.

Changes pg_current_logfile() detailed documentation. Instead of saying that return values are undefined I've documented that NULL is returned. And reworded, and shortened, my previous wording.

Regards,
Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

patch_pg_current_logfile-v6.diff.patchv3
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, 25 Oct 2016 22:30:48 -0500 "Karl O. Pinc" wrote:
> Hope to provide more feedback soon.

Before I forget: "make check" fails, due to OID issues with pg_current_logfile().

You're writing Unix EOL characters into pg_log_file. (I think.) Does this matter on MS Windows? (I'm not up on MS Windows, and haven't put any thought into this at all. But didn't want to forget about the potential issue.)

Now that pg_log_file contains multiple lines, shouldn't it be called pg_log_files?

In the docs, other functions that take optional arguments show up as multiple rows. Attached is a new version of my patch to the v6 patch which fixes this and supplies a slightly better short description of pg_log_filename().

Regards,
Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

patch_pg_current_logfile-v6.diff.patchv2
Re: [HACKERS] patch proposal
Attached is the patch which introduces the new parameter "recovery_target_incomplete". Currently this patch addresses two scenarios.

*Scenario 1:* Provide options to the DBA when the recovery target is not reached and recovery has stopped mid-way due to unavailability of WALs. When performing recovery to a specific recovery target, the "recovery_target_incomplete" parameter can be used to either promote, pause or shutdown when recovery does not reach the specified recovery target and reaches end-of-the-WAL.

*Scenario 2:* Generate errors and hints when the specified recovery target is prior to the backup's current position (xid, time and lsn). This behaviour is integrated with the parameters "recovery_target_time", "recovery_target_lsn" and "recovery_target_xid".

I would like to know if the approach this patch incorporates sounds OK? My other comments are inline.

On Mon, Aug 29, 2016 at 3:17 PM, Venkata B Nagothi wrote:
> On Fri, Aug 26, 2016 at 10:58 PM, Stephen Frost wrote:
>> * Venkata B Nagothi (nag1...@gmail.com) wrote:
>>> On Fri, Aug 26, 2016 at 12:30 PM, Stephen Frost wrote:
>>>> * Venkata B Nagothi (nag1...@gmail.com) wrote:
>>>>> This behaviour will be similar to that of recovery_target="immediate" and can be aliased.
>>>>
>>>> I don't believe we're really going at this the right way. Clearly, there will be cases where we'd like promotion at the end of the WAL stream (as we currently have) even if the recovery point is not found, but if the new option's "promote" is the same as "immediate" then we don't have that.
>>>
>>> My apologies for the confusion. Correction - I meant, the "promote" option would promote the database once the consistent point is reached at the end-of-the-WAL.
>>
>> "consistent point" and "end-of-the-WAL" are not the same.
>>
>>>> recovery_target = immediate, other
>>>>
>>>> recovery_action_target_found = promote, pause, shutdown
>>>
>>> This is currently supported by the existing parameter called "recovery_target_action".
>>
>> Right, sure, we could possibly use that, or create an alias, etc.
>>
>>>> recovery_action_target_not_found = promote, pause, shutdown
>>>
>>> This is exactly what the newly proposed parameter will do.
>>
>> Then it isn't the same as 'recovery_target = immediate'.
>>
>>>> One question to ask is if we need to support an option for xid and time related to when we realize that we won't find the recovery target. If consistency is reached at a time which is later than the recovery target for time, what then? Do we go through the rest of the WAL and perform the "not found" action at the end of the WAL stream? If we use that approach, then at least all of the recovery target types are handled the same, but I can definitely see cases where an administrator might prefer an "error" option.
>>>
>>> Currently, this situation is handled by the recovery_target_inclusive parameter.
>>
>> No, that's not the same.
>>
>>> Administrator can choose to stop the recovery at any consistent point before or after the specified recovery target. Is this what you were referring to?
>>
>> Not quite.
>>
>>> I might need a bit of clarification here, the parameter I am proposing would be effective only if the specified recovery target is not reached and may not be effective if the recovery goes beyond the recovery target point.
>>
>> Ok, let's consider this scenario:
>>
>> Backup started at: 4/22D3B8E0, time: 2016-08-26 08:29:08
>> Backup completed (consistency) at: 4/22D3B8EF, 2016-08-26 08:30:00
>> recovery_target is not set
>> recovery_target_time = 2016-08-26 08:29:30
>> recovery_target_inclusive = false
>>
>> In such a case, we will necessarily commit transactions which happened between 8:29:30 and 8:30:00 and only stop (I believe..) once we've reached consistency. We could possibly detect that case and throw an error instead. The same could happen with xid.
>
> Yes, currently, PG just recovers until the consistency is reached. It makes sense to throw an error instead.

This has not been addressed yet. Currently, the patch only considers generating an error if the recovery target position is prior to the current backup's position, and this is irrespective of whether backup_label is present or not. I will share my updates on how this can be addressed.

>> Working through more scenarios would be useful, I believe. For example, what if the xid or time specified happened before the backup started?
>>
>> Basically, what I was trying to get at is that we might be able to detect that we'll never find a given recovery target point without actually replaying all of WAL and wondering if we should provide options to control what to do in such a case.
>
> Ah, yes, I got the point and I agree. Thanks for the clarification.
>
> Yes,
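For reference, the proposed setting would be used in recovery.conf roughly like this (the values are illustrative only, taken from the scenario above):

    recovery_target_time = '2016-08-26 08:29:30'
    recovery_target_inclusive = false
    recovery_target_incomplete = 'shutdown'   # or 'promote' / 'pause'

so that a recovery which exhausts the available WAL before reaching the target shuts down instead of silently promoting.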
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, 18 Oct 2016 14:18:36 +0200 Gilles Darold wrote:
> Here is the v6 of the patch, here is the description of the pg_current_logfile() function, I have tried to keep things as simple as possible:
>
> pg_current_logfile( [ destination text ] )

Attached is a doc patch to apply on top of your v6 patch. Changes are, roughly:

Put pg_log_file in alphabetical order in the file name listing and section body.

Put pg_current_logfile in alphabetical order in the function name listing and section body.

1-argument functions don't seem to have a parameter name when listed in documentation tables, just a data type, so I got rid of the parameter name for pg_current_logfile().

Cleaned up the wording and provided more detail.

Added hyperlink to the log_destination config parameter.

Added markup to various system values. The markup does not stand out very well in the html docs, but that's a different issue and should be fixed by changing the markup processing. (I used the markup found in the log_destination() config parameter docs.)

pg_current_logfile does not seem related to pg_listening_channels or pg_notification_queue_usage so I moved the latter 2 index entries closer to their text.

I also have a couple of preliminary comments.

It seems like the code is testing for whether the logfile is a CSV file by looking for a '.csv' suffix on the end of the file name. This seems a little fragile. Shouldn't it be looking at something set internally when the log_destination config declaration is parsed?

Since pg_log_file may contain only one line, and that line may be either the filename of the csv log file or the file name of the stderr file, it's impossible to tell whether that single file is in csv or stderr format. I suppose it might be possible based on file name suffix, but Unix expressly ignores file name extensions and it seems unwise to force dependence on them. Perhaps each line could begin with the "type" of the file, either 'csv' or 'stderr', followed by a space and the file name?

In other words, as long as you're making the content of pg_log_file a data structure that contains more than just a single file name you may as well make that data structure something well-defined, easily parseable in shell, extensible, and informative.

Hope to provide more feedback soon.

Regards,
Karl

Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d1a5b96..54e10af 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15443,6 +15443,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
+ pg_current_logfile(text)
+ text
+ log file in use by the logging collector
+
+
 pg_my_temp_schema()
 oid
 OID of session's temporary schema, or 0 if none
@@ -15480,12 +15486,6 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
- pg_current_logfile( destination text)
- text
- current log file used by the logging collector
-
-
 session_user
 name
 session user name
@@ -15667,6 +15667,27 @@ SET search_path TO schema , schema, ..
+pg_current_logfile
+
+
+
+pg_current_logfile returns, as text
+the name of the log file currently in use by the logging collector.
+Log collection must be active or the return value is undefined. When
+the configuration parameter
+contains both csvlog and stderr
+pg_current_logfile returns the stderr filename.
+When does not
+contain stderr the csv filename is returned. When it
+does not contains csvlog the stderr filename is
+returned. To request a specific file format supply either
+csvlog or stderr as the value of the
+optional, type text, parameter. The return value is
+undefined when the log format requested is not a configured log
+destination.
+
+
 pg_my_temp_schema
@@ -15692,23 +15713,6 @@ SET search_path TO schema , schema, ..
 pg_notification_queue_usage
-
-pg_current_logfile
-
-
-
-pg_current_logfile returns the name of the
-current log file used by the logging collector, as text.
-Log collection must be active or the return value is undefined. When
-csvlog is used as log destination, the csv filename is returned, when
-it is set to stderr, the stderr filename is returned. When both are
-used, it returns the stderr filename.
-There is an optional parameter of type text to determines
-the log filename to return following the log destination, values can
-be 'csvlog' or 'stderr'. When the log format asked is not used as log
-destination then the return value is undefined.
-
 pg_listening_channels returns a set of names of asynchronous
Re: [HACKERS] Declarative partitioning - another take
On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote wrote:
> On 2016/10/26 11:41, Amit Kapila wrote:
>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote wrote:
>>>> 1.
>>>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>>>> {
>>>> ..
>>>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>>> + errmsg("cannot copy from partitioned table \"%s\"",
>>>> + RelationGetRelationName(rel)),
>>>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>>>> ..
>>>> }
>>>>
>>>> Why is this restriction? Won't it be useful to allow it for the cases when the user wants to copy the data of all the partitions?
>>>
>>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned table is specified, but I thought this may be fine initially.
>>
>> Okay, I don't want to add anything to your existing work unless it is important. However, I think there should be some agreement on which of the restrictions are okay for the first version of the patch. This can avoid such questions in future from other reviewers.
>
> OK, so I assume you don't find this particular restriction problematic in the long term.

I think you can keep it as you have in the patch. After posting your updated patches, please do send a list of restrictions which this patch is imposing based on the argument that for the first version they are not essential.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Declarative partitioning - another take
On 2016/10/26 11:41, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote wrote:
>>> 1.
>>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>>> {
>>> ..
>>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> + errmsg("cannot copy from partitioned table \"%s\"",
>>> + RelationGetRelationName(rel)),
>>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>>> ..
>>> }
>>>
>>> Why is this restriction? Won't it be useful to allow it for the cases when the user wants to copy the data of all the partitions?
>>
>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned table is specified, but I thought this may be fine initially.
>
> Okay, I don't want to add anything to your existing work unless it is important. However, I think there should be some agreement on which of the restrictions are okay for the first version of the patch. This can avoid such questions in future from other reviewers.

OK, so I assume you don't find this particular restriction problematic in the long term.

>>> 2.
>>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>>> + partnatts > 1)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> + errmsg("cannot list partition using more than one column")));
>>>
>>> /cannot list/cannot use list
>>
>> Actually "list partition" works here as a verb, as in "to list partition".
>
> I am not an expert on this matter, so probably someone having a better grip can comment. Are we using something similar in any other error message?

In fact, I changed to the current text after Robert suggested the same [1].

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoaPxXJ14eDVia514UiuQAXyZGqfbz8Qg3G4a8Rz2gKF7w%40mail.gmail.com
Re: [HACKERS] Declarative partitioning - another take
On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote wrote:
>> 1.
>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>> {
>> ..
>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> + errmsg("cannot copy from partitioned table \"%s\"",
>> + RelationGetRelationName(rel)),
>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>> ..
>> }
>>
>> Why is this restriction? Won't it be useful to allow it for the cases when the user wants to copy the data of all the partitions?
>
> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned table is specified, but I thought this may be fine initially.

Okay, I don't want to add anything to your existing work unless it is important. However, I think there should be some agreement on which of the restrictions are okay for the first version of the patch. This can avoid such questions in future from other reviewers.

>> 2.
>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>> + partnatts > 1)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> + errmsg("cannot list partition using more than one column")));
>>
>> /cannot list/cannot use list
>
> Actually "list partition" works here as a verb, as in "to list partition".

I am not an expert on this matter, so probably someone having a better grip can comment. Are we using something similar in any other error message?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
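For the record, teaching CopyTo() to handle this case would presumably amount to iterating over the leaf partitions, along these lines (a rough sketch using the existing find_all_inheritors() helper; this is not code from the partitioning patches, and the row-output step is elided):

    /* Sketch: emit rows from every leaf partition of a partitioned table. */
    List     *children = find_all_inheritors(RelationGetRelid(rel),
                                             AccessShareLock, NULL);
    ListCell *lc;

    foreach(lc, children)
    {
        Oid      childrelid = lfirst_oid(lc);
        Relation child = heap_open(childrelid, NoLock); /* locked above */

        /* skip the partitioned parents themselves; they hold no rows */
        if (child->rd_rel->relkind == RELKIND_RELATION)
        {
            /* ... scan 'child' and emit its rows as CopyTo() does ... */
        }
        heap_close(child, NoLock);
    }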
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
On Tue, Oct 25, 2016 at 10:35 PM, Alvaro Herrera wrote:
> We don't promise order of execution (which is why we can afford to sort on cost), but I think it makes sense to keep a rough ordering based on cost, and not let this push-down affect those ordering decisions too much.

OK, thanks for the clarification.

> I think it's fine to push down quals that are within the same order of magnitude of the cost of a pushable condition, while keeping any other much-costlier qual (which could otherwise be pushed down) together with non-pushable conditions; this would sort-of guarantee within-order-of-magnitude order of execution of quals.
>
> Hopefully an example clarifies what I mean. Let's suppose you have three quals, where qual2 is non-pushable but 1 and 3 are. cost(1)=10, cost(2)=11, cost(3)=12. Currently, they are executed in that order.
>
> If you were to compare costs in the straightforward way, you would push only 1 (because 3 is costlier than 2, which is not push-down-able). With fuzzy comparisons, you'd push both 1 and 3, because the cost of 3 is close enough to that of qual 2.
>
> But if cost(3)=100 then only push qual 1, and let qual 3 be evaluated together with 2 outside the scan node.

After putting more thought on this, IMHO it need not be so complicated. Currently we are talking about pushing only "var op const", and the cost of all such functions is very low and fixed at "1". Do we really need to take care of a user-defined function which is declared with a very low cost? Because while building index conditions also we don't take care of such things. Index conditions will always be evaluated first and then only the filter will be applied. Am I missing something?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
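Alvaro's fuzzy rule could be expressed as something like the following; this is one possible reading of "same order of magnitude" (same decimal order, via log10), sketched as a standalone function with invented names, not anything in the planner today:

    #include <math.h>

    /* Sketch: allow push-down if the pushable qual's cost falls in the
     * same (or a lower) decimal order of magnitude as the cheapest
     * non-pushable qual's cost. Costs are assumed positive. */
    static int
    qual_pushdown_ok(double pushable_cost, double cheapest_nonpushable_cost)
    {
        return (int) floor(log10(pushable_cost)) <=
               (int) floor(log10(cheapest_nonpushable_cost));
    }

With the costs in Alvaro's example this pushes quals 1 and 3 when cost(3)=12 (both land in the 10-99 bucket with qual 2) but leaves qual 3 behind when cost(3)=100, matching the behaviour he describes.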
Re: [HACKERS] 9.6, background worker processes, and PL/Java
On 10/25/16 18:56, Chapman Flack wrote:
> If pooled, and tied to the backend that started them, do they need to do anything special to detect when the leader has executed SET ROLE or SET SESSION AUTHORIZATION?

Let me guess ... such information is *not* synchronized across workers, and that'd be why the manual says "functions must be marked PARALLEL RESTRICTED if they access ... client connection state ..."? That's probably a resounding 'no' for declaring any PL/Java function SAFE, then.

And if changing "the transaction state even temporarily (e.g. a PL/pgsql function which establishes an EXCEPTION block to catch errors)" is enough to require UNSAFE, then it may be that RESTRICTED is off limits too, as there are places PL/Java does that internally. I take it that example refers not to just any use of PG_TRY/PG_CATCH, but only to those uses where an internal subtransaction is used to allow execution to continue?

If a person writes a function in some language (SQL, for example), declares it PARALLEL SAFE but is lying because it calls another function (in Java, say) that is PARALLEL UNSAFE or RESTRICTED, does PostgreSQL detect or prevent that, or is it just considered an unfortunate mistake by the goofball who declared the first function safe? And if that's not already prevented, could it be worth adding code in the PL/Java call handler to detect such a situation and make sure it ends in a meaningful ereport and not something worse?

-Chap
Re: [HACKERS] Declarative partitioning - another take
On 2016/10/25 15:58, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
>>> Hmm, do we ever fire triggers on the parent for operations on a child table? Note this thread, which seems possibly relevant:
>>>
>>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>>
>> The answer to your question is no.
>>
>> The thread you quoted discusses statement-level triggers and the conclusion is that they don't work as desired for UPDATE and DELETE on inheritance tables. As things stand, only UPDATE or DELETE on the parent affects the child tables and it's proposed there that the statement-level triggers on the parent and also on any child tables affected should be fired in that case.
>
> Doesn't that imply that the statement-level triggers should be fired for all the tables that get changed for a statement? If so, then in your case it should never fire for the parent table, which means we could disallow statement-level triggers as well on parent tables?

I may have misunderstood statement-level triggers, but don't they apply to tables *specified* as the target table in the statement, instead of those *changed* by resulting actions? Now in case of inheritance, unless ONLY is specified, all tables in the hierarchy including the parent are *implicitly* specified to be affected by an UPDATE or DELETE operation. So, if some or all of those tables have any statement-level triggers defined, they should get fired. That was the conclusion of that thread, but that TODO item still remains [1].

I am not (or no longer) sure how that argument affects INSERT on partitioned tables with tuple-routing though. Are partitions at all levels *implicitly specified to be affected* when we say INSERT INTO root_partitioned_table?

> Some of the other things that we might want to consider disallowing on a parent table could be:
> a. Policy on table_name

Perhaps. Since there are no rows in the parent table(s) itself of a partition hierarchy, it might not make sense to continue to allow creating row-level security policies on them.

> b. Alter table has many clauses, are all of those allowed and will it make sense to allow them?

Currently, we only disallow the following with partitioned parent tables as far as ALTER TABLE is concerned:

- cannot change inheritance by ALTER TABLE partitioned_table INHERIT ...
- cannot let them be regular inheritance parents either - that is, the following is disallowed: ALTER TABLE some_table INHERIT partitioned_table
- cannot create UNIQUE, PRIMARY KEY, FOREIGN KEY, EXCLUDE constraints
- cannot drop a column involved in the partitioning key

Most other forms that affect attributes and constraints follow the regular inheritance behavior (recursion) with certain exceptions such as:

- cannot add/drop an attribute or check constraint *only* to/from the parent
- cannot add/drop a NOT NULL constraint to/from *only* the parent

Thoughts?

Thanks,
Amit
Re: [HACKERS] Declarative partitioning - another take
Thanks for the review!

On 2016/10/25 20:32, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
>>
>> Attached revised patches.
>
> Few assorted review comments for 0001-Catalog*:
>
> 1.
> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
> {
> ..
> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("cannot copy from partitioned table \"%s\"",
> + RelationGetRelationName(rel)),
> + errhint("Try the COPY (SELECT ...) TO variant.")));
> ..
> }
>
> Why is this restriction? Won't it be useful to allow it for the cases when the user wants to copy the data of all the partitions?

Sure, CopyTo() can be taught to scan leaf partitions when a partitioned table is specified, but I thought this may be fine initially.

> 2.
> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
> + partnatts > 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("cannot list partition using more than one column")));
>
> /cannot list/cannot use list

Actually "list partition" works here as a verb, as in "to list partition".

> 3.
> @@ -77,7 +77,7 @@ typedef enum DependencyType
> DEPENDENCY_INTERNAL = 'i',
> DEPENDENCY_EXTENSION = 'e',
> DEPENDENCY_AUTO_EXTENSION = 'x',
> - DEPENDENCY_PIN = 'p'
> + DEPENDENCY_PIN = 'p',
> } DependencyType;
>
> Why is this change required?

Looks like a leftover hunk from a previous revision. Will fix.

> 4.
> @@ -0,0 +1,69 @@
> +/*-
> + *
> + * pg_partitioned_table.h
> + * definition of the system "partitioned table" relation
> + * along with the relation's initial contents.
> + *
> + *
> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
>
> Copyright year should be 2016.

Oops, will fix.

> 5.
> +/*
> + * PartitionSpec - partition key definition including the strategy
> + *
> + * 'strategy' partition strategy name ('list', 'range', etc.)
>
> etc. in the above comment seems to be unnecessary.

Will fix, although I thought that the list is yet incomplete.

> 6.
> + {PartitionedRelationId, /* PARTEDRELID */
>
> Here PARTEDRELID sounds inconvenient, how about PARTRELID?

Agreed. There used to be another catalog which had used up PARTRELID, but that's no longer an issue.

I will include these changes in the next version of patches I will post soon in reply to [1].

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com
[HACKERS] Unused variable in postgres_fdw/deparse.c
Hello, the compiler complains about an unused variable when building postgres_fdw without assertions (without --enable-cassert):

deparse.c: In function ‘deparseFromExpr’:
deparse.c:1029:14: warning: unused variable ‘foreignrel’ [-Wunused-variable]
 RelOptInfo *foreignrel = context->foreignrel;
 ^~

The attached patch removes the variable and moves the access into the assertion below it.

regards,

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 8da8c11..450693a 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1026,11 +1026,10 @@ static void
 deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 StringInfo buf = context->buf;
- RelOptInfo *foreignrel = context->foreignrel;
 RelOptInfo *scanrel = context->scanrel;

 /* For upper relations, scanrel must be either a joinrel or a baserel */
- Assert(foreignrel->reloptkind != RELOPT_UPPER_REL ||
+ Assert(context->foreignrel->reloptkind != RELOPT_UPPER_REL ||
 scanrel->reloptkind == RELOPT_JOINREL ||
 scanrel->reloptkind == RELOPT_BASEREL);
Re: [HACKERS] Wraparound warning
On Tue, Oct 25, 2016 at 01:32:04PM -0400, Bruce Momjian wrote:
> On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
>> On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian wrote:
>>> LOG: MultiXact member wraparound protections are now enabled
>>>
>>> I thought that was going to be removed at some point, no?

Latest proposal on when to remove it:
https://www.postgresql.org/message-id/flat/CA%2BTgmobOoo9a9tGbvXgnvZC_p_15sVNHu2mV9usbB_OLGRZSmw%40mail.gmail.com

Since we have outstanding multixact bugs, that proposal's clock of "a couple of years" has not begun to tick.

>> Maybe what we should do (since this message obviously annoys everyone)
>
> It is a persistent reminder of our engineering failure. I guess it should annoy us, and maybe that's OK.

Yep.

>> is change things so that, starting in v10, a message is logged at startup if those protections are NOT enabled, and nothing is logged if they are enabled. Keep the message for the case where protections are enabled later, after startup time.

I continue to find the message good as-is, per the thread I linked above.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/25/16 4:48 PM, Tomas Vondra wrote:
> The main issue that bugs me is the name of the Gen allocator, but I don't have good naming ideas :( The basic characteristic of Gen is that it does not reuse space released by pfree(), relying on the fact that the whole block will become free. That should be reflected in the name somehow, I guess.

OneTime? OneUse? OneShot? AllocOnce? OneHitWonder? ;P

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
[HACKERS] 9.6, background worker processes, and PL/Java
Hi,

I have a report of a PL/Java crash in 9.6 where the stack trace suggests it was trying to initialize in a background worker process (not sure why that even happened, yet), and by my first glance, it seems to have crashed dereferencing MyProcPort, which I am guessing a BGW might not always have (?).

So, as I try to get up to speed on this PostgreSQL feature, it seems to me that I have up to three different cases that I may need to make PL/Java detect and respond appropriately to. (If you see me veering into any misconceptions, please let me know.)

1. A worker explicitly created with Register... or RegisterDynamic... that has not called ...InitializeConnection... and so isn't any particular user or connected to any database.

2. A worker explicitly created that has called ...Initialize... and therefore is connected to some database as some user. (So, is there a MyProcPort in this case?)

3. A worker implicitly created for a parallel query plan (and therefore associated with a database and a user). Does this have a MyProcPort?

Case 1, I think I at most need to detect and ereport. It is hard to imagine how it could even arise, as without a database connection there's no pg_extension, pg_language, or pg_proc, but I suppose it could happen if someone misguidedly puts libpljava in shared_preload_libraries, or some other bgw code inexplicably loads it. It's a non-useful case as PL/Java has nothing to do without a database connection and sqlj schema.

Case 2 might be worth supporting, but I may need to account for anything that differs in this environment from a normal connected backend.

Case 3 seems most likely. It should only be possible by invoking a declared Java function that somebody marked parallel-safe, right? In the parallel-unsafe or -restricted cases, PL/Java can only find itself invoked within the leader process? Such a leader process can only be a normal backend? Or perhaps also a case-2 explicitly created BGW that is executing a query?

My main question is, what state do I need to examine at startup in order to distinguish these cases? Do I detect I'm in a BGW by a non-null MyBgworkerEntry? If it's there, do I detect whether I have a database and an identity by checking for a MyProcPort, or some other way?

As for declaring functions parallel-unsafe, -restricted, or -safe, I assume there should be no problems with PL/Java functions with the default designation of unsafe. There should be no essential problem if someone declares a function -restricted - provided PL/Java itself can be audited to make sure it doesn't do any of the things restricted functions can't do - as it will only be running in the leader process anyway. Even should somebody mark a PL/Java function safe, while hard to imagine a good case for, shouldn't really break anything; as the workers are separate processes, this should be safe. Any imagined speed advantage of the parallel query is likely to evaporate while the several processes load their own JVMs, but nothing should outright break.

That leads me to: Are BGWs for parallel queries born fresh for each query, or do they get pooled and reused? If pooled, can they be reused across backends/database connections/identities, or only by the backend that created them? If reusable across contexts, that's a dealbreaker and I'd have to have PL/Java reject any parallel-safe declaration, but a pool tied to a connection should be ok (and better yet, allow amortizing the JVM startup cost).

If pooled, and tied to the backend that started them, do they need to do anything special to detect when the leader has executed SET ROLE or SET SESSION AUTHORIZATION?

If all of this is covered to death in some document I obviously haven't read, please feel free to point me to it.

Thanks!
-Chap
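For what it's worth, the startup check might look something like the sketch below. MyBgworkerEntry, MyDatabaseId and IsParallelWorker() are real 9.6 symbols, but whether MyProcPort is set in each case is exactly the open question above, so classifying via MyDatabaseId here is an assumption to verify rather than established behavior:

    #include "postgres.h"
    #include "miscadmin.h"            /* MyDatabaseId */
    #include "access/parallel.h"      /* IsParallelWorker() */
    #include "postmaster/bgworker.h"  /* MyBgworkerEntry */

    static void
    pljava_classify_process(void)
    {
        if (IsParallelWorker())
            elog(DEBUG1, "case 3: parallel-query worker"); /* has db + user */
        else if (MyBgworkerEntry != NULL)
        {
            if (OidIsValid(MyDatabaseId))
                elog(DEBUG1, "case 2: bgworker connected to a database");
            else                                           /* case 1 */
                ereport(ERROR,
                        (errmsg("PL/Java cannot run in an unconnected "
                                "background worker")));
        }
        else
            elog(DEBUG1, "normal backend");
    }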
Re: [HACKERS] emergency outage requiring database restart
On 10/22/16 12:38 PM, Tom Lane wrote:
> Jim Nasby writes:
>> On 10/21/16 7:43 PM, Tom Lane wrote:
>>> Alvaro Herrera writes:
>>>> Agreed. The problem is how to install it without breaking pg_upgrade.
>>
>> It can't look up relation names?
>
> It can't shove 64 bytes into a page that has < 64 bytes free space.

Yeah, that would need to be a special case, but unless the page pointers are horked you'd be able to tell if there was a name in there.

>>> Well, that's the first problem. The second problem is how to cope with RENAME TABLE.
>>
>> If the name was only encoded in the first block of the relation I'm not sure how bad this would be; are there any facilities to change the name back on a rollback?
>
> No. How would that work in crash situations? (And keep in mind that the

Well, if FPIs were enabled you'd get the old page back. Even without that, recovery could remember rename records it's seen that didn't commit.

> argument for this hinges entirely on its being 100% trustworthy after a crash.)

I don't think this needs to be 100% reliable to be useful, especially if we went with Andreas' suggestion to store both the old and new name (and storing the OID as well isn't a bad idea). If you're ever at the point of looking at this info you're already in deep trouble and should already be taking everything with a grain of salt anyway.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/1/16 7:34 PM, Tomas Vondra wrote:
>>> +/* otherwise add it to the proper freelist bin */
>>
>> Looks like something went missing... :)
>
> Ummm? The patch contains this:
>
> +/* otherwise add it to the proper freelist bin */
> +if (set->freelist[block->nfree])
> +set->freelist[block->nfree]->prev = block;
> +
> +block->next = set->freelist[block->nfree];
> +set->freelist[block->nfree] = block;
>
> Which does exactly the thing it should do. Or what is missing?

What's confusing is the "otherwise" right at the beginning of the function:

+static void
+add_to_freelist(Slab set, SlabBlock block)
+{
+ /* otherwise add it to the proper freelist bin */
+ if (set->freelist[block->nfree])
+ set->freelist[block->nfree]->prev = block;
+
+ block->next = set->freelist[block->nfree];
+ set->freelist[block->nfree] = block;
+}

Otherwise what? What's the other option? (Haven't looked at the newer patch, so maybe this isn't an issue anymore.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Improving RLS planning
On 26 October 2016 at 10:58, Tom Lane wrote:
> The alternative I'm now thinking about pursuing is to get rid of the
> conversion of RLS quals to subqueries. Instead, we can label individual
> qual clauses with security precedence markings. Concretely, suppose we
> add an "int security_level" field to struct RestrictInfo. The semantics
> of this would be that a qual with a lower security_level value must be
> evaluated before a qual with a higher security_level value, unless the
> latter qual is leakproof. (It would likely also behoove us to add a
> "leakproof" bool field to struct RestrictInfo, to avoid duplicate
> leakproof-ness checks on quals. But that's just an optimization.)

I wonder if there will be a need for more security_levels in the future; otherwise perhaps a more generic field can be added, like "int evaluation_flags". In [1], there are still things to work out with it, but I mentioned that I'd like to improve equivalence classes to handle more than just simple equality, which seems to require "optional" RestrictInfos. It would be nice if we could store all this in one field as a set of bits.

[1] https://www.postgresql.org/message-id/cakjs1f9fpdlkm6%3dsuzagwuch3otbspk6k0yt8-a1hgjfapl...@mail.gmail.com

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
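(To make the bit-packing idea above concrete -- the names and layout here are purely hypothetical, not from any posted patch:)

    /* one combined flags field on RestrictInfo, packing several properties */
    #define RINFO_SECURITY_LEVEL_MASK   0x000000FF  /* low byte: security level */
    #define RINFO_LEAKPROOF             0x00000100  /* qual known leakproof */
    #define RINFO_OPTIONAL              0x00000200  /* "optional" EC-derived qual */

    #define RINFO_GET_SECURITY_LEVEL(flags) ((flags) & RINFO_SECURITY_LEVEL_MASK)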
Re: [HACKERS] Does it make sense to add a -q (quiet) flag to initdb?
On 10/25/16 11:26 AM, Joshua D. Drake wrote:
> Per: https://www.commandprompt.com/blog/can_i_make_initdb_quiet/ This was a question that was asked on #postgresql. Obviously we found a workaround but I wonder if it makes sense to add a -q to solve some of these issues? (I could see it being useful with automation).

Well, there's always pg_ctl initdb -s (not sure why it's -s instead of the more common -q...). ISTM it'd be better to point people in that direction, but a silent option to initdb certainly wouldn't hurt (and would maybe simplify pg_ctl as well...)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
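(For reference, the workarounds discussed above look like this; the data directory path is just an example:)

    pg_ctl initdb -s -D /path/to/data      # -s/--silent: print only errors
    initdb -D /path/to/data > /dev/null    # initdb sends progress to stdout, errors to stderr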
[HACKERS] Improving RLS planning
Currently, we don't produce very good plans when row-level security is enabled. An example is that, given

    create table t1 (pk1 int primary key, label text);
    create table t2 (pk2 int primary key, fk int references t1);

then for

    select * from t1, t2 where pk1 = fk and pk2 = 42;

you would ordinarily get a cheap plan like

    Nested Loop
      -> Index Scan using t2_pkey on t2
           Index Cond: (pk2 = 42)
      -> Index Scan using t1_pkey on t1
           Index Cond: (pk1 = t2.fk)

But stick an RLS policy on t1, and that degrades to a seqscan, eg

    Nested Loop
      Join Filter: (t1.pk1 = t2.fk)
      -> Index Scan using t2_pkey on t2
           Index Cond: (pk2 = 42)
      -> Seq Scan on t1
           Filter: (label = 'public'::text)

The reason for this is that we implement RLS by turning the reference to t1 into a sub-SELECT, and the planner's recursive invocation of subquery_planner produces only a seqscan path for t1, there not being any reason visible in the subquery for it to do differently.

I have been thinking about improving this by allowing subquery_planner to generate parameterized paths; but the more I think about that the less satisfied I am with it. It will be quite expensive and probably will still fail to find desirable plans in many cases. (I've not given up on parameterized subquery paths altogether --- I just feel it'd be a brute-force and not very effective way of dealing with RLS.)

The alternative I'm now thinking about pursuing is to get rid of the conversion of RLS quals to subqueries. Instead, we can label individual qual clauses with security precedence markings. Concretely, suppose we add an "int security_level" field to struct RestrictInfo. The semantics of this would be that a qual with a lower security_level value must be evaluated before a qual with a higher security_level value, unless the latter qual is leakproof. (It would likely also behoove us to add a "leakproof" bool field to struct RestrictInfo, to avoid duplicate leakproof-ness checks on quals. But that's just an optimization.)

In the initial implementation, quals coming from a RangeTblEntry's securityQuals field would have security_level 0, quals coming from anywhere else would have security_level 1; except that if we know there are no security quals anywhere (ie not Query->hasRowSecurity), we could give all quals security_level 0. (I think this exception may be worth making because there's no need to test leakproofness for a qual with security level 0; it could never be a candidate for security delay anyway.)

Having done that much, I think all we need in order to get rid of RLS subqueries, and just stick RLS quals into their relation's baserestrictinfo list, are two rules:

1. When selecting potential indexquals, a RestrictInfo can be considered for indexqual use only if it is leakproof or has security_level <= the minimum among the table's baserestrictinfo clauses.

2. In order_qual_clauses, sort first by security_level and second by cost.

This would already be enough of a win to be worth doing. I think though that this mechanism can be extended to also allow getting rid of the restriction that security-barrier views can't be flattened. The idea would be to make sure that quals coming from above the SB view are given higher security_level values than quals within the SB view. We'd need some extra mechanism to make that possible --- perhaps an additional kind of node within jointree nests to show where there had been a security-barrier boundary, and then some smarts in distribute_qual_to_rels to prevent pushing upper quals down past a lower qual of strictly lesser security level.
But that can come later. (We do not need such smarts to fix the RLS problem, because in the initial version, quals with lower security level than another qual could only exist at the baserel level.) In short, I'm proposing to throw away the entire existing implementation for planning of RLS and SB views, and start over. There are some corner cases I've not entirely worked out, in particular what security_level to assign to quals generated from EquivalenceClasses. A safe but not optimal answer would be to assign them the maximum security_level of any source clause of the EC. Maybe it's not worth working harder than that, because most equality operators are leakproof anyway, so that it wouldn't matter what level we assigned them. Before I start implementing this, can anyone see a fatal flaw in the design? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
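(A minimal sketch of what rule 2 above amounts to -- a two-key sort comparison; the security_level field is the one proposed in this mail, everything else is illustrative rather than actual patch code:)

    static int
    order_qual_cmp(const RestrictInfo *a, const RestrictInfo *b)
    {
        /* a lower security_level must be evaluated first ... */
        if (a->security_level != b->security_level)
            return (a->security_level < b->security_level) ? -1 : 1;
        /* ... and within one level, cheaper quals run first */
        if (a->eval_cost.per_tuple != b->eval_cost.per_tuple)
            return (a->eval_cost.per_tuple < b->eval_cost.per_tuple) ? -1 : 1;
        return 0;
    }

(Per the "unless the latter qual is leakproof" exception, a leakproof qual could simply be treated as having the minimum security level before this sort.)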
Re: [HACKERS] pg_basebackup stream xlog to tar
On Wed, Oct 26, 2016 at 2:00 AM, Magnus Hagander wrote: > Thanks, applied and pushed. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/23/2016 05:26 PM, Petr Jelinek wrote:
> On 23/10/16 16:26, Tomas Vondra wrote:
>> On 10/22/2016 08:30 PM, Tomas Vondra wrote: ...
>> Moreover, the slab/gen allocators proposed here seem like a better fit for reorderbuffer, e.g. because they release memory. I haven't looked at sb_alloc too closely, but I think it behaves more like AllocSet in this regard (i.e. keeping the memory indefinitely).
> For reorderbuffer, from what I've seen in practice, I'd prefer proper freeing to a 5% performance gain, as I've seen walsenders taking GBs of memory due to reorderbuffer allocations that are never properly freed.

Right.

> About your actual patch. I do like both the Slab and the Gen allocators and think that we should proceed with them for the moment. You definitely need to rename the Gen one (don't ask me to what though) as it sounds like "generic" and do some finishing touches but I think it's the way to go. I don't see any point in GenSlab anymore.

Attached is a v5 of the patch that does this, i.e. throws away the GenSlab allocator and modifies reorderbuffer in two steps. First (0001) it adds the Slab allocator for TXN/Change allocations, and keeps the local slab cache for TupleBuf allocations (with a separate AllocSet context). Then (in 0002) it adds the Gen allocator for TupleBuf, removing the last bits of the local slab cache.

I do think this version is as simple as it gets - there's not much more we could simplify / remove. The main issue that bugs me is the name of the Gen allocator, but I don't have any good naming ideas :( The basic characteristic of Gen is that it does not reuse space released by pfree(), relying on the fact that the whole block will become free. That should be reflected in the name somehow, I guess.

regards
--
Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

slab-allocators-v5.tgz Description: application/compressed-tar

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
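(To make the naming question concrete, the behavior described in the last paragraph is roughly the following -- a simplified sketch, not the patch code:)

    /*
     * Each block only counts its live chunks: pfree() decrements the counter
     * and never puts the space back on a freelist; the whole block is
     * released once every chunk allocated from it has been freed.
     */
    typedef struct GenBlock
    {
        struct GenBlock *next;
        int         nchunks;        /* chunks allocated from this block */
        int         nfree;          /* chunks pfree'd so far */
        char       *freeptr;        /* next allocation position */
        char       *endptr;         /* end of the block */
    } GenBlock;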
Re: [HACKERS] Wraparound warning
On Tue, Oct 25, 2016 at 1:32 PM, Bruce Momjian wrote:
> On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
>> On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian wrote:
>> > Do we still need to report the wraparound warning on server startup?
>> >
>> > LOG: MultiXact member wraparound protections are now enabled
>> >
>> > I thought that was going to be removed at some point, no?
>>
>> If you start with a 9.3.small cluster and do a pg_upgrade to 10, you
>> could end up not seeing that message and not having those protections
>> enabled, and it would be useful to be able to tell that from looking
>> at the log.
>
> Uh, I am confused. I am seeing this on a fresh PG 10 initdb'ed cluster,
> not one pg_upgraded. Is that expected?

Yes, that is expected. If you initdb with a version >= 9.5, you will always see that message on every startup. However, if you initdb with an older version, things get messed up, and you then pg_upgrade, you might fail to see it. Not seeing it would indicate that your cluster is in a bad situation, which is something you might want to know about.

>> is change things so that, starting in v10, a message is logged at
>> startup if those protections are NOT enabled, and nothing is logged if
>> they are enabled. Keep the message for the case where protections are
>> enabled later, after startup time.
>
> How are those protections enabled/disabled?

When you start the cluster, they are disabled. We then try to enable them every time SetOffsetVacuumLimit() is called. It tries to determine the oldest multixact offset by examining what it thinks is the oldest multixact. If that works, then it enables the protections. If that fails because the "oldest multixact" value is corrupted and that multixact doesn't actually exist, then it forces autovacuum to run in emergency mode, which will eventually correct the problem. Once the problem has been corrected, the next call to SetOffsetVacuumLimit() will enable the protections. Normally, your cluster is OK, so the very first call to that function enables the protections right at startup time. Maybe we should skip emitting the message in that case, so that the "normal" case doesn't generate extra log chatter.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
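(A sketch of the reporting change proposed above -- log only when the protections are *not* enabled at startup; the message text is invented, and oldestOffsetKnown stands in for whatever flag tracks the enabled state:)

    if (!MultiXactState->oldestOffsetKnown)
        ereport(LOG,
                (errmsg("MultiXact member wraparound protections are not yet enabled")));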
Re: [HACKERS] pg_dump: Simplify internal archive version handling
On 10/13/16 9:13 AM, Tom Lane wrote: > Peter Eisentraut writes: >> I propose the attached patch to clean up some redundancy and confusion >> in pg_dump. > > Looks sane, though I'd suggest coding the access macros with shifts > not multiplies/divides. Done. > Since these numbers are purely internal and not stored in this form in the > file, I wonder whether we shouldn't drop the rightmost zero byte, ie make > it just not 00. That would save at least > one divide/shift when accessing. Done and committed. Thanks. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
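(For readers who want to see the shape of the change: shift-based access macros for a packed version number look roughly like this -- the names here are illustrative, not the committed ones:)

    #define MAKE_VERSION(major, minor, rev) (((major) << 16) | ((minor) << 8) | (rev))
    #define VERSION_MAJOR(version)          (((version) >> 16) & 0xFF)
    #define VERSION_MINOR(version)          (((version) >> 8) & 0xFF)
    #define VERSION_REV(version)            ((version) & 0xFF)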
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 25, 2016 at 2:31 PM, Tom Lane wrote:
> Merlin Moncure writes:
>> What if the subsequent dataloss was in fact a symptom of the first
>> outage? Is it in theory possible for data to appear visible but then be
>> eaten up as the transactions making the data visible get voided out by
>> some other mechanic? I had to pull a quick restart the first time and
>> everything looked ok -- or so I thought. What I think was actually
>> happening is that data started to slip into the void. It's like
>> randomly sys catalogs were dropping off. I bet other data was, too. I
>> can pull older backups and verify that. It's as if some creeping xmin
>> was snuffing everything out.
>
> Might be interesting to look at age(xmin) in a few different system
> catalogs. I think you can ignore entries with age = 2147483647;
> those should be frozen rows. But if you see entries with very large
> ages that are not that, it'd be suspicious.

Nothing really stands out. The damage did re-occur after a dump/restore -- not sure about a cluster level rebuild. No problems previous to that. This suggests that if this theory holds, the damage would have had to have been under the database level -- perhaps in clog. Maybe hint bits and clog did not agree as to commit or delete status, for example. clog has plenty of history leading past the problem barrier:

-rwx------ 1 postgres postgres 256K Jul 10 16:21 0000
-rwx------ 1 postgres postgres 256K Jul 21 12:39 0001
-rwx------ 1 postgres postgres 256K Jul 21 13:19 0002
-rwx------ 1 postgres postgres 256K Jul 21 13:59 0003

Confirmation of problem re-occurrence will come in a few days. I'm much more likely to believe a 6+ sigma occurrence (storage, freak bug, etc) should it prove the problem goes away post rebuild.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
Merlin Moncure writes:
> What if the subsequent dataloss was in fact a symptom of the first
> outage? Is it in theory possible for data to appear visible but then be
> eaten up as the transactions making the data visible get voided out by
> some other mechanic? I had to pull a quick restart the first time and
> everything looked ok -- or so I thought. What I think was actually
> happening is that data started to slip into the void. It's like
> randomly sys catalogs were dropping off. I bet other data was, too. I
> can pull older backups and verify that. It's as if some creeping xmin
> was snuffing everything out.

Might be interesting to look at age(xmin) in a few different system catalogs. I think you can ignore entries with age = 2147483647; those should be frozen rows. But if you see entries with very large ages that are not that, it'd be suspicious.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
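(In concrete terms, the check Tom suggests could look like this for one catalog, here pg_class:)

    -- skip the age = 2147483647 entries (frozen rows), per the caveat above
    SELECT relname, age(xmin) FROM pg_class
    WHERE age(xmin) <> 2147483647
    ORDER BY age(xmin) DESC LIMIT 10;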
Re: [HACKERS] emergency outage requiring database restart
On Tue, Oct 25, 2016 at 12:57 PM, Alvaro Herrera wrote:
> Merlin Moncure wrote:
>> After last night, I rebuilt the cluster, turning on checksums, turning
>> on synchronous commit (it was off) and added a standby replica. This
>> should help narrow the problem down should it re-occur; if storage is
>> bad (note, other database on same machine is doing 10x write activity
>> and is fine) or something is scribbling on shared memory (my guess
>> here) then checksums should be popped, right?
>
> Not really sure about that. As I recall we compute the CRC on the
> buffer's way out, based on the then-current contents, so if something
> scribbles on the buffer while it's waiting to be evicted, the CRC
> computation would include the new (corrupted) bytes rather than the
> original ones -- see FlushBuffer.

Huh. I have a new theory on this. Dealing with the reconstituted database, I'm finding more things -- functions and such, that are simply gone and had to be rebuilt -- they escaped notice as they were not in primary code paths. Recall that the original outage manifested as queries getting stuck, possibly on a spinlock (we don't know for sure). After that, things started to randomly disappear, possibly from system catalogs (but I now need to go back and verify older data, I think). There were three autovac processes running.

What if the subsequent dataloss was in fact a symptom of the first outage? Is it in theory possible for data to appear visible but then be eaten up as the transactions making the data visible get voided out by some other mechanic? I had to pull a quick restart the first time and everything looked ok -- or so I thought. What I think was actually happening is that data started to slip into the void. It's like randomly sys catalogs were dropping off. I bet other data was, too. I can pull older backups and verify that. It's as if some creeping xmin was snuffing everything out.

The confirmation of this should be obvious -- if that's indeed the case, the backup and restored cluster should no longer present data loss. Given that I was getting that every 1-2 days, we should be able to figure that out pretty soon.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
Merlin Moncure wrote: > After last night, I rebuilt the cluster, turning on checksums, turning > on synchronous commit (it was off) and added a standby replica. This > should help narrow the problem down should it re-occur; if storage is > bad (note, other database on same machine is doing 10x write activity > and is fine) or something is scribbling on shared memory (my guess > here) then checksums should be popped, right? Not really sure about that. As I recall we compute the CRC on the buffer's way out, based on the then-current contents, so if something scribbles on the buffer while it's waiting to be evicted, the CRC computation would include the new (corrupted) bytes rather than the original ones -- see FlushBuffer. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: >> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure wrote: > >> > Corruption struck again. >> > This time got another case of view busted -- attempting to create >> > gives missing 'type' error. >> >> Call it a hunch -- I think the problem is in pl/sh. > > I've heard that before. Well, yeah, previously I had an issue where the database crashed during a heavy concurrent pl/sh based load. However the problems went away when I refactored the code. Anyways, I looked at the code and couldn't see anything obviously wrong, so who knows? All I know is my production database is exploding continuously and I'm looking for answers. The only other extension in heavy use on this server is postgres_fdw. The other database on the cluster is fine, which kind of suggests we are not facing clog or WAL type problems. After last night, I rebuilt the cluster, turning on checksums, turning on synchronous commit (it was off) and added a standby replica. This should help narrow the problem down should it re-occur; if storage is bad (note, the other database on the same machine is doing 10x the write activity and is fine) or something is scribbling on shared memory (my guess here) then checksums should be popped, right? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wraparound warning
On Tue, Oct 25, 2016 at 01:19:26PM -0400, Robert Haas wrote:
> On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian wrote:
> > Do we still need to report the wraparound warning on server startup?
> >
> > LOG: MultiXact member wraparound protections are now enabled
> >
> > I thought that was going to be removed at some point, no?
>
> If you start with a 9.3.small cluster and do a pg_upgrade to 10, you
> could end up not seeing that message and not having those protections
> enabled, and it would be useful to be able to tell that from looking
> at the log.

Uh, I am confused. I am seeing this on a fresh PG 10 initdb'ed cluster, not one pg_upgraded. Is that expected?

> Maybe what we should do (since this message obviously annoys everyone)

It is a persistent reminder of our engineering failure. I guess it should annoy us, and maybe that's OK.

> is change things so that, starting in v10, a message is logged at
> startup if those protections are NOT enabled, and nothing is logged if
> they are enabled. Keep the message for the case where protections are
> enabled later, after startup time.

How are those protections enabled/disabled?

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spoonbill - rare buildfarm failures in test_shm_mq_pipelined()
Stefan Kaltenbrunner writes: > Spoonbill is very rarely (ie once every few months) failing like this: Yeah, I've been wondering about that ... > Any ideas on what is causing this? No, but it would sure be interesting to get a stack trace showing where the SIGFPE is happening. Could you change the ereport(ERROR) in FloatExceptionHandler to ereport(PANIC) and run that single test over and over till you get a panic? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Mon, Oct 24, 2016 at 4:15 PM, Peter Eisentraut wrote: > On 10/24/16 11:57 AM, Robert Haas wrote: >> Today, since the host part can't include a >> port specifier, it's regarded as part of the IP address, and I think >> it would probably be a bad idea to change that, as I believe Victor's >> patch would. He seems to have it in mind that we could allow things >> like host=[1:2:3::4:5:6] or host=[1:2:3::4:5]:6, which might be >> helpful for the future but doesn't avoid changing the meaning of >> connection strings that work today. > > Let's keep in mind here that the decision to allow database names to > contain a connection parameter substructure has caused some security > issues. Let's not create more levels of ambiguity and the need to pass > around override flags. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Mon, Oct 24, 2016 at 4:40 PM, Alvaro Herrera wrote: > Umm, my recollection regarding IPv6 parsing in the URI syntax is that > those must appear inside square brackets -- it's not valid to have the > IPv6 address outside brackets, and the port number is necessarily > outside the brackets. So there's no ambiguity. If the current patch is > allowing IPv6 address to appear outside of brackets, that seems like a > bug to me. I agree. > The string conninfo spec should not accept port numbers in the "host" > part. (Does it?) Not currently, but the proposed patch would cause it to do so. >> So now I think that to make this work correctly, we're going to need >> to change both the URL parser and also add parsing for the host and >> port. Let's say the user says this: >> >> postgresql://[1::2]:3,[4::5],[6::7]::8/ > > (There's a double colon before 8, I suppose that's just a typo.) Oh, yeah. Right. >> It is not obvious what it means if there are multiple ports but the >> number doesn't equal the number of hosts. > > I think we should reject the case of differing number of elements and > neither host nor port is a singleton, as an error. The suggestion to > ignore some parts seems too error-prone. OK, can do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wraparound warning
On Tue, Oct 25, 2016 at 11:20 AM, Bruce Momjian wrote: > Do we still need to report the wraparound warning on server startup? > > LOG: MultiXact member wraparound protections are now enabled > > I thought that was going to be removed at some point, no? If you start with a 9.3.small cluster and do a pg_upgrade to 10, you could end up not seeing that message and not having those protections enabled, and it would be useful to be able to tell that from looking at the log. Maybe what we should do (since this message obviously annoys everyone) is change things so that, starting in v10, a message is logged at startup if those protections are NOT enabled, and nothing is logged if they are enabled. Keep the message for the case where protections are enabled later, after startup time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
Alvaro Herrera writes: > BTW, should we cost push-down-able quals differently, say discount some > fraction of the cost, to reflect the fact that they are cheaper to run? > However, since the decision of which ones to push down depends on the > cost, and the cost would depend on which ones we push down, it looks > rather messy. I don't think our cost model is anywhere near refined enough to make it worth trying to distinguish that. (Frankly, I'm pretty skeptical of this entire patch being worth the trouble...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] spoonbill - rare buildfarm failures in test_shm_mq_pipelined()
Spoonbill is very rarely (ie once every few months) failing like this:

[2016-08-29 18:15:35.273 CEST:57c45f88.52d4:8] LOG: statement: SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)), 200, 3);
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:9] ERROR: floating-point exception
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:10] DETAIL: An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
[2016-08-29 18:15:35.282 CEST:57c45f88.52d4:11] STATEMENT: SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,27)), 200, 3);

Some examples:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-10-22%2023%3A00%3A06
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-08-29%2011%3A00%3A06
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2016-06-21%2023%3A00%3A06

Any ideas on what is causing this? IIRC we had issues with that specific test on spoonbill (and other SPARC-based boxes) before, so maybe we failed to fix the issue completely...

Stefan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Mon, Oct 24, 2016 at 2:55 AM, Michael Paquier wrote: > Yep, minRecoveryPoint still gets updated when the last checkpoint > record is the last restart point to avoid a hot standby to allow > read-only connections at a LSN-point earlier than the last shutdown. > Anyway, we can clearly reject 1. in the light of > https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com > when playing with different stop locations at recovery. I can't understand what point you're driving at here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Mon, Oct 24, 2016 at 12:26 AM, Amit Kapila wrote: >> The consensus solution on this thread seems to be that we should have >> pg_do_stop_backup() return the last-replayed XLOG location as the >> backup end point. If the control file has been updated with a newer >> redo location, then the associated checkpoint record has surely been >> replayed, so we'll definitely have enough WAL to replay that >> checkpoint record (and we don't need to replay anything else, because >> we're supposing that this is the case where the minimum recovery point >> precedes the redo location). Although this will work, it might >> include more WAL in the backup than is strictly necessary. If the >> additional WAL replayed beyond the minimum recovery point does NOT >> include a checkpoint, we don't need any of it; if it does, we need >> only the portion up to and including the last checkpoint record, and >> not anything beyond that. > > You are right that it will include additional WAL than strictly > necessary, but that can happen today as well because minrecoverypoint > could be updated after you have established restart point for > do_pg_start_backup(). Do you want to say that even without patch in > some cases we are including additional WAL in backup than what is > strictly necessary, so it is better to improve the situation? In that case, including additional WAL is *necessary*. If the minimum recovery point is updated after do_pg_start_backup(), pages bearing LSNs newer than the old minimum recovery point could conceivably have been copied as part of the backup, and therefore we need to replay up through the new minimum recovery point to guarantee consistency. >> I can think of two solutions that would be "tighter": >> >> 1. When performing a restartpoint, update the minimum recovery point >> to just beyond the checkpoint record. I think this can't hurt anyone >> who is actually restarting recovery on the same machine, because >> that's exactly the point where they are going to start recovery. A >> minimum recovery point which precedes the point at which they will >> start recovery is no better than one which is equal to the point at >> which they will start recovery. > > I think this will work but will cause an unnecessary control file > update for the cases when buffer flush will anyway do it. It can work > if we try to do only when required (minrecoverypoint is lesser than > lastcheckpoint) after flush of buffers. Sure, that doesn't sound too hard to implement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Sun, Oct 23, 2016 at 11:48 PM, Kyotaro HORIGUCHI wrote: >> I can think of two solutions that would be "tighter": >> >> 1. When performing a restartpoint, update the minimum recovery point >> to just beyond the checkpoint record. I think this can't hurt anyone >> who is actually restarting recovery on the same machine, because >> that's exactly the point where they are going to start recovery. A >> minimum recovery point which precedes the point at which they will >> start recovery is no better than one which is equal to the point at >> which they will start recovery. > > This is a candidate that I thought of. But I avoided changing > the behavior of minRecoveryPoint, since (it seems to me that) it > describes only buffer state. So a checkpoint with no update doesn't > advance minRecoveryPoint, as I understand it. I don't really know what you mean by this. minRecoveryPoint is the point up to which we need to replay WAL in order to achieve consistency - the distinction between "buffer state" and something else is artificial. My point is that the amount of WAL we need to replay can never be negative, but that's what a minimum recovery point prior to the end of the checkpoint record used for start-of-REDO implies. >> 2. In perform_base_backup(), if the endptr returned by >> do_pg_stop_backup() precedes the end of the checkpoint record returned >> by do_pg_start_backup(), use the latter value instead. Unfortunately, >> that's not so easy: we can't just say if (endptr < starptr) startptr = >> endptr; because startptr is the *start* of the checkpoint record, not >> the end. I suspect somebody could figure out a solution to this >> problem, though. > > Yes, and the reason I rejected it was that it is not logical, > maybe the same meaning to it would cause another problem later. I don't know why this isn't logical. If you want to say it's going to cause another problem later, you should say what problem you think it will cause, not just guess that there might be one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
Dilip Kumar wrote:
> #2. Currently quals are ordered based on cost (refer
> order_qual_clauses), But once we pushdown some of the quals, then
> those quals will always be executed first. Can this create problem ?

We don't promise order of execution (which is why we can afford to sort on cost), but I think it makes sense to keep a rough ordering based on cost, and not let this push-down affect those ordering decisions too much. I think it's fine to push-down quals that are within the same order of magnitude of the cost of a pushable condition, while keeping any other much-costlier qual (which could otherwise be pushed down) together with non-pushable conditions; this would sort-of guarantee within-order-of-magnitude order of execution of quals.

Hopefully an example clarifies what I mean. Let's suppose you have three quals, where qual2 is non-pushable but 1 and 3 are. cost(1)=10, cost(2)=11, cost(3)=12. Currently, they are executed in that order. If you were to compare costs in the straightforward way, you would push only 1 (because 3 is costlier than 2 which is not push-down-able). With fuzzy comparisons, you'd push both 1 and 3, because the cost of 3 is close enough to that of qual 2. But if cost(3)=100 then only push qual 1, and let qual 3 be evaluated together with 2 outside the scan node.

BTW, should we cost push-down-able quals differently, say discount some fraction of the cost, to reflect the fact that they are cheaper to run? However, since the decision of which ones to push down depends on the cost, and the cost would depend on which ones we push down, it looks rather messy.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
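(In pseudo-C, the fuzzy rule sketched above is roughly the following. The factor is arbitrary: with the numbers in the example, anything around 2 reproduces the described outcomes, whereas a literal 10x would still push the cost-100 qual, since 100 <= 11 * 10:)

    /* push a pushable qual down unless it is much costlier than the
     * cheapest non-pushable qual */
    #define PUSHDOWN_COST_FUZZ  2.0

    static bool
    can_push_down(Cost qual_cost, Cost cheapest_nonpushable)
    {
        return qual_cost <= cheapest_nonpushable * PUSHDOWN_COST_FUZZ;
    }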
Re: [HACKERS] pg_basebackup stream xlog to tar
On Tue, Oct 25, 2016 at 2:52 PM, Michael Paquier wrote:
> On Tue, Oct 25, 2016 at 7:12 PM, Magnus Hagander wrote:
> > On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> >> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander wrote:
> >> + if (format == 'p')
> >> + stream.walmethod = CreateWalDirectoryMethod(param->xlog, do_sync);
> >> + else
> >> + stream.walmethod = CreateWalTarMethod(param->xlog,
> >> compresslevel, do_sync);
> >> LogStreamerMain() exits immediately once it is done, but I think that
> >> we had better be tidy here and clean up the WAL methods that have been
> >> allocated. I am thinking here about a potential retry method on
> >> failure, though the best shot in this area would be with
> >> ReceiveXlogStream().
> >
> > Wouldn't the same be needed in pg_receivexlog.c in that case?
>
> Oops, missed that. Thanks for the extra checks. Attached is an updated
> patch.

Thanks, applied and pushed.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Proposal: scan key push down to heap [WIP]
On Fri, Oct 14, 2016 at 11:54 AM, Andres Freund wrote:
> I don't think it's a good idea to do this under the content lock in any
> case. But luckily I don't think we have to do so at all.
>
> Due to pagemode - which is used by pretty much everything iterating over
> heaps, and definitely seqscans - the key test already happens without
> the content lock held, in heapgettup_pagemode():

Yes, you are right. Now after this clarification, it's clear that even if we push down the key we are not evaluating it under the buffer content lock. As of now, I have done my further analysis keeping in mind only pushing 'var op const'. Below are my observations.

#1. If we process the qual in the executor, all temporary memory allocations are under "per_tuple_context", which gets reset after each tuple is processed. But, on the other hand, if we do that in the heap, then they will be under "per_query_context". This restricts us from pushing down any qual which needs to do memory allocation, such as one on a "toastable" field. Is there any other way to handle this?

#2. Currently quals are ordered based on cost (refer order_qual_clauses), but once we push down some of the quals, those quals will always be executed first. Can this create a problem? Consider the example below:

create or replace function f1(anyelement) returns bool as $$
begin
  raise notice '%', $1;
  return true;
end;
$$ LANGUAGE plpgsql cost 0.01;

select * from t3 where a>1 and f1(b);

In this case "f1(b)" will always be executed as the first qual (its cost is set very low by the user), hence it will raise the notice for every tuple. But if we push down the "a>1" qual to the heap, then only qualified tuples will be passed to "f1(b)". Is that a behaviour change? I know that it can also impact performance, because when the user has given some function a very low cost, it should be executed first, as it may discard most of the tuples.

One solution to this can be:
1. Only pushdown if all quals are of the same cost.
2. Pushdown all quals until we find the first non-pushable qual (this way we can maintain the same qual execution order as in the existing system).

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Does it make sense to add a -q (quiet) flag to initdb?
Hello, Per: https://www.commandprompt.com/blog/can_i_make_initdb_quiet/ This was a question that was asked on #postgresql. Obviously we found a workaround but I wonder if it makes sense to add a -q to solve some of these issues? (I could see it being useful with automation). Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reload config instructions
On Sat, Oct 22, 2016 at 11:39:40AM -0400, Bruce Momjian wrote: > I found our postgresql.conf and pg_hba.conf config files had > inconsistent comment instructions about reloading, and didn't mention > SELECT pg_reload_conf(). > > The attached doc patch fixes this. Patch applied to head. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Wraparound warning
Do we still need to report the wraparound warning on server startup?

LOG: MultiXact member wraparound protections are now enabled

I thought that was going to be removed at some point, no?

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recommend wrappers of PG_DETOAST_DATUM_PACKED()
Noah Misch writes: > When commit 3e23b68dac006e8deb0afa327e855258df8de064 introduced single-byte > varlena headers, its fmgr.h changes presented PG_GETARG_TEXT_PP() and > PG_GETARG_TEXT_P() as equals. Its postgres.h changes presented > PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the exceptional case. Let's > firmly recommend PG_GETARG_TEXT_PP() over PG_GETARG_TEXT_P(); likewise for > other ...PP() macros. This shaves cycles and brings consistency of style. +1 regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Recommend wrappers of PG_DETOAST_DATUM_PACKED()
When commit 3e23b68dac006e8deb0afa327e855258df8de064 introduced single-byte varlena headers, its fmgr.h changes presented PG_GETARG_TEXT_PP() and PG_GETARG_TEXT_P() as equals. Its postgres.h changes presented PG_DETOAST_DATUM_PACKED() and VARDATA_ANY() as the exceptional case. Let's firmly recommend PG_GETARG_TEXT_PP() over PG_GETARG_TEXT_P(); likewise for other ...PP() macros. This shaves cycles and brings consistency of style. If the attached policy-setting patch seems reasonable, I will follow it with a mechanical patch adopting the recommended macros throughout the tree. (If any code does want the alignment it gets from an obsolescent macro, I will add a comment to that code.)

Thanks, nm

diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 0878418..2ae0b64 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -243,13 +243,9 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena * datum);
 /* and this if you can handle 1-byte-header datums: */
 #define PG_GETARG_VARLENA_PP(n) PG_DETOAST_DATUM_PACKED(PG_GETARG_DATUM(n))
 /* DatumGetFoo macros for varlena types will typically look like this: */
-#define DatumGetByteaP(X) ((bytea *) PG_DETOAST_DATUM(X))
 #define DatumGetByteaPP(X) ((bytea *) PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetTextP(X) ((text *) PG_DETOAST_DATUM(X))
 #define DatumGetTextPP(X) ((text *) PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetBpCharP(X) ((BpChar *) PG_DETOAST_DATUM(X))
 #define DatumGetBpCharPP(X)((BpChar *) PG_DETOAST_DATUM_PACKED(X))
-#define DatumGetVarCharP(X)((VarChar *) PG_DETOAST_DATUM(X))
 #define DatumGetVarCharPP(X) ((VarChar *) PG_DETOAST_DATUM_PACKED(X))
 #define DatumGetHeapTupleHeader(X) ((HeapTupleHeader) PG_DETOAST_DATUM(X))
 /* And we also offer variants that return an OK-to-write copy */
@@ -264,13 +260,9 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena * datum);
 #define DatumGetBpCharPSlice(X,m,n) ((BpChar *) PG_DETOAST_DATUM_SLICE(X,m,n))
 #define DatumGetVarCharPSlice(X,m,n) ((VarChar *) PG_DETOAST_DATUM_SLICE(X,m,n))
 /* GETARG macros for varlena types will typically look like this: */
-#define PG_GETARG_BYTEA_P(n) DatumGetByteaP(PG_GETARG_DATUM(n))
 #define PG_GETARG_BYTEA_PP(n) DatumGetByteaPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_TEXT_P(n) DatumGetTextP(PG_GETARG_DATUM(n))
 #define PG_GETARG_TEXT_PP(n) DatumGetTextPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_BPCHAR_P(n) DatumGetBpCharP(PG_GETARG_DATUM(n))
 #define PG_GETARG_BPCHAR_PP(n) DatumGetBpCharPP(PG_GETARG_DATUM(n))
-#define PG_GETARG_VARCHAR_P(n) DatumGetVarCharP(PG_GETARG_DATUM(n))
 #define PG_GETARG_VARCHAR_PP(n) DatumGetVarCharPP(PG_GETARG_DATUM(n))
 #define PG_GETARG_HEAPTUPLEHEADER(n) DatumGetHeapTupleHeader(PG_GETARG_DATUM(n))
 /* And we also offer variants that return an OK-to-write copy */
@@ -284,6 +276,21 @@ extern struct varlena *pg_detoast_datum_packed(struct varlena * datum);
 #define PG_GETARG_TEXT_P_SLICE(n,a,b) DatumGetTextPSlice(PG_GETARG_DATUM(n),a,b)
 #define PG_GETARG_BPCHAR_P_SLICE(n,a,b) DatumGetBpCharPSlice(PG_GETARG_DATUM(n),a,b)
 #define PG_GETARG_VARCHAR_P_SLICE(n,a,b) DatumGetVarCharPSlice(PG_GETARG_DATUM(n),a,b)
+/*
+ * Obsolescent variants that guarantee INT alignment for the return value.
+ * Few operations on these particular types need alignment, mainly operations
+ * that cast the VARDATA pointer to a type like int16[]. Most code should use
+ * the ...PP(X) counterpart. Nonetheless, these appear frequently in code
+ * predating the PostgreSQL 8.3 introduction of the ...PP(X) variants.
+ */
+#define DatumGetByteaP(X) ((bytea *) PG_DETOAST_DATUM(X))
+#define DatumGetTextP(X) ((text *) PG_DETOAST_DATUM(X))
+#define DatumGetBpCharP(X) ((BpChar *) PG_DETOAST_DATUM(X))
+#define DatumGetVarCharP(X)((VarChar *) PG_DETOAST_DATUM(X))
+#define PG_GETARG_BYTEA_P(n) DatumGetByteaP(PG_GETARG_DATUM(n))
+#define PG_GETARG_TEXT_P(n) DatumGetTextP(PG_GETARG_DATUM(n))
+#define PG_GETARG_BPCHAR_P(n) DatumGetBpCharP(PG_GETARG_DATUM(n))
+#define PG_GETARG_VARCHAR_P(n) DatumGetVarCharP(PG_GETARG_DATUM(n))
 /* To return a NULL do this: */
 #define PG_RETURN_NULL() \
diff --git a/src/include/postgres.h b/src/include/postgres.h
index be4d0d6..5844f78 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -287,20 +287,18 @@ typedef struct
 /* Externally visible macros */
 /*
- * VARDATA, VARSIZE, and SET_VARSIZE are the recommended API for most code
- * for varlena datatypes. Note that they only work on untoasted,
- * 4-byte-header Datums!
- *
- * Code that wants to use 1-byte-header values without detoasting should
- * use
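(As a concrete illustration of the recommended style -- the function itself is a made-up example, not part of the patch:)

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(my_text_octet_length);

    Datum
    my_text_octet_length(PG_FUNCTION_ARGS)
    {
        /* PP variant: may hand back a 1-byte-header datum, skipping a copy */
        text       *t = PG_GETARG_TEXT_PP(0);

        /* so use the alignment-agnostic accessors, not VARSIZE/VARDATA */
        PG_RETURN_INT32(VARSIZE_ANY_EXHDR(t));
    }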
Re: [HACKERS] [PATCH] pgpassfile connection option
On 10/16/2016 12:09 PM, Fabien COELHO wrote:
> Patch applies cleanly, make check ok... however AFAICS it only means that
> it compiles but it is not tested in any way... This is annoying. Well I'm
> not sure whether other options are tested either, but they should.

Thanks for taking the time to review my patch! I haven't found much regarding the testing of connection options. However, since there isn't anything fancy going on in PasswordFromFile() I'm not too worried about that.

> The documentation needs to be updated.

I've written a couple lines now. I aligned the definition of the connection option and the environment variable with that of other (conn.opt) pairs and added mention of the different options to the doc of the "Password File".

> The reported password file is wrong. It is even more funny if ~/.pgpass
> contains the right password: the pgpassfile option *is* taken into
> account, ./foo is read and it fails, but the error message is not updated
> and points to the wrong file. The error message stuff should be updated
> to look for the pgpassfile connection string option...

That was indeed an error on my side, I hadn't updated the error messages to inform you which file has been used. So attached is an updated version of the patch.

I'd like to ask for some input on how to handle invalid files - right now no message is shown, the user just gets a password prompt as a result, however I think a message when the custom pgpassfile hasn't been found would be helpful.

Kind regards,
Julian
--
Julian Markwort Westphalian Wilhelms-University in Münster julian(dot)markwort(at)uni-muenster(dot)de

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 4e34f00..1bd5597 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -930,7 +930,7 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 Note that authentication is likely to fail if host is not the name of the server at network address hostaddr. Also, note that host rather than hostaddr
-is used to identify the connection in ~/.pgpass (see
+is used to identify the connection in a password file (see
 ).
@@ -986,6 +986,16 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
+
+ pgpassfile
+
+
+Specifies the name of the file used to lookup passwords.
+Defaults to the password file (see ).
+
+
+
 connect_timeout
@@ -6862,9 +6872,8 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
 PGPASSFILE
- PGPASSFILE specifies the name of the password file to
- use for lookups. If not set, it defaults to ~/.pgpass
- (see ).
+ PGPASSFILE behaves the same as the connection parameter.
@@ -7136,13 +7145,15 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough)
- The file .pgpass in a user's home directory or the
- file referenced by PGPASSFILE can contain passwords to
+ The file .pgpass in a user's home directory can contain passwords to
 be used if the connection requires a password (and no password has been specified otherwise). On Microsoft Windows the file is named %APPDATA%\postgresql\pgpass.conf (where %APPDATA% refers to the Application Data subdirectory in the user's profile).
+ Alternatively, a password file can be specified
+ using the connection parameter
+ or the environment variable PGPASSFILE.
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index f3a9e5a..db1de26 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -184,6 +184,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 	"Database-Password", "*", 20,
 	offsetof(struct pg_conn, pgpass)},
+	{"pgpassfile", "PGPASSFILE", NULL, NULL,
+	"Database-Password-File", "", 64,
+	offsetof(struct pg_conn, pgpassfile)},
+
 	{"connect_timeout", "PGCONNECT_TIMEOUT", NULL, NULL,
 	"Connect-timeout", "", 10, /* strlen(INT32_MAX) == 10 */
 	offsetof(struct pg_conn, connect_timeout)},
@@ -375,7 +379,7 @@ static int parseServiceFile(const char *serviceFile, bool *group_found);
 static char *pwdfMatchesString(char *buf, char *token);
 static char *PasswordFromFile(char *hostname, char *port, char *dbname,
-				char *username);
+				char *username, char *pgpassfile);
 static bool getPgPassFilename(char *pgpassfile);
 static void dot_pg_pass_warning(PGconn *conn);
 static void default_threadlock(int acquire);
@@ -806,8 +810,9 @@ connectOptions2(PGconn *conn)
 {
 	if (conn->pgpass)
 		free(conn->pgpass);
+	/* We'll pass conn->pgpassfile regardless of it's contents - checks happen in
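(With the patch applied, client code would presumably use the new keyword like this; the host, database, and path are invented for the example:)

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=db.example.com dbname=mydb user=me "
                                   "pgpassfile=/etc/myapp/pgpass");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }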
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Mon, Oct 17, 2016 at 4:00 PM, Masahiko Sawada wrote:
> Attached latest patch.
> Please review it.

Okay, so let's move on with this patch...

+
+ The keyword ANY is omissible, but note that there is
+ not compatibility between PostgreSQL version 10 and
+ 9.6 or before. For example, 1 (s1, s2) is the same as the
+ configuration with FIRST and
+ num_sync equal to 1 in PostgreSQL 9.6
+ or before. On the other hand, It's the same as the configuration with
+ ANY and num_sync equal to
+ 1 in PostgreSQL 10 or later.
+

This paragraph could be reworded: If FIRST or ANY are not specified, this parameter behaves as ANY. Note that this grammar is incompatible with PostgreSQL 9.6, where no keyword specified is equivalent as if FIRST was specified. In short, there is no real need to specify num_sync as this behavior has not changed, and it is not necessary to mention pre-9.6 versions as the multi-sync grammar has been added in 9.6.

-Specifying more than one standby name can allow very high availability.

Why remove this sentence?

+The keyword ANY, coupeld with an interger number N,

s/coupeld/coupled/ and s/interger/integer/, for a double hit in one line, still...

+The keyword ANY, coupeld with an interger number N,
+chooses N standbys in a set of standbys with the same, lowest,
+priority and makes transaction commit when WAL records are received
+those N standbys.

This could be reworded more simply, for example: The keyword ANY, coupled with an integer number N, makes transaction commits wait until WAL records are received from N connected standbys among those defined in the list of synchronous_standby_names.

+s2 and s3 wil be considered as synchronous standby

s/wil/will/

+ when standby is considered as a condidate of quorum commit.

s/condidate/candidate/

[... stopping here ...]

Please be more careful with the documentation and comment grammar. There are other things in the patch..

A bunch of comments at the top of syncrep.c need to be updated.

+extern List *SyncRepGetSyncStandbysPriority(bool *am_sync);
+extern List *SyncRepGetSyncStandbysQuorum(bool *am_sync);

Those two should be static routines in syncrep.c, let's keep the whole logic about quorum and higher-priority definition only there and not bother the callers of them about that.

+ if (SyncRepConfig->sync_method == SYNC_REP_PRIORITY)
+ return SyncRepGetSyncStandbysPriority(am_sync);
+ else /* SYNC_REP_QUORUM */
+ return SyncRepGetSyncStandbysQuorum(am_sync);

Both routines share the same logic to detect if a WAL sender can be selected as a candidate for sync evaluation or not, still per the selection they do I agree that it is better to keep them as separate.

+ /* In quroum method, all sync standby priorities are always 1 */
+ if (found && SyncRepConfig->sync_method == SYNC_REP_QUORUM)
+ priority = 1;

Honestly I don't understand why you are enforcing that. Priority can be important for users willing to switch from ANY to FIRST to have a look immediately at what are the standbys that would become sync or potential.

 else if (list_member_int(sync_standbys, i))
- values[7] = CStringGetTextDatum("sync");
+ values[7] = SyncRepConfig->sync_method == SYNC_REP_PRIORITY ?
+ CStringGetTextDatum("sync") : CStringGetTextDatum("quorum");

The comment at the top of this code block needs to be refreshed. The representation given to the user in pg_stat_replication is not enough IMO.
For example, imagine a cluster with 4 standbys:

=# select application_name, sync_priority, sync_state from pg_stat_replication ;
 application_name | sync_priority | sync_state
------------------+---------------+------------
 node_5433        |             0 | async
 node_5434        |             0 | async
 node_5435        |             0 | async
 node_5436        |             0 | async
(4 rows)

If FIRST N is used, it is easy for the user to understand what are the nodes in sync:

=# alter system set synchronous_standby_names = 'FIRST 2 (node_5433, node_5434, node_5435)';
ALTER SYSTEM
=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

=# select application_name, sync_priority, sync_state from pg_stat_replication ;
 application_name | sync_priority | sync_state
------------------+---------------+------------
 node_5433        |             1 | sync
 node_5434        |             2 | sync
 node_5435        |             3 | potential
 node_5436        |             0 | async
(4 rows)

In this case it is easy to understand that two nodes are required to be in sync. When using ANY similarly for three nodes, here is what pg_stat_replication tells:

=# select application_name, sync_priority, sync_state from pg_stat_replication
Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1
I wrote: > Anyway, I have prepared a patch along the lines you suggest. It occurred to me that the documentation still suggests that you should add a declaration to a C function; I have fixed that too. I'll add the patch to the next commitfest. Yours, Laurenz Albe 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
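(For anyone skimming the thread: the practical effect is that extension code no longer needs a hand-written exported declaration. Roughly -- this is a sketch of the intent, not the patch text:)

    /* before: on Windows, authors often had to add this themselves */
    extern PGDLLEXPORT Datum add_one(PG_FUNCTION_ARGS);

    /* after: PG_FUNCTION_INFO_V1() emits PGDLLEXPORT-qualified declarations
     * for the function and its pg_finfo record, so the line above goes away */
    PG_FUNCTION_INFO_V1(add_one);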
Re: [HACKERS] pg_basebackup stream xlog to tar
On Tue, Oct 25, 2016 at 7:12 PM, Magnus Hagander wrote:
> On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier wrote:
>> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander wrote:
>> + if (format == 'p')
>> +     stream.walmethod = CreateWalDirectoryMethod(param->xlog, do_sync);
>> + else
>> +     stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, do_sync);
>> LogStreamerMain() exits immediately once it is done, but I think that
>> we had better be tidy here and clean up the WAL methods that have been
>> allocated. I am thinking here about a potential retry method on
>> failure, though the best shot in this area would be with
>> ReceiveXlogStream().
>
> Wouldn't the same be needed in pg_receivexlog.c in that case?

Oops, missed that. Thanks for the extra checks. Attached is an updated patch.
--
Michael

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 16cab97..e2875df 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -495,6 +495,13 @@ LogStreamerMain(logstreamer_param *param)
 	}
 
 	PQfinish(param->bgconn);
+
+	if (format == 'p')
+		FreeWalDirectoryMethod();
+	else
+		FreeWalTarMethod();
+	pg_free(stream.walmethod);
+
 	return 0;
 }
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index bbdf96e..99445e6 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -352,6 +352,10 @@ StreamLog(void)
 	}
 
 	PQfinish(conn);
+
+	FreeWalDirectoryMethod();
+	pg_free(stream.walmethod);
+
 	conn = NULL;
 }
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index d1dc046..656622f 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -299,6 +299,13 @@ CreateWalDirectoryMethod(const char *basedir, bool sync)
 	return method;
 }
 
+void
+FreeWalDirectoryMethod(void)
+{
+	pg_free(dir_data->basedir);
+	pg_free(dir_data);
+}
+
 /*-
  * WalTarMethod - write wal to a tar file containing pg_xlog contents
@@ -611,6 +618,9 @@ tar_sync(Walfile f)
 	Assert(f != NULL);
 	tar_clear_error();
 
+	if (!tar_data->sync)
+		return 0;
+
 	/*
 	 * Always sync the whole tarfile, because that's all we can do. This makes
 	 * no sense on compressed files, so just ignore those.
@@ -842,7 +852,8 @@ tar_finish(void)
 #endif
 
 	/* sync the empty blocks as well, since they're after the last file */
-	fsync(tar_data->fd);
+	if (tar_data->sync)
+		fsync(tar_data->fd);
 
 	if (close(tar_data->fd) != 0)
 		return false;
@@ -890,3 +901,14 @@ CreateWalTarMethod(const char *tarbase, int compression, bool sync)
 
 	return method;
 }
+
+void
+FreeWalTarMethod(void)
+{
+	pg_free(tar_data->tarfilename);
+#ifdef HAVE_LIBZ
+	if (tar_data->compression)
+		pg_free(tar_data->zlibOut);
+#endif
+	pg_free(tar_data);
+}
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 0c8eac7..8cea8ff 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -43,3 +43,7 @@ struct WalWriteMethod
  */
 WalWriteMethod *CreateWalDirectoryMethod(const char *basedir, bool sync);
 WalWriteMethod *CreateWalTarMethod(const char *tarbase, int compression, bool sync);
+
+/* Cleanup routines for previously-created methods */
+void FreeWalDirectoryMethod(void);
+void FreeWalTarMethod(void);
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On 10/25/16 1:38 AM, Haribabu Kommi wrote:
> Here I attached the first version of patch that supports both EUI-48 and
> EUI-64 type MAC addresses with a single datatype called macaddr. This is
> a variable-length datatype similar to inet. It can store both 6 and 8
> byte addresses. A variable-length type is used so that, in case MAC
> addresses get enhanced in the future, this type can still support them
> without breaking on-disk compatibility.

Since the world knows the 6-byte variant as MAC-48, shouldn't it be renamed to macaddr48 or even mac48?

> Currently the patch lacks documentation. Comments?

For patches like this, it would be good if you included a mock commit message so that someone who comes in late knows what's going on.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
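For illustration, here is roughly how the proposed unified type would behave (a sketch only; the patch is unfinished, and the 8-byte literal format shown assumes the existing macaddr input syntax extended to EUI-64):

  CREATE TABLE nics (id int, mac macaddr);
  -- EUI-48, 6 bytes, works today:
  INSERT INTO nics VALUES (1, '08:00:2b:01:02:03');
  -- EUI-64, 8 bytes, accepted by the same column under the patch:
  INSERT INTO nics VALUES (2, '08:00:2b:01:02:03:04:05');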
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/10/25 18:58, Ashutosh Bapat wrote:

You wrote: 13. The comment below is missing the main purpose, i.e. creating a unique alias, in case the relation gets converted into a subquery. Lowest or highest relid will create a unique alias at a given level of join, and that would be more future proof. If we decide to consider paths for every join order, the following comment will no longer be true.

+/*
+ * Set the relation index. This is defined as the position of this
+ * joinrel in the join_rel_list list plus the length of the rtable list.
+ * Note that since this joinrel is at the end of the list when we are
+ * called, we can get the position by list_length.
+ */
+fpinfo->relation_index =
+list_length(root->parse->rtable) + list_length(root->join_rel_list);

I wrote: I don't agree on that point. As I said before, the approach using the lowest/highest relid would make a remote query having nested subqueries difficult to read. And even if we decided to consider paths for every join order, we could define the relation_index safely, by modifying postgresGetForeignJoinPaths so that it (1) sets the relation_index the proposed way the first time it is called and (2) avoids setting the relation_index after the first call, by checking whether fpinfo->relation_index > 0.

OK. Although an alias that contains a number which apparently bears no relation to the relation (no pun intended :)) being deparsed won't make much sense in the deparsed query. But I am fine leaving this to the committer's judgement.

Fine with me.

Maybe you want to add an assert here making sure that relation_index is set only once, something like Assert(fpinfo->relation_index == 0) just before setting it.

Will do.

15. While deparsing every Var, we are descending the RelOptInfo hierarchy.

Right. I think that not only avoids creating redundant data to find the alias of a given Var node but also finds it at optimal cost.

Instead of that, we should probably gather all the alias information once and store it in the context as an array indexed by relid. While deparsing a Var we can get the targetlist and alias for a given relation by using var->varno as an index, and then search for the given Var node in the targetlist to deparse the column name by its position in the targetlist. For the relations that are not converted into subqueries, this array will not hold any information and the Var will be deparsed as it's being done today.

Sorry, I don't understand this part. How does that work when creating a remote query having nested subqueries? Is it able to search for a given Var node efficiently?

For every relation that is deparsed as a subquery, we will create a separate context. Each such context will have an array as described above. This array will contain the targetlist and aliases for all the relations, covered by that relation, that are required to be deparsed as subqueries. These arrays bubble up to the topmost relation that is not required to be deparsed as a subquery. When a relation is required to be deparsed as a subquery, its immediate upper relation sets the alias and targetlist chosen for that relation at all the indexes corresponding to all the base relations covered by that relation.

Thanks for the explanation!

For example, let's assume that a relation (1, 2, 3) is required to be deparsed as a subquery for an immediate upper relation (1, 2, 3, 4, 5) (the other joining relation thus being (4, 5)).
While deparsing for relation (1, 2, 3, 4, 5), the context will contain a 5-element array, with positions 1, 2, 3 filled by the same targetlist and alias, whereas positions 4 and 5 will not be filled as those relations are not deparsed as subqueries.

Sorry, I don't understand this. In that case, the immediate upper relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases for *the join relation (1, 2, 3) somewhere*, not the targetlist and aliases for each of the component relations 1, 2, and 3, because the join relation is deparsed as a subquery. Maybe I'm missing something, though.

Let's assume that in relation (1, 2, 3), (1, 3) in turn requires a subquery but (2) does not. Thus the context created while deparsing (1, 2, 3) will have a 3-element array with positions 1 and 3 containing the same targetlist and alias, whereas position 2 will be empty. When deparsing a Var node with varno = N and varattno = m, if the Nth position in the array in the context is empty, that Var node will be deparsed as rN..

What happens when deparsing, e.g., a Var with varno = 2 at the topmost relation (1, 2, 3, 4, 5)? The second position of the array is empty, but the join relation (1, 2, 3) is deparsed as a subquery, so the Var should be deparsed as an alias of an output column of the subquery at the topmost relation, I think.

But if that position has alias sZ, then we search for that Var node in the targetlist and, if it's found at the kth position in the targetlist,
Re: [HACKERS] Declarative partitioning - another take
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
> On 2016/10/05 2:12, Robert Haas wrote:
> Attached revised patches.

A few assorted review comments for 0001-Catalog*:

1.
@@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
{
..
+ else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+     ereport(ERROR,
+         (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+          errmsg("cannot copy from partitioned table \"%s\"",
+                 RelationGetRelationName(rel)),
+          errhint("Try the COPY (SELECT ...) TO variant.")));
..
}

What is the reason for this restriction? Won't it be useful to allow it for cases where a user wants to copy the data of all the partitions?

2.
+ if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
+     partnatts > 1)
+     ereport(ERROR,
+         (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+          errmsg("cannot list partition using more than one column")));

/cannot list/cannot use list

3.
@@ -77,7 +77,7 @@ typedef enum DependencyType
 DEPENDENCY_INTERNAL = 'i',
 DEPENDENCY_EXTENSION = 'e',
 DEPENDENCY_AUTO_EXTENSION = 'x',
- DEPENDENCY_PIN = 'p'
+ DEPENDENCY_PIN = 'p',
} DependencyType;

Why is this change required?

4.
@@ -0,0 +1,69 @@
+/*-
+ *
+ * pg_partitioned_table.h
+ * definition of the system "partitioned table" relation
+ * along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

Copyright year should be 2016.

5.
+/*
+ * PartitionSpec - partition key definition including the strategy
+ *
+ * 'strategy' partition strategy name ('list', 'range', etc.)

The "etc." in the above comment seems unnecessary.

6.
+ {PartitionedRelationId, /* PARTEDRELID */

PARTEDRELID sounds awkward here; how about PARTRELID?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
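To make points 1 and 2 concrete, a quick sketch against the patch (table and column names are made up; syntax and messages come from the in-development patch and may change):

  CREATE TABLE measurement (city_id int, logdate date)
      PARTITION BY LIST (city_id, logdate);
  -- rejected by the patch: list partitioning allows only one column

  COPY measurement TO stdout;
  -- rejected by the patch, with the hint to use COPY (SELECT ...) TO instead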
Re: [HACKERS] asynchronous execution
Hi, this is the 7th patch, to make instrumentation work.

EXPLAIN ANALYZE shows the following result with the previous patch set:

| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4324.676..4324.676 rows=1 loops=1)
|   -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=0.910..3663.882 rows=400 loops=1)
|       -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|       -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|       -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|       -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|       -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)

The current instrumentation code assumes that a tuple request always either returns a tuple or hits the end of tuples. This async framework has two points where the underlying nodes are executed: ExecAsyncRequest and ExecAsyncEventLoop. So I'm not sure whether this is appropriate, but anyway it seems to show sane numbers:

| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4571.205..4571.206 rows=1 loops=1)
|   -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=1.362..3893.114 rows=400 loops=1)
|       -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4) (actual time=1.056..770.863 rows=100 loops=1)
|       -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.461..767.840 rows=100 loops=1)
|       -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.474..782.547 rows=100 loops=1)
|       -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.156..765.920 rows=100 loops=1)
|       -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4) (never executed)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 35c60a46f49aab72d492c798ff7eb8fc0e672250 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 25 Oct 2016 19:04:04 +0900
Subject: [PATCH 7/7] Add instrumentation to async execution

Make explain analyze give sane result when async execution has taken place.
---
 src/backend/executor/execAsync.c  | 19 +++
 src/backend/executor/instrument.c |  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
index 40e3f67..588ba18 100644
--- a/src/backend/executor/execAsync.c
+++ b/src/backend/executor/execAsync.c
@@ -46,6 +46,9 @@ ExecAsyncRequest(EState *estate, PlanState *requestor, int request_index,
 	PendingAsyncRequest *areq = NULL;
 	int nasync = estate->es_num_pending_async;
 
+	if (requestee->instrument)
+		InstrStartNode(requestee->instrument);
+
 	/*
 	 * If the number of pending asynchronous nodes exceeds the number of
 	 * available slots in the es_pending_async array, expand the array.
@@ -121,11 +124,17 @@ ExecAsyncRequest(EState *estate, PlanState *requestor, int request_index,
 	if (areq->state == ASYNC_COMPLETE)
 	{
 		Assert(areq->result == NULL || IsA(areq->result, TupleTableSlot));
+
 		ExecAsyncResponse(estate, areq);
+		if (areq->requestee->instrument)
+			InstrStopNode(requestee->instrument,
+						  TupIsNull((TupleTableSlot *) areq->result) ? 0.0 : 1.0);
 		return;
 	}
+	if (areq->requestee->instrument)
+		InstrStopNode(requestee->instrument, 0);
 
 	/* No result available now, make this node pending */
 	estate->es_num_pending_async++;
 }
@@ -193,6 +202,9 @@ ExecAsyncEventLoop(EState *estate, PlanState *requestor, long timeout)
 	{
 		PendingAsyncRequest *areq = estate->es_pending_async[i];
 
+		if (areq->requestee->instrument)
+			InstrStartNode(areq->requestee->instrument);
+
 		/* Skip it if not pending. */
 		if (areq->state == ASYNC_CALLBACK_PENDING)
 		{
@@ -211,7 +223,14 @@ ExecAsyncEventLoop(EState *estate, PlanState *requestor, long timeout)
 			if (requestor == areq->requestor)
 				requestor_done = true;
 			ExecAsyncResponse(estate, areq);
+
+			if (areq->requestee->instrument)
+				InstrStopNode(areq->requestee->instrument,
+							  TupIsNull((TupleTableSlot *) areq->result) ?
+							  0.0 : 1.0);
 		}
+		else if (areq->requestee->instrument)
+			InstrStopNode(areq->requestee->instrument, 0);
 	}
 
 	/* If any node completed, compact the array. */
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2614bf4..6a22a15 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -102,7 +102,7 @@ InstrStopNode(Instrumentation *instr, double nTuples)
 , >bufusage_start);
 
 	/* Is this the first tuple
Re: [HACKERS] pg_basebackup stream xlog to tar
On Mon, Oct 24, 2016 at 7:46 AM, Michael Paquier wrote:
> On Sun, Oct 23, 2016 at 10:28 PM, Magnus Hagander wrote:
>> It also broke the tests and invalidated some documentation. But it was
>> easy enough to fix.
>>
>> I've now applied this, so next time you get to do the merging :P Joking
>> aside, please review and let me know if you can spot something I messed
>> up in the final merge.
>
> Just had another look at it..
> +static int
> +tar_fsync(Walfile f)
> +{
> +	Assert(f != NULL);
> +	tar_clear_error();
> +
> +	/*
> +	 * Always sync the whole tarfile, because that's all we can do. This makes
> +	 * no sense on compressed files, so just ignore those.
> +	 */
> +	if (tar_data->compression)
> +		return 0;
> +
> +	return fsync(tar_data->fd);
> +}
> fsync() should not be called here if --no-sync is used.
>
> + /* sync the empty blocks as well, since they're after the last file */
> + fsync(tar_data->fd);
> Similarly, the fsync call of tar_finish() should not happen when
> --no-sync is used.

Yeah, agreed.

> + if (format == 'p')
> +     stream.walmethod = CreateWalDirectoryMethod(param->xlog, do_sync);
> + else
> +     stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, do_sync);
> LogStreamerMain() exits immediately once it is done, but I think that
> we had better be tidy here and clean up the WAL methods that have been
> allocated. I am thinking here about a potential retry method on
> failure, though the best shot in this area would be with
> ReceiveXlogStream().

Wouldn't the same be needed in pg_receivexlog.c in that case?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Push down more full joins in postgres_fdw
>> 13. The comment below is missing the main purpose, i.e. creating a
>> unique alias, in case the relation gets converted into a subquery.
>> Lowest or highest relid will create a unique alias at a given level of
>> join and that would be more future proof. If we decide to consider
>> paths for every join order, the following comment will no longer be true.
>> +/*
>> + * Set the relation index. This is defined as the position of this
>> + * joinrel in the join_rel_list list plus the length of the rtable list.
>> + * Note that since this joinrel is at the end of the list when we are
>> + * called, we can get the position by list_length.
>> + */
>> +fpinfo->relation_index =
>> +list_length(root->parse->rtable) + list_length(root->join_rel_list);
>
> I don't agree on that point. As I said before, the approach using the
> lowest/highest relid would make a remote query having nested subqueries
> difficult to read. And even if we decided to consider paths for every join
> order, we could define the relation_index safely, by modifying
> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
> proposed way the first time it is called and (2) avoids setting the
> relation_index after the first call, by checking whether
> fpinfo->relation_index > 0.

OK. Although an alias that contains a number which apparently bears no relation to the relation (no pun intended :)) being deparsed won't make much sense in the deparsed query. But I am fine leaving this to the committer's judgement.

Maybe you want to add an assert here making sure that relation_index is set only once, something like Assert(fpinfo->relation_index == 0) just before setting it.

>> 14. The function name talks about non-vars but the Assert a few lines
>> below asserts that every node there is a Var. Need better naming.
>> +reltarget_has_non_vars(RelOptInfo *foreignrel)
>> +{
>> +    ListCell *lc;
>> +
>> +    foreach(lc, foreignrel->reltarget->exprs)
>> +    {
>> +        Var *var = (Var *) lfirst(lc);
>> +
>> +        Assert(IsA(var, Var));
>> And also an explanation for the claim
>> + * Note: currently deparseExplicitTargetList can't properly handle such Vars.
>
> Will remove this function for the reason described in #12.
>
>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.
>
> Right. I think that not only avoids creating redundant data to find the
> alias of a given Var node but also finds it at optimal cost.
>
>> Instead of that, we should probably gather all the alias information once
>> and store it in the context as an array indexed by relid. While deparsing
>> a Var we can get the targetlist and alias for a given relation by using
>> var->varno as an index, and then search for the given Var node in the
>> targetlist to deparse the column name by its position in the targetlist.
>> For the relations that are not converted into subqueries, this array will
>> not hold any information and the Var will be deparsed as it's being done
>> today.
>
> Sorry, I don't understand this part. How does that work when creating a
> remote query having nested subqueries? Is it able to search for a given
> Var node efficiently?

For every relation that is deparsed as a subquery, we will create a separate context. Each such context will have an array as described above. This array will contain the targetlist and aliases for all the relations, covered by that relation, that are required to be deparsed as subqueries.
These arrays bubble up to the topmost relation that is not required to be deparsed as a subquery. When a relation is required to be deparsed as a subquery, its immediate upper relation sets the alias and targetlist chosen for that relation at all the indexes corresponding to all the base relations covered by that relation.

For example, let's assume that a relation (1, 2, 3) is required to be deparsed as a subquery for an immediate upper relation (1, 2, 3, 4, 5) (the other joining relation thus being (4, 5)). While deparsing for relation (1, 2, 3, 4, 5), the context will contain a 5-element array, with positions 1, 2, 3 filled by the same targetlist and alias, whereas positions 4 and 5 will not be filled as those relations are not deparsed as subqueries.

Let's assume that in relation (1, 2, 3), (1, 3) in turn requires a subquery but (2) does not. Thus the context created while deparsing (1, 2, 3) will have a 3-element array with positions 1 and 3 containing the same targetlist and alias, whereas position 2 will be empty. When deparsing a Var node with varno = N and varattno = m, if the Nth position in the array in the context is empty, that Var node will be deparsed as rN.. But if that position has alias sZ, then we search for that Var node in the targetlist and, if it's found at the kth position in the targetlist, we will deparse it as sZ.ck. The search in the targetlist can be performed using tlist_member, and then fetching the
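As an illustration of the aliasing scheme under discussion (made-up tables and aliases; this sketches the shape of a deparsed query, not the exact output of the patch):

  -- a lower join wrapped as subquery s1, joined to the base relation r4;
  -- a Var from relation 2 (t2.b) is referenced as s1.c1 at the upper level,
  -- while the base relation keeps its r4 alias:
  SELECT s1.c1, r4.a
  FROM (SELECT r2.b AS c1
        FROM t1 r1 LEFT JOIN t2 r2 ON (r1.a = r2.a)) s1
       FULL JOIN t4 r4 ON (s1.c1 = r4.a);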
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On 27.07.2016 at 16:09 Robert Haas wrote:

On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös wrote:

We would like to contribute to PostgreSQL a solution that supports the query processing of "at each time point". The basic idea is to offer two new operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the ranges of tuples so that subsequent queries can use the usual grouping and equality conditions to get the intended results.

I think that it is great that you want to contribute your work to PostgreSQL. I don't know whether there will be a consensus that this is generally useful functionality that we should accept, but I commend the effort anyhow. Assuming there is, getting this into a state that we consider committable will probably take quite a bit of additional work on your part; no one will do it for you.

Hi hackers, thank you for your feedback. We are aware that contributing to PostgreSQL is a long road with a lot of work. We are committed to going all the way and doing the work as discussed in the community. We had some internal discussions about the project, looking also at some other patches to better understand whether the patch is work-in-progress or ready for a commitfest.

If you're still interested in proceeding given those caveats, please add your patch here so that it gets reviewed: https://commitfest.postgresql.org/action/commitfest_view/open

We decided to follow your recommendation and add the patch to the commitfest.

Looking forward to your feedback,
Anton, Johann, Michael, Peter
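As a rough illustration of the idea (NORMALIZE is the proposed operator, not existing SQL; the table, column names, and output below are made up to show the concept):

  -- two projects overlapping in time
  CREATE TABLE prj (name text, budget int, during daterange);
  INSERT INTO prj VALUES
      ('a', 10, '[2016-01-01,2016-06-01)'),
      ('b', 20, '[2016-03-01,2016-09-01)');

  -- conceptually, NORMALIZE splits the ranges at each other's boundaries so
  -- that a plain GROUP BY on the range then aggregates "at each time point":
  --   [2016-01-01,2016-03-01)   sum(budget) = 10
  --   [2016-03-01,2016-06-01)   sum(budget) = 30
  --   [2016-06-01,2016-09-01)   sum(budget) = 20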
Re: [HACKERS] asynchronous execution
This is the rebased version on the current master (patches 0001-0004), with resource owner stuff added (0005) and unlikely() (0006).

At Tue, 18 Oct 2016 10:30:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20161018.103051.30820907.horiguchi.kyot...@lab.ntt.co.jp>
> > > - Errors in the executor can leak the WaitEventSet. Probably we need
> > > to modify ResourceOwners to be able to own WaitEventSets.
> WaitEventSet itself is not leaked but epoll-fd should be closed
> at failure. This seems doable with TRY-CATCHing in
> ExecAsyncEventLoop. (not yet)

Haha, that was silly talk. The wait event set can continue to live past a timeout, and an error can happen anywhere after that. I added an entry for wait event sets to the resource owner mechanism and hung the ones created in ExecAsyncEventWait on TopTransactionResourceOwner. Currently WaitLatchOrSocket doesn't do so, to avoid changing the current behavior. WaitEventSet doesn't have a usable identifier for resowner.c, so currently I use the address (pointer value) for the purpose. Patch 0005 does that.

> I measured performance and had the following result.
>
> t0  - SELECT sum(a) FROM ;
> pl  - SELECT sum(a) FROM <4 local children>;
> pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
>
> The result is written as "time (std dev)"
>
> sync
>  t0:  3820.33 (  1.88)
>  pl:  1608.59 ( 12.06)
>  pf0: 7928.29 ( 46.58)
>  pf1: 8023.16 ( 26.43)
>
> async
>  t0:  3806.31 (  4.49)   0.4% faster (should be error)
>  pl:  1629.17 (  0.29)   1.3% slower
>  pf0: 6447.07 ( 25.19)  18.7% faster
>  pf1: 1876.80 ( 47.13)  76.6% faster
>
> t0 is not affected since the ExecProcNode stuff has gone.
>
> pl is getting a bit slower. (almost the same as the simple seqscan of
> the previous patch) This should be a misprediction penalty.

Using the likely() macro in ExecAppend seems to have shaken off the degradation:

sync
 t0:  3919.49 (  5.95)
 pl:  1637.95 (  0.75)
 pf0: 8304.20 ( 43.94)
 pf1: 8222.09 ( 28.20)

async
 t0:  3885.84 ( 40.20)   0.86% faster (should be error, but stable on my env..)
 pl:  1617.20 (  3.51)   1.26% faster (ditto)
 pf0: 6680.95 (478.72)  19.5% faster
 pf1: 1886.87 ( 36.25)  77.1% faster

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 25eba7e506228ab087e8b743efb039286a8251c4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 12 Oct 2016 12:46:10 +0900
Subject: [PATCH 1/6] robert's 2nd framework
---
 contrib/postgres_fdw/postgres_fdw.c     |  49
 src/backend/executor/Makefile           |   4 +-
 src/backend/executor/README             |  43 +++
 src/backend/executor/execAmi.c          |   5 +
 src/backend/executor/execAsync.c        | 462
 src/backend/executor/nodeAppend.c       | 162 ++-
 src/backend/executor/nodeForeignscan.c  |  49
 src/backend/nodes/copyfuncs.c           |   1 +
 src/backend/nodes/outfuncs.c            |   1 +
 src/backend/nodes/readfuncs.c           |   1 +
 src/backend/optimizer/plan/createplan.c |  45 +++
 src/include/executor/execAsync.h        |  29 ++
 src/include/executor/nodeAppend.h       |   3 +
 src/include/executor/nodeForeignscan.h  |   7 +
 src/include/foreign/fdwapi.h            |  15 ++
 src/include/nodes/execnodes.h           |  57 +++-
 src/include/nodes/plannodes.h           |   1 +
 17 files changed, 909 insertions(+), 25 deletions(-)
 create mode 100644 src/backend/executor/execAsync.c
 create mode 100644 src/include/executor/execAsync.h

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 906d6e6..c480945 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -19,6 +19,7 @@
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
+#include "executor/execAsync.h"
 #include "foreign/fdwapi.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -349,6 +350,14 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 						UpperRelationKind stage,
 						RelOptInfo *input_rel,
 						RelOptInfo *output_rel);
+static bool postgresIsForeignPathAsyncCapable(ForeignPath *path);
+static void postgresForeignAsyncRequest(EState *estate,
+						PendingAsyncRequest *areq);
+static void postgresForeignAsyncConfigureWait(EState *estate,
+						PendingAsyncRequest *areq,
+						bool reinit);
+static void postgresForeignAsyncNotify(EState *estate,
+						PendingAsyncRequest *areq);
 
 /*
  * Helper functions
@@ -468,6 +477,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	/* Support functions for async execution */
+	routine->IsForeignPathAsyncCapable = postgresIsForeignPathAsyncCapable;
+	routine->ForeignAsyncRequest = postgresForeignAsyncRequest;
+
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Tue, Oct 25, 2016 at 2:10 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>
>> It is not obvious what it means if there are multiple ports but the
>> number doesn't equal the number of hosts.
>
> I think we should reject the case where the numbers of elements differ
> and neither host nor port is a singleton, as an error. The suggestion to
> ignore some parts seems too error-prone.

+1. I also think returning an error is the better option in the above case.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
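For illustration, a sketch of the connection strings under discussion (host and port pairing semantics are per the patch proposal; host names are made up):

  # equal counts: host N pairs with port N
  host=node1,node2,node3 port=5432,5433,5434 dbname=postgres

  # singleton port: applies to every host
  host=node1,node2,node3 port=5432 dbname=postgres

  # mismatched, non-singleton counts: rejected as an error under the proposal
  host=node1,node2,node3 port=5432,5433 dbname=postgres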
Re: [HACKERS] [RFC] Transaction management overhaul is necessary?
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
>>> This was because psqlODBC starts and ends a subtransaction for each
>>> SQL statement by default to implement statement-level rollback. And
>>> PostgreSQL creates one CurTransactionContext memory context, which is
>>> 8KB, for each subtransaction and retains them until the top transaction
>>> ends.
>
> Surely that's where to start then. Find a way to pool and re-use, fully
> release, or otherwise be done with transaction contexts for released
> savepoints.

Yes, I'll investigate this. Any reference information would be appreciated on why the CurTransactionContexts had to be retained, and whether it's difficult to circumvent.

> You can control transaction-level rollback in psqlODBC directly. You do
> not need to fall back to the old protocol. Check the driver options.

That driver option is Protocol=7.4-1. The name is misleading, as the driver now ignores the version part (7.4) and interprets the 1 as transaction rollback.

> Right. We can't just fire off each statement wrapped in SAVEPOINT and
> RELEASE SAVEPOINT because we need to get the result of the statement and
> decide whether to ROLLBACK TO SAVEPOINT or RELEASE SAVEPOINT. It only
> requires two round trips if you shove the SAVEPOINT in with the intended
> statement, but it's still messy.
>
> I'd like to see an alternative statement with semantics more akin to
> COMMIT - one which automatically turns into ROLLBACK if the tx is aborted.
> COMMIT SAVEPOINT would be too confusing since it's not truly committed.
> I don't know what to call it. But basically something that does RELEASE
> SAVEPOINT [named savepoint] unless the subxact is in aborted state, in
> which case it does ROLLBACK TO [named savepoint].
> Bonus points for letting it remember the last savepoint created and use
> that.
>
> Furthermore, we should really add it on the protocol level so drivers can
> send subtransaction control messages more compactly, without needing to
> go through the parser etc, and without massively spamming the logs. For
> this purpose savepoint names would be internally generated so the driver
> wouldn't have to send them. We'd log savepoint boundaries when transaction
> logging was enabled. Since the client would send the first such protocol
> request we could do it on the sly without a protocol version bump; clients
> could just check the server version and not use the new messages for older
> servers. If they send it to an older server they get a protocol error,
> which is fine.

I'm simply thinking of proposing a new GUC, something like "SET auto_rollback = {none | statement | transaction}", where none is the default and traditional behavior.

>> You should implement a CALL statement - one that can be independent of
>> the outer transaction. The behavior inside a procedure called by CALL
>> should be the same as on the client side - there you can control
>> transactions explicitly without nesting.
>
> I agree that'd be desirable. Top-level "procedures" are necessary for
> this, really.
>
> This would also enable us to return multiple result sets.
>
> We'd probably have to start at least one small read-only tx for the
> initial cache access to look up the proc and set everything up, but if we
> don't allocate xids, local transactions are super cheap.

OK, that would be a very big challenge... I can't yet imagine how difficult it will be. But supporting stored procedures with a CALL statement would be a wall to overcome.
> However, I think trying to tackle the memory context bloat reported
> upthread would be a more effective starting point since it immediately
> targets the

Yes, I think I'll address this. Maybe I'll start different threads for each topic:

1. Memory context bloat
2. Statement-level rollback
3. Stored procedures where transactions can be ended and started

Regards
Takayuki Tsunakawa
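To make the statement-level rollback behavior concrete, here is a minimal sketch of what a driver like psqlODBC effectively does today (the savepoint name is arbitrary; this illustrates the pattern, not the driver's literal wire traffic):

  BEGIN;
  SAVEPOINT per_stmt;
  INSERT INTO t VALUES (1);        -- on success: RELEASE SAVEPOINT per_stmt
                                   -- on failure: ROLLBACK TO SAVEPOINT per_stmt
  SAVEPOINT per_stmt;
  INSERT INTO t VALUES (2);
  RELEASE SAVEPOINT per_stmt;
  COMMIT;

Each SAVEPOINT starts a subtransaction, and each subtransaction currently retains its 8KB CurTransactionContext until the top transaction ends, which is where the reported memory bloat comes from.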
Re: [HACKERS] pg_hba_file_settings view patch
Haribabu Kommi wrote:
> On Mon, Oct 3, 2016 at 3:51 PM, Michael Paquier wrote:
> Yes, I agree that adding these JSONB utility functions for this view
> is overkill, but I thought that these may be useful for some
> users if it is a JSONB type instead of an array.

Peter Eisentraut said he'd like JSON better:
https://www.postgresql.org/message-id/5547db0a.2020...@gmx.net

I asked twice about the use of JSON, suggesting an array instead:
https://www.postgresql.org/message-id/20151204163147.GZ2763@alvherre.pgsql
https://www.postgresql.org/message-id/20160201215714.GA98800@alvherre.pgsql

I now think that we should figure out what it is that we want before we continue to request it to be changed over and over.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
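To make the trade-off concrete, a sketch of the two representations being debated for the options column (the values shown are illustrative, not the patch's actual output):

  -- jsonb, as in the current patch:
  options = '{"map": "myusers", "clientcert": "1"}'

  -- flat name=value array, as suggested:
  options = '{map=myusers,clientcert=1}'::text[]

Since pg_hba.conf options are a flat list of name=value pairs with no nesting, an array loses no information here.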
Re: [HACKERS] Declarative partitioning - another take
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
> On 2016/10/05 2:12, Robert Haas wrote:
>> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote wrote:
>>>> Even if we leave the empty relfilenode around for now -- in the long
>>>> run I think it should die -- I think we should prohibit the creation
>>>> of subsidiary objects on the parent which are only sensible if it has
>>>> rows - e.g. indexes. It makes no sense to disallow non-inheritable
>>>> constraints while allowing indexes, and it could box us into a corner
>>>> later.
>>>
>>> I agree. So we must prevent from the get-go the creation of the
>>> following objects on parent tables (aka RELKIND_PARTITIONED_TABLE
>>> relations):
>>>
>>> * Indexes
>>> * Row triggers (?)
>>
>> Hmm, do we ever fire triggers on the parent for operations on a child
>> table? Note this thread, which seems possibly relevant:
>>
>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>
> The answer to your question is no.
>
> The thread you quoted discusses statement-level triggers and the
> conclusion is that they don't work as desired for UPDATE and DELETE on
> inheritance tables. As things stand, only UPDATE or DELETE on the parent
> affects the child tables, and it's proposed there that the statement-level
> triggers on the parent and also on any child tables affected should be
> fired in that case.

Doesn't that imply that statement-level triggers should be fired for all the tables that get changed by a statement? If so, then in your case they should never fire for the parent table, which means we could disallow statement-level triggers on parent tables as well.

Some of the other things that we might want to consider disallowing on the parent table could be (see the sketch after this list):

a. Policies (POLICY ... ON table_name)
b. ALTER TABLE has many clauses; are all of those allowed, and does it make sense to allow them?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
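For concreteness, the kind of restriction being discussed (illustrative only; the PARTITION BY syntax is from the in-development patch and may change):

  CREATE TABLE parent (a int) PARTITION BY RANGE (a);
  -- would be rejected under the proposal, since the parent never holds rows:
  CREATE INDEX parent_a_idx ON parent (a);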
Re: [HACKERS] planet postgresql issue
On Tue, Oct 25, 2016 at 08:36:22AM +0200, hubert depesz lubaczewski wrote:
> Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and
> as far as I can tell, it works OK.

Magnus is looking into the problem now. Seems to be something related to networking on the box that hosts planet.

Best regards,
depesz
Re: [HACKERS] planet postgresql issue
> I got an email about issues with reading the feed URL. [...]
> Same here. [...]
> Same issue here. [...]
> Same.

I would guess that the feed loader has some connectivity problems.

--
Fabien.
Re: [HACKERS] planet postgresql issue
On Tue, Oct 25, 2016 at 3:36 PM, hubert depesz lubaczewski wrote:
> On Tue, Oct 25, 2016 at 08:28:00AM +0200, Pavel Stehule wrote:
>> Hi
>>
>> I got an email about issues with reading the feed URL.
>>
>> I checked the URL manually and it looks fine:
>> http://okbob.blogspot.com/feeds/posts/default
>
> Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and
> as far as I can tell, it works OK.

Same issue here. Glad I am not crazy yet.
--
Michael
Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails
On Tue, Oct 25, 2016 at 11:07 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 24 Oct 2016 15:55:58 +0900, Michael Paquier wrote:
>
>> Anyway, we can clearly reject 1. in the light of
>> https://www.postgresql.org/message-id/caa4ek1kmjtsxqf0cav7cs4d4vwv2h_pc8d8q1bucqdzaf+7...@mail.gmail.com
>> when playing with different stop locations at recovery.
>
> | * If the last checkpoint record we've replayed is already our last
> | * restartpoint, we can't perform a new restart point. We still update
> | * minRecoveryPoint in that case, so that if this is a shutdown restart
> | * point, we won't start up earlier than before.
> ...
> | * We don't explicitly advance minRecoveryPoint when we do create a
> | * restartpoint. It's assumed that flushing the buffers will do that as a
> | * side-effect.
>
> The second sentence reads to me as "we *expect* minRecoveryPoint
> to be updated anyway even if we don't do that here". Though it is a bit
> different in reality.
>
> skipped checkpoints - advance minRecoveryPoint to the checkpoint
>
> I'm failing to build a consistent model of the code around here
> in my mind.

Hm? If the last checkpoint record replayed is the last restart point, no restart point is created *but* minRecoveryPoint is updated to prevent the case where read-only queries are allowed earlier than at the next startup. On the other hand, if a restart point is created, this code does not update minRecoveryPoint, and it is assumed that the next buffer flush will do it.
--
Michael
Re: [HACKERS] planet postgresql issue
On Tue, Oct 25, 2016 at 08:28:00AM +0200, Pavel Stehule wrote:
> Hi
>
> I got an email about issues with reading the feed URL.
>
> I checked the URL manually and it looks fine:
> http://okbob.blogspot.com/feeds/posts/default

Same here. feed url is https://www.depesz.com/tag/postgresql/feed/ and as far as I can tell, it works OK.

Best regards,
depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
[HACKERS] planet postgresql issue
Hi

I got an email about issues with reading the feed URL.

I checked the URL manually and it looks fine: http://okbob.blogspot.com/feeds/posts/default

Regards
Pavel
Re: [HACKERS] pg_hba_file_settings view patch
On Mon, Oct 3, 2016 at 3:51 PM, Michael Paquier wrote:
> On Mon, Sep 5, 2016 at 4:09 PM, Haribabu Kommi wrote:
>> On Sun, Sep 4, 2016 at 1:44 AM, Simon Riggs wrote:
>>> On 15 August 2016 at 12:17, Haribabu Kommi wrote:
>>>
>>>> comments?
>>>
>>> This looks like a good feature contribution, thanks.
>>>
>>> At present the patch doesn't apply cleanly, please rebase.
>>
>> Rebased patch is attached.
>
> Moved to next CF as there is a patch and no reviews.
>
> + push_jsonb_string_key(, "map");
> + push_jsonb_string_value(, hba->usermap);
> [...]
> +
> + options
> + jsonb
> + Configuration options set for authentication method
> +
> Why is it an advantage to use jsonb here instead of a simple array
> made of name=value? If they were nested I'd see a case for it but it
> seems to me that as presented this is just an overkill. In short, I
> think that this patch needs a bit of rework, so I am marking it as
> returned with feedback.

Yes, I agree that adding these JSONB utility functions for this view is overkill, but I thought that they may be useful for some users if the column is of JSONB type instead of an array. If anyone else feels the same, I can update the patch to use an array datatype.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Minor code improvement to postgresGetForeignJoinPaths
Hi Ashutosh,

You are right. Every call except that one is using NIL, so it is better to fix it. The pattern was repeated in the recent aggregate pushdown support. Here's a patch to fix the create_foreignscan_path() call in add_foreign_grouping_paths() as well.

Thanks for the review!

Tatsuro Yamada
NTT Open Source Software Center