Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > I am reworking this patch, both to accommodate earlier review comments > and to fix the multiple verify step of namespaces, as noted by Tom in > 4530.1390023...@sss.pgh.pa.us Alvaro, do you need some help? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Sun, Dec 1, 2013 at 12:07:21PM -0500, Gurjeet Singh wrote: > On Wed, Nov 27, 2013 at 9:12 AM, Robert Haas wrote: > > > This is a performance patch, so it should come with benchmark results > demonstrating that it accomplishes its intended purpose. I don't see > any. > > > Yes, this is a performance patch, but as the subject says, it saves a few > instructions. I don't know how to write a test case that can measure savings > of > skipping a few instructions in a startup sequence that potentially takes > thousands, or even millions, of instructions. Are we saying we don't want this patch? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade: delete_old_cluster.sh issues
On Mon, Nov 18, 2013 at 10:13:19PM -0500, Bruce Momjian wrote: > On Tue, Nov 12, 2013 at 10:35:58AM +, Marc Mamin wrote: > > Hello, > > > > IMHO, there is a serious issue in the script to clean the old data directory > > when running pg_upgrade in link mode. > > > > in short: When working with symbolic links, the first step in > > delete_old_cluster.sh > > is to delete the old $PGDATA folder that may contain tablespaces used by the > > new instance. > > > > in long, our use case: > > Rather than removing files/directories individually, which would be > difficult to maintain, we decided in pg_upgrade 9.3 to detect > tablespaces in the old data directory and report that and not create a > delete script. Here is the commit: > > > http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1 > > The problem with your setup is that while you didn't pass symbolic links > to pg_upgrade, you did use symbolic links when defining the tablespaces, > so pg_upgrade couldn't recognize that the symbolic links were inside the > old data directory. > > We could use readlink() to walk over all symbolic links, but that > seems quite complex. We could use stat() and make sure there are no > matching inodes in the old data directory, or that they are in a > different file system. We could look for a directory named after the PG > catversion in the old data directory. We could update the docs. > > I am not sure what to do. We never expected people would put > tablespaces in the data directory. I went with a doc patch, attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
index 4d03b12..72e3cb6
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 460,466 ****
      cluster's data directories by running the script mentioned when
      pg_upgrade completes.  You can also delete the
      old installation directories
!     (e.g. bin, share).
--- 460,467 ----
      cluster's data directories by running the script mentioned when
      pg_upgrade completes.  You can also delete the
      old installation directories
!     (e.g. bin, share).  This will not work
!     if you have tablespaces inside the old data directory.
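One of the options Bruce lists above — resolving symlinks and checking whether a tablespace actually lands inside the old data directory — can be sketched in a few lines. This is an illustrative Python sketch of the idea, not pg_upgrade code; the function name is hypothetical:

```python
import os

def tablespace_inside_data_dir(tablespace_path, old_datadir):
    """Return True if tablespace_path resolves to a location inside
    old_datadir, following any symbolic links on the way -- the case
    a plain string comparison of the configured paths misses."""
    real_ts = os.path.realpath(tablespace_path)
    real_dd = os.path.realpath(old_datadir)
    # commonpath on the resolved paths detects a symlink that points
    # back into the data directory.
    return os.path.commonpath([real_ts, real_dd]) == real_dd
```

With a check like this, the delete script could be suppressed even when the tablespace was defined through a symlink that points into $PGDATA.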
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian wrote: > On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote: >> > OK, done with the attached patch. Three is returned for status, but one >> > for other cases. >> >> I think it would have been better if it returned status 4 for the cases where >> the directory/file is not accessible (the new cases added by this patch). >> >> I think status 3 should be returned only when the data directory is valid >> and the pid file is missing, because after getting this code in a script, the >> next thing it will probably do is start the server, which doesn't seem a >> good fit for the newly added cases. > > OK, I have updated the attached patch to reflect this, and added a C > comment. This is fine. Do you think we should mention this in the docs? I have added one line in the docs (updated patch attached); if you think it makes sense then add it, otherwise ignore it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 45b53ce..99cb176 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -206,9 +206,10 @@ PostgreSQL documentation
    status mode checks whether a server is running in
    the specified data directory. If it is, the PID
-   and the command line options that were used to invoke it are
-   displayed. If the server is not running, the process returns an
-   exit status of 3.
+   and the command line options that were used to invoke it are
+   displayed. If the server is not running, the process returns an
+   exit status of 3. If the specified data directory is not accessible,
+   the process returns an exit status of 4.
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 0dbaa6e..56d238f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -97,6 +97,7 @@ static bool allow_core_files = false;
 static time_t start_time;
 
 static char postopts_file[MAXPGPATH];
+static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char recovery_file[MAXPGPATH];
@@ -152,7 +153,7 @@ static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 #endif
 
-static pgpid_t get_pgpid(void);
+static pgpid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path);
 static void free_readfile(char **optlines);
 static int	start_postmaster(void);
@@ -246,10 +247,34 @@ print_msg(const char *msg)
 }
 
 static pgpid_t
-get_pgpid(void)
+get_pgpid(bool is_status_request)
 {
 	FILE	   *pidf;
 	long		pid;
+	struct stat statbuf;
+
+	if (stat(pg_data, &statbuf) != 0)
+	{
+		if (errno == ENOENT)
+			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+				   pg_data);
+		else
+			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+				   pg_data);
+
+		/*
+		 * The Linux Standard Base Core Specification 3.1 says this should
+		 * return '4, program or service status is unknown'
+		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+		 */
+		exit(is_status_request ? 4 : 1);
+	}
+
+	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+	{
+		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+			   progname, pg_data);
+		exit(is_status_request ? 4 : 1);
+	}
 
 	pidf = fopen(pid_file, "r");
 	if (pidf == NULL)
@@ -810,7 +835,7 @@ do_start(void)
 
 	if (ctl_command != RESTART_COMMAND)
 	{
-		old_pid = get_pgpid();
+		old_pid = get_pgpid(false);
 		if (old_pid != 0)
 			write_stderr(_("%s: another server might be running; "
 						   "trying to start server anyway\n"),
@@ -894,7 +919,7 @@ do_stop(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 
 	if (pid == 0)				/* no pid file */
 	{
@@ -943,7 +968,7 @@ do_stop(void)
 
 		for (cnt = 0; cnt < wait_seconds; cnt++)
 		{
-			if ((pid = get_pgpid()) != 0)
+			if ((pid = get_pgpid(false)) != 0)
 			{
 				print_msg(".");
 				pg_usleep(1000000);		/* 1 sec */
@@ -980,7 +1005,7 @@ do_restart(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
Re: [HACKERS] extension_control_path
[I answered most of these concerns in more detail in the reply to Dimitri.] On 3/7/14, 9:16 AM, Stephen Frost wrote: > Being able to have a self-contained module which requires a minimum of > modification to postgresql.conf is a reduction in complexity, imv. > Having to maintain two config options which will end up being overly > long and mostly duplicated doesn't make things easier for people. Then we can make it one path. > This > has made me wonder if we could allow a control file to be explicitly > referred to from CREATE EXTENSION itself, dropping the need for any of > this postgresql.conf/GUC maintenance. There are downsides to that > approach as well, of course, but it's definitely got a certain appeal. That might be useful as a separate feature, but it reeks of #include with an explicit path, which isn't a sane practice. No programming language other than ancient or poorly designed ones allows that sort of thing. > I don't buy off on this analogy. For starters, you can change the > control file without needing to rebuild the library, (You can also change the rpath without rebuilding the library.) > but the main > difference is that, in practice, there are no library transitions > happening and instead what we're likely to have are independent and > *incompatible* libraries living with the same name in our path. I understand this concern. The question is how big it is relative to the other ones. As a side idea I just had, how about embedding the extension version number into the library somehow? Food for thought. > This makes sense when you have complete control over where things are > installed to and can drop the control file into the "one true directory > of control files" and similarly with the .so. Indeed, that works > already today for certain platforms, but from what I understand, on the > OSX platform you don't really get to just dump files anywhere on the > filesystem that you want and instead end up forced into a specific > directory tree. That is incorrect.
If someone has a general use for module_pathname, I'd be interested to hear it, but that's not one of them.
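Peter's side idea above — embedding the extension version number in the library and checking it against the control file that selected it — could be modeled roughly like this. This is a hypothetical Python sketch of the check, including a toy parser for the control-file `key = 'value'` format; none of this is actual PostgreSQL code:

```python
def parse_control_version(control_text):
    """Extract default_version from extension control file content
    (lines of the form key = 'value', with '#' comments)."""
    for line in control_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("default_version"):
            _, _, value = line.partition("=")
            return value.strip().strip("'")
    return None

def library_matches_control(library_version, control_text):
    # The idea floated in the thread: refuse to load a library whose
    # embedded version disagrees with the control file that chose it,
    # guarding against an incompatible same-named .so earlier in the path.
    return library_version == parse_control_version(control_text)
```

The interesting design question is where the version marker would live in the .so itself (a dedicated symbol, or an extension of the existing magic-block check).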
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar 5, 2014, at 23:49, Tom Lane wrote: > Florian Pflug writes: >> On Mar 5, 2014, at 18:37, Tom Lane wrote: >>> My advice is to lose the EXPLAIN output entirely. If the authors of >>> the patch can't agree on what it means, what hope have everyday users >>> got of making sense of it? > >> The question isn't what the current output means, but whether it's a >> good metric to report or not. > > If you can't agree, then it isn't. Probably, yes, so let's find something that *is* a good metric. (BTW, it's not the authors who disagree here. It was me who put the EXPLAIN feature in, and Dean reviewed it and found it confusing. The original author David seems to have run out of time to work on this, and AFAIK hasn't weighed in on that particular feature at all.) >> If we don't report anything, then how would a user check whether a query >> is slow because of O(n^2) behaviour of a windowed aggregate, or because >> of some other reasons? > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. By the same line of reasoning, every metric except execution time is superfluous. Comparing execution times really is a horrible way to measure this - not only does it include all kinds of things that have nothing to do with the restart behaviour of aggregates, you'd also have to construct a base-line query first which is guaranteed not to restart. best regards, Florian Pflug
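The O(n^2) restart behaviour Florian refers to is easy to demonstrate outside the EXPLAIN machinery. Here is a hedged Python sketch of a moving sum that counts transition-function calls, once with a full restart per frame and once with an inverse ("negative") transition function; the function names are invented for illustration:

```python
def moving_sum_naive(values, window):
    """Recompute each frame from scratch: the restart behaviour that
    makes a windowed aggregate cost O(n * window) transitions."""
    calls = 0
    results = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        total = 0
        for v in frame:          # one forward transition per row, every frame
            total += v
            calls += 1
        results.append(total)
    return results, calls

def moving_sum_negative(values, window):
    """Use an inverse transition to remove the row leaving the frame:
    O(n) transitions overall, no restarts."""
    calls = 0
    total = 0
    results = []
    for i, v in enumerate(values):
        total += v                        # forward transition
        calls += 1
        if i >= window:
            total -= values[i - window]   # inverse transition
            calls += 1
        results.append(total)
    return results, calls
```

Counting transitions like this is the kind of implementation-level metric the thread is debating how (or whether) to surface in EXPLAIN.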
Re: [HACKERS] extension_control_path
On 3/7/14, 11:39 AM, Dimitri Fontaine wrote: > Just please make sure that it's still possible to use a full absolute path > for the module path name so that the packager can have control too. I can't think of any package system where absolute paths are part of a recommended workflow. There is always a search path; that search path contains some series of system, non-system, and per-user directories. A packager or installer chooses one of those directories as the installation target, according to convention or user choice. OK, you can install a C library in some obscure place and then #include it from there, but that's not sane practice. Similarly, you will still be able to hardcode paths into CREATE FUNCTION statements or other places, but why would you want to? > What the $directory proposal achieves is allowing a fully relocatable > extension layout, where you just have to drop a directory anywhere in > the file system and it just works (*). If that's what you want (and it sounds attractive), why couldn't we just locate libraries using extension_control_path as well (which presumably contains the control file we are just processing)? No need for another indirection. Libraries and control files being in separate directory hierarchies is a historical artifact, but it's not something that can't be changed if it's not what we want. The problem I have with this $directory proposal, if I understand it correctly, is that it requires editing of the control file at installation time. I don't like this for three reasons: it's not clear how this should actually be done, creating more broken extension build scripts (a big problem already); control files are part of the "code", so to speak, not a configuration file, and so munging them in a site-specific way creates a hybrid type of file that will be difficult to properly manage; and it doesn't allow running an extension before installation (unless I overwrite the control file in my own source tree), which is my main use case for this.
> What happens if you have more than one 'prefix.so' file in your path? The same thing that happens if you have more than one prefix.control in your path. You take the first one you find. Yes, if those are actually two different path settings, you need to keep those aligned. But that would be a one-time setting by the database administrator, not a per-extension-and-packager setting, so it's manageable. If it still bothers you, put them both in the same path, as I suggested above. > The module_pathname facility allows the packager to decide where the > library module file gets installed and the extension author not to > concern himself with that choice. Again, that would only work if they patch the control file during installation, which is dubious. That's like patching paths in include files or shared libraries. People do that, but it's not a preferred method. And secondly, why would a packager care? Has any packager ever cared to relocate the library file and no other file? Aside from those details, it seems clear that any reasonably complete move-extensions-elsewhere feature will need some kind of build system support. I have various ideas on that and would gladly contribute some of them, but it's not going to happen within two weeks. At this point I suggest that we work toward the minimum viable product: the extension_control_path feature as originally proposed (plus the crash fixes), and let the field work out best practices. As you describe, you can work around all the other issues by patching various text files, but you currently cannot move the extension control file in any way, and that's a real deficiency. (I once experimented with bind mounts to work around that -- a real mess ;-) )
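The first-match-wins lookup Peter describes ("you take the first one you find") can be sketched as follows; a hypothetical Python model of searching a separator-delimited extension_control_path, not the actual proposed implementation:

```python
import os

def find_control_file(extension_name, control_path):
    """First-match-wins lookup of <name>.control across a search path.
    os.pathsep is ':' on POSIX and ';' on Windows."""
    filename = extension_name + ".control"
    for directory in control_path.split(os.pathsep):
        if not directory:
            continue  # skip empty path components
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the same resolution rule applied to both control files and .so files, the "two paths getting out of sync" concern reduces to keeping one setting correct.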
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes: > On Mar5, 2014, at 18:37 , Tom Lane wrote: >> My advice is to lose the EXPLAIN output entirely. If the authors of >> the patch can't agree on what it means, what hope have everyday users >> got of making sense of it? > The question isn't what the current output means, but whether it's a > good metric to report or not. If you can't agree, then it isn't. > If we don't report anything, then how would a user check whether a query > is slow because of O(n^2) behaviour of a windowed aggregate, or because > of some other reasons? [ shrug... ] They can see whether the Window plan node is where the time is going. It's not apparent to me that the extra numbers you propose to report will edify anybody. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 05/03/14 15:44, Craig Ringer wrote: On 03/05/2014 05:25 PM, Yeb Havinga wrote: Maybe a naive thought, but shouldn't all plans that include a table with an RLS clause be invalidated when the session role switches, regardless of which users from and to? Only if the plan is actually accessed when under a different user ID. Consider SECURITY DEFINER functions; you don't want to flush all cached plans just because you ran a SECURITY DEFINER function that doesn't even share any statements with the outer transaction. Hmm, good point. I've also got some concerns about the user-visible API; I'm not sure it makes sense to set row security policy for row reads per-command in PostgreSQL, since we have the RETURNING clause. Read-side policy should just be "FOR READ". Hmm, but FOR READ would mean new keywords, and SELECT is also a concept known to users. I didn't find the API problematic to understand; on the contrary. Would you expect that FOR SELECT also affects rows you can see to UPDATE, INSERT, or DELETE? Yes. Because that's what it would have to mean, really. Otherwise, you could just use `UPDATE thetable SET id = id RETURNING *` (or whatever) to read the rows out if you had UPDATE rights. Or do the same with DELETE. With RETURNING, it doesn't make much sense for different statements to have different read access. Can you think of a case where it'd be reasonable to deny SELECT, but allow someone to see the same rows with `UPDATE ... RETURNING` ? It might be an idea to add the SELECT RLS clause for DML queries that contain a RETURNING clause. That way lies madness: A DML statement that affects *a different set of rows* depending on whether or not it has a RETURNING clause. If you state it like that, it sounds like a POLA violation. But the complete story is: "A user is allowed to UPDATE a set of rows from a table that is not a subset of the set of rows he can SELECT from the table; in other words, he can UPDATE rows he is not allowed to SELECT.
This can lead to unexpected results: When the user issues an UPDATE of the table without a RETURNING clause, all rows the user may UPDATE are affected. When the user issues an UPDATE of the table with a RETURNING clause, all rows the user may UPDATE and SELECT are affected." So the madness comes from the fact that it is allowed to define RLS policies that allow modifying rows you cannot select. Either prevent these conditions (i.e. prove that every DML RLS qual implies the SELECT qual, otherwise give an error on DML with a RETURNING clause), or allow it without violating the RLS rules but accept that a DML with RETURNING is different from a DML only. regards, Yeb
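The two candidate semantics Yeb describes can be modeled as plain set filtering. This is a hypothetical Python sketch of the row sets involved, not actual RLS code:

```python
def affected_rows(rows, update_pred, select_pred, returning):
    """Model the two behaviours from the thread: without RETURNING,
    UPDATE touches every row passing the UPDATE policy; with RETURNING
    (under the proposal discussed as surprising), it touches only rows
    passing both the UPDATE and SELECT policies."""
    updatable = {r for r in rows if update_pred(r)}
    if not returning:
        return updatable
    return {r for r in updatable if select_pred(r)}
```

Whenever the UPDATE policy is not a subset of the SELECT policy, the two calls return different row sets, which is exactly the "different set of rows depending on RETURNING" problem.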
Re: [HACKERS] tcp_keepalives_idle
On Fri, Mar 7, 2014 at 10:03:42PM -0500, Bruce Momjian wrote: > On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote: > > On 11/14/13 7:08 AM, Tatsuo Ishii wrote: > > >>It means "the connection is idle except for keepalive packets". > > >>We could perhaps just drop the word "otherwise", if people find > > >>it confusing. > > > > > >Wah. I seem to have completely misunderstood what the phrase > > >says. Thanks for the clarification. I agree to drop "otherwise". > > > > I had some problem interpreting these explanations as well: > > http://www.postgresql.org/message-id/527a21f1.2000...@joh.to > > > > Compare that to the description in the libpq documentation: > > "Controls the number of seconds of inactivity after which TCP should > > send a keepalive message to the server.". > > Good point. I have improved the server-side keepalive parameter > descriptions to use the superior libpq text, with adjustment. > > Applied patch attached. Oops, now attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 86dbd0f..2811f11
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 684,691 ****
!        Specifies the number of seconds before sending a keepalive packet on
!        an otherwise idle connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPIDLE or TCP_KEEPALIVE symbols, and on
        Windows; on other systems, it must be zero.
--- 684,692 ----
!        Specifies the number of seconds of inactivity after which TCP
!        should send a keepalive message to the client.  A value of 0 uses
!        the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPIDLE or TCP_KEEPALIVE symbols, and on
        Windows; on other systems, it must be zero.
*** include 'filename'
*** 708,715 ****
!        Specifies the number of seconds between sending keepalives on an
!        otherwise idle connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPINTVL symbol, and on Windows; on other
        systems, it must be zero.
--- 709,717 ----
!        Specifies the number of seconds after which a TCP keepalive message
!        that is not acknowledged by the client should be retransmitted.
!        A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPINTVL symbol, and on Windows; on other
        systems, it must be zero.
*** include 'filename'
*** 732,739 ****
!        Specifies the number of keepalive packets to send on an otherwise idle
!        connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPCNT symbol; on other systems, it must be zero.
        In sessions connected via a Unix-domain socket, this parameter is
--- 734,742 ----
!        Specifies the number of TCP keepalives that can be lost before
!        the server's connection to the client is considered dead.  A value
!        of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPCNT symbol; on other systems, it must be zero.
        In sessions connected via a Unix-domain socket, this parameter is
Re: [HACKERS] tcp_keepalives_idle
On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote: > On 11/14/13 7:08 AM, Tatsuo Ishii wrote: > >>It means "the connection is idle except for keepalive packets". > >>We could perhaps just drop the word "otherwise", if people find > >>it confusing. > > > >Wah. I seem to have completely misunderstood what the phrase > >says. Thanks for the clarification. I agree to drop "otherwise". > > I had some problem interpreting these explanations as well: > http://www.postgresql.org/message-id/527a21f1.2000...@joh.to > > Compare that to the description in the libpq documentation: > "Controls the number of seconds of inactivity after which TCP should > send a keepalive message to the server.". Good point. I have improved the server-side keepalive parameter descriptions to use the superior libpq text, with adjustment. Applied patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
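For reference, the three server-side GUCs being documented here correspond to per-socket TCP options that a client can also set directly. A hedged Python sketch follows; the TCP_KEEP* constants are platform-specific (Linux exposes all three), hence the hasattr guards, and the default values are arbitrary examples:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Configure TCP keepalive on a socket, mirroring
    tcp_keepalives_idle / _interval / _count. Each TCP_KEEP* option
    is applied only if the running platform exposes it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
            applied[name] = value
    return applied
```

Seen this way, the reworded documentation matches the TCP semantics directly: idle seconds before the first probe, retransmit interval for unacknowledged probes, and the number of lost probes before the connection is declared dead.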
Re: [HACKERS] TABLE not synonymous with SELECT * FROM?
On Wed, Nov 13, 2013 at 10:28:07AM +0100, Colin 't Hart wrote: > David et al, > > How about something like this? I have applied a modified version of your patch. I didn't like the idea of putting "SELECT" inside an OR syntax clause --- the syntax is already too complicated. I also didn't like moving the TABLE mention up in the file. What I did do was to document the supported TABLE clauses, and add some of your verbiage. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 7395754..f1bc158
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*** TABLE [ ONLY ] name
!     is completely equivalent to SELECT * FROM name.
      It can be used as a top-level command or as a space-saving syntax
!     variant in parts of complex queries.
--- 1489,1505 ----
      TABLE name
!     is equivalent to SELECT * FROM name.
      It can be used as a top-level command or as a space-saving syntax
!     variant in parts of complex queries.  Only the WITH,
!     UNION, INTERSECT, EXCEPT,
!     ORDER BY, LIMIT, OFFSET,
!     FETCH and locking clauses can be used with TABLE;
!     the WHERE clause and any form of aggregation cannot
!     be used.
Re: [HACKERS] [PATCH] Store Extension Options
On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote: > On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera > wrote: > > > I am reworking this patch, both to accommodate earlier review comments > > and to fix the multiple verify step of namespaces, as noted by Tom in > > 4530.1390023...@sss.pgh.pa.us > > > > > This link is broken... It is a message id, and it seems to point to the appropriate thread. You can relatively easily look up message ids using URLs like http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Regression test errors
I was testing some builds I was doing and found that the regression tests fail when run against a Hot Standby server:

$ make standbycheck
[...]
== running regression test queries ==
test hs_standby_check ... ok
test hs_standby_allowed ... FAILED
test hs_standby_disallowed ... FAILED
test hs_standby_functions ... ok
== 2 of 4 tests failed. ==

The differences that caused some tests to fail can be viewed in the file "/usr/local/postgresql-9.3.3/src/test/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "/usr/local/postgresql-9.3.3/src/test/regress/regression.out". The regression.diffs and patch attached. I haven't checked how far back those go. I don't think it's even important to back patch this, but it's nice for future testing. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services regression.diffs Description: Binary data From b6db8388e37f6afaa431e31239fd972d10140cc1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= Date: Fri, 7 Mar 2014 21:29:29 -0300 Subject: [PATCH] Standby regression checks failed. Two Hot Standby regression tests failed for various reasons. - An error in src/test/regress/expected/hs_standby_disallowed.out made regression fail (VACUUM should be ANALYZE). - Serializable transactions won't work on a Hot Standby.
---
 src/test/regress/expected/hs_standby_allowed.out    | 2 +-
 src/test/regress/expected/hs_standby_disallowed.out | 2 +-
 src/test/regress/sql/hs_standby_allowed.sql         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 1abe5f6..9d18d77 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -49,7 +49,7 @@ select count(*) as should_be_1 from hs1;
 (1 row)
 
 end;
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
  should_be_1
 -------------
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index e7f4835..bc11741 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -124,7 +124,7 @@ unlisten *;
 ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
-ERROR:  cannot execute VACUUM during recovery
+ERROR:  cannot execute ANALYZE during recovery
 VACUUM hs2;
 ERROR:  cannot execute VACUUM during recovery
 CLUSTER hs2 using hs1_pkey;
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index 58e2c01..5cd450d 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -28,7 +28,7 @@ begin transaction read only;
 select count(*) as should_be_1 from hs1;
 end;
 
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
-- 
1.8.3.1
Re: [HACKERS] Static Code Analysis Exploits.
Patrick Curran writes: > We use Postgres in our product and we have a client that requires a > static code analysis scan to detect vulnerabilities. They are concerned > because the tool (Veracode) found several flaws in Postgres and they > believe there might be a security risk. I'm sure there are lots of > companies that use Postgres that have security policies like theirs in > place, so I'm hoping someone has the experience to know that these are > false positives or that they are not a security risk for some reason. > Below is a description of the vulnerability and the location in the > source code. Version 9.3.2.1 was scanned. Please let me know if there is > a better place to ask this kind of question. TBH, I don't think anyone's going to bother with going through this list in this form. Line numbers in something that's not an official community release are not very helpful, and the descriptions are far too vague for someone looking at the list to be sure exactly what their tool is on about. I took one example at random: > Stack-based Buffer Overflow (CWE ID 121)(13 flaws): > There is a potential buffer overflow with these functions. If an > attacker can control the data written into the buffer, the overflow may > result in execution of arbitrary code. > libpq.dll .../interfaces/libpq/pqexpbuffer.c 369 This seems to be complaining about the memcpy in appendBinaryPQExpBuffer. Well, I don't see anything unsafe about it: the preceding call to enlargePQExpBuffer should have made sure that the target buffer is big enough. And the reference to stack-based buffer overflow is completely nonsensical, because no PQExpBuffer keeps its buffer on the stack. It's conceivable that their tool has spotted some unsafe pattern in some call site, but this report is unhelpful about identifying what that would be. I did look at a few of the other items, and none of the ones I looked at were any more clear. 
FWIW, we do have access to Coverity scans of the Postgres sources, and we do make efforts to fix things Coverity complains about. But we're unlikely to take reports like this one seriously: there's simply not enough information to know what the tool is unhappy about, nor any clear reason to believe that it's spotted something that both human readers and Coverity have missed. Sorry if that's not the answer you wanted; but a more positive response is going to require a substantially greater amount of work on your end. In particular, given the very substantial amounts of work that have already gone into hardening the Postgres code, I think the burden of proof is on you or your client to show that these are issues, not on us to disprove claims that are too vague to be disproven. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
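For readers following along, the enlarge-then-copy discipline Tom describes in pqexpbuffer.c can be sketched in miniature. This is a simplified stand-in for illustration only, not the actual libpq source; the point is the shape of the code the scanner flagged — the memcpy is reachable only after the enlarge call has guaranteed enough space, and the buffer lives on the heap, not the stack:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for libpq's PQExpBuffer (illustration only). */
typedef struct
{
    char   *data;       /* buffer contents, always NUL-terminated */
    size_t  len;        /* bytes used, not counting the trailing NUL */
    size_t  maxlen;     /* bytes allocated */
} Buf;

/*
 * Grow the buffer so that at least "needed" more bytes (plus a NUL) fit.
 * Returns 1 on success, 0 on allocation failure -- the same contract Tom
 * describes for enlargePQExpBuffer.
 */
static int
enlarge_buf(Buf *b, size_t needed)
{
    size_t  newlen = (b->maxlen > 0) ? b->maxlen : 64;

    while (newlen < b->len + needed + 1)
        newlen *= 2;
    if (newlen != b->maxlen)
    {
        char   *p = realloc(b->data, newlen);

        if (p == NULL)
            return 0;
        b->data = p;
        b->maxlen = newlen;
    }
    return 1;
}

/*
 * The pattern in question: memcpy runs only after the enlarge call has
 * guaranteed enough space, so it cannot overflow the heap buffer.
 */
static int
append_binary(Buf *b, const char *data, size_t datalen)
{
    if (!enlarge_buf(b, datalen))
        return 0;
    memcpy(b->data + b->len, data, datalen);
    b->len += datalen;
    b->data[b->len] = '\0';
    return 1;
}
```

A static checker that sees only the memcpy call site, without tracking the invariant established by the preceding enlarge call, will flag exactly this kind of code.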
Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 7:21 PM, Andres Freund wrote: > On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote: > > On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > > > I am reworking this patch, both to accommodate earlier review comments > > > and to fix the multiple verify step of namespaces, as noted by Tom in > > > 4530.1390023...@sss.pgh.pa.us > > > > This link is broken... > > It is a message id, and it seems to point to an appropriate thread? You > can relatively easily look up message ids using urls like > http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us Sorry... my fault!! Thanks! -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > I am reworking this patch, both to accommodate earlier review comments > and to fix the multiple verify step of namespaces, as noted by Tom in > 4530.1390023...@sss.pgh.pa.us This link is broken... -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:55 PM, Peter LaDow wrote: > Just to be clear, what do you mean by "nontrivial code"? Do you mean > C library calls, other than fork/exec/_exit? I think I've answered my own question: http://man7.org/linux/man-pages/man7/signal.7.html The "Async-signal-safe functions" listed there are the nontrivial C calls that may safely be made. Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mar 6, 2014, at 1:51 AM, Peter Geoghegan wrote: >> It's true for perl. Syntax of hstore is close to hash/array syntax and it's >> easy to serialize/deserialize hstore to/from perl. Syntax of hstore was >> inspired by perl. > > I understand that. There is a module on CPAN called Pg::hstore that > will do this; it appears to have been around since 2011. I don't use > Perl, so I don't know a lot about it. Perhaps David Wheeler has an > opinion on the value of Perl-like syntax, as a long time Perl > enthusiast? HSTORE was inspired by the syntax of Perl hash declarations, but it is not compatible. Notably, an HSTORE value can be `NULL`, while in Perl hashes it's `undef`. So you cannot simply `eval` an HSTORE to get a Perl hash unless you are certain there are no NULLs. Besides, string eval in Perl is considered unsafe. Parsing is *much* safer. > In any case, Perl has excellent support for JSON, just like every > other language - you are at no particular advantage in Perl by having > a format that happens to more closely resemble the format of Perl > hashes and arrays. I really feel that we should concentrate our > efforts on one standardized format here. It makes the effort to > integrate your good work in a way that makes it available to everyone > so much easier. I agree. I like HSTORE, but now that JSON is so standard (in fact, as of this week, a *real* standard! http://rfc7159.net/rfc7159), and its support is so much better than that of HSTORE, including in Perl, I believe that it should take priority over HSTORE. I'm happy if HSTORE has the same functionality as JSONB, but given the choice, all other things being equal, as a Perl hacker I will always choose JSONB. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Next CommitFest Deadlines
On Thu, Mar 6, 2014 at 7:44 AM, Fabrízio de Royes Mello wrote: > Hi all, > > There are some place with the next commitfest deadlines (2014/06 and > 2014/09) ? > > Regards, Those deadlines won't be finalized until after PGCon, but it seems likely to me that we'll do about what we did last year. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Static Code Analysis Exploits.
Hi, We use Postgres in our product and we have a client that requires a static code analysis scan to detect vulnerabilities. They are concerned because the tool (Veracode) found several flaws in Postgres and they believe there might be a security risk. I'm sure there are lots of companies that use Postgres that have security policies like theirs in place, so I'm hoping someone has the experience to know that these are false positives or that they are not a security risk for some reason. Below is a description of the vulnerability and the location in the source code. Version 9.3.2.1 was scanned. Please let me know if there is a better place to ask this kind of question. Thanks, Patrick

Stack-based Buffer Overflow (CWE ID 121) (13 flaws):
There is a potential buffer overflow with these functions. If an attacker can control the data written into the buffer, the overflow may result in execution of arbitrary code.
btree_gist.dll .../btree_gist/btree_utils_num.c 115
btree_gist.dll .../btree_gist/btree_utils_num.c 123
pgcrypto.dll .../contrib/pgcrypto/crypt-md5.c 103
libpq.dll .../interfaces/libpq/fe-connect.c 3185
libpq.dll .../interfaces/libpq/fe-connect.c 3220
clusterdb.exe .../interfaces/libpq/fe-connect.c 3243
libpq.dll .../libpq/fe-protocol3.c 1661
libecpg_compat.dll .../ecpg/compatlib/informix.c 978
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 112
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 290
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 306
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 330
libpq.dll .../interfaces/libpq/pqexpbuffer.c 369

Use of Inherently Dangerous Function (CWE ID 242) (1 flaw):
These functions are inherently unsafe because they do not perform bounds checking on the size of their input. An attacker can send overly long input and overflow the destination buffer, potentially resulting in execution of arbitrary code.
pg_isolation_regress.exe .../src/test/regress/pg_regress.c 2307

Integer Overflow or Wraparound (CWE ID 190) (1 flaw):
An integer overflow condition exists when an integer that has not been properly sanity-checked is used in the determination of an offset or size for memory allocation, copying, concatenation, or similar operations. If the integer in question is incremented past the maximum possible value, it may wrap to become a very small negative number, therefore providing an unintended value. This occurs most commonly in arithmetic operations or loop iterations. Integer overflows can often result in buffer overflows or data corruption, both of which may potentially be exploited to execute arbitrary code.
dict_snowball.dll .../libstemmer/utilities.c 371

Process Control (CWE ID 114) (4 flaws):
A function call could result in a process control attack. An argument to a process control function is either derived from an untrusted source or is hard-coded, both of which may allow an attacker to execute malicious code under certain conditions. If an attacker is allowed to specify all or part of the filename, it may be possible to load arbitrary libraries. If the location is hard-coded and an attacker is able to place a malicious copy of the library higher in the search order than the file the application intends to load, then the application will load the malicious version.
psql.exe .../src/bin/psql/print.c 752
psql.exe .../src/bin/psql/print.c 791
psql.exe .../src/bin/psql/print.c 2209
psql.exe .../src/bin/psql/print.c 2500
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Store Extension Options
I am reworking this patch, both to accommodate earlier review comments and to fix the multiple verify step of namespaces, as noted by Tom in 4530.1390023...@sss.pgh.pa.us -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund wrote: > Forking twice is ok as well, as long as you just use _exit() after the > fork. The thing is that you shouldn't run any nontrivial code in the > fork, as long as you're connected to the original environment (fd's, > memory mappings and so forth). Just to be clear, what do you mean by "nontrivial code"? Do you mean C library calls, other than fork/exec/_exit? Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund wrote: > Man. The point is that the library code is carefully written to not > use exit() but _exit() after a fork failure, that's it. I understand your point. I understand that in the case of the postmaster we don't want to invoke behavior that can cause problems by calling exit(). But how does this jibe with Tom's point (on the bug list) about other 3rd party libraries perhaps registering atexit handlers? If the convention is that only fork/exec/_exit is permissible after a fork (what about on_exit_reset?), then third party libraries also need to be aware of that convention and not depend on their atexit handlers being called. And I'm not advocating a certain solution. I'm only trying to clarify what the solution is so that we "comply" with the convention. We don't want to break postgres or any other "well behaved" third party libraries. I don't know the internals of postgres (hence the original bug report and this thread), so I of course defer to you and the other experts here. So, in our case we call on_exit_reset() after the fork in the child, and then from there on only use fork, exec, or _exit, as you mention above. Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On 2014-03-07 12:09:45 -0800, Peter LaDow wrote: > On Friday, March 7, 2014, Andres Freund wrote: > > If the third party library is suitably careful it will only use fork() > > and then exec() or _exit(), otherwise there are other things that'll be > But that is not possible* in our case of trying to spawn an asynchronous > background process. Forking twice is ok as well, as long as you just use _exit() after the fork. The thing is that you shouldn't run any nontrivial code in the fork, as long as you're connected to the original environment (fd's, memory mappings and so forth). > > Both perl and glibc seem to do just that in system() btw... > I don't know about Perl, but system is blocking. What if you don't want to > wait for the child to exit? Man. The point is that the library code is carefully written to not use exit() but _exit() after a fork failure, that's it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Friday, March 7, 2014, Andres Freund wrote: > If the third party library is suitably careful it will only use fork() > and then exec() or _exit(), otherwise there are other things that'll be But that is not possible* in our case of trying to spawn an asynchronous background process. The goal here is for the extension to spawn the process and return immediately. If we exec in the first level child, and immediately return, who reaps the child when done? This is why we fork twice and exit in the first level child so that the extension can reap the first. Unless Postgres happily reaps all children, perhaps through a SIGCHLD handler? (* I suppose we could exec a program that itself forks/execs a background process. But that is essentially no different than what we are doing now, other than trying to meet this fork/exec/_exit requirement. And that's fine if that is to be the case. We just need to know what it is.) > Both perl and glibc seem to do just that in system() btw... I don't know about Perl, but system is blocking. What if you don't want to wait for the child to exit? Pete
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 1:54 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: >>> I just noticed that the DSM patch has introduced a whole new class of >>> failures related to the bug #9464 issue: to wit, any on_detach >>> actions registered in a parent process will also be performed when a >>> child process exits, because nothing has been added to on_exit_reset >>> to prevent that. It seems likely that this is undesirable. > >> I don't think this can actually happen. There are quite a number of >> things that would go belly-up if you tried to use dynamic shared >> memory from the postmaster, which is why dsm_create() and dsm_attach() >> both Assert(IsUnderPostmaster). > > Nonetheless it seems like a good idea to make on_exit_reset drop any > such queued actions. > > The big picture here is that in the scenario being debated in the other > thread, exit() in a child process forked from a backend will execute that > backend's on_detach actions *even if the code had done on_exit_reset after > the fork*. Hmm. So the problematic sequence of events is where a postmaster child forks, and then exits without exec-ing, perhaps because e.g. exec fails? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On 2014-03-07 14:14:17 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-03-07 13:54:42 -0500, Tom Lane wrote: > >> The big picture here is that in the scenario being debated in the other > >> thread, exit() in a child process forked from a backend will execute that > >> backend's on_detach actions *even if the code had done on_exit_reset after > >> the fork*. > > > Hm, aren't those actions called via shmem_exit() calling > > dsm_backend_shutdown() et al? I think that should be cleared by > > on_shmem_exit()? > > But dsm_backend_shutdown gets called whether or not any shmem_exit > actions are registered. Yikes. I thought on_exit_reset() disarmed the atexit callback, but indeed, it does not... > > I think you're misunderstanding me. I am saying we *should* defend > > against it. Our opinions just seem to differ on what to do when the > > scenario is detected. I am saying we should scream bloody murder (which > > admittedly is hard in a child), you want to essentially declare it > > supported. > > And if we scream bloody murder, what will happen? Absolutely nothing > except we'll annoy our users. They won't have control over the > third-party libraries that are doing what you want to complain about. If the third party library is suitably careful it will only use fork() and then exec() or _exit(), otherwise there are other things that'll be broken (e.g. stdio). In that case everything was and is fine. If not, the library's user can then fix or file a bug against the library. Both perl and glibc seem to do just that in system() btw... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Andres Freund writes: > On 2014-03-07 13:54:42 -0500, Tom Lane wrote: >> The big picture here is that in the scenario being debated in the other >> thread, exit() in a child process forked from a backend will execute that >> backend's on_detach actions *even if the code had done on_exit_reset after >> the fork*. > Hm, aren't those actions called via shmem_exit() calling > dsm_backend_shutdown() et al? I think that should be cleared by > on_shmem_exit()? But dsm_backend_shutdown gets called whether or not any shmem_exit actions are registered. > I think you're misunderstanding me. I am saying we *should* defend > against it. Our opinions just seem to differ on what to do when the > scenario is detected. I am saying we should scream bloody murder (which > admittedly is hard in a child), you want to essentially declare it > supported. And if we scream bloody murder, what will happen? Absolutely nothing except we'll annoy our users. They won't have control over the third-party libraries that are doing what you want to complain about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Hi, On 2014-03-07 13:54:42 -0500, Tom Lane wrote: > Robert Haas writes: > > I don't think this can actually happen. There are quite a number of > > things that would go belly-up if you tried to use dynamic shared > > memory from the postmaster, which is why dsm_create() and dsm_attach() > > both Assert(IsUnderPostmaster). > > Nonetheless it seems like a good idea to make on_exit_reset drop any > such queued actions. > > The big picture here is that in the scenario being debated in the other > thread, exit() in a child process forked from a backend will execute that > backend's on_detach actions *even if the code had done on_exit_reset after > the fork*. Hm, aren't those actions called via shmem_exit() calling dsm_backend_shutdown() et al? I think that should be cleared by on_shmem_exit()? > So whether or not you buy Andres' argument that it's not > necessary for atexit_callback to defend against this scenario, there's > actually no other defense possible given the way things work in HEAD. I think you're misunderstanding me. I am saying we *should* defend against it. Our opinions just seem to differ on what to do when the scenario is detected. I am saying we should scream bloody murder (which admittedly is hard in a child), you want to essentially declare it supported. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Robert Haas writes: > On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: >> I just noticed that the DSM patch has introduced a whole new class of >> failures related to the bug #9464 issue: to wit, any on_detach >> actions registered in a parent process will also be performed when a >> child process exits, because nothing has been added to on_exit_reset >> to prevent that. It seems likely that this is undesirable. > I don't think this can actually happen. There are quite a number of > things that would go belly-up if you tried to use dynamic shared > memory from the postmaster, which is why dsm_create() and dsm_attach() > both Assert(IsUnderPostmaster). Nonetheless it seems like a good idea to make on_exit_reset drop any such queued actions. The big picture here is that in the scenario being debated in the other thread, exit() in a child process forked from a backend will execute that backend's on_detach actions *even if the code had done on_exit_reset after the fork*. So whether or not you buy Andres' argument that it's not necessary for atexit_callback to defend against this scenario, there's actually no other defense possible given the way things work in HEAD. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
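The mechanism Tom is pointing at is a property of the C library, not of Postgres: fork() copies the parent's atexit list into the child, and nothing Postgres clears at its own level can unregister those libc-level handlers. A standalone demonstration (plain POSIX C; the handler here merely stands in for atexit_callback, it is not the actual Postgres code) of why exit() in a forked child fires a handler registered before the fork, while _exit() does not:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;

/* Stands in for atexit_callback / a queued on_detach action. */
static void
inherited_handler(void)
{
    if (notify_fd >= 0)
    {
        ssize_t rc = write(notify_fd, "x", 1);

        (void) rc;
    }
}

/*
 * Register a handler in the parent, fork a child that terminates with
 * exit() or _exit(), and report whether the handler ran in the child.
 */
static int
handler_fired_in_child(int use_plain_exit)
{
    int     fds[2];
    char    c;
    int     fired;
    pid_t   pid;

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);     /* empty pipe => read fails */
    notify_fd = fds[1];
    atexit(inherited_handler);              /* registered before the fork */
    pid = fork();
    if (pid == 0)
    {
        if (use_plain_exit)
            exit(0);                        /* runs the inherited handler */
        _exit(0);                           /* bypasses atexit handlers */
    }
    waitpid(pid, NULL, 0);
    fired = (read(fds[0], &c, 1) == 1);
    notify_fd = -1;                         /* keep the parent's own exit quiet */
    close(fds[0]);
    close(fds[1]);
    return fired;
}
```

With use_plain_exit set, the child's exit() runs the parent-registered handler and a byte shows up in the pipe; with _exit() the pipe stays empty — which is exactly why the fork/exec/_exit convention matters.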
Re: [HACKERS] Unportable coding in reorderbuffer.h
On 2014-03-07 13:27:28 -0500, Tom Lane wrote: > Andres Freund writes: > > A patch fixing this is attached. I've added some more local variables to > > deal with the longer lines... > > Since I've got a compiler in captivity that complains about this, > I'll take care of checking and committing this patch. Thanks. The tests specific to the feature can be run with (cd contrib/test_decoding && make -s check), as they require non-default PGC_POSTMASTER settings. > I still think it might be a good idea to spin up a buildfarm member > running "gcc -ansi -pedantic", even if we don't see that as a particularly > useful case in practice. Thoughts? I tried to compile with -ansi -pedantic -std=c90 to catch potential further issues, but at least my gcc version makes the output completely drown in various warnings. Some can be individually disabled, but lots of them seem to only be covered by pretty general warning categories. -ansi -std=c90 -Wno-variadic-macros works reasonably well tho. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: > I just noticed that the DSM patch has introduced a whole new class of > failures related to the bug #9464 issue: to wit, any on_detach > actions registered in a parent process will also be performed when a > child process exits, because nothing has been added to on_exit_reset > to prevent that. It seems likely that this is undesirable. I don't think this can actually happen. There are quite a number of things that would go belly-up if you tried to use dynamic shared memory from the postmaster, which is why dsm_create() and dsm_attach() both Assert(IsUnderPostmaster). Without calling those functions, there's no way for code running in the postmaster to call on_dsm_detach(), because it's got nothing to pass for the first argument. In case you're wondering, the major reason I disallowed this is that the control segment tracks the number of backends attached to each segment, and there's no provision for adjusting that value each time the postmaster forks. We could add such provision, but it seems like there would be race conditions, and the postmaster might have to participate in locking, so it might be pretty ugly, and a performance suck for anyone who doesn't need this functionality. And it doesn't seem very useful anyway. The postmaster really shouldn't be touching any shared memory segment more than the absolutely minimal amount, so the main possible benefit I can see is that you could have the mapping already in place for each new backend rather than having to set it up in every backend. But I'm prepared to hope that isn't actually important enough to be worth worrying about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable coding in reorderbuffer.h
Andres Freund writes: > A patch fixing this is attached. I've added some more local variables to > deal with the longer lines... Since I've got a compiler in captivity that complains about this, I'll take care of checking and committing this patch. I still think it might be a good idea to spin up a buildfarm member running "gcc -ansi -pedantic", even if we don't see that as a particularly useful case in practice. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal (9.5) : psql unicode border line styles
Hello

I am returning to this topic. Last time I proposed these styles:

http://www.postgresql.org/message-id/cafj8prclgoktryjpbtoncpgyftrcz-zgfowdc1jqulb+ede...@mail.gmail.com
http://postgres.cz/wiki/Pretty_borders_in_psql

That experiment failed, but there were some interesting tips in the discussion. So I propose a slightly different approach - choose one predefined style for each of the table line elements. These styles are active only when "linestyle" is unicode.

The possible line elements are:

* border,
* header_separator,
* row_separator,
* column_separator,

Possible styles (for each element):

* none,
* single,
* double,
* thick,

This should have enough variability to define all of the styles proposed earlier. I hope this proposal is safe and simple to use. Styles can be saved persistently in the .psqlrc file - and some examples can be in the documentation.

Usage:

\pset linestyle_border double
\pset linestyle_header_separator single
\pset linestyle_row_separator single
\pset linestyle_column_separator single
\pset linestyle unicode

╔═══╤════════════╤═══════╗
║ a │     b      │   c   ║
╟───┼────────────┼───────╢
║ 1 │ 2012-05-24 │ Hello ║
╟───┼────────────┼───────╢
║ 2 │ 2012-05-25 │ Hello ║
║   │            │ World ║
╚═══╧════════════╧═══════╝
(2 rows)

Comments, ideas?

Regards

Pavel
Re: [HACKERS] atexit_callback can be a net negative
Hi, On 2014-03-07 09:50:15 -0800, Peter LaDow wrote: > Also, the on_exit_reset() does clear up the issue, and we have > implemented it as suggested in this thread (calling it immediately > after the fork() in the child). And Andres' keen eye noted we were > calling exit() after an exec() failure, rather than _exit(). Thank > you, Andres, for pointing out this error. I actually didn't see any source code of yours ;), just answered Tom's question about what to do after exec() failed. There are several other weird behaviours if you use exit() instead of _exit() after a fork, most prominently double flushes leading to repeated/corrupted output when you have stream FILEs open in the parent. This is because the stream will be flushed in both the parent (at some later write or exit) and the child (at exit). It's simply something that won't work well. > > I don't see any way it'd be safe for a pg unaware library to start > > forking around and doing similar random things inside normal > > backends. > We aren't exactly "forking around." We were trying to spawn an > asynchronous process without any ties to the postmaster. The important bit in the sentence you quote is "pg unaware library". My point is just that there are some special considerations you have to be aware of. fork() and exec() is a way to escape that environment, and should be fine. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What the heck is hobby.2ndquadrant.com?
On 7 March 2014 17:46, Tom Lane wrote: > There are a few threads floating around currently in which some of > the cc'd addresses are various-people at hobby.2ndquadrant.com. > Addresses like that seem not to work, or at least not work reliably. > I believe the official addresses are just soandso at 2ndquadrant.com. > > If you're replying to any such threads, you might want to clean > up the addresses so you don't get a bunch of bounce messages. > > In any case, I thought the 2ndquadrant people would want to know that > there's something broken in their mail system. I'm not sure how these > addresses got injected into our threads in the first place, but I bet > it wasn't done by anybody outside 2ndquadrant. Perhaps an internal > mail server name was unintentionally exposed? Thanks for highlighting that. Will investigate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
Craig Ringer writes: > What I'm concerned about is the locking. It looks to me like we're > causing the user to lock rows that they may not intend to lock, by > applying a LockRows step *before* the user supplied qual. (I'm going to > test that tomorrow, it's sleep time in Australia). The fact that there are two LockRows nodes seems outright broken. The one at the top of the plan is correctly placed, but how did the other one get in there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Sorry for the bit of top-posting, but I wanted to make some things clear. Also, I wasn't subscribed to pgsql-hackers at the time this thread began, so I apologize for the missing headers that might cause threading issues. I'm the submitter of bug #9464. Here's the background on what we are doing. We are on a limited resource, embedded device. We make use of the database as an event driven system. In the case of an incoming event, such as a settings change, we make use of this fork/exec procedure to spawn an asynchronous process to handle certain events. Some of these external processes are long running, some need to run outside the current transaction context, and some we don't care about the result--we just want it to run. Also, the on_exit_reset() does clear up the issue, and we have implemented it as suggested in this thread (calling it immediately after the fork() in the child). And Andres' keen eye noted we were calling exit() after an exec() failure, rather than _exit(). Thank you, Andres, for pointing out this error. Andres Freund 2ndquadrant.com> writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: > > Andres Freund 2ndquadrant.com> writes: > > > What are you proposing to do in that case? This is only one of the > > > failure cases of forking carelessly, right? Just to be clear, we intended to fork careFULLy, not careLESSly. Hence the reason for the double fork with an eventual exec(). We intended to be clearly separated from the backend and executing in our own clean, unrelated context. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and In this case we definitely did NOT want to be managed by postgres. Hence the double fork. The intent was that the first level child would NOT be managed by postgres, hence the exit(). > related things, or you have to fork() and exec(). 
Which is _exactly_ what we were trying to do. The first fork was only so that we could fork again and spawn a subprocess completely detached from the postmaster, and also to have something for the postmaster to reap via waitpid() and prevent a zombie from hanging around. The typical daemon stuff.

> At the very least, not integrating the child with the postmaster's
> lifetime will prevent postgres from restarting because there's still a
> child attached to the shared memory.

Which is _exactly_ what we were NOT trying to do. We did not want to integrate with the postmaster.

> I don't see any way it'd be safe for a pg unaware library to start
> forking around and doing similar random things inside normal backends.

We aren't exactly "forking around." We were trying to spawn an asynchronous process without any ties to the postmaster. This was expected to be well-defined, clean behavior. A fork/exec isn't an unusual thing to do, and we didn't expect that exiting a child would invoke behavior that would cause problems.

Thanks, Pete

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] What the heck is hobby.2ndquadrant.com?
There are a few threads floating around currently in which some of the cc'd addresses are various-people at hobby.2ndquadrant.com. Addresses like that seem not to work, or at least not work reliably. I believe the official addresses are just soandso at 2ndquadrant.com. If you're replying to any such threads, you might want to clean up the addresses so you don't get a bunch of bounce messages. In any case, I thought the 2ndquadrant people would want to know that there's something broken in their mail system. I'm not sure how these addresses got injected into our threads in the first place, but I bet it wasn't done by anybody outside 2ndquadrant. Perhaps an internal mail server name was unintentionally exposed? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Craig Ringer writes: > On 03/06/2014 02:58 AM, Tom Lane wrote: >> The PlanInvalItem could perfectly well be generated by the planner, >> no, if it has the user OID? But I'm not real sure why you need it. >> I don't see the reason for an invalidation triggered by user ID. >> What exactly about the *user*, and not something else, would trigger >> plan invalidation? > It's only that the plan depends on the user ID. There's no point keeping > the plan around if the user no longer exists. [ shrug... ] Leaving such a plan cached would be harmless, though. Furthermore, the user ID we'd be talking about is either the owner of the current session, or the owner of some view or security-definer function that the plan is already dependent on, so it's fairly hard to credit that the plan would survive long enough for the issue to arise. Even if there is a scenario where invalidating by user ID is actually useful, I think adding infrastructure to cause invalidation in such a case is optimizing for the wrong thing. You're adding cycles to every query to benefit a case that is going to be quite infrequent in practice. >> What we do need is a notion that a plan cache entry might only be >> valid for a specific calling user ID; but that's a matter for cache >> entry lookup not for subsequent invalidation. > Yes, that would be good, but is IMO more of a separate optimization. I'm > currently using KaiGai's code to invalidate and re-plan when a user ID > change is detected. I'm unlikely to accept a patch that does that; wouldn't it be catastrophic for performance in the presence of security-definer functions? You can't just trash the whole plan cache when a user ID switch occurs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Mar 7, 2014 at 11:57:48AM -0500, Andrew Dunstan wrote: > >If they can be done for 9.4, great, if not, we have to decide if these > >suboptimal cases are enough for us to delay the data type until 9.5. I > >don't know the answer, but I have to ask the question. > > > > AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were > essentially two ways of spelling the same data, the domains were > virtually identical (hstore might have been a bit more liberal about > numeric input). OK, it sounds like the adjustments are minimal, like not using the high-order bit. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 03/07/2014 11:45 AM, Bruce Momjian wrote:
> On Fri, Mar 7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:
>> IIRC The sacrifice was one bit in the header (i.e. in the first int
>> after the varlena header). We could now repurpose that (for example if
>> we ever decided to use a new format).
>>
>> Oleg and Teodor made most of the adjustments on the hstore(2) side
>> (e.g. providing for scalar roots, providing for json typing of scalars
>> so everything isn't just a string).
>>
>> Can the architecture be changed? No. If we think it's not good enough
>> we would have to kiss jsonb goodbye for 9.4 and go back to the drawing
>> board. But I haven't seen any such suggestion from anyone who has been
>> reviewing it (e.g. Andres or Peter).
>
> We are going to be stuck with the JSONB binary format we ship in 9.4 so
> I am asking if there are things we should do to improve it, now that we
> know we don't need backward compatibility. If they can be done for 9.4,
> great, if not, we have to decide if these suboptimal cases are enough
> for us to delay the data type until 9.5. I don't know the answer, but I
> have to ask the question.

AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were essentially two ways of spelling the same data, the domains were virtually identical (hstore might have been a bit more liberal about numeric input).

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Mar 7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote: > IIRC The sacrifice was one bit in the header (i.e. in the first int > after the varlena header). We could now repurpose that (for example > if we ever decided to use a new format). > > Oleg and Teodor made most of the adjustments on the hstore(2) side > (e.g. providing for scalar roots, providing for json typing of > scalars so everything isn't just a string). > > Can the architecture be changed? No. If we think it's not good > enough we would have to kiss jsonb goodbye for 9.4 and go back to > the drawing board. But I haven't seen any such suggestion from > anyone who has been reviewing it (e.g. Andres or Peter). We are going to be stuck with the JSONB binary format we ship in 9.4 so I am asking if there are things we should do to improve it, now that we know we don't need backward compatibility. If they can be done for 9.4, great, if not, we have to decide if these suboptimal cases are enough for us to delay the data type until 9.5. I don't know the answer, but I have to ask the question. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
Hi,

Peter Eisentraut writes:
> On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
>> directory = 'local/hstore-new'
>> module_pathname = '$directory/hstore'
>
> I think your previously proposed patch to add extension_control_path
> plus my suggestion to update existing de facto best practices to not
> include $libdir into the module path name (thus allowing the use of
> dynamic_library_path) will address all desired use cases just fine.

My opinion is that we have two choices: refactoring the current API or incrementally improving it. In both cases we should make it possible for the packager to control where any individual module file is loaded from, with maybe options for the sysadmin to override the packager's choice.

In your proposal, the control moves away from the developer, and that's a good thing, so you get a +1 from me. Just please make sure that it's still possible to use a full absolute path for the module path name so that the packager can have control too.

> Moreover, going that way would reuse existing facilities and concepts,
> remove indirections and reduce overall complexity. This new proposal,
> on the other hand, would go the other way, introducing new concepts,
> adding more indirections, and increasing overall complexity, while
> actually achieving less.

What the $directory proposal achieves is allowing a fully relocatable extension layout, where you just have to drop a directory anywhere in the file system and it just works (*). It just works and makes it easy to control which module is loaded, without having to set up LD_LIBRARY_PATH, ld.so.conf, or our own dynamic_library_path.

* provided that said directory is part of extension_control_path, or that you copy or move the .control file to sharedir/extension.

That said, I don't intend to be using it myself, so I won't try to save that patch in any way. My position is that Stephen's concern is real and his idea simple enough while effective, so worth pursuing.

> I see an analogy here.
> What we are currently doing is similar to hardcoding absolute rpaths
> into all libraries. Your proposal is effectively to (1) add the $ORIGIN
> mechanism and (2) make people use chrpath when they want to install
> somewhere else. My proposal is to get rid of all rpaths and just set a
> search path. Yes, on a technical level, this is less powerful, but it's
> simpler and gets the job done and is harder to misuse.

What happens if you have more than one 'prefix.so' file in your path?

> A problem with features like these is that they get rarely used but
> offer infinite flexibility, so they are not used consistently and you
> can't rely on anything. This is already the case for the
> module_pathname setting in the control file. It has, AFAICT, no actual
> use, and because of that no one uses it, and because of that, there is
> no guarantee that extensions use it sensibly, and because of that no one
> can ever make sensible use of it in the future, because there is no
> guarantee that extensions have it set sensibly. In fact, I would
> propose deprecating module_pathname.

The module_pathname facility allows the packager to decide where the library module file gets installed, and the extension author not to concern himself with that choice. I agree that using $libdir as the extension developer isn't the right thing to do. Having to choose the installation path as a developer, either in the SQL script or in the control file, is not the right thing.

Now, the practical answer I have to that point is to have the packager rewrite the control file as part of its build system.

My vote goes against deprecating module_pathname, because I didn't see in your proposal any way to offer control back to the packager, only to the sysadmin, and I don't want to have the sysadmin involved if we can avoid it (as you say, too much flexibility is a killer).
In practical terms, though, given the current situation, the build system I'm working on already has to edit the SQL scripts and control files anyway…

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
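For reference, the mechanics being debated look roughly like this in an extension control file. The extension name, versions, and paths below are hypothetical, and the `$directory` substitution inside `module_pathname` is the *proposed* behavior from this thread (quoted at the top of the message), not something a released PostgreSQL supports.

```
# Hypothetical hstore-new.control illustrating the $directory proposal.
comment = 'nested hstore data type'
default_version = '2.0'
relocatable = true

# Proposed: locate the extension's files relative to this directory...
directory = 'local/hstore-new'
# ...and resolve the shared module relative to it, so the whole layout
# can be dropped anywhere on extension_control_path.
module_pathname = '$directory/hstore'
```

The packager retains control by rewriting `directory` (or substituting an absolute `module_pathname`) at build time, which is the point Dimitri raises above.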
Re: [HACKERS] jsonb and nested hstore
On 03/06/2014 11:33 PM, Bruce Momjian wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
>> Hi there,
>>
>> Looks like consensus is done. I and Teodor are not happy with it, but
>> what we can do :) One thing I want to do is to reserve our contribution
>> to the flagship feature (jsonb), particularly, "binary storage for
>> nested structures and indexing. Their work was sponsored by Engine
>> Yard".
>
> OK, if we are going with an unchanged hstore in contrib and a new JSONB,
> there is no reason to whack around JSONB to be binary compatible with
> the old hstore format. What sacrifices did we need to make to have
> JSONB be binary compatible with hstore, can those sacrifices be removed,
> and can that be done in time for 9.4?

IIRC The sacrifice was one bit in the header (i.e. in the first int after the varlena header). We could now repurpose that (for example if we ever decided to use a new format).

Oleg and Teodor made most of the adjustments on the hstore(2) side (e.g. providing for scalar roots, providing for json typing of scalars so everything isn't just a string).

Can the architecture be changed? No. If we think it's not good enough we would have to kiss jsonb goodbye for 9.4 and go back to the drawing board. But I haven't seen any such suggestion from anyone who has been reviewing it (e.g. Andres or Peter).

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
>> Hi there,
>>
>> Looks like consensus is done. I and Teodor are not happy with it, but
>> what we can do :) One thing I want to do is to reserve our
>> contribution to the flagship feature (jsonb), particularly, "binary
>> storage for nested structures and indexing. Their work was sponsored
>> by Engine Yard".
>
> OK, if we are going with an unchanged hstore in contrib and a new JSONB,
> there is no reason to whack around JSONB to be binary compatible with the
> old hstore format. What sacrifices did we need to make to have JSONB
> be binary compatible with hstore, can those sacrifices be removed, and
> can that be done in time for 9.4?

Also,

*) what hstore2 features (if any) that are not already reflected in the jsonb type are going to be moved to jsonb for 9.4?

*) if the answer above is anything but 'nothing', what hstore-isms are going to be adjusted in the process of doing so? Presumably there would be some function name changes to put them in the jsonb style, but also the hstore syntax ('=>') is going to be embedded in some of the search operators and possibly other things. Is that going to change?

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Hi,

On 2014-03-07 10:24:31 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
> >> No, I think it should do nothing. The coding pattern shown in bug #9464
> >> seems perfectly reasonable and I think we should allow it.
>
> > I don't think it's a reasonable pattern to run background processes that
> > aren't managed by postgres with all shared memory still
> > accessible. You'll have to either also detach from shared memory and
> > related things, or you have to fork() and exec().
>
> The code in question is trying to do that. And what do you think will
> happen if the exec() fails?

In code that's written properly, not much. It will use _exit() after the exec() failure. That's pretty established best practice after forking, especially after an exec() failure.

> > At the very least, not
> > integrating the child with the postmaster's lifetime will prevent
> > postgres from restarting because there's still a child attached to the
> > shared memory.
>
> I think you're willfully missing the point. The reason we added
> atexit_callback was to try to defend ourselves against third-party code
> that did things in a non-Postgres-aware way.

Hey, I am not arguing that we should remove protection, I am saying that we should make it more obvious that dangerous stuff happens. That way people can fix stuff during development rather than finding out stuff breaks in some corner cases on busy live systems.

> Arguing that such code
> should do things in a Postgres-aware way is not helpful for the concerns
> here, and it's not relevant to reality either, because people will load
> stuff like libperl into backends. Good luck getting a post-fork
> on_exit_reset() call into libperl.

libperl won't fork on its own, without the user telling it to do so, and code that forks can very well be careful to do POSIX::_exit(), especially in case an exec fails.
I don't have much problem telling people that a fork() without a direct exec() afterwards simply isn't supported from PLs. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable coding in reorderbuffer.h
Hi, On 2014-03-06 02:39:37 +0100, Andres Freund wrote: > Unless somebody protests I'll try to get the remaining walsender and > docs patches ready before sending in the patch fixing this as it's not > disturbing the buildfarm. I'll be onsite/travelling tomorrow; so I am > not sure I'll be done then, but friday it is. A patch fixing this is attached. I've added some more local variables to deal with the longer lines... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 875eda2d163e38e6533bf6cf8e6e6417aad0421b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 6 Mar 2014 02:00:47 +0100 Subject: [PATCH] Remove unportable use of anonymous unions from reorderbuffer.h. In b89e151054a I had assumed it was ok to use anonymous unions as struct members, but while a longstanding extension in many compilers, it's only been standardized in C11. To fix, remove one of the anonymous unions which tried to hide some implementation specific enum values and give the other a name. The latter unfortunately requires changes in output plugins, but since the feature has only been added a few days ago... Per complaint from Tom Lane. 
--- contrib/test_decoding/test_decoding.c | 18 +- src/backend/replication/logical/decode.c| 28 +-- src/backend/replication/logical/reorderbuffer.c | 283 src/include/replication/reorderbuffer.h | 39 ++-- 4 files changed, 186 insertions(+), 182 deletions(-) diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c index ea463fb..e356c7c 100644 --- a/contrib/test_decoding/test_decoding.c +++ b/contrib/test_decoding/test_decoding.c @@ -358,43 +358,45 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, { case REORDER_BUFFER_CHANGE_INSERT: appendStringInfoString(ctx->out, " INSERT:"); - if (change->tp.newtuple == NULL) + if (change->data.tp.newtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.newtuple->tuple, + &change->data.tp.newtuple->tuple, false); break; case REORDER_BUFFER_CHANGE_UPDATE: appendStringInfoString(ctx->out, " UPDATE:"); - if (change->tp.oldtuple != NULL) + if (change->data.tp.oldtuple != NULL) { appendStringInfoString(ctx->out, " old-key:"); tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.oldtuple->tuple, + &change->data.tp.oldtuple->tuple, true); appendStringInfoString(ctx->out, " new-tuple:"); } - if (change->tp.newtuple == NULL) + if (change->data.tp.newtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.newtuple->tuple, + &change->data.tp.newtuple->tuple, false); break; case REORDER_BUFFER_CHANGE_DELETE: appendStringInfoString(ctx->out, " DELETE:"); /* if there was no PK, we only know that a delete happened */ - if (change->tp.oldtuple == NULL) + if (change->data.tp.oldtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); /* In DELETE, only the replica identity is present; display that */ else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.oldtuple->tuple, + &change->data.tp.oldtuple->tuple, true); break; + default: + 
Assert(false); } MemoryContextSwitchTo(old); diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index af3948e..414cfa9 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -586,17 +586,17 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) change = ReorderBufferGetChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_INSERT; - memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); + memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE) { Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader)); - change->tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder); + change->data.tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder); DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert, r->xl_len - SizeOfHeapInsert, - change->tp.newtuple); + change->data.tp.newtuple); } ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change); @@ -626,7 +626,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) change = ReorderBufferGetChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_UPDATE; - memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); + memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); data = (char *) &xl
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: >> No, I think it should do nothing. The coding pattern shown in bug #9464 >> seems perfectly reasonable and I think we should allow it. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and > related things, or you have to fork() and exec(). The code in question is trying to do that. And what do you think will happen if the exec() fails? > At the very least, not > integrating the child with the postmaster's lifetime will prevent > postgres from restarting because there's still a child attached to the > shared memory. I think you're willfully missing the point. The reason we added atexit_callback was to try to defend ourselves against third-party code that did things in a non-Postgres-aware way. Arguing that such code should do things in a Postgres-aware way is not helpful for the concerns here, and it's not relevant to reality either, because people will load stuff like libperl into backends. Good luck getting a post-fork on_exit_reset() call into libperl. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:10 PM, Tom Lane wrote:
> Florian Weimer writes:
>> On 03/07/2014 03:57 PM, Tom Lane wrote:
>>> It's not a reason not to do something about the much larger chance of
>>> this happening in a direct child process, which certainly won't have a
>>> matching PID.
>> Indeed. Checking getppid() in addition might narrow things down further.
> I don't think getppid adds much to the party. In particular, what to do
> if it returns 1? You can't tell if you're an orphaned backend (in which
> case you should still do normal shutdown)

Oh. I didn't know that orphaned backends perform a normal shutdown.

-- Florian Weimer / Red Hat Product Security Team

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Florian Weimer writes: > On 03/07/2014 03:57 PM, Tom Lane wrote: >> It's not a reason not to do something about the much larger chance of >> this happening in a direct child process, which certainly won't have a >> matching PID. > Indeed. Checking getppid() in addition might narrow things down further. I don't think getppid adds much to the party. In particular, what to do if it returns 1? You can't tell if you're an orphaned backend (in which case you should still do normal shutdown) or an orphaned grandchild. The standalone-backend case would also complicate matters. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-03-07 00:03:48 -0500, Tom Lane wrote:
> >> In the bug thread I proposed making atexit_callback check whether getpid()
> >> still matches MyProcPid.
>
> > What are you proposing to do in that case? This is only one of the
> > failure cases of forking carelessly, right?
>
> No, I think it should do nothing. The coding pattern shown in bug #9464
> seems perfectly reasonable and I think we should allow it. No doubt it's
> safer if the child process does an on_exit_reset; but right now, if the
> child fails to do so, atexit_callback is actively breaking things. And
> I don't think we can rely on third-party libraries to call on_exit_reset
> after forking.

I don't think it's a reasonable pattern to run background processes that aren't managed by postgres with all shared memory still accessible. You'll have to either also detach from shared memory and related things, or you have to fork() and exec(). At the very least, not integrating the child with the postmaster's lifetime will prevent postgres from restarting because there's still a child attached to the shared memory.

I don't see any way it'd be safe for a pg unaware library to start forking around and doing similar random things inside normal backends.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] on_exit_reset fails to clear DSM-related exit actions
I just noticed that the DSM patch has introduced a whole new class of failures related to the bug #9464 issue: to wit, any on_detach actions registered in a parent process will also be performed when a child process exits, because nothing has been added to on_exit_reset to prevent that. It seems likely that this is undesirable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 03:57 PM, Tom Lane wrote:
> I think Florian's right that there's a risk there, but it seems pretty
> remote, and I don't see any reliable way to detect the case anyhow.
> (Process start time? Where would you get that from portably?)

I don't think there's a portable source for that. On Linux, you'd have to use /proc.

> It's not a reason not to do something about the much larger chance of
> this happening in a direct child process, which certainly won't have a
> matching PID.

Indeed. Checking getppid() in addition might narrow things down further.

On Linux, linking against pthread_atfork currently requires linking against pthread (although this is about to change), and it might incur the pthread-induced overhead on some configurations.

-- Florian Weimer / Red Hat Product Security Team

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Heikki Linnakangas writes: > On 03/07/2014 04:23 PM, Florian Weimer wrote: >> There's the PID reuse problem. Forking twice (with a delay) could end >> up with the same PID as MyProcPid. > Not if the parent process is still running. If the original parent backend is *not* still running, then running atexit_callback in the grandchild is just as dangerous if not more so; it could be clobbering shared-memory state belonging to some other session that has recycled the same PGPROC. I think Florian's right that there's a risk there, but it seems pretty remote, and I don't see any reliable way to detect the case anyhow. (Process start time? Where would you get that from portably?) It's not a reason not to do something about the much larger chance of this happening in a direct child process, which certainly won't have a matching PID. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 00:03:48 -0500, Tom Lane wrote: >> In the bug thread I proposed making atexit_callback check whether getpid() >> still matches MyProcPid. > What are you proposing to do in that case? This is only one of the > failure cases of forking carelessly, right? No, I think it should do nothing. The coding pattern shown in bug #9464 seems perfectly reasonable and I think we should allow it. No doubt it's safer if the child process does an on_exit_reset; but right now, if the child fails to do so, atexit_callback is actively breaking things. And I don't think we can rely on third-party libraries to call on_exit_reset after forking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/07/2014 10:07 PM, Craig Ringer wrote:
> On 03/07/2014 09:33 PM, Craig Ringer wrote:
>> On 03/05/2014 11:02 AM, Craig Ringer wrote:
>>> The main known issue remaining is plan invalidation.
>>
>> I've pushed a version with a plan invalidation implementation. It's tagged:
>>
>> rls-9.4-upd-sb-views-v9
>>
>> in
>>
>> g...@github.com:ringerc/postgres.git
>>
>> The invalidation implementation does not yet handle foreign key checks;
>> that will require additional changes. I'll push an update to the
>> rls-9.4-upd-sb-views and post an update later, at which time I'll rebase
>> the changes back into the history.
>
> Well, that was easy. Done.
>
> rls-9.4-upd-sb-views-v11
>
> and rebased the rls-9.4-upd-sb-views branch to incorporate the changes.
>
> The regression tests have further failures, but some are due to changes
> in the inheritance semantics. I'm going through them now.

Need a quick opinion. KaiGai's original code produced a plan like this for an inheritance set:

 EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!             QUERY PLAN
! ---------------------------------------
  LockRows
!   ->  Append
!         ->  Subquery Scan on t1
!               Filter: f_leak(t1.b)
!               ->  Seq Scan on t1 t1_1
!                     Filter: ((a % 2) = 0)
!         ->  Subquery Scan on t2
!               Filter: f_leak(t2.b)
!               ->  Seq Scan on t2 t2_1
!                     Filter: ((a % 2) = 1)
!         ->  Seq Scan on t3
!               Filter: f_leak(b)
 (12 rows)

The new code, using updatable s.b. views, instead produces:

 EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!             QUERY PLAN
! ---------------------------------------
  LockRows
!   ->  Subquery Scan on t1
!         Filter: f_leak(t1.b)
!         ->  LockRows
!               ->  Result
!                     ->  Append
!                           ->  Seq Scan on t1 t1_1
!                                 Filter: ((a % 2) = 0)
!                           ->  Seq Scan on t2
!                                 Filter: ((a % 2) = 0)
!                           ->  Seq Scan on t3
!                                 Filter: ((a % 2) = 0)
 (12 rows)

The different quals are expected, because of the change to the definition of inheritance handling in row security. What I'm concerned about is the locking.
It looks to me like we're causing the user to lock rows that they may not intend to lock, by applying a LockRows step *before* the user-supplied qual. (I'm going to test that tomorrow; it's sleep time in Australia.) This seems to be related to RowMark handling in updatable security barrier views. I need to check whether it happens with updates to security barrier views, as well as with row security. Dean, any thoughts? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 00:03:48 -0500, Tom Lane wrote: > In the bug thread I proposed making atexit_callback check whether getpid() > still matches MyProcPid. What are you proposing to do in that case? This is only one of the failure cases of forking carelessly, right? I think the only appropriate thing would be to make as much noise as possible in that case, which is probably something like writing a message to stderr, and then abort(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:23 PM, Florian Weimer wrote: On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Not if the parent process is still running. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
> > OK, done with the attached patch. Three is returned for status, but one
> > for other cases.
>
> I think it would have been better if it returned status 4 for the cases where
> a directory/file is not accessible (the new cases added by this patch).
>
> I think status 3 should be returned only when the data directory is valid
> and the pid file is missing, because in a script, after getting this code,
> the next thing it will probably do is start the server, which doesn't seem
> a good fit for the newly added cases.

OK, I have updated the attached patch to reflect this, and added a C comment.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + Everyone has their own god. +

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..56d238f
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** static bool allow_core_files = false;
*** 97,102 ****
--- 97,103 ----
  static time_t start_time;

  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*** static void pgwin32_doRunAsService(void)
*** 152,158 ****
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
! static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
--- 153,159 ----
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
! static pgpid_t get_pgpid(bool is_status_request);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
*** print_msg(const char *msg)
*** 246,255 ****
  }

  static pgpid_t
! get_pgpid(void)
  {
  	FILE	   *pidf;
  	long		pid;

  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
--- 247,280 ----
  }

  static pgpid_t
! get_pgpid(bool is_status_request)
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 				   pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 				   pg_data);
+
+ 		/*
+ 		 * The Linux Standard Base Core Specification 3.1 says this should
+ 		 * return '4, program or service status is unknown'
+ 		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+ 		 */
+ 		exit(is_status_request ? 4 : 1);
+ 	}
+
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 			   progname, pg_data);
+ 		exit(is_status_request ? 4 : 1);
+ 	}

  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*** do_start(void)
*** 810,816 ****

  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid();
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
--- 835,841 ----

  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid(false);
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
*** do_stop(void)
*** 894,900 ****
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid();

  	if (pid == 0)				/* no pid file */
  	{
--- 919,925 ----
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid(false);

  	if (pid == 0)				/* no pid file */
  	{
*** do_stop(void)
*** 943,949 ****

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid()) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
--- 968,974 ----

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid(false)) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
*** do_restart(void)
*** 980,986 ****
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid();

  	if (pid == 0)				/* no pid file */
  	{
--- 1005,1011 ----
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid(false);

  	if (pid == 0)				/* no pid file */
  	{
*** do_restart(void)
*** 1033,1039 ****

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid()) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
--- 1058,1064 ----

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid(false)) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
*** do_reload(void)
*** 1071,1077 ****
  {
  	pgpid_t		pid;

! 	pid = ge
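The exit-code policy Bruce and Amit settle on above boils down to a small decision: the new "missing or invalid data directory" cases return the LSB "status unknown" code (4) when invoked via `pg_ctl status`, and a generic failure (1) otherwise, while 3 stays reserved for "valid directory, server not running". A sketch of that mapping with invented names (not the actual pg_ctl code):

```c
/*
 * Sketch of the agreed pg_ctl exit-code policy (hypothetical helper, not
 * the real pg_ctl source).  Per LSB 3.1 init-script conventions:
 * 3 = program is not running, 4 = program or service status is unknown.
 */
enum pgctl_check
{
	CHECK_OK,					/* data dir and PG_VERSION both present */
	CHECK_DIR_MISSING,			/* data directory missing or inaccessible */
	CHECK_NOT_A_CLUSTER			/* dir exists but no PG_VERSION file */
};

static int
pgctl_failure_exit_code(enum pgctl_check check, int is_status_request)
{
	if (check == CHECK_OK)
		return 0;

	/* init scripts pass the "status" exit code straight through ... */
	if (is_status_request)
		return 4;

	/* ... every other pg_ctl command just fails with a generic error */
	return 1;
}
```

The point of distinguishing 3 from 4 is that a wrapper script seeing 3 may reasonably try to start the server, which would be wrong for a missing or non-cluster directory.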
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Comparing the process start time would protect against that. Checking getppid() would have the same theoretical problem. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
> > What about allowing a control file like this:
> >
> >    # hstore extension
> >    comment = 'data type for storing sets of (key, value) pairs'
> >    default_version = '1.3'
> >    directory = 'local/hstore-new'
> >    module_pathname = '$directory/hstore'
> >    relocatable = true
>
> I think your previously proposed patch to add extension_control_path
> plus my suggestion to update existing de facto best practices to not
> include $libdir in the module path name (thus allowing the use of
> dynamic_library_path) will address all desired use cases just fine.

You still haven't addressed how to deal with the case of multiple .so's with the same name ending up in the dynamic_library_path. I don't see that as unlikely to end up happening, either.

> Moreover, going that way would reuse existing facilities and concepts,
> remove indirections and reduce overall complexity. This new proposal,
> on the other hand, would go the other way, introducing new concepts,
> adding more indirections, and increasing overall complexity, while
> actually achieving less.

Being able to have a self-contained module which requires a minimum of modification to postgresql.conf is a reduction in complexity, imv. Having to maintain two config options which will end up being overly long and mostly duplicated doesn't make things easier for people.

This has made me wonder if we could allow a control file to be explicitly referred to from CREATE EXTENSION itself, dropping the need for any of this postgresql.conf/GUC maintenance. There are downsides to that approach as well, of course, but it's definitely got a certain appeal.

> I see an analogy here. What we are currently doing is similar to
> hardcoding absolute rpaths into all libraries. Your proposal is
> effectively to (1) add the $ORIGIN mechanism and (2) make people use
> chrpath when they want to install somewhere else.
> My proposal is to get rid of all rpaths and just set a search path.
> Yes, on a technical level, this is less powerful, but it's simpler,
> gets the job done, and is harder to misuse.

I don't buy this analogy. For starters, you can change the control file without needing to rebuild the library, but the main difference is that, in practice, there are no library transitions happening; instead, what we're likely to have are independent and *incompatible* libraries living with the same name in our path.

> A problem with features like these is that they get rarely used but
> offer infinite flexibility, so they are not used consistently and you
> can't rely on anything. This is already the case for the
> module_pathname setting in the control file. It has, AFAICT, no actual
> use, and because of that no one uses it, and because of that, there is
> no guarantee that extensions use it sensibly, and because of that no one
> can ever make sensible use of it in the future, because there is no
> guarantee that extensions have it set sensibly. In fact, I would
> propose deprecating module_pathname.

This makes sense when you have complete control over where things are installed and can drop the control file into the "one true directory of control files", and similarly with the .so. Indeed, that works already today for certain platforms, but from what I understand, on OS X you don't really get to just dump files anywhere on the filesystem that you want, and instead end up forced into a specific directory tree.

Thanks,

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:
> On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas wrote:
>> Hmm. You suggested ensuring that a scan always has at least a pin, and split
>> takes a vacuum-lock. That ought to work. There's no need for the more
>> complicated maneuvers you described; ISTM that you can just replace the
>> heavy-weight share lock with holding a pin on the primary page of the
>> bucket, and an exclusive lock with a vacuum-lock. Note that
>> _hash_expandtable already takes the exclusive lock conditionally, ie. if it
>> doesn't get the lock immediately it just gives up. We could do the same with
>> the cleanup lock.
>
> We could try that. I assume you mean do *just* what you describe here,
> without the split-in-progress or moved-by-split flags I suggested.

Yep.

> The only issue I see with that is that instead of everyone piling up on
> the heavyweight lock, a wait which is interruptible, they'd all pile up on
> the buffer content lwlock, a wait which isn't. And splitting a bucket can
> involve an arbitrary number of I/O operations, so that's kind of
> unappealing. Even checkpoints would be blocked until the bucket split
> completed, which seems unfortunate.

Hmm. I doubt that's a big deal in practice, although I agree it's a bit ugly. Once we solve the crash-safety of splits, we actually have the option of doing the split in many small steps, even when there's no crash involved. You could for example grab the vacuum-lock, move all the tuples in the first 5 pages, and then release the lock to give other backends that are queued up a chance to do their scans/insertions. Then re-acquire the lock, and continue where you left off. Or just bail out and let the next vacuum or insertion finish it later.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/07/2014 09:33 PM, Craig Ringer wrote: > On 03/05/2014 11:02 AM, Craig Ringer wrote: >> The main known issue remaining is plan invalidation. > > I've pushed a version with a plan invalidation implementation. It's tagged: > > rls-9.4-upd-sb-views-v9 > > in > > g...@github.com:ringerc/postgres.git > > The invalidation implementation does not yet handle foreign key checks; > that will require additional changes. I'll push an update to the > rls-9.4-upd-sb-views and post an update later, at which time I'll rebase > the changes back into the history. Well, that was easy. Done. rls-9.4-upd-sb-views-v11 and rebased the rls-9.4-upd-sb-views branch to incorporate the changes. The regression tests have further failures, but some are due to changes in the inheritance semantics. I'm going through them now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote: > On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes wrote: > > > > I've been tempted to implement a new type of hash index that allows both WAL > > and high concurrency, simply by disallowing bucket splits. At the index > > creation time you use a storage parameter to specify the number of buckets, > > and that is that. If you mis-planned, build a new index with more buckets, > > possibly concurrently, and drop the too-small one. > > Yeah, we could certainly do something like that. It sort of sucks, > though. I mean, it's probably pretty easy to know that starting with > the default 2 buckets is not going to be enough; most people will at > least be smart enough to start with, say, 1024. But are you going to > know whether you need 32768 or 1048576 or 33554432? A lot of people > won't, and we have more than enough reasons for performance to degrade > over time as it is. > It would be useful to have a storage parameter for the target size of the index, even if it is not exact, to use in the initial index build to avoid the flurry of i/o caused by bucket splits as the index grows. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
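Jeff's fixed-bucket idea and Ken's target-size storage parameter both reduce to the same arithmetic at index build time: pick a bucket count large enough for the expected tuple load, rounded up to a power of two since each split doubles the bucket count. A rough sketch (the per-bucket capacity figure is an assumed parameter, not a real hash-AM constant):

```c
/*
 * Rough sketch of sizing a hash index up front from a user-supplied
 * target tuple count, instead of growing from the default 2 buckets by
 * repeated splitting.  tuples_per_bucket is an illustrative capacity
 * assumption, not a value from the PostgreSQL hash access method.
 */
static unsigned long
initial_bucket_count(unsigned long target_tuples,
					 unsigned long tuples_per_bucket)
{
	/* buckets needed to hold the target load, rounding up */
	unsigned long needed = (target_tuples + tuples_per_bucket - 1)
		/ tuples_per_bucket;
	unsigned long buckets = 2;	/* hash indexes currently start at 2 */

	/* round up to the next power of two, since splits double the count */
	while (buckets < needed)
		buckets *= 2;
	return buckets;
}
```

Building with a count like this would avoid the flurry of split I/O Ken mentions, at the cost of wasted space if the estimate is far too high.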
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:
>> So, there seems to be a few fairly simple and independent improvements
>> to be made:
>>
>> 1. Replace the heavy-weight lock with pin & vacuum-lock.
>> 2. Make it crash-safe, by adding WAL-logging.
>> 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM
>>    for the squeeze phase.
>> 4. Cache the metapage.
>>
>> We still don't know if it's going to be any better than B-tree after all
>> that's done, but the only way to find out is to go ahead and implement it.
>
> I'm of the opinion that we ought to start by making splits and vacuuming
> more concurrent, and then do that stuff.

Priorities are a matter of opinion, but note that for many applications the concurrency of splits and vacuuming is pretty much irrelevant (at least after we do bullet point 3 above). Like, if the index is mostly read-only, or at least doesn't grow much. You'd still want it to be crash-safe (2), and you might care a lot about the performance of scans (1 and 4), but if splits and vacuum hold an exclusive lock for a few seconds, that's OK if you only need to do it once in a blue moon.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
[ I just sent a reply to Greg Stark's email which touches on some of the same points you mentioned here; that was mostly written last night, and finished this morning before seeing this. ] On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas wrote: > Hmm. You suggested ensuring that a scan always has at least a pin, and split > takes a vacuum-lock. That ought to work. There's no need for the more > complicated maneuvers you described, ISTM that you can just replace the > heavy-weight share lock with holding a pin on the primary page of the > bucket, and an exclusive lock with a vacuum-lock. Note that > _hash_expandtable already takes the exclusive lock conditionally, ie. if it > doesn't get the lock immediately it just gives up. We could do the same with > the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. But I like the idea of teaching each scan to retain a pin on the primary buffer page. If we then do the split the way I proposed before, we can implement the "outwait concurrent scans" step by taking a cleanup lock on the primary buffer page. In this design, we only need to acquire and release the cleanup lock. Once we get the cleanup lock on the primary buffer page, even for an instant, we know that there are no remaining scans in the bucket that need the pre-split tuples that have now been moved to the new bucket. We can then remove tuples with a lesser lock (or keep the stronger lock if we wish to re-pack). > Vacuum could also be enhanced. 
It currently takes an exclusive lock on the > bucket, then removes any dead tuples and finally "squeezes" the bucket by > moving tuples to earlier pages. But you only really need the exclusive lock > for the squeeze-phase. You could do the dead-tuple removal without the > bucket-lock, and only grab for the squeeze-phase. And the squeezing is > optional, so you could just skip that if you can't get the lock. But that's > a separate patch as well. Yeah, I think this would be a useful improvement. > One more thing we could do to make hash indexes more scalable, independent > of the above: Cache the information in the metapage in backend-private > memory. Then you (usually) wouldn't need to access the metapage at all when > scanning. Store a copy of the bitmask for that bucket in the primary page, > and when scanning, check that it matches the cached value. If not, refresh > the cached metapage and try again. I don't know whether this would be a useful improvement. > So, there seems to be a few fairly simple and independent improvements to be > made: > > 1. Replace the heavy-weight lock with pin & vacuum-lock. > 2. Make it crash-safe, by adding WAL-logging > 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for > the squeeze phase. > 4. Cache the metapage. > > We still don't know if it's going to be any better than B-tree after all > that's done, but the only way to find out is to go ahead and implement it. I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
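Heikki's metapage-caching idea above amounts to an optimistic read: trust a backend-local copy of the metapage, validate it against a small stamp stored on the bucket's primary page, and refresh only on mismatch. A deliberately toy model with invented names (it elides the re-probe to a different bucket that a real implementation would need after refreshing):

```c
#include <stdint.h>

/*
 * Toy model of caching hash-index metapage state in backend-private
 * memory (all names invented; not PostgreSQL source).  The bucket's
 * primary page is assumed to carry a copy of the hash mask that was
 * current for that bucket; if it differs from the cached mask, the
 * cache is stale and must be refreshed from the (simulated) metapage.
 */
struct meta_cache
{
	uint32_t	lowmask;		/* cached hash mask from the metapage */
	int			valid;			/* has the cache been populated? */
};

static uint32_t
bucket_for_hash(struct meta_cache *cache, uint32_t hashval,
				uint32_t mask_on_primary_page,	/* stamp read from page */
				uint32_t current_meta_mask)		/* truth in the metapage */
{
	if (!cache->valid || cache->lowmask != mask_on_primary_page)
	{
		/* stale or empty cache: re-read the metapage */
		cache->lowmask = current_meta_mask;
		cache->valid = 1;
	}
	return hashval & cache->lowmask;
}
```

The payoff is that the common-case scan touches only the bucket's primary page, never the contended metapage.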
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 9:41 AM, Bruce Momjian wrote:
> On Fri, Mar 7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:
>> One reason could be that as we are already returning a special exit code
>> for the 'status' option of pg_ctl (exit(3)), this seems to be in line with it,
>> as this also gets called during the status command.
>>
>> Also, in the link sent by you in another mail, I could see a statement
>> which indicates it is more important to return correct codes in the case of
>> status, which sounds like a good reason to me.
>>
>> Link and statement:
>> http://www.postgresql.org/message-id/51d1e482.5090...@gmx.net
>>
>> "The "status" case is different, because there the exit code
>> can be passed out by the init script directly."
>>
>> If we just want to handle correct exit codes for the "status" option, then
>> we may need to refactor the code a bit to ensure the same.
>
> OK, done with the attached patch. Three is returned for status, but one
> for other cases.

I think it would have been better if it returned status 4 for the cases where a directory/file is not accessible (the new cases added by this patch).

I think status 3 should be returned only when the data directory is valid and the pid file is missing, because in a script, after getting this code, the next thing it will probably do is start the server, which doesn't seem a good fit for the newly added cases.

Other than that, the patch seems fine to me.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/05/2014 11:02 AM, Craig Ringer wrote: > The main known issue remaining is plan invalidation. I've pushed a version with a plan invalidation implementation. It's tagged: rls-9.4-upd-sb-views-v9 in g...@github.com:ringerc/postgres.git The invalidation implementation does not yet handle foreign key checks; that will require additional changes. I'll push an update to the rls-9.4-upd-sb-views and post an update later, at which time I'll rebase the changes back into the history. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-07 10:17:21 -0300, Alvaro Herrera wrote: > Andres Freund escribió: > > > fprintf(stderr, > > - _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d fields\n"), > > - progname, PQntuples(res), > > PQnfields(res), 1, 3); > > + _("%s: could not identify system: got > > %d rows and %d fields, expected 1 row and 3 or more fields\n"), > > + progname, PQntuples(res), > > PQnfields(res)); > > Please don't change this. The reason these messages use %d and an extra > printf argument is to avoid giving translators extra work when the > number of rows or fields is changed. In these cases I suggest this: > > > - _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d fields\n"), > > - progname, PQntuples(res), > > PQnfields(res), 1, 3); > > + _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d or more fields\n"), > > + progname, PQntuples(res), > > PQnfields(res), 1, 3); > > (Yes, I know the "expected 1 rows" output looks a bit silly. Since this > is an unexpected error message anyway, I don't think that's worth > fixing.) I changed it to not use placeholders because I thought "or more" was specific enough to be unlikely to be used in other places, but I don't have a problem with continuing to use them. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 7:07 PM, Greg Stark wrote: > On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas wrote: >>> I've been tempted to implement a new type of hash index that allows both WAL >>> and high concurrency, simply by disallowing bucket splits. At the index >>> creation time you use a storage parameter to specify the number of buckets, >>> and that is that. If you mis-planned, build a new index with more buckets, >>> possibly concurrently, and drop the too-small one. >> >> Yeah, we could certainly do something like that. It sort of sucks, >> though. I mean, it's probably pretty easy to know that starting with >> the default 2 buckets is not going to be enough; most people will at >> least be smart enough to start with, say, 1024. But are you going to >> know whether you need 32768 or 1048576 or 33554432? A lot of people >> won't, and we have more than enough reasons for performance to degrade >> over time as it is. > > The other thought I had was that you can do things lazily in vacuum. > So when you probe you need to check multiple pages until vacuum comes > along and rehashes everything. I was thinking about this, too. Cleaning up the old bucket after the split is pretty similar to vacuuming. And lo and behold, vacuum also needs to lock the entire bucket. AFAICT, there are two reasons for this. First, when we resume a suspended scan, we assume that the next match on the page, if any, will occur on the same page at an offset greater than the offset where we found the previous match. The code copes with the possibility of current insertions by refinding the last item we returned and scanning forward from there, but assumes the pointer we're refinding can't have moved to a lower offset. I think this could be easily fixed - at essentially no cost - by changing hashgettuple so that, if the forward scan errors out, it tries scanning backward rather than just giving up. 
Second, vacuum compacts each bucket that it modifies using _hash_squeezebucket, which scans from the two ends of the index in toward the middle, filling any free space on earlier pages by pulling tuples from the end of the bucket chain. This is a little thornier. We could change vacuum so that it only removes TIDs from the individual pages, without actually trying to free up pages, but that seems undesirable. However, I think we might be able to get by with making bucket compaction less aggressive, without eliminating it altogether. Suppose that whenever we remove items from a page, we consider consolidating the page with its immediate predecessor and successor in the bucket chain. This means our utilization will be a little over 50% in the worst case where we have a full page, a page with one item, another full page, another page with one item, and so on. But that's not all bad, because compacting a bucket chain to the minimum possible size when it may well be about to suffer more inserts isn't necessarily a good thing anyway. Also, doing this means that we don't need to lock out scans from the entire bucket chain in order to compact. It's sufficient to make sure that nobody's in the middle of scanning the two pages we want to merge. That turns out not to be possible right now. When a scan is suspended, we still hold a pin on the page we're scanning. But when _hash_readnext or _hash_readprev walks the bucket chain, it drops both lock and pin on the current buffer and then pins and locks the new buffer. That, however, seems like it could easily be changed: drop lock on current buffer, acquire pin on new buffer, drop pin on current buffer, lock new buffer. Then, if we want to merge two buffers, it's enough to lock both of them for cleanup. 
To avoid any risk of deadlock, and also to avoid waiting for a long-running suspended scan to wake up and complete, we can do this conditionally; if we fail to get either cleanup lock, then just don't merge the pages; take an exclusive lock only and remove whatever you need to remove, leaving the rest. Merging pages is only a performance optimization, so if it fails now and then, no real harm done. (A side benefit of this approach is that we could opportunistically attempt to compact pages containing dead items any time we can manage a ConditionalLockBufferForCleanup() on the page, sort of like what HOT pruning does for heap blocks. We could even, after pruning away dead items, attempt to merge the page with siblings, so that even without vacuum the bucket chains can gradually shrink if the index tuples are discovered to be pointing to dead heap tuples.) With the above changes, vacuuming a bucket can happen without taking the heavyweight lock in exclusive mode, leaving bucket splitting as the only operation that requires it. And we could perhaps use basically the same algorithm to clean the tuples out of the old bucket after the split. The problem is that, when we're vacuuming, we know that no scan currently in progress can still care about any of the tuples we're removing.
Re: [HACKERS] Changeset Extraction v7.9.1
Andres Freund escribió: > fprintf(stderr, > - _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d fields\n"), > - progname, PQntuples(res), > PQnfields(res), 1, 3); > + _("%s: could not identify system: got > %d rows and %d fields, expected 1 row and 3 or more fields\n"), > + progname, PQntuples(res), > PQnfields(res)); Please don't change this. The reason these messages use %d and an extra printf argument is to avoid giving translators extra work when the number of rows or fields is changed. In these cases I suggest this: > - _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d fields\n"), > - progname, PQntuples(res), > PQnfields(res), 1, 3); > + _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d or more fields\n"), > + progname, PQntuples(res), > PQnfields(res), 1, 3); (Yes, I know the "expected 1 rows" output looks a bit silly. Since this is an unexpected error message anyway, I don't think that's worth fixing.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
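The convention Álvaro describes keeps the translatable string stable while the expected counts remain ordinary printf arguments. A standalone illustration, with `snprintf` into a buffer standing in for the `fprintf(stderr, _(...))` call in pg_basebackup (the helper name is invented):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration of the recommended message style: the expected row and
 * field counts stay as %d arguments, so the translatable msgid does not
 * change when those numbers change in a later release.  The "or more"
 * wording is baked into the string; only the counts are parameters.
 */
static int
format_identify_error(char *buf, size_t buflen, int ntuples, int nfields)
{
	return snprintf(buf, buflen,
					"could not identify system: got %d rows and %d fields, "
					"expected %d rows and %d or more fields",
					ntuples, nfields, 1, 3);
}
```

If the expected counts later change to, say, 2 and 5, only the trailing arguments change; translators never see a new string.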
Re: [HACKERS] Changeset Extraction v7.9.1
Hi, On 2014-03-05 23:20:57 +0100, Andres Freund wrote: > On 2014-03-05 17:05:24 -0500, Robert Haas wrote: > > > I very much dislike having the three different event loops, but it's > > > pretty much forced by the design of the xlogreader. "My" xlogreader > > > version didn't block when it neeeded to wait for WAL but just returned > > > "need input/output", but with the eventually committed version you're > > > pretty much forced to block inside the read_page callback. > > > > > > I don't really have a idea how we could sensibly unify them atm. > > > > WalSndLoop(void (*gutsfn)())? > > The problem is that they are actually different. In the WalSndLoop we're > also maintaining the walsender's state, in WalSndWriteData() we're just > waiting for writes to be flushed, in WalSndWaitForWal we're primarily > waiting for the flush pointer to pass some LSN. And the timing of the > individual checks isn't trivial (just added some more comments about > it). > > I'll simplify it by pulling out more common code, maybe it'll become > apparent how it should look. I've attached a new version of the walsender patch. It's been rebased ontop of Heikki's latest commit to walsender.c. I've changed a fair bit of stuff: * The sleeptime is now computed to sleep until we either need to send a keepalive or kill ourselves, as Heikki sugggested. * Sleep time computation, sending pings, checking timeouts is now done in separate functions. * Comment and codestyle improvements. Although they are shorter and simpler now, I have not managed to unify the three loops however. They seem to be too different to unify them inside one. I tried a common function with an 'wait_for' bitmask argument, but that turned out to be fairly illegible. The checks in WalSndWaitForWal() and WalSndLoop() just seem to be too different. I'd be grateful if you (or somebody else!) could have a quick look at body of the loops in WalSndWriteData(), WalSndWaitForWal() and WalSndLoop(). 
Maybe I am just staring at it the wrong way. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From ca84cd3d2966f0e7297d1c7270b094122eec2f18 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 5 Mar 2014 00:15:38 +0100 Subject: [PATCH] Add walsender interface for the logical decoding functionality. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This exposes the changeset extraction feature via walsenders which allows the data to be received in a streaming fashion and supports synchronous replication. To do this walsenders need to be able to connect to a specific database. For that it extends the existing 'replication' parameter to not only allow a boolean value but also "database". If the latter is specified we connect to the database specified in 'dbname'. Andres Freund, with contributions from Álvaro Herrera, reviewed by Robert Haas. --- doc/src/sgml/protocol.sgml | 24 +- src/backend/postmaster/postmaster.c| 28 +- .../libpqwalreceiver/libpqwalreceiver.c| 4 +- src/backend/replication/repl_gram.y| 81 +- src/backend/replication/repl_scanner.l | 1 + src/backend/replication/walsender.c| 914 ++--- src/backend/utils/init/postinit.c | 15 +- src/bin/pg_basebackup/pg_basebackup.c | 6 +- src/bin/pg_basebackup/pg_receivexlog.c | 6 +- src/bin/pg_basebackup/receivelog.c | 6 +- src/include/replication/walsender.h| 1 + src/tools/pgindent/typedefs.list | 1 + 12 files changed, 918 insertions(+), 169 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index d36f2f3..510bf9a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1302,10 +1302,13 @@ To initiate streaming replication, the frontend sends the -replication parameter in the startup message. This tells the -backend to go into walsender mode, wherein a small set of replication commands -can be issued instead of SQL statements. 
Only the simple query protocol can be -used in walsender mode. +replication parameter in the startup message. A boolean value +of true tells the backend to go into walsender mode, wherein a +small set of replication commands can be issued instead of SQL statements. Only +the simple query protocol can be used in walsender mode. +Passing a database as the value instructs walsender to connect to +the database specified in the dbname parameter which will in future +allow some additional commands to the ones specified below to be run. The commands accepted in walsender mode are: @@ -1315,7 +1318,7 @@ The commands accepted in walsender mode are: Requests the server to identify itself. Server replies with a result - set of a single row, containing th
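Andres's note above about the reworked sleeptime computation (sleep only until a keepalive or a timeout-kill is due) can be modeled with a toy function like this. The keepalive-at-half-the-timeout scheme and all names here are illustrative, not the actual walsender code, which works with TimestampTz values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the sleep-time computation described above: wake up at
 * the earlier pending deadline -- send a keepalive halfway through
 * wal_sender_timeout, kill the connection at the full timeout.
 * Times are plain milliseconds here, not TimestampTz.
 */
static int64_t
compute_sleeptime(int64_t now, int64_t last_reply, int64_t wal_sender_timeout)
{
    int64_t keepalive_at = last_reply + wal_sender_timeout / 2;
    int64_t kill_at = last_reply + wal_sender_timeout;

    /* if the keepalive deadline already passed, the next event is the kill */
    int64_t wakeup = (now < keepalive_at) ? keepalive_at : kill_at;
    int64_t sleeptime = wakeup - now;

    return sleeptime > 0 ? sleeptime : 0;   /* never sleep a negative time */
}
```

With a 60000 ms timeout and a reply just received, this sleeps 30000 ms until the keepalive is due, then at most another 30000 ms before the timeout check fires.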
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello, > > After all, I have confirmed that this fixes the problem on crash > > recovery of hot-standby both for 9.3 and HEAD and no problem was > > found except unreadability :( > > Ok, committed. Thanks! Thank you. > Any concrete suggestions about the readability? Is there some > particular spot that needs clarifying? Well, although I no longer see a problem there after understanding how it works :-), I'll write about two points that I had difficulty understanding. | * (or the checkpoint record itself, if it's a shutdown checkpoint). | */ | if (checkPoint.redo < RecPtr) First, it was a bit tough to confirm the equivalence between (redo == RecPtr) and the checkpoint being a shutdown checkpoint. Although I was finally convinced that it surely holds, that is actually not the point. The point here is in the first half of the phrase. The comment might be less perplexing if it were as follows, even if only a shutdown checkpoint satisfies the condition. But it would raise another question in readers' minds. | * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint). Second, the added code depends on the assumption that RecPtr points to the checkpoint record and EndRecPtr points to the next record after it. It would be better for understandability and stability (against future code changes) to explicitly declare the precondition, like this: | Here RecPtr points to the checkpoint record and EndRecPtr points to the | place for the record just after it. > > Surely this is the consequence of an illegal operation, but I think > > it is also not an issue for an assertion - which fires on something > > wrong in the design or in quite rare cases (this case?). > Ah, I see. Yes, that's definitely a bug. If you don't hit the > assertion, because the oldestActiveXID is set in the checkpoint record > even though wal_level is 'archive', or if you simply have assertions > disabled, the system will start up in hot standby mode even though > it's not safe. 
> > > So it might be > > better to show a message as below in that case. > > > > | FATAL: Checkpoint doesn't have valid oldest active transaction id > > | HINT: Reading WAL might have been written under insufficient > > | wal_level. Agreed. > Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though > wal_level is 'archive'. So the above patch doesn't fix the whole > problem. > > The real bug here is that CheckRequiredParameterValues() tests for > InArchiveRecovery, when it should be testing for > ArchiveRecoveryRequested. Otherwise, the checks are not performed when > going through crash recovery followed by archive recovery. I > should've changed that as part of the commit that added the crash > recovery then archive recovery behavior. > > Fixed, thanks for pointing it out! It's my pleasure. Regards, -- Kyotaro Horiguchi NTT Open Source Software Center
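The distinction Heikki points out (InArchiveRecovery vs. ArchiveRecoveryRequested) can be reduced to a toy predicate. The names, the integer encoding of wal_level, and the check itself are illustrative only, not the real CheckRequiredParameterValues():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the fixed check: the wal_level sanity tests must run
 * whenever archive recovery was *requested*, not only once we are
 * already *in* archive recovery -- otherwise crash recovery followed by
 * archive recovery skips them and hot standby starts up unsafely.
 * wal_level: 0 = minimal, 1 = archive, 2 = hot_standby (9.3-era levels).
 */
static bool
hot_standby_startup_allowed(bool archive_recovery_requested,
                            int primary_wal_level)
{
    if (!archive_recovery_requested)
        return false;               /* plain crash recovery: no hot standby */
    return primary_wal_level >= 2;  /* primary must have logged enough info */
}
```

Testing against the *requested* flag is what makes a WAL stream written under wal_level = 'archive' get rejected even when the server first went through plain crash recovery.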
Re: [HACKERS] extension_control_path
On 2/27/14, 6:04 AM, Dimitri Fontaine wrote: > What about allowing a control file like this: > ># hstore extension >comment = 'data type for storing sets of (key, value) pairs' >default_version = '1.3' >directory = 'local/hstore-new' >module_pathname = '$directory/hstore' >relocatable = true I think your previously proposed patch to add extension_control_path, plus my suggestion to update existing de facto best practices to not include $libdir in the module path name (thus allowing the use of dynamic_library_path), will address all desired use cases just fine. Moreover, going that way would reuse existing facilities and concepts, remove indirections, and reduce overall complexity. This new proposal, on the other hand, would go the other way, introducing new concepts, adding more indirections, and increasing overall complexity, while actually achieving less. I see an analogy here. What we are currently doing is similar to hardcoding absolute rpaths into all libraries. Your proposal is effectively to (1) add the $ORIGIN mechanism and (2) make people use chrpath when they want to install somewhere else. My proposal is to get rid of all rpaths and just set a search path. Yes, on a technical level, this is less powerful, but it's simpler, gets the job done, and is harder to misuse. A problem with features like these is that they are rarely used but offer infinite flexibility, so they are not used consistently and you can't rely on anything. This is already the case for the module_pathname setting in the control file. It has, AFAICT, no actual use, and because of that no one uses it, and because of that there is no guarantee that extensions use it sensibly, and because of that no one can ever make sensible use of it in the future, because there is no guarantee that extensions have it set sensibly. In fact, I would propose deprecating module_pathname. 
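To make the search-path approach concrete, here is a hedged sketch of what the two pieces could look like together. All paths are purely illustrative, and extension_control_path is the GUC from the proposal under discussion, not a released PostgreSQL setting:

```ini
# postgresql.conf -- illustrative paths only
dynamic_library_path = '/opt/pg-extensions/lib:$libdir'
extension_control_path = '/opt/pg-extensions/share'   # proposed GUC, not in core

# hstore.control -- no $libdir, no directory/module_pathname indirection;
# the shared library is located via dynamic_library_path instead
comment = 'data type for storing sets of (key, value) pairs'
default_version = '1.3'
relocatable = true
```

The point of the sketch is that relocation then needs no per-extension rewriting: moving an extension only requires adjusting the two search paths.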
Re: [HACKERS] missing RelationCloseSmgr in FreeFakeRelcacheEntry?
On 03/06/2014 02:18 AM, Bruce Momjian wrote: On Tue, Nov 5, 2013 at 08:36:32PM +0100, Andres Freund wrote: On 2013-11-04 13:48:32 +0100, Andres Freund wrote: What about just unowning the smgr entry with if (rel->rd_smgr != NULL) smgrsetowner(NULL, rel->rd_smgr) when closing the fake relcache entry? That shouldn't require any further changes than changing the comment in smgrsetowner() that this isn't something expected to frequently happen. Attached is a patch doing it like that; it required slightly more invasive changes than that. With the patch applied we survive replay of a primary's make check run under valgrind without warnings. Where are we on this patch? Committed now, with small changes. I made the new smgrclearowner function check that the SmgrRelation object is indeed owned by the owner we expect, so that it doesn't unown it if it's actually owned by someone else. That shouldn't happen, but it seemed prudent to check. Thanks Andres. I tried to reproduce the valgrind message you reported, but couldn't. How did you do it? Did this commit fix it? And thanks for the nudge, Bruce. - Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/06/2014 09:34 PM, Robert Haas wrote: On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. Hmm. You suggested ensuring that a scan always has at least a pin, and that a split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described; ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. Vacuum could also be enhanced. It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally "squeezes" the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze phase. You could do the dead-tuple removal without the bucket-lock, and only grab it for the squeeze phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well. One more thing we could do to make hash indexes more scalable, independent of the above: Cache the information in the metapage in backend-private memory. 
Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again. So, there seem to be a few fairly simple and independent improvements to be made: 1. Replace the heavy-weight lock with pin & vacuum-lock. 2. Make it crash-safe, by adding WAL-logging. 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase. 4. Cache the metapage. We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. This seems like a great GSoC project to me. We have a pretty good idea of what we want to accomplish. It's uncontroversial: I don't think anyone is going to object to improving hash indexes (one could argue that it's a waste of time, but that's different from objecting to the idea). And it consists of a few mostly independent parts, so it can be done incrementally, which makes it easier to track progress, and we'll probably have something useful at the end of the summer even if it doesn't all get finished. - Heikki
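The metapage-caching idea (point 4) might look roughly like this toy model. The mask-copy-on-primary-page validation scheme and every name below are hypothetical simplifications, not the hash AM's actual page layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Backend-private cache of the one metapage field this toy model needs. */
typedef struct CachedMeta
{
    bool        valid;
    uint32_t    highmask;       /* bucket mask copied from the metapage */
} CachedMeta;

static CachedMeta cache = {false, 0};

/* Stand-in for actually re-reading the metapage buffer. */
static void
refresh_cache(uint32_t current_highmask)
{
    cache.highmask = current_highmask;
    cache.valid = true;
}

/*
 * Pick the bucket for a scan using the cached mask, then verify against
 * the mask copy assumed to be stored in the bucket's primary page; if
 * the cache turned out to be stale (e.g. after a bucket split doubled
 * the mask), refresh and retry.  Here the page's mask is passed in
 * directly instead of being read from a buffer.
 */
static uint32_t
bucket_for_scan(uint32_t hashkey, uint32_t page_highmask)
{
    uint32_t    bucket;

    if (!cache.valid)
        refresh_cache(page_highmask);

    bucket = hashkey & cache.highmask;
    if (cache.highmask != page_highmask)
    {
        /* stale cache: refresh from the (simulated) metapage and retry */
        refresh_cache(page_highmask);
        bucket = hashkey & cache.highmask;
    }
    return bucket;
}
```

On the common path (cache valid and masks matching) no metapage access happens at all, which is the whole point of the optimization.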
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 6 March 2014 22:43, Noah Misch wrote: > On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: >> On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> > Robert Haas writes: > I think this is all too >> > late for 9.4, though. >> > >> > I agree with the feeling that a meaningful fix for pg_dump isn't going >> > to get done for 9.4. So that leaves us with the alternatives of (1) >> > put off the lock-strength-reduction patch for another year; (2) push >> > it anyway and accept a reduction in pg_dump reliability. >> > >> > I don't care for (2). I'd like to have lock strength reduction as >> > much as anybody, but it can't come at the price of reduction of >> > reliability. >> >> I am sorry, but I think this is vastly overstating the scope of the >> pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the >> amount of problems that has caused in the past is surprisingly low. If >> such a frequently used command doesn't cause problems, why are you >> assuming other commands to be that problematic? And I think it's hard to >> argue that the proposed changes are more likely to cause problems. >> >> Let's try to go at this a bit more methodically. The commands that - >> afaics - change their locklevel due to latest patch (v21) are: > [snip] > > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. As a cross-check to your study, I looked briefly at > the use of those functions in pg_dump and how this patch might affect them: > > -- pg_get_constraintdef() > > pg_dump reads the constraint OID with its transaction snapshot, so we will > never see a too-new constraint. Dropping a constraint still requires > AccessExclusiveLock. Agreed > Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its > transaction snapshot and uses that to decide whether to dump the CHECK > constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD > CONSTRAINT following the data load. 
However, pg_get_constraintdef() reads the > latest convalidated to decide whether to emit NOT VALID. Consequently, one > can get a dump in which the dumped table data did not yet conform to the > constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. > (Suppose you deleted the last invalid rows just before executing the VALIDATE > CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with > pg_dump stopped at getTableAttrs().) > > One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT > problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did > not do so. It's, conveniently, the last part of the definition. I would tend > to choose this. We could also just decide this isn't bad enough to worry > about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming > no --single-transaction for the original restoral, you just add NOT VALID to > the command and rerun. Like most of the potential new pg_dump problems, this > can already happen today if the relevant database changes happen between > taking the pg_dump transaction snapshot and locking the tables. Too hacky for me, but some good thinking. My proposed solution is below. > -- pg_get_expr() for default expressions > > pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will > never see a too-new default. This does allow us to read a dropped default. > That's not a problem directly. However, suppose the default references a > function dropped at the same time as the default. pg_dump could fail in > pg_get_expr(). > > -- pg_get_indexdef() > > As you explained elsewhere, new indexes are no problem. DROP INDEX still > requires AccessExclusiveLock. Overall, no problems here. > > -- pg_get_ruledef() > > The patch changes lock requirements for enabling and disabling of rules, but > that is all separate from the rule expression handling. No problems. 
> > -- pg_get_triggerdef() > > The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. > The implications for pg_dump are similar to those for pg_get_expr(). These are certainly concerning. What surprises me the most is that pg_dump has been so happy to randomly mix SQL using the transaction snapshot with sys cache access code using a different snapshot. If that was the intention, there is no documentation in the code or in the docs to explain it. > -- pg_get_viewdef() > > Untamed: pg_dump does not lock views at all. OMG, it's really a wonder pg_dump works at all. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Many > consequences of insufficient locking are too severe for that workaround to > bring comfort,
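Noah's "tack on NOT VALID" workaround for the VALIDATE CONSTRAINT race could be sketched as below on the pg_dump side. Plain C string handling stands in for libpq's PQExpBuffer, and both the function name and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch: emit a CHECK constraint definition for dumping.  If pg_dump's
 * transaction snapshot says the constraint was not yet validated, but
 * the server's pg_get_constraintdef() (which sees the latest state)
 * omitted NOT VALID, append it ourselves -- when present, it is always
 * the last part of the definition.
 */
static void
emit_constraint(char *out, size_t outlen,
                const char *condef, bool snapshot_validated)
{
    size_t      len = strlen(condef);
    bool        has_not_valid =
        len >= 9 && strcmp(condef + len - 9, "NOT VALID") == 0;

    if (!snapshot_validated && !has_not_valid)
        snprintf(out, outlen, "%s NOT VALID", condef);
    else
        snprintf(out, outlen, "%s", condef);
}
```

This makes the dumped ALTER TABLE ADD CONSTRAINT restorable even when the table data, as of the dump snapshot, did not yet satisfy the constraint.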
Re: [HACKERS] syslog_ident mentioned as syslog_identify in the docs
On 03/07/2014 08:24 AM, Michael Paquier wrote: In the documentation, particularly the doc index, syslog_ident is incorrectly mentioned as syslog_identify. The attached patch fixes that. This error has been in the docs since 8.0. Thanks, fixed. - Heikki
Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode
On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane wrote: > Andrew Dunstan writes: >> On 03/05/2014 09:11 AM, Michael Paquier wrote: >>> After testing this feature, I noticed that FORCE_NULL and >>> FORCE_NOT_NULL can both be specified with COPY on the same column. > >> Strictly they are not actually contradictory, since FORCE NULL relates >> to quoted null strings and FORCE NOT NULL relates to unquoted null >> strings. Arguably the docs are slightly loose on this point. Still, >> applying both FORCE NULL and FORCE NOT NULL to the same column would be >> rather perverse, since it would result in a quoted null string becoming >> null and an unquoted null string becoming not null. > > Given the remarkable lack of standardization of "CSV" output, who's > to say that there might not be data sources out there for which this > is the desired behavior? It's weird, I agree, but I think throwing > an error for the combination is not going to be helpful. It's not > like somebody might accidentally write both on the same column. > > +1 for clarifying the docs, though, more or less in the words you > used above. Following that, I have hacked up the attached patch to update the docs, with an additional regression test (it actually replaces a test in copy2 that duplicated an earlier one). I am also attaching a second patch for file_fdw, to allow the use of force_null and force_not_null on the same column, to be consistent with COPY. 
Regards, -- Michael diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 7fb1dbc..97a35d0 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -267,11 +267,6 @@ file_fdw_validator(PG_FUNCTION_ARGS) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or redundant options"), errhint("option \"force_not_null\" supplied more than once for a column"))); - if(force_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("option \"force_not_null\" cannot be used together with \"force_null\""))); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); @@ -284,11 +279,6 @@ file_fdw_validator(PG_FUNCTION_ARGS) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or redundant options"), errhint("option \"force_null\" supplied more than once for a column"))); - if(force_not_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("option \"force_null\" cannot be used together with \"force_not_null\""))); force_null = def; (void) defGetBoolean(def); } diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source index 0c278aa..b608372 100644 --- a/contrib/file_fdw/input/file_fdw.source +++ b/contrib/file_fdw/input/file_fdw.source @@ -91,24 +91,22 @@ ALTER FOREIGN TABLE text_csv OPTIONS (SET format 'csv'); \pset null _null_ SELECT * FROM text_csv; +-- force_not_null and force_null can be used together on the same column +ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); +ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true'); + -- force_not_null is not allowed to be specified at any foreign object level: ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_not_null '*'); -- ERROR ALTER SERVER file_server OPTIONS (ADD force_not_null '*'); -- ERROR CREATE USER MAPPING FOR public 
SERVER file_server OPTIONS (force_not_null '*'); -- ERROR CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR --- force_not_null cannot be specified together with force_null -ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); --ERROR - -- force_null is not allowed to be specified at any foreign object level: ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR ALTER SERVER file_server OPTIONS (ADD force_null '*'); -- ERROR CREATE USER MAPPING FOR public SERVER file_server OPTIONS (force_null '*'); -- ERROR CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR --- force_null cannot be specified together with force_not_null -ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true'); --ERROR - -- basic query tests SELECT * FROM agg_text WHERE b > 10.0 ORDER BY a; SELECT * FROM agg_csv ORDER BY a; diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source index 2bec160..bc183b8 100644 --- a/contrib/file_fdw/output/file_fdw.source +++ b/contrib/file_fdw/output/file_fdw.source @@ -115,6 +115,9 @@ SELECT * FROM text_csv; ABC | abc|| (5 rows) +-- force_not_null and force_null can be used together on the same column +ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); +ALTER FOREIGN TABLE text_csv ALTE