Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > I am reworking this patch, both to accommodate earlier review comments > and to fix the multiple verify step of namespaces, as noted by Tom in > 4530.1390023...@sss.pgh.pa.us Alvaro, do you need some help? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Shave a few instructions from child-process startup sequence
On Sun, Dec 1, 2013 at 12:07:21PM -0500, Gurjeet Singh wrote: > On Wed, Nov 27, 2013 at 9:12 AM, Robert Haas wrote: > > > This is a performance patch, so it should come with benchmark results > demonstrating that it accomplishes its intended purpose. I don't see > any. > > > Yes, this is a performance patch, but as the subject says, it saves a few > instructions. I don't know how to write a test case that can measure savings > of > skipping a few instructions in a startup sequence that potentially takes > thousands, or even millions, of instructions. Are we saying we don't want this patch? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade: delete_old_cluster.sh issues
On Mon, Nov 18, 2013 at 10:13:19PM -0500, Bruce Momjian wrote: > On Tue, Nov 12, 2013 at 10:35:58AM +, Marc Mamin wrote: > > Hello, > > > > IMHO, there is a serious issue in the script to clean the old data directory > > when running pg_upgrade in link mode. > > > > in short: When working with symbolic links, the first step in > > delete_old_cluster.sh > > is to delete the old $PGDATA folder that may contain tablespaces used by the > > new instance. > > > > in long, our use case: > > Rather than removing files/directories individually, which would be > difficult to maintain, we decided in pg_upgrade 9.3 to detect > tablespaces in the old data directory and report that and not create a > delete script. Here is the commit: > > > http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=4765dd79219b9697d84f5c2c70f3fe00455609a1 > > The problem with your setup is that while you didn't pass symbolic links > to pg_upgrade, you did use symbolic links when defining the tablespaces, > so pg_upgrade couldn't recognize that the symbolic links were inside the > old data directory. > > We could use readlink() to walk over all symbolic links, but that > seems quite complex. We could use stat() and make sure there are no > matching inodes in the old data directory, or that they are in a > different file system. We could look for a directory named after the PG > catversion in the old data directory. We could update the docs. > > I am not sure what to do. We never expected people would put > tablespaces in the data directory. I went with a doc patch, attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
index 4d03b12..72e3cb6
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 460,466 ****
      cluster's data directories by running the script mentioned when
      pg_upgrade completes.  You can also delete the
      old installation directories
!     (e.g. bin, share).
--- 460,467 ----
      cluster's data directories by running the script mentioned when
      pg_upgrade completes.  You can also delete the
      old installation directories
!     (e.g. bin, share).  This will not work
!     if you have tablespaces inside the old data directory.
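One of the options Bruce lists above — resolving symlinks and checking whether a tablespace actually lands inside the old data directory — can be sketched in a few lines. This is an illustrative Python sketch of the idea, not pg_upgrade code; the function name is hypothetical:

```python
import os

def tablespace_inside_data_dir(tablespace_path, old_datadir):
    """Return True if tablespace_path resolves to a location inside
    old_datadir, following any symbolic links on the way -- the case
    a plain string comparison of the configured paths misses."""
    real_ts = os.path.realpath(tablespace_path)
    real_dd = os.path.realpath(old_datadir)
    # commonpath on the resolved paths detects a symlink that points
    # back into the data directory.
    return os.path.commonpath([real_ts, real_dd]) == real_dd
```

With a check like this, the delete script could be suppressed even when the tablespace was defined through a symlink that points into $PGDATA.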
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 7:59 PM, Bruce Momjian wrote: > On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote: >> > OK, done with the attached patch. Three is returned for status, but one >> > for other cases. >> >> I think it would have been better if it returned status 4 for the cases where >> the directory/file is not accessible (the new cases added by this patch). >> >> I think status 3 should be returned only when the data directory is valid >> and the pid file is missing, because after getting this code in a script, the >> next thing it will probably do is start the server, which doesn't seem a >> good fit for the newly added cases. > > OK, I have updated the attached patch to reflect this, and added a C > comment. This is fine. Do you think we should mention this in the docs? I have added one line in the docs (updated patch attached); if you think it makes sense then add it, otherwise ignore it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 45b53ce..99cb176 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -206,9 +206,10 @@ PostgreSQL documentation
    status mode checks whether a server is running in
    the specified data directory. If it is, the PID
-   and the command line options that were used to invoke it are
-   displayed. If the server is not running, the process returns an
-   exit status of 3.
+   and the command line options that were used to invoke it are
+   displayed. If the server is not running, the process returns an
+   exit status of 3. If the specified data directory is not accessible,
+   the process returns an exit status of 4.
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 0dbaa6e..56d238f 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -97,6 +97,7 @@ static bool allow_core_files = false;
 static time_t start_time;
 
 static char postopts_file[MAXPGPATH];
+static char version_file[MAXPGPATH];
 static char pid_file[MAXPGPATH];
 static char backup_file[MAXPGPATH];
 static char recovery_file[MAXPGPATH];
@@ -152,7 +153,7 @@ static void pgwin32_doRunAsService(void);
 static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
 #endif
 
-static pgpid_t get_pgpid(void);
+static pgpid_t get_pgpid(bool is_status_request);
 static char **readfile(const char *path);
 static void free_readfile(char **optlines);
 static int	start_postmaster(void);
@@ -246,10 +247,34 @@ print_msg(const char *msg)
 }
 
 static pgpid_t
-get_pgpid(void)
+get_pgpid(bool is_status_request)
 {
 	FILE	   *pidf;
 	long		pid;
+	struct stat statbuf;
+
+	if (stat(pg_data, &statbuf) != 0)
+	{
+		if (errno == ENOENT)
+			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+				   pg_data);
+		else
+			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+				   pg_data);
+
+		/*
+		 * The Linux Standard Base Core Specification 3.1 says this should
+		 * return '4, program or service status is unknown'
+		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+		 */
+		exit(is_status_request ? 4 : 1);
+	}
+
+	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+	{
+		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+			   progname, pg_data);
+		exit(is_status_request ? 4 : 1);
+	}
 
 	pidf = fopen(pid_file, "r");
 	if (pidf == NULL)
@@ -810,7 +835,7 @@ do_start(void)
 
 	if (ctl_command != RESTART_COMMAND)
 	{
-		old_pid = get_pgpid();
+		old_pid = get_pgpid(false);
 		if (old_pid != 0)
 			write_stderr(_("%s: another server might be running; "
 						   "trying to start server anyway\n"),
@@ -894,7 +919,7 @@ do_stop(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
 
 	if (pid == 0)				/* no pid file */
 	{
@@ -943,7 +968,7 @@ do_stop(void)
 
 		for (cnt = 0; cnt < wait_seconds; cnt++)
 		{
-			if ((pid = get_pgpid()) != 0)
+			if ((pid = get_pgpid(false)) != 0)
 			{
 				print_msg(".");
 				pg_usleep(1000000);		/* 1 sec */
@@ -980,7 +1005,7 @@ do_restart(void)
 	pgpid_t		pid;
 	struct stat statbuf;
 
-	pid = get_pgpid();
+	pid = get_pgpid(false);
Re: [HACKERS] extension_control_path
[I answered most of these concerns in more detail in the reply to Dimitri.] On 3/7/14, 9:16 AM, Stephen Frost wrote: > Being able to have a self-contained module which requires a minimum of > modification to postgresql.conf is a reduction in complexity, imv. > Having to maintain two config options which will end up being overly > long and mostly duplicated doesn't make things easier for people. Then we can make it one path. > This > has made me wonder if we could allow a control file to be explicitly > referred to from CREATE EXTENSION itself, dropping the need for any of > this postgresql.conf/GUC maintenance. There are downsides to that > approach as well, of course, but it's definitely got a certain appeal. That might be useful as a separate feature, but it reeks of #include with an explicit path, which isn't a sane practice. No programming language other than ancient or poorly designed ones allows that sort of thing. > I don't buy off on this analogy. For starters, you can change the > control file without needing to rebuild the library, (You can also change the rpath without rebuilding the library.) > but the main > difference is that, in practice, there are no library transitions > happening and instead what we're likely to have are independent and > *incompatible* libraries living with the same name in our path. I understand this concern. The question is how big it is relative to the other ones. As a side idea I just had, how about embedding the extension version number into the library somehow? Food for thought. > This makes sense when you have complete control over where things are > installed to and can drop the control file into the "one true directory > of control files" and similarly with the .so. Indeed, that works > already today for certain platforms, but from what I understand, on the > OSX platform you don't really get to just dump files anywhere on the > filesystem that you want and instead end up forced into a specific > directory tree. That is incorrect.
If someone has a general use for module_pathname, I'd be interested to hear it, but that's not one of them.
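Peter's side idea above — embedding the extension version number in the library and checking it against the control file that selected it — could be modeled roughly like this. This is a hypothetical Python sketch of the check, including a toy parser for the control-file `key = 'value'` format; none of this is actual PostgreSQL code:

```python
def parse_control_version(control_text):
    """Extract default_version from extension control file content
    (lines of the form key = 'value', with '#' comments)."""
    for line in control_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("default_version"):
            _, _, value = line.partition("=")
            return value.strip().strip("'")
    return None

def library_matches_control(library_version, control_text):
    # The idea floated in the thread: refuse to load a library whose
    # embedded version disagrees with the control file that chose it,
    # guarding against an incompatible same-named .so earlier in the path.
    return library_version == parse_control_version(control_text)
```

The interesting design question is where the version marker would live in the .so itself (a dedicated symbol, or an extension of the existing magic-block check).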
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Mar 5, 2014, at 23:49, Tom Lane wrote: > Florian Pflug writes: >> On Mar 5, 2014, at 18:37, Tom Lane wrote: >>> My advice is to lose the EXPLAIN output entirely. If the authors of >>> the patch can't agree on what it means, what hope have everyday users >>> got of making sense of it? > >> The question isn't what the current output means, but whether it's a >> good metric to report or not. > > If you can't agree, then it isn't. Probably, yes, so let's find something that *is* a good metric. (BTW, it's not the authors who disagree here. It was me who put the EXPLAIN feature in, and Dean reviewed it and found it confusing. The original author David seems to have run out of time to work on this, and AFAIK hasn't weighed in on that particular feature at all.) >> If we don't report anything, then how would a user check whether a query >> is slow because of O(n^2) behaviour of a windowed aggregate, or because >> of some other reasons? > > [ shrug... ] They can see whether the Window plan node is where the time > is going. It's not apparent to me that the extra numbers you propose to > report will edify anybody. By the same line of reasoning, every metric except execution time is superfluous. Comparing execution times really is a horrible way to measure this - not only does it include all kinds of things that have nothing to do with the restart behaviour of aggregates, you'd also have to construct a base-line query first which is guaranteed not to restart. best regards, Florian Pflug
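The O(n^2) restart behaviour Florian refers to is easy to demonstrate outside the EXPLAIN machinery. Here is a hedged Python sketch of a moving sum that counts transition-function calls, once with a full restart per frame and once with an inverse ("negative") transition function; the function names are invented for illustration:

```python
def moving_sum_naive(values, window):
    """Recompute each frame from scratch: the restart behaviour that
    makes a windowed aggregate cost O(n * window) transitions."""
    calls = 0
    results = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        total = 0
        for v in frame:          # one forward transition per row, every frame
            total += v
            calls += 1
        results.append(total)
    return results, calls

def moving_sum_negative(values, window):
    """Use an inverse transition to remove the row leaving the frame:
    O(n) transitions overall, no restarts."""
    calls = 0
    total = 0
    results = []
    for i, v in enumerate(values):
        total += v                        # forward transition
        calls += 1
        if i >= window:
            total -= values[i - window]   # inverse transition
            calls += 1
        results.append(total)
    return results, calls
```

Counting transitions like this is the kind of implementation-level metric the thread is debating how (or whether) to surface in EXPLAIN.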
Re: [HACKERS] extension_control_path
On 3/7/14, 11:39 AM, Dimitri Fontaine wrote: > Just please make sure that it's still possible to use a full absolute path > for the module path name so that the packager can have control too. I can't think of any package system where absolute paths are part of a recommended workflow. There is always a search path; that search path contains some series of system, non-system, and per-user directories. A packager or installer chooses one of those directories as the installation target, according to convention or user choice. OK, you can install a C library in some obscure place and then #include it from there, but that's not sane practice. Similarly, you will still be able to hardcode paths into CREATE FUNCTION statements or other places, but why would you want to? > What the $directory proposal achieves is allowing a fully relocatable > extension layout, where you just have to drop a directory anywhere in > the file system and it just works (*). If that's what you want (and it sounds attractive), why couldn't we just locate libraries using extension_control_path as well (which presumably contains the control file we are just processing)? No need for another indirection. Libraries and control files being in separate directory hierarchies is a historical artifact, but it's not something that can't be changed if it's not what we want. The problem I have with this $directory proposal, if I understand it correctly, is that it requires editing of the control file at installation time. I don't like this for three reasons: it's not clear how this should actually be done, creating more broken extension build scripts (a big problem already); control files are part of the "code", so to speak, not a configuration file, and so munging them in a site-specific way creates a hybrid type of file that will be difficult to properly manage; and it doesn't allow running an extension before installation (unless I overwrite the control file in my own source tree), which is my main use case for this.
> What happens if you have more than one 'prefix.so' file in your path? The same thing that happens if you have more than one prefix.control in your path. You take the first one you find. Yes, if those are actually two different path settings, you need to keep those aligned. But that would be a one-time setting by the database administrator, not a per-extension-and-packager setting, so it's manageable. If it still bothers you, put them both in the same path, as I suggested above. > The module_pathname facility allows the packager to decide where the > library module file gets installed and the extension author not to > concern himself with that choice. Again, that would only work if they patch the control file during installation, which is dubious. That's like patching paths in include files or shared libraries. People do that, but it's not a preferred method. And secondly, why would a packager care? Has any packager ever cared to relocate the library file and no other file? Aside from those details, it seems clear that any reasonably complete move-extensions-elsewhere feature will need some kind of build system support. I have various ideas on that and would gladly contribute some of them, but it's not going to happen within two weeks. At this point I suggest that we work toward the minimum viable product: the extension_control_path feature as originally proposed (plus the crash fixes), and let the field work out best practices. As you describe, you can work around all the other issues by patching various text files, but you currently cannot move the extension control file in any way, and that's a real deficiency. (I once experimented with bind mounts to work around that -- a real mess ;-) )
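The first-match-wins lookup Peter describes ("you take the first one you find") can be sketched as follows; a hypothetical Python model of searching a separator-delimited extension_control_path, not the actual proposed implementation:

```python
import os

def find_control_file(extension_name, control_path):
    """First-match-wins lookup of <name>.control across a search path.
    os.pathsep is ':' on POSIX and ';' on Windows."""
    filename = extension_name + ".control"
    for directory in control_path.split(os.pathsep):
        if not directory:
            continue  # skip empty path components
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the same resolution rule applied to both control files and .so files, the "two paths getting out of sync" concern reduces to keeping one setting correct.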
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug writes: > On Mar5, 2014, at 18:37 , Tom Lane wrote: >> My advice is to lose the EXPLAIN output entirely. If the authors of >> the patch can't agree on what it means, what hope have everyday users >> got of making sense of it? > The question isn't what the current output means, but whether it's a > good metric to report or not. If you can't agree, then it isn't. > If we don't report anything, then how would a user check whether a query > is slow because of O(n^2) behaviour of a windowed aggregate, or because > of some other reasons? [ shrug... ] They can see whether the Window plan node is where the time is going. It's not apparent to me that the extra numbers you propose to report will edify anybody. regards, tom lane
Re: [HACKERS] Row-security on updatable s.b. views
On 05/03/14 15:44, Craig Ringer wrote: On 03/05/2014 05:25 PM, Yeb Havinga wrote: Maybe a naive thought, but shouldn't all plans that include a table with an RLS clause be invalidated when the session role switches, regardless of which users from and to? Only if the plan is actually accessed when under a different user ID. Consider SECURITY DEFINER functions; you don't want to flush all cached plans just because you ran a SECURITY DEFINER function that doesn't even share any statements with the outer transaction. Hmm, good point. I've also got some concerns about the user-visible API; I'm not sure it makes sense to set row security policy for row reads per-command in PostgreSQL, since we have the RETURNING clause. Read-side policy should just be "FOR READ". Hmm, but FOR READ would mean new keywords, and SELECT is also a concept known to users. I didn't find the API problematic to understand; on the contrary. Would you expect that FOR SELECT also affects rows you can see to UPDATE, INSERT, or DELETE? Yes. Because that's what it would have to mean, really. Otherwise, you could just use `UPDATE thetable SET id = id RETURNING *` (or whatever) to read the rows out if you had UPDATE rights. Or do the same with DELETE. With RETURNING, it doesn't make much sense for different statements to have different read access. Can you think of a case where it'd be reasonable to deny SELECT, but allow someone to see the same rows with `UPDATE ... RETURNING` ? It might be an idea to add the SELECT RLS clause for DML queries that contain a RETURNING clause. That way lies madness: A DML statement that affects *a different set of rows* depending on whether or not it has a RETURNING clause. If you state it like that, it sounds like a POLA violation. But the complete story is: "A user is allowed to UPDATE a set of rows from a table that is not a subset of the set of rows he can SELECT from the table; in other words, he can UPDATE rows he is not allowed to SELECT.
This can lead to unexpected results: When the user issues an UPDATE of the table without a RETURNING clause, all rows the user may UPDATE are affected. When the user issues an UPDATE of the table with a RETURNING clause, all rows the user may UPDATE and SELECT are affected." So the madness comes from the fact that it is allowed to define RLS policies that allow modifying rows you cannot select. Either prevent these conditions (i.e. prove that every DML RLS qual implies the SELECT qual, otherwise give an error on DML with a RETURNING clause), or allow it without violating the RLS rules but accept that a DML with RETURNING is different from a DML only. regards, Yeb
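The two candidate semantics Yeb describes can be modeled as plain set filtering. This is a hypothetical Python sketch of the row sets involved, not actual RLS code:

```python
def affected_rows(rows, update_pred, select_pred, returning):
    """Model the two behaviours from the thread: without RETURNING,
    UPDATE touches every row passing the UPDATE policy; with RETURNING
    (under the proposal discussed as surprising), it touches only rows
    passing both the UPDATE and SELECT policies."""
    updatable = {r for r in rows if update_pred(r)}
    if not returning:
        return updatable
    return {r for r in updatable if select_pred(r)}
```

Whenever the UPDATE policy is not a subset of the SELECT policy, the two calls return different row sets, which is exactly the "different set of rows depending on RETURNING" problem.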
Re: [HACKERS] tcp_keepalives_idle
On Fri, Mar 7, 2014 at 10:03:42PM -0500, Bruce Momjian wrote: > On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote: > > On 11/14/13 7:08 AM, Tatsuo Ishii wrote: > > >>It means "the connection is idle except for keepalive packets". > > >>We could perhaps just drop the word "otherwise", if people find > > >>it confusing. > > > > > >Wah. I seem to have completely misunderstood what the phrase > > >says. Thanks for the clarification. I agree to drop "otherwise". > > > > I had some problem interpreting these explanations as well: > > http://www.postgresql.org/message-id/527a21f1.2000...@joh.to > > > > Compare that to the description in the libpq documentation: > > "Controls the number of seconds of inactivity after which TCP should > > send a keepalive message to the server.". > > Good point. I have improved the server-side keepalive parameter > descriptions to use the superior libpq text, with adjustment. > > Applied patch attached. Oops, now attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 86dbd0f..2811f11
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** include 'filename'
*** 684,691 ****
!        Specifies the number of seconds before sending a keepalive packet on
!        an otherwise idle connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPIDLE or TCP_KEEPALIVE symbols, and on
        Windows; on other systems, it must be zero.
--- 684,692 ----
!        Specifies the number of seconds of inactivity after which TCP
!        should send a keepalive message to the client.  A value of 0 uses
!        the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPIDLE or TCP_KEEPALIVE symbols, and on
        Windows; on other systems, it must be zero.
*** include 'filename'
*** 708,715 ****
!        Specifies the number of seconds between sending keepalives on an
!        otherwise idle connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPINTVL symbol, and on Windows; on other
        systems, it must be zero.
--- 709,717 ----
!        Specifies the number of seconds after which a TCP keepalive message
!        that is not acknowledged by the client should be retransmitted.
!        A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPINTVL symbol, and on Windows; on other
        systems, it must be zero.
*** include 'filename'
*** 732,739 ****
!        Specifies the number of keepalive packets to send on an otherwise idle
!        connection.  A value of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPCNT symbol; on other systems, it must be zero.
        In sessions connected via a Unix-domain socket, this parameter is
--- 734,742 ----
!        Specifies the number of TCP keepalives that can be lost before
!        the server's connection to the client is considered dead.  A value
!        of 0 uses the system default.
        This parameter is supported only on systems that support
        the TCP_KEEPCNT symbol; on other systems, it must be zero.
        In sessions connected via a Unix-domain socket, this parameter is
Re: [HACKERS] tcp_keepalives_idle
On Thu, Nov 14, 2013 at 11:32:23AM +0100, Marko Tiikkaja wrote: > On 11/14/13 7:08 AM, Tatsuo Ishii wrote: > >>It means "the connection is idle except for keepalive packets". > >>We could perhaps just drop the word "otherwise", if people find > >>it confusing. > > > >Wah. I seem to have completely misunderstood what the phrase > >says. Thanks for the clarification. I agree to drop "otherwise". > > I had some problem interpreting these explanations as well: > http://www.postgresql.org/message-id/527a21f1.2000...@joh.to > > Compare that to the description in the libpq documentation: > "Controls the number of seconds of inactivity after which TCP should > send a keepalive message to the server.". Good point. I have improved the server-side keepalive parameter descriptions to use the superior libpq text, with adjustment. Applied patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
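For reference, the three server-side GUCs being documented here correspond to per-socket TCP options that a client can also set directly. A hedged Python sketch follows; the TCP_KEEP* constants are platform-specific (Linux exposes all three), hence the hasattr guards, and the default values are arbitrary examples:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Configure TCP keepalive on a socket, mirroring
    tcp_keepalives_idle / _interval / _count. Each TCP_KEEP* option
    is applied only if the running platform exposes it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
            applied[name] = value
    return applied
```

Seen this way, the reworded documentation matches the TCP semantics directly: idle seconds before the first probe, retransmit interval for unacknowledged probes, and the number of lost probes before the connection is declared dead.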
Re: [HACKERS] TABLE not synonymous with SELECT * FROM?
On Wed, Nov 13, 2013 at 10:28:07AM +0100, Colin 't Hart wrote: > David et al, > > How about something like this? I have applied a modified version of your patch. I didn't like the idea of putting "SELECT" inside an OR syntax clause --- the syntax is already too complicated. I also didn't like moving the TABLE mention up in the file. What I did do was to document the supported TABLE clauses, and add some of your verbiage. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 7395754..f1bc158
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*** TABLE [ ONLY ] name
!     is completely equivalent to SELECT * FROM name.
      It can be used as a top-level command or as a space-saving syntax
!     variant in parts of complex queries.
--- 1489,1505 ----
      TABLE name
!     is equivalent to SELECT * FROM name.
      It can be used as a top-level command or as a space-saving syntax
!     variant in parts of complex queries.  Only the WITH,
!     UNION, INTERSECT, EXCEPT,
!     ORDER BY, LIMIT, OFFSET,
!     FETCH and locking clauses can be used with TABLE;
!     the WHERE clause and any form of aggregation cannot
!     be used.
Re: [HACKERS] [PATCH] Store Extension Options
On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote: > On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera > wrote: > > > I am reworking this patch, both to accommodate earlier review comments > > and to fix the multiple verify step of namespaces, as noted by Tom in > > 4530.1390023...@sss.pgh.pa.us > > > > > This link is broken... It is a message id, and it seems to point to the appropriate thread. You can relatively easily look up message ids using URLs like http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Regression test errors
I was testing some builds I was doing and found that the regression tests fail when run against a Hot Standby server:

$ make standbycheck
[...]
== running regression test queries ==
test hs_standby_check ... ok
test hs_standby_allowed ... FAILED
test hs_standby_disallowed ... FAILED
test hs_standby_functions ... ok
== 2 of 4 tests failed. ==

The differences that caused some tests to fail can be viewed in the file "/usr/local/postgresql-9.3.3/src/test/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "/usr/local/postgresql-9.3.3/src/test/regress/regression.out". The regression.diffs and patch attached. I haven't checked how far back those go. I don't think it's even important to back patch this, but it's nice for future testing. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services regression.diffs Description: Binary data From b6db8388e37f6afaa431e31239fd972d10140cc1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= Date: Fri, 7 Mar 2014 21:29:29 -0300 Subject: [PATCH] Standby regression checks failed. Two Hot Standby regression tests failed for various reasons. - An error in src/test/regress/expected/hs_standby_disallowed.out made regression fail (VACUUM should be ANALYZE). - Serializable transactions won't work on a Hot Standby.
---
 src/test/regress/expected/hs_standby_allowed.out    | 2 +-
 src/test/regress/expected/hs_standby_disallowed.out | 2 +-
 src/test/regress/sql/hs_standby_allowed.sql         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 1abe5f6..9d18d77 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -49,7 +49,7 @@ select count(*) as should_be_1 from hs1;
 (1 row)
 
 end;
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
  should_be_1
 -------------
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index e7f4835..bc11741 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -124,7 +124,7 @@ unlisten *;
 ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
-ERROR:  cannot execute VACUUM during recovery
+ERROR:  cannot execute ANALYZE during recovery
 VACUUM hs2;
 ERROR:  cannot execute VACUUM during recovery
 CLUSTER hs2 using hs1_pkey;
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index 58e2c01..5cd450d 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -28,7 +28,7 @@ begin transaction read only;
 select count(*) as should_be_1 from hs1;
 end;
 
-begin transaction isolation level serializable;
+begin transaction isolation level repeatable read;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
 select count(*) as should_be_1 from hs1;
-- 
1.8.3.1
Re: [HACKERS] Static Code Analysis Exploits.
Patrick Curran writes: > We use Postgres in our product and we have a client that requires a > static code analysis scan to detect vulnerabilities. They are concerned > because the tool (Veracode) found several flaws in Postgres and they > believe there might be a security risk. I'm sure there are lots of > companies that use Postgres that have security policies like theirs in > place, so I'm hoping someone has the experience to know that these are > false positives or that they are not a security risk for some reason. > Below is a description of the vulnerability and the location in the > source code. Version 9.3.2.1 was scanned. Please let me know if there is > a better place to ask this kind of question. TBH, I don't think anyone's going to bother with going through this list in this form. Line numbers in something that's not an official community release are not very helpful, and the descriptions are far too vague for someone looking at the list to be sure exactly what their tool is on about. I took one example at random: > Stack-based Buffer Overflow (CWE ID 121)(13 flaws): > There is a potential buffer overflow with these functions. If an > attacker can control the data written into the buffer, the overflow may > result in execution of arbitrary code. > libpq.dll .../interfaces/libpq/pqexpbuffer.c 369 This seems to be complaining about the memcpy in appendBinaryPQExpBuffer. Well, I don't see anything unsafe about it: the preceding call to enlargePQExpBuffer should have made sure that the target buffer is big enough. And the reference to stack-based buffer overflow is completely nonsensical, because no PQExpBuffer keeps its buffer on the stack. It's conceivable that their tool has spotted some unsafe pattern in some call site, but this report is unhelpful about identifying what that would be. I did look at a few of the other items, and none of the ones I looked at were any more clear. 
FWIW, we do have access to Coverity scans of the Postgres sources, and we do make efforts to fix things Coverity complains about. But we're unlikely to take reports like this one seriously: there's simply not enough information to know what the tool is unhappy about, nor any clear reason to believe that it's spotted something that both human readers and Coverity have missed. Sorry if that's not the answer you wanted; but a more positive response is going to require a substantially greater amount of work on your end. In particular, given the very substantial amounts of work that have already gone into hardening the Postgres code, I think the burden of proof is on you or your client to show that these are issues, not on us to disprove claims that are too vague to be disproven. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
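For readers following along, the enlarge-then-copy discipline Tom describes in pqexpbuffer.c can be sketched in miniature. This is a simplified stand-in for illustration only, not the actual libpq source; the point is the shape of the code the scanner flagged — the memcpy is reachable only after the enlarge call has guaranteed enough space, and the buffer lives on the heap, not the stack:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for libpq's PQExpBuffer (illustration only). */
typedef struct
{
    char   *data;       /* buffer contents, always NUL-terminated */
    size_t  len;        /* bytes used, not counting the trailing NUL */
    size_t  maxlen;     /* bytes allocated */
} Buf;

/*
 * Grow the buffer so that at least "needed" more bytes (plus a NUL) fit.
 * Returns 1 on success, 0 on allocation failure -- the same contract Tom
 * describes for enlargePQExpBuffer.
 */
static int
enlarge_buf(Buf *b, size_t needed)
{
    size_t  newlen = (b->maxlen > 0) ? b->maxlen : 64;

    while (newlen < b->len + needed + 1)
        newlen *= 2;
    if (newlen != b->maxlen)
    {
        char   *p = realloc(b->data, newlen);

        if (p == NULL)
            return 0;
        b->data = p;
        b->maxlen = newlen;
    }
    return 1;
}

/*
 * The pattern in question: memcpy runs only after the enlarge call has
 * guaranteed enough space, so it cannot overflow the heap buffer.
 */
static int
append_binary(Buf *b, const char *data, size_t datalen)
{
    if (!enlarge_buf(b, datalen))
        return 0;
    memcpy(b->data + b->len, data, datalen);
    b->len += datalen;
    b->data[b->len] = '\0';
    return 1;
}
```

A static checker that sees only the memcpy call site, without tracking the invariant established by the preceding enlarge call, will flag exactly this kind of code.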
Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 7:21 PM, Andres Freund wrote: > On 2014-03-07 19:14:48 -0300, Fabrízio de Royes Mello wrote: > > On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > > > I am reworking this patch, both to accommodate earlier review comments > > > and to fix the multiple verify step of namespaces, as noted by Tom in > > > 4530.1390023...@sss.pgh.pa.us > > > > This link is broken... > > It is a message id, and it seems to point to an appropriate thread? You > can relatively easily look up message ids using urls like > http://www.postgresql.org/message-id/4530.1390023...@sss.pgh.pa.us Sorry... my fault!! Thanks! -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] [PATCH] Store Extension Options
On Fri, Mar 7, 2014 at 5:56 PM, Alvaro Herrera wrote: > I am reworking this patch, both to accommodate earlier review comments > and to fix the multiple verify step of namespaces, as noted by Tom in > 4530.1390023...@sss.pgh.pa.us This link is broken... -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:55 PM, Peter LaDow wrote: > Just to be clear, what do you mean by "nontrivial code"? Do you mean > C library calls, other than fork/exec/_exit? I think I've answered my own question: http://man7.org/linux/man-pages/man7/signal.7.html The "Async-signal-safe functions" listed there are the nontrivial C calls that may safely be made. Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mar 6, 2014, at 1:51 AM, Peter Geoghegan wrote: >> It's true for perl. Syntax of hstore is close to hash/array syntax and it's >> easy to serialize/deserialize hstore to/from perl. Syntax of hstore was >> inspired by perl. > > I understand that. There is a module on CPAN called Pg::hstore that > will do this; it appears to have been around since 2011. I don't use > Perl, so I don't know a lot about it. Perhaps David Wheeler has an > opinion on the value of Perl-like syntax, as a long time Perl > enthusiast? HSTORE was inspired by the syntax of Perl hash declarations, but it is not compatible. Notably, an HSTORE value can be `NULL`, while in Perl hashes it's `undef`. So you cannot simply `eval` an HSTORE to get a Perl hash unless you are certain there are no NULLs. Besides, string eval in Perl is considered unsafe. Parsing is *much* safer. > In any case, Perl has excellent support for JSON, just like every > other language - you are at no particular advantage in Perl by having > a format that happens to more closely resemble the format of Perl > hashes and arrays. I really feel that we should concentrate our > efforts on one standardized format here. It makes the effort to > integrate your good work in a way that makes it available to everyone > so much easier. I agree. I like HSTORE, but now that JSON is so standard (in fact, as of this week, a *real* standard! http://rfc7159.net/rfc7159), and its support is so much better than that of HSTORE, including in Perl, I believe that it should take priority over HSTORE. I'm happy if HSTORE has the same functionality as JSONB, but given the choice, all other things being equal, as a Perl hacker I will always choose JSONB. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Next CommitFest Deadlines
On Thu, Mar 6, 2014 at 7:44 AM, Fabrízio de Royes Mello wrote: > Hi all, > > There are some place with the next commitfest deadlines (2014/06 and > 2014/09) ? > > Regards, Those deadlines won't be finalized until after PGCon, but it seems likely to me that we'll do about what we did last year. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Static Code Analysis Exploits.
Hi, We use Postgres in our product and we have a client that requires a static code analysis scan to detect vulnerabilities. They are concerned because the tool (Veracode) found several flaws in Postgres and they believe there might be a security risk. I'm sure there are lots of companies that use Postgres that have security policies like theirs in place, so I'm hoping someone has the experience to know that these are false positives or that they are not a security risk for some reason. Below is a description of the vulnerability and the location in the source code. Version 9.3.2.1 was scanned. Please let me know if there is a better place to ask this kind of question. Thanks, Patrick

Stack-based Buffer Overflow (CWE ID 121) (13 flaws):
There is a potential buffer overflow with these functions. If an attacker can control the data written into the buffer, the overflow may result in execution of arbitrary code.
btree_gist.dll .../btree_gist/btree_utils_num.c 115
btree_gist.dll .../btree_gist/btree_utils_num.c 123
pgcrypto.dll .../contrib/pgcrypto/crypt-md5.c 103
libpq.dll .../interfaces/libpq/fe-connect.c 3185
libpq.dll .../interfaces/libpq/fe-connect.c 3220
clusterdb.exe .../interfaces/libpq/fe-connect.c 3243
libpq.dll .../libpq/fe-protocol3.c 1661
libecpg_compat.dll .../ecpg/compatlib/informix.c 978
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 112
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 290
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 306
pgcrypto.dll .../contrib/pgcrypto/mbuf.c 330
libpq.dll .../interfaces/libpq/pqexpbuffer.c 369

Use of Inherently Dangerous Function (CWE ID 242) (1 flaw):
These functions are inherently unsafe because they do not perform bounds checking on the size of their input. An attacker can send overly long input and overflow the destination buffer, potentially resulting in execution of arbitrary code.
pg_isolation_regress.exe .../src/test/regress/pg_regress.c 2307

Integer Overflow or Wraparound (CWE ID 190) (1 flaw):
An integer overflow condition exists when an integer that has not been properly sanity-checked is used in the determination of an offset or size for memory allocation, copying, concatenation, or similar operations. If the integer in question is incremented past the maximum possible value, it may wrap to become a very small negative number, therefore providing an unintended value. This occurs most commonly in arithmetic operations or loop iterations. Integer overflows can often result in buffer overflows or data corruption, both of which may potentially be exploited to execute arbitrary code.
dict_snowball.dll .../libstemmer/utilities.c 371

Process Control (CWE ID 114) (4 flaws):
A function call could result in a process control attack. An argument to a process control function is either derived from an untrusted source or is hard-coded, both of which may allow an attacker to execute malicious code under certain conditions. If an attacker is allowed to specify all or part of the filename, it may be possible to load arbitrary libraries. If the location is hard-coded and an attacker is able to place a malicious copy of the library higher in the search order than the file the application intends to load, then the application will load the malicious version.
psql.exe .../src/bin/psql/print.c 752
psql.exe .../src/bin/psql/print.c 791
psql.exe .../src/bin/psql/print.c 2209
psql.exe .../src/bin/psql/print.c 2500
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Store Extension Options
I am reworking this patch, both to accommodate earlier review comments and to fix the multiple verify step of namespaces, as noted by Tom in 4530.1390023...@sss.pgh.pa.us -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund wrote: > Forking twice is ok as well, as long as you just use _exit() after the > fork. The thing is that you shouldn't run any nontrivial code in the > fork, as long as you're connected to the original environment (fd's, > memory mappings and so forth). Just to be clear, what do you mean by "nontrivial code"? Do you mean C library calls, other than fork/exec/_exit? Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 12:17 PM, Andres Freund wrote: > Man. The point is that the library code is carefully written to not > use exit() but _exit() after a fork failure, that's it. I understand your point. I understand that in the case of the postmaster we don't want to invoke behavior that can cause problems by calling exit(). But how does this jibe with Tom's point (on the bug list) about other 3rd party libraries perhaps registering atexit handlers? If the convention is that only fork/exec/_exit is permissible after a fork (what about on_exit_reset?), then third party libraries also need to be aware of that convention and not depend on their atexit handlers being called. And I'm not advocating a certain solution. I'm only trying to clarify what the solution is so that we "comply" with the convention. We don't want to break postgres or any other "well behaved" third party libraries. I don't know the internals of postgres (hence the original bug report and this thread), so I of course defer to you and the other experts here. So, in our case we call on_exit_reset() after the fork in the child, and then from there on only use fork, exec, or _exit, as you mention above. Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On 2014-03-07 12:09:45 -0800, Peter LaDow wrote: > On Friday, March 7, 2014, Andres Freund wrote: > > If the third party library is suitably careful it will only use fork() > > and then exec() or _exit(), otherwise there are other things that'll be > But that is not possible* in our case of trying to spawn an asynchronous > background process. Forking twice is ok as well, as long as you just use _exit() after the fork. The thing is that you shouldn't run any nontrivial code in the fork, as long as you're connected to the original environment (fd's, memory mappings and so forth). > > Both perl and glibc seem to do just that in system() btw... > I don't know about Perl, but system is blocking. What if you don't want to > wait for the child to exit? Man. The point is that the library code is carefully written to not use exit() but _exit() after a fork failure, that's it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Friday, March 7, 2014, Andres Freund wrote: > If the third party library is suitably careful it will only use fork() > and then exec() or _exit(), otherwise there are other things that'll be But that is not possible* in our case of trying to spawn an asynchronous background process. The goal here is for the extension to spawn the process and return immediately. If we exec in the first level child, and immediately return, who reaps the child when done? This is why we fork twice and exit in the first level child so that the extension can reap the first. Unless Postgres happily reaps all children, perhaps through a SIGCHLD handler? (* I suppose we could exec a program that itself forks/execs a background process. But that is essentially no different than what we are doing now, other than trying to meet this fork/exec/_exit requirement. And that's fine if that is to be the case. We just need to know what it is.) > Both perl and glibc seem to do just that in system() btw... I don't know about Perl, but system is blocking. What if you don't want to wait for the child to exit? Pete
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 1:54 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: >>> I just noticed that the DSM patch has introduced a whole new class of >>> failures related to the bug #9464 issue: to wit, any on_detach >>> actions registered in a parent process will also be performed when a >>> child process exits, because nothing has been added to on_exit_reset >>> to prevent that. It seems likely that this is undesirable. > >> I don't think this can actually happen. There are quite a number of >> things that would go belly-up if you tried to use dynamic shared >> memory from the postmaster, which is why dsm_create() and dsm_attach() >> both Assert(IsUnderPostmaster). > > Nonetheless it seems like a good idea to make on_exit_reset drop any > such queued actions. > > The big picture here is that in the scenario being debated in the other > thread, exit() in a child process forked from a backend will execute that > backend's on_detach actions *even if the code had done on_exit_reset after > the fork*. Hmm. So the problematic sequence of events is where a postmaster child forks, and then exits without exec-ing, perhaps because e.g. exec fails? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On 2014-03-07 14:14:17 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-03-07 13:54:42 -0500, Tom Lane wrote: > >> The big picture here is that in the scenario being debated in the other > >> thread, exit() in a child process forked from a backend will execute that > >> backend's on_detach actions *even if the code had done on_exit_reset after > >> the fork*. > > > Hm, aren't those actions called via shmem_exit() calling > > dsm_backend_shutdown() et al? I think that should be cleared by > > on_shmem_exit()? > > But dsm_backend_shutdown gets called whether or not any shmem_exit > actions are registered. Yikes. I thought on_exit_reset() disarmed the atexit callback, but indeed, it does not... > > I think you're misunderstanding me. I am saying we *should* defend > > against it. Our opinions just seem to differ on what to do when the > > scenario is detected. I am saying we should scream bloody murder (which > > admittedly is hard in a child), you want to essentially declare it > > supported. > > And if we scream bloody murder, what will happen? Absolutely nothing > except we'll annoy our users. They won't have control over the > third-party libraries that are doing what you want to complain about. If the third party library is suitably careful it will only use fork() and then exec() or _exit(), otherwise there are other things that'll be broken (e.g. stdio). In that case everything was and is fine. If not, the library's user can then fix or file a bug against the library. Both perl and glibc seem to do just that in system() btw... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Andres Freund writes: > On 2014-03-07 13:54:42 -0500, Tom Lane wrote: >> The big picture here is that in the scenario being debated in the other >> thread, exit() in a child process forked from a backend will execute that >> backend's on_detach actions *even if the code had done on_exit_reset after >> the fork*. > Hm, aren't those actions called via shmem_exit() calling > dsm_backend_shutdown() et al? I think that should be cleared by > on_shmem_exit()? But dsm_backend_shutdown gets called whether or not any shmem_exit actions are registered. > I think you're misunderstanding me. I am saying we *should* defend > against it. Our opinions just seem to differ on what to do when the > scenario is detected. I am saying we should scream bloody murder (which > admittedly is hard in a child), you want to essentially declare it > supported. And if we scream bloody murder, what will happen? Absolutely nothing except we'll annoy our users. They won't have control over the third-party libraries that are doing what you want to complain about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Hi, On 2014-03-07 13:54:42 -0500, Tom Lane wrote: > Robert Haas writes: > > I don't think this can actually happen. There are quite a number of > > things that would go belly-up if you tried to use dynamic shared > > memory from the postmaster, which is why dsm_create() and dsm_attach() > > both Assert(IsUnderPostmaster). > > Nonetheless it seems like a good idea to make on_exit_reset drop any > such queued actions. > > The big picture here is that in the scenario being debated in the other > thread, exit() in a child process forked from a backend will execute that > backend's on_detach actions *even if the code had done on_exit_reset after > the fork*. Hm, aren't those actions called via shmem_exit() calling dsm_backend_shutdown() et al? I think that should be cleared by on_shmem_exit()? > So whether or not you buy Andres' argument that it's not > necessary for atexit_callback to defend against this scenario, there's > actually no other defense possible given the way things work in HEAD. I think you're misunderstanding me. I am saying we *should* defend against it. Our opinions just seem to differ on what to do when the scenario is detected. I am saying we should scream bloody murder (which admittedly is hard in a child), you want to essentially declare it supported. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
Robert Haas writes: > On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: >> I just noticed that the DSM patch has introduced a whole new class of >> failures related to the bug #9464 issue: to wit, any on_detach >> actions registered in a parent process will also be performed when a >> child process exits, because nothing has been added to on_exit_reset >> to prevent that. It seems likely that this is undesirable. > I don't think this can actually happen. There are quite a number of > things that would go belly-up if you tried to use dynamic shared > memory from the postmaster, which is why dsm_create() and dsm_attach() > both Assert(IsUnderPostmaster). Nonetheless it seems like a good idea to make on_exit_reset drop any such queued actions. The big picture here is that in the scenario being debated in the other thread, exit() in a child process forked from a backend will execute that backend's on_detach actions *even if the code had done on_exit_reset after the fork*. So whether or not you buy Andres' argument that it's not necessary for atexit_callback to defend against this scenario, there's actually no other defense possible given the way things work in HEAD. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
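The mechanism Tom is pointing at is a property of the C library, not of Postgres: fork() copies the parent's atexit list into the child, and nothing Postgres clears at its own level can unregister those libc-level handlers. A standalone demonstration (plain POSIX C; the handler here merely stands in for atexit_callback, it is not the actual Postgres code) of why exit() in a forked child fires a handler registered before the fork, while _exit() does not:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;

/* Stands in for atexit_callback / a queued on_detach action. */
static void
inherited_handler(void)
{
    if (notify_fd >= 0)
    {
        ssize_t rc = write(notify_fd, "x", 1);

        (void) rc;
    }
}

/*
 * Register a handler in the parent, fork a child that terminates with
 * exit() or _exit(), and report whether the handler ran in the child.
 */
static int
handler_fired_in_child(int use_plain_exit)
{
    int     fds[2];
    char    c;
    int     fired;
    pid_t   pid;

    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);     /* empty pipe => read fails */
    notify_fd = fds[1];
    atexit(inherited_handler);              /* registered before the fork */
    pid = fork();
    if (pid == 0)
    {
        if (use_plain_exit)
            exit(0);                        /* runs the inherited handler */
        _exit(0);                           /* bypasses atexit handlers */
    }
    waitpid(pid, NULL, 0);
    fired = (read(fds[0], &c, 1) == 1);
    notify_fd = -1;                         /* keep the parent's own exit quiet */
    close(fds[0]);
    close(fds[1]);
    return fired;
}
```

With use_plain_exit set, the child's exit() runs the parent-registered handler and a byte shows up in the pipe; with _exit() the pipe stays empty — which is exactly why the fork/exec/_exit convention matters.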
Re: [HACKERS] Unportable coding in reorderbuffer.h
On 2014-03-07 13:27:28 -0500, Tom Lane wrote: > Andres Freund writes: > > A patch fixing this is attached. I've added some more local variables to > > deal with the longer lines... > > Since I've got a compiler in captivity that complains about this, > I'll take care of checking and committing this patch. Thanks. The tests specific to the feature can be run with (cd contrib/test_decoding && make -s check), as they require non-default PGC_POSTMASTER settings. > I still think it might be a good idea to spin up a buildfarm member > running "gcc -ansi -pedantic", even if we don't see that as a particularly > useful case in practice. Thoughts? I tried to compile with -ansi -pedantic -std=c90 to catch potential further issues, but at least my gcc version makes the output completely drown in various warnings. Some can be individually disabled, but lots of them seem to only be covered by pretty general warning categories. -ansi -std=c90 -Wno-variadic-macros works reasonably well tho. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] on_exit_reset fails to clear DSM-related exit actions
On Fri, Mar 7, 2014 at 10:04 AM, Tom Lane wrote: > I just noticed that the DSM patch has introduced a whole new class of > failures related to the bug #9464 issue: to wit, any on_detach > actions registered in a parent process will also be performed when a > child process exits, because nothing has been added to on_exit_reset > to prevent that. It seems likely that this is undesirable. I don't think this can actually happen. There are quite a number of things that would go belly-up if you tried to use dynamic shared memory from the postmaster, which is why dsm_create() and dsm_attach() both Assert(IsUnderPostmaster). Without calling those functions, there's no way for code running in the postmaster to call on_dsm_detach(), because it's got nothing to pass for the first argument. In case you're wondering, the major reason I disallowed this is that the control segment tracks the number of backends attached to each segment, and there's no provision for adjusting that value each time the postmaster forks. We could add such provision, but it seems like there would be race conditions, and the postmaster might have to participate in locking, so it might be pretty ugly, and a performance suck for anyone who doesn't need this functionality. And it doesn't seem very useful anyway. The postmaster really shouldn't be touching any shared memory segment more than the absolutely minimal amount, so the main possible benefit I can see is that you could have the mapping already in place for each new backend rather than having to set it up in every backend. But I'm prepared to hope that isn't actually important enough to be worth worrying about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable coding in reorderbuffer.h
Andres Freund writes: > A patch fixing this is attached. I've added some more local variables to > deal with the longer lines... Since I've got a compiler in captivity that complains about this, I'll take care of checking and committing this patch. I still think it might be a good idea to spin up a buildfarm member running "gcc -ansi -pedantic", even if we don't see that as a particularly useful case in practice. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal (9.5) : psql unicode border line styles
Hello

I am returning to this topic. Last time I proposed these styles:

http://www.postgresql.org/message-id/cafj8prclgoktryjpbtoncpgyftrcz-zgfowdc1jqulb+ede...@mail.gmail.com
http://postgres.cz/wiki/Pretty_borders_in_psql

That experiment failed, but there were some interesting tips in the discussion. So I propose a slightly different approach - choose one predefined style for each of the table line elements. These styles are active only when "linestyle" is unicode.

The possible line elements are:

* border,
* header_separator,
* row_separator,
* column_separator,

Possible styles (for each element):

* none,
* single,
* double,
* thick,

This should have enough variability to define all of the styles proposed earlier. I hope this proposal is safe and simple to use. Styles can be saved persistently in the .psqlrc file - and some examples can be in the documentation.

Usage:

\pset linestyle_border double
\pset linestyle_header_separator single
\pset linestyle_row_separator single
\pset linestyle_column_separator single
\pset linestyle unicode

╔═══╤════════════╤═══════╗
║ a │     b      │   c   ║
╟───┼────────────┼───────╢
║ 1 │ 2012-05-24 │ Hello ║
╟───┼────────────┼───────╢
║ 2 │ 2012-05-25 │ Hello ║
║   │            │ World ║
╚═══╧════════════╧═══════╝
(2 rows)

Comments, ideas?

Regards

Pavel
Re: [HACKERS] atexit_callback can be a net negative
Hi, On 2014-03-07 09:50:15 -0800, Peter LaDow wrote: > Also, the on_exit_reset() does clear up the issue, and we have > implemented it as suggested in this thread (calling it immediately > after the fork() in the child). And Andres' keen eye noted we were > calling exit() after an exec() failure, rather than _exit(). Thank > you, Andres, for pointing out this error. I actually didn't see any source code of yours ;), just answered Tom's question about what to do after exec() failed. There are several other weird behaviours if you use exit() instead of _exit() after a fork, most prominently double flushes leading to repeated/corrupted output when you have stream FILEs open in the parent. This is because the stream will be flushed in both the parent (at some later write or exit) and the child (at exit). It's simply something that won't work well. > > I don't see any way it'd be safe for a pg unaware library to start > > forking around and doing similar random things inside normal > > backends. > We aren't exactly "forking around." We were trying to spawn an > asynchronous process without any ties to the postmaster. The important bit in the sentence you quote is "pg unaware library". My point is just that there are some special considerations you have to be aware of. fork() and exec() is a way to escape that environment, and should be fine. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] What the heck is hobby.2ndquadrant.com?
On 7 March 2014 17:46, Tom Lane wrote: > There are a few threads floating around currently in which some of > the cc'd addresses are various-people at hobby.2ndquadrant.com. > Addresses like that seem not to work, or at least not work reliably. > I believe the official addresses are just soandso at 2ndquadrant.com. > > If you're replying to any such threads, you might want to clean > up the addresses so you don't get a bunch of bounce messages. > > In any case, I thought the 2ndquadrant people would want to know that > there's something broken in their mail system. I'm not sure how these > addresses got injected into our threads in the first place, but I bet > it wasn't done by anybody outside 2ndquadrant. Perhaps an internal > mail server name was unintentionally exposed? Thanks for highlighting that. Will investigate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
Craig Ringer writes: > What I'm concerned about is the locking. It looks to me like we're > causing the user to lock rows that they may not intend to lock, by > applying a LockRows step *before* the user supplied qual. (I'm going to > test that tomorrow, it's sleep time in Australia). The fact that there are two LockRows nodes seems outright broken. The one at the top of the plan is correctly placed, but how did the other one get in there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Sorry for the bit of top-posting, but I wanted to make some things clear. Also, I wasn't subscribed to pgsql-hackers at the time this thread began, so I apologize for the missing headers that might cause threading issues. I'm the submitter of bug #9464. Here's the background on what we are doing. We are on a limited resource, embedded device. We make use of the database as an event driven system. In the case of an incoming event, such as a settings change, we make use of this fork/exec procedure to spawn an asynchronous process to handle certain events. Some of these external processes are long running, some need to run outside the current transaction context, and some we don't care about the result--we just want it to run. Also, the on_exit_reset() does clear up the issue, and we have implemented it as suggested in this thread (calling it immediately after the fork() in the child). And Andres' keen eye noted we were calling exit() after an exec() failure, rather than _exit(). Thank you, Andres, for pointing out this error. Andres Freund 2ndquadrant.com> writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: > > Andres Freund 2ndquadrant.com> writes: > > > What are you proposing to do in that case? This is only one of the > > > failure cases of forking carelessly, right? Just to be clear, we intended to fork careFULLy, not careLESSly. Hence the reason for the double fork with an eventual exec(). We intended to be clearly separated from the backend and executing in our own clean, unrelated context. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and In this case we definitely did NOT want to be managed by postgres. Hence the double fork. The intent was that the first level child would NOT be managed by postgres, hence the exit(). > related things, or you have to fork() and exec(). 
Which is _exactly_ what we were trying to do. The first fork was only so that we could fork again and spawn a subprocess completely detached from the postmaster, and also to have something for the postmaster to reap via waitpid() and prevent a zombie from hanging around. The typical daemon stuff.

> At the very least, not integrating the child with the postmaster's
> lifetime will prevent postgres from restarting because there's still a
> child attached to the shared memory.

Which is _exactly_ what we were NOT trying to do. We did not want to integrate with the postmaster.

> I don't see any way it'd be safe for a pg unaware library to start
> forking around and doing similar random things inside normal backends.

We aren't exactly "forking around." We were trying to spawn an asynchronous process without any ties to the postmaster. This was expected to be well-defined, clean behavior. A fork/exec isn't an unusual thing to do, and we didn't expect that exiting a child would invoke behavior that would cause problems.

Thanks, Pete

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] What the heck is hobby.2ndquadrant.com?
There are a few threads floating around currently in which some of the cc'd addresses are various-people at hobby.2ndquadrant.com. Addresses like that seem not to work, or at least not work reliably. I believe the official addresses are just soandso at 2ndquadrant.com. If you're replying to any such threads, you might want to clean up the addresses so you don't get a bunch of bounce messages. In any case, I thought the 2ndquadrant people would want to know that there's something broken in their mail system. I'm not sure how these addresses got injected into our threads in the first place, but I bet it wasn't done by anybody outside 2ndquadrant. Perhaps an internal mail server name was unintentionally exposed? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Craig Ringer writes: > On 03/06/2014 02:58 AM, Tom Lane wrote: >> The PlanInvalItem could perfectly well be generated by the planner, >> no, if it has the user OID? But I'm not real sure why you need it. >> I don't see the reason for an invalidation triggered by user ID. >> What exactly about the *user*, and not something else, would trigger >> plan invalidation? > It's only that the plan depends on the user ID. There's no point keeping > the plan around if the user no longer exists. [ shrug... ] Leaving such a plan cached would be harmless, though. Furthermore, the user ID we'd be talking about is either the owner of the current session, or the owner of some view or security-definer function that the plan is already dependent on, so it's fairly hard to credit that the plan would survive long enough for the issue to arise. Even if there is a scenario where invalidating by user ID is actually useful, I think adding infrastructure to cause invalidation in such a case is optimizing for the wrong thing. You're adding cycles to every query to benefit a case that is going to be quite infrequent in practice. >> What we do need is a notion that a plan cache entry might only be >> valid for a specific calling user ID; but that's a matter for cache >> entry lookup not for subsequent invalidation. > Yes, that would be good, but is IMO more of a separate optimization. I'm > currently using KaiGai's code to invalidate and re-plan when a user ID > change is detected. I'm unlikely to accept a patch that does that; wouldn't it be catastrophic for performance in the presence of security-definer functions? You can't just trash the whole plan cache when a user ID switch occurs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Mar 7, 2014 at 11:57:48AM -0500, Andrew Dunstan wrote: > >If they can be done for 9.4, great, if not, we have to decide if these > >suboptimal cases are enough for us to delay the data type until 9.5. I > >don't know the answer, but I have to ask the question. > > > > AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were > essentially two ways of spelling the same data, the domains were > virtually identical (hstore might have been a bit more liberal about > numeric input). OK, it sounds like the adjustments are minimal, like not using the high-order bit. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 03/07/2014 11:45 AM, Bruce Momjian wrote:
> On Fri, Mar 7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote:
>> IIRC The sacrifice was one bit in the header (i.e. in the first int
>> after the varlena header). We could now repurpose that (for example if
>> we ever decided to use a new format).
>>
>> Oleg and Teodor made most of the adjustments on the hstore(2) side
>> (e.g. providing for scalar roots, providing for json typing of scalars
>> so everything isn't just a string).
>>
>> Can the architecture be changed? No. If we think it's not good enough
>> we would have to kiss jsonb goodbye for 9.4 and go back to the drawing
>> board. But I haven't seen any such suggestion from anyone who has been
>> reviewing it (e.g. Andres or Peter).
>
> We are going to be stuck with the JSONB binary format we ship in 9.4 so
> I am asking if there are things we should do to improve it, now that we
> know we don't need backward compatibility. If they can be done for 9.4,
> great, if not, we have to decide if these suboptimal cases are enough
> for us to delay the data type until 9.5. I don't know the answer, but I
> have to ask the question.

AFAIK, there is no sacrifice of optimality. hstore2 and jsonb were essentially two ways of spelling the same data, the domains were virtually identical (hstore might have been a bit more liberal about numeric input).

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Mar 7, 2014 at 11:35:41AM -0500, Andrew Dunstan wrote: > IIRC The sacrifice was one bit in the header (i.e. in the first int > after the varlena header). We could now repurpose that (for example > if we ever decided to use a new format). > > Oleg and Teodor made most of the adjustments on the hstore(2) side > (e.g. providing for scalar roots, providing for json typing of > scalars so everything isn't just a string). > > Can the architecture be changed? No. If we think it's not good > enough we would have to kiss jsonb goodbye for 9.4 and go back to > the drawing board. But I haven't seen any such suggestion from > anyone who has been reviewing it (e.g. Andres or Peter). We are going to be stuck with the JSONB binary format we ship in 9.4 so I am asking if there are things we should do to improve it, now that we know we don't need backward compatibility. If they can be done for 9.4, great, if not, we have to decide if these suboptimal cases are enough for us to delay the data type until 9.5. I don't know the answer, but I have to ask the question. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
Hi,

Peter Eisentraut writes:
> On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
>> directory = 'local/hstore-new'
>> module_pathname = '$directory/hstore'
>
> I think your previously proposed patch to add extension_control_path
> plus my suggestion to update existing de facto best practices to not
> include $libdir into the module path name (thus allowing the use of
> dynamic_library_path) will address all desired use cases just fine.

My opinion is that we have two choices: refactoring the current API or incrementally improving it. In both cases we should make it possible for the packager to control where any individual module file is loaded from, with maybe options for the sysadmin to override the packager's choice.

In your proposal, the control moves away from the developer, and that's a good thing, so you get a +1 from me. Just please make sure that it's still possible to use a full absolute path for the module path name so that the packager can have control too.

> Moreover, going that way would reuse existing facilities and concepts,
> remove indirections and reduce overall complexity. This new proposal,
> on the other hand, would go the other way, introducing new concepts,
> adding more indirections, and increasing overall complexity, while
> actually achieving less.

What the $directory proposal achieves is allowing a fully relocatable extension layout, where you just have to drop a directory anywhere in the file system and it just works (*). It just works and makes it easy to control which module is loaded, without having to set up LD_LIBRARY_PATH, ld.so.conf, or our own dynamic_library_path.

* provided that said directory is part of extension_control_path, or that you copy or move the .control file to sharedir/extension.

That said, I don't intend to be using it myself, so I won't try to save that patch in any way. My position is that Stephen's concern is real and his idea simple enough while effective, so worth pursuing.

> I see an analogy here.
> What we are currently doing is similar to hardcoding absolute rpaths
> into all libraries. Your proposal is effectively to (1) add the $ORIGIN
> mechanism and (2) make people use chrpath when they want to install
> somewhere else. My proposal is to get rid of all rpaths and just set a
> search path. Yes, on a technical level, this is less powerful, but it's
> simpler and gets the job done and is harder to misuse.

What happens if you have more than one 'prefix.so' file in your path?

> A problem with features like these is that they get rarely used but
> offer infinite flexibility, so they are not used consistently and you
> can't rely on anything. This is already the case for the
> module_pathname setting in the control file. It has, AFAICT, no actual
> use, and because of that no one uses it, and because of that, there is
> no guarantee that extensions use it sensibly, and because of that no one
> can ever make sensible use of it in the future, because there is no
> guarantee that extensions have it set sensibly. In fact, I would
> propose deprecating module_pathname.

The module_pathname facility allows the packager to decide where the library module file gets installed, and the extension author not to concern himself with that choice. I agree that using $libdir as the extension developer isn't the right thing to do. Having to choose the installation path as a developer, either in the SQL script or in the control file, is not the right thing.

Now, the practical answer I have to that point is to have the packager rewrite the control file as part of its build system.

My vote goes against deprecating module_pathname, because I didn't see in your proposal any way to offer control back to the packager, only to the sysadmin, and I don't want to have the sysadmin involved if we can avoid it (as you say, too much flexibility is a killer).
In practical terms, though, given the current situation, the build system I'm working on already has to edit the SQL scripts and control files anyway…

Regards,

-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
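For reference, the mechanics being debated look roughly like this in an extension control file. The extension name, versions, and paths below are hypothetical, and the `$directory` substitution inside `module_pathname` is the *proposed* behavior from this thread (quoted at the top of the message), not something a released PostgreSQL supports.

```
# Hypothetical hstore-new.control illustrating the $directory proposal.
comment = 'nested hstore data type'
default_version = '2.0'
relocatable = true

# Proposed: locate the extension's files relative to this directory...
directory = 'local/hstore-new'
# ...and resolve the shared module relative to it, so the whole layout
# can be dropped anywhere on extension_control_path.
module_pathname = '$directory/hstore'
```

The packager retains control by rewriting `directory` (or substituting an absolute `module_pathname`) at build time, which is the point Dimitri raises above.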
Re: [HACKERS] jsonb and nested hstore
On 03/06/2014 11:33 PM, Bruce Momjian wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
>> Hi there,
>>
>> Looks like consensus is done. I and Teodor are not happy with it, but
>> what we can do :) One thing I want to do is to reserve our contribution
>> to the flagship feature (jsonb), particularly, "binary storage for
>> nested structures and indexing. Their work was sponsored by Engine
>> Yard".
>
> OK, if we are going with an unchanged hstore in contrib and a new JSONB,
> there is no reason to whack around JSONB to be binary compatible with
> the old hstore format. What sacrifices did we need to make to have
> JSONB be binary compatible with hstore, can those sacrifices be removed,
> and can that be done in time for 9.4?

IIRC The sacrifice was one bit in the header (i.e. in the first int after the varlena header). We could now repurpose that (for example if we ever decided to use a new format).

Oleg and Teodor made most of the adjustments on the hstore(2) side (e.g. providing for scalar roots, providing for json typing of scalars so everything isn't just a string).

Can the architecture be changed? No. If we think it's not good enough we would have to kiss jsonb goodbye for 9.4 and go back to the drawing board. But I haven't seen any such suggestion from anyone who has been reviewing it (e.g. Andres or Peter).

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
>> Hi there,
>>
>> Looks like consensus is done. I and Teodor are not happy with it, but
>> what we can do :) One thing I want to do is to reserve our
>> contribution to the flagship feature (jsonb), particularly, "binary
>> storage for nested structures and indexing. Their work was sponsored
>> by Engine Yard".
>
> OK, if we are going with an unchanged hstore in contrib and a new JSONB,
> there is no reason to whack around JSONB to be binary compatible with the
> old hstore format. What sacrifices did we need to make to have JSONB
> be binary compatible with hstore, can those sacrifices be removed, and
> can that be done in time for 9.4?

Also,

*) what hstore2 features (if any) that are not already reflected in the jsonb type are going to be moved to jsonb for 9.4?

*) if the answer above is anything but 'nothing', what hstore-isms are going to be adjusted in the process of doing so? Presumably there would be some function name changes to put them in the jsonb style, but also the hstore syntax ('=>') is going to be embedded in some of the search operators and possibly other things. Is that going to change?

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Hi,

On 2014-03-07 10:24:31 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
> >> No, I think it should do nothing. The coding pattern shown in bug #9464
> >> seems perfectly reasonable and I think we should allow it.
>
> > I don't think it's a reasonable pattern to run background processes that
> > aren't managed by postgres with all shared memory still
> > accessible. You'll have to either also detach from shared memory and
> > related things, or you have to fork() and exec().
>
> The code in question is trying to do that. And what do you think will
> happen if the exec() fails?

In code that's written properly, not much. It will use _exit() after the exec() failure. That's pretty established best practice after forking, especially after an exec() failure.

> > At the very least, not
> > integrating the child with the postmaster's lifetime will prevent
> > postgres from restarting because there's still a child attached to the
> > shared memory.
>
> I think you're willfully missing the point. The reason we added
> atexit_callback was to try to defend ourselves against third-party code
> that did things in a non-Postgres-aware way.

Hey, I am not arguing that we should remove protection, I am saying that we should make it more obvious that dangerous stuff happens. That way people can fix stuff during development rather than finding out stuff breaks in some corner cases on busy live systems.

> Arguing that such code
> should do things in a Postgres-aware way is not helpful for the concerns
> here, and it's not relevant to reality either, because people will load
> stuff like libperl into backends. Good luck getting a post-fork
> on_exit_reset() call into libperl.

libperl won't fork on its own, without the user telling it to do so, and code that forks can very well be careful to do POSIX::_exit(), especially in case an exec fails.
I don't have much problem telling people that a fork() without a direct exec() afterwards simply isn't supported from PLs. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unportable coding in reorderbuffer.h
Hi, On 2014-03-06 02:39:37 +0100, Andres Freund wrote: > Unless somebody protests I'll try to get the remaining walsender and > docs patches ready before sending in the patch fixing this as it's not > disturbing the buildfarm. I'll be onsite/travelling tomorrow; so I am > not sure I'll be done then, but friday it is. A patch fixing this is attached. I've added some more local variables to deal with the longer lines... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From 875eda2d163e38e6533bf6cf8e6e6417aad0421b Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 6 Mar 2014 02:00:47 +0100 Subject: [PATCH] Remove unportable use of anonymous unions from reorderbuffer.h. In b89e151054a I had assumed it was ok to use anonymous unions as struct members, but while a longstanding extension in many compilers, it's only been standardized in C11. To fix, remove one of the anonymous unions which tried to hide some implementation specific enum values and give the other a name. The latter unfortunately requires changes in output plugins, but since the feature has only been added a few days ago... Per complaint from Tom Lane. 
--- contrib/test_decoding/test_decoding.c | 18 +- src/backend/replication/logical/decode.c| 28 +-- src/backend/replication/logical/reorderbuffer.c | 283 src/include/replication/reorderbuffer.h | 39 ++-- 4 files changed, 186 insertions(+), 182 deletions(-) diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c index ea463fb..e356c7c 100644 --- a/contrib/test_decoding/test_decoding.c +++ b/contrib/test_decoding/test_decoding.c @@ -358,43 +358,45 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, { case REORDER_BUFFER_CHANGE_INSERT: appendStringInfoString(ctx->out, " INSERT:"); - if (change->tp.newtuple == NULL) + if (change->data.tp.newtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.newtuple->tuple, + &change->data.tp.newtuple->tuple, false); break; case REORDER_BUFFER_CHANGE_UPDATE: appendStringInfoString(ctx->out, " UPDATE:"); - if (change->tp.oldtuple != NULL) + if (change->data.tp.oldtuple != NULL) { appendStringInfoString(ctx->out, " old-key:"); tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.oldtuple->tuple, + &change->data.tp.oldtuple->tuple, true); appendStringInfoString(ctx->out, " new-tuple:"); } - if (change->tp.newtuple == NULL) + if (change->data.tp.newtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.newtuple->tuple, + &change->data.tp.newtuple->tuple, false); break; case REORDER_BUFFER_CHANGE_DELETE: appendStringInfoString(ctx->out, " DELETE:"); /* if there was no PK, we only know that a delete happened */ - if (change->tp.oldtuple == NULL) + if (change->data.tp.oldtuple == NULL) appendStringInfoString(ctx->out, " (no-tuple-data)"); /* In DELETE, only the replica identity is present; display that */ else tuple_to_stringinfo(ctx->out, tupdesc, - &change->tp.oldtuple->tuple, + &change->data.tp.oldtuple->tuple, true); break; + default: + 
Assert(false); } MemoryContextSwitchTo(old); diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index af3948e..414cfa9 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -586,17 +586,17 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) change = ReorderBufferGetChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_INSERT; - memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); + memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE) { Assert(r->xl_len > (SizeOfHeapInsert + SizeOfHeapHeader)); - change->tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder); + change->data.tp.newtuple = ReorderBufferGetTupleBuf(ctx->reorder); DecodeXLogTuple((char *) xlrec + SizeOfHeapInsert, r->xl_len - SizeOfHeapInsert, - change->tp.newtuple); + change->data.tp.newtuple); } ReorderBufferQueueChange(ctx->reorder, r->xl_xid, buf->origptr, change); @@ -626,7 +626,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) change = ReorderBufferGetChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_UPDATE; - memcpy(&change->tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); + memcpy(&change->data.tp.relnode, &xlrec->target.node, sizeof(RelFileNode)); data = (char *) &xl
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: >> No, I think it should do nothing. The coding pattern shown in bug #9464 >> seems perfectly reasonable and I think we should allow it. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and > related things, or you have to fork() and exec(). The code in question is trying to do that. And what do you think will happen if the exec() fails? > At the very least, not > integrating the child with the postmaster's lifetime will prevent > postgres from restarting because there's still a child attached to the > shared memory. I think you're willfully missing the point. The reason we added atexit_callback was to try to defend ourselves against third-party code that did things in a non-Postgres-aware way. Arguing that such code should do things in a Postgres-aware way is not helpful for the concerns here, and it's not relevant to reality either, because people will load stuff like libperl into backends. Good luck getting a post-fork on_exit_reset() call into libperl. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:10 PM, Tom Lane wrote:
> Florian Weimer writes:
>> On 03/07/2014 03:57 PM, Tom Lane wrote:
>>> It's not a reason not to do something about the much larger chance of
>>> this happening in a direct child process, which certainly won't have a
>>> matching PID.
>> Indeed. Checking getppid() in addition might narrow things down further.
> I don't think getppid adds much to the party. In particular, what to do
> if it returns 1? You can't tell if you're an orphaned backend (in which
> case you should still do normal shutdown)

Oh. I didn't know that orphaned backends perform a normal shutdown.

-- Florian Weimer / Red Hat Product Security Team

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Florian Weimer writes: > On 03/07/2014 03:57 PM, Tom Lane wrote: >> It's not a reason not to do something about the much larger chance of >> this happening in a direct child process, which certainly won't have a >> matching PID. > Indeed. Checking getppid() in addition might narrow things down further. I don't think getppid adds much to the party. In particular, what to do if it returns 1? You can't tell if you're an orphaned backend (in which case you should still do normal shutdown) or an orphaned grandchild. The standalone-backend case would also complicate matters. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 09:49:05 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2014-03-07 00:03:48 -0500, Tom Lane wrote:
> >> In the bug thread I proposed making atexit_callback check whether getpid()
> >> still matches MyProcPid.
>
> > What are you proposing to do in that case? This is only one of the
> > failure cases of forking carelessly, right?
>
> No, I think it should do nothing. The coding pattern shown in bug #9464
> seems perfectly reasonable and I think we should allow it. No doubt it's
> safer if the child process does an on_exit_reset; but right now, if the
> child fails to do so, atexit_callback is actively breaking things. And
> I don't think we can rely on third-party libraries to call on_exit_reset
> after forking.

I don't think it's a reasonable pattern to run background processes that aren't managed by postgres with all shared memory still accessible. You'll have to either also detach from shared memory and related things, or you have to fork() and exec(). At the very least, not integrating the child with the postmaster's lifetime will prevent postgres from restarting because there's still a child attached to the shared memory.

I don't see any way it'd be safe for a pg unaware library to start forking around and doing similar random things inside normal backends.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] on_exit_reset fails to clear DSM-related exit actions
I just noticed that the DSM patch has introduced a whole new class of failures related to the bug #9464 issue: to wit, any on_detach actions registered in a parent process will also be performed when a child process exits, because nothing has been added to on_exit_reset to prevent that. It seems likely that this is undesirable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 03:57 PM, Tom Lane wrote:
> I think Florian's right that there's a risk there, but it seems pretty
> remote, and I don't see any reliable way to detect the case anyhow.
> (Process start time? Where would you get that from portably?)

I don't think there's a portable source for that. On Linux, you'd have to use /proc.

> It's not a reason not to do something about the much larger chance of
> this happening in a direct child process, which certainly won't have a
> matching PID.

Indeed. Checking getppid() in addition might narrow things down further.

On Linux, linking against pthread_atfork currently requires linking against pthread (although this is about to change), and it might incur the pthread-induced overhead on some configurations.

-- Florian Weimer / Red Hat Product Security Team

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Heikki Linnakangas writes: > On 03/07/2014 04:23 PM, Florian Weimer wrote: >> There's the PID reuse problem. Forking twice (with a delay) could end >> up with the same PID as MyProcPid. > Not if the parent process is still running. If the original parent backend is *not* still running, then running atexit_callback in the grandchild is just as dangerous if not more so; it could be clobbering shared-memory state belonging to some other session that has recycled the same PGPROC. I think Florian's right that there's a risk there, but it seems pretty remote, and I don't see any reliable way to detect the case anyhow. (Process start time? Where would you get that from portably?) It's not a reason not to do something about the much larger chance of this happening in a direct child process, which certainly won't have a matching PID. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 00:03:48 -0500, Tom Lane wrote: >> In the bug thread I proposed making atexit_callback check whether getpid() >> still matches MyProcPid. > What are you proposing to do in that case? This is only one of the > failure cases of forking carelessly, right? No, I think it should do nothing. The coding pattern shown in bug #9464 seems perfectly reasonable and I think we should allow it. No doubt it's safer if the child process does an on_exit_reset; but right now, if the child fails to do so, atexit_callback is actively breaking things. And I don't think we can rely on third-party libraries to call on_exit_reset after forking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/07/2014 10:07 PM, Craig Ringer wrote:
> On 03/07/2014 09:33 PM, Craig Ringer wrote:
>> On 03/05/2014 11:02 AM, Craig Ringer wrote:
>>> The main known issue remaining is plan invalidation.
>>
>> I've pushed a version with a plan invalidation implementation. It's tagged:
>>
>> rls-9.4-upd-sb-views-v9
>>
>> in
>>
>> g...@github.com:ringerc/postgres.git
>>
>> The invalidation implementation does not yet handle foreign key checks;
>> that will require additional changes. I'll push an update to the
>> rls-9.4-upd-sb-views and post an update later, at which time I'll rebase
>> the changes back into the history.
>
> Well, that was easy. Done.
>
> rls-9.4-upd-sb-views-v11
>
> and rebased the rls-9.4-upd-sb-views branch to incorporate the changes.
>
> The regression tests have further failures, but some are due to changes
> in the inheritance semantics. I'm going through them now.

Need a quick opinion. KaiGai's original code produced a plan like this for an inheritance set:

 EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!             QUERY PLAN
! ---------------------------------------
  LockRows
!   ->  Append
!         ->  Subquery Scan on t1
!               Filter: f_leak(t1.b)
!               ->  Seq Scan on t1 t1_1
!                     Filter: ((a % 2) = 0)
!         ->  Subquery Scan on t2
!               Filter: f_leak(t2.b)
!               ->  Seq Scan on t2 t2_1
!                     Filter: ((a % 2) = 1)
!         ->  Seq Scan on t3
!               Filter: f_leak(b)
 (12 rows)

The new code, using updatable s.b. views, instead produces:

 EXPLAIN (costs off) SELECT * FROM t1 WHERE f_leak(b) FOR SHARE;
!             QUERY PLAN
! ---------------------------------------
  LockRows
!   ->  Subquery Scan on t1
!         Filter: f_leak(t1.b)
!         ->  LockRows
!               ->  Result
!                     ->  Append
!                           ->  Seq Scan on t1 t1_1
!                                 Filter: ((a % 2) = 0)
!                           ->  Seq Scan on t2
!                                 Filter: ((a % 2) = 0)
!                           ->  Seq Scan on t3
!                                 Filter: ((a % 2) = 0)
 (12 rows)

The different quals are expected, because of the change to the definition of inheritance handling in row security. What I'm concerned about is the locking.
It looks to me like we're causing the user to lock rows that they may not intend to lock, by applying a LockRows step *before* the user-supplied qual. (I'm going to test that tomorrow; it's sleep time in Australia.) This seems to be related to RowMark handling in updatable security barrier views. I need to check whether it happens with updates to security barrier views, as well as with row security. Dean, any thoughts? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 00:03:48 -0500, Tom Lane wrote: > In the bug thread I proposed making atexit_callback check whether getpid() > still matches MyProcPid. What are you proposing to do in that case? This is only one of the failure cases of forking carelessly, right? I think the only appropriate thing would be to make as much noise as possible in that case, which is probably something like writing a message to stderr, and then abort(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:23 PM, Florian Weimer wrote: On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Not if the parent process is still running. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 07:18:04PM +0530, Amit Kapila wrote:
> > OK, done with the attached patch. Three is returned for status, but one
> > for other cases.
>
> I think it would have been better if it returned status 4 for the cases where
> a directory/file is not accessible (the new cases added by this patch).
>
> I think status 3 should be returned only when the data directory is valid
> and the pid file is missing, because in a script, after getting this code,
> the next thing it will probably do is start the server, which doesn't seem
> a good fit for the newly added cases.

OK, I have updated the attached patch to reflect this, and added a C comment.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + Everyone has their own god. +

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index 0dbaa6e..56d238f
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** static bool allow_core_files = false;
*** 97,102 ****
--- 97,103 ----
  static time_t start_time;

  static char postopts_file[MAXPGPATH];
+ static char version_file[MAXPGPATH];
  static char pid_file[MAXPGPATH];
  static char backup_file[MAXPGPATH];
  static char recovery_file[MAXPGPATH];
*** static void pgwin32_doRunAsService(void)
*** 152,158 ****
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
! static pgpid_t get_pgpid(void);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
--- 153,159 ----
  static int	CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_service);
  #endif
! static pgpid_t get_pgpid(bool is_status_request);
  static char **readfile(const char *path);
  static void free_readfile(char **optlines);
  static int	start_postmaster(void);
*** print_msg(const char *msg)
*** 246,255 ****
  }

  static pgpid_t
! get_pgpid(void)
  {
  	FILE	   *pidf;
  	long		pid;

  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
--- 247,280 ----
  }

  static pgpid_t
! get_pgpid(bool is_status_request)
  {
  	FILE	   *pidf;
  	long		pid;
+ 	struct stat statbuf;
+
+ 	if (stat(pg_data, &statbuf) != 0)
+ 	{
+ 		if (errno == ENOENT)
+ 			printf(_("%s: directory \"%s\" does not exist\n"), progname,
+ 				   pg_data);
+ 		else
+ 			printf(_("%s: cannot access directory \"%s\"\n"), progname,
+ 				   pg_data);
+
+ 		/*
+ 		 * The Linux Standard Base Core Specification 3.1 says this should
+ 		 * return '4, program or service status is unknown'
+ 		 * https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
+ 		 */
+ 		exit(is_status_request ? 4 : 1);
+ 	}
+
+ 	if (stat(version_file, &statbuf) != 0 && errno == ENOENT)
+ 	{
+ 		printf(_("%s: directory \"%s\" is not a database cluster directory\n"),
+ 			   progname, pg_data);
+ 		exit(is_status_request ? 4 : 1);
+ 	}

  	pidf = fopen(pid_file, "r");
  	if (pidf == NULL)
*** do_start(void)
*** 810,816 ****

  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid();
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
--- 835,841 ----

  	if (ctl_command != RESTART_COMMAND)
  	{
! 		old_pid = get_pgpid(false);
  		if (old_pid != 0)
  			write_stderr(_("%s: another server might be running; "
  						   "trying to start server anyway\n"),
*** do_stop(void)
*** 894,900 ****
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid();

  	if (pid == 0)				/* no pid file */
  	{
--- 919,925 ----
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid(false);

  	if (pid == 0)				/* no pid file */
  	{
*** do_stop(void)
*** 943,949 ****

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid()) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
--- 968,974 ----

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid(false)) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
*** do_restart(void)
*** 980,986 ****
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid();

  	if (pid == 0)				/* no pid file */
  	{
--- 1005,1011 ----
  	pgpid_t		pid;
  	struct stat statbuf;

! 	pid = get_pgpid(false);

  	if (pid == 0)				/* no pid file */
  	{
*** do_restart(void)
*** 1033,1039 ****

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid()) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
--- 1058,1064 ----

  	for (cnt = 0; cnt < wait_seconds; cnt++)
  	{
! 		if ((pid = get_pgpid(false)) != 0)
  		{
  			print_msg(".");
  			pg_usleep(1000000);		/* 1 sec */
*** do_reload(void)
*** 1071,1077 ****
  {
  	pgpid_t		pid;

! 	pid = ge
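The exit-code policy Bruce and Amit settle on above boils down to a small decision: the new "missing or invalid data directory" cases return the LSB "status unknown" code (4) when invoked via `pg_ctl status`, and a generic failure (1) otherwise, while 3 stays reserved for "valid directory, server not running". A sketch of that mapping with invented names (not the actual pg_ctl code):

```c
/*
 * Sketch of the agreed pg_ctl exit-code policy (hypothetical helper, not
 * the real pg_ctl source).  Per LSB 3.1 init-script conventions:
 * 3 = program is not running, 4 = program or service status is unknown.
 */
enum pgctl_check
{
	CHECK_OK,					/* data dir and PG_VERSION both present */
	CHECK_DIR_MISSING,			/* data directory missing or inaccessible */
	CHECK_NOT_A_CLUSTER			/* dir exists but no PG_VERSION file */
};

static int
pgctl_failure_exit_code(enum pgctl_check check, int is_status_request)
{
	if (check == CHECK_OK)
		return 0;

	/* init scripts pass the "status" exit code straight through ... */
	if (is_status_request)
		return 4;

	/* ... every other pg_ctl command just fails with a generic error */
	return 1;
}
```

The point of distinguishing 3 from 4 is that a wrapper script seeing 3 may reasonably try to start the server, which would be wrong for a missing or non-cluster directory.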
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Comparing the process start time would protect against that. Checking getppid() would have the same theoretical problem. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 2/27/14, 6:04 AM, Dimitri Fontaine wrote:
> > What about allowing a control file like this:
> >
> >    # hstore extension
> >    comment = 'data type for storing sets of (key, value) pairs'
> >    default_version = '1.3'
> >    directory = 'local/hstore-new'
> >    module_pathname = '$directory/hstore'
> >    relocatable = true
>
> I think your previously proposed patch to add extension_control_path
> plus my suggestion to update existing de facto best practices to not
> include $libdir in the module path name (thus allowing the use of
> dynamic_library_path) will address all desired use cases just fine.

You still haven't addressed how to deal with the case of multiple .so's with the same name ending up in the dynamic_library_path. I don't see that as unlikely to end up happening, either.

> Moreover, going that way would reuse existing facilities and concepts,
> remove indirections and reduce overall complexity. This new proposal,
> on the other hand, would go the other way, introducing new concepts,
> adding more indirections, and increasing overall complexity, while
> actually achieving less.

Being able to have a self-contained module which requires a minimum of modification to postgresql.conf is a reduction in complexity, imv. Having to maintain two config options which will end up being overly long and mostly duplicated doesn't make things easier for people.

This has made me wonder if we could allow a control file to be explicitly referred to from CREATE EXTENSION itself, dropping the need for any of this postgresql.conf/GUC maintenance. There are downsides to that approach as well, of course, but it's definitely got a certain appeal.

> I see an analogy here. What we are currently doing is similar to
> hardcoding absolute rpaths into all libraries. Your proposal is
> effectively to (1) add the $ORIGIN mechanism and (2) make people use
> chrpath when they want to install somewhere else.
> My proposal is to get rid of all rpaths and just set a search path.
> Yes, on a technical level, this is less powerful, but it's simpler,
> gets the job done, and is harder to misuse.

I don't buy this analogy. For starters, you can change the control file without needing to rebuild the library, but the main difference is that, in practice, there are no library transitions happening; instead, what we're likely to have are independent and *incompatible* libraries living with the same name in our path.

> A problem with features like these is that they get rarely used but
> offer infinite flexibility, so they are not used consistently and you
> can't rely on anything. This is already the case for the
> module_pathname setting in the control file. It has, AFAICT, no actual
> use, and because of that no one uses it, and because of that, there is
> no guarantee that extensions use it sensibly, and because of that no one
> can ever make sensible use of it in the future, because there is no
> guarantee that extensions have it set sensibly. In fact, I would
> propose deprecating module_pathname.

This makes sense when you have complete control over where things are installed and can drop the control file into the "one true directory of control files", and similarly with the .so. Indeed, that works already today for certain platforms, but from what I understand, on OS X you don't really get to just dump files anywhere on the filesystem that you want, and instead end up forced into a specific directory tree.

Thanks,

Stephen

signature.asc Description: Digital signature
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:
> On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas wrote:
>> Hmm. You suggested ensuring that a scan always has at least a pin, and split
>> takes a vacuum-lock. That ought to work. There's no need for the more
>> complicated maneuvers you described; ISTM that you can just replace the
>> heavy-weight share lock with holding a pin on the primary page of the
>> bucket, and an exclusive lock with a vacuum-lock. Note that
>> _hash_expandtable already takes the exclusive lock conditionally, ie. if it
>> doesn't get the lock immediately it just gives up. We could do the same with
>> the cleanup lock.
>
> We could try that. I assume you mean do *just* what you describe here,
> without the split-in-progress or moved-by-split flags I suggested.

Yep.

> The only issue I see with that is that instead of everyone piling up on
> the heavyweight lock, a wait which is interruptible, they'd all pile up on
> the buffer content lwlock, a wait which isn't. And splitting a bucket can
> involve an arbitrary number of I/O operations, so that's kind of
> unappealing. Even checkpoints would be blocked until the bucket split
> completed, which seems unfortunate.

Hmm. I doubt that's a big deal in practice, although I agree it's a bit ugly. Once we solve the crash-safety of splits, we actually have the option of doing the split in many small steps, even when there's no crash involved. You could for example grab the vacuum-lock, move all the tuples in the first 5 pages, and then release the lock to give other backends that are queued up a chance to do their scans/insertions. Then re-acquire the lock, and continue where you left off. Or just bail out and let the next vacuum or insertion finish it later.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/07/2014 09:33 PM, Craig Ringer wrote: > On 03/05/2014 11:02 AM, Craig Ringer wrote: >> The main known issue remaining is plan invalidation. > > I've pushed a version with a plan invalidation implementation. It's tagged: > > rls-9.4-upd-sb-views-v9 > > in > > g...@github.com:ringerc/postgres.git > > The invalidation implementation does not yet handle foreign key checks; > that will require additional changes. I'll push an update to the > rls-9.4-upd-sb-views and post an update later, at which time I'll rebase > the changes back into the history. Well, that was easy. Done. rls-9.4-upd-sb-views-v11 and rebased the rls-9.4-upd-sb-views branch to incorporate the changes. The regression tests have further failures, but some are due to changes in the inheritance semantics. I'm going through them now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 06, 2014 at 06:14:21PM -0500, Robert Haas wrote: > On Thu, Mar 6, 2014 at 3:44 PM, Jeff Janes wrote: > > > > I've been tempted to implement a new type of hash index that allows both WAL > > and high concurrency, simply by disallowing bucket splits. At the index > > creation time you use a storage parameter to specify the number of buckets, > > and that is that. If you mis-planned, build a new index with more buckets, > > possibly concurrently, and drop the too-small one. > > Yeah, we could certainly do something like that. It sort of sucks, > though. I mean, it's probably pretty easy to know that starting with > the default 2 buckets is not going to be enough; most people will at > least be smart enough to start with, say, 1024. But are you going to > know whether you need 32768 or 1048576 or 33554432? A lot of people > won't, and we have more than enough reasons for performance to degrade > over time as it is. > It would be useful to have a storage parameter for the target size of the index, even if it is not exact, to use in the initial index build to avoid the flurry of i/o caused by bucket splits as the index grows. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
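Jeff's fixed-bucket idea and Ken's target-size storage parameter both reduce to the same arithmetic at index build time: pick a bucket count large enough for the expected tuple load, rounded up to a power of two since each split doubles the bucket count. A rough sketch (the per-bucket capacity figure is an assumed parameter, not a real hash-AM constant):

```c
/*
 * Rough sketch of sizing a hash index up front from a user-supplied
 * target tuple count, instead of growing from the default 2 buckets by
 * repeated splitting.  tuples_per_bucket is an illustrative capacity
 * assumption, not a value from the PostgreSQL hash access method.
 */
static unsigned long
initial_bucket_count(unsigned long target_tuples,
					 unsigned long tuples_per_bucket)
{
	/* buckets needed to hold the target load, rounding up */
	unsigned long needed = (target_tuples + tuples_per_bucket - 1)
		/ tuples_per_bucket;
	unsigned long buckets = 2;	/* hash indexes currently start at 2 */

	/* round up to the next power of two, since splits double the count */
	while (buckets < needed)
		buckets *= 2;
	return buckets;
}
```

Building with a count like this would avoid the flurry of split I/O Ken mentions, at the cost of wasted space if the estimate is far too high.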
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/07/2014 03:48 PM, Robert Haas wrote:
>> So, there seems to be a few fairly simple and independent improvements
>> to be made:
>>
>> 1. Replace the heavy-weight lock with pin & vacuum-lock.
>> 2. Make it crash-safe, by adding WAL-logging.
>> 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM
>>    for the squeeze phase.
>> 4. Cache the metapage.
>>
>> We still don't know if it's going to be any better than B-tree after all
>> that's done, but the only way to find out is to go ahead and implement it.
>
> I'm of the opinion that we ought to start by making splits and vacuuming
> more concurrent, and then do that stuff.

Priorities are a matter of opinion, but note that for many applications the concurrency of splits and vacuuming is pretty much irrelevant (at least after we do bullet point 3 above). Like, if the index is mostly read-only, or at least doesn't grow much. You'd still want it to be crash-safe (2), and you might care a lot about the performance of scans (1 and 4), but if splits and vacuum hold an exclusive lock for a few seconds, that's OK if you only need to do it once in a blue moon.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
[ I just sent a reply to Greg Stark's email which touches on some of the same points you mentioned here; that was mostly written last night, and finished this morning before seeing this. ] On Fri, Mar 7, 2014 at 4:34 AM, Heikki Linnakangas wrote: > Hmm. You suggested ensuring that a scan always has at least a pin, and split > takes a vacuum-lock. That ought to work. There's no need for the more > complicated maneuvers you described, ISTM that you can just replace the > heavy-weight share lock with holding a pin on the primary page of the > bucket, and an exclusive lock with a vacuum-lock. Note that > _hash_expandtable already takes the exclusive lock conditionally, ie. if it > doesn't get the lock immediately it just gives up. We could do the same with > the cleanup lock. We could try that. I assume you mean do *just* what you describe here, without the split-in-progress or moved-by-split flags I suggested. The only issue I see with that is that instead of everyone piling up on the heavyweight lock, a wait which is interruptible, they'd all pile up on the buffer content lwlock, a wait which isn't. And splitting a bucket can involve an arbitrary number of I/O operations, so that's kind of unappealing. Even checkpoints would be blocked until the bucket split completed, which seems unfortunate. But I like the idea of teaching each scan to retain a pin on the primary buffer page. If we then do the split the way I proposed before, we can implement the "outwait concurrent scans" step by taking a cleanup lock on the primary buffer page. In this design, we only need to acquire and release the cleanup lock. Once we get the cleanup lock on the primary buffer page, even for an instant, we know that there are no remaining scans in the bucket that need the pre-split tuples that have now been moved to the new bucket. We can then remove tuples with a lesser lock (or keep the stronger lock if we wish to re-pack). > Vacuum could also be enhanced. 
It currently takes an exclusive lock on the > bucket, then removes any dead tuples and finally "squeezes" the bucket by > moving tuples to earlier pages. But you only really need the exclusive lock > for the squeeze-phase. You could do the dead-tuple removal without the > bucket-lock, and only grab for the squeeze-phase. And the squeezing is > optional, so you could just skip that if you can't get the lock. But that's > a separate patch as well. Yeah, I think this would be a useful improvement. > One more thing we could do to make hash indexes more scalable, independent > of the above: Cache the information in the metapage in backend-private > memory. Then you (usually) wouldn't need to access the metapage at all when > scanning. Store a copy of the bitmask for that bucket in the primary page, > and when scanning, check that it matches the cached value. If not, refresh > the cached metapage and try again. I don't know whether this would be a useful improvement. > So, there seems to be a few fairly simple and independent improvements to be > made: > > 1. Replace the heavy-weight lock with pin & vacuum-lock. > 2. Make it crash-safe, by adding WAL-logging > 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for > the squeeze phase. > 4. Cache the metapage. > > We still don't know if it's going to be any better than B-tree after all > that's done, but the only way to find out is to go ahead and implement it. I'm of the opinion that we ought to start by making splits and vacuuming more concurrent, and then do that stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
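Heikki's metapage-caching idea above amounts to an optimistic read: trust a backend-local copy of the metapage, validate it against a small stamp stored on the bucket's primary page, and refresh only on mismatch. A deliberately toy model with invented names (it elides the re-probe to a different bucket that a real implementation would need after refreshing):

```c
#include <stdint.h>

/*
 * Toy model of caching hash-index metapage state in backend-private
 * memory (all names invented; not PostgreSQL source).  The bucket's
 * primary page is assumed to carry a copy of the hash mask that was
 * current for that bucket; if it differs from the cached mask, the
 * cache is stale and must be refreshed from the (simulated) metapage.
 */
struct meta_cache
{
	uint32_t	lowmask;		/* cached hash mask from the metapage */
	int			valid;			/* has the cache been populated? */
};

static uint32_t
bucket_for_hash(struct meta_cache *cache, uint32_t hashval,
				uint32_t mask_on_primary_page,	/* stamp read from page */
				uint32_t current_meta_mask)		/* truth in the metapage */
{
	if (!cache->valid || cache->lowmask != mask_on_primary_page)
	{
		/* stale or empty cache: re-read the metapage */
		cache->lowmask = current_meta_mask;
		cache->valid = 1;
	}
	return hashval & cache->lowmask;
}
```

The payoff is that the common-case scan touches only the bucket's primary page, never the contended metapage.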
Re: [HACKERS] pg_ctl status with nonexistent data directory
On Fri, Mar 7, 2014 at 9:41 AM, Bruce Momjian wrote:
> On Fri, Mar 7, 2014 at 09:07:24AM +0530, Amit Kapila wrote:
>> One reason could be that as we are already returning a special exit code
>> for the 'status' option of pg_ctl (exit(3)), this seems to be in line with it,
>> as this also gets called during the status command.
>>
>> Also, in the link sent by you in another mail, I could see a statement
>> which indicates it is more important to return correct codes in the case of
>> status, which sounds like a good reason to me.
>>
>> Link and statement:
>> http://www.postgresql.org/message-id/51d1e482.5090...@gmx.net
>>
>> "The "status" case is different, because there the exit code
>> can be passed out by the init script directly."
>>
>> If we just want to handle correct exit codes for the "status" option, then
>> we may need to refactor the code a bit to ensure the same.
>
> OK, done with the attached patch. Three is returned for status, but one
> for other cases.

I think it would have been better if it returned status 4 for the cases where a directory/file is not accessible (the new cases added by this patch).

I think status 3 should be returned only when the data directory is valid and the pid file is missing, because in a script, after getting this code, the next thing it will probably do is start the server, which doesn't seem a good fit for the newly added cases.

Other than that, the patch seems fine to me.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-security on updatable s.b. views
On 03/05/2014 11:02 AM, Craig Ringer wrote: > The main known issue remaining is plan invalidation. I've pushed a version with a plan invalidation implementation. It's tagged: rls-9.4-upd-sb-views-v9 in g...@github.com:ringerc/postgres.git The invalidation implementation does not yet handle foreign key checks; that will require additional changes. I'll push an update to the rls-9.4-upd-sb-views and post an update later, at which time I'll rebase the changes back into the history. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.9.1
On 2014-03-07 10:17:21 -0300, Alvaro Herrera wrote: > Andres Freund escribió: > > > fprintf(stderr, > > - _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d fields\n"), > > - progname, PQntuples(res), > > PQnfields(res), 1, 3); > > + _("%s: could not identify system: got > > %d rows and %d fields, expected 1 row and 3 or more fields\n"), > > + progname, PQntuples(res), > > PQnfields(res)); > > Please don't change this. The reason these messages use %d and an extra > printf argument is to avoid giving translators extra work when the > number of rows or fields is changed. In these cases I suggest this: > > > - _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d fields\n"), > > - progname, PQntuples(res), > > PQnfields(res), 1, 3); > > + _("%s: could not identify system: got > > %d rows and %d fields, expected %d rows and %d or more fields\n"), > > + progname, PQntuples(res), > > PQnfields(res), 1, 3); > > (Yes, I know the "expected 1 rows" output looks a bit silly. Since this > is an unexpected error message anyway, I don't think that's worth > fixing.) I changed it to not use placeholders because I thought "or more" was specific enough to be unlikely to be used in other places, but I don't have a problem with continuing to use them. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC on WAL-logging hash indexes
On Thu, Mar 6, 2014 at 7:07 PM, Greg Stark wrote: > On Thu, Mar 6, 2014 at 11:14 PM, Robert Haas wrote: >>> I've been tempted to implement a new type of hash index that allows both WAL >>> and high concurrency, simply by disallowing bucket splits. At the index >>> creation time you use a storage parameter to specify the number of buckets, >>> and that is that. If you mis-planned, build a new index with more buckets, >>> possibly concurrently, and drop the too-small one. >> >> Yeah, we could certainly do something like that. It sort of sucks, >> though. I mean, it's probably pretty easy to know that starting with >> the default 2 buckets is not going to be enough; most people will at >> least be smart enough to start with, say, 1024. But are you going to >> know whether you need 32768 or 1048576 or 33554432? A lot of people >> won't, and we have more than enough reasons for performance to degrade >> over time as it is. > > The other thought I had was that you can do things lazily in vacuum. > So when you probe you need to check multiple pages until vacuum comes > along and rehashes everything. I was thinking about this, too. Cleaning up the old bucket after the split is pretty similar to vacuuming. And lo and behold, vacuum also needs to lock the entire bucket. AFAICT, there are two reasons for this. First, when we resume a suspended scan, we assume that the next match on the page, if any, will occur on the same page at an offset greater than the offset where we found the previous match. The code copes with the possibility of current insertions by refinding the last item we returned and scanning forward from there, but assumes the pointer we're refinding can't have moved to a lower offset. I think this could be easily fixed - at essentially no cost - by changing hashgettuple so that, if the forward scan errors out, it tries scanning backward rather than just giving up. 
Second, vacuum compacts each bucket that it modifies using _hash_squeezebucket, which scans from the two ends of the index in toward the middle, filling any free space on earlier pages by pulling tuples from the end of the bucket chain. This is a little thornier. We could change vacuum so that it only removes TIDs from the individual pages, without actually trying to free up pages, but that seems undesirable. However, I think we might be able to get by with making bucket compaction less aggressive, without eliminating it altogether. Suppose that whenever we remove items from a page, we consider consolidating the page with its immediate predecessor and successor in the bucket chain. This means our utilization will be a little over 50% in the worst case where we have a full page, a page with one item, another full page, another page with one item, and so on. But that's not all bad, because compacting a bucket chain to the minimum possible size when it may well be about to suffer more inserts isn't necessarily a good thing anyway. Also, doing this means that we don't need to lock out scans from the entire bucket chain in order to compact. It's sufficient to make sure that nobody's in the middle of scanning the two pages we want to merge. That turns out not to be possible right now. When a scan is suspended, we still hold a pin on the page we're scanning. But when _hash_readnext or _hash_readprev walks the bucket chain, it drops both lock and pin on the current buffer and then pins and locks the new buffer. That, however, seems like it could easily be changed: drop lock on current buffer, acquire pin on new buffer, drop pin on current buffer, lock new buffer. Then, if we want to merge two buffers, it's enough to lock both of them for cleanup. 
To avoid any risk of deadlock, and also to avoid waiting for a long-running suspended scan to wake up and complete, we can do this conditionally; if we fail to get either cleanup lock, then just don't merge the pages; take an exclusive lock only and remove whatever you need to remove, leaving the rest. Merging pages is only a performance optimization, so if it fails now and then, no real harm done. (A side benefit of this approach is that we could opportunistically attempt to compact pages containing dead items any time we can manage a ConditionalLockBufferForCleanup() on the page, sort of like what HOT pruning does for heap blocks. We could even, after pruning away dead items, attempt to merge the page with siblings, so that even without vacuum the bucket chains can gradually shrink if the index tuples are discovered to be pointing to dead heap tuples.) With the above changes, vacuuming a bucket can happen without taking the heavyweight lock in exclusive mode, leaving bucket splitting as the only operation that requires it. And we could perhaps use basically the same algorithm to clean the tuples out of the old bucket after the split. The problem is that, when we're vacuuming, we know that no scan currently in progress can still care about any of the tuples we're removing.
Re: [HACKERS] Changeset Extraction v7.9.1
Andres Freund escribió: > fprintf(stderr, > - _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d fields\n"), > - progname, PQntuples(res), > PQnfields(res), 1, 3); > + _("%s: could not identify system: got > %d rows and %d fields, expected 1 row and 3 or more fields\n"), > + progname, PQntuples(res), > PQnfields(res)); Please don't change this. The reason these messages use %d and an extra printf argument is to avoid giving translators extra work when the number of rows or fields is changed. In these cases I suggest this: > - _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d fields\n"), > - progname, PQntuples(res), > PQnfields(res), 1, 3); > + _("%s: could not identify system: got > %d rows and %d fields, expected %d rows and %d or more fields\n"), > + progname, PQntuples(res), > PQnfields(res), 1, 3); (Yes, I know the "expected 1 rows" output looks a bit silly. Since this is an unexpected error message anyway, I don't think that's worth fixing.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
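The convention Álvaro describes keeps the translatable string stable while the expected counts remain ordinary printf arguments. A standalone illustration, with `snprintf` into a buffer standing in for the `fprintf(stderr, _(...))` call in pg_basebackup (the helper name is invented):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration of the recommended message style: the expected row and
 * field counts stay as %d arguments, so the translatable msgid does not
 * change when those numbers change in a later release.  The "or more"
 * wording is baked into the string; only the counts are parameters.
 */
static int
format_identify_error(char *buf, size_t buflen, int ntuples, int nfields)
{
	return snprintf(buf, buflen,
					"could not identify system: got %d rows and %d fields, "
					"expected %d rows and %d or more fields",
					ntuples, nfields, 1, 3);
}
```

If the expected counts later change to, say, 2 and 5, only the trailing arguments change; translators never see a new string.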
Re: [HACKERS] Changeset Extraction v7.9.1
Hi, On 2014-03-05 23:20:57 +0100, Andres Freund wrote: > On 2014-03-05 17:05:24 -0500, Robert Haas wrote: > > > I very much dislike having the three different event loops, but it's > > > pretty much forced by the design of the xlogreader. "My" xlogreader > > > version didn't block when it neeeded to wait for WAL but just returned > > > "need input/output", but with the eventually committed version you're > > > pretty much forced to block inside the read_page callback. > > > > > > I don't really have a idea how we could sensibly unify them atm. > > > > WalSndLoop(void (*gutsfn)())? > > The problem is that they are actually different. In the WalSndLoop we're > also maintaining the walsender's state, in WalSndWriteData() we're just > waiting for writes to be flushed, in WalSndWaitForWal we're primarily > waiting for the flush pointer to pass some LSN. And the timing of the > individual checks isn't trivial (just added some more comments about > it). > > I'll simplify it by pulling out more common code, maybe it'll become > apparent how it should look. I've attached a new version of the walsender patch. It's been rebased ontop of Heikki's latest commit to walsender.c. I've changed a fair bit of stuff: * The sleeptime is now computed to sleep until we either need to send a keepalive or kill ourselves, as Heikki sugggested. * Sleep time computation, sending pings, checking timeouts is now done in separate functions. * Comment and codestyle improvements. Although they are shorter and simpler now, I have not managed to unify the three loops however. They seem to be too different to unify them inside one. I tried a common function with an 'wait_for' bitmask argument, but that turned out to be fairly illegible. The checks in WalSndWaitForWal() and WalSndLoop() just seem to be too different. I'd be grateful if you (or somebody else!) could have a quick look at body of the loops in WalSndWriteData(), WalSndWaitForWal() and WalSndLoop(). 
Maybe I am just staring at it the wrong way. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From ca84cd3d2966f0e7297d1c7270b094122eec2f18 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Wed, 5 Mar 2014 00:15:38 +0100 Subject: [PATCH] Add walsender interface for the logical decoding functionality. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This exposes the changeset extraction feature via walsenders which allows the data to be received in a streaming fashion and supports synchronous replication. To do this walsenders need to be able to connect to a specific database. For that it extends the existing 'replication' parameter to not only allow a boolean value but also "database". If the latter is specified we connect to the database specified in 'dbname'. Andres Freund, with contributions from Álvaro Herrera, reviewed by Robert Haas. --- doc/src/sgml/protocol.sgml | 24 +- src/backend/postmaster/postmaster.c| 28 +- .../libpqwalreceiver/libpqwalreceiver.c| 4 +- src/backend/replication/repl_gram.y| 81 +- src/backend/replication/repl_scanner.l | 1 + src/backend/replication/walsender.c| 914 ++--- src/backend/utils/init/postinit.c | 15 +- src/bin/pg_basebackup/pg_basebackup.c | 6 +- src/bin/pg_basebackup/pg_receivexlog.c | 6 +- src/bin/pg_basebackup/receivelog.c | 6 +- src/include/replication/walsender.h| 1 + src/tools/pgindent/typedefs.list | 1 + 12 files changed, 918 insertions(+), 169 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index d36f2f3..510bf9a 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1302,10 +1302,13 @@ To initiate streaming replication, the frontend sends the -replication parameter in the startup message. This tells the -backend to go into walsender mode, wherein a small set of replication commands -can be issued instead of SQL statements. 
Only the simple query protocol can be -used in walsender mode. +replication parameter in the startup message. A boolean value +of true tells the backend to go into walsender mode, wherein a +small set of replication commands can be issued instead of SQL statements. Only +the simple query protocol can be used in walsender mode. +Passing a database as the value instructs walsender to connect to +the database specified in the dbname parameter which will in future +allow some additional commands to the ones specified below to be run. The commands accepted in walsender mode are: @@ -1315,7 +1318,7 @@ The commands accepted in walsender mode are: Requests the server to identify itself. Server replies with a result - set of a single row, containing th
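Andres's note above about the reworked sleeptime computation (sleep only until a keepalive or a timeout-kill is due) can be modeled with a toy function like this. The keepalive-at-half-the-timeout scheme and all names here are illustrative, not the actual walsender code, which works with TimestampTz values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the sleep-time computation described above: wake up at
 * the earlier pending deadline -- send a keepalive halfway through
 * wal_sender_timeout, kill the connection at the full timeout.
 * Times are plain milliseconds here, not TimestampTz.
 */
static int64_t
compute_sleeptime(int64_t now, int64_t last_reply, int64_t wal_sender_timeout)
{
    int64_t keepalive_at = last_reply + wal_sender_timeout / 2;
    int64_t kill_at = last_reply + wal_sender_timeout;

    /* if the keepalive deadline already passed, the next event is the kill */
    int64_t wakeup = (now < keepalive_at) ? keepalive_at : kill_at;
    int64_t sleeptime = wakeup - now;

    return sleeptime > 0 ? sleeptime : 0;   /* never sleep a negative time */
}
```

With a 60000 ms timeout and a reply just received, this sleeps 30000 ms until the keepalive is due, then at most another 30000 ms before the timeout check fires.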
Re: [HACKERS] Hot standby doesn't come up on some situation.
Hello, > > After all, I have confirmed that this fixes the problem on crash > > recovery of hot-standby both for 9.3 and HEAD and no problem was > > found except unreadability :( > > Ok, committed. Thanks! Thank you. > Any concrete suggestions about the readability? Is there some > particular spot that needs clarifying? Well, although I no longer see a problem there after understanding how it works :-), I'll write about two points that I had difficulty understanding. | * (or the checkpoint record itself, if it's a shutdown checkpoint). | */ | if (checkPoint.redo < RecPtr) First, it was a bit tough to confirm the equivalence between (redo == RecPtr) and the checkpoint being a shutdown checkpoint. Although I was finally convinced that it surely holds, that is actually not the point. The point here is in the first half of the phrase. The comment might be less perplexing if it were as follows, even if only a shutdown checkpoint satisfies the condition. But it would raise another question in readers' minds. | * (or the checkpoint record itself, e.g. if it's a shutdown checkpoint). Second, the added code depends on the assumption that RecPtr points to the checkpoint record and EndRecPtr points to the next record after it. It would be better for understandability and stability (against future code changes) to explicitly declare the precondition, like this: | Here RecPtr points to the checkpoint record and EndRecPtr points to the | place for the record just after it. > > Surely this is the consequence of an illegal operation, but I think > > it is also not an issue for an assertion - which fires on something > > wrong in the design or in quite rare cases (this case?). > Ah, I see. Yes, that's definitely a bug. If you don't hit the > assertion, because the oldestActiveXID is set in the checkpoint record > even though wal_level is 'archive', or if you simply have assertions > disabled, the system will start up in hot standby mode even though > it's not safe. 
> > > So it might be > > better to show a message as below in that case. > > > > | FATAL: Checkpoint doesn't have valid oldest active transaction id > > | HINT: Reading WAL might have been written under insufficient > > | wal_level. Agreed. > Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though > wal_level is 'archive'. So the above patch doesn't fix the whole > problem. > > The real bug here is that CheckRequiredParameterValues() tests for > InArchiveRecovery, when it should be testing for > ArchiveRecoveryRequested. Otherwise, the checks are not performed when > going through crash recovery followed by archive recovery. I > should've changed that as part of the commit that added the crash > recovery then archive recovery behavior. > > Fixed, thanks for pointing it out! It's my pleasure. Regards, -- Kyotaro Horiguchi NTT Open Source Software Center
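The distinction Heikki points out (InArchiveRecovery vs. ArchiveRecoveryRequested) can be reduced to a toy predicate. The names, the integer encoding of wal_level, and the check itself are illustrative only, not the real CheckRequiredParameterValues():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the fixed check: the wal_level sanity tests must run
 * whenever archive recovery was *requested*, not only once we are
 * already *in* archive recovery -- otherwise crash recovery followed by
 * archive recovery skips them and hot standby starts up unsafely.
 * wal_level: 0 = minimal, 1 = archive, 2 = hot_standby (9.3-era levels).
 */
static bool
hot_standby_startup_allowed(bool archive_recovery_requested,
                            int primary_wal_level)
{
    if (!archive_recovery_requested)
        return false;               /* plain crash recovery: no hot standby */
    return primary_wal_level >= 2;  /* primary must have logged enough info */
}
```

Testing against the *requested* flag is what makes a WAL stream written under wal_level = 'archive' get rejected even when the server first went through plain crash recovery.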
Re: [HACKERS] extension_control_path
On 2/27/14, 6:04 AM, Dimitri Fontaine wrote: > What about allowing a control file like this: > ># hstore extension >comment = 'data type for storing sets of (key, value) pairs' >default_version = '1.3' >directory = 'local/hstore-new' >module_pathname = '$directory/hstore' >relocatable = true I think your previously proposed patch to add extension_control_path, plus my suggestion to update existing de facto best practices to not include $libdir in the module path name (thus allowing the use of dynamic_library_path), will address all desired use cases just fine. Moreover, going that way would reuse existing facilities and concepts, remove indirections, and reduce overall complexity. This new proposal, on the other hand, would go the other way, introducing new concepts, adding more indirections, and increasing overall complexity, while actually achieving less. I see an analogy here. What we are currently doing is similar to hardcoding absolute rpaths into all libraries. Your proposal is effectively to (1) add the $ORIGIN mechanism and (2) make people use chrpath when they want to install somewhere else. My proposal is to get rid of all rpaths and just set a search path. Yes, on a technical level, this is less powerful, but it's simpler, gets the job done, and is harder to misuse. A problem with features like these is that they are rarely used but offer infinite flexibility, so they are not used consistently and you can't rely on anything. This is already the case for the module_pathname setting in the control file. It has, AFAICT, no actual use, and because of that no one uses it, and because of that there is no guarantee that extensions use it sensibly, and because of that no one can ever make sensible use of it in the future, because there is no guarantee that extensions have it set sensibly. In fact, I would propose deprecating module_pathname. 
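To make the search-path approach concrete, here is a hedged sketch of what the two pieces could look like together. All paths are purely illustrative, and extension_control_path is the GUC from the proposal under discussion, not a released PostgreSQL setting:

```ini
# postgresql.conf -- illustrative paths only
dynamic_library_path = '/opt/pg-extensions/lib:$libdir'
extension_control_path = '/opt/pg-extensions/share'   # proposed GUC, not in core

# hstore.control -- no $libdir, no directory/module_pathname indirection;
# the shared library is located via dynamic_library_path instead
comment = 'data type for storing sets of (key, value) pairs'
default_version = '1.3'
relocatable = true
```

The point of the sketch is that relocation then needs no per-extension rewriting: moving an extension only requires adjusting the two search paths.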
Re: [HACKERS] missing RelationCloseSmgr in FreeFakeRelcacheEntry?
On 03/06/2014 02:18 AM, Bruce Momjian wrote: On Tue, Nov 5, 2013 at 08:36:32PM +0100, Andres Freund wrote: On 2013-11-04 13:48:32 +0100, Andres Freund wrote: What about just unowning the smgr entry with if (rel->rd_smgr != NULL) smgrsetowner(NULL, rel->rd_smgr) when closing the fake relcache entry? That shouldn't require any further changes than changing the comment in smgrsetowner() that this isn't something expected to frequently happen. Attached is a patch doing it like that; it required slightly more invasive changes than that. With the patch applied we survive replay of a primary's make check run under valgrind without warnings. Where are we on this patch? Committed now, with small changes. I made the new smgrclearowner function check that the SmgrRelation object is indeed owned by the owner we expect, so that it doesn't unown it if it's actually owned by someone else. That shouldn't happen, but it seemed prudent to check. Thanks Andres. I tried to reproduce the valgrind message you reported, but couldn't. How did you do it? Did this commit fix it? And thanks for the nudge, Bruce. - Heikki
Re: [HACKERS] GSoC on WAL-logging hash indexes
On 03/06/2014 09:34 PM, Robert Haas wrote: On Thu, Mar 6, 2014 at 8:11 AM, Heikki Linnakangas wrote: I don't think it's necessary to improve concurrency just to get WAL-logging. Better concurrency is a worthy goal of its own, of course, but it's a separate concern. To some extent, I agree, but only to some extent. To make hash indexes generally usable, we really need to solve both problems. When I got rid of just some of the heavyweight locking in commit 76837c1507cb5a5a0048046233568447729e66dd, the results were pretty dramatic at higher levels of concurrency: http://www.postgresql.org/message-id/CA+Tgmoaf=nojxlyzgcbrry+pe-0vll0vfhi6tjdm3fftvws...@mail.gmail.com But there was still an awful lot of contention inside the heavyweight lock manager, and I don't think hash indexes are going to be ready for prime time until we solve that problem. Hmm. You suggested ensuring that a scan always has at least a pin, and that a split takes a vacuum-lock. That ought to work. There's no need for the more complicated maneuvers you described; ISTM that you can just replace the heavy-weight share lock with holding a pin on the primary page of the bucket, and an exclusive lock with a vacuum-lock. Note that _hash_expandtable already takes the exclusive lock conditionally, ie. if it doesn't get the lock immediately it just gives up. We could do the same with the cleanup lock. Vacuum could also be enhanced. It currently takes an exclusive lock on the bucket, then removes any dead tuples and finally "squeezes" the bucket by moving tuples to earlier pages. But you only really need the exclusive lock for the squeeze phase. You could do the dead-tuple removal without the bucket-lock, and only grab it for the squeeze phase. And the squeezing is optional, so you could just skip that if you can't get the lock. But that's a separate patch as well. One more thing we could do to make hash indexes more scalable, independent of the above: Cache the information in the metapage in backend-private memory. 
Then you (usually) wouldn't need to access the metapage at all when scanning. Store a copy of the bitmask for that bucket in the primary page, and when scanning, check that it matches the cached value. If not, refresh the cached metapage and try again. So, there seem to be a few fairly simple and independent improvements to be made: 1. Replace the heavy-weight lock with pin & vacuum-lock. 2. Make it crash-safe, by adding WAL-logging. 3. Only acquire the exclusive-lock (vacuum-lock after step 1) in VACUUM for the squeeze phase. 4. Cache the metapage. We still don't know if it's going to be any better than B-tree after all that's done, but the only way to find out is to go ahead and implement it. This seems like a great GSoC project to me. We have a pretty good idea of what we want to accomplish. It's uncontroversial: I don't think anyone is going to object to improving hash indexes (one could argue that it's a waste of time, but that's different from objecting to the idea). And it consists of a few mostly independent parts, so it can be done incrementally, which makes it easier to track progress, and we'll probably have something useful at the end of the summer even if it doesn't all get finished. - Heikki
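The metapage-caching idea (point 4) might look roughly like this toy model. The mask-copy-on-primary-page validation scheme and every name below are hypothetical simplifications, not the hash AM's actual page layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Backend-private cache of the one metapage field this toy model needs. */
typedef struct CachedMeta
{
    bool        valid;
    uint32_t    highmask;       /* bucket mask copied from the metapage */
} CachedMeta;

static CachedMeta cache = {false, 0};

/* Stand-in for actually re-reading the metapage buffer. */
static void
refresh_cache(uint32_t current_highmask)
{
    cache.highmask = current_highmask;
    cache.valid = true;
}

/*
 * Pick the bucket for a scan using the cached mask, then verify against
 * the mask copy assumed to be stored in the bucket's primary page; if
 * the cache turned out to be stale (e.g. after a bucket split doubled
 * the mask), refresh and retry.  Here the page's mask is passed in
 * directly instead of being read from a buffer.
 */
static uint32_t
bucket_for_scan(uint32_t hashkey, uint32_t page_highmask)
{
    uint32_t    bucket;

    if (!cache.valid)
        refresh_cache(page_highmask);

    bucket = hashkey & cache.highmask;
    if (cache.highmask != page_highmask)
    {
        /* stale cache: refresh from the (simulated) metapage and retry */
        refresh_cache(page_highmask);
        bucket = hashkey & cache.highmask;
    }
    return bucket;
}
```

On the common path (cache valid and masks matching) no metapage access happens at all, which is the whole point of the optimization.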
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 6 March 2014 22:43, Noah Misch wrote: > On Tue, Mar 04, 2014 at 06:50:17PM +0100, Andres Freund wrote: >> On 2014-03-04 11:40:10 -0500, Tom Lane wrote: >> > Robert Haas writes: > I think this is all too >> > late for 9.4, though. >> > >> > I agree with the feeling that a meaningful fix for pg_dump isn't going >> > to get done for 9.4. So that leaves us with the alternatives of (1) >> > put off the lock-strength-reduction patch for another year; (2) push >> > it anyway and accept a reduction in pg_dump reliability. >> > >> > I don't care for (2). I'd like to have lock strength reduction as >> > much as anybody, but it can't come at the price of reduction of >> > reliability. >> >> I am sorry, but I think this is vastly overstating the scope of the >> pg_dump problem. CREATE INDEX *already* doesn't require a AEL, and the >> amount of problems that has caused in the past is surprisingly low. If >> such a frequently used command doesn't cause problems, why are you >> assuming other commands to be that problematic? And I think it's hard to >> argue that the proposed changes are more likely to cause problems. >> >> Let's try to go at this a bit more methodically. The commands that - >> afaics - change their locklevel due to latest patch (v21) are: > [snip] > > Good analysis. The hazards arise when pg_dump uses one of the ruleutils.c > deparse worker functions. As a cross-check to your study, I looked briefly at > the use of those functions in pg_dump and how this patch might affect them: > > -- pg_get_constraintdef() > > pg_dump reads the constraint OID with its transaction snapshot, so we will > never see a too-new constraint. Dropping a constraint still requires > AccessExclusiveLock. Agreed > Concerning VALIDATE CONSTRAINT, pg_dump reads convalidated with its > transaction snapshot and uses that to decide whether to dump the CHECK > constraint as part of the CREATE TABLE or as a separate ALTER TABLE ADD > CONSTRAINT following the data load. 
However, pg_get_constraintdef() reads the > latest convalidated to decide whether to emit NOT VALID. Consequently, one > can get a dump in which the dumped table data did not yet conform to the > constraint, and the ALTER TABLE ADD CONSTRAINT (w/o NOT VALID) fails. > (Suppose you deleted the last invalid rows just before executing the VALIDATE > CONSTRAINT. I tested this by committing the DELETE + VALIDATE CONSTRAINT with > pg_dump stopped at getTableAttrs().) > > One hacky, but maintainable and effective, solution to the VALIDATE CONSTRAINT > problem is to have pg_dump tack on a NOT VALID if pg_get_constraintdef() did > not do so. It's, conveniently, the last part of the definition. I would tend > to choose this. We could also just decide this isn't bad enough to worry > about. The consequence is that an ALTER TABLE ADD CONSTRAINT fails. Assuming > no --single-transaction for the original restoral, you just add NOT VALID to > the command and rerun. Like most of the potential new pg_dump problems, this > can already happen today if the relevant database changes happen between > taking the pg_dump transaction snapshot and locking the tables. Too hacky for me, but some good thinking. My proposed solution is below. > -- pg_get_expr() for default expressions > > pg_dump reads pg_attrdef.adbin using its transaction snapshot, so it will > never see a too-new default. This does allow us to read a dropped default. > That's not a problem directly. However, suppose the default references a > function dropped at the same time as the default. pg_dump could fail in > pg_get_expr(). > > -- pg_get_indexdef() > > As you explained elsewhere, new indexes are no problem. DROP INDEX still > requires AccessExclusiveLock. Overall, no problems here. > > -- pg_get_ruledef() > > The patch changes lock requirements for enabling and disabling of rules, but > that is all separate from the rule expression handling. No problems. 
> > -- pg_get_triggerdef() > > The patch reduces CREATE TRIGGER and DROP TRIGGER to ShareUpdateExclusiveLock. > The implications for pg_dump are similar to those for pg_get_expr(). These are certainly concerning. What surprises me the most is that pg_dump has been so happy to randomly mix SQL using the transaction snapshot with sys cache access code using a different snapshot. If that was the intention, there is no documentation in the code or in the docs to explain it. > -- pg_get_viewdef() > > Untamed: pg_dump does not lock views at all. OMG, it's really a wonder pg_dump works at all. > One thing not to forget is that you can always get the old mutual exclusion > back by issuing LOCK TABLE just before a DDL operation. If some unlucky user > regularly gets pg_dump failures due to concurrent DROP TRIGGER, he has a > workaround. There's no comparable way for someone who would not experience > that problem to weaken the now-hardcoded AccessExclusiveLock. Many > consequences of insufficient locking are too severe for that workaround to > bring comfort,
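Noah's "tack on NOT VALID" workaround for the VALIDATE CONSTRAINT race could be sketched as below on the pg_dump side. Plain C string handling stands in for libpq's PQExpBuffer, and both the function name and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch: emit a CHECK constraint definition for dumping.  If pg_dump's
 * transaction snapshot says the constraint was not yet validated, but
 * the server's pg_get_constraintdef() (which sees the latest state)
 * omitted NOT VALID, append it ourselves -- when present, it is always
 * the last part of the definition.
 */
static void
emit_constraint(char *out, size_t outlen,
                const char *condef, bool snapshot_validated)
{
    size_t      len = strlen(condef);
    bool        has_not_valid =
        len >= 9 && strcmp(condef + len - 9, "NOT VALID") == 0;

    if (!snapshot_validated && !has_not_valid)
        snprintf(out, outlen, "%s NOT VALID", condef);
    else
        snprintf(out, outlen, "%s", condef);
}
```

This makes the dumped ALTER TABLE ADD CONSTRAINT restorable even when the table data, as of the dump snapshot, did not yet satisfy the constraint.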
Re: [HACKERS] syslog_ident mentioned as syslog_identify in the docs
On 03/07/2014 08:24 AM, Michael Paquier wrote: In the documentation, particularly the doc index, syslog_ident is incorrectly mentioned as syslog_identify. The attached patch fixes that. This error has been in the docs since 8.0. Thanks, fixed. - Heikki
Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode
On Thu, Mar 6, 2014 at 12:09 AM, Tom Lane wrote: > Andrew Dunstan writes: >> On 03/05/2014 09:11 AM, Michael Paquier wrote: >>> After testing this feature, I noticed that FORCE_NULL and >>> FORCE_NOT_NULL can both be specified with COPY on the same column. > >> Strictly they are not actually contradictory, since FORCE NULL relates >> to quoted null strings and FORCE NOT NULL relates to unquoted null >> strings. Arguably the docs are slightly loose on this point. Still, >> applying both FORCE NULL and FORCE NOT NULL to the same column would be >> rather perverse, since it would result in a quoted null string becoming >> null and an unquoted null string becoming not null. > > Given the remarkable lack of standardization of "CSV" output, who's > to say that there might not be data sources out there for which this > is the desired behavior? It's weird, I agree, but I think throwing > an error for the combination is not going to be helpful. It's not > like somebody might accidentally write both on the same column. > > +1 for clarifying the docs, though, more or less in the words you > used above. Following that, I have hacked up the attached patch to update the docs, with an additional regression test (it actually replaces a test in copy2 that duplicated an earlier one). I am also attaching a second patch for file_fdw, to allow the use of force_null and force_not_null on the same column, to be consistent with COPY. 
Regards, -- Michael diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 7fb1dbc..97a35d0 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -267,11 +267,6 @@ file_fdw_validator(PG_FUNCTION_ARGS) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or redundant options"), errhint("option \"force_not_null\" supplied more than once for a column"))); - if(force_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("option \"force_not_null\" cannot be used together with \"force_null\""))); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); @@ -284,11 +279,6 @@ file_fdw_validator(PG_FUNCTION_ARGS) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or redundant options"), errhint("option \"force_null\" supplied more than once for a column"))); - if(force_not_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("option \"force_null\" cannot be used together with \"force_not_null\""))); force_null = def; (void) defGetBoolean(def); } diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source index 0c278aa..b608372 100644 --- a/contrib/file_fdw/input/file_fdw.source +++ b/contrib/file_fdw/input/file_fdw.source @@ -91,24 +91,22 @@ ALTER FOREIGN TABLE text_csv OPTIONS (SET format 'csv'); \pset null _null_ SELECT * FROM text_csv; +-- force_not_null and force_null can be used together on the same column +ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); +ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true'); + -- force_not_null is not allowed to be specified at any foreign object level: ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_not_null '*'); -- ERROR ALTER SERVER file_server OPTIONS (ADD force_not_null '*'); -- ERROR CREATE USER MAPPING FOR public 
SERVER file_server OPTIONS (force_not_null '*'); -- ERROR CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR --- force_not_null cannot be specified together with force_null -ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); --ERROR - -- force_null is not allowed to be specified at any foreign object level: ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR ALTER SERVER file_server OPTIONS (ADD force_null '*'); -- ERROR CREATE USER MAPPING FOR public SERVER file_server OPTIONS (force_null '*'); -- ERROR CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR --- force_null cannot be specified together with force_not_null -ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true'); --ERROR - -- basic query tests SELECT * FROM agg_text WHERE b > 10.0 ORDER BY a; SELECT * FROM agg_csv ORDER BY a; diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source index 2bec160..bc183b8 100644 --- a/contrib/file_fdw/output/file_fdw.source +++ b/contrib/file_fdw/output/file_fdw.source @@ -115,6 +115,9 @@ SELECT * FROM text_csv; ABC | abc|| (5 rows) +-- force_not_null and force_null can be used together on the same column +ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true'); +ALTER FOREIGN TABLE text_csv ALTE