Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 12:01 AM, Robert Haas  wrote:
>
> On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane  wrote:
> > Generally I'd be in favor of avoiding platform-dependent code where
> > possible, but that doesn't represent a YES vote for this particular
> > patch.  It looks pretty messy in a quick look, even granting that the
> > #ifdef WIN32's would all go away.
>
> Hmm, OK.  I have not read the patch.  Hopefully that's something that
> could be fixed.
>
> > A larger question here is about forward/backward compatibility of the
> > basebackup files.  Changing the representation of symlinks like this
> > would break that.  Maybe we don't care, not sure (is there already a
> > catversion check for these things?).  Changing the file format for only
> > some platforms seems like definitely a bad idea though.
>
> What are the practical consequences of changing the file format?  I
> think that an old backup containing symlinks could be made to work on
> a new server that knows how to create them,

So if I understand correctly, by *old backup* you mean a backup created
by 9.5 and by *new server* you mean a server > 9.5; if so, the current
design should handle it.

However, if the backup is created on a version < 9.5 using pg_basebackup
of the same version, trying to restore it with a server >= 9.5 won't work,
because the server won't have the information about the symlinks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 12:03 AM, Alvaro Herrera 
wrote:
>
> Amit Kapila wrote:
>
> > 2. Symlink file format:
> >  
> > 16387 E:\PostgreSQL\tbs
> >
> > The symlink file will contain entries for all the tablespaces
> > under the pg_tblspc directory.  I have kept the file name as
> > symlink_label (suggestions are welcome if you want some
> > different name for this file).
>
> I think symlink_label isn't a very good name.  This file is not a label
> in the sense that backup_label is; it seems more a "catalog" to me.  And
> it's not, in essence, about symlinks either, but rather about
> tablespaces.  I would name it following the term "tablespace catalog" or
> some variation thereof.
>

This file is going to provide the symlink path for each tablespace, so
it would not be bad to have that in the file name.  I agree with you that
it's more about tablespaces.  So how about:

tablespace_symlink
symlink_tablespace
tablespace_info

> I know we don't expect that users would have to look at the file or edit
> it in normal cases, but it seems better to make it be human-readable.  I
> would think that the file needs to have tablespace names too, then, not
> just OIDs.  Maybe we don't use the tablespace name for anything other
> than "documentation" purposes if someone decides to look at the file, so
> perhaps it should look like a comment:
>
> ; 
>
> We already do this in pg_restore -l output IIRC.
>

Okay, I will take care of doing this in the next version of the patch
if no one objects to this idea, or if more people are in favour of doing so.
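For illustration only, a restore-time reader for such a file could be sketched in C as below, assuming the "<oid> <path> ; <name>" layout Alvaro suggests upthread. The function name and the exact format are hypothetical, not the committed behavior.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a hypothetical tablespace_info file of the form
 *   "<oid> <symlink path> ; <tablespace name>"
 * The name after " ; " is documentation only and is ignored here.
 * Returns 1 on success, 0 on a malformed line.
 */
static int
parse_tablespace_line(const char *line, unsigned *oid,
                      char *path, size_t pathlen)
{
    int         consumed = 0;
    const char *p;
    const char *sep;
    size_t      len;

    if (sscanf(line, "%u %n", oid, &consumed) != 1 || consumed == 0)
        return 0;
    p = line + consumed;
    sep = strstr(p, " ; ");         /* optional human-readable name */
    len = sep ? (size_t) (sep - p) : strcspn(p, "\r\n");
    if (len == 0 || len >= pathlen)
        return 0;
    memcpy(path, p, len);
    path[len] = '\0';
    return 1;
}
```

A reader written this way tolerates both variants: with the trailing name comment and without it.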

> One use case mentioned upthread is having the clone be created in the
> same machine as the source server.  Does your proposal help with it?
>

Sorry, but I am not sure which proposal exactly you are referring to here;
could you explain in more detail?

In general, if the user took the backup (in tar format) using pg_basebackup,
this patch will make it possible to restore such a backup even on the same
server.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] WAL format and API changes (9.5)

2014-11-14 Thread Michael Paquier
On Fri, Nov 14, 2014 at 5:31 PM, Michael Paquier
 wrote:
> 3) pg_xlogdump does not seem to work:
> $ pg_xlogdump 0001000D
> pg_xlogdump: FATAL:  could not find a valid record after 0/D00
This one was a mistake on my side. Please disregard this comment.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Michael Paquier
On Sat, Nov 15, 2014 at 1:27 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> Hi all,
>>
>> The new storage parameter pages_per_range is missing in the
>> documentation of CREATE INDEX. The patch attached corrects that.
>
> Ah BTW I had pushed this earlier today.
Thanks.
-- 
Michael




Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Steve Singer

On 11/14/2014 08:21 PM, Simon Riggs wrote:

> The requested information is already available, as discussed. Logical
> decoding adds commit ordering for *exactly* the purpose of using it
> for replication, available to all solutions. This often requested
> feature has now been added and doesn't need to be added twice.
>
> So what we are discussing is adding a completely superfluous piece of
> information.
>
> Not including the LSN info does nothing to trigger based replication,
> which will no doubt live on happily for many years. But adding LSN
> will slow down logical replication, for no purpose at all.



Simon,
The use cases I'm talking about aren't really replication-related.  I have
often come across systems that want to do something such as 'select *
from orders where X > the_last_row_I_saw order by X' and then do further
processing on the orders.


This is kind of awkward to do today because you don't have a good
candidate for 'X' to order on.  Using either a sequence or an insert-row
timestamp doesn't work well, because a transaction with a lower value for
X might end up committing after the higher value has already appeared in
a query result.
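To make the anomaly concrete, here is a toy C model (purely illustrative, not PostgreSQL code): X is assigned at insert time, but a row only becomes visible at commit time, so a poller that advances a high-water mark on X can permanently skip a late-committing row.

```c
#define MAXROW 8

static int visible[MAXROW];     /* visible[x] != 0 once row x has committed */
static int last_seen = 0;       /* the poller's high-water mark on X */

/*
 * Deliver every newly visible row with x > last_seen, in X order,
 * advancing last_seen past each delivered row.  Returns the count.
 */
static int
poll_new(int *out)
{
    int n = 0;
    int x;

    for (x = last_seen + 1; x < MAXROW; x++)
    {
        if (visible[x])
        {
            out[n++] = x;
            last_seen = x;      /* rows below x that commit later are lost */
        }
    }
    return n;
}
```

If X=2 commits before X=1, the first poll delivers row 2 and pushes the high-water mark to 2; when row 1 commits afterwards, no later poll will ever deliver it.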


Yes, you could set up a logical WAL slot and listen to the stream of
inserts into your orders table, but that's a lot of administration overhead
compared to just issuing an SQL query for what really is a query-type
operation.


Using the commit timestamp for my X sounded very tempting, but it could
allow duplicates.


One could argue that this patch is about replication features, and that
providing commit ordering for query purposes should be a separate patch
on top of this infrastructure.  I see merit in smaller, more focused
patches, but that requires leaving the door open to easily extending
things later.


It could also be that I'm the only one who wants to order and filter
queries in this manner (but that would surprise me).  If the commit LSN
has limited appeal and we decide we don't want it at all, then we
shouldn't add it.  I've seen this type of requirement in a number of
different systems at a number of different companies.  I've generally
seen it dealt with either by selecting rows behind the last now()
timestamp seen and then filtering out already-processed rows, or by
tracking the 'processed' state of each row individually (i.e., performing
an update on each row once it has been processed), which performs poorly.


Steve









Re: [HACKERS] Useless dead struct in parse_func.h

2014-11-14 Thread Alvaro Herrera
Peter Geoghegan wrote:
> It looks like commit 0e99be1c removed the final real use of the struct
> InhPaths. Attached patch removes it entirely.

I didn't verify the claim about 0e99be1c, but I pushed it anyway.
Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:
> Hi all,
> 
> The new storage parameter pages_per_range is missing in the
> documentation of CREATE INDEX. The patch attached corrects that.

Ah BTW I had pushed this earlier today.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:

> Btw, perhaps this diff should be pushed as a different patch as this is a
> rather different thing:
> -   if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> +   if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
> !smgrexists(indexRelation->rd_smgr, INIT_FORKNUM)
> As this is a correctness fix, it does not seem necessary to back-patch it:
> the parent relation always has the same persistence as its indexes.

There was an argument for doing it this way that only applies if this
patch went in, but I can't remember now what it was.

Anyway I pushed the patch after some slight additional changes.  Thanks!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Amit Kapila
On Sat, Nov 15, 2014 at 2:21 AM, Peter Eisentraut  wrote:
>
> On 11/13/14 4:33 PM, Peter Eisentraut wrote:
> >> Is this still relevant after this commit?
> >> >
> >> > commit fb05f3ce83d225dd0f39f8860ce04082753e9e98
> >> > Author: Peter Eisentraut 
> >> > Date:   Sat Feb 22 13:38:06 2014 -0500
> >> >
> >> > pg_basebackup: Add support for relocating tablespaces
> > I believe so.
> >
> > The commit only applies to "plain" output.  Amit's complaint is that tar
> > utilities on Windows don't unpack symlinks, so the "tar" format isn't
> > useful on Windows when tablespaces are used.  So he wants the recovery
> > mechanism to restore the symlinks.
>
> Um, wouldn't accepting this patch break the above-mentioned
> tablespace-relocation feature, because pg_basebackup wouldn't see any
> more symlinks sent down?

No, the new feature is implemented only for the tar format, while the
above feature works only with the plain format.  pg_basebackup will still
send the symlink information as before for the plain format.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote:
> > I don't see the point in including them for --clean..?  --clean states
> > that DROP commands would be added, not that existing roles would be
> > adjusted in some way.
> 
> It does state that, but note this comment in dumpRoles():
> 
>   /*
>    * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role
>    * will acquire the right properties even if it already exists (ie, it
>    * won't hurt for the CREATE to fail).  This is particularly important
>    * for the role we are connected as, since even with --clean we will
>    * have failed to drop it.  binary_upgrade cannot generate any errors,
>    * so we assume the current role is already created.
>    */

Ah, yes, of course.

> Under --clean, "the right properties" are those the role would have had if the
> DROP ROLE had succeeded.  Those are necessarily independent of the pre-DROP
> version of the role.  (Otherwise, you potentially get different outcomes
> depending on which superuser restored the --clean dump.)

Agreed, and in this case we'd need to set any attributes that are not
specified back to the default, which would include having NOBYPASSRLS.

> > As for using 'always false'- I tend to think Robert actually has it
> > better by using the default for users.  Consider rolinherit- that
> > defaults to 'true' and while it would technically be more 'safe' to set
> > it to false, it wouldn't have matched what we provided under the user /
> > group system prior to roles.  Doing this would also reduce clutter in
> > pg_dumpall output.
> 
> My arguments and conclusion apply only to the permission-like attributes that
> are subsets of SUPERUSER.  rolinherit is indeed not in that category.

Understood.

Thanks!

Stephen




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 08:39:28PM -0500, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > So, if you desire to make this consistent, I recommend using rolreplication's
> > treatment as the gold standard.  That is to say, when dumping from an older
> > version, set to false any of these role attributes not existing in that
> > version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
> > reasonable alternative, though less so for "pg_dumpall --clean".  It would be
> > defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
> > consider that my second choice.
> 
> I don't see the point in including them for --clean..?  --clean states
> that DROP commands would be added, not that existing roles would be
> adjusted in some way.

It does state that, but note this comment in dumpRoles():

/*
 * We dump CREATE ROLE followed by ALTER ROLE to ensure that the role
 * will acquire the right properties even if it already exists (ie, it
 * won't hurt for the CREATE to fail).  This is particularly important
 * for the role we are connected as, since even with --clean we will
 * have failed to drop it.  binary_upgrade cannot generate any errors,
 * so we assume the current role is already created.
 */

Under --clean, "the right properties" are those the role would have had if the
DROP ROLE had succeeded.  Those are necessarily independent of the pre-DROP
version of the role.  (Otherwise, you potentially get different outcomes
depending on which superuser restored the --clean dump.)

> As for using 'always false'- I tend to think Robert actually has it
> better by using the default for users.  Consider rolinherit- that
> defaults to 'true' and while it would technically be more 'safe' to set
> it to false, it wouldn't have matched what we provided under the user /
> group system prior to roles.  Doing this would also reduce clutter in
> pg_dumpall output.

My arguments and conclusion apply only to the permission-like attributes that
are subsets of SUPERUSER.  rolinherit is indeed not in that category.




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote:
> > rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
> > not.  Noah- would you argue that we should change rolcreaterole, which
> > has this behavior in all released branches (though, of course, it's only
> > relevant when upgrading from a pre-8.1 server where we didn't have
> > rolcreaterole)?
> 
> Setting aside that I wouldn't have argued for any change here, yes.  I agree
> that there's no good reason to handle rolcreaterole unlike rolreplication.
> The choice between their respective techniques has behavior consequences only
> if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart
> from explicit ALTER ROLE testing.  Having said that, I prefer setting these
> attributes to false when dumping from a version that did not have them, for
> these reasons:
> 
> 1) It's fail-safe.  The hypothetical ALTER ROLE x NOSUPERUSER may leave the
>role with fewer privileges than expected, never with more privileges than
>expected.
> 
> 2) It's more consistent with how folks create superuser accounts.  I've not
>seen "CREATE USER x SUPERUSER CREATEROLE" used.  Where SUPERUSER preempts a
>given role attribute, the norm among sites I've seen is to omit the
>attribute.  (The bootstrap superuser does turn this point on its head.)
> 
> 3) It's cleaner in \du output.

I agree with these points and my experience matches yours.  The
bootstrap user is the one anomaly at those sites and makes it stand out
rather than look like the majority of the superuser accounts which
exist (where there exists more than one or two anyway).

> I can't pinpoint a technical argument against your proposal to cease adding
> excess attributes to the bootstrap superuser at initdb time.  It feels like a
> case of the tail wagging the dog, changing antediluvian initdb behavior to
> make pg_dumpall slightly more transparent.

For my 2c, it feels like it was simply an attempt to reflect actual
capabilities without consideration for what happened later rather than
any particularly technical reason, and I don't feel that original
reasoning (if that's what it was, anyway) was sound.

> So, if you desire to make this consistent, I recommend using rolreplication's
> treatment as the gold standard.  That is to say, when dumping from an older
> version, set to false any of these role attributes not existing in that
> version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
> reasonable alternative, though less so for "pg_dumpall --clean".  It would be
> defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
> consider that my second choice.

I don't see the point in including them for --clean..?  --clean states
that DROP commands would be added, not that existing roles would be
adjusted in some way.

As for using 'always false'- I tend to think Robert actually has it
better by using the default for users.  Consider rolinherit- that
defaults to 'true' and while it would technically be more 'safe' to set
it to false, it wouldn't have matched what we provided under the user /
group system prior to roles.  Doing this would also reduce clutter in
pg_dumpall output.

> > What are your thoughts on the additional role
> > attributes which are being discussed?
> 
> All three of rolcreaterole, rolreplication and rolbypassrls deserve the same
> dumpRoles() treatment.

Agreed.

Thanks!

Stephen




[HACKERS] Useless dead struct in parse_func.h

2014-11-14 Thread Peter Geoghegan
It looks like commit 0e99be1c removed the final real use of the struct
InhPaths. Attached patch removes it entirely.

-- 
Peter Geoghegan
diff --git a/src/include/parser/parse_func.h b/src/include/parser/parse_func.h
index b9b06ae..4423bc0 100644
--- a/src/include/parser/parse_func.h
+++ b/src/include/parser/parse_func.h
@@ -18,18 +18,6 @@
 #include "parser/parse_node.h"
 
 
-/*
- *	This structure is used to explore the inheritance hierarchy above
- *	nodes in the type tree in order to disambiguate among polymorphic
- *	functions.
- */
-typedef struct _InhPaths
-{
-	int			nsupers;		/* number of superclasses */
-	Oid			self;			/* this class */
-	Oid		   *supervec;		/* vector of superclasses */
-}	InhPaths;
-
 /* Result codes for func_get_detail */
 typedef enum
 {



Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Simon Riggs
On 14 November 2014 17:12, Robert Haas  wrote:

> We are not talking about loading 20 new
> requirements on top of this patch; that would be intolerable.  We're
> talking about adding one additional piece of information that has been
> requested multiple times over the years.

The requested information is already available, as discussed. Logical
decoding adds commit ordering for *exactly* the purpose of using it
for replication, available to all solutions. This often requested
feature has now been added and doesn't need to be added twice.

So what we are discussing is adding a completely superfluous piece of
information.

Not including the LSN info does nothing to trigger based replication,
which will no doubt live on happily for many years. But adding LSN
will slow down logical replication, for no purpose at all.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 11:47:49AM -0500, Stephen Frost wrote:
> > On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
> > > I think Noah is arguing for leaving the pg_dumpall queries as they
> > > stand.  I disagree, but he's entitled to his opinion.

> Ah, ok.  I'm impartial, but I do note that we're currently inconsistent
> and so I'd prefer to go one way or the other.
> 
> rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
> not.  Noah- would you argue that we should change rolcreaterole, which
> has this behavior in all released branches (though, of course, it's only
> relevant when upgrading from a pre-8.1 server where we didn't have
> rolcreaterole)?

Setting aside that I wouldn't have argued for any change here, yes.  I agree
that there's no good reason to handle rolcreaterole unlike rolreplication.
The choice between their respective techniques has behavior consequences only
if you later use ALTER ROLE x NOSUPERUSER, which I have not seen done apart
from explicit ALTER ROLE testing.  Having said that, I prefer setting these
attributes to false when dumping from a version that did not have them, for
these reasons:

1) It's fail-safe.  The hypothetical ALTER ROLE x NOSUPERUSER may leave the
   role with fewer privileges than expected, never with more privileges than
   expected.

2) It's more consistent with how folks create superuser accounts.  I've not
   seen "CREATE USER x SUPERUSER CREATEROLE" used.  Where SUPERUSER preempts a
   given role attribute, the norm among sites I've seen is to omit the
   attribute.  (The bootstrap superuser does turn this point on its head.)

3) It's cleaner in \du output.

I can't pinpoint a technical argument against your proposal to cease adding
excess attributes to the bootstrap superuser at initdb time.  It feels like a
case of the tail wagging the dog, changing antediluvian initdb behavior to
make pg_dumpall slightly more transparent.

So, if you desire to make this consistent, I recommend using rolreplication's
treatment as the gold standard.  That is to say, when dumping from an older
version, set to false any of these role attributes not existing in that
version.  Robert's proposal to emit neither NOBYPASSRLS nor BYPASSRLS is a
reasonable alternative, though less so for "pg_dumpall --clean".  It would be
defensible to send NOBYPASSRLS under --clean and omit the option otherwise;
consider that my second choice.

> What are your thoughts on the additional role
> attributes which are being discussed?

All three of rolcreaterole, rolreplication and rolbypassrls deserve the same
dumpRoles() treatment.

nm




Re: [HACKERS] END_OF_RECOVERY shutdowns and ResetUnloggedRelations()

2014-11-14 Thread Andres Freund
On 2014-10-27 16:09:30 +0530, Abhijit Menon-Sen wrote:
> At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
> >
> > On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
> > > 
> > > 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
> > >earlier in StartupXLOG.
> > > 
> > > 2. Inside that function, issue fsync()s for the main forks we create by
> > >copying the _init fork.
> > 
> > These two imo should definitely be backpatched.
> 
> I've attached updated versions of these two patches. The only changes
> are to revise the comments as you suggested.

Committed and backpatched (to 9.1) these. That required also
backpatching the move of fsync_fname() to fd.c/h to 9.1-9.3.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] initdb -S and tablespaces

2014-11-14 Thread Andres Freund
On 2014-10-30 14:30:28 +0530, Abhijit Menon-Sen wrote:
> Here's a proposed patch to initdb to make initdb -S fsync everything
> under pg_tblspc. It introduces a new function that calls walkdir on
> every entry under pg_tblspc. This is only one approach: I could have
> also changed walkdir to follow links, but that would have required a
> bunch of #ifdefs for Windows (because it doesn't have symlinks), and I
> guessed a separate function with two calls might be easier to patch into
> back branches. I've tested this patch under various conditions on Linux,
> but it could use some testing on Windows.

I've pushed this. The Windows buildfarm animals that run pg_upgrade (and
thus --sync-only) will have to tell us whether there's a problem. I sure
hope there's none...

Thanks for that patch!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] On the warpath again about ill-considered inclusion nests

2014-11-14 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Well, if you *only* move RowSecurityDesc and not RowSecurityPolicy,
> > okay, but that seems a bit useless/inconsistent if I'm reading it
> > right that RowSecurityDesc contains a List of RowSecurityPolicy structs.
> 
> Yes, good point.
> 
> > What seems possibly saner is to just remove the header inclusion in rel.h
> > and declare the new Relation field similarly to the way we handle
> > rd_fdwroutine and some other fields there:
> > 
> > /* use "struct" here to avoid needing to include rowsecurity.h: */
> > struct RowSecurityDesc *rsdesc; /* Row-security policy, or NULL */
> 
> Makes sense to me.
> 
> > And while you are at it, how about renaming "rsdesc" to "rd_rsdesc"?
> > The fact that whoever put in trigdesc didn't get the memo about the
> > naming convention for Relation fields doesn't excuse you from following
> > it.
> 
> Ok.  I tend to be bad and mistakenly consider existing code 'gospel'.
> Will fix.
> 
> > PS: The comments for struct RowSecurityPolicy could stand to be improved.
> 
> Understood, will do so.

And, done.

Thanks!

Stephen




Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 2:51 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane  wrote:
>> > What's bothering me is that I see this in pg_dumpall output from a 9.4
>> > or earlier database:
>> >
>> > ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN REPLICATION NOBYPASSRLS;
>>
>> What about leaving out NOBYPASSRLS and letting it go to whatever the default is?
>
> I'd be fine with that- but would we want to do it for the other role
> attributes also?  Specifically rolcreaterole and rolreplication for
> older server versions.  I'm still of the opinion that we should just
> drop the explicit "true" for all the role attributes for the bootstrap
> superuser and then go with this suggestion to let it go to the default
> for upgrades from older versions.

Not sure; I was just brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Alex Shulgin

Andres Freund  writes:
>
> On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
>> After reading up through archives on the two $subj related TODO items
>> I'm under the impression that the patches[1,2] didn't make it mainly because
>> of the risk of breaking SSL internals if we try to longjump out of the
>> signal handler in the middle of a blocking SSL read and/or if we try to
>> report that final "transaction/connection aborted" notice to the client.
>
> I've written a patch that allows that - check
> http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de
>
> I feel pretty confident that that's the way to go. I just need some time
> to polish it.
>
>> What if instead of trying to escape from the signal handler we would set
>> a flag and use it to simulate EOF after the recv() is restarted due to
>> EINTR?  From the backend perspective it should look like the client has
>> just hung up.
>
> That's pretty much what the above does. Except that it actually deals
> with blocking syscalls by not using them and relying on latches/select
> instead.

Neat, I must have missed it.

--
Alex




Re: [HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Andres Freund
Hi,

On 2014-11-15 00:11:36 +0300, Alex Shulgin wrote: 
> After reading up through archives on the two $subj related TODO items
> I'm under the impression that the patches[1,2] didn't make it mainly because
> of the risk of breaking SSL internals if we try to longjump out of the
> signal handler in the middle of a blocking SSL read and/or if we try to
> report that final "transaction/connection aborted" notice to the client.

I've written a patch that allows that - check
http://archives.postgresql.org/message-id/20140927225421.GE5423%40alap3.anarazel.de

I feel pretty confident that that's the way to go. I just need some time
to polish it.

> What if instead of trying to escape from the signal handler we would set
> a flag and use it to simulate EOF after the recv() is restarted due to
> EINTR?  From the backend perspective it should look like the client has
> just hung up.

That's pretty much what the above does. Except that it actually deals
with blocking syscalls by not using them and relying on latches/select
instead.

Greetings,

Andres Freund




[HACKERS] Idle transaction cancel/timeout and SSL revisited

2014-11-14 Thread Alex Shulgin
Hello Hackers,

After reading up through archives on the two $subj related TODO items
I'm under the impression that the patches[1,2] didn't make it mainly because
of the risk of breaking SSL internals if we try to longjump out of the
signal handler in the middle of a blocking SSL read and/or if we try to
report that final "transaction/connection aborted" notice to the client.

What if instead of trying to escape from the signal handler we would set
a flag and use it to simulate EOF after the recv() is restarted due to
EINTR?  From the backend perspective it should look like the client has
just hung up.

Both SSL and cleartext connections are routed through secure_raw_read,
so it seems like a good candidate for checking the flag we would set in
StatementCancelHandler().  (Should we also add the same check in
interactive_getc()?)

Ultimately, in PostgresMain() the ReadCommand() would return EOF and we
can either let the existing code shutdown the backend, or add special
handling that will only abort the transaction and report the case to the
client, which was the intent in [1].  And if that works out, the timeout
handler in [2] can simply reuse the flag.

A potential problem with this is that SSL might perform some eager
cleanup when seeing EOF, so the backend might need to reset/restart the
connection (is that possible?).

I'm attaching a patch (that doesn't compile yet :-p) in hope it can be
useful for the purpose of the discussion.  Notably, secure_raw_read()
can't know about IdleTransactionCancelPending and I'm not sure what
would be the best way to let it see that (is it too much of a shortcut
anyway?)

--
Alex

[1] http://www.postgresql.org/message-id/1262268211.19367.10736.camel@ebony
[2] http://www.postgresql.org/message-id/538dc843.2070...@dalibo.com

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
new file mode 100644
index b05364c..377984d
*** a/src/backend/libpq/be-secure-openssl.c
--- b/src/backend/libpq/be-secure-openssl.c
*** rloop:
*** 543,550 
  	 errmsg("SSL error: %s", SSLerrmessage())));
  			/* fall through */
  		case SSL_ERROR_ZERO_RETURN:
! 			errno = ECONNRESET;
! 			n = -1;
  			break;
  		default:
  			ereport(COMMERROR,
--- 543,553 
  	 errmsg("SSL error: %s", SSLerrmessage())));
  			/* fall through */
  		case SSL_ERROR_ZERO_RETURN:
! 			if (!IdleTransactionCancelPending) /* HACK */
! 			{
! errno = ECONNRESET;
! n = -1;
! 			}
  			break;
  		default:
  			ereport(COMMERROR,
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
new file mode 100644
index 41ec1ad..44079ff
*** a/src/backend/libpq/be-secure.c
--- b/src/backend/libpq/be-secure.c
*** secure_raw_read(Port *port, void *ptr, s
*** 145,155 
  {
  	ssize_t		n;
  
! 	prepare_for_client_read();
  
! 	n = recv(port->sock, ptr, len, 0);
  
! 	client_read_ended();
  
  	return n;
  }
--- 145,160 
  {
  	ssize_t		n;
  
! 	if (IdleTransactionCancelPending)
! 		n = 0; /* simulate EOF */
! 	else
! 	{
! 		prepare_for_client_read();
  
! 		n = recv(port->sock, ptr, len, 0);
  
! 		client_read_ended();
! 	}
  
  	return n;
  }
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
new file mode 100644
index cc62b2c..37437fc
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
*** interactive_getc(void)
*** 310,318 
  {
  	int			c;
  
! 	prepare_for_client_read();
! 	c = getc(stdin);
! 	client_read_ended();
  	return c;
  }
  
--- 310,323 
  {
  	int			c;
  
! 	if (IdleTransactionCancelPending)
! 		c = EOF;
! 	else
! 	{
! 		prepare_for_client_read();
! 		c = getc(stdin);
! 		client_read_ended();
! 	}
  	return c;
  }
  
*** StatementCancelHandler(SIGNAL_ARGS)
*** 2629,2642 
  	if (!proc_exit_inprogress)
  	{
  		InterruptPending = true;
! 		QueryCancelPending = true;
  
  		/*
  		 * If it's safe to interrupt, and we're waiting for input or a lock,
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0)
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
--- 2634,2655 
  	if (!proc_exit_inprogress)
  	{
  		InterruptPending = true;
! 
! 		if (!DoingCommandRead)
! 			QueryCancelPending = true;
! 		else if (IsTransactionOrTransactionBlock())
! 			IdleTransactionCancelPending = true;
  
  		/*
  		 * If it's safe to interrupt, and we're waiting for input or a lock,
  		 * service the interrupt immediately
  		 */
  		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
! 			CritSectionCount == 0 && QueryCancelPending)
! 			/*
! 			 * ProcessInterrupts() does nothing in case of
! 			 * IdleTransactionCancelPending, it is handled in PostgresMain
! 			 */
  		{
  			/* bump holdoff count to make ProcessInterrupts() a no-op */
  			/* until we are done getting ready for it */
*** PostgresMain(

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Peter Eisentraut
On 11/13/14 4:33 PM, Peter Eisentraut wrote:
>> Is this still relevant after this commit?
>> > 
>> > commit fb05f3ce83d225dd0f39f8860ce04082753e9e98
>> > Author: Peter Eisentraut 
>> > Date:   Sat Feb 22 13:38:06 2014 -0500
>> > 
>> > pg_basebackup: Add support for relocating tablespaces
> I believe so.
> 
> The commit only applies to "plain" output.  Amit's complaint is that tar
> utilities on Windows don't unpack symlinks, so the "tar" format isn't
> useful on Windows when tablespaces are used.  So he wants the recovery
> mechanism to restore the symlinks.

Um, wouldn't accepting this patch break the above-mentioned
tablespace-relocation feature, because pg_basebackup wouldn't see any
more symlinks sent down?




Re: [HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-14 Thread Alvaro Herrera
Michael Paquier wrote:
> Hi all,
> 
> The new storage parameter pages_per_range is missing in the
> documentation of CREATE INDEX. The patch attached corrects that.

Thanks!  Pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote:
> Jeremy Harris  writes:
>> On 14/11/14 00:46, Simon Riggs wrote:
>>> Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>>> ->  Sort  (cost= rows=568733 width=175) (actual time=
>>> rows=20 loops=1)
>>> Sort Method: top-N heapsort
> 
>> Going off on a tangent, when I was playing with a merge-sort
>> implementation I propagated limit information into the sort
>> node, for a significant win.
> 
> I'm not entirely following.  The top-N heapsort approach already
> makes use of the limit info.

Having gone back to look, you're right.  It was Uniq nodes I merged
(the sort handles both bounded-output and dedup).
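For context, the top-N heapsort being discussed keeps only the first N output rows in a bounded max-heap and discards anything larger than the current root, so memory stays O(N) regardless of input size. A minimal integer sketch of the technique (not the tuplesort.c implementation):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Bounded "top-N" collector: a max-heap of the N smallest values seen.
 * Any value >= the heap root cannot be in the final top N and is
 * discarded immediately.
 */
typedef struct
{
	int		   *vals;
	int			n;			/* current heap size */
	int			cap;		/* the N of top-N */
} TopN;

static void
sift_down(TopN *t, int i)
{
	for (;;)
	{
		int			largest = i,
					l = 2 * i + 1,
					r = 2 * i + 2,
					tmp;

		if (l < t->n && t->vals[l] > t->vals[largest])
			largest = l;
		if (r < t->n && t->vals[r] > t->vals[largest])
			largest = r;
		if (largest == i)
			return;
		tmp = t->vals[i];
		t->vals[i] = t->vals[largest];
		t->vals[largest] = tmp;
		i = largest;
	}
}

static void
topn_add(TopN *t, int v)
{
	if (t->n < t->cap)
	{
		/* heap not yet full: append and sift up */
		int			i = t->n++;

		t->vals[i] = v;
		while (i > 0 && t->vals[(i - 1) / 2] < t->vals[i])
		{
			int			p = (i - 1) / 2,
						tmp = t->vals[i];

			t->vals[i] = t->vals[p];
			t->vals[p] = tmp;
			i = p;
		}
	}
	else if (v < t->vals[0])
	{
		/* smaller than the current N-th smallest: replace the root */
		t->vals[0] = v;
		sift_down(t, 0);
	}
}

static int
cmp_int(const void *a, const void *b)
{
	return *(const int *) a - *(const int *) b;
}
```

After all input is consumed, a final sort of the (at most N) retained values produces the ordered result.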
-- 
Cheers,
  Jeremy






Re: [HACKERS] TODO request: log_long_transaction

2014-11-14 Thread Jim Nasby

On 11/7/14, 1:19 PM, Michael Banck wrote:

On Monday, 27.10.2014 at 19:29 +, Thom Brown wrote:

>On 27 October 2014 19:21, Josh Berkus  wrote:

> >I just realized that there is one thing we can't log currently:
> >transactions which last more than #ms.  This is valuable diagnostic
> >information when looking for issues like causes of bloat and deadlocks.
> >
> >I'd like it to be on the TODO list because it seems like part of a good
> >GSOC project or first-time contribution.

>
>
>So effectively, log_min_duration_transaction?  Sounds useful.


FWIW, I've also wanted the equivalent of statement_timeout for transactions; 
the ability to abort a transaction if it runs for too long.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] printing table in asciidoc with psql

2014-11-14 Thread Szymon Guz
On 14 November 2014 20:57, Alvaro Herrera  wrote:

> Is anyone going to submit a new version of this patch?
>
>
>

Hi Alvaro,
due to family issues I will not be able to work on it for the next 10 days.

regards,
Szymon


Re: [HACKERS] Question about RI checks

2014-11-14 Thread Alvaro Herrera
Kevin, are you handling this?


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] printing table in asciidoc with psql

2014-11-14 Thread Alvaro Herrera
Is anyone going to submit a new version of this patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Size of regression database

2014-11-14 Thread Tom Lane
Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> A quick eyeball check says that that quintupling of the database size
>>> is all in BRIN index tests.  Could we dial that back to something a
>>> bit saner please?

> Done:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=86cf9a565069755189e08290343d2d62afdd1f52
> Now a pg_dumpall for me is 5510969 bytes which seems reasonable.

Sounds good to me.  Thanks!

regards, tom lane




Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane  wrote:
> > What's bothering me is that I see this in pg_dumpall output from a 9.4
> > or earlier database:
> >
> > ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN 
> > REPLICATION NOBYPASSRLS;
> 
> What about leaving out NOBYPASSRLS and letting it go to whatever the default 
> is?

I'd be fine with that- but would we want to do it for the other role
attributes also?  Specifically rolcreaterole and rolreplication for
older server versions.  I'm still of the opinion that we should just
drop the explicit "true" for all the role attributes for the bootstrap
superuser and then go with this suggestion to let it go to the default
for upgrades from older versions.

Thanks,

Stephen




Re: [HACKERS] Size of regression database

2014-11-14 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Tom Lane wrote:
> > I was testing backwards compatibility of pg_dumpall just now, and was
> > somewhat astonished to notice the size of the output for the regression
> > database compared to what it was not too long ago:
> > 
> > -rw-rw-r--. 1 tgl tgl  4509135 Nov 13 16:19 dumpall.83
> > -rw-rw-r--. 1 tgl tgl  4514441 Nov 13 16:24 dumpall.84
> > -rw-rw-r--. 1 tgl tgl  4666917 Nov 13 16:15 dumpall.90
> > -rw-rw-r--. 1 tgl tgl  4681235 Nov 13 16:15 dumpall.91
> > -rw-rw-r--. 1 tgl tgl  5333587 Nov 13 16:15 dumpall.92
> > -rw-rw-r--. 1 tgl tgl  5409083 Nov 13 16:15 dumpall.93
> > -rw-rw-r--. 1 tgl tgl  5493686 Nov 13 16:15 dumpall.94
> > -rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head
> > 
> > A quick eyeball check says that that quintupling of the database size
> > is all in BRIN index tests.  Could we dial that back to something a
> > bit saner please?
> 
> Oops.  Sure, will see about this.

Done:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=86cf9a565069755189e08290343d2d62afdd1f52

Now a pg_dumpall for me is 5510969 bytes which seems reasonable.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.

2014-11-14 Thread Andres Freund
On 2014-11-15 03:25:16 +0900, Fujii Masao wrote:
> On Fri, Nov 14, 2014 at 7:22 PM,   wrote:
> > Hi,
> >
> > "pg_ctl stop" doesn't work properly if the --slot option is specified,
> > because WAL is flushed only when the WAL file has switched.
> > These processes still continue even after the postmaster
> > failed: pg_receivexlog, walsender and logger.
> 
> I could reproduce this problem. At normal shutdown, walsender keeps waiting
> for the last WAL record to be replicated and flushed in pg_receivexlog. But
> pg_receivexlog issues sync command only when WAL file is switched. Thus,
> since pg_receivexlog may never flush the last WAL record, walsender may have
> to keep waiting indefinitely.

Right.

> pg_recvlogical handles this problem by calling fsync() when it receives the
> request for an immediate reply from the server. That is, at shutdown, walsender
> sends the request, pg_receivexlog receives it, flushes the last WAL record,
> and sends the flush location back to the server. Since walsender can see that
> the last WAL record is successfully flushed in pg_receivexlog, it can
> exit cleanly.
> 
> One idea for this problem is to introduce the same logic as pg_recvlogical
> has into pg_receivexlog. Thoughts?

Sounds sane to me. Are you looking into doing that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Alvaro Herrera
Amit Kapila wrote:

> 2. Symlink file format:
>  
> 16387 E:\PostgreSQL\tbs
> 
> Symlink file will contain entries for all the tablspaces
> under pg_tblspc directory.  I have kept the file name as
> symlink_label (suggestion are welcome if you want some
> different name for this file).

I think symlink_label isn't a very good name.  This file is not a label
in the sense that backup_label is; it seems more a "catalog" to me.  And
it's not, in essence, about symlinks either, but rather about
tablespaces.  I would name it following the term "tablespace catalog" or
some variation thereof.

I know we don't expect that users would have to look at the file or edit
it in normal cases, but it seems better to make it be human-readable.  I
would think that the file needs to have tablespace names too, then, not
just OIDs.  Maybe we don't use the tablespace name for anything other
than "documentation" purposes if someone decides to look at the file, so
perhaps it should look like a comment:

; 

We already do this in pg_restore -l output IIRC.

One use case mentioned upthread is having the clone be created in the
same machine as the source server.  Does your proposal help with it?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 1:15 PM, Tom Lane  wrote:
> Generally I'd be in favor of avoiding platform-dependent code where
> possible, but that doesn't represent a YES vote for this particular
> patch.  It looks pretty messy in a quick look, even granting that the
> #ifdef WIN32's would all go away.

Hmm, OK.  I have not read the patch.  Hopefully that's something that
could be fixed.

> A larger question here is about forward/backward compatibility of the
> basebackup files.  Changing the representation of symlinks like this
> would break that.  Maybe we don't care, not sure (is there already a
> catversion check for these things?).  Changing the file format for only
> some platforms seems like definitely a bad idea though.

What are the practical consequences of changing the file format?  I
think that an old backup containing symlinks could be made to work on
a new server that knows how to create them, and we should probably
design it that way, but a physical backup isn't compatible across
major versions anyway, so it doesn't have the same kinds of
repercussions as changing something like the pg_dump file format.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.

2014-11-14 Thread Fujii Masao
On Fri, Nov 14, 2014 at 7:22 PM,   wrote:
> Hi,
>
> "pg_ctl stop" doesn't work properly if the --slot option is specified,
> because WAL is flushed only when the WAL file has switched.
> These processes still continue even after the postmaster
> failed: pg_receivexlog, walsender and logger.

I could reproduce this problem. At normal shutdown, walsender keeps waiting
for the last WAL record to be replicated and flushed in pg_receivexlog. But
pg_receivexlog issues sync command only when WAL file is switched. Thus,
since pg_receivexlog may never flush the last WAL record, walsender may have
to keep waiting indefinitely.

pg_recvlogical handles this problem by calling fsync() when it receives the
request for an immediate reply from the server. That is, at shutdown, walsender
sends the request, pg_receivexlog receives it, flushes the last WAL record,
and sends the flush location back to the server. Since walsender can see that
the last WAL record is successfully flushed in pg_receivexlog, it can
exit cleanly.

One idea for this problem is to introduce the same logic as pg_recvlogical
has into pg_receivexlog. Thoughts?

Regards,

-- 
Fujii Masao
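A rough sketch of the pg_recvlogical behaviour described above: when the server's keepalive has the "reply requested" flag set, flush any unsynced WAL and report the flush position back, so walsender can finish its shutdown wait. All names here are illustrative, not the actual receivelog.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receiver-side bookkeeping. */
typedef struct
{
	uint64_t	write_pos;		/* last WAL position written to the file */
	uint64_t	flush_pos;		/* last position made durable via fsync */
	uint64_t	reported_pos;	/* last flush position sent to walsender */
	int			fsync_calls;	/* counter, for demonstration only */
} ReceiverState;

/*
 * Called on receipt of a server keepalive.  Only when the server asks
 * for an immediate reply do we fsync and send a standby status update;
 * otherwise flushing still happens only at WAL file switch.
 */
static void
handle_keepalive(ReceiverState *st, bool reply_requested)
{
	if (!reply_requested)
		return;

	if (st->flush_pos < st->write_pos)
	{
		/* a real implementation would fsync(walfile) here */
		st->fsync_calls++;
		st->flush_pos = st->write_pos;
	}

	/* send the flush position so walsender can stop waiting */
	st->reported_pos = st->flush_pos;
}
```

At shutdown the server sends exactly such a reply-requested message, so the last partial WAL segment gets flushed and acknowledged instead of walsender waiting forever.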




Re: [HACKERS] Add CREATE support to event triggers

2014-11-14 Thread Andres Freund
On 2014-11-14 12:38:52 -0500, Robert Haas wrote:
> >> This is basically the same problem as multi-master replication
> >> conflicts, except with DDL.  Resolving replication conflicts is not a
> >> very easy thing to get right even if you're only concerned about the
> >> rows in the tables.  It's probably harder if you're worried about the
> >> schema, too.
> >
> > I don't think it's a sane thing to do multimaster with differing schemas
> > for individual relations, except maybe additional nonunique indexes.
> 
> But that's exactly what you ARE doing, isn't it?  I mean, if you
> replicate in only one direction, nothing keeps someone from modifying
> things on the replica independently of BDR, and if you replicate in
> both directions, well that's multi-master.

Meh. By that definition any logical replication solution does
multi-master replication. Either you tell your users that that's not allowed,
or you just prevent it by technical means. Absolutely the same is true
for table contents.

FWIW, in BDR we *do* prevent schemas from being modified independently
on different nodes (unless you set the 'running with scissors' guc).

> I am quite doubtful about whether there will ever be a second, working
> implementation, so I see all of this code - and the maintenance effort
> associated with it - as something that will really only benefit BDR.
> I understand that you don't see it that way, and I'm not saying that
> you are offering anything in bad faith, but it looks to me like even
> with all of this very substantial amount of infrastructure, you're
> still going to need a big pile of additional code inside BDR to
> actually make it work, and I don't hear anyone else offering to
> develop something similar.

I don't know what to say about this. I don't see any other realistic way
to perform DDL replication in logical rep, and nearly all my
conversations with users have indicated that they want that.

I think it's a good idea to structure independent features in a way that
other solutions can reuse them. But I sure as hell can't force them to
use it - especially as there's unfortunately not too much development
going on in the existing logical replication solutions for postgres.

Btw, I really think there's quite legitimate use cases for this besides
logical replication solutions - e.g. schema tracking is quite a sensible
use case.

> >> By the way, the fact that you're planning to do log-based replication
> >> of DML and trigger-based replication of DDL scares the crap out of me.
> >> I'm not sure how that's going to work at all if the two are
> >> interleaved in the same transaction.
> >
> > Maybe that's based on a misunderstanding. All the event trigger does is
> > insert a event into a (local) queue. That's then shipped to the other
> > side using the same replication mechanisms as used for rows. Then, when
> > receiving changes in that ddl queue the standby performs those actions
> > before continuing with the replay.
> > That makes the interleaving on the standby to be pretty much the same as
> > on the primary.
> 
> OK.  But that's also not something that I can really imagine us ever
> adopting in core.

Well, that bit really depends on what the user (e.g. a replication
solution, or a schema tracking feature) needs. There's certain things
that you can quite easily do as part of core (e.g. insert something into
the WAL stream), that you just can't as external code.

I don't think there's any external replication solution that won't have
some form of internal queue to manipulate its internal state. For an
external solution such a queue currently pretty much has to be some
table, but for an eventual in core solution it could be done
differently.

> If we were going to do DDL replication in core, I have to believe we'd
> find a way to put all of the information in the WAL stream, not use
> triggers.

I agree that we might want to represent the transport to standbys
differently for something in core. That there's many different ways the
deparsing output could be used imo is a good reason why that part of the
mechanism isn't part of this submission.

I don't really understand the arguments against triggers in general
though. We're already using them quite extensively - I don't see why DDL
replication has to meet some completely different bar than say foreign
key checks or deferred uniqueness checks. We easily can add 'implicit'
event triggers, that aren't defined inside the catalog if we feel like
it. I'm just not sure we really would need/want to.

> > Yes, it's not trivial. And I think there's some commands where you might
> > not want to try but either scream ERROR or just rereplicate the whole
> > relation or such.
> >
> > I very strongly feel that we (as postgres devs) have a *much* higher
> > chance of recognizing these cases than either some random users (that
> > write slonik scripts or similar) or some replication solution authors
> > that aren't closely involved with -hackers.
> 
> It's clearly the j

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Tom Lane
Robert Haas  writes:
> On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila  wrote:
>> OTOH, if that is okay, then I think we can avoid few #ifdef WIN32 that
>> this patch introduces and can have consistency for this operation on
>> both linux and Windows.

> Having one code path for everything seems appealing to me, but what do
> others think?

Generally I'd be in favor of avoiding platform-dependent code where
possible, but that doesn't represent a YES vote for this particular
patch.  It looks pretty messy in a quick look, even granting that the
#ifdef WIN32's would all go away.

A larger question here is about forward/backward compatibility of the
basebackup files.  Changing the representation of symlinks like this
would break that.  Maybe we don't care, not sure (is there already a
catversion check for these things?).  Changing the file format for only
some platforms seems like definitely a bad idea though.

regards, tom lane




Re: [HACKERS] Add CREATE support to event triggers

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 7:45 AM, Andres Freund  wrote:
>> Right.  And that's why it's cool that logical decoding can operate
>> through DDL differences.  The apply side might not be able to cope
>> with what pops out, but that's not logical decoding's fault, and
>> different apply-sides can adopt different policies as to how to deal
>> with whatever problems crop up.
>
> I think pretty much all of the solutions just say "oops, you're on your
> own". And I can't blame them for that. Once there's a schema difference
> and it causes problem there's really not much that can be done.

Agreed.

>> I don't know exactly what you mean by "a normal single master setup".
>> Surely the point of logical decoding is that the replica might not be
>> identical to the master.
>
> I actually think that that's not primary the point if you talk about
> individual objects. The majority of objects will be exactly the same on
> all nodes. If you actually want to have differening objects on the nodes
> you'll have to opt out/in (depending on your solution) of ddl
> replication for those objects.

I'm not saying it's the primary point; I'm just saying that it can happen.

>> And if it isn't, then a command that
>> succeeded on the master might fail on the standby - for example,
>> because an object by that name already exists there, or because a type
>> doesn't exist there.  (Even if you replicate a CREATE EXTENSION
>> command, there's no guarantee that the .so exists on the target.) Then
>> what?
>
> Sure. There's reasons logical replication isn't always a win. But I
> don't see why that's a reason not to make it as robust as possible.

I agree.

> Btw, the .so problem exists for wal shipping as as well.

Not in the same way.  You might not be able to access the data on the
standby if the .so defining the datatype isn't present, but the
replication itself doesn't care.

>> This is basically the same problem as multi-master replication
>> conflicts, except with DDL.  Resolving replication conflicts is not a
>> very easy thing to get right even if you're only concerned about the
>> rows in the tables.  It's probably harder if you're worried about the
>> schema, too.
>
> I don't think it's a sane thing to do multimaster with differing schemas
> for individual relations, except maybe additional nonunique indexes.

But that's exactly what you ARE doing, isn't it?  I mean, if you
replicate in only one direction, nothing keeps someone from modifying
things on the replica independently of BDR, and if you replicate in
both directions, well that's multi-master.

>> > The solution here doesn't force you to do that, does it? It's something
>> > that can be used by more than replication solution?
>>
>> In theory, yes.
>
> What's the practical point here?

I am quite doubtful about whether there will ever be a second, working
implementation, so I see all of this code - and the maintenance effort
associated with it - as something that will really only benefit BDR.
I understand that you don't see it that way, and I'm not saying that
you are offering anything in bad faith, but it looks to me like even
with all of this very substantial amount of infrastructure, you're
still going to need a big pile of additional code inside BDR to
actually make it work, and I don't hear anyone else offering to
develop something similar.

> I think generally logical replication has more failure cases than
> physical ones. Which you seem to agree with.

Yes, I agree with that.

>> Physical replication is basically no-fail because it just
>> says, hey, go write these bytes into this page, and we can pretty much
>> always do that. But statement-based logical replication means
>> basically executing arbitrary chunks of code all over the backend, and
>> there is just no way to guarantee that code won't throw an error.  So
>> the best we can do is to hope that those errors will get reported back
>> to the user, which is going to require some kind of distributed
>> transaction.  Your idea to just run the replicated DDL statements on
>> the standby before committing on the master is one approach to that
>> problem, and probably the simplest one, but not the only one - one can
>> imagine something that resembles true clustering, for example.
>
> I think that's generally not what people need for primary/standby
> cases (of subsets of tables). In practice there aren't many failures
> like that besides schema differences. And there's just many usecases
> that aren't doable with physical replication, so we can't advise people
> doing that.

I can't really follow this, especially the first sentence.

>> By the way, the fact that you're planning to do log-based replication
>> of DML and trigger-based replication of DDL scares the crap out of me.
>> I'm not sure how that's going to work at all if the two are
>> interleaved in the same transaction.
>
> Maybe that's based on a misunderstanding. All the event trigger does is
> insert a event into a (local) queue. That's then shipped 

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-14 Thread Robert Haas
On Fri, Nov 14, 2014 at 2:55 AM, Amit Kapila  wrote:
> On Fri, Nov 14, 2014 at 9:11 AM, Robert Haas  wrote:
>> On Thu, Nov 13, 2014 at 10:37 PM, Amit Kapila 
>> wrote:
>> >>
>> >
>> > I have mentioned that this can be usable for Linux users as well on that
>> > thread, however I think we might want to provide it with an option for
>> > linux users.  In general, I think it is good to have this patch for
>> > Windows
>> > users and later if we find that Linux users can also get the benefit
>> > with
>> > this functionality, we can expose the same with an additional option.
>>
>> Why make it an option instead of just always doing it this way?
>>
> To avoid extra work during archive recovery if it is not required.  I
> understand that this might not create any measurable difference, but
> still there is addition I/O involved (read from file) which can be avoided.

Yeah, but it's trivial.  We're not going to create enough tablespaces on
one cluster for the cost of dropping a few extra symlinks in place to
matter.

> OTOH, if that is okay, then I think we can avoid few #ifdef WIN32 that
> this patch introduces and can have consistency for this operation on
> both linux and Windows.

Having one code path for everything seems appealing to me, but what do
others think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 6:32 PM, Tom Lane  wrote:
> What's bothering me is that I see this in pg_dumpall output from a 9.4
> or earlier database:
>
> ALTER ROLE postgres WITH SUPERUSER INHERIT CREATEROLE CREATEDB LOGIN 
> REPLICATION NOBYPASSRLS;

What about leaving out NOBYPASSRLS and letting it go to whatever the default is?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Robert Haas
On Thu, Nov 13, 2014 at 6:55 PM, Simon Riggs  wrote:
> On 13 November 2014 21:24, Robert Haas  wrote:
>> On Thu, Nov 13, 2014 at 8:18 AM, Simon Riggs  wrote:
>>> Ordering transactions in LSN order is very precisely the remit of the
>>> existing logical decoding API. Any user that wishes to see commits
>>> in sequence can do so using that API. BDR already does this, as do
>>> other users of the decoding API. So Slony already has access to a
>>> useful ordering if it wishes it. We do not need to do anything *on this
>>> patch* to enable LSNs for Slony or anyone else. I don't see any reason
>>> to provide the same facility twice, in two different ways.
>>
>> Perhaps you could respond more specifically to concerns expressed
>> upthread, like:
>>
>> http://www.postgresql.org/message-id/blu436-smtp28b68b9312cbe5d9125441dc...@phx.gbl
>>
>> I don't see that as a dumb argument on the face of it.
>
> We have a clear "must have" argument for timestamps to support
> replication conflicts.
>
> Adding LSNs, when we already have a way to access them, is much more
> of a nice to have. I don't really see it as a particularly nice to
> have either, since the SLRU is in no way ordered.
>
> Scope creep is a dangerous thing, so we shouldn't, and elsewhere
> don't, collect up ideas like a bag of mixed sweets. It's easy to
> overload things to the point where they don't fly at all. The whole
> point of this is that we're building something faster than trigger
> based systems.

I think that's slamming the door closed and nailing it shut behind
you.  If we add the feature without LSNs, how will someone go back and
add that later?  It would change the on-disk format of the SLRU, so to
avoid breaking pg_upgrade, someone would have to write a conversion
utility.  Even at that, it would slow pg_upgrade down when the feature
has been used.

By way of contrast, adding the feature now is quite easy.  It just
requires storing a few more bytes per transaction.

I am all in favor of incremental development when possible, but not
when it so greatly magnifies the work that needs to be done.  People
have been asking for the ability to determine the commit ordering for
years, and this is a way to do that, very inexpensively, as part of a
patch that is needed anyway.  We are not talking about loading 20 new
requirements on top of this patch; that would be intolerable.  We're
talking about adding one additional piece of information that has been
requested multiple times over the years.

The way I see it, there are at least three uses for this information.
One, trigger-based replication solutions.  While logical decoding will
doubtless be preferable, I don't think trigger-based replication
solutions will go away completely, and certainly not right away.
They've wanted this since forever.  Two, for user applications that
want to know the commit order for their own purposes, as in Steve's
example.  Three, for O(1) snapshots.  Heikki's patch to make that
happen seems to have stalled a bit, but if it's ever to go anywhere it
will need something like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] alternative model for handling locking in parallel groups

2014-11-14 Thread Andres Freund
Hi,

On 2014-11-13 15:59:11 -0500, Robert Haas wrote:
> Discussion of my incomplete group locking patch seems to have
> converged around two points: (1) Everybody agrees that undetected
> deadlocks are unacceptable.  (2) Nobody agrees with my proposal to
> treat locks held by group members as mutually non-conflicting.  As was
> probably evident from the emails on the other thread, it was not
> initially clear to me why you'd EVER want heavyweight locks held by
> different group members to mutually conflict, but after thinking it
> over for a while, I started to think of cases where you would
> definitely want that:

> 1. Suppose two or more parallel workers are doing a parallel COPY.
> Each time the relation needs to be extended, one backend or the other
> will need to take the relation extension lock in Exclusive mode.
> Clearly, taking this lock needs to exclude both workers in the same
> group and also unrelated processes.
> 
> 2. Suppose two or more parallel workers are doing a parallel
> sequential scan, with a filter condition of myfunc(a.somecol), and
> that myfunc(a.somecal) updates a tuple in some other table.  Access to
> update that tuple will be serialized using tuple locks, and it's no
> safer for two parallel workers to do this at the same time than it
> would be for two unrelated processes to do it at the same time.
> 
> On the other hand, I think there are also some cases where you pretty
> clearly DO want the locks among group members to be mutually
> non-conflicting, such as:
> 
> 3. Parallel VACUUM.  VACUUM takes ShareUpdateExclusiveLock, so that
> only one process can be vacuuming a relation at the same time.  Now,
> if you've got several processes in a group that are collaborating to
> vacuum that relation, they clearly need to avoid excluding each other,
> but they still need to exclude other people.  And in particular,
> nobody else should get to start vacuuming that relation until the last
> member of the group exits.  So what you want is a
> ShareUpdateExclusiveLock that is, in effect, shared across the whole
> group, so that it's only released when the last process exits.

Note that you'd definitely not want to do this naively - currently
there are assumptions baked into the vacuum code that only one backend
is doing parts of it.

I think there's

> 4. Parallel query on a locked relation.  Parallel query should work on
> a table created in the current transaction, or one explicitly locked
> by user action.  It's not acceptable for that to just randomly
> deadlock, and skipping parallelism altogether, while it'd probably be
> acceptable for a first version, is not going to be a good long-term
> solution.

FWIW, I think it's perfectly acceptable to refuse to work in parallel in
that scenario. And not just for now.

The biggest argument I can see against that is parallel index creation.

> It also sounds buggy and fragile for the query planner to
> try to guess whether the lock requests in the parallel workers will
> succeed or fail when issued.

I don't know. Checking whether we hold a self exclusive lock on that
relation doesn't sound very problematic to me.

> Figuring such details out is the job of
> the lock manager or the parallelism infrastructure, not the query
> planner.

I don't think you can sensibly separate the parallelism infrastructure
and the query planner.

> After thinking about these cases for a bit, I came up with a new
> possible approach to this problem.  Suppose that, at the beginning of
> parallelism, when we decide to start up workers, we grant all of the
> locks already held by the master to each worker (ignoring the normal
> rules for lock conflicts).  Thereafter, we do everything the same as
> now, with no changes to the deadlock detector.  That allows the lock
> conflicts to happen normally in the first two cases above, while
> preventing the unwanted lock conflicts in the second two cases.

I don't think that's safe enough. There's e.g. a reason why ANALYZE
requires SUE lock. It'd definitely not be safe to simply grant the
worker a SUE lock, just because the master backend already analyzed it
or something like that. You could end up with the master and worker
backends ANALYZEing the same relation.

That said, I can definitely see use for an infrastructure where we
explicitly can grant another backend a lock that'd conflict with one
we're already holding. But I think it'll need to be more explicit than
just granting all currently held locks at the "highest" current
level. And I think it's not necessarily going to be granting them all
the locks at their current levels.

What I'm thinking of is basically add a step to execMain.c:InitPlan()
that goes through the locks and grants the child process all the locks
that are required for the statement to run. But not possibly preexisting
higher locks.

Greetings,

Andres Freund



Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
> > Stephen Frost  writes:
> > > * Noah Misch (n...@leadboat.com) wrote:
> > >> I'd agree for a new design, but I see too little to gain from changing 
> > >> it now.
> > >> Today's behavior is fine.
> > 
> > > To clarify- you mean with the changes described- using usesuper for
> > > rolreplication and rolbypassrls instead of 'false' when dumping from
> > > older versions, correct?
> > 
> > I think Noah is arguing for leaving the pg_dumpall queries as they
> > stand.  I disagree, but he's entitled to his opinion.
> 
> Yes, that.

Ah, ok.  I'm impartial, but I do note that we're currently inconsistent
and so I'd prefer to go one way or the other.

rolcreaterole uses usesuper, while rolreplication and rolbypassrls do
not.  Noah- would you argue that we should change rolcreaterole, which
has this behavior in all released branches (though, of course, it's only
relevant when upgrading from a pre-8.1 server where we didn't have
rolcreaterole)?  What are your thoughts on the additional role
attributes which are being discussed?

> (Adopt Gilles Darold's fix, of course.)

That's been done already.

Thanks!

Stephen




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-14 Thread Andres Freund
On 2014-11-13 11:41:18 -0500, Robert Haas wrote:
> On Wed, Nov 12, 2014 at 7:31 PM, Andres Freund  wrote:
> > But I think it won't work realistically. We have a *lot* of
> > infrastructure that refers to indexes using it's primary key. I don't
> > think we want to touch all those places to also disambiguate on some
> > other factor. All the relevant APIs are either just passing around oids
> > or relcache entries.
> 
> I'm not quite following this.  The whole point is to AVOID having two
> indexes.  You have one index which may at times have two sets of
> physical storage.

Right. But how are we going to refer to these different relfilenodes?
All the indexing infrastructure just uses oids and/or Relation pointers
to refer to the index. How would you hand down the knowledge which of
the relfilenodes is supposed to be used in some callchain?

There's ugly solutions like having a flag like 'bool
rd_useotherfilenode' inside struct RelationData, but even ignoring the
ugliness I don't think that'd work well - what if some function called
inside the index code again starts an index lookup?

I think I might just not be getting your idea here?

> > And all the
> > indexing infrastructure can't deal with that without having separate
> > oids & relcache entries.

Hopefully explained above?

Greetings,

Andres Freund




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 11:24:20AM -0500, Tom Lane wrote:
> Stephen Frost  writes:
> > * Noah Misch (n...@leadboat.com) wrote:
> >> I'd agree for a new design, but I see too little to gain from changing it 
> >> now.
> >> Today's behavior is fine.
> 
> > To clarify- you mean with the changes described- using usesuper for
> > rolreplication and rolbypassrls instead of 'false' when dumping from
> > older versions, correct?
> 
> I think Noah is arguing for leaving the pg_dumpall queries as they
> stand.  I disagree, but he's entitled to his opinion.

Yes, that.  (Adopt Gilles Darold's fix, of course.)




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Tom Lane
Stephen Frost  writes:
> * Noah Misch (n...@leadboat.com) wrote:
>> I'd agree for a new design, but I see too little to gain from changing it 
>> now.
>> Today's behavior is fine.

> To clarify- you mean with the changes described- using usesuper for
> rolreplication and rolbypassrls instead of 'false' when dumping from
> older versions, correct?

I think Noah is arguing for leaving the pg_dumpall queries as they
stand.  I disagree, but he's entitled to his opinion.

> Note for all- rolreplication goes back to 9.1.  Are we thinking that
> change should be backpatched?

I would say it's not really worth back-patching.

regards, tom lane




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote:
> > Personally, I'm leaning towards the first as it's less clutter in the
> > output of psql.
> 
> I'd agree for a new design, but I see too little to gain from changing it now.
> Today's behavior is fine.

To clarify- you mean with the changes described- using usesuper for
rolreplication and rolbypassrls instead of 'false' when dumping from
older versions, correct?

Note for all- rolreplication goes back to 9.1.  Are we thinking that
change should be backpatched?

Thanks,

Stephen




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-14 Thread Andres Freund
On 2014-11-14 02:04:00 -0600, Jim Nasby wrote:
> On 11/13/14, 3:50 PM, Andres Freund wrote:
> Having been responsible for a site where downtime was a 6 figure
> dollar amount per hour, I've spent a LOT of time worrying about lock
> problems. The really big issue here isn't grabbing an exclusive lock;
> it's grabbing one at some random time when no one is there to actively
> monitor what's happening. (If you can't handle *any* exclusive locks,
> that also means you can never do an ALTER TABLE ADD COLUMN either.)

> With that in mind, would it be possible to set this up so that the
> time-consuming process of building the new index file happens first,
> and then (optionally) some sort of DBA action is required to actually
> do the relfilenode swap? I realize that's not the most elegant
> solution, but it's WAY better than this feature not hitting 9.5 and
> people having to hand-code a solution.

I don't think having a multi step version of the feature and it not
making it into 9.5 are synonymous. And I really don't want to make it even
more complex before we have the basic version in.

I think a split like your:

> Possible syntax:
> REINDEX CONCURRENTLY -- Does what current patch does
> REINDEX CONCURRENT BUILD -- Builds new files
> REINDEX CONCURRENT SWAP -- Swaps new files in

could make sense, but it's really an additional feature ontop.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Tom Lane
Jeremy Harris  writes:
> On 14/11/14 00:46, Simon Riggs wrote:
>> Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>> ->  Sort  (cost= rows=568733 width=175) (actual time=
>> rows=20 loops=1)
>> Sort Method: top-N heapsort

> Going off on a tangent, when I was playing with a merge-sort
> implementation I propagated limit information into the sort
> node, for a significant win.

I'm not entirely following.  The top-N heapsort approach already
makes use of the limit info.

If the limit is so large that the sort spills to disk, then we
stop thinking about the limit.  But I'm finding it doubtful either
that that's a case worthy of extra code or that you could get very
much win if you did add code for it.

regards, tom lane




Re: [HACKERS] Teaching pg_dump to use NOT VALID constraints

2014-11-14 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 11/13/14 5:07 AM, Simon Riggs wrote:
> > On 13 November 2014 00:20, Jim Nasby  wrote:
> > 
> >> Isn't the real use-case here that if constraints were valid when you dumped
> >> then we shouldn't have to *any* re-validate when we load? (Though, we'd 
> >> have
> >> to be careful of that with CHECK because that can call user code...)
> > 
> > Yes, that is the use case this patch would improve considerably.
> 
> But your patch doesn't really address that.  It just leaves the
> constraints as invalid, and someone will have to revalidate them later
> (how?).  What Jim was describing was a mode that creates the constraints
> as valid but doesn't actually validate them.  I can see both sides of
> that kind of feature.

This might lead to users introducing invalid data by way of declaring
constraints as valid but not checked by the system; if they turn out to be
invalid, the final state is a mess.  I would only buy such a feature if
we had some way to pass down the knowledge of the constraint being valid
in the original system through some other means; say emit a CRC of the
copy data in the pg_dump output that can be verified while loading, and
only allow unvalidated constraints to be marked VALID if the sum
matches.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Customized Options Threshold Error

2014-11-14 Thread Tom Lane
 writes:
> When we specify a value which exceeds the valid range in "Customized
> Options", its behavior is different from the "Parameter Interaction via
> Configuration File" behavior.

> In the case of "Parameter Interaction via Configuration File", it finishes
> with a FATAL error when it gets an out-of-range value.
> But in "Customized Options", it just prints a WARNING and continues the
> process with the default value.

I think this is based on a misunderstanding.  Bad values in
postgresql.conf only result in a FATAL error at initial postmaster
startup; if you change them and SIGHUP the server, an erroneous new
value just results in a WARNING.  This is because we don't really
want the server crashing once it's up.  The case you're showing with
registering a new background worker is something that happens after
the server is up, so it's consistent for it to produce a WARNING
not a fatal error.

regards, tom lane




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Noah Misch
On Fri, Nov 14, 2014 at 08:35:25AM -0500, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Noah Misch  writes:
> > > On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote:
> > >> Agreed.  I'll take care of both and we'll make sure the new role
> > >> attributes being added will do the same for upgrades also.
> > 
> > > That would make pg_dumpall less faithful for every role other than the
> > > bootstrap superuser, a net loss.  It would be defensible to do this for
> > > BOOTSTRAP_SUPERUSERID only.
> > 
> > Huh?  It seems difficult to argue that it's "less faithful" to do this
> > when the previous version didn't have the concept at all.
> 
> I believe what Noah is pointing out is that we'll end up adding
> attributes which weren't there already for superusers created by users.

Yes.

> There's a couple ways to address this-
> 
> Stop enabling all the role attribute bits for the bootstrap superuser,
> in which case it'd look a lot more like other superusers that a user
> might create (at least, in my experience, no one bothers to set the role
> attributes beyond superuser in real environments).
> 
> or [...]

> Personally, I'm leaning towards the first as it's less clutter in the
> output of psql.

I'd agree for a new design, but I see too little to gain from changing it now.
Today's behavior is fine.




Re: [HACKERS] Re: Segmentation fault in pg_dumpall from master down to 9.1 and other bug introduced by RLS

2014-11-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Noah Misch  writes:
> > On Thu, Nov 13, 2014 at 08:24:36PM -0500, Stephen Frost wrote:
> >> Agreed.  I'll take care of both and we'll make sure the new role
> >> attributes being added will do the same for upgrades also.
> 
> > That would make pg_dumpall less faithful for every role other than the
> > bootstrap superuser, a net loss.  It would be defensible to do this for
> > BOOTSTRAP_SUPERUSERID only.
> 
> Huh?  It seems difficult to argue that it's "less faithful" to do this
> when the previous version didn't have the concept at all.

I believe what Noah is pointing out is that we'll end up adding
attributes which weren't there already for superusers created by users.

You're correct that we currently enable all attributes for the bootstrap
superuser and therefore a dump/restore upgrade looks different from an
initdb, unless the dump includes all new attributes for the bootstrap
superuser.

There's a couple ways to address this-

Stop enabling all the role attribute bits for the bootstrap superuser,
in which case it'd look a lot more like other superusers that a user
might create (at least, in my experience, no one bothers to set the role
attributes beyond superuser in real environments).

or

Reflect actual capability in what is viewed through the catalog.  This
might actually dovetail nicely with the role attribute representation
change which is also being discussed, were we to make pg_authid a view
which called 'has_rolX_privilege' to get the value for each attribute.
What's actually in the bitmask might end up being different, but at
least what's seen in pg_authid (and hopefully for all client tools)
would make sense.  Of course, this also has the downside that if the
superuser bit is removed later, we'd revert to whatever is actually in
the catalog for the user and that'd potentially be different for the
bootstrap superuser vs. user-created superusers.

Personally, I'm leaning towards the first as it's less clutter in the
output of psql.  If we add the role attributes proposed and continue to
enable all of them explicitly for the bootstrap superuser, the
'Attributes' column is going to get mighty wide.  I don't really see the
explicit list of attributes as helping de-mystify what is going on for
users as it's akin to root anyway- doing an 'ls' as root doesn't show
all the file permissions based on what root can do.

Thanks!

Stephen




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread David Rowley
On Fri, Nov 14, 2014 at 1:27 PM, Simon Riggs  wrote:

> On 12 November 2014 00:54, Robert Haas  wrote:
> > Interestingly, I have a fairly solid idea of what proisparallel is,
> > but I have no clear idea what CONTAINS NO SQL is or why it's relevant.
> > I would imagine that srandom() contains no SQL under any reasonable
> > definition of what that means, but it ain't parallel-safe.
>
> What is wrong in generating random numbers in parallel?
>

I was just watching Robert's talk on parallel query on YouTube... I think
the answer is at 41:09; the link below should take you there:
https://www.youtube.com/watch?v=3klfarKEtMQ#t=2469

Regards

David Rowley


Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote:
> Limit  (cost= rows=20 width=175) (actual time= rows=20 loops=1)
>->  Sort  (cost= rows=568733 width=175) (actual time=
> rows=20 loops=1)
>  Sort Method: top-N heapsort

Going off on a tangent, when I was playing with a merge-sort
implementation I propagated limit information into the sort
node, for a significant win.  Eliding the Limit node gave
a further slight win.

I wasn't convinced the use-case was common enough to justify
the replacement of quicksort (despite having consistently
fewer compares, the merge sort was slightly slower.  I never
understood why) - but I never asked.  Is there any appetite
for supporting alternate sort algorithms?
-- 
Cheers,
  Jeremy





Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Kouhei Kaigai
> On 14 November 2014 07:37, Jim Nasby  wrote:
> > On 11/12/14, 1:54 AM, David Rowley wrote:
> >>
> >> On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs wrote:
> >>
> >>
> >> This plan type is widely used in reporting queries, so will hit the
> >> mainline of BI applications and many Mat View creations.
> >> This will allow SELECT count(*) FROM foo to go faster also.
> >>
> >> We'd also need to add some infrastructure to merge aggregate states
> >> together for this to work properly. This means that could also work
> >> for
> >> avg() and stddev etc. For max() and min() the merge functions would
> >> likely just be the same as the transition functions.
> >
> >
> > Sanity check: what % of a large aggregate query fed by a seqscan
> > actually spent in the aggregate functions? Even if you look strictly
> > at CPU cost, isn't there more code involved to get data to the
> > aggregate function than in the aggregation itself, except maybe for
> numeric?
> 
> Yes, which is why I suggested pre-aggregating before collecting the streams
> together.
> 
> The point is not that the aggregation is expensive, its that the aggregation
> eats data and the required bandwidth for later steps is reduced and hence
> does not then become a bottleneck that renders the parallel Seq Scan
> ineffective.
> 
I'd like to throw a question to the community folks.
Has anyone discussed the challenge of aggregate push-down across relation
joins in the past?  It could potentially reduce the number of rows to be
joined.  If we already have such a discussion, I'd like to review it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 



Re: [HACKERS] Size of regression database

2014-11-14 Thread Alvaro Herrera
Tom Lane wrote:
> I was testing backwards compatibility of pg_dumpall just now, and was
> somewhat astonished to notice the size of the output for the regression
> database compared to what it was not too long ago:
> 
> -rw-rw-r--. 1 tgl tgl  4509135 Nov 13 16:19 dumpall.83
> -rw-rw-r--. 1 tgl tgl  4514441 Nov 13 16:24 dumpall.84
> -rw-rw-r--. 1 tgl tgl  4666917 Nov 13 16:15 dumpall.90
> -rw-rw-r--. 1 tgl tgl  4681235 Nov 13 16:15 dumpall.91
> -rw-rw-r--. 1 tgl tgl  5333587 Nov 13 16:15 dumpall.92
> -rw-rw-r--. 1 tgl tgl  5409083 Nov 13 16:15 dumpall.93
> -rw-rw-r--. 1 tgl tgl  5493686 Nov 13 16:15 dumpall.94
> -rw-rw-r--. 1 tgl tgl 27151777 Nov 13 16:21 dumpall.head
> 
> A quick eyeball check says that that quintupling of the database size
> is all in BRIN index tests.  Could we dial that back to something a
> bit saner please?

Oops.  Sure, will see about this.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] PostgreSQL doesn't stop properly when --slot option is specified with pg_receivexlog.

2014-11-14 Thread furuyao
Hi,

"pg_ctl stop" doesn't work properly if the --slot option is specified,
because WAL is flushed only when it has been switched.
These processes still continue even after the postmaster shutdown fails:
pg_receivexlog, walsender and logger.

How to reproduce:
1.Start PostgreSQL
2.Create slot
3.Specify --slot option to pg_receivexlog and start it
4.Stop PostgreSQL

Example commands for reproduce:
1.pg_ctl start
2.psql -c "select pg_create_physical_replication_slot('test');"
3.pg_receivexlog --slot='test' -D ./testdir
4.pg_ctl stop

This happens because the --slot option sets a flush location in the feedback
messages, but pg_receivexlog only flushes when the WAL has switched, so
walsender waits forever for the WAL to be replicated.

When the --slot option is not specified, 'invalid' is set as the flush
location and the write location is used to check replication, so the process
can stop properly.

So I thought that if we set 'invalid' as the flush location in this case as
well, this problem would be solved.
Does the --slot option need to set a flush location?
Thoughts?

Regards,

--
Furuya Osamu



Re: [HACKERS] Unintended restart after recovery error

2014-11-14 Thread Antonin Houska
Fujii Masao  wrote:
> On Thu, Nov 13, 2014 at 8:30 AM, Robert Haas  wrote:
>> It's true that if the startup process dies we don't try to restart,
>> but it's also true that if the checkpointer dies we do try to restart.
>> I'm not sure why this specific situation should be an exception to
>> that general rule.

My distinction was "during recovery" vs "outside recovery", rather than
"startup process" vs "checkpointer". But I'm not sure it's easy enough to
recognize that checkpointer (maybe also bgwriter) no longer participates in
recovery.

> 442231d7f71764b8c628044e7ce2225f9aa43b6 introduced the latter rule
> for hot-standby case.

I didn't fully understand the purpose of this condition until I saw the commit
message. Thanks for pointing out.

> Maybe *during crash recovery* (i.e., hot standby should not be enabled) it's
> better to treat the crash of startup process as a catastrophic crash.

Yes, that's what I thought too. But I think the current StartupXLOG() does not
always (need to) determine the exact boundary between crash and archive
recovery. I'd need to think more if the end of crash recovery can be safely
identified during the replay and signaled to postmaster.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Simon Riggs
On 14 November 2014 01:51, David Rowley  wrote:

> When I mentioned this, I didn't mean to appear to be placing a road block. I
> was just bringing to the table the information that COUNT(*) + COUNT(*)
> works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n) does not.
>
>  Merge functions should be a simple patch to write. If I thought there was
> going to be a use for them in this release, I'd put my hand up to put a
> patch together.

The hard part is describing, and then agreeing on, the semantics.

If you raise a separate post on this, copy me in and I will help.
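To make the semantics question concrete, here is a toy sketch (plain Python, purely illustrative; none of this reflects PostgreSQL's actual transition-function machinery) of why COUNT's sub-totals merge by simple addition while AVG needs a composite state:

```python
# Toy model of aggregate "partial state" merging (illustrative only).

def count_merge(a, b):
    # COUNT(*)'s state is a plain counter, so merging two sub-totals
    # really is just addition: COUNT(*) + COUNT(*).
    return a + b

def avg_merge(a, b):
    # AVG(n) + AVG(n) is wrong: avg([1, 2]) + avg([10]) != avg([1, 2, 10]).
    # Keeping the state as (sum, count) makes the merge well-defined.
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    # The final function turns the merged state into the result.
    total, n = state
    return total / n

# Two sub-totals, over [1, 2] and over [10]:
assert count_merge(2, 1) == 3
assert avg_final(avg_merge((3, 2), (10, 1))) == 13 / 3
# whereas naively adding the finished averages gives 1.5 + 10.0 = 11.5
```

For max() and min() the merge would indeed just reapply the transition function to the two sub-results.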

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread Simon Riggs
On 14 November 2014 07:37, Jim Nasby  wrote:
> On 11/12/14, 1:54 AM, David Rowley wrote:
>>
>> On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs wrote:
>>
>>
>> This plan type is widely used in reporting queries, so will hit the
>> mainline of BI applications and many Mat View creations.
>> This will allow SELECT count(*) FROM foo to go faster also.
>>
>> We'd also need to add some infrastructure to merge aggregate states
>> together for this to work properly. This means that could also work for
>> avg() and stddev etc. For max() and min() the merge functions would likely
>> just be the same as the transition functions.
>
>
> Sanity check: what % of a large aggregate query fed by a seqscan is actually
> spent in the aggregate functions? Even if you look strictly at CPU cost,
> isn't there more code involved to get data to the aggregate function than in
> the aggregation itself, except maybe for numeric?

Yes, which is why I suggested pre-aggregating before collecting the
streams together.

The point is not that the aggregation is expensive; it's that the
aggregation eats data and the required bandwidth for later steps is
reduced and hence does not then become a bottleneck that renders the
parallel Seq Scan ineffective.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-14 Thread David Rowley
On 14 November 2014 20:37, Jim Nasby  wrote:

> On 11/12/14, 1:54 AM, David Rowley wrote:
>
>>
>> We'd also need to add some infrastructure to merge aggregate states
>> together for this to work properly. This means that could also work for
>> avg() and stddev etc. For max() and min() the merge functions would likely
>> just be the same as the transition functions.
>>
>
> Sanity check: what % of a large aggregate query fed by a seqscan is actually
> spent in the aggregate functions? Even if you look strictly at CPU cost,
> isn't there more code involved to get data to the aggregate function than
> in the aggregation itself, except maybe for numeric?
>
>
You might be right, but that sounds like it would need all the parallel
workers to send each matching tuple to a queue to be processed by some
aggregate node. I guess this would have more advantage for wider tables or
tables with many dead tuples, or if the query has quite a selective where
clause, as less data would make it onto that queue.

Perhaps I've taken 1 step too far forward here. I had been thinking that
each worker would perform the partial seqscan and in the worker context
pass the tuple down to the aggregate node. Then later, once each worker had
completed, some other, perhaps new, node type (MergeAggregateStates) would
merge all those intermediate agg states into the final agg state (which
would then be ready for the final function to be called).
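As a rough sketch of that flow (hypothetical names throughout — merge_aggregate_states is the node type speculated about above, not anything that exists in the executor):

```python
# Hypothetical pipeline: per-worker partial seqscan -> per-worker
# intermediate aggregate state -> merge of states -> final function.

def partial_avg(chunk):
    """Each worker's aggregate node builds an intermediate (sum, count)
    state from its portion of the scan."""
    total = count = 0
    for value in chunk:
        total += value
        count += 1
    return (total, count)

def merge_aggregate_states(states):
    """The speculated MergeAggregateStates step: combine the workers'
    intermediate states into one."""
    total = sum(s for s, _ in states)
    count = sum(n for _, n in states)
    return (total, count)

def avg_final(state):
    """The final function, called once on the merged state."""
    total, count = state
    return total / count

table = list(range(100))
chunks = [table[i:i + 25] for i in range(0, 100, 25)]  # 4 workers' shares
states = [partial_avg(c) for c in chunks]              # conceptually parallel
assert avg_final(merge_aggregate_states(states)) == 49.5
```

Each worker only ships one small state tuple to the merge step, rather than every matching row, which is what keeps the later steps from becoming a bottleneck.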

Are there any plans for what will be in charge of deciding how many workers
would be allocated to a parallel query? Will this be something that's done
at planning time? Or should the planner just create a parallel-friendly
plan if the plan is costly enough, and then just let the executor decide
how many workers to throw at the job based on how busy the system is with
other tasks at execution time?

Regards

David Rowley


Re: [HACKERS] alter user/role CURRENT_USER

2014-11-14 Thread Kyotaro HORIGUCHI
Hi, this is the revised version.

> Kyotaro HORIGUCHI wrote:
> 
> > - Storage for new information
> > 
> > The new struct NameId stores an identifier which tells what it
> > logically is, using the new enum NameIdTypes.
> 
> I think NameId is a bad name for this.  My point is that NameId, as it
> stands, might be a name for anything, not just a role; and the object it
> identifies is not an Id either.  Maybe RoleSpec?

Yeah! I felt it was no good even as a generic type for
various "name of something or its OID" cases. RoleSpec sounds much better.

> Do we need a public_ok
> argument to get_nameid_oid() (get a better name for this function too)

Maybe get_rolespec_oid(), named after its parameter type?

> so that callers don't have to check for InvalidOid argument?  I think
> the arrangement you propose is not very convenient; it'd be best to
> avoid duplicating the check for InvalidOid in all callers of the new
> function, particularly where there was no check before.

I agree that it'd be better to keep away from duplicated
InvalidOid checks, but public_ok seems a bit myopic. Since
there's no clear boundary between functions accepting 'public'
and the others, that kind of solution would not be reasonable.

What about checking it being a PUBLIC or not *before* calling
get_rolespec_oid()?

The attached patch modified in the following points.

 - rename NameId to RoleSpec and NameIdType to RoleSpecTypes.
 - rename get_nameid_oid() to get_rolespec_oid().
 - rename roleNamesToIds() to roleSpecsToIds().
 - some struct members are renamed, such as authname to authrole.
 - check if rolespec is "public" or not before calling get_rolespec_oid()
 - ExecAlterDefaultPrivilegesStmt and ExecuteGrantStmt did
   slightly different things with ACL_ID_PUBLIC, but I unified them
   to the latter's behavior.
 - rebased to the current master

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

CreateStmt->authrole = NULL => ?
From 307249654c97b6449261febbfd84190fbad9111d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 14 Nov 2014 17:37:22 +0900
Subject: [PATCH] ALTER USER CURRENT_USER v3

---
 src/backend/catalog/aclchk.c       |  30 +++---
 src/backend/commands/alter.c       |   2 +-
 src/backend/commands/extension.c   |   2 +-
 src/backend/commands/foreigncmds.c |  57 +--
 src/backend/commands/schemacmds.c  |  30 +-
 src/backend/commands/tablecmds.c   |   4 +-
 src/backend/commands/tablespace.c  |   2 +-
 src/backend/commands/user.c        |  86 +
 src/backend/nodes/copyfuncs.c      |  37 +---
 src/backend/nodes/equalfuncs.c     |  35 ---
 src/backend/parser/gram.y          | 190 +++--
 src/backend/parser/parse_utilcmd.c |   4 +-
 src/backend/utils/adt/acl.c        |  34 +++
 src/include/commands/user.h        |   2 +-
 src/include/nodes/nodes.h          |   1 +
 src/include/nodes/parsenodes.h     |  48 +++---
 src/include/utils/acl.h            |   2 +-
 17 files changed, 385 insertions(+), 181 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d30612c..24811c6 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -430,13 +430,16 @@ ExecuteGrantStmt(GrantStmt *stmt)
 	foreach(cell, stmt->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			istmt.grantees = lappend_oid(istmt.grantees, ACL_ID_PUBLIC);
-		else
-			istmt.grantees =
-				lappend_oid(istmt.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* "public" is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		istmt.grantees = lappend_oid(istmt.grantees, grantee_uid);
 	}
 
 	/*
@@ -913,13 +916,16 @@ ExecAlterDefaultPrivilegesStmt(AlterDefaultPrivilegesStmt *stmt)
 	foreach(cell, action->grantees)
 	{
 		PrivGrantee *grantee = (PrivGrantee *) lfirst(cell);
+		Oid grantee_uid = ACL_ID_PUBLIC;
 
-		if (grantee->rolname == NULL)
-			iacls.grantees = lappend_oid(iacls.grantees, ACL_ID_PUBLIC);
-		else
-			iacls.grantees =
-				lappend_oid(iacls.grantees,
-							get_role_oid(grantee->rolname, false));
+		/* "public" is mapped to ACL_ID_PUBLIC */
+		if (grantee->role->roltype != ROLESPEC_PUBLIC)
+		{
+			grantee_uid = get_rolespec_oid(grantee->role, false);
+			if (!OidIsValid(grantee_uid))
+				grantee_uid = ACL_ID_PUBLIC;
+		}
+		iacls.grantees = lappend_oid(iacls.grantees, grantee_uid);
 	}
 
 	/*
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index c9a9baf..c53d4e5 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -678,7 +678,7 @@ AlterObjectNamespace_internal(Relation rel, Oid objid, Oid nspOid)
 Oid
 ExecAlterOwnerStmt(AlterOwnerStmt *stmt)
 {
-	Oid			newowner = get_role_oid(stmt->newowner, false);

Re: [HACKERS] WAL format and API changes (9.5)

2014-11-14 Thread Michael Paquier
On Thu, Nov 13, 2014 at 10:33 PM, Heikki Linnakangas wrote:
> In quick testing, this new WAL format is somewhat more compact than the 9.4
> format. That also seems to have more than bought back the performance
> regression I saw earlier. Here are results from my laptop, using the
> wal-update-testsuite.sh script:
> master:
> [results]
> (27 rows)
> wal-format-and-api-changes-9.patch:
> [results]
> (27 rows)
So based on your series of tests, you are saving 6% to 10%.
That's... really cool!

> Aside from the WAL record format changes, this patch adds the "decoded WAL
> record" infrastructure that we talked about with Andres. XLogReader now has
> a new function, DecodeXLogRecord, which parses the block headers etc. from
> the WAL record, and copies the data chunks to aligned buffers. The redo
> routines are passed a pointer to the XLogReaderState, instead of the plain
> XLogRecord, and the redo routines can use macros and functions defined in
> xlogreader.h to access the already-decoded WAL record. The new WAL record
> format is difficult to parse in a piece-meal fashion, so it really needs
> this separate decoding pass to be efficient.
>
> Thoughts on this new WAL record format? I've attached the xlogrecord.h file
> here separately for easy reading, if you want to take a quick look at just
> that without applying the whole patch.
The new format is neat, and that's a lot of code. Grouping together
the blocks of data is a good thing for the FPW compression patch as
well.
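Conceptually, the up-front decoding pass works like this toy sketch (an invented record layout for illustration only — not the actual 9.5 WAL format): the headers are parsed once, each block's data is copied out into its own buffer, and redo code can then access blocks by id without re-parsing:

```python
import struct

# Toy record layout (invented; NOT the real WAL record format):
#   per block: <uint8 block_id> <uint32 data_len> <data_len bytes of data>

def encode_record(blocks):
    """Serialize a {block_id: bytes} mapping into one flat 'record'."""
    out = b""
    for block_id, data in blocks.items():
        out += struct.pack("<BI", block_id, len(data)) + data
    return out

def decode_record(raw):
    """One decoding pass over the record: copy each block's chunk into
    its own buffer, so 'redo' code can then look blocks up by id."""
    decoded, pos = {}, 0
    header_size = struct.calcsize("<BI")
    while pos < len(raw):
        block_id, data_len = struct.unpack_from("<BI", raw, pos)
        pos += header_size
        decoded[block_id] = raw[pos:pos + data_len]
        pos += data_len
    return decoded

raw = encode_record({0: b"main fork page image", 2: b"metapage"})
decoded = decode_record(raw)
assert decoded[0] == b"main fork page image"
assert 1 not in decoded  # block 1 was not registered in this record
```

The point of the separate pass is exactly what the quoted text says: the packed format is hard to pick apart piecemeal, so decoding everything once and handing redo routines the decoded form is cheaper overall.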

Note that this patch conflicts with the recent commits 81c4508 and
34402ae. installcheck-world is passing, and standbys are able to
replay records, at least without crashes.

Here are some more comments:
1) with assertions enabled this does not compile because of a small
typo in xlogreader.h here:
+#define XLogRecHasAnyBlockRefs(decoder) ((decoder)->max_block id >= 0)
2) In xlogreader.c, XLogRecGetBlockData should return char * but a
boolean is returned here:
+XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len)
+{
+   DecodedBkpBlock *bkpb;
+
+   if (!record->blocks[block_id].in_use)
+   return false;
Since the block is not in use in this case, it should return NULL instead.
3) pg_xlogdump does not seem to work:
$ pg_xlogdump 0001000D
pg_xlogdump: FATAL:  could not find a valid record after 0/D00
4) A couple of NOT_USED blocks could be removed, no?
+#ifdef NOT_USED
BlockNumber leftChildBlkno = InvalidBlockNumber;
+#endif
5) Here why not using the 2nd block instead of the 3rd (@_bt_getroot)?
+   XLogBeginInsert();
+   XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
+   XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
6) There are FIXME blocks:
+   // FIXME
+   //if ((bkpb->fork_flags & BKPBLOCK_WILL_INIT) != 0 &&
mode != RBM_ZERO)
+   //  elog(PANIC, "block with WILL_INIT flag in WAL
record must be zeroed by redo routine");
And that:
+   /* FIXME: translation? Although this shouldn't happen.. */
+   ereport(ERROR,
+   (errmsg("error decoding WAL record"),
+errdetail("%s", errormsg)));
Regards,
-- 
Michael




Re: [HACKERS] Change in HEAP_NEWPAGE logging makes diagnosis harder

2014-11-14 Thread Heikki Linnakangas

On 10/30/2014 04:12 PM, Andres Freund wrote:

Hi,

I've just once more looked at the WAL stream and was briefly confused
about all the XLOG_FPI records. Since 54685338e3,
log_newpage/log_newpage_buffer and XLogSaveBufferForHint() use the same
WAL record. I personally find that a bad idea because they're used in
quite different situations.

Can we use different IDs again?


Yeah, we should do something about that. I'll fix that after the WAL 
format changes stuff.


- Heikki





Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-14 Thread Jim Nasby

On 11/13/14, 3:50 PM, Andres Freund wrote:

On November 13, 2014 10:23:41 PM CET, Peter Eisentraut  wrote:

On 11/12/14 7:31 PM, Andres Freund wrote:

Yes, it sucks. But it beats not being able to reindex a relation with a
primary key (referenced by a fkey) without waiting several hours, by a
couple of magnitudes. And that's the current situation.


That's fine, but we have, for better or worse, defined CONCURRENTLY :=
does not take exclusive locks.  Use a different adverb for an
in-between
facility.


I think that's not actually a service to our users. They'll have to adapt their 
scripts and knowledge when we get around to the more concurrent version. What 
exactly CONCURRENTLY means is already not strictly defined and differs between 
the actions.


It also means that if we ever found a way to get rid of the exclusive lock we'd 
then have an inconsistency anyway. Or we'd also create REINDEX CONCURRENT at 
that time, and then have 2 command syntaxes to support.


I'll note that DROP INDEX CONCURRENTLY actually already internally acquires an 
AEL lock, although it's a bit harder to see the consequences of that.


Having been responsible for a site where downtime was a 6 figure dollar amount 
per hour, I've spent a LOT of time worrying about lock problems. The really big 
issue here isn't grabbing an exclusive lock; it's grabbing one at some random 
time when no one is there to actively monitor what's happening. (If you can't 
handle *any* exclusive locks, that also means you can never do an ALTER TABLE 
ADD COLUMN either.) With that in mind, would it be possible to set this up so 
that the time-consuming process of building the new index file happens first, 
and then (optionally) some sort of DBA action is required to actually do the 
relfilenode swap? I realize that's not the most elegant solution, but it's WAY 
better than this feature not hitting 9.5 and people having to hand-code a 
solution.

Possible syntax:
REINDEX CONCURRENTLY -- Does what current patch does
REINDEX CONCURRENT BUILD -- Builds new files
REINDEX CONCURRENT SWAP -- Swaps new files in

This suffers from the syntax problems I mentioned above, but at least this way 
it's all limited to one command, and it probably allows a lot more people to 
use it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Customized Options Threshold Error

2014-11-14 Thread furuyao
Hi,

When we specify a value that exceeds the valid range for a "Customized Option",
the behavior differs from the "Parameter Interaction via Configuration File" 
behavior.

In the case of "Parameter Interaction via Configuration File", startup ends with 
a FATAL error when an out-of-range value is detected.
But for "Customized Options", the server just prints a WARNING and continues 
with the default value.

The manual ("18.16. Customized Options") says the server will "issue warnings 
for any unrecognized placeholders that begin with its extension name".
So when the extension name is correct and the value exceeds the valid range,
I think it should end with a FATAL error, just like "Parameter Interaction via 
Configuration File" does.
Thoughts?

The threshold error logs follow: one for the "Customized Options" case and 
one for the "Parameter Interaction via Configuration File" case.

"Customized Options Threshold Error"

 WARNING:  invalid value for parameter "pg_receivexlog_mng.monitor_interval": "2147483648"
 HINT:  Value exceeds integer range.
 LOG:  registering background worker "pg_receivexlog_mng"
 
"Parameter Interaction via Configuration File Threshold Error"

 FATAL:  configuration file "/dbfp/data/postgresql.conf" contains errors
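To illustrate, the two behaviors can be modeled roughly like this (a sketch only; this is not the actual guc.c logic, and the parameter names are just examples):

```python
# Sketch of the two out-of-range behaviors described above.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def set_int_guc(value_str, default, strict):
    """strict=True models core parameters (error -> FATAL, startup aborts);
    strict=False models custom placeholders (WARNING, default is kept)."""
    value = int(value_str)
    if not (INT_MIN <= value <= INT_MAX):
        if strict:
            # "Parameter Interaction via Configuration File" behavior
            raise ValueError('configuration file contains errors')
        # "Customized Options" behavior: warn and fall back to the default
        print('WARNING: value exceeds integer range, using default')
        return default
    return value

# 2147483648 is one past INT_MAX, the value from the log above:
assert set_int_guc("2147483648", default=60, strict=False) == 60  # WARNING path
assert set_int_guc("30", default=60, strict=True) == 30           # accepted
```

The question is whether the non-strict branch should exist at all once the extension prefix is known to be valid.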

Regards,

--
Furuya Osamu
