Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Michael Paquier
On Fri, Dec 16, 2016 at 3:32 PM, Magnus Hagander  wrote:
> I don't really know how to write a good test for it. I mean, we could run it
> with the parameter, but how do we make it actually verify the slot? Make a
> really big database to guarantee that it is slow enough that we can
> notice it in a different session?

No, not that. I was meaning tests to trigger the error code paths with
the compatibility switches you are adding.
-- 
Michael




Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Abhijit Menon-Sen
At 2016-12-16 07:32:24 +0100, mag...@hagander.net wrote:
>
> I really think the default should be "what most people want", and not
> "whatever is compatible with a mode that was necessary because we
> lacked infrastructure".

Very much agreed.

-- Abhijit




Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Magnus Hagander
On Dec 16, 2016 07:27, "Michael Paquier" wrote:

> On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander wrote:
> > So here's a patch that does this, for discussion. It implements the
> > following behavior for -X:
> >
> > * When used with <10.0 servers, behave just like before.
> > * When -S <slotname> is specified, behave just like before (use an existing
> > replication slot, fail if it does not exist)
> > * When used on 10.0 with no -S, create and use a temporary replication slot
> > while running, with name pg_basebackup_<pid>.
> > * When used with 10.0 with no -S but --no-slot specified, run without a slot
> > like before.
>
> There are users using -X stream without a slot now because they don't want
> to cause WAL retention in pg_xlog and are ready for retries in taking the
> base backup... I am wondering if it is a good idea to change the default
> behavior rather than introduce a new option like --temporary-slot, or
> --slot-mode=temp|persistent|none with none being the default instead. There
> are users who care about pg_xlog not filling up if pg_basebackup hangs on
> write, for example. And it may be a good idea to allow --slot-mode=temp
> together with -S/--slot so that users can monitor the slot. If no slot name
> is given with --slot, generating one looks fine to me.


I really think the default should be "what most people want", and not
"whatever is compatible with a mode that was necessary because we lacked
infrastructure".

Yes there are some people who are fine with their backups failing. I'm
willing to say there are *many* more who benefit from an automatic slot.


> Regarding the patch, it would be good to add a set of TAP tests to cover the
> new features. I am not seeing anything badly wrong with it at first sight,
> by the way.


I don't really know how to write a good test for it. I mean, we could run
it with the parameter, but how do we make it actually verify the slot? Make
a really big database to guarantee that it is slow enough that we can notice
it in a different session?

/Magnus


Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Michael Paquier


On Thu, Dec 15, 2016 at 7:28 PM, Magnus Hagander  wrote:
> So here's a patch that does this, for discussion. It implements the
> following behavior for -X:
>
> * When used with <10.0 servers, behave just like before.
> * When -S <slotname> is specified, behave just like before (use an existing
> replication slot, fail if it does not exist)
> * When used on 10.0 with no -S, create and use a temporary replication slot
> while running, with name pg_basebackup_<pid>.
> * When used with 10.0 with no -S but --no-slot specified, run without a slot
> like before.

There are users using -X stream without a slot now because they don't want to
cause WAL retention in pg_xlog and are ready for retries in taking the base
backup... I am wondering if it is a good idea to change the default behavior
rather than introduce a new option like --temporary-slot, or
--slot-mode=temp|persistent|none with none being the default instead. There
are users who care about pg_xlog not filling up if pg_basebackup hangs on
write, for example. And it may be a good idea to allow --slot-mode=temp
together with -S/--slot so that users can monitor the slot. If no slot name is
given with --slot, generating one looks fine to me.
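A slot created that way could be watched from another session with something
like the following (a sketch only; pg_basebackup_12345 stands in for the
generated name):

    SELECT slot_name, active, restart_lsn
      FROM pg_replication_slots
     WHERE slot_name LIKE 'pg_basebackup_%';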

Regarding the patch, it would be good to add a set of TAP tests to cover the
new features. I am not seeing anything badly wrong with it at first sight,
by the way.
--
Michael




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2016-12-15 Thread David Fetter
On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:
> Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
> > 
> > 
> > On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser wrote:
> > 
> > 
> > We decided to follow your recommendation and add the patch to the
> > commitfest.
> > 
> > 
> > Patch is not applying properly to HEAD.
> > Moved to next CF with "waiting on author" status.
> > 
> 
> We updated our patch. We tested it with the latest
> commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

This looks neat, but it no longer applies to master.  Is a rebase in
the offing?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-15 Thread Michael Paquier
On Thu, Dec 15, 2016 at 11:25:10AM +0200, Heikki Linnakangas wrote:
> On 12/15/2016 10:44 AM, Magnus Hagander wrote:
> > I wonder if there might be more corner cases like this, but in this
> > particular one it seems easy enough to just say that failing to rename
> > recovery.conf because it didn't exist is safe.
> 
> Yeah. It's unexpected though, so I think erroring out is quite reasonable.
> If the recovery.conf file went missing, who knows what else is wrong.

I'd rather not mess with the existing behavior and would just tell the
pilot not to do that. This falls into the same category as not removing the
backup_label file after taking a backup...

> > But in the case of failing to rename recovery.conf for example because of
> > permissions errors, we don't want to ignore it. But we also really don't
> > want to end up with this kind of inconsistent data directory IMO. I don't
> > know that code well enough to suggest how to fix it though -- hoping for
> > input for someone who knows it closer?
> 
> Hmm. Perhaps we should write the timeline history file only after renaming
> recovery.conf. In general, we should keep the window between writing the
> timeline history file and writing the end-of-recovery record as small as
> possible. I don't think we can eliminate it completely, but it makes sense
> to minimize it. Something like the attached (completely untested).

I did some tests, and after a closer look it seems like a good idea to book
the timeline number at an earlier step and let other nodes know about
it. So +1 on it.

Looking at PrescanPreparedTransactions(), I am thinking as well that it would
be better to get a hard failure when stumbling on a corrupted 2PC file. Files
from the future are one thing, but corrupted files should be treated more
carefully.
-- 
Michael




Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-15 Thread Michael Paquier
On Thu, Dec 15, 2016 at 6:08 PM, Masahiko Sawada  wrote:
> Attached latest v12 patch.
> I changed behavior of "N (standby_list)" to use the priority method
> and incorporated some review comments so far. Please review it.

Some comments...

+Another example of synchronous_standby_names for multiple
+synchronous standby is:
Here standby takes an 's'.

+candidates. The master server will wait for at least 2 replies from them.
+s4 is an asynchronous standby since its name is not in the list.
"will wait for replies from at least two of them".

+ * next-highest-priority standby. In quorum method, the all standbys
+ * appearing in the list are considered as a candidate for quorum commit.
"the all" is incorrect. I think you mean "all the" instead.

+ * NIL if no sync standby is connected. In quorum method, all standby
+ * priorities are same, that is 1. So this function returns the list of
This is not true. Standbys have a priority number assigned. Though it does
not matter much for quorum groups, it gives an indication of their position
in the defined list.

 #synchronous_standby_names = ''   # standby servers that provide sync rep
 -   # number of sync standbys and comma-separated list of application_name
 +   # synchronization method, number of sync standbys
 +   # and comma-separated list of application_name
     # from standby(s); '*' = all
The formulation is funny here: "sync rep synchronization method".
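For reference, the kind of value this comment block describes looks like the
following (illustrative only; the exact keywords were still in flux in this
patch series, and the forms shown are the ones eventually committed in
PostgreSQL 10):

    ALTER SYSTEM SET synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'; -- priority
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';   -- quorum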

I think that Fujii-san has also some doc changes in his box. For anybody
picking up this patch next, it would be good to incorporate the things
I have noticed here.
-- 
Michael




Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Etsuro Fujita

On 2016/12/16 1:39, Tom Lane wrote:
> Etsuro Fujita writes:
>> On 2016/12/13 23:13, Ashutosh Bapat wrote:
>>> A possible short-term fix may be this: instead of picking any random
>>> path to stick into fdw_outerpath, we choose a path which covers the
>>> pathkeys of ForeignPath.
>
>> Seems reasonable.
>
> No, because GetExistingLocalJoinPath is called once per relation not once
> per path.  Even if we were willing to eat the overhead of calling it once
> per path, we'd have to give up considering foreign paths with sort orders
> that there wasn't any cheap way to produce locally.

Hmm, I agree that giving that up might result in a bad plan.

As I said upthread, an alternative I am thinking of is (1) to create an
equivalent nestloop join path using the inner/outer paths of a foreign join
path (except when that join path implements a full join, in which case a
merge/hash join path is used), (2) store it in fdw_outerpath as-is, and
(3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer
subplan created from the fdw_outerpath.  What do you think about that?


Best regards,
Etsuro Fujita






Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-15 Thread Michael Paquier


On Thu, Dec 15, 2016 at 9:48 PM, Heikki Linnakangas  wrote:
> The only way to distinguish, is to know about every verifier kind there is,
> and check whether rolpassword looks valid as anything else than a plaintext
> password. And we already got tripped by a bug-of-omission on that once. If
> we add more verifier formats in the future, it's bound to happen again.
> Let's nip that source of bugs in the bud. Attached is a patch to implement
> what I have in mind.

OK, I had a look at the patch proposed.

-if (!pg_md5_encrypt(username, username, namelen, encrypted))
-elog(ERROR, "password encryption failed");
-if (strcmp(password, encrypted) == 0)
-ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("password must not contain user name")));

This patch removes the only check that passwordcheck has ever been able to
perform on MD5 hashes. It may be fine to remove it, but it is a good example
of what can be done with MD5 hashes, however limited. So it seems to me that
this check should also run pg_md5_encrypt on the username and compare the
result with the MD5 hash given by the caller. The new code is careful about
handling a plain password being passed down, but it is possible to load MD5
hashes directly as well, via pg_dumpall for example.
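For reference, an MD5 verifier is just "md5" followed by md5(password ||
username), so the check in question boils down to a comparison like this
(hypothetical role "alice" whose password is literally her user name):

    SELECT 'md5' || md5('alice' || 'alice');
    -- if rolpassword equals this value, the password is the user name,
    -- which is what comparing against pg_md5_encrypt(username, username, ...)
    -- detects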

A simple ALTER USER role PASSWORD 'foo' causes a crash:
#0  0x004764d7 in heap_compute_data_size (tupleDesc=0x277f090, 
values=0x27504b8, isnull=0x2750550 "") at heaptuple.c:106
106 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
(gdb) bt
#0  0x004764d7 in heap_compute_data_size (tupleDesc=0x277f090, 
values=0x27504b8, isnull=0x2750550 "") at heaptuple.c:106
#1  0x004781e9 in heap_form_tuple (tupleDescriptor=0x277f090, 
values=0x27504b8, isnull=0x2750550 "") at heaptuple.c:736
#2  0x004784d0 in heap_modify_tuple (tuple=0x277adc8, 
tupleDesc=0x277f090, replValues=0x7fff1369d030, replIsnull=0x7fff1369d020 "", 
doReplace=0x7fff1369d010 "")
at heaptuple.c:833
#3  0x00673788 in AlterRole (stmt=0x27a4f78) at user.c:845
#4  0x0082aa49 in standard_ProcessUtility (parsetree=0x27a4f78, 
queryString=0x27a43e8 "alter role ioltas password 'toto';", 
context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, dest=0x27a5300, completionTag=0x7fff1369d5b0 "") at 
utility.c:711

+case PASSWORD_TYPE_PLAINTEXT:
+shadow_pass = &shadow_pass[strlen("plain:")];
+break;
It would be a good idea to have a generic routine able to get the plain
password value. In short, I think that we should reduce the number of
locations where the "plain:" prefix is hardcoded.

> Alternatively, you could argue that we should forbid storing passwords in
> plaintext altogether. I'm OK with that, too, if that's what people prefer.
> Then you cannot have a user that can log in with both MD5 and SCRAM
> authentication, but it's certainly more secure, and it's easier to document.

In the end this may prove to be a bad idea for some developers. In local
deployments, when working on an application with Postgres as the backend, it
is actually useful to have plain passwords. At least I found that useful in
some work I did many years ago.
-- 
Michael




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-15 Thread Josh Berkus
On 12/15/2016 12:54 PM, Tom Lane wrote:
> Magnus Hagander  writes:
>> On Thu, Dec 15, 2016 at 1:11 AM, Bruce Momjian  wrote:
>>> You are saying this is more massive than any other change we have made
>>> in the past?  In general, what needs to be documented?
> 
>> I don't necessarily think it's because it's more massive than any change we
>> have made before. I think it's more that this is something that we probably
>> should've had before, and just didn't.
> 
>> Right now we basically have a bulletpoint list of things that are new, with
>> a section about things that are incompatible.  Having an actual section
>> with more detailed descriptions of how to handle these changes would
>> definitely be a win.  It shouldn't *just* be for these changes of course, it
>> should be for any other changes that are large enough to benefit from more
>> than a oneliner about the fact that they've changed.
> 
> Yeah, it seems to me that where this is ending up is "we may need to
> write more in the compatibility entries than we have in the past".
> I don't see any problem with that, particularly if someone other than
> Bruce or me is volunteering to write it ;-)

I'm up for writing it (with help from feature owners), provided that I
don't have to spend time arguing that it's not too long, or that I
should put it somewhere different.  So can we settle the "where"
question first?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] [sqlsmith] Failed assertion in TS_phrase_execute

2016-12-15 Thread Tom Lane
I wrote:
>> but I notice that some normalization seems to be getting done by
>> tsqueryin:

>> regression=# select $$( 'sanct' & 'peter' ) <-> ( 'sanct' & 'peter' )$$::tsquery;
>>                                        tsquery
>> ----------------------------------------------------------------------------------------
>>  'sanct' <-> 'sanct' & 'peter' <-> 'sanct' & 'sanct' <-> 'peter' & 'peter' <-> 'peter'
>> (1 row)

> BTW, it seems like that normalization is wrong.  The transformed query
> should (and does) match the string "sanct sanct peter sanct sanct peter
> peter peter", since each of the <-> pairs has a match somewhere in there.
> But I would expect the original query to be specifying that a match occurs
> at exactly one place, which of course is unsatisfiable since 'sanct' and
> 'peter' can't match the same word.

After further thought, it seems like a correct transformation would be
to replace & underneath a PHRASE operator with <0>, ie

('a' & 'b') <-> ('c' & 'd')

becomes

('a' <0> 'b') <-> ('c' <0> 'd')

This would have the same effect of getting rid of non-PHRASE operators
underneath a PHRASE, and it would produce what seems to me much less
surprising results, ie you get a match only when both sides of the &
can match at the same place.  Comments?
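For concreteness, the distance-zero operator is already accepted by
tsqueryin, so the rewrite can be sketched directly (illustrative casts; note
that two distinct lexemes can share a position only when a dictionary emits
several lexemes for one input word):

    select $$'a' <0> 'b'$$::tsquery;   -- 'a' and 'b' at the same position
    select $$('a' <0> 'b') <-> ('c' <0> 'd')$$::tsquery;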

regards, tom lane




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-15 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Dec 15, 2016 at 1:11 AM, Bruce Momjian  wrote:
>> You are saying this is more massive than any other change we have made
>> in the past?  In general, what needs to be documented?

> I don't necessarily think it's because it's more massive than any change we
> have made before. I think it's more that this is something that we probably
> should've had before, and just didn't.

> Right now we basically have a bulletpoint list of things that are new, with
> a section about things that are incompatible.  Having an actual section
> with more detailed descriptions of how to handle these changes would
> definitely be a win.  It shouldn't *just* be for these changes of course, it
> should be for any other changes that are large enough to benefit from more
> than a oneliner about the fact that they've changed.

Yeah, it seems to me that where this is ending up is "we may need to
write more in the compatibility entries than we have in the past".
I don't see any problem with that, particularly if someone other than
Bruce or me is volunteering to write it ;-)

regards, tom lane




Re: [HACKERS] New design for FK-based join selectivity estimation

2016-12-15 Thread Tom Lane
I wrote:
> ronan.dunk...@dalibo.com writes:
>> If I understand it correctly and the above is right, I think we should ignore
>> SEMI or ANTI joins altogether when considering FKs, and keep the 
>> corresponding
>> restrictinfos for later processing since they are already special-cased later
>> on.

> That seems like an overreaction.  While the old code happens to get this
> example exactly right, eqjoinsel_semi is still full of assumptions and
> approximations, and it doesn't do very well at all if it lacks MCV lists
> for both sides.

> I'm inclined to think that what we want to have happen in this case is
> to estimate the fraction of outer rows having a match as equal to the
> selectivity of the inner query's WHERE clauses, ie the semijoin
> selectivity should be sizeof(inner result) divided by sizeof(inner
> relation).

After further study, I concluded that we can only easily estimate that
when the inner side of the SEMI or ANTI join is just the single referenced
table.  If the inner side is itself a join, it's not easy to determine
what fraction of the referenced table will survive the join clauses.

However, we can still be brighter than to just throw all the FK qual
clauses back into the pool: that would result in multiplying their
selectivity estimates together, which for a multi-column FK results in
exactly the drastic underestimation that 100340e2d intended to avoid.
What seems to make sense here is to take the minimum of the per-clause
selectivities, as we are doing for other outer-join cases.
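To make the difference concrete with made-up numbers: for a two-column FK
whose two equality clauses each have selectivity 0.001,

    multiplied:  0.001 * 0.001 = 0.000001   (the drastic underestimate)
    minimum:     min(0.001, 0.001) = 0.001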

Hence, I propose the attached patch.  This rearranges the existing code
slightly to avoid duplicating it.

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e42895d..415edad 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** get_foreign_key_join_selectivity(Planner
*** 4085,4090 
--- 4085,4091 
  	{
  		ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc);
  		bool		ref_is_outer;
+ 		bool		use_smallest_selectivity = false;
  		List	   *removedlist;
  		ListCell   *cell;
  		ListCell   *prev;
*** get_foreign_key_join_selectivity(Planner
*** 4205,4213 
  		 * be double-counting the null fraction, and (2) it's not very clear
  		 * how to combine null fractions for multiple referencing columns.
  		 *
! 		 * In the first branch of the logic below, null derating is done
! 		 * implicitly by relying on clause_selectivity(); in the other two
! 		 * paths, we do nothing for now about correcting for nulls.
  		 *
  		 * XXX another point here is that if either side of an FK constraint
  		 * is an inheritance parent, we estimate as though the constraint
--- 4206,4214 
  		 * be double-counting the null fraction, and (2) it's not very clear
  		 * how to combine null fractions for multiple referencing columns.
  		 *
! 		 * In the use_smallest_selectivity code below, null derating is done
! 		 * implicitly by relying on clause_selectivity(); in the other cases,
! 		 * we do nothing for now about correcting for nulls.
  		 *
  		 * XXX another point here is that if either side of an FK constraint
  		 * is an inheritance parent, we estimate as though the constraint
*** get_foreign_key_join_selectivity(Planner
*** 4230,4257 
  			 * the smallest per-column selectivity, instead.  (This should
  			 * correspond to the FK column with the most nulls.)
  			 */
! 			Selectivity thisfksel = 1.0;
! 
! 			foreach(cell, removedlist)
! 			{
! RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
! Selectivity csel;
! 
! csel = clause_selectivity(root, (Node *) rinfo,
! 		  0, jointype, sjinfo);
! thisfksel = Min(thisfksel, csel);
! 			}
! 			fkselec *= thisfksel;
  		}
  		else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  		{
  			/*
  			 * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
! 			 * fraction of LHS rows that have matches.  If the referenced
! 			 * table is on the inner side, that means the selectivity is 1.0
! 			 * (modulo nulls, which we're ignoring for now).  We already
! 			 * covered the other case, so no work here.
  			 */
  		}
  		else
  		{
--- 4231,4271 
  			 * the smallest per-column selectivity, instead.  (This should
  			 * correspond to the FK column with the most nulls.)
  			 */
! 			use_smallest_selectivity = true;
  		}
  		else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  		{
  			/*
  			 * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
! 			 * fraction of LHS rows that have matches.  The referenced table
! 			 * is on the inner side (we already handled the other case above),
! 			 * so the FK implies that every LHS row has a match *in the
! 			 * referenced table*.  But any restriction or join clauses below
! 			 * here will reduce the number of matches.
  			 */
+ 			if (bms_membership(inner_relids) == 

Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 1:11 AM, Bruce Momjian  wrote:

> On Wed, Dec 14, 2016 at 02:29:07PM -0800, Josh Berkus wrote:
> > On 12/14/2016 08:06 AM, Bruce Momjian wrote:
> > > On Fri, Dec  9, 2016 at 09:46:44AM +0900, Michael Paquier wrote:
> >  My own take on it is that the release notes are already a massive
> >  amount of work, and putting duplicative material in a bunch of other
> >  places isn't going to make things better, it'll just increase the
> >  maintenance burden.
> > >>>
> > >>> This would mean adding literally pages of material to the release
> notes.
> > >>> In the past, folks have been very negative on anything which would
> make
> > >>> the release notes longer.  Are you sure?
> > >>
> > >> As that's a per-version information, that seems adapted to me. There
> > >> could be as well in the release notes a link to the portion of the
> > >> docs holding this manual. Definitely this should be self-contained in
> > >> the docs, and not mention the wiki. My 2c.
> > >
> > > Yes, that is the usual approach.
> > >
> >
> > So where in the docs should these go, then?  We don't (currently) have a
> > place for this kind of doc.  Appendices?
>
> You are saying this is more massive than any other change we have made
> in the past?  In general, what needs to be documented?
>
>
I don't necessarily think it's because it's more massive than any change we
have made before. I think it's more that this is something that we probably
should've had before, and just didn't.

Right now we basically have a bulletpoint list of things that are new, with
a section about things that are incompatible.  Having an actual section
with more detailed descriptions of how to handle these changes would
definitely be a win.  It shouldn't *just* be for these changes of course, it
should be for any other changes that are large enough to benefit from more
than a oneliner about the fact that they've changed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-15 Thread Peter Eisentraut
On 12/15/16 1:27 PM, Magnus Hagander wrote:
> On Thu, Dec 15, 2016 at 7:23 PM, Peter Eisentraut wrote:
> 
> On 11/8/16 12:45 PM, Magnus Hagander wrote:
> > Per some discussions with a number of different people at pgconfeu, here
> > is a patch that changes the default mode of pg_basebackup to be
> > streaming the wal, as this is what most users would want -- and those
> > that don't want it have to make other changes as well. Doing the "most
> > safe" thing by default is a good idea.
> >
> > The new option "-x none" is provided to turn this off and get the
> > previous behavior back.
> 
> I would have expected that the "stream" method would become the default.
>  Just a short while ago it was proposed to remove the "fetch" method
> altogether, and it was only kept because of some niche use cases.  I
> don't think "fetch" is the most safe method, because it requires
> wal_keep_segments to be configured.
> 
> 
> Eh. Yes. That's exactly what this patch does, is it not?

Yes, I misread the patch.  Works for me, then.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 7:36 PM, Joshua D. Drake wrote:

> On 12/15/2016 10:23 AM, Peter Eisentraut wrote:
>
>> On 11/8/16 12:45 PM, Magnus Hagander wrote:
>>
>>> Per some discussions with a number of different people at pgconfeu, here
>>> is a patch that changes the default mode of pg_basebackup to be
>>> streaming the wal, as this is what most users would want -- and those
>>> that don't want it have to make other changes as well. Doing the "most
>>> safe" thing by default is a good idea.
>>>
>>> The new option "-x none" is provided to turn this off and get the
>>> previous behavior back.
>>>
>>
>> I would have expected that the "stream" method would become the default.
>>  Just a short while ago it was proposed to remove the "fetch" method
>> altogether, and it was only kept because of some niche use cases.  I
>> don't think "fetch" is the most safe method, because it requires
>> wal_keep_segments to be configured.
>>
>
> IMO, if you are using fetch you should just be using archive_command. Let's
> rip it out of pg_basebackup or at least deprecate it.
>

Fetch runs fine without ssh keys. Especially important on platforms without
native ssh. It's also pull rather than push. And it fetches just the xlog
you need, and not *all* of it, for one-off backups. I think it definitely
still has its uses, but should of course not be the default.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-15 Thread Joshua D. Drake

On 12/15/2016 10:23 AM, Peter Eisentraut wrote:

On 11/8/16 12:45 PM, Magnus Hagander wrote:

Per some discussions with a number of different people at pgconfeu, here
is a patch that changes the default mode of pg_basebackup to be
streaming the wal, as this is what most users would want -- and those
that don't want it have to make other changes as well. Doing the "most
safe" thing by default is a good idea.

The new option "-x none" is provided to turn this off and get the
previous behavior back.


I would have expected that the "stream" method would become the default.
 Just a short while ago it was proposed to remove the "fetch" method
altogether, and it was only kept because of some niche use cases.  I
don't think "fetch" is the most safe method, because it requires
wal_keep_segments to be configured.


IMO, if you are using fetch you should just be using archive_command. Let's
rip it out of pg_basebackup or at least deprecate it.


jD






--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 7:23 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> On 11/8/16 12:45 PM, Magnus Hagander wrote:
> > Per some discussions with a number of different people at pgconfeu, here
> > is a patch that changes the default mode of pg_basebackup to be
> > streaming the wal, as this is what most users would want -- and those
> > that don't want it have to make other changes as well. Doing the "most
> > safe" thing by default is a good idea.
> >
> > The new option "-x none" is provided to turn this off and get the
> > previous behavior back.
>
> I would have expected that the "stream" method would become the default.
>  Just a short while ago it was proposed to remove the "fetch" method
> altogether, and it was only kept because of some niche use cases.  I
> don't think "fetch" is the most safe method, because it requires
> wal_keep_segments to be configured.
>
>
Eh. Yes. That's exactly what this patch does, is it not?

I'd say the main reason fetch was kept around was to support tar file mode,
which previously did not support streaming. I don't really see any other
use case where it was better than stream.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-15 Thread Peter Eisentraut
On 11/8/16 12:45 PM, Magnus Hagander wrote:
> Per some discussions with a number of different people at pgconfeu, here
> is a patch that changes the default mode of pg_basebackup to be
> streaming the wal, as this is what most users would want -- and those
> that don't want it have to make other changes as well. Doing the "most
> safe" thing by default is a good idea.
> 
> The new option "-x none" is provided to turn this off and get the
> previous behavior back.

I would have expected that the "stream" method would become the default.
 Just a short while ago it was proposed to remove the "fetch" method
altogether, and it was only kept because of some niche use cases.  I
don't think "fetch" is the most safe method, because it requires
wal_keep_segments to be configured.
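For illustration, making "fetch" safe means guessing at a server setting
along these lines (a sketch; the right value is entirely workload-dependent):

    ALTER SYSTEM SET wal_keep_segments = 64;  -- retain at least 64 segments
    SELECT pg_reload_conf();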

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Faster Expression Processing and Tuple Deforming (including JIT)

2016-12-15 Thread Bruce Momjian
On Tue, Dec  6, 2016 at 11:10:59AM -0800, Andres Freund wrote:
> > I concur with your feeling that hand-rolled JIT is right out.  But
> > I'm not sure that whatever performance gain we might get in this
> > direction is worth the costs.
> 
> Well, I'm not impartial, but I don't think we do our users a service by
> leaving significant speedups untackled, and after spending a *LOT* of
> time on this, I don't see much other choice than JITing. Note that
> nearly everything performance sensitive is moving towards doing JITing
> in some form or another.

Agreed, we don't really have a choice.  After all the optimizations we
have done to so many subsystems, our executor is relatively slow and is
a major drag on our system, specifically for long-running queries.  The
base problem with the executor is the state machines at so many levels,
and we really can't optimize those while keeping a reasonable maintenance
burden.

This is where JIT and LLVM help.  I outlined two external projects that
were researching this in this blog entry:

http://momjian.us/main/blogs/pgblog/2016.html#April_1_2016

I am excited to now be seeing WIP code.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Re: [COMMITTERS] pgsql: Build HTML documentation using XSLT stylesheets by default

2016-12-15 Thread Tom Lane
Alexander Law  writes:
> Hello Alvaro,
> It's caused by the condition
> ...
> in the simple.xlink template 
> (docbook/stylesheet/docbook-xsl/xhtml/inline.xsl). (This test is executed
> for each xlink, ~9 times.)
> Yes, it's inefficient but it doesn't affect build time (for me).
> You can try to apply the attached patch and measure the time with it.

For me, that reduces the "make html" time from 1m44s to 1m43s.

regards, tom lane




Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/12/13 23:13, Ashutosh Bapat wrote:
>> On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane  wrote:
>>> One way that we could make things better is to rely on the knowledge
>>> that EPQ isn't asked to evaluate joins for more than one row per input
>>> relation, and therefore using merge or hash join technology is really
>>> overkill.  We could make a tree of simple nestloop joins, which aren't
>>> going to care about sort order, if we could identify the correct join
>>> clauses to apply.

>> I am not able to understand how this strategy of applying all join
>> clauses on top of cross product would work if there are OUTER joins
>> involved. Wouldn't nullable sides change the result?

> I'm not able to understand that, either.

Yeah, you'd have to reproduce the outer join structure if any.

>> A possible short-term fix may be this: instead of picking any random
>> path to stick into fdw_outerpath, we choose a path which covers the
>> pathkeys of ForeignPath.

> Seems reasonable.

No, because GetExistingLocalJoinPath is called once per relation not once
per path.  Even if we were willing to eat the overhead of calling it once
per path, we'd have to give up considering foreign paths with sort orders
that there wasn't any cheap way to produce locally.

regards, tom lane




Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-15 Thread Kevin Grittner
On Thu, Dec 15, 2016 at 9:53 AM, Ian Jackson  wrote:

> I don't think "set max_pred_locks_per_transaction generously" is a
> practical thing to write in the documentation, because the application
> programmer, or admin, has no sensible way to calculate what a
> sufficiently generous value is.

ok

> You seem to be implying that code relying on the summarised data might
> make over-optimistic decisions.  That seems dangerous to me, but (with
> my very dim view of database innards) I can't immediately see how to
> demonstrate that it must in any case be excluded.

No, with any of these conditions, the information on which a
decision to generate a serialization failure is summarized into
something less granular, and in all cases we turn any "in doubt"
situations into serialization failures; that is, you can get false
positives (a serialization failure exception where complete
information could have avoided it) but not false negatives (a
serialization anomaly appearing in the database or query results
from a transaction which commits).  Based on that alone, I think it
is fair to say that it does not weaken guarantees about
serialization failures for read-only transactions not being
possible on commit unless the initial exception is suppressed in a
subtransaction, nor that anomalous results are not possible in a
read-only transaction.

The granularity promotion of predicate
locks could not affect the guarantees about never seeing a
serialization failure on the first statement that reads data in a
read-only transaction, but I would need to take a very close look
at how the SLRU summarization of committed transactions might
affect that one -- we lose some of the detail about the relative
order of the commits and snapshot acquisitions, and that might be
enough to allow a false positive on that first statement.  That
would require more study than I can give it this month.

I do remember that Dan ran some saturation workloads to stress this
feature for days and weeks at a time pushing things to the point of
using the SLRU summarization, and I remember thinking it odd that
certain tests generated zero errors on the read-only transactions,
which I'm pretty sure were single-statement transactions.  It was
only during this week's discussion that I had the epiphany about
that only being possible if the read-only transaction had multiple
statements; but the fact that such long saturation runs with SLRU
summarization showed no errors on read-only transactions supports
the idea that such summarization doesn't compromise that guarantee.
Unfortunately, it falls short of proof.  :-(
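For what it's worth, a read-only transaction that needs a hard guarantee can
already ask for one up front, at the price of possibly waiting for a safe
snapshot (a sketch; "accounts" is a hypothetical table):

    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
    -- may block until a safe snapshot is available; after that this
    -- transaction cannot be cancelled by a serialization failure
    SELECT count(*) FROM accounts;
    COMMIT;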

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Indexes

2016-12-15 Thread Amit Kapila
On Wed, Dec 14, 2016 at 10:47 PM, Robert Haas  wrote:
> On Wed, Dec 14, 2016 at 4:27 AM, Amit Kapila  wrote:
>>
>> Yeah, it will fix the problem in hashbucketcleanup, but there are two
>> other problems that need to be fixed for which I can send a separate
>> patch.  By the way, as mentioned to you earlier that WAL patch already
>> takes care of removing _hash_wrtbuf and simplified the logic wherever
>> possible without introducing the logic of MarkBufferDirty multiple
>> times under one lock.  However, if you want to proceed with the
>> current patch, then I can send you separate patches for the remaining
>> problems as addressed in bug fix patch.
>
> That sounds good.
>

Attached are the two patches on top of remove-hash-wrtbuf.  Patch
fix_dirty_marking_v1.patch allows marking the buffer dirty in one of
the corner cases in _hash_freeovflpage() and avoids marking it dirty
without need in _hash_squeezebucket().  I think this can be combined
with the remove-hash-wrtbuf patch. fix_lock_chaining_v1.patch fixes the
chaining behavior (lock the next overflow bucket page before releasing the
previous bucket page) that was broken in _hash_freeovflpage().  These
patches can be applied in series: first remove-hash-wrtbuf, then
fix_dirty_marking_v1.patch and then fix_lock_chaining_v1.patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Slow I/O can break throttling of base backup

2016-12-15 Thread Antonin Houska
It seems to be my bug. I'll check tomorrow.

Magnus Hagander  wrote:

> Running pg_basebackup with a throttling of say 10M runs it into the risk of
> the I/O on the server actually being slower than pg_basebackup (I have
> reproduced similar issues on fake-slow disks with lower rate limits).
> 
> What happens in this case in basebackup.c is that the value for "sleep"
> comes out negative. This means we don't sleep, which is fine.
> 
> However, that also means we don't set throttle_last.
> 
> That means that the next time we come around to throttle(), the value for
> "elapsed" is even bigger, which results in an even bigger negative number,
> and we're "stuck".
> 
> AFAICT this means that a temporary I/O spike that makes reading of the disk
> too slow can leave us in a situation where we never recover, and thus never
> end up sleeping ever again, effectively turning off rate limiting.
> 
> I wonder if the else if (sleep > 0) at the bottom of throttle() should just
> be a simple else and always run, resetting throttle_last?
> 
> -- 
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/
> 

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-15 Thread Ian Jackson
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry 
on constraint violation [and 2 more messages] [and 1 more messages]"):
> On Thu, Dec 15, 2016 at 6:09 AM, Ian Jackson wrote:
> > [...]  Are there other reasons,
> > besides previously suppressed serialisation failures, why commit of a
> > transaction that did only reads[1] might fail ?
> 
> I'm pretty confident that if you're not using prepared transactions
> the answer is "no".  [...] I fear that [for now], if "pre-crash"
> prepared transactions are still open, some of the deductions above
> may not hold.

I think it is reasonable to write in the documentation "if you use
prepared transactions, even read only serialisable transactions might
throw a serialisation failure during commit, and they might do so
after returning data which is not consistent with any global
serialisation".

Prepared transactions are a special purpose feature intended for use
by external transaction management software which I hope could cope
with a requirement to not trust data from a read only transaction
until it had been committed.  (Also, frankly, the promise that a
prepared transaction is can be committed successfully with "very high
probability" is not sufficiently precise to be of use when building
robust software at the next layer up.)

> One other situation in which I'm not entirely sure, and it would
> take me some time to review code to be sure, is if
> max_pred_locks_per_transaction is not set high enough to
> accommodate tracking all serializable transactions in allocated RAM
> (recognizing that they must often be tracked after commit, until
> overlapping serializable transactions commit), we have a mechanism
> to summarize some of the committed transactions and spill them to
> disk (using an internal SLRU module).  The summarized data might
> not be able to determine all of the above as precisely as the
> "normal" data tracked in RAM.  To avoid this, be generous when
> setting max_pred_locks_per_transaction; not only will it avoid this
> summarization, but it will reduce the amount of summarization of
> multiple page locks in the predicate locking system to relation
> locks.  Coarser locks increase the "false positive" rate of
> serialization failures, reducing performance.

I don't think "set max_pred_locks_per_transaction generously" is a
practical thing to write in the documentation, because the application
programmer, or admin, has no sensible way to calculate what a
sufficiently generous value is.

You seem to be implying that code relying on the summarised data might
make over-optimistic decisions.  That seems dangerous to me, but (with
my very dim view of database innards) I can't immediately see how to
demonstrate that it must in any case be excluded.

But, I think this can only be a problem (that is, it can only cause a
return of un-serialisable results within such a transaction) if, after
such a spill, COMMIT would recalculate the proper answers, in full,
and thus be able to belatedly report the serialisation failure.  Is
that the case ?

> > If so presumably it always throws a serialisation failure at that
> > point.  I think that is then sufficient.  There is no need to tell the
> > application programmer they have to commit even transactions which
> > only read.
> 
> Well, if they don't explicitly start a transaction there is no need
> to explicitly commit it, period.  [...]

Err, yes, I meant multi-statement transactions.  (Or alternatively by
"have to commit" I meant to include the implicit commit of an implicit
transaction.)

> If you can put together a patch to improve the documentation, that
> is always welcome!

Thanks.  I hope I will be able to do that.  Right now I am still
trying to figure out what guarantees the application programmer can be
offered.

Regards,
Ian.




Re: [HACKERS] Proposal : composite type not null constraints

2016-12-15 Thread Tom Lane
Wesley Massuda  writes:
> I would like to propose extending composite types with constraints.

I'm not convinced we should go there, *particularly* not for not-null
constraints.  Data-type-related not-null constraints are broken by design,
because they are fundamentally inconsistent with outer joins.  Here's an
example:

regression=# create domain nnint as int check (value is not null);
CREATE DOMAIN
regression=# create table tt (id int, nn nnint);
CREATE TABLE
regression=# insert into tt values (1,1), (2,1);
INSERT 0 2
regression=# select * from tt a left join tt b on a.id = b.nn;
 id | nn | id | nn 
----+----+----+----
  1 |  1 |  1 |  1
  1 |  1 |  2 |  1
  2 |  1 |    |   
(3 rows)

We have here a column that claims to be of type nnint but contains
nulls.

The only really good solution to this problem for domains, IMO,
is to consider that the type of the join output column is changed
to the domain's base type, so that there's no relevant constraint
for it to violate.  We don't do that at present, but if anyone got
really bent out of shape about this behavior, I'd tell them to
submit a patch that does that.

However, if we allow constraints on elements of a composite type,
there's no similar "out" available to describe values that look
like the type but don't satisfy its constraints.  That's a hard
place that I don't want to get wedged into.

There are other implementation problems that you'd fall foul of
as well, for instance that plpgsql lacks any way to initialize
rowtype variables to non-null.  Those are probably surmountable
with enough work, but the outer join problem is built into SQL.

regards, tom lane




Re: [HACKERS] Missing newlines in error messages

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 4:38 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > While poking around the pg_basebackup code, I noticed there are a lot of
> > error messages missing the \n at the end. Most of them are very
> > unlikely to happen, but in general I believe all error messages we print
> > to stderr from those binaries should have a newline at the end.
>
> > Is that wrong? :)
>
> I think you missed the fact that PQerrorMessage's result will already
> end with a newline.  So most (not all) of these look OK as they stand.
>

Ha, I knew there was something. But somehow I forgot about that one. I was
wondering why nobody had found it before :)

And yeah, that leaves just the one error that I actually bumped into.  I'll
go ahead and apply that one alone then.

Pah. Less coding on jetlag.

Thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Declarative partitioning - another take

2016-12-15 Thread Dmitry Ivanov

Hi everyone,

Looks like "sql_inheritance" GUC is affecting partitioned tables:

explain (costs off) select * from test;
          QUERY PLAN          
------------------------------
 Append
   ->  Seq Scan on test
   ->  Seq Scan on test_1
   ->  Seq Scan on test_2
   ->  Seq Scan on test_1_1
   ->  Seq Scan on test_1_2
   ->  Seq Scan on test_1_1_1
   ->  Seq Scan on test_1_2_1
(8 rows)


set sql_inheritance = off;


explain (costs off) select * from test;
     QUERY PLAN     
--------------------
 Seq Scan on test
(1 row)


I might be wrong, but IMO this should not happen. Queries involving UPDATE,
DELETE, etc. on partitioned tables are basically broken. Moreover, there's no
point in performing such operations on a parent table that's supposed to be
empty at all times.


I've come up with a patch which fixes this behavior for UPDATE, DELETE, 
TRUNCATE and also in transformTableEntry(). It might be hacky, but it gives 
an idea.
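What the patch aims for can be sketched like this (a hypothetical session,
using the partitioned table "test" from above):

    SET sql_inheritance = off;
    DELETE FROM test;  -- with the patch, still recurses into test_1,
                       -- test_2, ... instead of touching only the
                       -- (empty) parent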


I didn't touch RenameConstraint() and renameatt() since this would break the
ALTER TABLE ONLY command.



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7a574dc..67e118e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1183,7 +1183,7 @@ ExecuteTruncate(TruncateStmt *stmt)
 	{
 		RangeVar   *rv = lfirst(cell);
 		Relation	rel;
-		bool		recurse = interpretInhOption(rv->inhOpt);
+		bool		recurse;
 		Oid			myrelid;
 
 		rel = heap_openrv(rv, AccessExclusiveLock);
@@ -1198,6 +1198,12 @@ ExecuteTruncate(TruncateStmt *stmt)
 		rels = lappend(rels, rel);
 		relids = lappend_oid(relids, myrelid);
 
+		/* Use interpretInhOption() unless it's a partitioned table */
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+			recurse = interpretInhOption(rv->inhOpt);
+		else
+			recurse = true;
+
 		if (recurse)
 		{
 			ListCell   *child;
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 5e65fe7..a3772f7 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -367,6 +367,7 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
 	Query	   *qry = makeNode(Query);
 	ParseNamespaceItem *nsitem;
 	Node	   *qual;
+	RangeTblEntry *rte;
 
 	qry->commandType = CMD_DELETE;
 
@@ -384,6 +385,11 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
 		 true,
 		 ACL_DELETE);
 
+	/* Set "inh" if table is partitioned */
+	rte = rt_fetch(qry->resultRelation, pstate->p_rtable);
+	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+		rte->inh = true;
+
 	/* grab the namespace item made by setTargetTable */
 	nsitem = (ParseNamespaceItem *) llast(pstate->p_namespace);
 
@@ -2164,6 +2170,7 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 	Query	   *qry = makeNode(Query);
 	ParseNamespaceItem *nsitem;
 	Node	   *qual;
+	RangeTblEntry *rte;
 
 	qry->commandType = CMD_UPDATE;
 	pstate->p_is_insert = false;
@@ -2181,6 +2188,11 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 		 true,
 		 ACL_UPDATE);
 
+	/* Set "inh" if table is partitioned */
+	rte = rt_fetch(qry->resultRelation, pstate->p_rtable);
+	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+		rte->inh = true;
+
 	/* grab the namespace item made by setTargetTable */
 	nsitem = (ParseNamespaceItem *) llast(pstate->p_namespace);
 
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 751de4b..215ec73 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -439,6 +439,10 @@ transformTableEntry(ParseState *pstate, RangeVar *r)
 	rte = addRangeTableEntry(pstate, r, r->alias,
 			 interpretInhOption(r->inhOpt), true);
 
+	/* Set "inh" if table is partitioned */
+	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+		rte->inh = true;
+
 	return rte;
 }
 



Re: [HACKERS] Missing newlines in error messages

2016-12-15 Thread Tom Lane
Magnus Hagander  writes:
> While poking around the pg_basebackup code, I noticed there are a lot of
> error messages missing the \n at the end. Most of them are very
> unlikely to happen, but in general I believe all error messages we print to
> stderr from those binaries should have a newline at the end.

> Is that wrong? :)

I think you missed the fact that PQerrorMessage's result will already
end with a newline.  So most (not all) of these look OK as they stand.

regards, tom lane




[HACKERS] Proposal : composite type not null constraints

2016-12-15 Thread Wesley Massuda
I would like to propose extending composite types with constraints.
Currently there is a preliminary patch for not null constraints.

===
Changes:
  - Use the parser from CREATE TABLE to get the constraints
    ('OptTableElementList' instead of 'OptTableFuncElementList').
  - Add a new transformation, transformCompositeTypeStmt, similar to
    transformCreateStmt, to add constraint information to the statement.
  - Implement a recursive not-null check when a type is composite.
==
Features enabled:
  - Not null accepted by the parser for
      create type tyname AS ( a type NOT NULL )
  - Checking for nulls in rowtypes

===
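For illustration, the proposal would make the following session possible
(hypothetical type and table names):

    CREATE TYPE pair AS (a int NOT NULL, b int);
    CREATE TABLE t (p pair);
    INSERT INTO t VALUES (ROW(NULL, 1));  -- rejected by the recursive
                                          -- not-null check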



Wesley S. Massuda
From 5ef7ec081528f8405d34a4fb2155e74fdcd95094 Mon Sep 17 00:00:00 2001
From: Wesley Sales Massuda 
Date: Thu, 15 Dec 2016 12:29:42 -0200
Subject: [PATCH] add support for not null constraints for composite types add
 not null insert check for row_types

---
 src/backend/executor/execMain.c   |  78 +++
 src/backend/parser/gram.y |   2 +-
 src/backend/parser/parse_utilcmd.c| 101 ++
 src/backend/tcop/utility.c|   2 +-
 src/test/regress/expected/create_type.out |   2 +-
 src/test/regress/sql/create_type.sql  |   2 +-
 6 files changed, 183 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d43a204..b452fd6 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1756,6 +1756,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 	Assert(constr || resultRelInfo->ri_PartitionCheck);
 
 	if (constr && constr->has_not_null)
+
 	{
 		int			natts = tupdesc->natts;
 		int			attrChk;
@@ -1783,6 +1784,13 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 		 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
 		 errtablecol(rel, attrChk)));
 			}
+  Oid typoid = tupdesc->attrs[attrChk -1]->atttypid;
+  bool isrow = type_is_rowtype(typoid);
+  if(isrow){
+bool isnull;
+Datum dat = slot_getattr(slot, attrChk, &isnull);
+composite_isnull(tupdesc,resultRelInfo ,slot,attrChk -1,dat,estate);
+  }
 		}
 	}
 
@@ -1831,7 +1839,77 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 		  val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));
 	}
 }
+void
+composite_isnull(TupleDesc topdesc, ResultRelInfo *resultRelInfo, TupleTableSlot *slot, int attnum, Datum composite, EState *estate)
+{
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+	HeapTupleHeader td;
+	Oid			tupType;
+	int32		tupTypmod;
+	TupleDesc	tupdesc;
+	HeapTupleData tmptup,
+			   *tuple;
+	int			i;
+
+	Bitmapset  *modifiedCols;
+	Bitmapset  *insertedCols;
+	Bitmapset  *updatedCols;
+
+	td = DatumGetHeapTupleHeader(composite);
+
+	/* Extract rowtype info and find a tupdesc */
+	tupType = HeapTupleHeaderGetTypeId(td);
+	tupTypmod = HeapTupleHeaderGetTypMod(td);
+	tupdesc = lookup_rowtype_tupdesc_copy(tupType, tupTypmod);
+
+	/* Build a temporary HeapTuple control structure */
+	tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
+	tmptup.t_data = td;
+	tuple = &tmptup;
+
+
+	for (i = 0; i < tupdesc->natts; i++)
+	{
+		Datum		val;
+		bool		isnull;
+		char	   *attname;
+		Oid			outfuncoid;
+
+	if (tupdesc->attrs[i]->attisdropped)
+			continue;
+		if (tupdesc->attrs[i]->attnotnull) {
+			Datum att = heap_getattr(tuple, i + 1, tupdesc, &isnull);
+			if (isnull) {
+char	   *val_desc;
+insertedCols = GetInsertedColumns(resultRelInfo, estate);
+updatedCols = GetUpdatedColumns(resultRelInfo, estate);
+modifiedCols = bms_union(insertedCols, updatedCols);
+val_desc = ExecBuildSlotValueDescription(RelationGetRelid(rel),
+		 slot,
+		 topdesc,
+		 modifiedCols,
+		 64);
 
+ereport(ERROR,
+		(errcode(ERRCODE_NOT_NULL_VIOLATION),
+		 errmsg("null value in column \"%s\" violates not-null constraint",
+			  NameStr(tupdesc->attrs[i]->attname)),
+		 val_desc ? errdetail("Failing row contains %s.", val_desc) : 0,
+		 errtablecol(rel, attnum +1 )));
+
+}
+  
+			Oid typoid = tupdesc->attrs[i]->atttypid;
+			bool isrow = type_is_rowtype(typoid);
+			if (isrow) {
+				composite_isnull(topdesc, resultRelInfo, slot, attnum, att, estate);
+			}
+  }
+
+	}
+
+	ReleaseTupleDesc(tupdesc);
+}
 /*
  * ExecWithCheckOptions -- check that tuple satisfies any WITH CHECK OPTIONs
  * of the specified kind.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2ed7b52..2e873bb 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5504,7 +5504,7 @@ DefineStmt:
 	n->definition = NIL;
 	$$ = (Node *)n;
 }
-			| CREATE TYPE_P any_name AS '(' OptTableFuncElementList ')'
+			| CREATE TYPE_P any_name AS '(' OptTableElementList ')'
 {
 	CompositeTypeStmt *n = makeNode(CompositeTypeStmt);
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index cc6a961..d5624f0 100644
--- 

Re: [HACKERS] Missing newlines in error messages

2016-12-15 Thread Magnus Hagander
(oops, accidental send button press)


While poking around the pg_basebackup code, I noticed that a lot of
error messages are missing the \n at the end. Most of them are very
unlikely to happen, but in general I believe all error messages we print to
stderr from those binaries should have a newline at the end.

Is that wrong? :)

Second -- if we were to backpatch something like the attached patch, is
that going to cause issues for translators?

On Thu, Dec 15, 2016 at 4:33 PM, Magnus Hagander 
wrote:

> While poking around the pg_basebackup code, I noticed there are a lot of
> error messa
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
>



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e2875df..f0f7126 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -192,7 +192,7 @@ cleanup_directories_atexit(void)
 
 	if (made_tablespace_dirs || found_tablespace_dirs)
 		fprintf(stderr,
-_("%s: changes to tablespace directories will not be undone"),
+_("%s: changes to tablespace directories will not be undone\n"),
 progname);
 }
 
@@ -1027,7 +1027,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COPY_OUT)
 	{
-		fprintf(stderr, _("%s: could not get COPY data stream: %s"),
+		fprintf(stderr, _("%s: could not get COPY data stream: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1108,7 +1108,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 		}
 		else if (r == -2)
 		{
-			fprintf(stderr, _("%s: could not read COPY data: %s"),
+			fprintf(stderr, _("%s: could not read COPY data: %s\n"),
 	progname, PQerrorMessage(conn));
 			disconnect_and_exit(1);
 		}
@@ -1295,7 +1295,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COPY_OUT)
 	{
-		fprintf(stderr, _("%s: could not get COPY data stream: %s"),
+		fprintf(stderr, _("%s: could not get COPY data stream: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1324,7 +1324,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 		}
 		else if (r == -2)
 		{
-			fprintf(stderr, _("%s: could not read COPY data: %s"),
+			fprintf(stderr, _("%s: could not read COPY data: %s\n"),
 	progname, PQerrorMessage(conn));
 			disconnect_and_exit(1);
 		}
@@ -1736,7 +1736,7 @@ BaseBackup(void)
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
-		fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+		fprintf(stderr, _("%s: could not send replication command \"%s\": %s\n"),
 progname, "BASE_BACKUP", PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1747,7 +1747,7 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not initiate base backup: %s"),
+		fprintf(stderr, _("%s: could not initiate base backup: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1783,7 +1783,7 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
-		fprintf(stderr, _("%s: could not get backup header: %s"),
+		fprintf(stderr, _("%s: could not get backup header: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1864,7 +1864,7 @@ BaseBackup(void)
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
 	{
 		fprintf(stderr,
-		 _("%s: could not get transaction log end position from server: %s"),
+		 _("%s: could not get transaction log end position from server: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
@@ -1883,7 +1883,7 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
+		fprintf(stderr, _("%s: final receive failed: %s\n"),
 progname, PQerrorMessage(conn));
 		disconnect_and_exit(1);
 	}
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index cb5f989..3b7e90a 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -151,7 +151,7 @@ sendFeedback(PGconn *conn, int64 now, bool force, bool replyRequested)
 
 	if (PQputCopyData(conn, replybuf, len) <= 0 || PQflush(conn))
 	{
-		fprintf(stderr, _("%s: could not send feedback packet: %s"),
+		fprintf(stderr, _("%s: could not send feedback packet: %s\n"),
 progname, PQerrorMessage(conn));
 		return false;
 	}
@@ -261,7 +261,7 @@ StreamLogicalLog(void)
 	res = PQexec(conn, query->data);
 	if (PQresultStatus(res) != PGRES_COPY_BOTH)
 	{
-		fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
+		fprintf(stderr, _("%s: could 

[HACKERS] Missing newlines in error messages

2016-12-15 Thread Magnus Hagander
While poking around the pg_basebackup code, I noticed there are a lot of
error messa

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Slow I/O can break throttling of base backup

2016-12-15 Thread Magnus Hagander
Running pg_basebackup with a throttle of, say, 10M runs the risk of
the I/O on the server actually being slower than pg_basebackup (I have
reproduced similar issues on fake-slow disks with lower rate limits).

What happens in this case in basebackup.c is that the value for "sleep"
comes out negative. This means we don't sleep, which is fine.

However, that also means we don't set throttle_last.

That means that the next time we come around to throttle(), the value for
"elapsed" is even bigger, which results in an even bigger negative number,
and we're "stuck".

AFAICT this means that a temporary I/O spike that makes reading of the disk
too slow can leave us in a situation where we never recover, and thus never
end up sleeping ever again, effectively turning off rate limiting.

I wonder if the "else if (sleep > 0)" at the bottom of throttle()
should just be a simple "else" and always run, resetting throttle_last?
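
A self-contained sketch of what that would look like (variable names
mirror basebackup.c, but this is an illustrative model, not the actual
server code):

    #include <stdint.h>

    static int64_t throttle_last;   /* microsecond timestamp of last reset */

    /*
     * Illustrative model of basebackup.c's throttle().  "now" is the
     * current time in microseconds; "elapsed_min" is how long the
     * just-sent chunk should take at the configured rate.  Returns the
     * time to sleep, if any.
     */
    static int64_t
    throttle_step(int64_t now, int64_t elapsed_min)
    {
        int64_t elapsed = now - throttle_last;
        int64_t sleep = elapsed_min - elapsed;

        /*
         * The reported bug: resetting throttle_last only inside
         * "if (sleep > 0)" means that once I/O is slower than the limit,
         * "elapsed" keeps growing forever and the limiter never engages
         * again.  Resetting unconditionally fixes that:
         */
        throttle_last = now;

        return (sleep > 0) ? sleep : 0;
    }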


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Transaction oddity with list partition of a list partition

2016-12-15 Thread David Fetter
On Thu, Dec 15, 2016 at 06:20:04PM +0900, Amit Langote wrote:
> 
> Hi David,
> 
> On 2016/12/15 18:09, David Fetter wrote:
> > Per Thomas Munro, could it be that the CREATE ... PARTITION OF ...
> > code fails to run CacheInvalidateRelcache on its parent(s)?
> 
> Thomas is right.  There is a patch posted for this issue [1]; I'm
> sending an updated version of the patch later today in reply to [1].
> Meanwhile, could you try and see if the problem is fixed with the
> attached patch?

That fixed both cases.  Thanks!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] proposal: session server side variables

2016-12-15 Thread Pavel Stehule
Hi

Most important features:
>>
>> 1. the values are stored in native types
>> 2. access to content is protected by ACL - like the content of tables
>> 3. the content is not MVCC based - no any cost of UPDATE
>> 4. simple API allows access to content of variables from any supported
>> environment.
>>
>
> next update - setattr, getattr functions are working now
>

new update - rebased after partitioning patch

Regards

Pavel


>
> notes, comments?
>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>>> --
>>>  Craig Ringer   http://www.2ndQuadrant.com/
>>>  PostgreSQL Development, 24x7 Support, Training & Services
>>>
>>
>>
>
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 3086021..6bd88b8 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -274,6 +274,9 @@ restrict_and_check_grant(bool is_grant, AclMode avail_goptions, bool all_privs,
 		case ACL_KIND_TYPE:
 			whole_mask = ACL_ALL_RIGHTS_TYPE;
 			break;
+		case ACL_KIND_VARIABLE:
+			whole_mask = ACL_ALL_RIGHTS_VARIABLE;
+			break;
 		default:
 			elog(ERROR, "unrecognized object kind: %d", objkind);
 			/* not reached, but keep compiler quiet */
@@ -488,6 +491,10 @@ ExecuteGrantStmt(GrantStmt *stmt)
 			all_privileges = ACL_ALL_RIGHTS_FOREIGN_SERVER;
 			errormsg = gettext_noop("invalid privilege type %s for foreign server");
 			break;
+		case ACL_OBJECT_VARIABLE:
+			all_privileges = ACL_ALL_RIGHTS_VARIABLE;
+			errormsg = gettext_noop("invalid privilege type %s for variable");
+			break;
 		default:
 			elog(ERROR, "unrecognized GrantStmt.objtype: %d",
  (int) stmt->objtype);
@@ -558,6 +565,7 @@ ExecGrantStmt_oids(InternalGrant *istmt)
 	{
 		case ACL_OBJECT_RELATION:
 		case ACL_OBJECT_SEQUENCE:
+		case ACL_OBJECT_VARIABLE:
 			ExecGrant_Relation(istmt);
 			break;
 		case ACL_OBJECT_DATABASE:
@@ -625,6 +633,7 @@ objectNamesToOids(GrantObjectType objtype, List *objnames)
 	{
 		case ACL_OBJECT_RELATION:
 		case ACL_OBJECT_SEQUENCE:
+		case ACL_OBJECT_VARIABLE:
 			foreach(cell, objnames)
 			{
 RangeVar   *relvar = (RangeVar *) lfirst(cell);
@@ -775,6 +784,10 @@ objectsInSchemaToOids(GrantObjectType objtype, List *nspnames)
 objs = getRelationsInNamespace(namespaceId, RELKIND_SEQUENCE);
 objects = list_concat(objects, objs);
 break;
+			case ACL_OBJECT_VARIABLE:
+objs = getRelationsInNamespace(namespaceId, RELKIND_VARIABLE);
+objects = list_concat(objects, objs);
+break;
 			case ACL_OBJECT_FUNCTION:
 {
 	ScanKeyData key[1];
@@ -950,6 +963,10 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
 			all_privileges = ACL_ALL_RIGHTS_TYPE;
 			errormsg = gettext_noop("invalid privilege type %s for type");
 			break;
+		case ACL_OBJECT_VARIABLE:
+			all_privileges = ACL_ALL_RIGHTS_VARIABLE;
+			errormsg = gettext_noop("invalid privilege type %s for variable");
+			break;
 		default:
 			elog(ERROR, "unrecognized GrantStmt.objtype: %d",
  (int) action->objtype);
@@ -1137,6 +1154,12 @@ SetDefaultACL(InternalDefaultACL *iacls)
 this_privileges = ACL_ALL_RIGHTS_TYPE;
 			break;
 
+		case ACL_OBJECT_VARIABLE:
+			objtype = DEFACLOBJ_VARIABLE;
+			if (iacls->all_privs && this_privileges == ACL_NO_RIGHTS)
+this_privileges = ACL_ALL_RIGHTS_VARIABLE;
+			break;
+
 		default:
 			elog(ERROR, "unrecognized objtype: %d",
  (int) iacls->objtype);
@@ -1363,6 +1386,9 @@ RemoveRoleFromObjectACL(Oid roleid, Oid classid, Oid objid)
 			case DEFACLOBJ_TYPE:
 iacls.objtype = ACL_OBJECT_TYPE;
 break;
+			case DEFACLOBJ_VARIABLE:
+iacls.objtype = ACL_OBJECT_VARIABLE;
+break;
 			default:
 /* Shouldn't get here */
 elog(ERROR, "unexpected default ACL type: %d",
@@ -1710,7 +1736,7 @@ ExecGrant_Attribute(InternalGrant *istmt, Oid relOid, const char *relname,
 }
 
 /*
- *	This processes both sequences and non-sequences.
+ *	This processes sequences, variables and other relations.
  */
 static void
 ExecGrant_Relation(InternalGrant *istmt)
@@ -1767,11 +1793,21 @@ ExecGrant_Relation(InternalGrant *istmt)
 	 errmsg("\"%s\" is not a sequence",
 			NameStr(pg_class_tuple->relname;
 
+		/* Used GRANT VARIABLE on a non-variable? */
+		if (istmt->objtype == ACL_OBJECT_VARIABLE &&
+			pg_class_tuple->relkind != RELKIND_VARIABLE)
+			ereport(ERROR,
+	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+	 errmsg("\"%s\" is not a variable",
+			NameStr(pg_class_tuple->relname;
+
 		/* Adjust the default permissions based on object type */
 		if (istmt->all_privs && istmt->privileges == ACL_NO_RIGHTS)
 		{
 			if (pg_class_tuple->relkind == RELKIND_SEQUENCE)
 this_privileges = ACL_ALL_RIGHTS_SEQUENCE;
+			else if (pg_class_tuple->relkind == RELKIND_VARIABLE)
+this_privileges = ACL_ALL_RIGHTS_VARIABLE;
 			else
 this_privileges = ACL_ALL_RIGHTS_RELATION;
 		}
@@ -1864,6 +1900,9 @@ ExecGrant_Relation(InternalGrant *istmt)
 case RELKIND_SEQUENCE:
 	old_acl = 

Re: [HACKERS] Logical Replication WIP

2016-12-15 Thread Petr Jelinek
On 15/12/16 13:06, Craig Ringer wrote:
> On 15 Dec. 2016 18:19, "Petr Jelinek" wrote:
> 
> On 13/12/16 21:42, Peter Eisentraut wrote:
> > On 12/10/16 2:48 AM, Petr Jelinek wrote:
> >> Attached new version with your updates and rebased on top of the
> current
> >> HEAD (the partitioning patch produced quite a few conflicts).
> >
> > I have attached a few more "fixup" patches, mostly with some
> editing of
> > documentation and comments and some compiler warnings.
> >
> > In 0006 in the protocol documentation I have left a "XXX ???" where I
> > didn't understand what it was trying to say.
> >
> 
> Ah so you didn't understand the
> > +Identifies the following TupleData submessage as
> a key.
> > +This field is optional and is only present if
> > +the update changed the REPLICA IDENTITY index. XXX???
> 
> So what happens here is that the update message can contain one or two
> out of 3 possible tuple submessages. It always contains 'N' message
> which is the new data. Then it can optionally contain 'O' message with
> old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
> IDENTITY index like pkey, etc). Or it can include 'K' message that only
> contains old data for the columns in the REPLICA IDENTITY index. But if
> the REPLICA IDENTITY index didn't change (ie, old and new would be same
> for those columns) we simply omit the 'K' message and let the downstream
> take the key data from the 'N' message to save space.
> 
> 
> Something we forgot to bake into pglogical that might be worth leaving
> room for here: sending the whole old tuple, with some fields marked as key.
> 
> So you can use replica identity pkey or whatever and the downstream
> knows which are the key fields. But can still transmit the whole old
> tuple in case the downstream wants it for conflict resolution/logging/etc.
> 
> We don't have the logical decoding and WAL output for this yet, nor a
> way of requesting old-tuple recording table by table. So all I'm
> suggesting is leaving room in the protocol.
> 

Not really sure I follow; which columns are keys is not part of the info
in the data message, it's part of the relation message, so it's already
possible in the protocol. Also the current implementation is fully
capable of taking advantage of PK on downstream even with REPLICA
IDENTITY FULL.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-15 Thread Kevin Grittner
On Thu, Dec 15, 2016 at 6:09 AM, Ian Jackson  wrote:

> However, in that example, as you seem to allude to, there is still a
> complete serialisation of all the transactions, even the failed T3:
> T1,T2,T3.  The database has detected the problem before returning data
> in T3 that would contradict the serialisation order.

In that case, yes.  Thinking about it, I suspect that a read-only
transaction will never actually return results that need to be
ignored.  Let me take a quick run at an argument to support that.
I'm working from the previous conclusion about read-only
transactions: that the read-only transaction will only be the one
to experience serialization failure if the other two transactions
involved in the "dangerous structure" have already committed
without developing a serialization failure, and the failure will be
detected during a read of the data by the read-only transaction --
never during commit.  Catching the initial exception and trying to
suppress it can cause it to resurface on the commit, but it would
have been initially detected and a serialization error thrown on
the read, but even if it is re-thrown on the commit, the initial
exception would have prevented the data which contradicted
already-committed state from being returned.

I also realized some other properties of read-only transactions
that might interest you (and that I should probably document).
Since the only way for a read-only transaction to be the one
experiencing a serialization failure is if Tout commits before the
read-only transaction (which is always Tin) acquires its snapshot,
Tpivot is still running when Tin acquires its snapshot, Tpivot
commits before a serialization failure involving Tin is detected,
and *then* Tin reads a data set affected by the writes of Tpivot.
Since a snapshot is only acquired when a statement is run which
requires a snapshot, that means that a query run in an implicit
transaction (i.e., there is no START or BEGIN statement to
explicitly start it; the SELECT or other data-accessing statement
automatically starts the transaction so it has a valid context in
which to run) that does not write data can never return bad results
nor receive a serialization failure.  Nor can those things happen
on the *first* or *only* non-writing statement in an explicit
transaction.
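
For the record, here is a minimal libpq sketch of that structure
(illustrative only: it assumes a table t(id int primary key, v int)
pre-populated with rows (1,10) and (2,20), uses an empty conninfo
string, and elides error handling; the expectation is that the failure
surfaces at the marked read, with SQLSTATE 40001):

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    run(PGconn *c, const char *sql)
    {
        PGresult   *r = PQexec(c, sql);

        if (PQresultStatus(r) != PGRES_COMMAND_OK &&
            PQresultStatus(r) != PGRES_TUPLES_OK)
            printf("%s -> %s", sql, PQerrorMessage(c));
        PQclear(r);
    }

    int
    main(void)
    {
        PGconn *tout = PQconnectdb("");     /* Tout */
        PGconn *tpivot = PQconnectdb("");   /* Tpivot */
        PGconn *tin = PQconnectdb("");      /* Tin, the read-only transaction */

        run(tpivot, "BEGIN ISOLATION LEVEL SERIALIZABLE");
        run(tpivot, "SELECT v FROM t WHERE id = 1");   /* reads what Tout writes */
        run(tpivot, "UPDATE t SET v = v + 1 WHERE id = 2");

        run(tout, "BEGIN ISOLATION LEVEL SERIALIZABLE");
        run(tout, "UPDATE t SET v = 100 WHERE id = 1");
        run(tout, "COMMIT");                            /* Tout commits first */

        run(tin, "BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY");
        run(tin, "SELECT 1");                           /* Tin acquires its snapshot */

        run(tpivot, "COMMIT");                          /* Tpivot commits cleanly */

        run(tin, "SELECT v FROM t WHERE id = 2");       /* fails here, on the read */
        run(tin, "COMMIT");

        PQfinish(tout);
        PQfinish(tpivot);
        PQfinish(tin);
        return 0;
    }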

> The thing that was puzzling me, after having slept on it, and before I
> read your mail, was how it could happen that the serialisation failure
> (of a transaction that did only reads) would only be detected at
> commit.  The point about attempts to suppress the serialisation
> failure is part of the answer to that.  Are there other reasons,
> besides previously suppressed serialisation failures, why commit of a
> transaction that did only reads[1] might fail ?

I'm pretty confident that if you're not using prepared transactions
the answer is "no".  Our initial implementation of serializable
prepared transactions was found to have a bug after crash and when
dealing with the persisted data found during recovery.  The safest
way to fix that on stable branches was, until those prepared
transactions which were found during recovery were committed or
rolled back, to be *very* eager to throw serialization failures for
any new transactions which developed a rw-dependency with them.
This can be improved, but I fear that until and unless that
happens, if "pre-crash" prepared transactions are still open, some
of the deductions above may not hold.  If you don't use prepared
transactions, or promptly clean up any that were pending when a
server crashes, that should not be a problem, but it's probably
worth mentioning.

One other situation about which I'm not entirely sure, and it would
take me some time to review the code to be sure, is if
max_pred_locks_per_transaction is not set high enough to
accommodate tracking all serializable transactions in allocated RAM
(recognizing that they must often be tracked after commit, until
overlapping serializable transactions commit), we have a mechanism
to summarize some of the committed transactions and spill them to
disk (using an internal SLRU module).  The summarized data might
not be able to determine all of the above as precisely as the
"normal" data tracked in RAM.  To avoid this, be generous when
setting max_pred_locks_per_transaction; not only will it avoid this
summarization, but it will also reduce the promotion of multiple
page locks in the predicate locking system to coarser relation
locks.  Coarser locks increase the "false positive" rate of
serialization failures, reducing performance.

> [1] I mean to include transactions which don't update even if they're
> not explicitly declared `read only', so that the application retained
> (until it said to commit) the option to try to make changes.

There is an attempt to recognize, at commit time, *implicit*
read-only transactions -- those that, in spite of not being
*declared* as READ ONLY, never wrote any data.  Although these 

Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-15 Thread Stephen Frost
* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 12/14/2016 04:57 PM, Stephen Frost wrote:
> >* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> >>On 12/14/16 5:15 AM, Michael Paquier wrote:
> >>>I would be tempted to suggest adding the verifier type as a new column
> >>>of pg_authid
> >>
> >>Yes please.
> >
> >This discussion seems to continue to come up and I don't entirely
> >understand why we keep trying to shove more things into pg_authid, or
> >worse, into rolpassword.
> 
> I understand the relational beauty of having a separate column for
> the verifier type, but I don't think it would be practical.

I disagree.

> For
> starters, we'd still like to have a self-identifying string format
> like "scram-sha-256:", so that you can conveniently pass the
> verifier as a string to CREATE USER.

I don't follow why we can't change the syntax for CREATE USER to allow
specifying the verifier type independently.  Generally speaking, I don't
expect *users* to be providing actual encoded *verifiers* very often, so
it seems like a bit of extra syntax that pg_dump has to use isn't that
big of a deal.

> I think it'll be much better to
> stick to one format, than try to split the verifier into type and
> the string, when it enters the catalog table.

Apparently, multiple people disagree with this approach.  I don't think
history is really on your side here either.

> >We should have an independent table for the verifiers, which has a
> >different column for the verifier type, and either starts off supporting
> >multiple verifiers per role or at least gives us the ability to add that
> >easily later.  We should also move rolvaliduntil to that new table.
> 
> I agree we'll probably need a new table for verifiers. Or turn
> rolpassword into an array or something. We discussed that before,
> however, and it didn't really go anywhere, so right now I'd like to
> get SCRAM in with minimal changes to the rest of the system. There
> is a lot of room for improvement once it's in.

Using an array strikes me as an absolutely terrible idea -- how are you
going to handle having different valid_until times then?

I do agree with trying to get SCRAM in without changing too much of the
rest of the system, but I wanted to make it clear that it's the only
point that I agree with for continuing down this path and that we should
absolutely be looking to change the CREATE USER syntax to specify the
verifier independently, plan to use a different table for the verifiers
with an independent column for the verifier type, support multiple
verifiers per role, etc, in the (hopefully very near...) future.

Thanks!

Stephen




Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Feike Steenbergen
> but -X stream is, then we use a temporary slot always.
This seems even more useful with -X fetch to me, as with fetch we are
sure we are not immediately receiving the WAL. So, I'd propose to use it
whenever -X is specified, regardless of which method is used.

> (I still think we should change the default here, but that's a different
topic)
+1

> Does that seem reasonable? Or would people prefer it to default to off?
Yes. Users are specifically requesting WAL using -X, so making sure that
actually happens by default would work.


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-15 Thread Heikki Linnakangas

On 12/15/2016 03:00 AM, Michael Paquier wrote:

On Wed, Dec 14, 2016 at 8:33 PM, Heikki Linnakangas  wrote:

But, a password stored in plaintext works with either MD5 or SCRAM, or any
future authentication mechanism. So as soon as we have SCRAM authentication,
it becomes somewhat useful again.

In a nutshell:

auth / stored   MD5 SCRAM   plaintext
-
passwordY   Y   Y
md5 Y   N   Y
scram   N   Y   Y

If a password is stored in plaintext, it can be used with any authentication
mechanism. And the plaintext 'password' authentication mechanism works with
any kind of a stored password. But an MD5 hash cannot be used with SCRAM
authentication, or vice versa.


So... I have been thinking about this portion of the thread. And what I
find the most scary is not the fact that we use plain passwords for
SCRAM authentication; it is the fact that we would need to do a
catalog lookup earlier in the connection workflow, to decide which
authentication protocol to use depending on the username provided in
the startup packet, if the pg_hba.conf entry matching the user and
database names uses "password".


I don't see why we would need to do a catalog lookup any earlier. With 
"password" authentication, the server can simply request the client to 
send its password. When it receives it, it performs the catalog lookup 
to get pg_authid.rolpassword. If it's in plaintext, just compare it; if 
it's an MD5 hash, hash the client's password and compare, and if it's a 
SCRAM verifier, build a verifier with the same salt and iteration count 
and compare.
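
In rough outline (md5_matches and scram_verifier_matches are
hypothetical stand-ins here, not the actual backend functions):

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical stand-ins for the real hashing/verification routines. */
    extern bool md5_matches(const char *stored, const char *client_password);
    extern bool scram_verifier_matches(const char *stored, const char *client_password);

    /* Dispatch on the format of pg_authid.rolpassword, as described above. */
    static bool
    plain_auth_check(const char *stored, const char *client_password)
    {
        if (strncmp(stored, "md5", 3) == 0)
            return md5_matches(stored, client_password);
        if (strncmp(stored, "scram-sha-256:", 14) == 0)
            return scram_verifier_matches(stored, client_password);
        /* otherwise rolpassword is a plaintext password */
        return strcmp(stored, client_password) == 0;
    }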



And, honestly, why do we actually need to have a support table that
broad? SCRAM is designed to be secure, so it seems to me that it
would on the contrary be a bad idea to encourage the use of plain
passwords if we actually think that they should never be used (they
are actually useful for local, development instances, not production
ones).


I agree we should not encourage bad password practices. But as long as 
we support passwords to be stored in plaintext at all, it makes no sense 
to not allow them to be used with SCRAM. The fact that you can use a 
password stored in plaintext with both MD5 and SCRAM is literally the 
only reason you would store a password in plaintext, so if we don't want 
to allow that, we should disallow storing passwords in plaintext altogether.



So what I would suggest would be to have a support table like
that:
auth / stored   MD5 SCRAM   plaintext
-
passwordY   Y   N
md5 Y   N   Y
scram   N   N   Y


I was using 'Y' to indicate that the combination works, and 'N' to 
indicate that it does not. Assuming you're using the same notation, the 
above doesn't make any sense.



So here is an idea for things to do now:
1) do not change the format of the existing passwords
2) do not change pg_authid
3) block access to instances if "password" or "md5" are used in
pg_hba.conf if the user has a SCRAM verifier.
4) block access if "scram" is used and the user has a plain or md5 verifier.
5) allow access if "scram" is used and the user has a SCRAM verifier.
We had a similar discussion regarding verifier/password formats last
year but that did not end well. It would be sad to fall back again
into this discussion and get no result. If somebody wants to support
access to SCRAM with plain password entries, why not. But that would
gain a -1 from me regarding the earlier lookup of pg_authid needed to
decide on the protocol to use. And I think that we want SCRAM to be
as stable and secure as possible.


The bottom line is that at the moment, when plaintext passwords are 
stored as is, without any indicator that it's a plaintext password, it's 
ambiguous whether a password is a SCRAM verifier, or if it's a plaintext 
password that just happens to begin with the word "scram:". That is 
completely unrelated to which combinations of stored passwords and 
authentication mechanisms we actually support or allow to work.


The only way to distinguish is to know about every verifier kind there 
is, and check whether rolpassword looks valid as anything else than a 
plaintext password. And we already got tripped by a bug-of-omission on 
that once. If we add more verifier formats in the future, it's bound to 
happen again. Let's nip that source of bugs in the bud. Attached is a 
patch to implement what I have in mind.


Alternatively, you could argue that we should forbid storing passwords 
in plaintext altogether. I'm OK with that, too, if that's what people 
prefer. Then you cannot have a user that can log in with both MD5 and 
SCRAM authentication, but it's certainly more secure, and it's easier to 
document.


- Heikki



0001-Use-plain-prefix-for-plaintext-passwords-stored-in-p.patch


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-15 Thread Ian Jackson
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry 
on constraint violation [and 2 more messages] [and 1 more messages]"):
> As Robert pointed out, a read-only transaction cannot normally be
> aborted by a serialization failure on COMMIT.  Under exceptional
> conditions, like an attempt to suppress the serialization failure,
> you might see the commit aborted, though.

Thanks for the detailed explanation.  Sorry for missing the 2nd
example in that page, which does indeed show a read-only transaction
failing:

However, in that example, as you seem to allude to, there is still a
complete serialisation of all the transactions, even the failed T3:
T1,T2,T3.  The database has detected the problem before returning data
in T3 that would contradict the serialisation order.

The thing that was puzzling me, after having slept on it, and before I
read your mail, was how it could happen that the serialisation failure
(of a transaction that did only reads) would only be detected at
commit.  The point about attempts to suppress the serialisation
failure is part of the answer to that.  Are there other reasons,
besides previously suppressed serialisation failures, why commit of a
transaction that did only reads[1] might fail ?

[1] I mean to include transactions which don't update even if they're
not explicitly declared `read only', so that the application retained
(until it said to commit) the option to try to make changes.

Supposing I understand your `doomed' flag correctly, I think it is
then probably possible to construct an argument that proves that
allowing the application to trap and suppress serialisation failures
does not make it harder to provide coherency guarantees.

Or to put it another way: does pgsql already detect serialisation
problems (in transactions which only read) at the point where it would
otherwise return data not consistent with any serialisation order?
(As it does in the `Rollover' example.)

If so, presumably it always throws a serialisation failure at that
point.  I think that is then sufficient.  There is no need to tell the
application programmer they have to commit even transactions which
only read.

If my supposition is right then I will try to develop this argument
more formally.  I think that would be worthwhile because the converse
property is very surprising to non-database programmers, and would
require very explicit documentation by pgsql, and careful attention by
application programmers.  It would be nice to be able to document a
stronger promise.

Ian.




Re: [HACKERS] Logical Replication WIP

2016-12-15 Thread Craig Ringer
On 15 Dec. 2016 18:19, "Petr Jelinek"  wrote:

On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
>
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
>
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
>

Ah so you didn't understand the
> +Identifies the following TupleData submessage as a key.
> +This field is optional and is only present if
> +the update changed the REPLICA IDENTITY index. XXX???

So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains 'N' message
which is the new data. Then it can optionally contain 'O' message with
old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
IDENTITY index like pkey, etc). Or it can include 'K' message that only
contains old data for the columns in the REPLICA IDENTITY index. But if
the REPLICA IDENTITY index didn't change (ie, old and new would be same
for those columns) we simply omit the 'K' message and let the downstream
take the key data from the 'N' message to save space.


Something we forgot to bake into pglogical that might be worth leaving room
for here: sending the whole old tuple, with some fields marked as key.

So you can use replica identity pkey or whatever and the downstream knows
which are the key fields. But can still transmit the whole old tuple in
case the downstream wants it for conflict resolution/logging/etc.

We don't have the logical decoding and WAL output for this yet, nor a way
of requesting old-tuple recording table by table. So all I'm suggesting is
leaving room in the protocol.


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2016-12-15 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 11:51 PM, Michael Paquier  wrote:

> > On Wed, Dec 14, 2016 at 4:07 PM, Peter Eisentraut
> >  wrote:
> > Others will follow later in separate patches. Or is it preferred to have
> > one huge patch submitted?
>
> Please yes. One change makes little sense.


Ack. I was not sure what patch size is preferred here. Will continue with a
patch in the original thread later.

Thanks for feedback!

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047




Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 10:26 AM, Magnus Hagander 
wrote:

>
>
> On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander 
> wrote:
>
>> I've started work on a patch to make pg_basebackup use the temporary
>> slots feature that has been committed (thanks Petr!!).  The reason for this
>> is to avoid the cases where a burst of traffic on the master during the
>> backup can cause the receive log part of the basebackup to fall far enough
>> behind that it fails.
>>
>> I have a few considerations at this point, about interaction with
>> existing options.
>>
>> Right now, we have -S/--slot which specifies a slot name. If you want to
>> use that, you have to create the slot ahead of time, and it will be a
>> permanent slot (of course). This is primarily documented as a feature to
>> use for replication (to make sure xlog is kept around until the standby is
>> started up), but it does also solve the same problem. But to use it for
>> base backups today you have to manually create the slot, then base backup,
>> then drop the slot, which is error prone.
>>
>> My thought is that if -S/--slot is not specified, but -X stream is, then
>> we use a temporary slot always. This obviously requires the server to be
>> configured with enough slots (I still think we should change the default
>> here, but that's a different topic), but I think that's acceptable. Then we
>> should add a "--no-slot" to make it revert to previous behaviour.
>>
>> Does that seem reasonable? Or would people prefer it to default to off?
>>
>>
So here's a patch that does this, for discussion. It implements the
following behavior for -X:

* When used with <10.0 servers, behave just like before.
* When -S  is specified, behave just like before (use an existing
replication slot, fail if it does not exist)
* When used on 10.0 with no -S, create and use a temporary replication slot
while running, with the name pg_basebackup_<pid>.
* When used with 10.0 with no -S but --no-slot specified, run without a
slot like before.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1f15a17..f912ed0 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -246,6 +246,20 @@ PostgreSQL documentation
  
 
  
+  --no-slot
+  
+   
+This option prevents the creation of a temporary replication slot
+during the backup even if it's supported by the server.
+   
+   
+Temporary replication slots are created by default if no slot name
+is given with the -S option when using log streaming.
+   
+  
+ 
+
+ 
   -T olddir=newdir
   --tablespace-mapping=olddir=newdir
   
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e2875df..85bc7ef 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -61,6 +61,11 @@ typedef struct TablespaceList
  */
 #define MINIMUM_VERSION_FOR_PG_WAL	10
 
+/*
+ * Temporary replication slots are supported from version 10.
+ */
+#define MINIMUM_VERSION_FOR_TEMP_SLOTS 10
+
 /* Global options */
 static char *basedir = NULL;
 static TablespaceList tablespace_dirs = {NULL, NULL};
@@ -79,6 +84,8 @@ static bool do_sync = true;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static pg_time_t last_progress_report = 0;
 static int32 maxrate = 0;		/* no limit by default */
+static char *replication_slot = NULL;
+static bool temp_replication_slot = true;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -323,6 +330,7 @@ usage(void)
 	printf(_("  -R, --write-recovery-conf\n"
 			 " write recovery.conf after backup\n"));
 	printf(_("  -S, --slot=SLOTNAMEreplication slot to use\n"));
+	printf(_("  --no-slot  prevent creation of temporary replication slot\n"));
 	printf(_("  -T, --tablespace-mapping=OLDDIR=NEWDIR\n"
 	  " relocate tablespace in OLDDIR to NEWDIR\n"));
 	printf(_("  -x, --xlog include required WAL files in backup (fetch mode)\n"));
@@ -452,6 +460,7 @@ typedef struct
 	char		xlog[MAXPGPATH];	/* directory or tarfile depending on mode */
 	char	   *sysidentifier;
 	int			timeline;
+	bool		temp_slot;
 } logstreamer_param;
 
 static int
@@ -471,6 +480,16 @@ LogStreamerMain(logstreamer_param *param)
 	stream.do_sync = do_sync;
 	stream.mark_done = true;
 	stream.partial_suffix = NULL;
+	stream.replication_slot = replication_slot;
+	stream.temp_slot = param->temp_slot;
+	if (stream.temp_slot && !stream.replication_slot)
+	{
+		/* Generate a name for the temporary slot */
+		char		name[32];
+
+		snprintf(name, sizeof(name), "pg_basebackup_%d", getpid());
+		stream.replication_slot = pstrdup(name);
+	}
 
 	if (format == 'p')
 		stream.walmethod = 

Re: [HACKERS] Logical Replication WIP

2016-12-15 Thread Petr Jelinek
On 13/12/16 21:42, Peter Eisentraut wrote:
> On 12/10/16 2:48 AM, Petr Jelinek wrote:
>> Attached new version with your updates and rebased on top of the current
>> HEAD (the partitioning patch produced quite a few conflicts).
> 
> I have attached a few more "fixup" patches, mostly with some editing of
> documentation and comments and some compiler warnings.
> 
> In 0006 in the protocol documentation I have left a "XXX ???" where I
> didn't understand what it was trying to say.
> 

Ah so you didn't understand the
> +Identifies the following TupleData submessage as a key.
> +This field is optional and is only present if
> +the update changed the REPLICA IDENTITY index. XXX???

So what happens here is that the update message can contain one or two
out of 3 possible tuple submessages. It always contains 'N' message
which is the new data. Then it can optionally contain 'O' message with
old data if the table has REPLICA IDENTITY FULL (ie, not REPLICA
IDENTITY index like pkey, etc). Or it can include 'K' message that only
contains old data for the columns in the REPLICA IDENTITY index. But if
the REPLICA IDENTITY index didn't change (ie, old and new would be same
for those columns) we simply omit the 'K' message and let the downstream
take the key data from the 'N' message to save space.
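
On the apply side, the rule is roughly this (an illustrative fragment;
the types and the read_char/read_tuple helpers are stand-ins, not the
actual protocol-parsing code):

    #include <stdbool.h>

    typedef struct StringInfoData StringInfoData;   /* opaque stand-in */
    typedef struct LogicalRepTuple LogicalRepTuple; /* opaque stand-in */

    extern char read_char(StringInfoData *buf);                      /* stand-in */
    extern void read_tuple(StringInfoData *buf, LogicalRepTuple *t); /* stand-in */

    /* An Update message: an optional 'O' (full old row) or 'K' (key
     * columns only) submessage, then the mandatory 'N' (new row). */
    static void
    parse_update(StringInfoData *buf,
                 LogicalRepTuple *oldtup, LogicalRepTuple *newtup,
                 bool *have_old)
    {
        char kind = read_char(buf);

        *have_old = (kind == 'O' || kind == 'K');
        if (*have_old)
        {
            read_tuple(buf, oldtup);    /* 'O': whole row; 'K': key columns only */
            kind = read_char(buf);
        }

        /* kind is now 'N': the new tuple is always present */
        read_tuple(buf, newtup);

        /*
         * If no 'O'/'K' was sent, the REPLICA IDENTITY columns did not
         * change, and the downstream takes the key values from newtup.
         */
    }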

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-15 Thread Heikki Linnakangas

On 12/14/2016 04:57 PM, Stephen Frost wrote:

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:

On 12/14/16 5:15 AM, Michael Paquier wrote:

I would be tempted to suggest adding the verifier type as a new column
of pg_authid


Yes please.


This discussion seems to continue to come up and I don't entirely
understand why we keep trying to shove more things into pg_authid, or
worse, into rolpassword.


I understand the relational beauty of having a separate column for the 
verifier type, but I don't think it would be practical. For starters, 
we'd still like to have a self-identifying string format like 
"scram-sha-256:", so that you can conveniently pass the verifier 
as a string to CREATE USER. I think it'll be much better to stick to one 
format, than try to split the verifier into type and the string, when it 
enters the catalog table.



We should have an independent table for the verifiers, which has a
different column for the verifier type, and either starts off supporting
multiple verifiers per role or at least gives us the ability to add that
easily later.  We should also move rolvaliduntil to that new table.


I agree we'll probably need a new table for verifiers. Or turn 
rolpassword into an array or something. We discussed that before, 
however, and it didn't really go anywhere, so right now I'd like to get 
SCRAM in with minimal changes to the rest of the system. There is a lot 
of room for improvement once it's in.


- Heikki





Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-15 Thread Etsuro Fujita

On 2016/12/13 23:13, Ashutosh Bapat wrote:

On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane  wrote:

I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?



Right, the so-called "local join tree" can contain ForeignPaths
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths.


I think Ashutosh is right.


Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.


+1


Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

 * Just skip anything else. We don't know if corresponding
 * plan would build the output row from whole-row references
 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.


I wrote the first version of GetExistingLocalJoinPath (and 
postgresRecheckForeignScan), to verify that the changes to core for 
pushing down foreign/custom joins in 9.5 would work well for EPQ 
rechecks.  And I rejected custom paths in that version because I 
intended to use that function not only for foreign joins but also for custom 
joins.  (IIRC, for 9.6, I proposed to put GetExistingLocalJoinPath in a 
more common area such as src/backend/optimizer/path/joinpath.c, but that 
was ignored...)



I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.



I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?


I'm not able to understand that, either.


(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)


An alternative I was thinking of was (1) to create an equivalent nestloop 
join path for each foreign/custom join path and store it in 
fdw_outerpath as-is, except when that join path implements a full join, 
in which case a merge or hash join path is created, and (2) to apply 
postgresRecheckForeignScan to the outer subplan created from the 
fdw_outerpath when executing EPQ rechecks.  So, I was thinking to 
provide a helper function that creates the equivalent local join path 
for a given foreign/custom join path in #1.



A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from 

[HACKERS] Re: [COMMITTERS] pgsql: Build HTML documentation using XSLT stylesheets by default

2016-12-15 Thread Alexander Law

Hello Alvaro,

It's caused by the condition
...
in the simple.xlink template 
(docbook/stylesheet/docbook-xsl/xhtml/inline.xsl). (This test is
executed for each xlink (~9 times).)

Yes, it's inefficient but it doesn't affect build time (for me).
You can try to apply the attached patch and measure the time with it.
So if the performance is acceptable now, I'd continue the switch to
XML, and get back to the performance issues after the switch.
(epub generation is much slower, and I have developed a patch to
speed that up too.)


Best regards,
Alexander

01.12.2016 19:49, Alvaro Herrera wrote:

Pavel Stehule wrote:


It does much more intensive work with IO - I have a feeling there are
intensive fsyncs.

You could prove that, by running "make html" under "strace -f -e
trace=fsync" etc.  I just tried that, and I don't see any fsync.  I
guess you could try other syscalls, or simply "-e trace=file".  Doing
the latter I noticed an absolutely stupid number of attempts to open
file
/usr/lib/libxslt-plugins/nwalsh_com_xslt_ext_com_nwalsh_saxon_UnwrapLinks.so
which deserves a WTF.



diff --git a/doc/src/sgml/stylesheet-speedup-xhtml.xsl b/doc/src/sgml/stylesheet-speedup-xhtml.xsl
index ff08bef..60c9fd4 100644
--- a/doc/src/sgml/stylesheet-speedup-xhtml.xsl
+++ b/doc/src/sgml/stylesheet-speedup-xhtml.xsl
@@ -1,5 +1,6 @@
 
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+xmlns:xlink="http://www.w3.org/1999/xlink"
 xmlns="http://www.w3.org/1999/xhtml"
 version='1.0'>
 
@@ -292,4 +293,189 @@
   
 
 
+[~185 lines of added XSLT stripped of their markup by the archive: an
+override of docbook-xsl's simple.xlink template from xhtml/inline.xsl.]
 



Re: [HACKERS] pg_basebackups and slots

2016-12-15 Thread Magnus Hagander
On Thu, Dec 15, 2016 at 10:04 AM, Magnus Hagander 
wrote:

> I've started work on a patch to make pg_basebackup use the temporary slots
> feature that has been committed (thanks Petr!!).  The reason for this is to
> avoid the cases where a burst of traffic on the master during the backup
> can cause the receive log part of the basebackup to fall far enough behind
> that it fails.
>
> I have a few considerations at this point, about interaction with existing
> options.
>
> Right now, we have -S/--slot which specifies a slot name. If you want to
> use that, you have to create the slot ahead of time, and it will be a
> permanent slot (of course). This is primarily documented as a feature to
> use for replication (to make sure xlog is kept around until the standby is
> started up), but it does also solve the same problem. But to use it for
> base backups today you have to manually create the slot, then base backup,
> then drop the slot, which is error prone.
>
> My thought is that if -S/--slot is not specified, but -X stream is, then
> we use a temporary slot always. This obviously requires the server to be
> configured with enough slots (I still think we should change the default
> here, but that's a different topic), but I think that's acceptable. Then we
> should add a "--no-slot" to make it revert to previous behaviour.
>
> Does that seem reasonable? Or would people prefer it to default to off?
>
>
> However, it also made me think that the interface for setting up a replica
> with a *permanent* slot would be much nicer if we could just have
> pg_basebackup create the slot, instead of having to create it manually.
>
> Basically, I'm envisioning adding an IF_NOT_EXISTS argument to
> CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it
> wants to use and have it created if necessary. Do people think think this
> is worth doing? (It would also make pg_receivexlog and pg_receivelogical
> easier to use, I think, and it would obviously be implemented there as
> well).
>
>
Pah, never mind this last question. I was looking at a completely broken
branch -- we already have the if-not-exists parameter for
pg_receivexlog/pg_receivelogical. I shouldn't write emails before
thinking...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-15 Thread Heikki Linnakangas

On 12/15/2016 10:44 AM, Magnus Hagander wrote:

I wonder if there might be more corner cases like this, but in this
particular one it seems easy enough to just say that failing to rename
recovery.conf because it didn't exist is safe.


Yeah. It's unexpected though, so I think erroring out is quite 
reasonable. If the recovery.conf file went missing, who knows what else 
is wrong.



But in the case of failing to rename recovery.conf for example because of
permissions errors, we don't want to ignore it. But we also really don't
want to end up with this kind of inconsistent data directory IMO. I don't
know that code well enough to suggest how to fix it though -- hoping for
input from someone who knows it better?


Hmm. Perhaps we should write the timeline history file only after 
renaming recovery.conf. In general, we should keep the window between 
writing the timeline history file and writing the end-of-recovery record 
as small as possible. I don't think we can eliminate it completely, but 
it makes sense to minimize it. Something like the attached (completely 
untested).
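
Since the archive does not inline the attachment, the intended ordering
is roughly this (a schematic outline only; the two write_* names are
stand-ins, not the actual xlog.c routines):

    /*
     * Sketch of end-of-archive-recovery ordering: do the rename first
     * and fail hard if it doesn't work, then keep the history-file
     * write and the end-of-recovery record as close together as
     * possible.
     */
    if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
        ereport(FATAL,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));

    write_timeline_history_file(newTLI);    /* stand-in name */
    write_end_of_recovery_record();         /* stand-in name */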


- Heikki



reorder-end-of-archive-recovery-actions-1.patch



Re: [HACKERS] Transaction oddity with list partition of a list partition

2016-12-15 Thread Amit Langote

Hi David,

On 2016/12/15 18:09, David Fetter wrote:
> Per Thomas Munro, could it be that the CREATE ... PARTITION OF ... code
> fails to run CacheInvalidateRelcache on its parent(s)?

Thomas is right.  There is a patch posted for this issue [1]; I'm sending
an updated version of the patch later today in reply to [1].  Meanwhile,
could you try and see if the problem is fixed with the attached patch?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZ86v1G%2Bzx9etMiSQaBBvYMKfU-iitqZArSh5z0n8Q4cA%40mail.gmail.com
From 4553aa4588a3d18aba3d0aa8d07627ff8654f436 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 13 Dec 2016 15:07:06 +0900
Subject: [PATCH 1/6] Invalidate the parent's relcache after partition
 creation.

---
 src/backend/catalog/heap.c   |  7 ++-
 src/backend/commands/tablecmds.c | 13 -
 src/include/catalog/heap.h   |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index c09c9f28a7..e5d6aecc3f 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3230,9 +3230,12 @@ RemovePartitionKeyByRelId(Oid relid)
  * StorePartitionBound
  *		Update pg_class tuple of rel to store the partition bound and set
  *		relispartition to true
+ *
+ * Also, invalidate the parent's relcache, so that the next rebuild will load
+ * the new partition's info into its partition descriptor.
  */
 void
-StorePartitionBound(Relation rel, Node *bound)
+StorePartitionBound(Relation rel, Relation parent, Node *bound)
 {
 	Relation	classRel;
 	HeapTuple	tuple,
@@ -3273,4 +3276,6 @@ StorePartitionBound(Relation rel, Node *bound)
 	CatalogUpdateIndexes(classRel, newtuple);
 	heap_freetuple(newtuple);
 	heap_close(classRel, RowExclusiveLock);
+
+	CacheInvalidateRelcache(parent);
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7a574dc50d..1c219b03dd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -777,10 +777,11 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * it does not return on error.
 		 */
 		check_new_partition_bound(relname, parent, bound);
-		heap_close(parent, NoLock);
 
 		/* Update the pg_class entry. */
-		StorePartitionBound(rel, bound);
+		StorePartitionBound(rel, parent, bound);
+
+		heap_close(parent, NoLock);
 
 		/*
 		 * The code that follows may also update the pg_class tuple to update
@@ -13141,7 +13142,7 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 			  cmd->bound);
 
 	/* Update the pg_class entry. */
-	StorePartitionBound(attachRel, cmd->bound);
+	StorePartitionBound(attachRel, rel, cmd->bound);
 
 	/*
 	 * Generate partition constraint from the partition bound specification.
@@ -13352,12 +13353,6 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 		}
 	}
 
-	/*
-	 * Invalidate the parent's relcache so that the new partition is now
-	 * included its partition descriptor.
-	 */
-	CacheInvalidateRelcache(rel);
-
 	ObjectAddressSet(address, RelationRelationId, RelationGetRelid(attachRel));
 
 	/* keep our lock until commit */
diff --git a/src/include/catalog/heap.h b/src/include/catalog/heap.h
index 77dc1983e8..0e4262f611 100644
--- a/src/include/catalog/heap.h
+++ b/src/include/catalog/heap.h
@@ -143,6 +143,6 @@ extern void StorePartitionKey(Relation rel,
 	Oid *partopclass,
 	Oid *partcollation);
 extern void RemovePartitionKeyByRelId(Oid relid);
-extern void StorePartitionBound(Relation rel, Node *bound);
+extern void StorePartitionBound(Relation rel, Relation parent, Node *bound);
 
 #endif   /* HEAP_H */
-- 
2.11.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-15 Thread Amit Kapila
On Thu, Dec 15, 2016 at 10:04 AM, Andres Freund  wrote:
> Hi,
>
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> Ashutosh, could you try it and see if it improves things?
>
> So what's the theory of why this triggers pldebugger hangs, but
> apparently causes not many other problems?
>

The theory is that with pldebugger the latch event gets triggered before
FD_READ, which makes it look as if the second event gets lost, and
WaitForMultipleObjects() keeps waiting indefinitely.  The probable
cause is that we fail to reset the event, which leads to this
behavior. Resetting it appears to be required per MSDN as well, as
pointed out by Robert upthread.

Find attached a patch that implements the resetting of the event only for
the required case.  This fixes the reported problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


reset_wait_events_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-15 Thread Masahiko Sawada
On Thu, Dec 15, 2016 at 3:06 PM, Kyotaro HORIGUCHI
 wrote:
> At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada  
> wrote in 
>> On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
>>  wrote:
>> > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao  
>> > wrote:
>> >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>> >>  wrote:
>> >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  
>> >>> wrote:
>>  If we drop the "standby_list" syntax, I don't think that new parameter 
>>  is
>>  necessary. We can keep s_s_names and just drop the support for that 
>>  syntax
>>  from s_s_names. This may be ok if we're really in "break all the 
>>  things" mode
>>  for PostgreSQL 10.
>> >>>
>> >>> Please let's not raise that as an argument again... And not break the
>> >>> s_list argument. Many users depend on that for just single sync
>> >>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>> >>> a standby list is a priority list if we can maintain that. Upthread
>> >>> agreement was to break that, I did not insist further, and won't if
>> >>> that's still the feeling.
>> >>
>> >> I wonder why you think that the backward-compatibility for standby_list is
>> >> so "special". We renamed pg_xlog directory to pg_wal and are planning to
>> >> change recovery.conf API at all, though they have bigger impacts on
>> >> the existing users in terms of the backward compatibility. OTOH, so far,
>> >> changing GUC between major releases happened several times.
>> >
>> > Silent failures for existing failover deployments is a pain to solve
>> > after doing upgrades. That's my only concern. Changing pg_wal would
>> > result in a hard failure when upgrading. And changing the meaning of
>> > the standby list (without keyword ANY or FIRST!) does not fall into
>> > that category... So yes just removing support for standby list would
>> > result in a hard failure, which would be fine for the
>> > let-s-break-all-things move.
>> >
>> >> But I'm not against keeping the backward compatibility for standby_list,
>> >> to be honest. My concern is that the latest patch tries to support
>> >> the backward compatibility "partially" and which would be confusing to 
>> >> users,
>> >> as I told upthread.
>> > If we try to support backward compatibility, I'd personally do it
>> > fully, and have a list of standby names specified meaning a priority
>> > list.
>> >
>> >> So I'd like to propose to keep the backward compatibility fully for 
>> >> s_s_names
>> >> (i.e., both "standby_list" and "N (standby_list)" mean the priority 
>> >> method)
>> >> at the first commit, then continue discussing this and change it if we 
>> >> reach
>> >> the consensus until PostgreSQL 10 is actually released. Thought?
>> >
>> > +1 on that.
>>
>> +1.
>
> FWIW, +1 from me.
>
>> I'll update the patch.
>

Attached is the latest v12 patch.
I changed the behavior of "N (standby_list)" to use the priority method
and incorporated the review comments so far. Please review it.
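
For illustration, the accepted spellings then look like this (standby names
are hypothetical):

    synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'  # priority method
    synchronous_standby_names = '2 (s1, s2, s3)'        # same as FIRST 2 (...)
    synchronous_standby_names = 'ANY 2 (s1, s2, s3)'    # quorum method
    synchronous_standby_names = 's1, s2, s3'            # bare list: priority, one sync standby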

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57..91eb888 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3054,41 +3054,67 @@ include_dir 'conf.d'
 transactions waiting for commit will be allowed to proceed after
 these standby servers confirm receipt of their data.
 The synchronous standbys will be those whose names appear
-earlier in this list, and
+in this list, and
 that are both currently connected and streaming data in real-time
 (as shown by a state of streaming in the
 
 pg_stat_replication view).
-Other standby servers appearing later in this list represent potential
-synchronous standbys. If any of the current synchronous
-standbys disconnects for whatever reason,
-it will be replaced immediately with the next-highest-priority standby.
-Specifying more than one standby name can allow very high availability.


 This parameter specifies a list of standby servers using
 either of the following syntaxes:
 
-num_sync ( standby_name [, ...] )
+[FIRST] num_sync ( standby_name [, ...] )
+ANY num_sync ( standby_name [, ...] )
 standby_name [, ...]
 
 where num_sync is
 the number of synchronous standbys that transactions need to
 wait for replies from,
 and standby_name
-is the name of a standby server. For example, a setting of
-3 (s1, s2, s3, s4) makes transaction commits wait
-until their WAL records are received by three higher-priority standbys
-chosen from standby servers s1, s2,
-

Re: [HACKERS] Transaction oddity with list partition of a list partition

2016-12-15 Thread David Fetter
On Thu, Dec 15, 2016 at 12:23:24AM -0800, David Fetter wrote:
> Folks,
> 
> I'm having some trouble understanding what's going on here.  When I \i
> the file in 55caaaeba877eac1feb6481fb413fa04ae9046ac without starting
> a transaction explicitly, it produces the expected results.  When I \i
> it after a BEGIN, not so much.


I've managed to get a shorter repro for the issue:

BEGIN;
CREATE TABLE the_log (
ts TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
"user" TEXT NOT NULL DEFAULT current_user,
action TEXT NOT NULL,
table_schema TEXT NOT NULL,
table_name TEXT NOT NULL,
old_row JSONB,
new_row JSONB,
CHECK(
CASE action
WHEN 'INSERT' THEN old_row IS NULL AND new_row IS NOT NULL
WHEN 'UPDATE' THEN old_row IS NOT NULL AND new_row IS NOT NULL
ELSE /*DELETE, and maybe TRUNCATE, if that's supported by access to 
old rows */
old_row IS NOT NULL AND new_row IS NULL
END
)
) PARTITION BY LIST(table_schema);
CREATE TABLE public_log
PARTITION OF the_log FOR VALUES IN ('public');
INSERT INTO the_log (action, table_schema, table_name, new_row)
VALUES ('INSERT','public','city','{"name": "Oakland", "population": 419267}');

leads to:

ERROR:  no partition of relation "the_log" found for row
DETAIL:  Failing row contains (2016-12-15 00:59:17.980094-08, shackle, INSERT, 
public, city, null, {"name": "Oakland", "population": 419267}).

Per Thomas Munro, could it be that the CREATE ... PARTITION OF ... code
fails to run CacheInvalidateRelcache on its parent(s)?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-12-15 Thread Petr Jelinek
On 13/12/16 01:33, Andres Freund wrote:
> 
> On 2016-12-12 09:18:48 -0500, Peter Eisentraut wrote:
>> On 12/8/16 4:10 PM, Petr Jelinek wrote:
>>> On 08/12/16 20:16, Peter Eisentraut wrote:
 On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>> I think that the removal of changes to ReplicationSlotAcquire() that you
>> did will result in making it impossible to reacquire temporary slot once
>> you switched to different one in the session as the if (active_pid != 0)
>> will always be true for temp slot.
>
> I see.  I suppose it's difficult to get a test case for this.

 I created a test case, saw the error of my ways, and added your code
 back in.  Patch attached.

>>>
>>> Hi,
>>>
>>> I am happy with this version, thanks for moving it forward.
>>
>> committed
> 
> Hm.
> 
>  /*
> + * Cleanup all temporary slots created in current session.
> + */
> +void
> +ReplicationSlotCleanup()
> 
> I'd rather see a (void) there. The prototype has it, but still.
> 
> 
> +
> + /*
> +  * No need for locking as we are only interested in slots active in
> +  * current process and those are not touched by other processes.
> 
> I'm a bit suspicious of this claim.  Without a memory barrier you could
> actually look at outdated versions of active_pid. In practice there's
> enough full memory barriers in the slot creation code that it's
> guaranteed to not be the same pid from before a wraparound though.
> 
> I think that doing iterations of slots without
> ReplicationSlotControlLock makes things more fragile, because suddenly
> assumptions that previously held aren't true anymore.   E.g. factually
>   /*
>* The slot is definitely gone.  Lock out concurrent scans of the array
>* long enough to kill it.  It's OK to clear the active flag here 
> without
>* grabbing the mutex because nobody else can be scanning the array 
> here,
>* and nobody can be attached to this slot and thus access it without
>* scanning the array.
>*/
> is now simply not true anymore.  It's probably not harmfully broken, but
> at least you've changed the locking protocol without adapting comments.
> 
> 

Any thoughts on the attached? Yes, it does repeated scans, which can in
theory be slow, but as I explained in the comment, in practice there is
not much need to have many temporary slots active within a single session,
so it should not be a big issue.

I am not quite convinced that all the locking is necessary from the
perspective of the current logic, TBH, but it should help prevent mistakes
by whoever changes things in slot.c next.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 3ddeac5eb01da1642a6c7eb9a290a3ded08045dd Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 15 Dec 2016 09:43:17 +0100
Subject: [PATCH 2/2] Improve behavior of ReplicationSlotCleanup()

Make sure we have the slot locked properly for modification everywhere and
also clean up MyReplicationSlot, to reduce code duplication.
---
 src/backend/replication/slot.c  | 58 -
 src/backend/replication/walsender.c |  3 --
 src/backend/storage/lmgr/proc.c |  6 +---
 src/backend/tcop/postgres.c |  4 ---
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d8ed005..ed50e49 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -412,30 +412,60 @@ ReplicationSlotRelease(void)
 }
 
 /*
- * Cleanup all temporary slots created in current session.
+ * Search the replication slot list for temporary slot owned by current
+ * session and return it. Returns NULL if not found.
  */
-void
-ReplicationSlotCleanup()
+static ReplicationSlot *
+FindMyNextTempSlot(void)
 {
-	int			i;
-
-	Assert(MyReplicationSlot == NULL);
+	int	i;
+	ReplicationSlot	   *slot;
 
-	/*
-	 * No need for locking as we are only interested in slots active in
-	 * current process and those are not touched by other processes.
-	 */
+	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
 	for (i = 0; i < max_replication_slots; i++)
 	{
-		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+		slot = &ReplicationSlotCtl->replication_slots[i];
 
-		if (s->active_pid == MyProcPid)
+		SpinLockAcquire(&slot->mutex);
+		if (slot->active_pid == MyProcPid)
 		{
-			Assert(s->in_use && s->data.persistency == RS_TEMPORARY);
+			Assert(slot->in_use && slot->data.persistency == RS_TEMPORARY);
 
-			ReplicationSlotDropPtr(s);
+			SpinLockRelease(&slot->mutex);
+			LWLockRelease(ReplicationSlotControlLock);
+			return slot;
 		}
+		else
+			SpinLockRelease(&slot->mutex);
 	}
+	LWLockRelease(ReplicationSlotControlLock);
+
+	return NULL;
+}
+
+/*
+ * Clean up any slot state we might have. This includes releasing any
+ * active replication slot in current session and dropping all temporary
+ * slots created in 

[HACKERS] pg_basebackups and slots

2016-12-15 Thread Magnus Hagander
I've started work on a patch to make pg_basebackup use the temporary slots
feature that has been committed (thanks Petr!!).  The reason for this is to
avoid the cases where a burst of traffic on the master during the backup
can cause the log-receiving part of the base backup to fall far enough
behind that it fails.

I have a few considerations at this point, about interaction with existing
options.

Right now, we have -S/--slot which specifies a slot name. If you want to
use that, you have to create the slot ahead of time, and it will be a
permanent slot (of course). This is primarily documented as a feature to
use for replication (to make sure xlog is kept around until the standby is
started up), but it does also solve the same problem. But to use it for
base backups today you have to manually create the slot, then take the base
backup, then drop the slot, which is error prone.
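
For illustration, the manual dance today looks roughly like this (the slot
name is made up):

    psql -c "SELECT pg_create_physical_replication_slot('backup_slot')"
    pg_basebackup -D /path/to/backup -X stream -S backup_slot
    psql -c "SELECT pg_drop_replication_slot('backup_slot')"

If the middle step fails, the slot lingers and keeps retaining WAL until
someone remembers to drop it.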

My thought is that if -S/--slot is not specified, but -X stream is, then we
use a temporary slot always. This obviously requires the server to be
configured with enough slots (I still think we should change the default
here, but that's a different topic), but I think that's acceptable. Then we
should add a "--no-slot" to make it revert to previous behaviour.
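
Under that proposal, invocations would look something like this (a sketch
only, option spelling as proposed above):

    pg_basebackup -D /backup -X stream            # temporary slot, created and dropped automatically
    pg_basebackup -D /backup -X stream --no-slot  # previous behaviour, no slot
    pg_basebackup -D /backup -X stream -S myslot  # pre-existing permanent slot, as today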

Does that seem reasonable? Or would people prefer it to default to off?


However, it also made me think that the interface for setting up a replica
with a *permanent* slot would be much nicer if we could just have
pg_basebackup create the slot, instead of having to create it manually.

Basically, I'm envisioning adding an IF_NOT_EXISTS argument to
CREATE_REPLICATION_SLOT, so that pg_basebackup can pass up the slot name it
wants to use and have it created if necessary. Do people think this
is worth doing? (It would also make pg_receivexlog and pg_receivelogical
easier to use, I think, and it would obviously be implemented there as
well).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Crash on promotion when recovery.conf is renamed

2016-12-15 Thread Magnus Hagander
I had a system where the recovery.conf file was renamed "out of the way" at
some point, and then the system was promoted. This is obviously operator
error, but it seems like something we should handle.

What happens now is that the non-existence of recovery.conf is a FATAL
error. I wonder if it should just be a WARNING, at least in the case of
ENOENT?
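
To illustrate the idea only, here is a minimal, untested sketch against the
place where recovery.conf gets renamed away (the WARNING wording is made up,
and the real code path goes through a durable rename, so the check would
live there):

    if (rename(RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE) != 0)
    {
        if (errno == ENOENT)
            ereport(WARNING,
                    (errmsg("file \"%s\" was already gone, continuing",
                            RECOVERY_COMMAND_FILE)));
        else
            ereport(FATAL,
                    (errcode_for_file_access(),
                     errmsg("could not rename file \"%s\" to \"%s\": %m",
                            RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
    }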

What happens is this.

Log output:
2016-12-15 09:36:46.265 CET [25437] LOG:  received promote request
2016-12-15 09:36:46.265 CET [25438] FATAL:  terminating walreceiver process
due to administrator command
mha@mha-laptop:~/postgresql/inst/head$ 2016-12-15 09:36:46.265 CET [25437]
LOG:  invalid record length at 0/5015168: wanted 24, got 0
2016-12-15 09:36:46.265 CET [25437] LOG:  redo done at 0/5015130
2016-12-15 09:36:46.265 CET [25437] LOG:  last completed transaction was at
log time 2016-12-15 09:36:19.27125+01
2016-12-15 09:36:46.276 CET [25437] LOG:  selected new timeline ID: 2
2016-12-15 09:36:46.429 CET [25437] FATAL:  could not open file
"recovery.conf": No such file or directory
2016-12-15 09:36:46.429 CET [25436] LOG:  startup process (PID 25437)
exited with exit code 1
2016-12-15 09:36:46.429 CET [25436] LOG:  terminating any other active
server processes
2016-12-15 09:36:46.429 CET [25456] WARNING:  terminating connection
because of crash of another server process
2016-12-15 09:36:46.429 CET [25456] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-12-15 09:36:46.429 CET [25456] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
2016-12-15 09:36:46.431 CET [25436] LOG:  database system is shut down


So we can see it switches to timeline 2. Looking in pg_wal (or pg_xlog --
customer system was on 9.5, but this is reproducible in HEAD):

-rw------- 1 mha mha 16777216 Dec 15 09:36 000000010000000000000004
-rw------- 1 mha mha 16777216 Dec 15 09:36 000000010000000000000005
-rw------- 1 mha mha 16777216 Dec 15 09:36 000000020000000000000005
-rw------- 1 mha mha       41 Dec 15 09:36 00000002.history

However, according to pg_controldata, we are still on timeline 1:
Latest checkpoint location:   0/460
Prior checkpoint location:0/460
Latest checkpoint's REDO location:0/428
Latest checkpoint's REDO WAL file:    000000010000000000000004
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
..
Minimum recovery ending location: 0/5015168
Min recovery ending loc's timeline:   1


But since we have a history file for timeline 2 in the data directory (and
neatly archived), the data directory isn't consistent with it. That
means, for example, that any other standby you try to connect to this
cluster will simply fail, because it tries to join up on timeline 2, which
doesn't actually exist.
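
For reference, the 41-byte 00000002.history file above contains a single
switch record of roughly this shape:

    1	0/5015168	no recovery target specified

i.e. it records a switch to timeline 2 at the very point where, per
pg_controldata, the cluster still thinks it is on timeline 1.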


I wonder if there might be more corner cases like this, but in this
particular one it seems easy enough to just say that failing to rename
recovery.conf because it didn't exist is safe.

But in the case of failing to rename recovery.conf because of, for example,
a permissions error, we don't want to ignore it. But we also really don't
want to end up with this kind of inconsistent data directory IMO. I don't
know that code well enough to suggest how to fix it though -- hoping for
input from someone who knows it better?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] new set of psql patches for loading (saving) data from (to) text, binary files

2016-12-15 Thread Pavel Stehule
Hi

I am sending a set of patches - for simpler testing, these patches are
independent of each other at the moment.

These patches are a replacement for my previous patches in this area: COPY
RAW and fileref variables.

1. parametrized query support - psql variables can be passed as query
parameters

2. \gstore, \gbstore - save a returned (binary) value to a file

3. \set_from_file, \set_from_bfile - set a variable from a (binary) file

The code is simple - there are no changes in critical or complex parts
of psql.

Regards

Pavel

Comments, notes?
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 9915731..7e2fa96 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1944,6 +1944,31 @@ hello 10
 
   
 
+
+  
+\gstore [ filename ]
+\gstore [ |command ]
+\gbstore [ filename ]
+\gbstore [ |command ]
+
+
+ Sends the current query input buffer to the server and stores the
+ raw query's output in filename or pipes the output
+ to the shell command command.  The file or command is
+ written to only if the query successfully returns exactly one row and
+ one column with a non-null result, not if the query fails or is a
+ non-data-returning SQL command. For example:
+
+=> SELECT avatar FROM users WHERE id = 123
+-> \gbstore ~/avatar.png
+
+
+
+  
+
+
   
 \h or \help [ command ]
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a9a2fdb..e8fabb9 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -929,6 +929,27 @@ exec_command(const char *cmd,
 		status = PSQL_CMD_SEND;
 	}
 
+	/* \gstore [filename], \gbstore [filename] -- send query and store result in (binary) file */
+	else if (strcmp(cmd, "gstore") == 0 ||
+			  (strcmp(cmd, "gbstore") == 0))
+	{
+		char	   *fname = psql_scan_slash_option(scan_state,
+   OT_FILEPIPE, NULL, false);
+
+		if (!fname)
+			pset.gfname = pg_strdup("");
+		else
+		{
+			expand_tilde(&fname);
+			pset.gfname = pg_strdup(fname);
+		}
+
+		pset.raw_flag = true;
+		pset.binres_flag = (strcmp(cmd, "gbstore") == 0);
+		free(fname);
+		status = PSQL_CMD_SEND;
+	}
+
 	/* help */
 	else if (strcmp(cmd, "h") == 0 || strcmp(cmd, "help") == 0)
 	{
@@ -1064,7 +1085,6 @@ exec_command(const char *cmd,
 		free(opt2);
 	}
 
-
 	/* \o -- set query output */
 	else if (strcmp(cmd, "o") == 0 || strcmp(cmd, "out") == 0)
 	{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a7789df..d4b4f15 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -854,6 +854,85 @@ StoreQueryTuple(const PGresult *result)
 	return success;
 }
 
+/*
+ * StoreRawResult: the returned value (possibly binary) is displayed
+ * or stored in file. The result should be exactly one row, one column.
+ */
+static bool
+StoreRawResult(const PGresult *result)
+{
+	bool		success = true;
+
+	if (PQntuples(result) < 1)
+	{
+		psql_error("no rows returned for \\gstore or \\gbstore\n");
+		success = false;
+	}
+	else if (PQntuples(result) > 1)
+	{
+		psql_error("more than one row returned for \\gstore or \\gbstore\n");
+		success = false;
+	}
+	else if (PQnfields(result) < 1)
+	{
+		psql_error("no columns returned for \\gstore or \\gbstore\n");
+		success = false;
+	}
+	else if (PQnfields(result) > 1)
+	{
+		psql_error("more than one column returned for \\gstore or \\gbstore\n");
+		success = false;
+	}
+	else if (PQgetisnull(result, 0, 0))
+	{
+		psql_error("returned value is null for \\gstore or \\gbstore\n");
+		success = false;
+	}
+	else
+	{
+		char	   *value;
+		int			length;
+		FILE	   *fout = NULL;
+		bool		is_pipe = false;
+
+		value = PQgetvalue(result, 0, 0);
+		length = PQgetlength(result, 0, 0);
+
+		if (pset.gfname && *(pset.gfname) != '\0')
+		{
+			if (!openQueryOutputFile(pset.gfname, &fout, &is_pipe))
+				success = false;
+			if (success && is_pipe)
+				disable_sigpipe_trap();
+		}
+
+		if (success)
+		{
+			success = fwrite(value, 1, length, fout != NULL ? fout : pset.queryFout) == length;
+			if (!success)
+				psql_error("%s: %s\n", pset.gfname, strerror(errno));
+
+			if (success)
+				success = fflush(fout != NULL ? fout : pset.queryFout) == 0;
+
+			if (!success)
+				psql_error("%s: %s\n", pset.gfname, strerror(errno));
+
+			if (fout != NULL)
+			{
+				if (is_pipe)
+				{
+					pclose(fout);
+					restore_sigpipe_trap();
+				}
+				else
+					fclose(fout);
+			}
+		}
+	}
+
+	return success;
+}
 
 /*
  * ExecQueryTuples: assuming query result is OK, execute each query
@@ -1124,6 +1203,8 @@ PrintQueryResults(PGresult *results)
 success = ExecQueryTuples(results);
 			else if (pset.crosstab_flag)
 success = PrintResultsInCrosstab(results);
+			else if (pset.raw_flag)
+success = StoreRawResult(results);
 			else
 success = PrintQueryTuples(results);
 			/* if it's INSERT/UPDATE/DELETE RETURNING, also print status */
@@ -1278,7 

[HACKERS] Transaction oddity with list partition of a list partition

2016-12-15 Thread David Fetter
Folks,

I'm having some trouble understanding what's going on here.  When I \i
the file in 55caaaeba877eac1feb6481fb413fa04ae9046ac without starting
a transaction explicitly, it produces the expected results.  When I \i
it after a BEGIN, not so much.

What's going on?

Best,
David.

shackle@shackle=# BEGIN;
BEGIN
shackle@shackle=# \i ten_plus.sql 
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE FUNCTION
CREATE TABLE
CREATE TRIGGER
psql:ten_plus.sql:66: ERROR:  no partition of relation "the_log" found for row
DETAIL:  Failing row contains (2016-12-15 00:17:46.579357-08, shackle, INSERT, 
public, city, null, {"id": 1, "name": "Oakland", "population": 419267}).
CONTEXT:  SQL statement "INSERT INTO the_log(
action,
table_schema,
table_name,
old_row,
new_row)
VALUES (
TG_OP,
TG_TABLE_SCHEMA,
TG_TABLE_NAME, 
CASE TG_OP WHEN 'INSERT' THEN NULL ELSE row_to_json(OLD)::jsonb END,
CASE TG_OP WHEN 'DELETE' THEN NULL ELSE row_to_json(NEW)::jsonb END
)"
PL/pgSQL function log_change() line 3 at SQL statement
shackle@shackle=# ROLLBACK;

-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


ten_plus.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bigint vs txid user confusion

2016-12-15 Thread Craig Ringer
Hi all

I recently had a real world case of a user confused by the
(non)relationship between txid_current()'s output and that of the xid
type, specifically pg_stat_replication.backend_xmin.

I quote:

"
> What should we look for to determine normal? I thought maybe it would
> compare to txid_current(), but these numbers are not at all similar:
>
> XXX=> select txid_current();
>  txid_current
> --
>6311252596
> (1 row)
>
> XXX=> select client_addr, backend_xmin from pg_stat_replication;
>  client_addr  | backend_xmin
> --+--
>  192.168.X.Y |
>  192.168.X.Y  |   2016096136
>  192.168.X.Y  |
>  192.168.X.Y |   2016096136
> (4 rows)


This is a confusing user interface issue in PostgreSQL.

backend_xmin is of type 'xid'. txid_current(), though, returns a
bigint where the high bits are an epoch incremented once per xid
wrap-around, and the low bits are the 32-bit xid.

That's why this output is consistent with the user's two servers having
hot_standby_feedback enabled, even though the shown backend_xmin appears
to be 4295156460 XIDs behind the master. That difference is greater than
2^32 (4294967296), which seems impossible with a 32-bit transaction ID.
It's because the xid counter has wrapped around once, and
pg_stat_replication doesn't show that, but txid_current() does.

Rather than comparing against txid_current(), the simplest way to get
an indication of how far "behind" the master those XIDs are is to use
the age() function, e.g.

select client_addr, backend_xmin, age(backend_xmin) from
pg_stat_replication;

which will report the difference from the master's xid counter, taking
into account wrap-around etc.

Doing the comparison manually is a bit tricky in SQL.
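
For instance, a sketch of splitting the epoch out of the txid shown above
(this glosses over xmin values that come from a different epoch):

    SELECT 6311252596 >> 32 AS epoch,                  -- 1
           6311252596 % (1::bigint << 32) AS xid_32;   -- 2016285300

The low 32 bits (2016285300) are what can sensibly be compared with
backend_xmin (2016096136 here).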

PostgreSQL really should expose a function to strip the epoch from a
txid and return the xid (if the epoch is recent) or null (if the epoch is
far in the past) to make this easier. I submitted one as a part of the
txid_status() patch set and I'll get back to that soon. I just thought
this was relevant.

I really wish we could just change the pg_stat_activity and
pg_stat_replication xid fields to be epoch qualified in a 64-bit wide
'fullxid' type, or similar.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers