Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Craig Ringer
Draft patch for comment fix.
From 304453f67b26219dc94e6aef9dfdc1c1debf12bb Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 6 Apr 2016 14:57:26 +0800
Subject: [PATCH] Fix incorrect comments introduced in logical decoding
 timeline following

---
 src/backend/replication/logical/logicalfuncs.c | 14 +++---
 src/include/replication/slot.h | 18 --
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 5af6751..5f74941 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -236,8 +236,10 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 	PG_TRY();
 	{
 		/*
-		 * Passing InvalidXLogRecPtr here causes decoding to start returning
-		 * commited xacts to the client at the slot's confirmed_flush.
+		 * Passing InvalidXLogRecPtr here causes logical decoding to
+		 * start calling the output plugin for transactions that commit
+		 * at or after the slot's confirmed_flush. This filters commits
+		 * out from the client but they're still decoded.
 		 */
 		ctx = CreateDecodingContext(InvalidXLogRecPtr,
 	options,
@@ -262,11 +264,9 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
 		ctx->output_writer_private = p;
 
 		/*
-		 * We start reading xlog from the restart lsn, even though we won't
-		 * start returning decoded data to the user until the point set up in
-		 * CreateDecodingContext. The restart_lsn is far enough back that we'll
-		 * see the beginning of any xact we're expected to return to the client
-		 * so we can assemble a complete reorder buffer for it.
+		 * Reading and decoding of WAL must start at restart_lsn so that the
+		 * entirety of each of the xacts that commit after confirmed_flush can be
+		 * accumulated into reorder buffers.
 		 */
 		startptr = MyReplicationSlot->data.restart_lsn;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index efcce5f..50c8a33 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -63,10 +63,24 @@ typedef struct ReplicationSlotPersistentData
 	 */
 	TransactionId catalog_xmin;
 
-	/* oldest LSN that might be required by this replication slot */
+	/*
+	 * oldest LSN that might be required by this replication slot.
+	 *
+	 * Points to the LSN of the most recent xl_running_xacts record where
+	 * no transaction that commits after confirmed_flush is already in
+	 * progress. At this point all xacts committing after confirmed_flush
+	 * can be read entirely into reorder buffers and all visibility
+	 * information can be reconstructed.
+	 */
 	XLogRecPtr	restart_lsn;
 
-	/* oldest LSN that the client has acked receipt for */
+	/*
+	 * oldest LSN that the client has acked receipt for
+	 *
+	 * Decoding will start calling output plugin callbacks for commits
+	 * after this LSN unless a different start point is specified by
+	 * the client.
+	 */
 	XLogRecPtr	confirmed_flush;
 
 	/* plugin name */
-- 
2.1.0
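
For context, the function whose internals these comments describe is the
SQL-level pg_logical_slot_get_changes(). A typical call looks like the
following (a usage sketch only, not part of the patch; it assumes a logical
slot named 'test_slot' created with the test_decoding plugin):

    SELECT lsn, xid, data
    FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);

Passing NULL for upto_lsn and upto_nchanges decodes everything available from
the slot's confirmed_flush onwards.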




Re: [HACKERS] large regression for parallel COPY

2016-04-05 Thread Fabien COELHO


Hello Robert,

I tried the same test mentioned in the original post on cthulhu (EDB 
machine, CentOS 7.2, 8 sockets, 8 cores per socket, 2 threads per core, 
Xeon E7-8830 @ 2.13 GHz).  I attempted to test both the effects of 
multi_extend_v21 and the *_flush_after settings.


I'm not sure of the intrinsic effectiveness of {backend,writer}_flush_after, 
especially on HDDs: although for the checkpointer (checkpoint_flush_after) 
there is a great deal of effort to generate large sequential writes, there is 
no such provision for other write activities. I'm not sure how the write 
activity of the "parallel" copy is organized, but it sounds like it will 
generate fewer sequential writes than before, and the negative performance 
impact could be accentuated by flushing.


This might suggest that the benefits of these two settings are more 
irregular and harder to predict, so perhaps their default value should be 0 (i.e. off)?
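
For what it's worth, disabling them amounts to something like this (a sketch
only, assuming the 9.6 GUC names backend_flush_after and bgwriter_flush_after
are the settings meant here):

    -- Sketch: 0 disables the flush hinting for these settings.
    ALTER SYSTEM SET backend_flush_after = 0;
    ALTER SYSTEM SET bgwriter_flush_after = 0;
    SELECT pg_reload_conf();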


Or maybe warn clearly in the documentation about the uncertain effects of 
these two settings?


--
Fabien.




Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Craig Ringer
I tried to address Andres's entirely valid complaints about that comment I
added in this patch.

I was going to add a description to restart_lsn in slot.h instead, but it's
proving harder to accurately describe restart_lsn than I'd like.

It's a point at which no transactions that commit after confirmed_lsn are
yet in progress. Easy enough.

But it's also a point at which visibility information can be reconstructed.
It's updated lazily by LogicalIncreaseRestartDecodingForSlot when the
client sends confirmation. Looking at SnapBuildProcessRunningXacts, that can
be the most recent serialized snapshot from the snapshot builder (if
there's no xact running at confirmed_lsn) or the restart_decoding_lsn of
the oldest xact in progress at the confirmed_lsn. If I've read it correctly,
the xact's restart_decoding_lsn is the ReorderBuffer's restart_decoding_lsn
at the time the xact started, which is in turn the last serialized snapshot
at that time.


So I'd describe restart_lsn as something like:

"
The oldest LSN that might be required by this replication slot.

Points to the LSN of the most recent xl_running_xacts record where no
transaction that commits after confirmed_flush is already in progress. At
this point all xacts committing after confirmed_flush can be read entirely
into reorder buffers and all visibility information can be reconstructed.
"

Is that accurate?

I want to get rid of the incorrect comment and fix this up.


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Fujii Masao
On Wed, Apr 6, 2016 at 2:51 PM, Masahiko Sawada  wrote:
> On Wed, Apr 6, 2016 at 2:21 PM, Fujii Masao  wrote:
>> On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs  wrote:
>>> On 5 April 2016 at 12:26, Fujii Masao  wrote:
>>>

 Multiple standbys with the same name may connect to the master.
 In this case, users might want to specify k<=N. So k<=N seems not invalid
 setting.
>>>
>>>
>>> Confusing as that is, it is already the case; k > N could make sense. ;-(
>>>
>>> However, in most cases, k > N would not make sense and we should issue a
>>> WARNING.
>>
>> Somebody (maybe Horiguchi-san and Sawada-san) commented this upthread
>> and the code for that test was included in the old patch (but I excluded it).
>> Now the majority seems to prefer to add that test, so I just revived and
>> revised that test code.
>
> The regression test code seems not to be included in the latest patch, no?

I intentionally excluded the regression test from the patch because
I'd like to review and commit it separately from the main part of the feature.

I'd appreciate it if you could read through the regression test which was included
in the previous patch and update it if required.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-04-05 Thread Ian Barwick
On 05/04/16 10:24, Peter Eisentraut wrote:
> On 03/30/2016 09:15 PM, Ian Barwick wrote:
>> Currently pg_create_physical_replication_slot() may refer to
>> the deprecated wal_level setting "archive":
> 
> I have fixed this in the most direct way, since there was some disagreement 
> about rewording.

Thanks!


Regards

Ian Barwick


-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Fujii Masao
On Wed, Apr 6, 2016 at 2:18 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao  wrote 
> in 
>> >> list_member_int() performs the loop internally. So I'm not sure how much
>> >> adding extra list_member_int() here can optimize this processing.
>> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
>> >> standby or not. In this idea, without adding extra loop, we can exit 
>> >> earlier
>> >> in the case where I'm not a sync standby. Does this make sense?
>> >
>> > The list_member_int() is also performed in the "(snip)" part. So
>> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
>> >
>> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
>> >
>> > /*
>> >  *  Quick exit if I am not synchronous or there's not
>> >  *  enough synchronous standbys
>> >  */
>> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
>> > {
>> >   list_free(sync_standbys);
>> >   return false;
>> > }
>>
>> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
>> it checks whether we're managing a sync standby or not.
>> Attached is the updated version of the patch. I also applied several
>> review comments to the patch.
>
> It still does list_member_int but it can be gotten rid of as the
> attached patch.

Thanks for the review!

>
> regards,
>
> diff --git a/src/backend/replication/syncrep.c 
> b/src/backend/replication/syncrep.c
> index 9b2137a..6998bb8 100644
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
> if (XLogRecPtrIsInvalid(walsnd->flush))
> continue;
>
> +   /* Notify myself as 'synchronized' if I am */
> +   if (am_sync != NULL && walsnd == MyWalSnd)
> +   *am_sync = true;
> +
> /*
>  * If the priority is equal to 1, consider this standby as 
> sync
>  * and append it to the result. Otherwise append this standby
> @@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
> if (this_priority == 1)
> {
> result = lappend_int(result, i);
> -   if (am_sync != NULL && walsnd == MyWalSnd)
> -   *am_sync = true;
> if (list_length(result) == SyncRepConfig->num_sync)
> {
> list_free(pending);
> @@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
> {
> boolneedfree = (result != NIL && pending != NIL);
>
> -   if (am_sync != NULL && !(*am_sync))
> -   *am_sync = list_member_int(pending, MyWalSnd->slotno);
> -
> result = list_concat(result, pending);
> if (needfree)
> pfree(pending);
> @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
> }
>
> /*
> +* The pending list contains eventually potentially-synchronized 
> standbys
> +* and this walsender may be one of them. So once reset am_sync.
> +*/
> +   if (am_sync != NULL)
> +   *am_sync = false;
> +
> +   /*

This code seems wrong in the case where this walsender is in the result list.
So I adopted another logic. Attached is the updated version of the patch.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2906,2939  include_dir 'conf.d'


 
! Specifies a comma-separated list of standby names that can support
  synchronous replication, as described in
  .
! At any one time there will be at most one active synchronous standby;
  transactions waiting for commit will be allowed to proceed after
! this standby server confirms receipt of their data.
! The synchronous standby will be the first standby named in this list
  that is both currently connected and streaming data in real-time
  (as shown by a state of streaming in the
  
  pg_stat_replication view).
  Other standby servers appearing later in this list represent potential
! synchronous standbys.
! If the current synchronous standby disconnects for whatever reason,
  it will be replaced immediately with the next-highest-priority standby.
  Specifying more than one standby name can allow very high availability.
 
 
  The name of a standby server for this purpose is the
  application_name setting of the standby, as set in the
  primary_conninfo of the standby's WAL receiver.  There is
  no mechanism to enforce uniqueness. In case of duplicates one of the
! matching standbys will be chosen to be the synchronous standby, though
  exactly which one is indetermina

Re: [HACKERS] WAL logging problem in 9.4.3?

2016-04-05 Thread Michael Paquier
On Wed, Mar 23, 2016 at 12:45 PM, Michael Paquier
 wrote:
> On Wed, Mar 23, 2016 at 11:11 AM, David Steele  wrote:
>> I would prefer not to bump it to the next CF unless we decide this will
>> not get fixed for 9.6.
>
> It may make sense to add that to the list of open items for 9.6
> instead. That's not a feature.

So I have moved this patch to the next CF for now, and will work on
fixing it rather soonishly as an effort to stabilize 9.6 as well as
back-branches.
-- 
Michael




Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-04-05 Thread Michael Paquier
On Sat, Apr 2, 2016 at 12:30 AM, Tom Lane  wrote:
> I wrote:
>> So the core of my complaint is that we need to fix things so that, whether
>> or not we are able to create the PGRES_FATAL_ERROR PGresult (and we'd
>> better consider the behavior when we cannot), ...
>
> BTW, the real Achilles' heel of any attempt to ensure sane behavior at
> the OOM limit is this possibility of being unable to create a PGresult
> with which to inform the client that we failed.
>
> I wonder if we could make things better by keeping around an emergency
> backup PGresult struct.  Something along these lines:
>
> 1. Add a field "PGresult *emergency_result" to PGconn.
>
> 2. At the very beginning of any PGresult-returning libpq function, check
> to see if we have an emergency_result, and if not make one, ensuring
> there's room in it for a reasonable-size error message; or maybe even
> preload it with "out of memory" if we assume that's the only condition
> it'll ever be used for.  If malloc fails at this point, just return NULL
> without doing anything or changing any libpq state.  (Since a NULL result
> is documented as possibly caused by OOM, this isn't violating any API.)
>
> 3. Subsequent operations never touch the emergency_result unless we're
> up against an OOM, but it can be used to return a failure indication
> to the client so long as we leave libpq in a state where additional
> calls to PQgetResult would return NULL.
>
> Basically this shifts the point where an unreportable OOM could happen
> from somewhere in the depths of libpq to the very start of an operation,
> where we're presumably in a clean state and OOM failure doesn't leave
> us with a mess we can't clean up.

I have moved this patch to the next CF for the time being. As that's a
legit bug and not a feature, it should be fine to pursue work on this
item even if this CF ends.
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Michael Paquier
On Wed, Apr 6, 2016 at 2:51 PM, Masahiko Sawada  wrote:
> On Wed, Apr 6, 2016 at 2:21 PM, Fujii Masao  wrote:
>> On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs  wrote:
>>> On 5 April 2016 at 12:26, Fujii Masao  wrote:
>>>

 Multiple standbys with the same name may connect to the master.
 In this case, users might want to specify k<=N. So k<=N seems not invalid
 setting.
>>>
>>>
>>> Confusing as that is, it is already the case; k > N could make sense. ;-(
>>>
>>> However, in most cases, k > N would not make sense and we should issue a
>>> WARNING.
>>
>> Somebody (maybe Horiguchi-san and Sawada-san) commented this upthread
>> and the code for that test was included in the old patch (but I excluded it).
>> Now the majority seems to prefer to add that test, so I just revived and
>> revised that test code.
>
> The regression test code seems not to be included in the latest patch, no?

I am looking at the latest patch now, and they are not included. It
would be good to get those tests bundled in for a last look, I think.
-- 
Michael




Re: [HACKERS] pg_xlogdump fails to handle WAL file with multi-page XLP_FIRST_IS_CONTRECORD data

2016-04-05 Thread Craig Ringer
On 1 April 2016 at 21:30, Craig Ringer  wrote:


> I'll attach the new testcase once I either get it to reproduce this bug or
> give up and leave the basic xlogdump testcase alone.
>

I had another bash at this and I still can't reproduce it on master using
the giant commit record approach Andres suggested. In fact I generated a
commit record larger than an entire xlog segment and it was still fine.

The DO procedure I posted upthread, when run on 9.4, reliably produces
segments that the xlogreader cannot decode with the symptoms Pavan
reported. It's fine on 9.6.

So I can't reproduce this on 9.6, but there might be a separate bug on 9.4.

I've attached a patch with the simple tests I added for pg_xlogdump as part
of this. I doubt it'd be desirable to commit the ridiculous commit record
part, but that's trivially removed, and I left it in place in case someone
else wanted to fiddle with other ways to reproduce this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From bf05b552dd9eb5d2e9b50c77928426817033685d Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 6 Apr 2016 13:57:32 +0800
Subject: [PATCH] Tests for pg_xlogdump

---
 src/bin/pg_xlogdump/Makefile |   6 +
 src/bin/pg_xlogdump/t/010_pg_xlogdump.pl | 249 +++
 2 files changed, 255 insertions(+)
 create mode 100644 src/bin/pg_xlogdump/t/010_pg_xlogdump.pl

diff --git a/src/bin/pg_xlogdump/Makefile b/src/bin/pg_xlogdump/Makefile
index 11df47d..c1138d2 100644
--- a/src/bin/pg_xlogdump/Makefile
+++ b/src/bin/pg_xlogdump/Makefile
@@ -38,3 +38,9 @@ uninstall:
 
 clean distclean maintainer-clean:
 	rm -f pg_xlogdump$(X) $(OBJS) $(RMGRDESCSOURCES) xlogreader.c
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_xlogdump/t/010_pg_xlogdump.pl b/src/bin/pg_xlogdump/t/010_pg_xlogdump.pl
new file mode 100644
index 000..93b2063
--- /dev/null
+++ b/src/bin/pg_xlogdump/t/010_pg_xlogdump.pl
@@ -0,0 +1,249 @@
+use strict;
+use warnings;
+use bigint qw(hex);
+use Cwd;
+use Config;
+use File::Basename qw(basename);
+use List::Util qw(minstr maxstr);
+use PostgresNode;
+use TestLib;
+use Test::More;
+use Data::Dumper;
+
+my $verbose = 1;
+
+sub slurp_xlogdump_lines
+{
+	my ($node, $timeline, $firstwal, $lastwal) = @_;
+	my ($stdout, $stderr) = ('', '');
+
+	if (defined($firstwal) && $firstwal =~ /^[[:xdigit:]]{24}$/)
+	{
+		$firstwal = $node->data_dir . "/pg_xlog/" . $firstwal;
+	}
+
+	if (defined($lastwal) && $lastwal !~ /^[[:xdigit:]]{24}$/)
+	{
+		die("pg_xlogdump expects the last WAL seg to be a bare filename, not '$lastwal'");
+	}
+
+	if (!defined($firstwal) || !defined($lastwal))
+	{
+		my $wal_pattern = sprintf("%s/pg_xlog/%08X" . ("?" x 16), $node->data_dir, $timeline);
+		my @wal = glob $wal_pattern;
+		$firstwal = List::Util::minstr(@wal) if !defined($firstwal);
+		$lastwal = basename(List::Util::maxstr(@wal)) if !defined($lastwal);
+	}
+
+	diag("decoding from " . $firstwal . " to " . $lastwal)
+		if $verbose;
+
+	IPC::Run::run ['pg_xlogdump', $firstwal, $lastwal], '>', \$stdout, '2>', \$stderr;
+
+	like($stderr, qr/(?:invalid record length at [0-9A-F]+\/[0-9A-F]+: wanted 24, got 0|^\s*$)/,
+		'pg_xlogdump exits with expected error or silence');
+
+	diag "xlogdump exited with: '$stderr'"
+		if $verbose;
+
+	my $lineno = 1;
+
+	my @xld_lines = split("\n", $stdout);
+
+	return \@xld_lines;
+}
+
+sub match_xlogdump_lines
+{
+	my ($node, $timeline, $firstwal, $lastwal) = @_;
+
+	my $xld_lines = slurp_xlogdump_lines($node, $timeline, $firstwal, $lastwal);
+
+	my @records;
+	my $lineno = 1;
+
+	for my $xld_line (@$xld_lines)
+	{
+		chomp $xld_line;
+
+		# We don't use Test::More's 'like' here because it'd run a variable number
+		# of tests. Instead do our own diagnostics and fail.
+		if ($xld_line =~ /^rmgr: (\w+)\s+len \(rec\/tot\):\s*([[:digit:]]+)\/\s*([[:digit:]]+), tx:\s*([[:digit:]]*), lsn: ([[:xdigit:]]{1,8})\/([[:xdigit:]]{1,8}), prev ([[:xdigit:]]{1,8})\/([[:xdigit:]]{1,8}), desc: (.*)$/)
+		{
+			my %record = (
+'_raw' => length($xld_line) >= 100 ? substr($xld_line,0,100) . "..." : $xld_line,
+'rmgr' => $1,
+'len_rec' => int($2),
+'len_tot' => int($3),
+'tx' => defined($4) ? int($4) : undef,
+'lsn' => (hex($5)<<32) + hex($6),
+'prev' => (hex($7)<<32) + hex($8),
+'lsn_str' => "$5/$6",
+'prev_str' => "$7/$8",
+'desc' => length($9) >= 100 ? substr($9,0,100) . "..." : $9,
+			);
+
+			if ($record{'prev'} >= $record{'lsn'})
+			{
+diag("in xlog record on line $lineno:\n$xld_line\n   ... prev ptr $record{prev_str} is greater than or equal to lsn ptr $record{lsn_str}");
+cmp_ok($record{prev}, 'lt', $record{lsn}, 'previous pointer less than lsn ptr');
+			}
+
+			push @records, \%record;
+		}
+		else
+		{
+			diag("xlog record on line $lineno:\n$xld_line\n   ...does not match the test pattern");
+			fail("an xlog record didn't match the expected p

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Masahiko Sawada
On Wed, Apr 6, 2016 at 2:21 PM, Fujii Masao  wrote:
> On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs  wrote:
>> On 5 April 2016 at 12:26, Fujii Masao  wrote:
>>
>>>
>>> Multiple standbys with the same name may connect to the master.
>>> In this case, users might want to specify k<=N. So k<=N seems not invalid
>>> setting.
>>
>>
>> Confusing as that is, it is already the case; k > N could make sense. ;-(
>>
>> However, in most cases, k > N would not make sense and we should issue a
>> WARNING.
>
> Somebody (maybe Horiguchi-san and Sawada-san) commented this upthread
> and the code for that test was included in the old patch (but I excluded it).
> Now the majority seems to prefer to add that test, so I just revived and
> revised that test code.

The regression test code seems not to be included in the latest patch, no?

Regards,

--
Masahiko Sawada




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Fujii Masao
On Tue, Apr 5, 2016 at 11:47 PM, Robert Haas  wrote:
> On Mon, Apr 4, 2016 at 4:28 AM, Fujii Masao  wrote:
 + ereport(LOG,
 + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
 + application_name, MyWalSnd->sync_standby_priority)));

 s/ the / a /
>>
>> I have no objection to this change itself. But we have used this message
>> in 9.5 or before, so if we apply this change, probably we need
>> back-patching.
>
> "the" implies that there can be only one synchronous standby at that
> priority, while "a" implies that there could be more than one.  So the
> situation might be different with this patch than previously.  (I
> haven't read the patch so I don't know whether this is actually true,
> but it might be what Thomas was going for.)

Thanks for the explanation!
I applied that change to the latest patch I posted upthread.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Fujii Masao
On Tue, Apr 5, 2016 at 11:40 PM, Amit Kapila  wrote:
> On Tue, Apr 5, 2016 at 3:15 PM, Fujii Masao  wrote:
>>
>> On Tue, Apr 5, 2016 at 4:31 PM, Amit Kapila 
>> wrote:
>> > On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao 
>> > wrote:
>> >>
>> >>
>> >> Thanks for updating the patch!
>> >>
>> >> I applied the following changes to the patch.
>> >> Attached is the revised version of the patch.
>> >>
>> >
>> > 1.
>> >{
>> > {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
>> > gettext_noop("List of names of potential synchronous standbys."),
>> > NULL,
>> > GUC_LIST_INPUT
>> > },
>> > &SyncRepStandbyNames,
>> > "",
>> > check_synchronous_standby_names, NULL, NULL
>> > },
>> >
>> > Isn't it better to modify the description of synchronous_standby_names
>> > in
>> > guc.c based on new usage?
>>
>> What about "Number of synchronous standbys and list of names of
>> potential synchronous ones"? Better idea?
>>
>
> Looks good.
>
>>
>> > 2.
>> > pg_stat_get_wal_senders()
>> > {
>> > ..
>> > /*
>> > ! * Allocate and update the config data of synchronous replication,
>> > ! * and then get the currently active synchronous standbys.
>> >   */
>> > + SyncRepUpdateConfig();
>> >   LWLockAcquire(SyncRepLock, LW_SHARED);
>> > ! sync_standbys = SyncRepGetSyncStandbys();
>> >   LWLockRelease(SyncRepLock);
>> > ..
>> > }
>> >
>> > Why is it important to update the config with patch?  Earlier also any
>> > update to config between calls wouldn't have been visible.
>>
>> Because a backend has no chance to call SyncRepUpdateConfig() and
>> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
>> called here. This means that pg_stat_replication may return the
>> information
>> based on the old value of s_s_names.
>>
>
> That's right, but without this patch also, won't pg_stat_replication show
> old information? If no, why so?

Without the patch, when s_s_names is changed and SIGHUP is sent,
a backend calls ProcessConfigFile(), parses the configuration file and
sets the global variable SyncRepStandbyNames to the latest value of
s_s_names. When pg_stat_replication is accessed, a backend calculates
which standby is synchronous based on that latest value in SyncRepStandbyNames,
and then displays the sync replication information.

With the patch, basically the same steps are executed when s_s_names is
changed. But the difference is that, with the patch, SyncRepUpdateConfig()
must be called after ProcessConfigFile() and before the calculation of
sync standbys. So I just added the call of SyncRepUpdateConfig() to
pg_stat_get_wal_senders().

BTW, we could move SyncRepUpdateConfig() from pg_stat_get_wal_senders() to
just after ProcessConfigFile(), so that every backend always parses the value
of s_s_names when the setting is changed.

>> > 3.
>> >   Planning for High Availability
>> >
>> >  
>> > ! synchronous_standby_names specifies the number of
>> > ! synchronous standbys that transaction commits made when
>> >
>> > Is it better to say like: synchronous_standby_names
>> > specifies
>> > the number and names of
>>
>> Precisely s_s_names specifies a list of names of potential sync standbys
>> not sync ones.
>>
>
> Okay, but you don't seem to have updated this in your latest patch.

I applied the change you suggested to the patch. Thanks!
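
For readers following the thread, the setting being documented here uses the
new syntax this patch introduces; a purely illustrative example (the number
and names are made up) would be:

    ALTER SYSTEM SET synchronous_standby_names = '2 (s1, s2, s3)';
    SELECT pg_reload_conf();

which asks transaction commits to wait for two synchronous standbys chosen
from the standbys named s1, s2 and s3.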

Regards,

-- 
Fujii Masao




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-05 Thread Craig Ringer
On 6 April 2016 at 12:42, Noah Misch  wrote:


>
> The chapter already does describe pg_basebackup before describing
> pg_start_backup; what else did the plan entail?
>

http://www.postgresql.org/docs/current/static/backup.html


24.1. SQL Dump
24.1.1. Restoring the Dump
24.1.2. Using pg_dumpall
24.1.3. Handling Large Databases
24.2. File System Level Backup
24.3. Continuous Archiving and Point-in-Time Recovery (PITR)
24.3.1. Setting Up WAL Archiving
24.3.2. Making a Base Backup
24.3.3. Making a Base Backup Using the Low Level API
24.3.4. Recovering Using a Continuous Archive Backup
24.3.5. Timelines
24.3.6. Tips and Examples
24.3.7. Caveats

"File System Level Backup" should be last, and should link to pg_basebackup
as the strongly recommended method.

I'm not too keen on how pg_basebackup is only covered under continuous
archiving and PITR, but OTOH that's really how it's mostly used  and I
think most people who're doing physical backups are doing continuous
archiving. Or should be.

I'd also be inclined to add a hint in the sql dumps section to the effect
that while there's no incremental/differential pg_dump support, you can do
incrementals with continuous archiving.

In "Making a base backup" discussion of pg_basebackup should probably
mention that 'pg_basebackup -X stream' is required if you want a
self-contained base backup.

I think that'd address most of the confusion I've seen on occasion.

I'd love to spend several solid days reorganizing and updating those docs
some day, but I think that for now moving some chunks around will still
help.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Choosing parallel_degree

2016-04-05 Thread Amit Kapila
On Tue, Apr 5, 2016 at 11:55 PM, Julien Rouhaud 
wrote:
>
> On 05/04/2016 06:19, Amit Kapila wrote:
> >
> > Few more comments:
> >
> > 1.
> > @@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } |
> > UNLOGGED ] TABLE [ IF NOT EXI
> >
> > 
> >
> > 
> > +parallel_degree (integer)
> > +
> > +
> > 
> > + Sets the degree of parallelism for an individual relation.  The
> > requested
> > + number of workers will be
> > limited by  > + linkend="guc-max-parallel-degree">.
> > + 
> > +
> > +   
> >
> > All other parameters in this category are supportted by Alter table
> > command as well, so I think this parameter should also be supported by
> > Alter Table command (for both SET and RESET variants).
> >
>
> I don't quite understand.  With the patch you can use parallel_degree in
> either CREATE or ALTER table (SET and RESET) statements.
>

No issues then.

>  Considering
> documentation, the list of storage parameters only appears in
> create_table.sgml, alter_table.sgml pointing to it.
>
> In alter_table.sgml, I didn't comment the lock level needed to modify
> parallel_degree since it requires an access exclusive lock for now.
> While thinking about it, I think it's safe to use a share update
> exclusive lock but I may be wrong.  What do you think?
>

We need to keep AccessExclusiveLock for operations which can impact Select
operations, which I think this operation does, so let's retain
AccessExclusiveLock for now.  If somebody else thinks we should not
bother about Selects, then we can change it.
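
As a concrete illustration of the syntax being discussed (a sketch only,
using the reloption name as it appears in the posted patch, which may still
change before commit):

    CREATE TABLE big_table (id int, payload text) WITH (parallel_degree = 4);
    ALTER TABLE big_table SET (parallel_degree = 8);
    ALTER TABLE big_table RESET (parallel_degree);

The requested number of workers is still capped by max_parallel_degree at
plan time.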

> > 2.
> > +"Number of parallel processes per executor node wanted for this
relation.",
> >
> > How about
> > Number of parallel processes that can be used for this relation per
> > executor node.
> >
>
> I just rephrased what was used for the max_parallel_degree GUC, which is:
>
> "Sets the maximum number of parallel processes per executor node."
>
> I find your version better once again, but should we keep some
> consistency between them or it's not important?
>

I think consistency is good, but this is different from
max_parallel_degree, so I would prefer to use something along the lines of
what I have mentioned.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Fujii Masao
On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs  wrote:
> On 5 April 2016 at 12:26, Fujii Masao  wrote:
>
>>
>> Multiple standbys with the same name may connect to the master.
>> In this case, users might want to specify k<=N. So k<=N seems not invalid
>> setting.
>
>
> Confusing as that is, it is already the case; k > N could make sense. ;-(
>
> However, in most cases, k > N would not make sense and we should issue a
> WARNING.

Somebody (maybe Horiguchi-san and Sawada-san) commented this upthread
and the code for that test was included in the old patch (but I excluded it).
Now the majority seems to prefer to add that test, so I just revived and
revised that test code.

Attached is the updated version of the patch. I also addressed Amit's
and Robert's comments.

Regards,

-- 
Fujii Masao
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2906,2939  include_dir 'conf.d'


 
! Specifies a comma-separated list of standby names that can support
  synchronous replication, as described in
  .
! At any one time there will be at most one active synchronous standby;
  transactions waiting for commit will be allowed to proceed after
! this standby server confirms receipt of their data.
! The synchronous standby will be the first standby named in this list
  that is both currently connected and streaming data in real-time
  (as shown by a state of streaming in the
  
  pg_stat_replication view).
  Other standby servers appearing later in this list represent potential
! synchronous standbys.
! If the current synchronous standby disconnects for whatever reason,
  it will be replaced immediately with the next-highest-priority standby.
  Specifying more than one standby name can allow very high availability.
 
 
  The name of a standby server for this purpose is the
  application_name setting of the standby, as set in the
  primary_conninfo of the standby's WAL receiver.  There is
  no mechanism to enforce uniqueness. In case of duplicates one of the
! matching standbys will be chosen to be the synchronous standby, though
  exactly which one is indeterminate.
  The special entry * matches any
  application_name, including the default application name
  of walreceiver.
 
 
  If no synchronous standby names are specified here, then synchronous
  replication is not enabled and transaction commits will not wait for
--- 2906,2974 


 
! Specifies a list of standby names that can support
  synchronous replication, as described in
  .
! There will be one or more active synchronous standbys;
  transactions waiting for commit will be allowed to proceed after
! these standby servers confirm receipt of their data.
! The synchronous standbys will be those whose names appear
! earlier in this list, and
  that is both currently connected and streaming data in real-time
  (as shown by a state of streaming in the
  
  pg_stat_replication view).
  Other standby servers appearing later in this list represent potential
! synchronous standbys. If any of the current synchronous
! standbys disconnects for whatever reason,
  it will be replaced immediately with the next-highest-priority standby.
  Specifying more than one standby name can allow very high availability.
 
 
+ This parameter specifies a list of standby servers by using
+ either of the following syntaxes:
+ 
+ num_sync ( standby_name [, ...] )
+ standby_name [, ...]
+ 
+ where num_sync is
+ the number of synchronous standbys that transactions need to
+ wait for replies from,
+ and standby_name
+ is the name of a standby server. For example, a setting of
+ '3 (s1, s2, s3, s4)' makes transaction commits wait
+ until their WAL records are received by three higher priority standbys
+ chosen from standby servers s1, s2,
+ s3 and s4.
+ 
+ 
+ The second syntax was used before PostgreSQL
+ version 9.6 and is still supported. It's the same as the first syntax
+ with num_sync=1.
+ For example, both settings of '1 (s1, s2)' and
+ 's1, s2' have the same meaning; either s1
+ or s2 is chosen as a synchronous standby.
+
+
  The name of a standby server for this purpose is the
  application_name setting of the standby, as set in the
  primary_conninfo of the standby's WAL receiver.  There is
  no mechanism to enforce uniqueness. In case of duplicates one of the
! matching standbys will be considered as higher priorit

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Kyotaro HORIGUCHI
At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao  wrote in 

> >> list_member_int() performs the loop internally. So I'm not sure how much
> >> adding extra list_member_int() here can optimize this processing.
> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
> >> standby or not. In this idea, without adding extra loop, we can exit 
> >> earlier
> >> in the case where I'm not a sync standby. Does this make sense?
> >
> > The list_member_int() is also performed in the "(snip)" part. So
> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
> >
> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
> >
> > /*
> >  *  Quick exit if I am not synchronous or there's not
> >  *  enough synchronous standbys
> >  */
> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
> > {
> >   list_free(sync_standbys);
> >   return false;
> > }
> 
> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
> it checks whether we're managing a sync standby or not.
> Attached is the updated version of the patch. I also applied several
> review comments to the patch.

It still does list_member_int but it can be gotten rid of as the
attached patch.

regards,
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 9b2137a..6998bb8 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
 		if (XLogRecPtrIsInvalid(walsnd->flush))
 			continue;
 
+		/* Notify myself as 'synchronized' if I am */
+		if (am_sync != NULL && walsnd == MyWalSnd)
+			*am_sync = true;
+
 		/*
 		 * If the priority is equal to 1, consider this standby as sync
 		 * and append it to the result. Otherwise append this standby
@@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
 		if (this_priority == 1)
 		{
 			result = lappend_int(result, i);
-			if (am_sync != NULL && walsnd == MyWalSnd)
-*am_sync = true;
 			if (list_length(result) == SyncRepConfig->num_sync)
 			{
 list_free(pending);
@@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
 	{
 		bool		needfree = (result != NIL && pending != NIL);
 
-		if (am_sync != NULL && !(*am_sync))
-			*am_sync = list_member_int(pending, MyWalSnd->slotno);
-
 		result = list_concat(result, pending);
 		if (needfree)
 			pfree(pending);
@@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
 	}
 
 	/*
+	 * The pending list contains eventually potentially-synchronized standbys
+	 * and this walsender may be one of them. So once reset am_sync.
+	 */
+	if (am_sync != NULL)
+		*am_sync = false;
+
+	/*
 	 * Find the sync standbys from the pending list.
 	 */
 	priority = next_highest_priority;



Re: [HACKERS] Odd system-column handling in postgres_fdw join pushdown patch

2016-04-05 Thread Noah Misch
On Tue, Apr 05, 2016 at 03:41:00PM +0900, Etsuro Fujita wrote:
> On 2016/03/29 15:37, Etsuro Fujita wrote:
> >I added two helper functions: GetFdwScanTupleExtraData and
> >FillFdwScanTupleSysAttrs.  The FDW author could use the former to get
> >info about system attributes other than ctids and oids in fdw_scan_tlist
> >during BeginForeignScan, and the latter to set values for these system
> >attributes during IterateForeignScan (InvalidTransactionId for
> >xmins/xmaxs, InvalidCommandId for cmins/cmaxs, and valid values for
> >tableoids).  Attached is a proposed patch for that.  I also slightly
> >simplified the changes to make_tuple_from_result_row and
> >conversion_error_callback made by the postgres_fdw join pushdown patch.
> >  What do you think about that?
> 
> I revised comments a little bit.  Attached is an updated version of the
> patch.  I think this issue should be fixed in advance of the PostgreSQL
> 9.6beta1 release.

Of the foreign table columns affected here, I bet only tableoid sees
non-negligible use.  Even tableoid is not very prominent, so I would not have
thought of this as a beta1 blocker.  What about this bug makes pre-beta1
resolution especially valuable?




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Petr Jelinek

On 06/04/16 06:13, Dmitry Dolgov wrote:


On 6 April 2016 at 03:29, Andrew Dunstan  wrote:


Yeah, keeping it but rejecting update of an existing key is my
preference too.

cheers

andrew


Yes, it sounds quite reasonable. Here is a new version of patch (it will
throw
an error for an existing key). Is it better now?


This seems like a reasonable compromise to me. I wonder if the errcode 
should be ERRCODE_INVALID_PARAMETER_VALUE but I don't feel too strongly 
about that.
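
For reference, the behaviour being settled on looks roughly like this at the
SQL level (per the posted patch; the error wording below is illustrative, not
necessarily the final text):

    -- Array case: the new value goes before (default) or after the element
    -- designated by the path.
    SELECT jsonb_insert('{"a": [0, 1, 2]}', '{a, 1}', '"new_value"');
    --> {"a": [0, "new_value", 1, 2]}
    SELECT jsonb_insert('{"a": [0, 1, 2]}', '{a, 1}', '"new_value"', true);
    --> {"a": [0, 1, "new_value", 2]}

    -- Object case: inserting a key that already exists raises an error
    -- instead of silently replacing it (jsonb_set remains the way to update).
    SELECT jsonb_insert('{"a": 1}', '{a}', '2');
    --> ERROR: cannot replace existing key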


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-05 Thread Noah Misch
On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> I've pushed this version, and also added the item from the Brussels
> developer meeting to actually rewrite the main backup docs to the open
> items so they are definitely not forgotten for 9.6.

Here's that PostgreSQL 9.6 open item:

* Update of backup documentation (assigned to Bruce at 
[https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting Brussels 
Developer Meeting], but others are surely allowed to work on it as well)
** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the current 
documentation is now incorrect

Unless Bruce endorsed this development strategy, I think it unfair for commit
7117685 to impose a deadline on his backup.sgml project.  Did commit 7117685
indeed make the documentation "incorrect?"  Coverage in the Backup and Restore
chapter would be a good thing, but I don't think that gap alone makes 7117685,
with its reference-only documentation, incorrect.  Did 7117685 impair
documentation in any other respect?


Incidentally, I'm not clear on the extent of Bruce's plans to change backup
documentation.  Relevant meeting note fragment:

  Magnus: We need a new robust API for non-exclusive backups
  Simon: Keep but deprecate the existing API.
  Need to find a better way to ensure users have the required xlog in backups
  Craig: Our docs are in the wrong order. pg_basebackup should be first, ahead 
of manual methods.
  Action point: Re-arrange backup docs page (Bruce)

The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?

Thanks,
nm




Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Dmitry Dolgov
On 6 April 2016 at 03:29, Andrew Dunstan  wrote:

>
> Yeah, keeping it but rejecting update of an existing key is my preference
> too.
>
> cheers
>
> andrew
>

Yes, it sounds quite reasonable. Here is a new version of patch (it will
throw
an error for an existing key). Is it better now?
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1bc9fbc..0d4b3a1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10903,6 +10903,9 @@ table2-mapping
jsonb_set
   
   
+   jsonb_insert
+  
+  
jsonb_pretty
   
 
@@ -11184,6 +11187,39 @@ table2-mapping
 

   
+   
+   
+   jsonb_insert(target jsonb, path text[], new_value jsonb, insert_after boolean)
+   
+   
+   jsonb
+   
+ Returns target with
+ new_value inserted.  If
+ target section designated by
+ path is in a JSONB array,
+ new_value will be inserted before target or
+ after if insert_after is true (default is
+ false). If target section
+ designated by path is in JSONB object,
+ new_value will be inserted only if
+ target does not exist. As with the path
+ orientated operators, negative integers that appear in
+ path count from the end of JSON arrays.
+   
+   
+   
+   jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"')
+   
+   
+   jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"', true)
+   
+   
+   {"a": [0, "new_value", 1, 2]}
+ {"a": [0, 1, "new_value", 2]}
+
+   
+  
jsonb_pretty(from_json jsonb)
  
text
@@ -11235,10 +11271,11 @@ table2-mapping
   
 
   All the items of the path parameter of jsonb_set
-  must be present in the target, unless
-  create_missing is true, in which case all but the last item
-  must be present. If these conditions are not met the target
-  is returned unchanged.
+  as well as jsonb_insert except the last item must be present
+  in the target. If create_missing is false, all
+  items of the path parameter of jsonb_set must be
+  present. If these conditions are not met the target is
+  returned unchanged.
 
 
   If the last path item is an object key, it will be created if it
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9ae1ef4..a6e661c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -997,3 +997,11 @@ RETURNS text[]
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'parse_ident';
+
+CREATE OR REPLACE FUNCTION
+  jsonb_insert(jsonb_in jsonb, path text[] , replacement jsonb,
+insert_after boolean DEFAULT false)
+RETURNS jsonb
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'jsonb_insert';
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 97e0e8e..c1b8041 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -33,6 +33,15 @@
 #include "utils/memutils.h"
 #include "utils/typcache.h"
 
+/* Operations available for setPath */
+#define JB_PATH_NOOP	0x0000
+#define JB_PATH_CREATE	0x0001
+#define JB_PATH_DELETE	0x0002
+#define JB_PATH_INSERT_BEFORE			0x0004
+#define JB_PATH_INSERT_AFTER			0x0008
+#define JB_PATH_CREATE_OR_INSERT (JB_PATH_INSERT_BEFORE | JB_PATH_INSERT_AFTER \
+		| JB_PATH_CREATE)
+
 /* semantic action functions for json_object_keys */
 static void okeys_object_field_start(void *state, char *fname, bool isnull);
 static void okeys_array_start(void *state);
@@ -130,14 +139,14 @@ static JsonbValue *IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
 static JsonbValue *setPath(JsonbIterator **it, Datum *path_elems,
 		bool *path_nulls, int path_len,
 		JsonbParseState **st, int level, Jsonb *newval,
-		bool create);
+		int op_type);
 static void setPathObject(JsonbIterator **it, Datum *path_elems,
 			  bool *path_nulls, int path_len, JsonbParseState **st,
 			  int level,
-			  Jsonb *newval, uint32 npairs, bool create);
+			  Jsonb *newval, uint32 npairs, int op_type);
 static void setPathArray(JsonbIterator **it, Datum *path_elems,
 			 bool *path_nulls, int path_len, JsonbParseState **st,
-			 int level, Jsonb *newval, uint32 nelems, bool create);
+			 int level, Jsonb *newval, uint32 nelems, int op_type);
 static void addJsonbToParseState(JsonbParseState **jbps, Jsonb *jb);
 
 /* state for json_object_keys */
@@ -3544,7 +3553,7 @@ jsonb_set(PG_FUNCTION_ARGS)
 	it = JsonbIteratorInit(&in->root);
 
 	res = setPath(&it, path_elems, path_nulls, path_len, &st,
-  0, newval, create);
+  0, newval, create ? JB_PATH_CREATE : JB_PATH_NOOP);
 
 	Assert(res != NULL);
 
@@ -3588,7 +3597,52 @@ jsonb_delete_path(PG_FUNCTION_ARGS)
 
 	it = JsonbIteratorInit(&in->root);
 
-	res = setPath(&it, path_elems, path_nulls, path_len, &st, 0, NULL, false);
+	res = setPath(&it

Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-05 Thread Abhijit Menon-Sen
At 2016-04-05 18:45:58 -0300, alvhe...@2ndquadrant.com wrote:
>
> I changed the regression test a bit more, so please recheck.

Looks good, thank you.

-- Abhijit




Re: [HACKERS] WIP: Failover Slots

2016-04-05 Thread Craig Ringer
On 5 April 2016 at 04:19, Oleksii Kliukin  wrote:


> Thank you for the update. I’ve got some rejects when applying the
> 0001-Allow-replication-slots-to-follow-failover.patch after the "Dirty
> replication slots when confirm_lsn is changed" changes. I think it should
> be rebased against the master (might be the consequence of the "logical
> slots follow the timeline" patch committed).
>

I'll rebase it on top of the new master after timeline following for
logical slots got committed and follow up shortly.

That said, I've marked this patch 'returned with feedback' in the CF. It
should possibly actually be 'rejected' given the discussion on the logical
decoding timeline following thread, which points heavily at a different
approach to solving this problem in 9.7.

That doesn't mean nobody can pick it up if they think it's valuable and
want to run with it, but we're very close to feature freeze now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Petr Jelinek

On 06/04/16 03:29, Robert Haas wrote:



I don't understand why it seems to be considered OK for logical slots to
just vanish on failover. The only other things I can think of where that's
considered OK are unlogged tables (because that's the point and we have
failover-safe ones too) and the old hash indexes nobody's quite willing to
remove yet.


First, it wasn't until 9.3 that physical standbys could follow
timeline switches, but that doesn't mean that streaming replication
was useless in 9.0 - 9.2, or that warm standby was useless in earlier
versions.  Logical decoding isn't useless without that capability
either.  Would it be nice if we did have that capability?  Of course.



Except that in 9.0 - 9.2 there was a working workaround for that; there is 
no such thing for logical decoding.



Second, I'm not sure whether it was a good design decision to make
logical slots a special kind of object that sit off to the side,
neither configuration (like postgresql.conf) nor WAL-protected data
(like pg_clog and the data files themselves), but it was certainly a
very deliberate decision.  I sort of expected them to be WAL-logged,
but Andres argued (not unconvincingly) that we'd want to have slots on
standbys, and making them WAL-logged would preclude that.


I do think it was a good design decision. We just need to make them 
failoverable a bit differently, and the failover slots patch IMHO isn't the 
right way either, as I said in another reply in this thread.





Review and test responses have been pretty underwhelming for pglogical, and
quite a bit seem to have boiled down to "this should live as an extension,
we don't need it in core". It often feels like we can't win: if we seek to
get it into core we're told it's not wanted/needed, but if we try to focus
on solving issues in core to make it work better and let it live as an
extension we're told we shouldn't bother until it's in core.


To be honest, I was shocked that pglogical and pglogical_output didn't
go into this release.  I assumed that you and other folks at
2ndQuadrant were going to make a big push to get that done.  I did
take a brief look at one of them - pglogical, I think - a week or two
ago but there were unaddressed review comments that had been pending
for months and there were a lot of fairly obvious things that needed
to be done before it could be seriously considered as a core
submission.  Like, for example, rewriting the documentation heavily
and making it look like the rest of our docs, and putting it in SGML
format.  The code seemed to need quite a bit of cleanup, too.  Now,
logical replication is a sufficiently important feature that if the
only way it's going to get into core is if I work on it myself, or get
other people at EnterpriseDB to do so, then I'll try to make that
happen.  But I was assuming that that was your/2ndQuadrant's patch,
that you were going to get it in shape, and that me poking my nose
into it wasn't going to be particularly welcome.  Maybe I've misread
the whole dynamic here.


I guess you did. I, and I think Craig as well, hoped for some feedback from 
the community on the general ideas in terms of protocol, node setup (I mean 
catalogs) and general architecture. That didn't really happen. And without 
any of that happening I didn't feel confident trying to get it right within 
the last month of the dev cycle, especially given the size of the patch and 
the fact that we also had other patches we worked on which had a 
realistically higher chance of getting in. Not sure how Craig feels 
about it. Converting documentation, renaming some params in function 
names etc. (those unaddressed comments) seemed secondary to me.


(As a side note, I was also 2 weeks without a properly working laptop around 
FOSDEM time, which had an effect on my responses to -hackers about the 
topic, especially to Steve Singer who did a good job of reviewing the 
usability at the time; but even if I had had it, it would not have saved the patch.)


In general I think a project of this size requires more attention from a 
committer to help shepherd it, and neither Craig nor I are that. I am 
glad that Andres said he plans to give some time to logical replication in 
the next cycle, because that should be a big help.




That being said, if we get a logical replication system into core that
doesn't do DDL, doesn't do multi-master, doesn't know squat about
sequences, and rolls over and dies if a timeline switch happens, I
would consider that a huge step forward and I think a lot of other
people would, too.


I agree with the exception of working HA. I would consider it very sad 
if we got logical replication in core without having any provision for 
continuity of service. Doing that is relatively trivial in comparison to 
the logical replication itself however.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Craig Ringer
On 6 April 2016 at 09:29, Robert Haas  wrote:


> > Right now if you're doing any kind of logical deocoding from a master
> server
> > that fails over to a standby the client just dies. The slot vanishes.
> You're
> > stuffed. Gee, I hope you didn't need all that nice consistent ordering,
> > because you're going to have to start from scratch and somehow reconcile
> the
> > data in the new master with what you've already received ... and haven't.
>
> Right, but right now you probably *aren't* doing any kind of
> logical decoding from a master server to a standby, because there's
> squat in the core distribution that could make use of that capability.
> So you never get as far as discovering this problem.
>

OK, I'm starting to understand where this is going.

pglogical and pglogical_output are irrelevant so long as they haven't yet
landed in core. There is no logical replication worth considering,
therefore anything that improves logical replication is not worth the
effort because there's really nothing to improve. Why bother with
enhancements like timeline following when nothing uses it.

Right?

That's pretty much what I'm hearing from a number of others too. Obviously
I don't wholly agree, but I do see where you're coming from. Especially
since pglogical is in a sort of twilight zone where it's neither a public,
open project with a proper website, public SCM, etc that encourages
external contribution, nor is it strictly and only a submission to core.

It's not like, say, PostGIS, in that it's always been aimed at core (or
stated to be) and is something that can be realistically considered for
core. It doesn't have much of a life as an independent extension, and at
least from the "outside" it doesn't look like there's much energy going
into giving it one, right? Yet it's also had limited time and effort put
into getting it in core in this release cycle.

While I don't like that, I have to say I understand it.


> First, it wasn't until 9.3 that physical standbys could follow
> timeline switches, but that doesn't mean that streaming replication
> was useless in 9.0 - 9.2


I don't think there's much of a parallel there. It's closer to being like
having _cascading_ replication without the ability to follow timeline
switches via WAL archive replay or streaming.


> Second, I'm not sure whether it was a good design decision to make
> logical slots a special kind of object that sit off to the side,
> neither configuration (like postgresql.conf) nor WAL-protected data
> (like pg_clog and the data files themselves), but it was certainly a
> very deliberate decision.  I sort of expected them to be WAL-logged,
> but Andres argued (not unconvincingly) that we'd want to have slots on
> standbys, and making them WAL-logged would preclude that.


Yeah. I understand the reasons for that decision. Per an earlier reply I
think we can avoid making them WAL-logged so they can be used on standbys
and still achieve usable failover support on physical replicas.


> It's more like unlogged tables, where a deliberate
> design decision to lose data was made in order to meet some other
> goal.
>

Yeah ... but it's like unlogged tables without having the option of logged
tables.

Nice while it lasts but you're in bad trouble when it goes wrong.

(The parallel isn't perfect of course, since unlogged tables aren't crash
safe whereas slots are; they just aren't replicated).


> If you retype a column from text to integer,
> you probably aren't storing anything in it other than integers, in
> which case it is not necessarily the case that you are locked into
> applying that change at a particular point in the change stream.


... if you're replicating in text format, not send/recv format, yeah. You
get away with it in that particular case. But that *only* works where
you're using text format and the types are convertible via text
representation.

It's also just one example; adding a NOT NULL column, for instance, bites
you hard.

Asking users to do this themselves is incredibly fragile and requires much
more knowledge of PostgreSQL innards than I think is realistic. It's part
of why Slony-I is difficult too.

Sure, you can avoid having DDL replication if you're really, really careful
and know exactly what you're doing. I don't think it's a pre-req for
getting logical replication into core, but I do think it's a pretty strong
requirement for taking the use of logical replication for HA and failover
seriously for general use.

> To be honest, I was shocked that pglogical and pglogical_output didn't
> go into this release.  I assumed that you and other folks at
> 2ndQuadrant were going to make a big push to get that done.  I did
> take a brief look at one of them - pglogical, I think - a week or two
> ago but there were unaddressed review comments that had been pending
> for months and there were a lot of fairly obvious things that needed
> to be done before it could be seriously considered as a core
> submission.  Like, for example, rew

Re: [HACKERS] Default Roles (was: Additional role attributes)

2016-04-05 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
> On Sun, Apr 03, 2016 at 10:27:02PM -0400, Stephen Frost wrote:
> > * Fujii Masao (masao.fu...@gmail.com) wrote:
> > > Currently only superusers can call pgstattuple().
> > 
> > I started looking into this.
> > 
> > If we were starting from a green field, the pg_dump dump catalog ACLs
> > patch would work just fine for this case.  Simply remove the superuser
> > checks and REVOKE EXECUTE from public in the script and we're done.
> > 
> > Unfortunately, we aren't, and that's where things get complicated.  The
> > usual pg_upgrade case will, quite correctly, dump out the objects
> > exactly as they exist from the 9.5-or-earlier system and restore them
> > into the 9.6 system, however, the new .so will be installed and that .so
> > won't have the superuser checks in it.
> > 
> > The only approach to addressing this which I can think of offhand would
> > be to have the new .so library check the version of the extension and,
> > for the 1.3 (pre-9.6) and previous versions, keep the superuser check,
> > but skip it for 1.4 (9.6) and later versions.
> 
> At the C level, have a pgstattuple function and a pgstattuple_v1_4 function.
> Let them differ only in that the former has a superuser check.  Binary
> upgrades will use the former, and fresh CREATE EXTENSION shall use the latter.

Excellent suggestion and many thanks for that.

I'll draft up a patch for that.
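
Roughly what I expect that to look like, as an untested sketch (the guts
are elided and the names aren't final):

    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"

    PG_FUNCTION_INFO_V1(pgstattuple);
    PG_FUNCTION_INFO_V1(pgstattuple_v1_4);

    /*
     * Entry point used by a fresh CREATE EXTENSION at version 1.4: no
     * hardcoded check, access is controlled with GRANT/REVOKE EXECUTE.
     * The body below is only a placeholder for the existing guts.
     */
    Datum
    pgstattuple_v1_4(PG_FUNCTION_ARGS)
    {
        /* ... existing pgstattuple implementation goes here ... */
        PG_RETURN_NULL();
    }

    /*
     * Historical entry point, still referenced after a binary upgrade of
     * a pre-1.4 extension: keep the superuser() check, then share the guts.
     */
    Datum
    pgstattuple(PG_FUNCTION_ARGS)
    {
        if (!superuser())
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                     errmsg("must be superuser to use pgstattuple functions")));

        return pgstattuple_v1_4(fcinfo);
    }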

Thanks again!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2016-04-05 Thread Robert Haas
On Mon, Mar 21, 2016 at 3:31 AM, Haribabu Kommi
 wrote:
> On Mon, Mar 21, 2016 at 2:00 PM, Alvaro Herrera
>  wrote:
>> Haribabu Kommi wrote:
>>
>>> > Check.
>>> >
>>> > +} lookup_hba_line_context;
>>> > ^ but why TAB here?
>>>
>>> corrected. I am not sure why pgindent is adding a tab here.
>>
>> It's because lookup_hba_line_context is not listed in typedefs.list.
>> I suggest adding it and all other new typedefs you add, and rerunning
>> pgindent, as the lack of those may affect other places where those names
>> appear.
>
> Thanks for the details. I added the new typedef into typedefs.list file.
> Updated patch is attached.

This patch is still marked "needs review".  If it's ready to go, one
of the reviewers should mark it "ready for committer".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] standalone backend PANICs during recovery

2016-04-05 Thread Robert Haas
On Sat, Apr 2, 2016 at 5:57 PM, Alvaro Herrera  wrote:
> Bernd Helmle wrote:
>> While investigating a problem on a streaming hot standby instance at a
>> customer site, i got the following when using a standalone backend:
>>
>> PANIC:  btree_xlog_delete_get_latestRemovedXid: cannot operate with
>> inconsistent data
>> CONTEXT:  xlog redo delete: index 1663/65588/65625; iblk 11, heap
>> 1663/65588/65613;
>>
>> The responsible PANIC is triggered here:
>>
>> src/backend/access/nbtree/nbtxlog.c:555
>>
>> btree_xlog_delete_get_latestRemovedXid(XLogReaderState *record)
>
> This PANIC was introduced in the commit below.  AFAICS your proposed
> patch and rationale are correct.

If nobody's going to commit this right away, this should be added to
the next CommitFest so we don't forget it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-04-05 Thread Robert Haas
On Wed, Mar 16, 2016 at 7:43 PM, Peter Geoghegan  wrote:
> On Wed, Mar 16, 2016 at 11:25 AM, Robert Haas  wrote:
>> Sure, and if everybody does that, then there will be 40 patches that
>> get updated in the last 2 days of the CommitFest, and that will be
>> impossible.  Come on.  You're demanding a degree of preferential
>> treatment which is unsupportable.
>
> It's unexpected that an entirely maintenance-orientated patch like
> this would be received this way. I'm not demanding anything, or
> applying any real pressure. Let's just get on with it.
>
> I attach a revision that makes all the changes that Heikki suggested,
> except one. As already noted several times, following this suggestion
> would have added a bug. Alvaro preferred my original approach here in
> any case. I refer to my original approach of making the new
> UNIQUE_CHECK_SPECULATIVE case minimally different from the existing
> UNIQUE_CHECK_PARTIAL case currently used for deferred unique
> constraints and speculative insertion, as opposed to making the new
> UNIQUE_CHECK_SPECULATIVE "like CHECK_UNIQUE_YES, but return FALSE
> instead of throwing an error on conflict". That was broken because
> CHECK_UNIQUE_YES waits for the outcome of an xact, which
> UNIQUE_CHECK_PARTIAL never does, and so UNIQUE_CHECK_SPECULATIVE must
> never do.
>
> Any and all waits happen in the first phase of speculative insertion,
> and never the second. I could give a complicated explanation for why,
> involving a deadlock scenario, but a simple explanation will do: it
> has always worked that way, and was tested to work that way.
>
> Feedback from Heikki led to these changes for this revision:
>
> * The use of arguments within ExecInsert() was simplified.
>
> * More concise AM documentation.
>
> The docs essentially describe two new concepts:
>
> - What unique index insertion needs to know about speculative
> insertion in general. This doesn't just apply to speculative inserters
> themselves, of course.
>
> - What speculative insertion is. Why it exists (why we don't just wait
> on xact). In other words, what "unprincipled deadlocks" are, and how
> they are avoided (from a relatively high level).
>
> I feel like I have a responsibility to make sure that this mechanism
> is well documented, especially given that not that many people were
> involved in its design. It's possible that no more than the 3 original
> authors of UPSERT fully understand speculative insertion -- it's easy
> to miss some of the subtleties.
>
> I do not pursue something like this without good reason. I'm
> optimistic that the patch will be accepted if it is carefully
> considered.

This patch has attracted no reviewers.  Any volunteers?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] insufficient qualification of some objects in dump files

2016-04-05 Thread Robert Haas
On Fri, Mar 18, 2016 at 11:22 AM, Robert Haas  wrote:
> On Fri, Mar 18, 2016 at 2:13 AM, Michael Paquier
>  wrote:
>> On Fri, Mar 18, 2016 at 5:38 AM, Tom Lane  wrote:
>>> Given the lack of any other complaints about this, I'm okay with the
>>> approach as presented.  (I haven't read the patch in detail, though.)
>>
>> FWIW, I am still of the opinion that the last patch sent by Peter is
>> in a pretty good shape.
>
> Great.  I've marked this patch as "Ready for Committer" in the
> CommitFest application.

Peter, are you going to commit this?  Feature freeze is in just a few days.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 7:21 PM, Thomas Munro
 wrote:
> Thanks.  I see a way to move that loop and change the overflow
> behaviour along those lines but due to other commitments I won't be
> able to post a well tested patch and still leave time for reviewers
> and committer before the looming deadline.  After the freeze I will
> post an updated version that addresses these problems for the next CF.

OK, I've marked this Returned with Feedback for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views vs. primary keys

2016-04-05 Thread Amit Langote
On 2016/04/06 8:48, David Fetter wrote:
> On Tue, Apr 05, 2016 at 07:10:56PM -0400, Robert Haas wrote:
>> On Tue, Apr 5, 2016 at 6:50 PM, David Fetter  wrote:
>>> Is there a reason other than lack of tuits for this restriction?
>>
>> "this" lacks an antecedent.
> 
> Try to put a primary key on a materialized view, for example:
> 
> CREATE TABLE foo(id SERIAL PRIMARY KEY, t text);
> 
> CREATE MATERIALIZED VIEW bar AS SELECT * FROM foo;
> 
> REFRESH MATERIALIZED VIEW bar;
> 
> ALTER MATERIALIZED VIEW bar ADD PRIMARY KEY(id);
> 
> At that last step, you get an error that bar is not a table.  You get
> an identical error with the hoary old trick of 
> 
> ALTER TABLE bar ADD PRIMARY KEY(id);

Initially I thought it might just be an oversight of forgetting to pass
ATT_MATVIEW to ATSimplePermissions() in ALTER TABLE processing, and that
there were no deeper technical reasons for it. But there seem to be. On
inspecting a little, it seems I can create unique indexes on a matview, but
I couldn't manage to set its columns to NOT NULL.  The only relations
allowed in the latter case are plain tables and foreign tables.  I guess
that follows from how NOT NULL constraints are enforced.

> This lack prevents things that depend on primary keys (foreign keys,
> logical replication, etc.) from operating on the materialized views.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 3:51 AM, Craig Ringer  wrote:
> First, I'd like to remind everyone that logical decoding is useful for more
> than replication. You can consume the change stream for audit
> logging/archival, to feed into a different DBMS, etc etc. This is not just
> about replicating from one PostgreSQL to another, though to be sure that'll
> be the main use in the near term.
>
> The Zalando guys at least are already using it for other things, and
> interest in the json support suggests they're not alone.

I have not forgotten any of that, nor do I consider it unimportant.

> Right now if you're doing any kind of logical decoding from a master server
> that fails over to a standby the client just dies. The slot vanishes. You're
> stuffed. Gee, I hope you didn't need all that nice consistent ordering,
> because you're going to have to start from scratch and somehow reconcile the
> data in the new master with what you've already received ... and haven't.

Right, but right now you probably *aren't* doing any kind of
logical decoding from a master server to a standby, because there's
squat in the core distribution that could make use of that capability.
So you never get as far as discovering this problem.

> I don't understand why it seems to be considered OK for logical slots to
> just vanish on failover. The only other things I can think of where that's
> considered OK are unlogged tables (because that's the point and we have
> failover-safe ones too) and the old hash indexes nobody's quite willing to
> remove yet.

First, it wasn't until 9.3 that physical standbys could follow
timeline switches, but that doesn't mean that streaming replication
was useless in 9.0 - 9.2, or that warm standby was useless in earlier
versions.  Logical decoding isn't useless without that capability
either.  Would it be nice if we did have that capability?  Of course.

Second, I'm not sure whether it was a good design decision to make
logical slots a special kind of object that sit off to the side,
neither configuration (like postgresql.conf) nor WAL-protected data
(like pg_clog and the data files themselves), but it was certainly a
very deliberate decision.  I sort of expected them to be WAL-logged,
but Andres argued (not unconvincingly) that we'd want to have slots on
standbys, and making them WAL-logged would preclude that.  So I don't
really think that this is much like hash indexes, which just never got
properly finished.  It's more like unlogged tables, where a deliberate
design decision to lose data was made in order to meet some other
goal.

>> realistically, there are a lot of people who simply don't change their
>> schema all that often, and who could (and might even prefer to) manage
>> that process in other ways - e.g. change nodes one by one while they
>> are off-line, then bring them on-line.
>
> Yeah, it's a bit more complex than that. Schema changes *must* be made at a
> specific point in replay. You can't generally just ALTER TABLE ... DROP
> COLUMN on the master then do the same thing on the replica. The replica
> probably still has un-replayed changes from the master that have the
> now-dropped column in their change stream, but now it can't apply them to
> the new table structure on the downstream. This particular case can be
> worked around, but column type changes, addition of non-null columns etc
> cannot.

There are certainly problem cases, but I'm not sure they really arise
that much in practice.  If you retype a column from text to integer,
you probably aren't storing anything in it other than integers, in
which case it is not necessarily the case that you are locked into
applying that change at a particular point in the change stream.  If
you are storing non-integers in a text column and relying on a USING
clause to make them look like integers during the conversion, then,
yes, that has to be done at a precise point in the change stream.  But
that's a pretty strange thing to do, and your application is most
likely going to get confused anyway, so you are probably taking a
maintenance window for the changeover anyway - in which case, there's
not really a big problem.  You can run the same change at the same
time on both servers while nothing else is happening.

> Also, a growing portion of the world uses generated schemas and migrations
> and can't easily pipe that through some arbitrary function we provide. They
> may not be too keen on stopping writes to their entire database as an
> alternative either (and we don't provide a true read-only mode for them to
> use if they did want to). I know it's fashionable around here to just write
> them off as idiots for using ORMs and so on, but they're rather widespread
> idiots who're just as likely to be interested in geographically distributed
> selective replication, change stream extraction, etc. This has been a
> persistent problem with people who want to use BDR, too.

It's not my intent to write anyone off as an idiot.

> Review and test respon

Re: [HACKERS] [BUGS] Breakage with VACUUM ANALYSE + partitions

2016-04-05 Thread Noah Misch
On Mon, Apr 04, 2016 at 12:45:25PM -0400, Robert Haas wrote:
> Backends shouldn't be requesting nonexistent blocks from a relation -
> higher-level safeguards, like holding AccessExclusiveLock before
> trying to complete a DROP or TRUNCATE - are supposed to prevent that.
> If this patch is causing us to hold onto smgr references to a relation
> on which we no longer hold locks, I think that's irretrievably broken
> and should be reverted.  I really doubt this will be the only thing
> that goes wrong if you do that.

+1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline following for logical slots

2016-04-05 Thread Craig Ringer
On 5 April 2016 at 14:09, Andres Freund  wrote:

> On 2016-04-05 05:53:53 +0200, Petr Jelinek wrote:
> > On 04/04/16 17:15, Andres Freund wrote:
> > >
> > >> * Robust sequence decoding and replication. If you were following the
> > >> later parts of that discussion you will've seen how fun that's going
> > >> to be, but it's the simplest of all of the problems.
> > >
> > >Unconvinced. People used londiste and slony for years without that, and
> > >it's not even remotely at the top of the list of problems with either.
> > >
> >
> > Londiste and Slony also support physical failover unlike logical decoding
> > which is the main point of this discussion, lets not forget that.
>
> Sure. But that level of failover isn't all that hard to implement.
>
>
Well, they get it mostly for free: because they work at the SQL level and
we have working physical failover, they don't really notice the change.

Just like I've been trying to make possible for logical decoding and
logical replication.

> > If we got failover slots into 9.6 it would be better, but that does not
> > look realistic at this point. I don't think the current design for
> > failover slots is the best possible - I think failover slots should be
> > created on the replica and send their status up to the master, which
> > would then take them into account when calculating the oldest needed
> > catalog xmin and lsn (a simple way of doing that would be to add this
> > to the feedback protocol and let the physical slot keep the xmin/lsn
> > as well)
>
> Yes, that's not too far away from what I'm thinking of. If we do it
> right that also solves the important problems for decoding on a standby.
>

I'm pretty happy with this approach too.

It puts a bit more burden on the client, but I think that can be solved
separately and later with a helper running on the replica.

I've never liked how failover slots can't work with a cascading replica
when we get support for that. By contrast, this approach would actually
help solve one of the problems needed to get replay from a replica working.

It's a pity it's a no-hoper for 9.6 though.
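
To make that a bit more concrete, the extra information the standby would
report upstream could be as small as this (purely illustrative names; no
such struct exists today, and the real thing would ride on the existing
feedback message):

    #include "postgres.h"
    #include "access/xlogdefs.h"

    /*
     * Illustrative only: what a standby could add to its feedback so the
     * upstream's physical slot can hold back catalog_xmin and restart_lsn
     * on behalf of logical slots that live on the standby.
     */
    typedef struct StandbySlotFeedback
    {
        TransactionId   slot_xmin;          /* oldest xmin any local slot needs */
        TransactionId   slot_catalog_xmin;  /* oldest catalog_xmin needed */
        XLogRecPtr      slot_restart_lsn;   /* oldest restart_lsn needed */
    } StandbySlotFeedback;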

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-05 Thread Peter Geoghegan
On Tue, Apr 5, 2016 at 1:31 PM, Peter Geoghegan  wrote:
> My new understanding: The extra "included" columns are stored in the
> index, but do not affect its sort order at all. They are no more part
> of the key than, say, the heap TID that the key points to. They are
> just "payload".

Noticed a few issues following another pass:

* tuplesort.c should handle the CLUSTER case in the same way as the
btree case. No?

* Why have a RelationGetNumberOfAttributes(indexRel) call in
tuplesort_begin_index_btree() at all now?

* This critical section is unnecessary, because this happens during
index builds:

+   if (indnkeyatts != indnatts && P_ISLEAF(opageop))
+   {
+       /*
+        * It's essential to truncate High key here.
+        * The purpose is not just to save more space on this particular page,
+        * but to keep whole b-tree structure consistent. Subsequent insertions
+        * assume that hikey is already truncated, and so they should not
+        * worry about it, when copying the high key into the parent page
+        * as a downlink.
+        * NOTE It is not crutial for reliability in present,
+        * but maybe it will be that in the future.
+        * NOTE this code will be changed by the "btree compression" patch,
+        * which is in progress now.
+        */
+       keytup = index_reform_tuple(wstate->index, oitup,
+                                   indnatts, indnkeyatts);
+
+       /* delete "wrong" high key, insert keytup as P_HIKEY. */
+       START_CRIT_SECTION();
+       PageIndexTupleDelete(opage, P_HIKEY);
+
+       if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), keytup, P_HIKEY))
+           elog(ERROR, "failed to rewrite compressed item in index \"%s\"",
+                RelationGetRelationName(wstate->index));
+       END_CRIT_SECTION();
+   }

Note that START_CRIT_SECTION() promotes any ERROR to PANIC, which
isn't useful here, because we have no buffer lock held, and nothing
must be WAL-logged.

* Think you forgot to update spghandler(). (You did not add a test for
just that one AM, either)

* I wonder why this restriction needs to exist:

+   else
+       elog(ERROR, "Expressions are not supported in included columns.");

What does not supporting it buy us? Was it just that the pg_index
representation is more complicated, and you wanted to put it off?

An error like this should use ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED ..., btw.
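
In full, something along these lines (exact message wording up to you):

    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("expressions are not supported in included columns")));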

* I would like to see index_reform_tuple() assert that the new,
truncated index tuple is definitely <= the original (I worry about the
1/3 page restriction issue). Maybe you should also change the name of
index_reform_tuple(), per David.
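
Something as simple as this right before it returns would do (variable
names here are illustrative):

    Assert(IndexTupleSize(newtup) <= IndexTupleSize(origtup));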

* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.

* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.

* Valgrind shows an error with an aggregate statement I tried:

2016-04-05 17:01:31.129 PDT 12310 LOG:  statement: explain analyze
select count(*) from ab  where b > 5 group by a, b;
==12310== Invalid read of size 4
==12310==at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310==by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310==by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310==by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310==by 0x658054: create_index_paths (indxpath.c:269)
==12310==by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310==by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310==by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310==by 0x64D93B: make_one_rel (allpaths.c:170)
==12310==by 0x66876C: query_planner (planmain.c:246)
==12310==by 0x669FBA: grouping_planner (planner.c:1666)
==12310==by 0x66D0C9: subquery_planner (planner.c:751)
==12310==by 0x66D3DA: standard_planner (planner.c:300)
==12310==by 0x66D714: planner (planner.c:170)
==12310==by 0x6FD692: pg_plan_query (postgres.c:798)
==12310==by 0x59082D: ExplainOneQuery (explain.c:350)
==12310==  Address 0xbff290

Re: [HACKERS] Materialized views vs. primary keys

2016-04-05 Thread David Fetter
On Tue, Apr 05, 2016 at 07:10:56PM -0400, Robert Haas wrote:
> On Tue, Apr 5, 2016 at 6:50 PM, David Fetter  wrote:
> > Is there a reason other than lack of tuits for this restriction?
> 
> "this" lacks an antecedent.

Try to put a primary key on a materialized view, for example:

CREATE TABLE foo(id SERIAL PRIMARY KEY, t text);

CREATE MATERIALIZED VIEW bar AS SELECT * FROM foo;

REFRESH MATERIALIZED VIEW bar;

ALTER MATERIALIZED VIEW bar ADD PRIMARY KEY(id);

At that last step, you get an error that bar is not a table.  You get
an identical error with the hoary old trick of 

ALTER TABLE bar ADD PRIMARY KEY(id);

This lack prevents things that depend on primary keys (foreign keys,
logical replication, etc.) from operating on the materialized views.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unused macros in src/include/access/transam.h

2016-04-05 Thread Michael Paquier
On Wed, Apr 6, 2016 at 4:55 AM, Fabrízio de Royes Mello
 wrote:
> Hi all,
>
> When dealing with some patch review I've noticed there are two macros that
> are not used anywhere:
>
> #define TransactionIdStore(xid, dest)   (*(dest) = (xid))
> #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)

Not by the core code; perhaps they are used by extensions, though. The
last bit of code using TransactionIdStore is from 2007 (282d2a03), by
the way. Still, given how little maintenance they require, removing them
does not seem worth a compilation breakage, I think.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-04-05 Thread Thomas Munro
On Tue, Apr 5, 2016 at 4:17 AM, Robert Haas  wrote:
> On Wed, Mar 30, 2016 at 2:22 AM, Thomas Munro
>  wrote:
>> On Wed, Mar 30, 2016 at 2:36 PM, Robert Haas  wrote:
>>> OK, I committed this, with a few tweaks.  In particular, I added a
>>> flag variable instead of relying on "latch set" == "need to send
>>> reply"; the other changes were cosmetic.
>>>
>>> I'm not sure how much more of this we can realistically get into 9.6;
>>> the latter patches haven't had much review yet.  But I'll set this
>>> back to Needs Review in the CommitFest and we'll see where we end up.
>>> But even if we don't get anything more than this, it's still rather
>>> nice: remote_apply turns out to be only slightly slower than remote
>>> flush, and it's a guarantee that a lot of people are looking for.
>>
>> Thank you Michael and Robert!
>>
>> Please find attached the rest of the patch series, rebased against
>> master.  The goal of the 0002 patch is to provide an accurate
>> indication of the current replay lag on each standby, visible to users
>> like this:
>>
>> postgres=# select application_name, replay_lag from pg_stat_replication;
>>  application_name │   replay_lag
>> ──┼─
>>  replica1 │ 00:00:00.000299
>>  replica2 │ 00:00:00.000323
>>  replica3 │ 00:00:00.000319
>>  replica4 │ 00:00:00.000303
>> (4 rows)
>>
>> It works by maintaining a buffer of (end of WAL, time now) samples
>> received from the primary, and then eventually feeding those times
>> back to the primary when the recovery process replays the
>> corresponding locations.
>>
>> Compared to approaches based on commit timestamps, this approach has
>> the advantage of providing non-misleading information between commits.
>> For example, if you run a batch load job that takes 1 minute to insert
>> the whole phonebook and no other transactions run, you will see
>> replay_lag updating regularly throughout that minute, whereas typical
>> commit timestamp-only approaches will show an increasing lag time
>> until a commit record is eventually applied.  Compared to simple LSN
>> location comparisons, it reports in time rather than bytes of WAL,
>> which can be more meaningful for DBAs.
>>
>> When the standby is entirely caught up and there is no write activity,
>> the reported time effectively represents the ping time between the
>> servers, and is updated every wal_sender_timeout / 2, when keepalive
>> messages are sent.  While new WAL traffic is arriving, the walreceiver
>> records timestamps at most once per second in a circular buffer, and
>> then sends back replies containing the recorded timestamps as fast as
>> the recovery process can apply the corresponding xlog.  The lag number
>> you see is computed by the primary server comparing two timestamps
>> generated by its own system clock, one of which has been on a journey
>> to the standby and back.
>>
>> Accurate lag estimates are a prerequisite for the 0004 patch (about
>> which more later), but I believe users would find this valuable as a
>> feature on its own.
>
> Well, one problem with this is that you can't put a loop inside of a
> spinlock-protected critical section.
>
> In general, I think this is a pretty reasonable way of attacking this
> problem, but I'd say it's significantly under-commented.  Where should
> someone go to get a general overview of this mechanism?  The answer is
> not "at place XXX within the patch".  (I think it might merit some
> more extensive documentation, too, although I'm not exactly sure what
> that should look like.)
>
> When you overflow the buffer, you could thin it out in a smarter way,
> like by throwing away every other entry instead of the oldest one.  I
> guess you'd need to be careful how you coded that, though, because
> replaying an entry with a timestamp invalidates some of the saved
> entries without formally throwing them out.
>
> Conceivably, 0002 could be split into two patches, one of which
> computes "stupid replay lag" considering only records that naturally
> carry timestamps, and a second adding the circular buffer to handle
> the case where much time passes without finding such a record.

Thanks.  I see a way to move that loop and change the overflow
behaviour along those lines but due to other commitments I won't be
able to post a well tested patch and still leave time for reviewers
and committer before the looming deadline.  After the freeze I will
post an updated version that addresses these problems for the next CF.
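
For what it's worth, the kind of compaction I have in mind is roughly the
following (an untested sketch with placeholder names, ignoring the
circular-buffer bookkeeping for clarity; none of this is code from the
patch):

    #include "postgres.h"
    #include "access/xlogdefs.h"
    #include "utils/timestamp.h"

    typedef struct WalTimeSample
    {
        XLogRecPtr  lsn;    /* how much WAL had been received */
        TimestampTz time;   /* when we had received that much WAL */
    } WalTimeSample;

    #define LAG_BUFFER_SIZE 8192

    static WalTimeSample lag_buffer[LAG_BUFFER_SIZE];
    static int  lag_buffer_count = 0;

    /*
     * Instead of discarding the oldest sample on overflow, keep every
     * other sample: the buffer then covers roughly the same stretch of
     * WAL at half the resolution.
     */
    static void
    thin_lag_buffer(void)
    {
        int     i;

        for (i = 0; i < lag_buffer_count / 2; i++)
            lag_buffer[i] = lag_buffer[i * 2];
        lag_buffer_count = i;
    }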

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Michael Paquier
On Wed, Apr 6, 2016 at 5:58 AM, Alvaro Herrera  wrote:
> Robbie Harwood wrote:
>> Alvaro Herrera  writes:
>> > It seems to me that the right solution for this is to create a new
>> > memory context which is a direct child of TopMemoryContext, so that
>> > palloc can be used, and so that it can be reset separately, and that it
>> > doesn't suffer from resets of other contexts.  (I think Michael's point
> is that if those chunks were in a context of their own, you wouldn't
>> > need the pfrees but a MemCxtReset would be enough.)
>>
>> Hmm, that's also an option.  I read Michael's point as arguing for
>> calloc()/free() rather than a new context, but I could be wrong.
>
> Yeah, if you weren't using stringinfos that would be an option, but
> since you are I think that's out of the question.

To be honest, my first thought about that was to create a private memory
context dedicated only to those buffers, save it in pg_gssinfo, and remove
those pfree calls, as the memory context would clean itself up through
inheritance. Regarding the calloc/free stuff, were I a committer I would
have given it a spin to see whether the code gets uglier or better and
made a final decision based on that (you are right about the
repalloc/pfree calls in memory contexts, by the way - I forgot that).
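
For what it's worth, what I was thinking of looks roughly like this (an
illustrative sketch only; all of the names are made up, and in the patch
the context and buffers would presumably hang off pg_gssinfo):

    #include "postgres.h"
    #include "lib/stringinfo.h"
    #include "utils/memutils.h"

    /* made-up names, for illustration only */
    static MemoryContext gss_buf_mcxt = NULL;
    static StringInfo    gss_inbuf;
    static StringInfo    gss_outbuf;

    static void
    gss_init_buffers(void)
    {
        MemoryContext oldcxt;

        /* a child of TopMemoryContext lives for the whole connection */
        if (gss_buf_mcxt == NULL)
            gss_buf_mcxt = AllocSetContextCreate(TopMemoryContext,
                                                 "GSSAPI encryption buffers",
                                                 ALLOCSET_DEFAULT_MINSIZE,
                                                 ALLOCSET_DEFAULT_INITSIZE,
                                                 ALLOCSET_DEFAULT_MAXSIZE);

        oldcxt = MemoryContextSwitchTo(gss_buf_mcxt);
        gss_inbuf = makeStringInfo();
        gss_outbuf = makeStringInfo();
        MemoryContextSwitchTo(oldcxt);

        /* later, MemoryContextReset(gss_buf_mcxt) drops both in one shot */
    }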

>> A question, though: is it valuable for the context to be reset()able
>> separately?  If there were more than just these two buffers going into
>> it, I could see it being convenient - especially if it were for
>> different encryption types, for instance - but it seems like it would be
>> overkill?
>
> If two buffers is all, I think retail pfrees are fine.  I haven't
> actually read the patch.

Those are the only two buffers present per session.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views vs. primary keys

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 6:50 PM, David Fetter  wrote:
> Is there a reason other than lack of tuits for this restriction?

"this" lacks an antecedent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Michael Paquier
On Wed, Apr 6, 2016 at 6:15 AM, Magnus Hagander  wrote:
> On Tue, Apr 5, 2016 at 7:58 PM, Robbie Harwood  wrote:
>>
>> > -#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
>> > -/*
>> > - * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for
>> > MingW
>> > - * that contain the OIDs required. Redefine here, values copied
>> > - * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
>> > - */
>> > -static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
>> > -{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
>> > -static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME =
>> > &GSS_C_NT_USER_NAME_desc;
>> > -#endif
>> > Regarding patch 0003 it may be fine to remove that... Robbie, do you
>> > know how long ago this has been fixed upstream? I'd rather not have
>> > this bit removed if this could impact some users.
>>
>> I double-checked with MIT, and we think it was fixed in 2003 in commit
>> 4ce1f7c3a46485e342d3a68b4c60b76c196d1851 which can be viewed at
>>
>> https://github.com/krb5/krb5/commit/4ce1f7c3a46485e342d3a68b4c60b76c196d1851
>> and the corresponding bug on their bugtracker was
>> http://krbdev.mit.edu/rt/Ticket/Display.html?id=1666

Thanks for the investigation! It would be interesting to see what at
least narwhal in the buildfarm thinks about that.

> That certainly looks like it fixes it. This was way too long ago for me to
> remember which versions I was using at the time though.

So this is also an indication that it would actually be fine :)

> It looks like it was already OK in the MSVC build back then, and only mingw
> was broken. Which makes it even more reasonable that they might've fixed it
> now - or a long time ago.
>
> If it works on reasonably modern mingw, then I suggest pushing it to the
> buildfarm and see what happens. But it definitely needs at least one round
> of building on mingw..

What about first splitting patch 0003 into two different patches and
giving those a spin? One to remove this check, and one to improve the
error reporting (which is an improvement in itself). Then we could
rebase the rest on top of them.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Materialized views vs. primary keys

2016-04-05 Thread David Fetter
Folks,

Is there a reason other than lack of tuits for this restriction?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-05 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Abhijit Menon-Sen wrote:
> > OK, thanks for the clarification. Here's the earlier patch, but with
> > the relevant added docs and tests retained.
> 
> I'd like to add indexes and materialized views to the set of objects
> covered (functions and triggers).  I'm already doing that, so no need to
> resubmit; it should be a pretty easy addition.  I've done a few minor
> additional changes too.

... and pushed.  I changed the regression test a bit more, so please
recheck.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Andrew Dunstan



On 04/05/2016 03:08 PM, Tom Lane wrote:


I think there is potentially some use-case for insert-only semantics
for an object target, if you want to be sure you're not overwriting
data.  So I think "throw an error on duplicate key" is marginally better
than "reject object target altogether".  But I could live with either.




Yeah, keeping it but rejecting update of an existing key is my 
preference too.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] large regression for parallel COPY

2016-04-05 Thread Robert Haas
On Wed, Mar 30, 2016 at 4:10 PM, Andres Freund  wrote:
> Indeed. On SSDs I see about a 25-35% gain, on HDDs about 5%. If I
> increase the size of backend_flush_after to 64 (like it's for bgwriter)
> I however do get about 15% for HDDs as well.

I tried the same test mentioned in the original post on cthulhu (EDB
machine, CentOS 7.2, 8 sockets, 8 cores per socket, 2 threads per
core, Xeon E7-8830 @ 2.13 GHz).  I attempted to test both the effects
of multi_extend_v21 and the *_flush_after settings.  The machine has
both HD and SSD, but I used HD for this test.

master, logged tables, 4 parallel copies: 1m15.411s, 1m14.248s, 1m15.040s
master, logged tables, 1 copy: 0m28.336s, 0m28.040s, 0m29.576s
multi_extend_v21, logged tables, 4 parallel copies: 0m46.058s,
0m44.515s, 0m45.688s
multi_extend_v21, logged tables, 1 copy: 0m28.440s, 0m28.129s, 0m30.698s
master, logged tables, 4 parallel copies,
{backend,bgwriter}_flush_after=0: 1m2.817s, 1m4.467s, 1m12.319s
multi_extend_v21, logged tables, 4 parallel copies,
{backend,bgwriter}_flush_after=0: 0m41.301s, 0m41.104s, 0m41.342s
master, logged tables, 1 copy, {backend,bgwriter}_flush_after=0:
0m26.948s, 0m26.829s, 0m26.616s

So the flushing is a small win with only 1 parallel copy, but with 4
parallel copies it's a significant loss.  However, the relation
extension patch reduces the regression significantly, probably because
it makes it far more likely that a backend doing a flush is flushing a
consecutive range of blocks all of which it added to the relation, so
that there is no interleaving.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Magnus Hagander
On Tue, Apr 5, 2016 at 7:58 PM, Robbie Harwood  wrote:

> > -#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
> > -/*
> > - * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
> > - * that contain the OIDs required. Redefine here, values copied
> > - * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
> > - */
> > -static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
> > -{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
> > -static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
> > -#endif
> > Regarding patch 0003 it may be fine to remove that... Robbie, do you
> > know how long ago this has been fixed upstream? I'd rather not have
> > this bit removed if this could impact some users.
>
> I double-checked with MIT, and we think it was fixed in 2003 in commit
> 4ce1f7c3a46485e342d3a68b4c60b76c196d1851 which can be viewed at
>
> https://github.com/krb5/krb5/commit/4ce1f7c3a46485e342d3a68b4c60b76c196d1851
> and the corresponding bug on their bugtracker was
> http://krbdev.mit.edu/rt/Ticket/Display.html?id=1666



That certainly looks like it fixes it. This was way too long ago for me to
remember which versions I was using at the time though.

It looks like it was already OK in the MSVC build back then, and only mingw
was broken. Which makes it even more reasonable that they might've fixed it
now - or a long time ago.

If it works on reasonably modern mingw, then I suggest pushing it to the
buildfarm and see what happens. But it definitely needs at least one round
of building on mingw..


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Alvaro Herrera
Robbie Harwood wrote:
> Alvaro Herrera  writes:

> > It seems to me that the right solution for this is to create a new
> > memory context which is a direct child of TopMemoryContext, so that
> > palloc can be used, and so that it can be reset separately, and that it
> > doesn't suffer from resets of other contexts.  (I think Michael's point
> > is that if those chunks were in a context of their own, you wouldn't
> > need the pfrees but a MemCxtReset would be enough.)
> 
> Hmm, that's also an option.  I read Michael's point as arguing for
> calloc()/free() rather than a new context, but I could be wrong.

Yeah, if you weren't using stringinfos that would be an option, but
since you are I think that's out of the question.

> A question, though: is it valuable for the context to be reset()able
> separately?  If there were more than just these two buffers going into
> it, I could see it being convenient - especially if it were for
> different encryption types, for instance - but it seems like it would be
> overkill?

If two buffers is all, I think retail pfrees are fine.  I haven't
actually read the patch.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread David Steele
On 4/5/16 1:25 AM, Michael Paquier wrote:

> Btw, those seem like small things to me, and my comments have been
> addressed, so I have switched the patch as ready for committer. I
> guess that Stephen would be the one to look at it.

I have also run this patch through my tests and didn't find any
problems.  I agree that it is ready for committer.

I don't see a problem with using the top memory context but whoever
commits it may feel differently.  I'm happy to leave that decision up to
them.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another small patch - reorderbuffer.c:1099

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 10:38 AM, Alvaro Herrera
 wrote:
> IMO the code is wrong.  There should be a single block comment saying
> something like "Remove the node from its containing list.  In the FOO
> case, the list corresponds to BAR and therefore we delete it because
> BAZ.  In the QUUX case the list is PLUGH and we delete in because THUD."
> Then a single line dlist_delete(...) follows.
>
> The current arrangement looks bizantine to me, for no reason.  If we
> think that one of the two branches might do something additional to the
> list deletion, surely that will be in a separate stanza with its own
> comment; and if we ever want to remove the list deletion from one of the
> two cases (something that strikes me as unlikely) then we will need to
> fix the comment, too.

+1 to everything here except for the way byzantine is spelled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-05 Thread Peter Geoghegan
On Tue, Apr 5, 2016 at 7:56 AM, Anastasia Lubennikova
 wrote:
> I would say, this is changed, but it doesn't matter.

Actually, I would now say that it hasn't really changed (see below),
based on my new understanding. The *choice* to go on one page or the
other still exists.

> Performing any search in btree (including choosing suitable page for
> insertion), we use only key attributes.
> We assume that included columns are stored in index unordered.

The patch assumes no ordering for the non-indexed columns in the
index? While I knew that the patch was primarily motivated by enabling
index-only scans, I didn't realize that at all. The patch is much much
less like a general suffix truncation patch than I thought. I may have
been confused in part by the high key issue that you only recently
fixed, but you should have corrected me about suffix truncation
earlier. Obviously, this was a significant misunderstanding; we have
been "talking at cross purposes" this whole time.

There seems to have been significant misunderstanding about this before now:

http://www.postgresql.org/message-id/cakjs1f9w0ab-g7h6yygnbq7hjsokf3uwhu7-q5jobbatyk9...@mail.gmail.com

My new understanding: The extra "included" columns are stored in the
index, but do not affect its sort order at all. They are no more part
of the key than, say, the heap TID that the key points to. They are
just "payload".

> "just matching on constrained attributes" is the core idea of the whole
> patch. Included columns just provide us possibility to use index-only scan.
> Nothing more. We assume use case where index-only-scan is faster than
> index-scan + heap fetch. For example, in queries like "select data from tbl
> where id = 1;" we have no scan condition on data. Maybe you afraid of long
> linear scan when we have enormous index bloat even on unique index. It will
> happen anyway, whether we have index-only scan on covering index or
> index-scan on unique index + heap fetch. The only difference is that the
> covering index is faster.

My concern about performance when that happens is very much secondary.
I really only mentioned it to help explain my primary concern.

> At the very beginning of the proposal discussion, I suggested implementing a
> third kind of column: not constrained, but usable in the scan key. Such
> columns must have an op class, and they are not truncated. But it was
> decided to abandon this feature.

I must have missed that. Obviously, I wasn't paying enough attention
to earlier discussion. Earlier versions of the patch did fail to
recognize that the sort order was not the entire indexed order, but
that isn't the case with V8. That that was ever possible was only a
bug, it turns out.

>> The more I think about it, the more I doubt that it's okay to not
>> ensure downlinks are always distinct with their siblings, by sometimes
>> including non-constrained (truncatable) attributes within internal
>> pages, as needed to *distinguish* downlinks (also, we must
>> occasionally have *all* attributes including truncatable attributes in
>> internal pages -- we must truncate nothing to keep the key space sane
>> in the parent).

> Frankly, I still do not understand what you're worried about.
> If high key is greater than the scan key, we definitely cannot find any more
> tuples, because key attributes are ordered.
> If high key is equal to the scan key, we will continue searching and read
> next page.

I thought, because of the emphasis on unique indexes, that this patch
was mostly to offer a way of getting an index with uniqueness only
enforced on certain columns, but otherwise just the same as having a
non-unique index on those same columns. Plus, some suffix truncation,
because point-lookups involving later attributes are unlikely to be
useful when this is scoped to just unique indexes (which were
emphasized by you), since truncating key columns is not helpful
unless bloat is terrible.

I now understand that it was quite wrong to link this to suffix
truncation at all. The two are really not the same. That does make the
patch seem significantly simpler, at least as far as nbtree goes; a
tool like amcheck is not likely to detect problems in this patch that
a human tester could not catch. That was the kind of problem that I
feared.

> The code is not changed here, it is the same as processing of duplicates
> spreading over several pages. If you do not trust postgresql btree changes
> to the L&Y to make duplicates work, I don't know what to say, but it's
> definitely not related to my patch.

My point about the postgres btree changes to L&Y to make duplicates
work is that I think it makes the patch work, but perhaps not
absolutely reliably. I don't have any specific misgivings about it on
its own. Again, my earlier remarks were based on a misguided
understanding of the patch, so it doesn't matter now.

Communication is hard. There may be a lesson here for both of us about that.

> Of course I do not mind if someone will do more testing.
> I

Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Robbie Harwood
Alvaro Herrera  writes:

> Robbie Harwood wrote:
>> Michael Paquier  writes:
>> 
>> > On Tue, Apr 5, 2016 at 9:06 AM, Robbie Harwood  wrote:
>> >> Here's v12, both here and on my github:
>> >> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt12
>
>> > So you are saving everything in the top memory context. I am fine to
>> > give the last word to a committer here but I would just go with
>> > calloc/free to simplify those hunks.
>> 
>> Yeah, it's definitely worth thinking/talking about; this came up in IRC
>> discussion as well.
>> 
>> If we go the memory context route, it has to be TopMemoryContext since
>> nothing else lives long enough (i.e., entire connection).
> [...]
>> It turns out that's not actually required, but could easily be made
>> explicit here.  According to the README for the memory context system,
>> pfree() and repalloc() do not require setting CurrentMemoryContext
>> (since 7.1).
>
> It seems to me that the right solution for this is to create a new
> memory context which is a direct child of TopMemoryContext, so that
> palloc can be used, and so that it can be reset separately, and that it
> doesn't suffer from resets of other contexts.  (I think Michael's point
> is that if those chunks were in a context of their own, you wouldn't
> need the pfrees but a MemCxtReset would be enough.)

Hmm, that's also an option.  I read Michael's point as arguing for
calloc()/free() rather than a new context, but I could be wrong.

A question, though: is it valuable for the context to be reset()able
separately?  If there were more than just these two buffers going into
it, I could see it being convenient - especially if it were for
different encryption types, for instance - but it seems like it would be
overkill?

This is all new to me so I may be entirely mistaken though.

>> Side note: it's really irritating to work with having this file under
>> version control because of how different it ends up being when I
>> autoreconf (which I need to do because I'm changing the build system).
>> (I'm also idly curious what system/autotools version is generating this
>> file because it doesn't match any that I tried.)
>
> We use stock autoconf from the GNU package and it definitely produces
> matching output for me.  Maybe your Linux distro includes a patched
> version?  I know Debian does, but I suppose you're using some Redhat
> thing, so no idea.

Hmm, that explains the Debian behavior I was seeing (it does the
above).  The Fedora one adds a couple blank lines in places but it's
much less... gratuitous... in its changes.


signature.asc
Description: PGP signature


Re: [HACKERS] oversight in parallel aggregate

2016-04-05 Thread Robert Haas
On Mon, Apr 4, 2016 at 9:09 PM, David Rowley
 wrote:
> On 5 April 2016 at 11:59, Robert Haas  wrote:
>> One of my EDB colleagues, while in the process of refactoring some
>> unrelated Advanced Server code, discovered that (1) there's no way to
>> mark an aggregate as anything other than parallel-unsafe but (2) it
>> doesn't matter because has_parallel_hazard ignores Aggrefs anyway.
>> These mistakes cancel each other out (sorta) if all of your aggregates
>> happen to be parallel-safe, but otherwise not so much.  Barring
>> objections, I intend to speedily apply the attached patch to fix this.
>
> Thanks for working on this. I should have noticed this myself...
>
> I had a quick look at this and I managed to make this happen:
>
> david=# create aggregate mysum(int) (sfunc=int4pl, combinefunc=int4pl,
> stype=int, parallel);
> server closed the connection unexpectedly
>
> I've attached a fix, which makes the code a bit simpler, and also
> in line with the other code in DefineAggregate().

Thanks.

> I think there were also a couple of missing syntax synopses in the docs.
> I've added those.

The first one was indeed needed, but the second syntax doesn't
actually work, so I took that back out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unused macros in src/include/access/transam.h

2016-04-05 Thread Fabrízio de Royes Mello
Hi all,

When dealing with some patch review I've noticed there are two macros that
are not used anywhere:

#define TransactionIdStore(xid, dest)   (*(dest) = (xid))
#define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] [COMMITTERS] pgsql: Move each SLRU's lwlocks to a separate tranche.

2016-04-05 Thread Robert Haas
On Fri, Mar 25, 2016 at 12:47 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Mar 25, 2016 at 9:48 AM, Tom Lane  wrote:
>>> Robert Haas  writes:
 It's stupid that we keep spending time and energy figuring out which
 shared memory data structures require alignment and which ones don't.
 Let's just align them *all* and be done with it.  The memory cost
 shouldn't be more than a few kB.
>
>>> I think such a proposal should come with a measurement, not just
>>> speculation about what it costs.
>
>> About 6kB with default settings.  See below.
>
> Sold, then.

Excellent.  Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-04-05 Thread Fabien COELHO



Please note that the checkpointer patch has two open items that perhaps
you can help with --- see https://wiki.postgresql.org/wiki/Open_Items


Indeed, I just looked at the commitfest, and I did not notice the other 
threads.


I do not have an OSX machine available, but I'll have a look at the other one.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 2:52 PM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> Now, let's suppose that the user sets up a sharded table and then
>> says: SELECT a, SUM(b), AVG(c) FROM sometab.  At this point, what we'd
>> like to have happen is that for each child foreign table, we go and
>> fetch partially aggregated results.  Those children might be running
>> any version of PostgreSQL - I was not assuming that we'd insist on
>> matching major versions, although of course that could be done - and
>> there would probably need to be a minimum version of PostgreSQL
>> anyway.  They could even be running some other database.  As long as
>> they can spit out partial aggregates in a format that we can
>> understand, we can deserialize those aggregates and run combine
>> functions on them.  But if the remote side is, say, MariaDB, it's
>> probably much easier to get it to spit out something that looks like a
>> PostgreSQL array than it is to make it spit out some bytea blob that's
>> in an entirely PostgreSQL-specific format.
>
> Basing parts of the Postgres sharding mechanism on FDWs sounds
> acceptable.  Trying to design things so that *any* FDW can be part of a
> shard, so that you have some shards in Postgres and other shards in
> MariaDB, seems ludicrous to me.  Down that path lies madness.

I'm doubtful that anyone wants to do the work to make that happen, but
I don't agree that we shouldn't care about whether it's possible.
Extensibility is a feature of the system that we've worked hard for,
and we shouldn't casually surrender it.  For example, postgres_fdw now
implements join pushdown, and I suspect few other FDW authors will
care to do the work to add similar support to their implementations.
But some may, and it's good that the code is structured in such a way
that they have the option.

Actually, though, MariaDB is a bad example.  What somebody's much more
likely to want to do is have PostgreSQL as a frontend accessing data
that's actually stored in Hadoop.  There are lots of SQL interfaces to
Hadoop already, so it's clearly a thing people want, and our SQL is
the best SQL (right?) so if you could put that on top of Hadoop
somebody'd probably like it.  I'm not planning to try it myself, but I
think it would be cool if somebody else did.  I have been very pleased
to see that many of the bits and pieces of infrastructure that I
created for parallel query (parallel background workers, DSM, shm_mq)
have attracted quite a bit of interest from other developers for
totally unrelated purposes, and I think anything we do around
horizontal scalability should be designed the same way: the goal
should be to work with PostgreSQL on the other side, but the bits that
can be made reusable for other purposes should be so constructed.

> In fact, trying to ensure cross-major-version compatibility already
> sounds like asking for too much.  Requiring matching major versions
> sounds like a perfectly acceptable restriction to me.

I disagree.  One of the motivations (not the only one, by any means)
for wanting logical replication in PostgreSQL is precisely to get
around the fact that physical replication requires matching major
versions.  That restriction turns out to be fairly onerous, not least
because when you've got a cluster of several machines you'd rather
upgrade them one at a time rather than all at once.  That's going to
be even more true with a sharded cluster, which will probably tend to
involve more machines than a replication cluster, maybe a lot more.
If you say that the user has got to shut the entire thing down,
upgrade all the machines, and turn it all back on again, and just hope
it works, that's going to be really painful.  I think that we should
treat this more like we do with libpq, where each major release can
add new capabilities that new applications can use, but the old stuff
has got to keep working essentially forever.  Maybe the requirements
here are not quite so tight, because it's probably reasonable to say,
e.g. that you must upgrade every machine in the cluster to at least
release 11.1 before upgrading any machine to 11.3 or higher, but the
fewer such requirements we have, the better.  Getting people to
upgrade to new major releases is already fairly hard, and anything
that makes it harder is an unwelcome development from my point of
view.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-05 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:
> OK, thanks for the clarification. Here's the earlier patch, but with
> the relevant added docs and tests retained.

I'd like to add indexes and materialized views to the set of objects
covered (functions and triggers).  I'm already doing that, so no need to
resubmit; it should be a pretty easy addition.  I've done a few minor
additional changes too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Tom Lane
Andrew Dunstan  writes:
> On 04/05/2016 12:42 PM, Teodor Sigaev wrote:
>> I'm agree about covering this case by tests, but I think it should be 
>> allowed.
>> In this case it will work exactly as jsbonb_set

> It seems to me a violation of POLA to allow something called "insert" to 
> do a "replace" instead.

I agree with Andrew's point here.  If the target is an array it would
never replace an entry, so why would it do so for an object?

I think there is potentially some use-case for insert-only semantics
for an object target, if you want to be sure you're not overwriting
data.  So I think "throw an error on duplicate key" is marginally better
than "reject object target altogether".  But I could live with either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Endless loop calling PL/Python set returning functions

2016-04-05 Thread Tom Lane
Alexey Grishchenko  writes:
> Any comments on this patch?

I felt that this was more nearly a bug fix than a new feature, so I picked
it up even though it's nominally in the next commitfest not the current
one.  I did not like the code too much as it stood: you were not being
paranoid enough about ensuring that the callstack data structure stayed
in sync with the actual control flow.  Also, it didn't work for functions
that modify their argument values (cf the committed regression tests);
you have to explicitly save the named arguments, not only the "args" list,
and you have to do it for SRF suspend/resume, not just for recursion.
But I cleaned all that up and committed it.
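
As an illustration of what "functions that modify their argument values"
means here (the function below is invented for this note, not one of the
committed regression tests, and assumes the plpythonu language is installed):

    CREATE FUNCTION bump(a integer) RETURNS integer AS $$
    # the named argument is rebound inside the function body,
    # which is exactly the case that needs explicit save/restore
    a = a + 1
    return a
    $$ LANGUAGE plpythonu;

    SELECT bump(41);   -- 42, and it must stay 42 across nested or repeated calls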

> triggers are a bit different - they depend on modifying the global "TD"
> dictionary inside the Python function, and they return only the status
> string. For them, there is no option of modifying the code to avoid global
> input parameters without breaking the backward compatibility with the old
> enduser code.

Yeah.  It might be worth the trouble to include triggers in the
save/restore logic, since at least in principle they can be invoked
recursively; but there's not that much practical use for such cases.
I didn't bother with that in the patch as-committed, but if you want
to follow up with an adjustment for it, I'd take a look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Alvaro Herrera
Robert Haas wrote:

> Now, let's suppose that the user sets up a sharded table and then
> says: SELECT a, SUM(b), AVG(c) FROM sometab.  At this point, what we'd
> like to have happen is that for each child foreign table, we go and
> fetch partially aggregated results.  Those children might be running
> any version of PostgreSQL - I was not assuming that we'd insist on
> matching major versions, although of course that could be done - and
> there would probably need to be a minimum version of PostgreSQL
> anyway.  They could even be running some other database.  As long as
> they can spit out partial aggregates in a format that we can
> understand, we can deserialize those aggregates and run combine
> functions on them.  But if the remote side is, say, MariaDB, it's
> probably much easier to get it to spit out something that looks like a
> PostgreSQL array than it is to make it spit out some bytea blob that's
> in an entirely PostgreSQL-specific format.

Basing parts of the Postgres sharding mechanism on FDWs sounds
acceptable.  Trying to design things so that *any* FDW can be part of a
shard, so that you have some shards in Postgres and other shards in
MariaDB, seems ludicrous to me.  Down that path lies madness.

In fact, trying to ensure cross-major-version compatibility already
sounds like asking for too much.  Requiring matching major versions
sounds like a perfectly acceptable restriction to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Alvaro Herrera
Robbie Harwood wrote:
> Michael Paquier  writes:
> 
> > On Tue, Apr 5, 2016 at 9:06 AM, Robbie Harwood  wrote:
> >> Here's v12, both here and on my github:
> >> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt12

> > So you are saving everything in the top memory context. I am fine to
> > give the last word to a committer here but I would just go with
> > calloc/free to simplify those hunks.
> 
> Yeah, it's definitely worth thinking/talking about; this came up in IRC
> discussion as well.
> 
> If we go the memory context route, it has to be TopMemoryContext since
> nothing else lives long enough (i.e., entire connection).
[...]
> It turns out that's not actually required, but could easily be made
> explicit here.  According to the README for the memory context system,
> pfree() and repalloc() do not require setting CurrentMemoryContext
> (since 7.1).

It seems to me that the right solution for this is to create a new
memory context which is a direct child of TopMemoryContext, so that
palloc can be used, and so that it can be reset separately, and that it
doesn't suffer from resets of other contexts.  (I think Michael's point
is that if those chunks were it a context of their own, you wouldn't
need the pfrees but a MemCxtReset would be enough.)

> Side note: it's really irritating to work with having this file under
> version control because of how different it ends up being when I
> autoreconf (which I need to do because I'm changing the build system).
> (I'm also idly curious what system/autotools version is generating this
> file because it doesn't match any that I tried.)

We use stock autoconf from the GNU package and it definitely produces
matching output for me.  Maybe your Linux distro includes a patched
version?  I know Debian does, but I suppose you're using some Redhat
thing, so no idea.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 1:04 PM, Andres Freund  wrote:
> On 2016-04-05 12:14:35 -0400, Robert Haas wrote:
>> On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
>> > On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
>> >> This fluctuation started appearing after commit 6150a1b0 which we have
>> >> discussed in another thread [1] and a colleague of mine is working on to
>> >> write a patch to try to revert it on current HEAD and then see the 
>> >> results.
>> >
>> > I don't see what that buys us. That commit is a good win on x86...
>>
>> Maybe.  But I wouldn't be surprised to find out that that is an
>> overgeneralization.  Based on some results Mithun Cy showed me this
>> morning, I think that some of this enormous run-to-run fluctuation
>> that we're seeing is due to NUMA effects.  So some runs we get two
>> things that are frequently accessed together on the same NUMA node and
>> other times they get placed on different NUMA nodes and then
>> everything sucks.  I don't think we fully understand what's going on
>> here yet - and I think we're committing changes in this area awfully
>> quickly - but I see no reason to believe that x86 is immune to such
>> effects.  They may just happen in different scenarios than what we see
>> on POWER.
>
> I'm not really following - we were talking about 6150a1b0 ("Move buffer
> I/O and content LWLocks out of the main tranche.") made four months
> ago. Afaics the atomic buffer pin patch is a pretty clear win on both
> ppc and x86?

The point is that the testing Amit's team is doing can't tell the
answer to that question one way or another.  6150a1b0 completely
destabilized performance on our test systems to the point where
testing subsequent patches is extremely difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 10:19 AM, Robert Haas  wrote:
> I'm going to concede the point that this shouldn't really be a
> priority for 9.6, but I might want to come back to it later.

It seems to me that if two aggregates are using the same transition
function, they ought to also be using the same combine, serialization,
and deserialization functions.  I wrote a query to find cases where
that wasn't so and found a few, which I changed before committing.

DATA(insert ( 2100  n 0 int8_avg_accum  numeric_poly_avg
  int8_avg_combine  int8_avg_serialize  int8_avg_deserialize
  int8_avg_accum  int8_avg_accum_inv  numeric_poly_avg  f f 0  2281
  17  48  2281  48  _null_ _null_ ));
DATA(insert ( 2107  n 0 int8_avg_accum  numeric_poly_sum
  numeric_poly_combine  int8_avg_serialize  int8_avg_deserialize
  int8_avg_accum  int8_avg_accum_inv  numeric_poly_sum  f f 0  2281
  17  48  2281  48  _null_ _null_ ));

I changed the second of these from numeric_poly_combine to int8_avg_combine.

DATA(insert ( 2103  n 0 numeric_avg_accum  numeric_avg
  numeric_avg_combine  numeric_avg_serialize  numeric_avg_deserialize
  numeric_avg_accum  numeric_accum_inv  numeric_avg  f f 0  2281
  17  128  2281  128  _null_ _null_ ));
DATA(insert ( 2114  n 0 numeric_avg_accum  numeric_sum
  numeric_combine  numeric_avg_serialize  numeric_avg_deserialize
  numeric_avg_accum  numeric_accum_inv  numeric_sum  f f 0  2281
  17  128  2281  128  _null_ _null_ ));

I changed the second of these from numeric_combine to
numeric_avg_combine.  I also added a regression test for this.  Please
let me know if I'm off-base here.
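
A query along these lines (not necessarily the one actually used here) finds
aggregates that share a transition function but disagree on the combine,
serialization, or deserialization function:

    SELECT a1.aggfnoid AS agg1, a2.aggfnoid AS agg2
    FROM pg_aggregate a1
    JOIN pg_aggregate a2 ON a1.aggtransfn = a2.aggtransfn
    WHERE a1.aggfnoid < a2.aggfnoid
      AND (a1.aggcombinefn <> a2.aggcombinefn
           OR a1.aggserialfn <> a2.aggserialfn
           OR a1.aggdeserialfn <> a2.aggdeserialfn);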

Committed 0002+0003 with those changes, some minor cosmetic stuff, and
of course the obligatory catversion bump.  Oh, and fixed an OID
conflict with the patch Magnus just committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Choosing parallel_degree

2016-04-05 Thread Julien Rouhaud
On 05/04/2016 06:19, Amit Kapila wrote:
>
> Few more comments:
> 
> 1.
> @@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } |
> UNLOGGED ] TABLE [ IF NOT EXI
> 
> 
>  
> 
> +parallel_degree (integer)
> +
> + 
> 
> + Sets the degree of parallelism for an individual relation.  The
> requested
> + number of workers will be 
> limited by  + linkend="guc-max-parallel-degree">.
> + 
> +
> +   
> 
> All other parameters in this category are supportted by Alter table
> command as well, so I think this parameter should also be supported by
> Alter Table command (for both SET and RESET variants).
>

I don't quite understand.  With the patch you can use parallel_degree in
either CREATE TABLE or ALTER TABLE (SET and RESET) statements, as shown
below.  As for the documentation, the list of storage parameters only appears
in create_table.sgml, with alter_table.sgml pointing to it.
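
To illustrate what the patch enables (the table name is made up, and the
reloption name and behaviour are as proposed in this thread rather than
anything committed):

    CREATE TABLE measurements (ts timestamptz, val float8)
        WITH (parallel_degree = 4);

    ALTER TABLE measurements SET (parallel_degree = 8);
    ALTER TABLE measurements RESET (parallel_degree);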

In alter_table.sgml, I didn't mention the lock level needed to modify
parallel_degree, since it requires an access exclusive lock for now.
Thinking about it some more, I believe a share update exclusive lock would
be safe, but I may be wrong.  What do you think?

> 2.
> +"Number of parallel processes per executor node wanted for this relation.",
> 
> How about
> Number of parallel processes that can be used for this relation per
> executor node.
> 

I just rephrased what was used for the max_parallel_degree GUC, which is:

"Sets the maximum number of parallel processes per executor node."

I find your version better once again, but should we keep some
consistency between them, or is that not important?

> 3.
> -if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
> +if (rel->pages < 
> parallel_threshold && rel->rel_parallel_degree == -1 &&
> +rel->reloptkind == RELOPT_BASEREL)
> 
> A. Second line should be indented with the begin of first line after
> bracket '(' which means with rel->pages.  Refer multiline condition in
> near by code.  Or you can run pgindent.

I ran pgindent, fixed.

> B. The comment above this condition needs slight adjustment as per new
> condition.
> 

Also fixed.

> 4.
> +intparallel_degree; /* max number of parallel worker */
>  } StdRdOptions;
> 
> Typo in comments
> /worker/workers
>

fixed.

I'll send an updated patch once I know what to do about the first two
points.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-05 Thread Magnus Hagander
On Tue, Apr 5, 2016 at 5:57 PM, Amit Kapila  wrote:

> On Tue, Apr 5, 2016 at 5:35 PM, Magnus Hagander 
> wrote:
>
>>
>>
>> On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila 
>> wrote:
>>
>>> On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander 
>>> wrote:
>>>
 On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila 
 wrote:


> Also, I think below part of documentation for pg_start_backup() needs
> to be modified:
>
> 
>
> pg_start_backup accepts an
>
> arbitrary user-defined label for the backup.  (Typically this
> would be
>
> the name under which the backup dump file will be stored.)  The
> function
>
> writes a backup label file (backup_label) and, if
> there
>
> are any links in the pg_tblspc/ directory, a
> tablespace map
>
> file (tablespace_map) into the database cluster's
> data
>
> directory, performs a checkpoint, and then returns the backup's
> starting
>
> transaction log location as text.  The user can ignore this result
> value,
>
> but it is provided in case it is useful.
>

 That one definitely needs to be fixed, as it's part of the reference.
 Good spot.



> Similarly, there is a description for pg_stop_backup which needs to be
> modified.
>

 That's the one you're referring to in your first commend above, is it
 not? Or is there one more that you mean?


>>> I am referring to below part of docs in func.sgml
>>>
>>> 
>>>
>>> pg_stop_backup removes the label file and, if it
>>> exists,
>>>
>>> the tablespace_map file created by
>>>
>>> pg_start_backup, and creates a backup history file in
>>>
>>> the transaction log archive area.  The history file includes the
>>> label given to
>>>
>>> pg_start_backup, the starting and ending transaction
>>> log locations for
>>>
>>> the backup, and the starting and ending times of the backup.  The
>>> return
>>>
>>> value is the backup's ending transaction log location (which again
>>>
>>> can be ignored).  After recording the ending location, the current
>>>
>>> transaction log insertion
>>>
>>> point is automatically advanced to the next transaction log file, so
>>> that the
>>>
>>> ending transaction log file can be archived immediately to complete
>>> the backup.
>>>
>>>  
>>>
>>>
>> Ah, gotcha. Clearly I wasn't paying attention properly.
>>
>> PFA a better one (I think), also with the rename and added comments that
>> David was asking for.
>>
>>
> This version looks good.  Apart from review, I have done some tests
> (including regression test), everything seems to be fine.
>
>
I've pushed this version, and have also added the item from the Brussels
developer meeting about rewriting the main backup docs to the open items
list, so it is definitely not forgotten for 9.6.
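
For anyone following along, a rough sketch of how the new non-exclusive mode
is used (assuming the signatures from the committed patch: a third
"exclusive" argument to pg_start_backup, and pg_stop_backup(false) returning
lsn, labelfile and spcmapfile; the session must stay connected between the
two calls):

    -- third argument false selects the new non-exclusive mode
    SELECT pg_start_backup('nightly', false, false);
    -- ... copy the data directory with an external tool ...
    SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);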

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-04-05 Thread Robbie Harwood
Michael Paquier  writes:

> On Tue, Apr 5, 2016 at 9:06 AM, Robbie Harwood  wrote:
>> Here's v12, both here and on my github:
>> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt12
>>
> +#ifdef ENABLE_GSS
> +   {
> +   MemoryContext save = CurrentMemoryContext;
> +   CurrentMemoryContext = TopMemoryContext;
> +
> +   initStringInfo(&port->gss->buf);
> +   initStringInfo(&port->gss->writebuf);
> +
> +   CurrentMemoryContext = save;
> +   }
> +#endif
>
> So you are saving everything in the top memory context. I am fine to
> give the last word to a committer here but I would just go with
> calloc/free to simplify those hunks.

Yeah, it's definitely worth thinking/talking about; this came up in IRC
discussion as well.

If we go the memory context route, it has to be TopMemoryContext since
nothing else lives long enough (i.e., entire connection).

On the other hand, if we go with calloc/free here, I can't use
StringInfo for these buffers because StringInfo uses palloc.  Using
StringInfo is really handy; if I don't use StringInfo, I'd need to
reimplement enlargeStringInfo() for the ad-hoc buffer structure.  What
I'm trying to articulate is that it'd simplify the initialization code
since it removes MemoryContext management, but it makes everything else
more complicated.

I prefer the StringInfo approach since I think the abstraction is a good
one, but if you've got a case for explicit calloc()/free() in light of
that, I'd be interested to hear it.

> +#ifdef ENABLE_GSS
> +   if (conn->gss->buf.data)
> +   pfree(conn->gss->buf.data);
> +   if (conn->gss->writebuf.data)
> +   pfree(conn->gss->writebuf.data);
> +#endif
>
> This should happen in its own memory context, no

It turns out that's not actually required, but could easily be made
explicit here.  According to the README for the memory context system,
pfree() and repalloc() do not require setting CurrentMemoryContext
(since 7.1).

I'm not opposed to making this one explicit if you think it adds
clarity.  I would not want to make all the calls to StringInfo functions
explicit just because they can call repalloc, though.

> @@ -775,6 +775,7 @@ infodir
>  docdir
>  oldincludedir
>  includedir
> +runstatedir
>  localstatedir
>  sharedstatedir
>  sysconfdir
> @@ -896,6 +897,7 @@ datadir='${datarootdir}'
>  sysconfdir='${prefix}/etc'
>  sharedstatedir='${prefix}/com'
>  localstatedir='${prefix}/var'
> +runstatedir='${localstatedir}/run'
> What's that? I would recommend re-running autoconf to remove this
> portion (committers do it at the end btw, so that's actually no bug
> deal).

Well, I guess /that/ mistake finally happened.  This is what happens
when I run autoreconf on my VMs.  Because configure is checked in to
version control, I accidentally committed the whole thing instead of
just the GSSAPI changes.  I can back that out.

Side note: it's really irritating to work with having this file under
version control because of how different it ends up being when I
autoreconf (which I need to do because I'm changing the build system).
(I'm also idly curious what system/autotools version is generating this
file because it doesn't match any that I tried.)

> -#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
> -/*
> - * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
> - * that contain the OIDs required. Redefine here, values copied
> - * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
> - */
> -static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
> -{10, (void *) "\x2a\x86\x48\x86\xf7\x12\x01\x02\x01\x02"};
> -static GSS_DLLIMP gss_OID GSS_C_NT_USER_NAME = &GSS_C_NT_USER_NAME_desc;
> -#endif
> Regarding patch 0003 it may be fine to remove that... Robbie, do you
> know how long ago this has been fixed upstream? I'd rather not have
> this bit removed if this could impact some users.

I double-checked with MIT, and we think it was fixed in 2003 in commit
4ce1f7c3a46485e342d3a68b4c60b76c196d1851 which can be viewed at
https://github.com/krb5/krb5/commit/4ce1f7c3a46485e342d3a68b4c60b76c196d1851
and the corresponding bug on their bugtracker was
http://krbdev.mit.edu/rt/Ticket/Display.html?id=1666

In terms of release versions, this was first included in krb5-1.4.  For
reference, the oldest Kerberos that MIT upstream supports is 1.12; the
oldest Kerberos that I (maintainer for Red Hat) support in production
(i.e., RHEL-5) is 1.6.1, and even that is slated to go away early 2017.
To my knowledge no one is supporting an older version than that.

In the interest of completion, MIT did express concern that this might
not have been a bug in MIT at all, but rather a toolchain issue.  This
workaround has been present since the introduction of GSSAPI auth
support back in 2007.  I pinged Magnus on IRC, but I think I missed the
awake window for Sweden, so he's on CC as well.

> On 0003, +1 for reading the whole stack of messages. That's definitely
> worth a separate patch.

Okay!  It

Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-05 Thread Abhijit Menon-Sen
OK, thanks for the clarification. Here's the earlier patch, but with
the relevant added docs and tests retained.

-- Abhijit
>From dfb6ded15246ec65cc911864bfcff285eef1c4d4 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Tue, 5 Apr 2016 11:55:09 +0530
Subject: =?UTF-8?q?Implement=20"ALTER=20=E2=80=A6=20DEPENDS=20ON=20EXTENSI?=
 =?UTF-8?q?ON=20=E2=80=A6"=20for=20functions=20and=20triggers?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A new dependency type, DEPENDENCY_AUTO_EXTENSION, indicates that an
object depends on an extension (but doesn't belong to it, so pg_dump
should continue to dump it) and should be dropped when the extension
itself is dropped. A new statement type, AlterObjectDependsStmt, is
used to declare such dependencies.

Includes documentation and tests.
---
 doc/src/sgml/catalogs.sgml | 13 ++
 doc/src/sgml/ref/alter_function.sgml   | 20 +
 doc/src/sgml/ref/alter_trigger.sgml| 16 +++
 src/backend/catalog/dependency.c   |  2 +
 src/backend/catalog/objectaddress.c| 25 +++
 src/backend/commands/alter.c   | 32 ++
 src/backend/nodes/copyfuncs.c  | 17 
 src/backend/nodes/equalfuncs.c | 15 +++
 src/backend/parser/gram.y  | 36 +++-
 src/backend/tcop/utility.c | 27 
 src/include/catalog/dependency.h   |  9 +++-
 src/include/catalog/objectaddress.h|  4 ++
 src/include/commands/alter.h   |  1 +
 src/include/nodes/nodes.h  |  1 +
 src/include/nodes/parsenodes.h | 15 +++
 src/include/parser/kwlist.h|  1 +
 src/test/modules/test_extensions/Makefile  |  2 +-
 .../test_extensions/expected/test_extdepend.out| 50 ++
 .../modules/test_extensions/sql/test_extdepend.sql | 32 ++
 src/tools/pgindent/typedefs.list   |  1 +
 20 files changed, 315 insertions(+), 4 deletions(-)
 create mode 100644 src/test/modules/test_extensions/expected/test_extdepend.out
 create mode 100644 src/test/modules/test_extensions/sql/test_extdepend.sql

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index bb75229..3c128a0 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2888,6 +2888,19 @@
   
  
 
+
+
+ DEPENDENCY_AUTO_EXTENSION (x)
+ 
+  
+   The dependent object is not a member of the extension that is the
+   referenced object (and so should not be ignored by pg_dump), but
+   cannot function without it and should be dropped when the
+   extension itself is. The dependent object may be dropped on its
+   own as well.
+  
+ 
+

 
Other dependency flavors might be needed in future.
diff --git a/doc/src/sgml/ref/alter_function.sgml b/doc/src/sgml/ref/alter_function.sgml
index f40363e..1ef905f 100644
--- a/doc/src/sgml/ref/alter_function.sgml
+++ b/doc/src/sgml/ref/alter_function.sgml
@@ -29,6 +29,8 @@ ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
 SET SCHEMA new_schema
+ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+DEPENDS ON EXTENSION extension_name
 
 where action is one of:
 
@@ -148,6 +150,15 @@ ALTER FUNCTION name ( [ [ extension_name
+
+ 
+  The name of an extension that the function depends on.
+ 
+
+   
+
 
  CALLED ON NULL INPUT
  RETURNS NULL ON NULL INPUT
@@ -300,6 +311,15 @@ ALTER FUNCTION sqrt(integer) SET SCHEMA maths;
   
 
   
+   To mark the function sqrt for type
+   integer as being dependent on the extension
+   mathlib:
+
+ALTER FUNCTION sqrt(integer) DEPENDS ON EXTENSION mathlib;
+
+  
+
+  
To adjust the search path that is automatically set for a function:
 
 ALTER FUNCTION check_password(text) SET search_path = admin, pg_temp;
diff --git a/doc/src/sgml/ref/alter_trigger.sgml b/doc/src/sgml/ref/alter_trigger.sgml
index c63b5df..37c4d03 100644
--- a/doc/src/sgml/ref/alter_trigger.sgml
+++ b/doc/src/sgml/ref/alter_trigger.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 ALTER TRIGGER name ON table_name RENAME TO new_name
+ALTER TRIGGER name ON table_name DEPENDS ON EXTENSION extension_name
 
  
 
@@ -70,6 +71,15 @@ ALTER TRIGGER name ON 
 

+
+   
+extension_name
+
+ 
+  The name of an extension that the trigger depends on.
+ 
+
+   
   
  
 
@@ -93,6 +103,12 @@ ALTER TRIGGER name ON 
 ALTER TRIGGER emp_stamp ON emp RENAME TO emp_track_chgs;
 
+
+  
+   To mark a trigger as being dependent on an extension:
+
+ALTER TRIGGER emp_stamp ON emp DEPENDS ON EXTENSION emplib;
+
  
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 17f9de1..79595a9 1006

Re: [HACKERS] LOCK TABLE .. DEFERRABLE

2016-04-05 Thread Rod Taylor
On Tue, Apr 5, 2016 at 1:10 PM, Simon Riggs  wrote:

> If a lock is successfully obtained on one table, but not on all tables, it
>> releases that lock and will retry to get them as a group in the future.
>> Since inheritance acts as a group of tables (top + recursive cascade to
>> children), this implementation is necessary even if only a single table is
>> specified in the command.
>>
>
> I'd prefer to see this as a lock wait mode where it sits in the normal
> lock queue BUT other lock requestors are allowed to queue jump past it.
> That should be just a few lines changed in the lock conflict checker and
> some sleight of hand in the lock queue code.
>
> That way we avoid the busy-wait loop and multiple DEFERRABLE lock waiters
> queue up normally.
>

Yeah, that would be better. I can see how to handle a single structure in
that way but I'm not at all certain how to handle multiple tables and
inheritance is multiple tables even with a single command.

X1 inherits from X

There is a long-running task on X1.

Someone requests LOCK TABLE X IN ACCESS EXCLUSIVE MODE WAIT PATIENTLY.
Internally this also grabs X1.

The lock on X might be granted immediately and now blocks all other access
to that table.

There would need be a Locking Group kind of thing so various LockTags are
treated as a single entity to grant them simultaneously. That seems pretty
invasive; at least I don't see anything like that today.


Re: [HACKERS] LOCK TABLE .. DEFERRABLE

2016-04-05 Thread Andres Freund
On 2016-04-05 18:10:11 +0100, Simon Riggs wrote:
> I'd prefer to see this as a lock wait mode where it sits in the normal lock
> queue BUT other lock requestors are allowed to queue jump past it. That
> should be just a few lines changed in the lock conflict checker and some
> sleight of hand in the lock queue code.

+1, although wading into deadlock.c makes one need a shower.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK TABLE .. DEFERRABLE

2016-04-05 Thread Simon Riggs
On 5 April 2016 at 17:43, Rod Taylor  wrote:

> The intention of this feature is to give the ability to slip into a normal
> workload for non-urgent maintenance work. In essence, instead of lock
> waiters being in a Queue, DEFERRABLE causes the current lock statement to
> always be last.
>

Good idea; this was on my list of things to implement. I was going to call
it WAIT PATIENTLY option.


> It was discussed at last years pgCon as useful for replication tools
> adding/removing triggers. I've also seen more than one plpgsql loop using
> subtransactions and LOCK TABLE .. NOWAIT to achieve a similar effect. IMO,
> it's much cleaner built in.
>

Agreed, but your implementation is essentially just the same looping
concept, which I don't much like.


> If a lock is successfully obtained on one table, but not on all tables, it
> releases that lock and will retry to get them as a group in the future.
> Since inheritance acts as a group of tables (top + recursive cascade to
> children), this implementation is necessary even if only a single table is
> specified in the command.
>

I'd prefer to see this as a lock wait mode where it sits in the normal lock
queue BUT other lock requestors are allowed to queue jump past it. That
should be just a few lines changed in the lock conflict checker and some
sleight of hand in the lock queue code.

That way we avoid the busy-wait loop and multiple DEFERRABLE lock waiters
queue up normally.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Andrew Dunstan



On 04/05/2016 12:42 PM, Teodor Sigaev wrote:
I've been asked to look at and comment on the SQL API of the feature. I think
it's basically sound, although there is one thing that's not clear from the
regression tests: what happens if we're inserting into an object and the key
already exists? e.g.:

select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');

I think this should be forbidden, i.e. the function shouldn't ever overwrite an
existing value. If that's not handled it should be, and either way there should
be a regression test for it.


I agree about covering this case with tests, but I think it should be allowed.
In this case it will work exactly as jsonb_set does.



It seems to me a violation of POLA to allow something called "insert" to 
do a "replace" instead.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-04-05 Thread Peter Geoghegan
On Tue, Apr 5, 2016 at 10:02 AM, Robert Haas  wrote:
> So the first thing here is that the patch seems to be a clear win in
> this test.  For a single copy, it seems to be pretty much a wash.
> When running 4 copies in parallel, it is about 20-25% faster with both
> logged and unlogged tables.  The second thing that is interesting is
> that we are getting super-linear scalability even without the patch:
> if 1 copy takes 20 seconds, you might expect 4 to take 80 seconds, but
> it really takes 60 unpatched or 45 patched.  If 1 copy takes 30
> seconds, you might expect 4 to take 120 seconds, but in really takes
> 105 unpatched or 80 patched.  So we're not actually I/O constrained on
> this test, I think, perhaps because this machine has an SSD.

It's not unusual for COPY to not be I/O constrained, I believe.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund
On 2016-04-05 12:14:35 -0400, Robert Haas wrote:
> On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
> > On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> >> This fluctuation started appearing after commit 6150a1b0 which we have
> >> discussed in another thread [1] and a colleague of mine is working on to
> >> write a patch to try to revert it on current HEAD and then see the results.
> >
> > I don't see what that buys us. That commit is a good win on x86...
> 
> Maybe.  But I wouldn't be surprised to find out that that is an
> overgeneralization.  Based on some results Mithun Cy showed me this
> morning, I think that some of this enormous run-to-run fluctuation
> that we're seeing is due to NUMA effects.  So some runs we get two
> things that are frequently accessed together on the same NUMA node and
> other times they get placed on different NUMA nodes and then
> everything sucks.  I don't think we fully understand what's going on
> here yet - and I think we're committing changes in this area awfully
> quickly - but I see no reason to believe that x86 is immune to such
> effects.  They may just happen in different scenarios than what we see
> on POWER.

I'm not really following - we were talking about 6150a1b0 ("Move buffer
I/O and content LWLocks out of the main tranche.") made four months
ago. Afaics the atomic buffer pin patch is a pretty clear win on both
ppc and x86?

I agree that there's numa effects we don't understand.  I think
re-considering Kevin's patch that did explicit numa hinting on linux
would be rather worthwhile. It makes a lot of sense to force shmem to be
force-interleaved, but make backend local memory, uh, local.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-04-05 Thread Robert Haas
On Thu, Mar 31, 2016 at 9:03 AM, Dilip Kumar  wrote:
> If you need some more information please let me know ?

I repeated the testing described in
http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
on a MacBook Pro (OS X 10.8.5, 2.4 GHz Intel Core i7, 8GB, early 2013)
and got the following results.  Note that I did not adjust
*_flush_delay in this test because that's always 0, apparently, on
MacOS.

master, unlogged tables, 1 copy: 0m18.928s, 0m20.276s, 0m18.040s
patched, unlogged tables, 1 copy: 0m20.499s, 0m20.879s, 0m18.912s
master, unlogged tables, 4 parallel copies: 0m57.301s, 0m58.045s, 0m57.556s
patched, unlogged tables, 4 parallel copies: 0m47.994s, 0m45.586s, 0m44.440s

master, logged tables, 1 copy: 0m29.353s, 0m29.693s, 0m31.840s
patched, logged tables, 1 copy: 0m30.837s, 0m31.567s, 0m36.843s
master, logged tables, 4 parallel copies: 1m45.691s, 1m53.085s, 1m35.674s
patched, logged tables, 4 parallel copies: 1m21.137s, 1m20.678s, 1m22.419s

So the first thing here is that the patch seems to be a clear win in
this test.  For a single copy, it seems to be pretty much a wash.
When running 4 copies in parallel, it is about 20-25% faster with both
logged and unlogged tables.  The second thing that is interesting is
that we are getting super-linear scalability even without the patch:
if 1 copy takes 20 seconds, you might expect 4 to take 80 seconds, but
it really takes 60 unpatched or 45 patched.  If 1 copy takes 30
seconds, you might expect 4 to take 120 seconds, but it really takes
105 unpatched or 80 patched.  So we're not actually I/O constrained on
this test, I think, perhaps because this machine has an SSD.
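
For context, each of the parallel "copies" in a test like this is essentially
just a psql session doing something along the following lines (the table and
file names are illustrative, not the ones from the referenced test):

    CREATE UNLOGGED TABLE copy_test_1 (id int, payload text);
    \timing on
    COPY copy_test_1 FROM '/tmp/copy_test.csv' WITH (FORMAT csv);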

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Alvaro Herrera
Andrew Dunstan wrote:

> I haven't been following this thread due to pressure of time, so my
> apologies in advance if these comments have already been covered.
> 
> I've been asked to look at and comment on the SQL API of the feature. I
> think it's basically sound, although there is one thing that's not clear
> from the regression tests: what happens if we're inserting into an object
> and the key already exists? e.g.:
> 
> select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');

It has been covered: Petr said, and I agree with him, that this new
interface is for arrays, not objects, and so the above example should be
rejected altogether.  For objects we already have jsonb_set(), so what
is the point of having this work there?  It feels too much like
TIMTOWTDI, which isn't a principle we adhere to.

I think it'd be much more sensible if we were just to make this function
work on arrays only.  There, the solution to Andrew's question is
trivial: if you insert a value in a position that's already occupied,
the elements to its right are "shifted" one position upwards in the
array (this is why this is an "insert" operation and not "replace" or
"set").

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LOCK TABLE .. DEFERRABLE

2016-04-05 Thread Rod Taylor
The intention of this feature is to give the ability to slip into a normal
workload for non-urgent maintenance work. In essence, instead of lock
waiters being in a Queue, DEFERRABLE causes the current lock statement to
always be last. It was discussed at last years pgCon as useful for
replication tools adding/removing triggers. I've also seen more than one
plpgsql loop using subtransactions and LOCK TABLE .. NOWAIT to achieve a
similar effect. IMO, it's much cleaner built in.
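
For comparison, the plpgsql workaround mentioned above usually looks
something like this (the table name and sleep interval are placeholders):

    DO $$
    BEGIN
      LOOP
        BEGIN
          LOCK TABLE some_table IN ACCESS EXCLUSIVE MODE NOWAIT;
          EXIT;                           -- lock acquired, stop retrying
        EXCEPTION WHEN lock_not_available THEN
          PERFORM pg_sleep(1);            -- back off, then try again
        END;
      END LOOP;
      -- ... do the maintenance work here while the lock is held ...
    END $$;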


If a lock is successfully obtained on one table, but not on all tables, it
releases that lock and will retry to get them as a group in the future.
Since inheritance acts as a group of tables (top + recursive cascade to
children), this implementation is necessary even if only a single table is
specified in the command.


Like various CONCURRENT commands, it waits on a set of transactions which
were found to be blocking it. This puts it into the "waiting" state and
allows isolation testing to work as expected. I started with a simple loop
with a timer (and a GUC) but it didn't feel right without pg_stat_activity
showing the waiting state. statement_timeout is suggested for a time
restriction.


Possibly Ugly stuff:

SetLocktagRelationOid() no longer static inline. Better option? My C foo
isn't all that it should be. Lock Table allows locking shared tables so I
can't just assume MyDatabaseId is sufficient for the lock tag.

Return value InvalidOid in RangeVarGetRelidExtended() can now appear in 2
different situations; relation missing if missing_ok enabled and relation
unlockable if LockWaitPolicy LockWaitNonBlock. No callers currently use
both of these options at this time.

LockTableRecurse() returns the OID of the relation it could not lock in
order to wait on the processes holding those locks. It also keeps a list of
everything it did lock so they can be unlocked if necessary.


I'll add it to the open November commitfest.

regards,

Rod Taylor
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index b946eab..e852f1d 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
+LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT | DEFERRABLE ]
 
 where lockmode is one of:
 
@@ -39,7 +39,23 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
NOWAIT is specified, LOCK
TABLE does not wait to acquire the desired lock: if it
cannot be acquired immediately, the command is aborted and an
-   error is emitted.  Once obtained, the lock is held for the
+   error is emitted.  
+  
+
+  
+   If DEFERRABLE is specified,
+   LOCK TABLE will wait without blocking for the
+   duration of 
+   for all locks to become available. If all locks cannot be obtained
+   simultaneously before the timeout then none of the structures 
+   will be locked and an error is emitted. Since it is non-blocking,
+   other transactions may obtain locks freely and may cause the
+   required wait time to be infinite. Use statement_timeout
+   to restrict the wait time.
+  
+
+  
+   Once obtained, the lock is held for the
remainder of the current transaction.  (There is no UNLOCK
TABLE command; locks are always released at transaction
end.)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34ba385..4259072 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4865,6 +4865,9 @@ l3:
		RelationGetRelationName(relation))));
 
 		break;
+	case LockWaitNonBlock:
+		elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+		break;
 }
 
 /*
@@ -4902,6 +4905,9 @@ l3:
 	 errmsg("could not obtain lock on row in relation \"%s\"",
 		RelationGetRelationName(relation))));
 		break;
+	case LockWaitNonBlock:
+		elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+		break;
 }
 			}
 
@@ -5125,6 +5131,9 @@ heap_acquire_tuplock(Relation relation, ItemPointer tid, LockTupleMode mode,
 	errmsg("could not obtain lock on row in relation \"%s\"",
 		   RelationGetRelationName(relation))));
 			break;
+		case LockWaitNonBlock:
+			elog(ERROR, "unsupported lock wait_policy LockWaitNonBlock");
+			break;
 	}
 	*have_tuple_lock = true;
 
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 446b2ac..a0c4e56 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -46,6 +46,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "storage/ipc.h"
+#include "storage/lock.h"
 #include "storage/lmgr.h"
 #include "storage/sinval.h"
 #include "utils/acl.h"
@@ -223,14 +224,19 @@ Datum		pg_is_other_temp_schema(PG_FUNCTION_ARGS);
  * If the schema or relation is not found, return InvalidOid if missing_ok
  * = true, otherwise raise an error.
  *
- * If nowait = true, throw an err

Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Teodor Sigaev

I've been asked to look at and comment on the SQL API of the feature. I think
it's basically sound, although there is one thing that's not clear from the
regression tests: what happens if we're inserting into an object and the key
already exists? e.g.:

select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');

I think this should be forbidden, i.e. the function shouldn't ever overwrite an
existing value. If that's not handled it should be, and either way there should
be a regression test for it.


I agree about covering this case with tests, but I think it should be allowed.
In this case it will work exactly as jsonb_set does.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 11:30 AM, Andres Freund  wrote:
> On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
>> This fluctuation started appearing after commit 6150a1b0 which we have
>> discussed in another thread [1] and a colleague of mine is working on to
>> write a patch to try to revert it on current HEAD and then see the results.
>
> I don't see what that buys us. That commit is a good win on x86...

Maybe.  But I wouldn't be surprised to find out that that is an
overgeneralization.  Based on some results Mithun Cy showed me this
morning, I think that some of this enormous run-to-run fluctuation
that we're seeing is due to NUMA effects.  So some runs we get two
things that are frequently accessed together on the same NUMA node and
other times they get placed on different NUMA nodes and then
everything sucks.  I don't think we fully understand what's going on
here yet - and I think we're committing changes in this area awfully
quickly - but I see no reason to believe that x86 is immune to such
effects.  They may just happen in different scenarios than what we see
on POWER.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-04-05 Thread Dmitry Ivanov
> It seems that everything is settled now, so here's the patch introducing the
> '<->' and '' operators. I've made the necessary changes to docs &
> regression tests.

I noticed that I had accidentally trimmed whitespace in the docs; this is a
better one.
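
With the patch applied, the new adjacency operator should behave roughly like
this (made-up example, not taken from the regression tests):

    SELECT to_tsvector('english', 'a fat cat sat on a mat')
           @@ to_tsquery('english', 'fat <-> cat');   -- true, "cat" directly follows "fat"

    SELECT to_tsvector('english', 'a fat cat sat on a mat')
           @@ to_tsquery('english', 'fat <-> mat');   -- false, the words are not adjacent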

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/contrib/tsearch2/expected/tsearch2.out b/contrib/tsearch2/expected/tsearch2.out
index 972f764..97379e7 100644
--- a/contrib/tsearch2/expected/tsearch2.out
+++ b/contrib/tsearch2/expected/tsearch2.out
@@ -278,15 +278,15 @@ SELECT '(!1|2)&3'::tsquery;
 (1 row)
 
 SELECT '1|(2|(4|(5|6)))'::tsquery;
- tsquery 
--
- '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
+   tsquery   
+-
+ '1' | '2' | '4' | '5' | '6'
 (1 row)
 
 SELECT '1|2|4|5|6'::tsquery;
- tsquery 
--
- ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
+   tsquery   
+-
+ '1' | '2' | '4' | '5' | '6'
 (1 row)
 
 SELECT '1&(2&(4&(5&6)))'::tsquery;
@@ -340,7 +340,7 @@ select 'a' > 'b & c'::tsquery;
 select 'a | f' < 'b & c'::tsquery;
  ?column? 
 --
- t
+ f
 (1 row)
 
 select 'a | ff' < 'b & c'::tsquery;
@@ -443,9 +443,9 @@ select count(*) from test_tsquery where keyword >  'new & york';
 
 set enable_seqscan=on;
 select rewrite('foo & bar & qq & new & york',  'new & york'::tsquery, 'big & apple | nyc | new & york & city');
- rewrite  
---
- 'foo' & 'bar' & 'qq' & ( 'city' & 'new' & 'york' | ( 'nyc' | 'big' & 'apple' ) )
+   rewrite
+--
+ 'foo' & 'bar' & 'qq' & ( 'nyc' | 'big' & 'apple' | 'city' & 'new' & 'york' )
 (1 row)
 
 select rewrite('moscow', 'select keyword, sample from test_tsquery'::text );
@@ -461,9 +461,9 @@ select rewrite('moscow & hotel', 'select keyword, sample from test_tsquery'::tex
 (1 row)
 
 select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;
@@ -479,9 +479,9 @@ select rewrite( ARRAY['moscow & hotel', keyword, sample] ) from test_tsquery;
 (1 row)
 
 select rewrite( ARRAY['bar &  new & qq & foo & york', keyword, sample] ) from test_tsquery;
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select keyword from test_tsquery where keyword @> 'new';
@@ -520,9 +520,9 @@ select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('e
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'bar &  new & qq & foo & york') as query where keyword <@ query;
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'moscow') as query where query @> keyword;
@@ -538,9 +538,9 @@ select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('e
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'bar &  new & qq & foo & york') as query where query @> keyword;
-   rewrite

Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-05 Thread Amit Kapila
On Tue, Apr 5, 2016 at 5:35 PM, Magnus Hagander  wrote:

>
>
> On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila 
> wrote:
>
>> On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander 
>> wrote:
>>
>>> On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila 
>>> wrote:
>>>
>>>
 Also, I think below part of documentation for pg_start_backup() needs
 to be modified:

 

 pg_start_backup accepts an

 arbitrary user-defined label for the backup.  (Typically this would
 be

 the name under which the backup dump file will be stored.)  The
 function

 writes a backup label file (backup_label) and, if
 there

 are any links in the pg_tblspc/ directory, a
 tablespace map

 file (tablespace_map) into the database cluster's data

 directory, performs a checkpoint, and then returns the backup's
 starting

 transaction log location as text.  The user can ignore this result
 value,

 but it is provided in case it is useful.

>>>
>>> That one definitely needs to be fixed, as it's part of the reference.
>>> Good spot.
>>>
>>>
>>>
 Similarly, there is a description for pg_stop_backup which needs to be
 modified.

>>>
>>> That's the one you're referring to in your first commend above, is it
>>> not? Or is there one more that you mean?
>>>
>>>
>> I am referring to below part of docs in func.sgml
>>
>> 
>>
>> pg_stop_backup removes the label file and, if it exists,
>>
>> the tablespace_map file created by
>>
>> pg_start_backup, and creates a backup history file in
>>
>> the transaction log archive area.  The history file includes the
>> label given to
>>
>> pg_start_backup, the starting and ending transaction
>> log locations for
>>
>> the backup, and the starting and ending times of the backup.  The
>> return
>>
>> value is the backup's ending transaction log location (which again
>>
>> can be ignored).  After recording the ending location, the current
>>
>> transaction log insertion
>>
>> point is automatically advanced to the next transaction log file, so
>> that the
>>
>> ending transaction log file can be archived immediately to complete
>> the backup.
>>
>>  
>>
>>
> Ah, gotcha. Clearly I wasn't paying attention properly.
>
> PFA a better one (I think), also with the rename and added comments that
> David was asking for.
>
>
This version looks good.  Apart from the review, I have done some tests
(including the regression tests), and everything seems to be fine.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Amit Kapila
On Tue, Apr 5, 2016 at 9:00 PM, Andres Freund  wrote:
>
> On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> > This fluctuation started appearing after commit 6150a1b0, which we have
> > discussed in another thread [1], and a colleague of mine is working on a
> > patch to revert it on current HEAD and then see the results.
>
> I don't see what that buys us. That commit is a good win on x86...
>

At least that way we can see the results of pin-unpin without the
fluctuation.  I agree that we need to narrow down why we are still seeing
performance fluctuation even after reducing the BufferDesc size.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Sequence Access Method WIP

2016-04-05 Thread Petr Jelinek

On 04/04/16 15:53, Fabrízio de Royes Mello wrote:



On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek  wrote:
 >
 > Hi,
 >
 > new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the
AMs can define their own type.
 >

Yeah... now everything works fine... I had a minor issue when applying to the
current master, but the attached fixes it, and I also added tab-completion
support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.
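
For reference, the syntax the tab-completion targets would look roughly like
this (a minimal sketch only; the "gapless" access method name is assumed from
the 0002 patch, and the sequence names are made up):

CREATE SEQUENCE invoice_seq USING gapless;
ALTER SEQUENCE invoice_seq USING gapless;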



Cool thanks.


I ran all the regression tests and all passed.

I have just one question about the 0002 patch:
- Is there some reason not to use TransactionId instead of uint32 in the
GaplessSequenceState struct?



I missed we have that :)

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-04-05 Thread Andrew Dunstan



On 03/31/2016 09:00 AM, Dmitry Dolgov wrote:
On 31 March 2016 at 17:31, Vitaly Burovoy  wrote:


it is logical to insert the new value if "before", then the current value,
then the new value if "after".


Oh, I see now. There is slightly different logic: `v` is the current value
and `newval` is the new value. So basically we insert the current item in
the "after" case, then the new value (if it's not a delete operation), then
the current item in the "before" case. But I agree, this code can be more
straightforward. I've attached a new version, please take a look (it
contains the same logic that you've mentioned).




I haven't been following this thread due to pressure of time, so my 
apologies in advance if these comments have already been covered.


I've been asked to look at and comment on the SQL API of the feature. I 
think it's basically sound, although there is one thing that's not clear 
from the regression tests: what happens if we're inserting into an 
object and the key already exists? e.g.:


select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');

I think this should be forbidden, i.e. the function shouldn't ever 
overwrite an existing value. If that's not handled it should be, and 
either way there should be a regression test for it.
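
To spell out the cases under discussion, a minimal sketch (the behaviour
shown for the existing-key case is only the proposed one, not something the
patch has settled yet):

-- array case: the new value lands before (default) or after the element
-- addressed by the path
select jsonb_insert('{"a": [0, 1, 2]}', '{a, 1}', '"new_value"');
select jsonb_insert('{"a": [0, 1, 2]}', '{a, 1}', '"new_value"', true);

-- object case raised above: the key "b" already exists; under the proposal
-- this would raise an error rather than silently overwrite "value"
select jsonb_insert('{"a": {"b": "value"}}', '{a, b}', '"new_value"');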


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund
On 2016-04-05 20:56:31 +0530, Amit Kapila wrote:
> This fluctuation started appearing after commit 6150a1b0, which we have
> discussed in another thread [1], and a colleague of mine is working on a
> patch to revert it on current HEAD and then see the results.

I don't see what that buys us. That commit is a good win on x86...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Amit Kapila
On Tue, Apr 5, 2016 at 8:15 PM, Andres Freund  wrote:
>
> On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> > Could the reason be that we're increasing concurrency for LWLock state
> > atomic variable by placing queue spinlock there?
>
> Don't think so, it's the same cache-line either way.
>
> > But I wonder why this could happen during "pgbench -S", because it
doesn't
> > seem to have high traffic of exclusive LWLocks.
>
> Yea, that confuses me too. I suspect there's some mis-aligned
> datastructures somewhere. It's hard to investigate such things without
> access to hardware.
>

This fluctuation started appearing after commit 6150a1b0, which we have
discussed in another thread [1], and a colleague of mine is working on a
patch to revert it on current HEAD and then see the results.

> (FWIW, I'm working on getting pinunpin committed)
>

Good to know, but I am slightly worried that it will make the problem
harder to detect, as it will reduce the reproducibility.  I understand that
we are running short of time and committing this patch is important, so we
should proceed with it, as this is not a problem of this patch.  After this
patch gets committed, we will always need to revert it locally to narrow down
the problem caused by commit 6150a1b0.

[1] -
http://www.postgresql.org/message-id/caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Masahiko Sawada
On Tue, Apr 5, 2016 at 7:23 PM, Kyotaro HORIGUCHI
 wrote:
> At Mon, 4 Apr 2016 22:00:24 +0900, Masahiko Sawada  
> wrote in 
>> > For this case, the tree members of SyncRepConfig are '2[Sby1,',
>> > 'Sby2', "Sby3]'. This syntax is valid for the current
>> > specification but will surely get different meaning by the future
>> > changes. We should refuse this known-to-be-wrong-in-future syntax
>> > from now.
>>
>> I couldn't get your point but why will the above syntax meaning be
>> different from current meaning by future change?
>> I thought that another method uses another kind of parentheses.
>
> If the 'another kind of parentheses' is a pair of brackets, an
> application_name 'tokyo[A]', for example, is currently allowed to
> occur unquoted in the list but will become disallowed by the
> syntax change.
>
>

Thank you for explaining.
I understood, but since consensus on the future syntax has not yet been
reached, I thought that it would be difficult to refuse a particular kind
of parentheses for now.

> > list_member_int() performs the loop internally. So I'm not sure how much
> > adding extra list_member_int() here can optimize this processing.
> > Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
> > standby or not. In this idea, without adding an extra loop, we can exit
> > earlier
> > in the case where I'm not a sync standby. Does this make sense?
> The list_member_int() is also performed in the "(snip)" part. So
> SyncRepGetSyncStandbys() returning am_sync seems to make sense.
>
> sync_standbys = SyncRepGetSyncStandbys(am_sync);
>
> /*
> *  Quick exit if I am not synchronous or there are not
> *  enough synchronous standbys.
> */
> if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
> {
>  list_free(sync_standbys);
> return false;

I meant that it can at least skip acquiring the spinlock, so it will
optimise that logic.
But anyway, I agree with making SyncRepGetSyncStandbys return the am_sync variable.

-- 
Regards,

--
Masahiko Sawada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-05 Thread Anastasia Lubennikova

05.04.2016 01:48, Peter Geoghegan wrote:

On Mon, Mar 21, 2016 at 9:53 AM, Anastasia Lubennikova
 wrote:

It's a bit more complicated to add it into index creation algorithm.
There's a trick with a "high key".
 /*
  * We copy the last item on the page into the new page, and then
  * rearrange the old page so that the 'last item' becomes its high
key
  * rather than a true data item.  There had better be at least two
  * items on the page already, else the page would be empty of useful
  * data.
  */
 /*
  * Move 'last' into the high key position on opage
  */

To be consistent with the other steps of the algorithm (all high keys must be
truncated tuples), I had to update this high key in place:
delete the old one, and insert a truncated high key.

Hmm. But the high key comparing equal to the Scankey gives insertion
the choice of where to put its IndexTuple (it can go on the page with
the high key, or its right-sibling, according only to considerations
about fillfactor, etc). Is this changed? Does it not matter? Why not?
Is it just worth it?


I would say this is changed, but it doesn't matter.
When performing any search in the btree (including choosing a suitable page
for insertion), we use only key attributes.

We assume that included columns are stored in the index unordered.
Simple example.
create table tbl(id int, data int);
create index idx on tbl (id) including (data);

A SELECT query does not consider included columns in the scan key.
It selects all tuples satisfying the condition on the key column, and only
after that does it apply a filter to remove wrong rows from the result.
If the key attribute doesn't satisfy the query condition, there are no more
tuples to return and we can interrupt the scan.
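
A minimal sketch of that behaviour, reusing the example above (the INCLUDING
syntax is the one from this patch; the data and queries are made up):

create table tbl (id int, data int);
create index idx on tbl (id) including (data);
insert into tbl select i, i % 10 from generate_series(1, 10000) i;
vacuum analyze tbl;

-- "id" is the only key attribute, so it alone drives the btree search;
-- "data" is carried in the index and can be returned without visiting the
-- heap or checked as a filter, but it never becomes part of the scan key.
explain (costs off) select id, data from tbl where id = 123;
explain (costs off) select id, data from tbl where id = 123 and data = 3;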


You can find more explanations in the attached SQL script,
which contains queries to receive detailed information about the index
structure using pageinspect.



The right-most page on every level has no high-key. But you say those
pages have an "imaginary" *positive* infinity high key, just as
internal pages have (non-imaginary) minus infinity downlinks as their
first item/downlink. So tuples in a (say) leaf page are always bound
by the downlink lower bound in parent, while their own high key is an
upper bound. Either (and, rarely, both) could be (positive or
negative) infinity.

Maybe you now see why I talked about special _bt_compare() logic for
this. I proposed special logic that is similar to the existing minus
infinity thing _bt_compare() does (although _bt_binsrch(), an
important caller of _bt_compare() also does special things for
internal .vs leaf case, so I'm not sure any new special logic must go
in _bt_compare()).


In my view, it's the correct way to fix this problem, because the caller is
responsible for passing proper arguments to the function.
Of course I will add a check into bt_compare, but I'd rather make it an
assertion (see the patch attached).

I see what you mean, but I think we need to decide what to do about
the key space when leaf high keys are truncated. I do think that
truncating the high key was the right idea, though, and it nicely
illustrates that nothing special should happen in upper levels. Suffix
truncation should only happen when leaf pages are split, generally
speaking.
As I said, the high key is very similar to the downlinks, in that both
bound the things that go on each page. Together with downlinks they
represent *discrete* ranges *unambiguously*, so INCLUDING truncation
needs to make it clear which page new items go on. As I said,
_bt_binsrch() already takes special actions for internal pages, making
sure to return the item that is < the scankey, not <= the scankey
which is only allowed for leaf pages. (See README, from "Lehman and
Yao assume that the key range for a subtree S is described by Ki < v
<= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent
page...").

To give a specific example, I worry about the case where two sibling
downlinks in a parent page are distinct, but per specific-to-Postgres
"Ki <= v <= Ki+1" thing (which differs from the classic L&Y
invariant), some tuples with all right downlink's attributes matching
end up in left child page, not right child page. I worry that since
_bt_findsplitloc() doesn't consider this (for example), the split
point doesn't *reliably* and unambiguously divide the key space
between the new halves of a page being split. I think the "Ki <= v <=
Ki+1"/_bt_binsrch() thing might save you in common cases where all
downlink attributes are distinct, so maybe that simpler case is okay.
But to be even more specific, what about the more complicated case
where the downlinks *are* fully _bt_compare()-wise equal? This could
happen even though they're constrained to be unique in leaf pages, due
to bloat. Unique indexes aren't special here; they just make it far
less likely that this would happen in practice, because it takes a lot
of bloat. Less importantly, when that bloat happens, you don't wa

Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Robert Haas
On Mon, Apr 4, 2016 at 4:28 AM, Fujii Masao  wrote:
>>> + ereport(LOG,
>>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>>> + application_name, MyWalSnd->sync_standby_priority)));
>>>
>>> s/ the / a /
>
> I have no objection to this change itself. But we have used this message
> in 9.5 or before, so if we apply this change, probably we need
> back-patching.

"the" implies that there can be only one synchronous standby at that
priority, while "a" implies that there could be more than one.  So the
situation might be different with this patch than previously.  (I
haven't read the patch so I don't know whether this is actually true,
but it might be what Thomas was going for.)

Also, I'd like to associate myself with the general happiness about
the prospect of having this feature in 9.6 (but without specifically
endorsing the code, since I have not read it).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-04-05 Thread Amit Kapila
On Tue, Apr 5, 2016 at 3:15 PM, Fujii Masao  wrote:
>
> On Tue, Apr 5, 2016 at 4:31 PM, Amit Kapila 
wrote:
> > On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao 
wrote:
> >>
> >>
> >> Thanks for updating the patch!
> >>
> >> I applied the following changes to the patch.
> >> Attached is the revised version of the patch.
> >>
> >
> > 1.
> >{
> > {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
> > gettext_noop("List of names of potential synchronous standbys."),
> > NULL,
> > GUC_LIST_INPUT
> > },
> > &SyncRepStandbyNames,
> > "",
> > check_synchronous_standby_names, NULL, NULL
> > },
> >
> > Isn't it better to modify the description of synchronous_standby_names
in
> > guc.c based on new usage?
>
> What about "Number of synchronous standbys and list of names of
> potential synchronous ones"? Better idea?
>

Looks good.

>
> > 2.
> > pg_stat_get_wal_senders()
> > {
> > ..
> > /*
> > ! * Allocate and update the config data of synchronous replication,
> > ! * and then get the currently active synchronous standbys.
> >   */
> > + SyncRepUpdateConfig();
> >   LWLockAcquire(SyncRepLock, LW_SHARED);
> > ! sync_standbys = SyncRepGetSyncStandbys();
> >   LWLockRelease(SyncRepLock);
> > ..
> > }
> >
> > Why is it important to update the config with patch?  Earlier also any
> > update to config between calls wouldn't have been visible.
>
> Because a backend has no chance to call SyncRepUpdateConfig() and
> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
> called here. This means that pg_stat_replication may return the
information
> based on the old value of s_s_names.
>

That's right, but even without this patch, couldn't pg_stat_replication show
old information? If not, why so?

> > 3.
> >   Planning for High Availability
> >
> >  
> > ! synchronous_standby_names specifies the number of
> > ! synchronous standbys that transaction commits made when
> >
> > Is it better to say like: synchronous_standby_names
specifies
> > the number and names of
>
> Precisely speaking, s_s_names specifies a list of names of potential sync
> standbys, not sync ones.
>

Okay, but you don't seem to have updated this in your latest patch.

> > 4.
> > + /*
> > +  * Return the list of sync standbys, or NIL if no sync standby is
> > connected.
> > +  *
> > +  * If there are multiple standbys with the same priority,
> > +  * the first one found is considered as higher priority.
> >
> > Here line indentation of second line can be improved.
>
> What about "the first one found is selected first"? Or better idea?
>

What I was complaining about was that a few words from the second line could
be moved to the previous line, but maybe pgindent will take care of that, so
no need to worry.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Andres Freund
On 2016-04-05 17:36:49 +0300, Alexander Korotkov wrote:
> Could the reason be that we're increasing concurrency for LWLock state
> atomic variable by placing queue spinlock there?

Don't think so, it's the same cache-line either way.

> But I wonder why this could happen during "pgbench -S", because it doesn't
> seem to have high traffic of exclusive LWLocks.

Yea, that confuses me too. I suspect there's some mis-aligned
datastructures somewhere. It's hard to investigate such things without
access to hardware.

(FWIW, I'm working on getting pinunpin committed)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in parallel aggregate

2016-04-05 Thread Robert Haas
On Mon, Apr 4, 2016 at 10:26 PM, David Rowley
 wrote:
> Does this need to check the parallel flags on the transfn or serialfn?
> these'll be executed on the worker process. Possibly we also need the
> combinefn/deserialfn/finalfn to be checked too as I see that we do
> generate_gather_paths() from set_append_rel_pathlist().

That's basically the same as Tom's question, I think.  For right now,
I'd like to regard the aggregate function's pg_proc marking as
certifying that the entire aggregate can be trusted to be
parallel-safe.  That's cheap and easy to check.  If, in the future, we
want to allow more complicated things where some but not all of
aggregate's functions are parallel-safe, we can add logic for that
then - i.e. if the aggregate is labeled as parallel-restricted, then
inquire within.  But to be honest, I hope we won't get there.  As it
is, the list of things that you might want to do in an aggregate that
are parallel-unsafe is pretty short, and I hope we're going to go in
the direction of making even more things parallel-safe in the future.
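
For illustration, a minimal sketch of what that marking looks like at the SQL
level, assuming the CREATE AGGREGATE ... PARALLEL option from the parallel
aggregate work (the aggregate itself is a toy):

CREATE AGGREGATE my_sum (int4) (
    SFUNC = int4pl,
    STYPE = int4,
    COMBINEFUNC = int4pl,
    PARALLEL = SAFE    -- the whole aggregate is certified parallel-safe
);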

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another small patch - reorderbuffer.c:1099

2016-04-05 Thread Alvaro Herrera
Simon Riggs wrote:
> On 5 April 2016 at 10:12, Andres Freund  wrote:
> 
> > On 2016-04-05 12:07:40 +0300, Aleksander Alekseev wrote:
> > > > I recall discussing this code with Andres, and I think that he has
> > > > mentioned me this is intentional, because should things be changed for
> > > > a reason or another in the future, we want to keep in mind that a list
> > > > of TXIDs and a list of sub-TXIDs should be handled differently.
> > >
> > > I see. If this it true I think there should be a comment that explains
> > > it. When you read such a code you suspect a bug. Not mentioning that
> > > static code analyzers (I'm currently experimenting with Clang and PVS
> > > Studio) complain about code like this.
> >
> > There's different comments in both branches...
> 
> Then one or both of the comments is incomplete.

IMO the code is wrong.  There should be a single block comment saying
something like "Remove the node from its containing list.  In the FOO
case, the list corresponds to BAR and therefore we delete it because
BAZ.  In the QUUX case the list is PLUGH and we delete in because THUD."
Then a single line dlist_delete(...) follows.

The current arrangement looks byzantine to me, for no reason.  If we
think that one of the two branches might do something additional to the
list deletion, surely that will be in a separate stanza with its own
comment; and if we ever want to remove the list deletion from one of the
two cases (something that strikes me as unlikely) then we will need to
fix the comment, too.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-04-05 Thread Alexander Korotkov
On Tue, Apr 5, 2016 at 10:26 AM, Dilip Kumar  wrote:

>
> On Mon, Apr 4, 2016 at 2:28 PM, Andres Freund  wrote:
>
>> Hm, interesting. I suspect that's because of the missing backoff in my
>> experimental patch. If you apply the attached patch ontop of that
>> (requires infrastructure from pinunpin), how does performance develop?
>>
>
> I have applied this patch also, but the results are still the same, I mean
> around 550,000 with 64 threads and 650,000 with 128 clients, with a lot of
> fluctuation.
>
> *128 client 
> **(head+**0001-WIP-Avoid-the-use-of-a-separate-spinlock-to-protect
> +pinunpin-cas-9+backoff)*
>
> run1 645769
> run2 643161
> run3 *285546*
> run4 *289421*
> run5 630772
> run6 *284363*
>

Could the reason be that we're increasing concurrency for LWLock state
atomic variable by placing queue spinlock there?
But I wonder why this could happen during "pgbench -S", because it doesn't
seem to have high traffic of exclusive LWLocks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 9:30 AM, David Rowley
 wrote:
> On 6 April 2016 at 01:01, Robert Haas  wrote:
>> On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
>>> To be really honest, I'm quite worried that if I go and make this
>>> change then my time might be wasted as I really think making that
>>> change this late in the day is just setting this up for failure.  I
>>> really don't think we can bat this patch over the Pacific Ocean too
>>> many times before we find ourselves inside the feature freeze. Of
>>> course, if you really think it's no good, that's different, it can't
>>> be committed, but "I think it might be better" does not seem like a
>>> solid enough argument for me to want to risk trying this and delaying
>>> further for that.
>>
>> I think I agree.  Certainly, with what we've got here today, these are
>> not user-exposed, so I think we could certainly change them next time
>> around if need be.  If they ever become user-exposed, then maybe we
>> should think a little harder.
>
> I understand your concern. Painting ourselves into a corner would be pretty 
> bad.
>
> Although, I'm not so sure we'd want to go down the route of trying to
> stabilise the format, even into something human readable. As far as I
> know we've never mentioned anything about what's stored in the
> internal aggregate states in the documentation, and we've made some
> fairly hefty changes to these over the last few releases. For example;
> 959277a4, where internal int128 support was added to improve
> performance of various aggregates, like sum(bigint). I think if we're
> going to head down the route of trying to keep this format stable,
> then we're going to struggle to make improvements to aggregates in the
> future.

That might be true, but it's much more likely to be true if the
internal format is an opaque blob.  If the serialized transition state
for average is {count,sum} then we can switch between having the
internal representation be numeric and having it be int128 and
nothing breaks.  With the opaque blob, maybe nothing breaks, or maybe
something does.
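
Purely as an illustration of that point (this is not how the built-in avg()
is implemented), an avg-like aggregate whose transition state is a visible
two-element numeric array {count, sum} rather than an opaque bytea could be
sketched like this:

CREATE FUNCTION my_avg_trans(state numeric[], val numeric)
RETURNS numeric[] LANGUAGE sql STRICT IMMUTABLE
AS $$ SELECT ARRAY[state[1] + 1, state[2] + val] $$;

CREATE FUNCTION my_avg_combine(a numeric[], b numeric[])
RETURNS numeric[] LANGUAGE sql STRICT IMMUTABLE
AS $$ SELECT ARRAY[a[1] + b[1], a[2] + b[2]] $$;

CREATE FUNCTION my_avg_final(state numeric[])
RETURNS numeric LANGUAGE sql IMMUTABLE
AS $$ SELECT CASE WHEN state[1] = 0 THEN NULL ELSE state[2] / state[1] END $$;

CREATE AGGREGATE my_avg (numeric) (
    SFUNC = my_avg_trans,
    STYPE = numeric[],
    COMBINEFUNC = my_avg_combine,
    FINALFUNC = my_avg_final,
    INITCOND = '{0,0}'
);

Because the state is an ordinary array, any node able to hand back a
{count, sum} pair could participate in combining, whatever it uses
internally.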

> It's also not all that clear that all of the internal fields really
> make sense to other databases, for example NumericAggState's maxScale
> and maxScaleCount. These really only are needed for moving aggregates.
> Would we want some other database to have to magic up some numbers for
> these fields because our format requires it?

Well, if we want to do cross-node aggregation with that database on
the other side, those numbers are going to have to get magicked up
someplace along the line.

I'm going to concede the point that this shouldn't really be a
priority for 9.6, but I might want to come back to it later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-04-05 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:
> At 2016-04-05 12:33:56 +0530, a...@2ndquadrant.com wrote:
> >
> > Álvaro: I did document and test the extra types you added, but now
> > that I think about it a bit more, it's hard to argue that it's useful
> > to have a table, for example, depend on an extension. There's really
> > nothing about a table that "doesn't work without" an extension.
> 
> I think it makes sense to implement this for triggers and functions. It
> may also be useful for indexes and materialised views, which can refer
> to functions in an extension (and in future, sequences as well).
> 
> It's certainly good to know the grammar would work if we wanted to add
> other object types in future, but I think we should leave it at that.

Yes, agreed.  What I implemented weren't cases that were supposed to be
useful to users, only those for which I thought it was good to test
bison with.  Sorry I wasn't clear about this.  Feel free to strip out
(some of?) them, if they aren't useful.
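
For the function and trigger cases, the intended usage would look roughly
like this (a sketch only; the object and extension names are made up):

ALTER FUNCTION my_trigger_func() DEPENDS ON EXTENSION my_ext;
ALTER TRIGGER my_trigger ON my_table DEPENDS ON EXTENSION my_ext;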

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread David Rowley
On 6 April 2016 at 01:01, Robert Haas  wrote:
> On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
>> To be really honest, I'm quite worried that if I go and make this
>> change then my time might be wasted as I really think making that
>> change this late in the day is just setting this up for failure.  I
>> really don't think we can bat this patch over the Pacific Ocean too
>> many times before we find ourselves inside the feature freeze. Of
>> course, if you really think it's no good, that's different, it can't
>> be committed, but "I think it might be better" does not seem like a
>> solid enough argument for me to want to risk trying this and delaying
>> further for that.
>
> I think I agree.  Certainly, with what we've got here today, these are
> not user-exposed, so I think we could certainly change them next time
> around if need be.  If they ever become user-exposed, then maybe we
> should think a little harder.

I understand your concern. Painting ourselves into a corner would be pretty bad.

Although, I'm not so sure we'd want to go down the route of trying to
stabilise the format, even into something human readable. As far as I
know we've never mentioned anything about what's stored in the
internal aggregate states in the documentation, and we've made some
fairly hefty changes to these over the last few releases. For example;
959277a4, where internal int128 support was added to improve
performance of various aggregates, like sum(bigint). I think if we're
going to head down the route of trying to keep this format stable,
then we're going to struggle to make improvements to aggregates in the
future.

It's also not all that clear that all of the internal fields really
make sense to other databases, for example NumericAggState's maxScale
and maxScaleCount. These really only are needed for moving aggregates.
Would we want some other database to have to magic up some numbers for
these fields because our format requires it?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-04-05 Thread Merlin Moncure
On Wed, Mar 30, 2016 at 6:49 PM, Josh berkus  wrote:
> http://www.zdnet.com/article/microsoft-and-canonical-partner-to-bring-ubuntu-to-windows-10/
>
> ... could be good news for us ...

This is nothing new.  Windows has had a unix subsystem (Interix AKA
Windows Services for Unix) for quite some time now.  See:
http://www.postgresql.org/message-id/303E00EBDD07B943924382E153890E5434AA78@cuthbert.rcsinc.local

...it was missing sync() but not fork() -- I had a functional native
non-cygwin postgres that beat the mingw port by a major version :-).
I was astounded to see that Interix is still supported as of windows 8
but no longer will be available as of windows 10.  ISTM the new ubuntu
stuff is the same engine with some spit polish and a heavy dose of
advertising.

Thinking about this objectively, it would be nice to see:
*) broad developer adoption
*) performance matching native
*) clean install process that could be integrated to the windows installer

before considering deprecating the native port, the first of those
being the most important.  The cautionary note here is that Microsoft
can and will kill the ubuntu subsystem if it's not broadly adopted
(Interix is something of a cautionary tale here).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-04-05 Thread David Steele

On 4/5/16 8:05 AM, Magnus Hagander wrote:


PFA a better one (I think), also with the rename and added comments that
David was asking for.

Barring objections, I will apply this version.


This version looks good to me.

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-04-05 Thread Robert Haas
On Tue, Apr 5, 2016 at 3:55 AM, David Rowley
 wrote:
>> I think it might be a good idea if these patches made less use of
>> bytea and exposed the numeric transition values as, say, a 2-element
>> array of numeric.
>
> Well, if you have a look at NumericAggState you can see it's not quite
> as simple as an array of numeric, unless of course you'd be willing to
> spend the extra cycles, memory, and bandwidth to convert
> those int64s to numeric too; then it could be made to work. To do as
> you describe properly, we'd need a composite type.

Uggh, yeah.  Yuck.

> hmm, isn't that why we have the deserialisation functions? Do you see a
> use case where these won't be available?
...
> I've not yet read the design spec for sharding in Postgres. If there's
> something I can read over to get an idea of why you think this won't
> work, then maybe we can come to a good conclusion that way.  But if I
> take a guess, then I'd have imagined that we'd not support sharding
> over different major versions, and if we really needed to change the
> serialisation format later, then we could do so. We could even put a
> note in the documents that the serialisation format may change between
> major versions.

Well, OK, so here was my thought.  For the sake of simplicity, let's
suppose that creating a sharded table works more or less like what you
can already do today: create a parent table with a non-inherited CHECK
(false) constraint and then create some inheritance children that are
foreign tables on various remote servers.  Give those children CHECK
constraints that explicate the partitioning scheme.  This is probably
not actually how we want this to work in detail (e.g. we probably want
declarative partitioning) but the details don't matter very much for
purposes of what I'm trying to explain here so let's just ignore them
for the moment.
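
Concretely, the naive setup described above might be sketched like this
(postgres_fdw, with made-up host and table names, and only one child shown):

CREATE EXTENSION postgres_fdw;
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;

-- empty parent; no rows ever land here
CREATE TABLE sometab (a int, b numeric, c numeric,
                      CHECK (false) NO INHERIT);

-- one child per shard, whose CHECK constraint explicates the partitioning
CREATE FOREIGN TABLE sometab_shard1 (a int, b numeric, c numeric,
                                     CHECK (a >= 0 AND a < 1000))
    INHERITS (sometab)
    SERVER shard1 OPTIONS (table_name 'sometab_part1');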

Now, let's suppose that the user sets up a sharded table and then
says: SELECT a, SUM(b), AVG(c) FROM sometab GROUP BY a.  At this point, what we'd
like to have happen is that for each child foreign table, we go and
fetch partially aggregated results.  Those children might be running
any version of PostgreSQL - I was not assuming that we'd insist on
matching major versions, although of course that could be done - and
there would probably need to be a minimum version of PostgreSQL
anyway.  They could even be running some other database.  As long as
they can spit out partial aggregates in a format that we can
understand, we can deserialize those aggregates and run combine
functions on them.  But if the remote side is, say, MariaDB, it's
probably much easier to get it to spit out something that looks like a
PostgreSQL array than it is to make it spit out some bytea blob that's
in an entirely PostgreSQL-specific format.
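
As a sketch of what "a format that we can understand" could mean here, the
coordinator might conceptually ask each child for per-group partial states,
with AVG(c)'s state spelled out as an explicit count/sum pair rather than a
bytea blob (the names reuse the sketch above):

SELECT a,
       sum(b)                              AS partial_sum_b,
       ARRAY[count(c), sum(c)]::numeric[]  AS partial_avg_c_state
FROM   sometab_shard1
GROUP  BY a;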

Now, maybe that doesn't matter.  Even getting something like this
working with PostgreSQL as the remote side is going to be hard.
Moreover, for this to have any chance of working with anything other
than a compatible PostgreSQL server on the remote side, the FDW is
going to have to write bespoke code for each aggregate anyway, and
that code can always construct the requisite bytea blobs internally.
So who cares?  I can't point to any really tangible advantage of
having serialized transition states be human-readable, so maybe there
isn't one.  But I was thinking about it, for fear that there might be
some advantage that I'm missing.

> To be really honest, I'm quite worried that if I go and make this
> change then my time might be wasted as I really think making that
> change this late in the day is just setting this up for failure.  I
> really don't think we can bat this patch over the Pacific Ocean too
> many times before we find ourselves inside the feature freeze. Of
> course, if you really think it's no good, that's different, it can't
> be committed, but "I think it might be better" does not seem like a
> solid enough argument for me to want to risk trying this and delaying
> further for that.

I think I agree.  Certainly, with what we've got here today, these are
not user-exposed, so I think we could certainly change them next time
around if need be.  If they ever become user-exposed, then maybe we
should think a little harder.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >