Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-06-24 Thread Etsuro Fujita

Hi KaiGai-san,

I'd like to work on this issue with you!

On 2015/06/25 10:48, Kouhei Kaigai wrote:

I'd like to suggest a solution that re-constructs the remote tuple
according to the fdw_scan_tlist in ExecScanFetch, if the given
scanrelid == 0. That enables running the local qualifiers associated with
the ForeignScan node, and it will also work for the case when the tuple
in es_epqTupleSet[] was a local heap tuple.


Maybe I'm missing something, but I don't think your proposal works 
properly, because we don't have any component ForeignScan state node or 
subsidiary join state node once we've replaced the entire join with the 
ForeignScan performing the join remotely, IIUC.  So, my idea was to 
have another subplan for EvalPlanQual, in addition to the ForeignScan, 
that does the entire join locally over the component test tuples when we 
are inside an EvalPlanQual recheck.



BTW, if you try a newer version of the postgres_fdw foreign join patch,
please let me know how to reproduce the problem.


OK


Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
at the moment. Does that cause a problem?


I don't think so, though I think it would be better to test that the 
foreign join pushdown API patch also allows late row locking using 
postgres_fdw.


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-24 Thread Fujii Masao
On Thu, Jun 25, 2015 at 12:15 PM, Michael Paquier
 wrote:
> On Wed, Jun 24, 2015 at 11:30 PM, Fujii Masao  wrote:
>> On Fri, May 15, 2015 at 9:18 PM, Michael Paquier
>>  wrote:
>>> On Fri, May 15, 2015 at 8:55 PM, Beena Emerson  
>>> wrote:
>>>> There was a discussion on support for N synchronous standby servers started
>>>> by Michael. Refer
>>>> http://archives.postgresql.org/message-id/cab7npqr9c84ig0zuvhmqamq53vqsd4rc82vyci4dr27pvof...@mail.gmail.com
>>>> . The use of hooks and dedicated language was suggested; however, it
>>>> seemed to be overkill for the scenario and there was no consensus on
>>>> this. Exploring GUC-land was preferred.
>>>
>>> Cool.
>>>
>>>> Please find attached a patch, built on Michael's patch from the
>>>> above-mentioned thread, which supports choosing a different number of
>>>> nodes from each set, i.e. k nodes from set 1, l nodes from set 2, and so
>>>> on. The format of synchronous_standby_names has been updated to standby
>>>> name followed by the required count, separated by a hyphen. Ex: 'aa-1,
>>>> bb-3'.  The transaction waits for the specified number of standbys in
>>>> each group. Any extra nodes with the same name will be considered
>>>> potential. The special entry * for the standby name is also supported.
>>>
>>> I don't think that this is going in the right direction; what was
>>> suggested, mainly by Robert, was to use a micro-language that would
>>> allow far more extensibility than what you are proposing. See for
>>> example ca+tgmobpwoenmmepfx0jwrvqufxvbqrv26ezq_xhk21gxrx...@mail.gmail.com
>>> for some ideas.
>>
>> Doesn't this approach prevent us from specifying the "potential" synchronous
>> standby server? For example, imagine the case where you want to treat
>> the server AAA as synchronous standby. You also want to use the server BBB
>> as synchronous standby only if the server AAA goes down. IOW, you want to
>> prefer the server AAA as synchronous standby rather than BBB.
>> Currently we can easily set up that case by just setting
>> synchronous_standby_names as follows.
>>
>> synchronous_standby_names = 'AAA, BBB'
>>
>> However, after we adopt the quorum commit feature with the proposed
>> micro-language, how can we set up that case? It seems impossible...
>> I'm afraid that this might be a backward compatibility issue.
>
> Like that:
> synchronous_standby_names = 'AAA, BBB'
> The thing is that we need to support the old grammar as well to be
> fully backward compatible,

Yep, that's an idea. Supporting two different grammars is a bit messy, though...
If we could merge the "priority" concept into the quorum commit,
that would be better. But for now I have no idea how we can do that.

> and that's actually equivalent to that in
> the grammar: 1(AAA,BBB,CCC).

I don't think that they are the same. In the case of 1(AAA,BBB,CCC), while
the two servers AAA and BBB are running, the master server may return
success for the transaction to the client just after it receives the ACK
from BBB. OTOH, in the case of AAA,BBB, that never happens. The master must
wait for the ACK from AAA to arrive before completing the transaction. And
then, if AAA goes down, BBB should become the synchronous standby.
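
To spell out the two behaviors as settings (a sketch, assuming all the
named standbys are connected and healthy):

    synchronous_standby_names = '1(AAA,BBB,CCC)'
    # quorum: an ACK from any one of the three completes the commit

    synchronous_standby_names = 'AAA, BBB'
    # priority: commits wait for AAA's ACK; BBB takes over only if AAA
    # goes down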

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpointer continuous flushing

2015-06-24 Thread Fabien COELHO


Hello Amit,


[...]
Ok, I misunderstood your question. I thought you meant a dip between 1
client and 4 clients. I meant that when flush is turned on tps goes down by
8% (743 to 681 tps) on this particular run.


This 8% might matter if the dip is bigger with more clients and a
more aggressive workload.  Do you know what could lead to this dip?
If we know the reason, then it will be easier to predict whether this
is the maximum dip that could happen or whether it could lead to a
bigger dip in other cases.


I do not know the cause of the dip, and whether it would increase with 
more clients. I do not have a box for such tests. If someone can provide 
the box, I can provide test scripts :-)


The first, although higher, measure is really very unstable, with pg 
totally unresponsive (offline, really) at times.


I think that the flush option may always have a risk of (small) 
detrimental effects on tps, because there are two steady states: one with 
pg only doing wal-logged transactions at great tps, and one with pg 
doing the checkpoint at nought tps. If this is on the same disk, even at 
best the combination means that each operation will probably hamper the 
other one a little bit, so the combined tps performance would/could be 
lower than doing one after the other and having pg offline 50% of the 
time...


Please also note that this 8% "dip" is measured at 681 tps (with the dip) 
vs 198 tps (no options at all): a 3.4x improvement (681/198) compared to 
pg's current behavior.



Basically tps improvements mostly come from "sort", and "flush" has
uncertain effects on tps (throughput), but much more on latency and
performance stability (lower late rate, lower standard deviation).


I agree that performance stability is important, but I am not sure if it
is a good idea to sacrifice throughput for it.


See discussion above. I think better stability may imply slightly lower 
throughput on some load. That is why there are options and DBA to choose 
them:-)


If sort + flush always gives better results, then isn't it better to 
perform these actions together under one option?


Sure, but that is not currently the case. Also, what the two options do is 
quite orthogonal, so I would tend to keep them separate. If one is always 
beneficial and we wish it to be always activated, then its option could be 
removed.



Hmmm. My point of view is still that the logical priority is to optimize
for disk IO first, then look for compatible RAM optimisations later.


It is not only about RAM optimisation which we can do later, but also
about avoiding regression in existing use-cases.


Hmmm. Currently I have not seen really significant regressions. I have 
seen some less favorable impacts of some options on some loads.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting TAP tests with MSVC and Windows

2015-06-24 Thread Michael Paquier
On Tue, May 26, 2015 at 3:39 PM, Michael Paquier wrote:
> Here is v6, a rebased version on HEAD (79f2b5d). There were some
> conflicts with the indentation and some other patches related to
> pg_rewind and initdb's tests.

Attached is v7, rebased on 0b157a0.
-- 
Michael
From 4b960b29446c01ca2f1a44f0e545835d99ab87e6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 20 Apr 2015 04:57:37 -0700
Subject: [PATCH 1/2] Add support for TAP tests on Windows

Nodes initialized by the TAP tests use SSPI to securely perform the
tests, and test scripts are patched in a couple of places to support
Windows grammar. In the case of MSVC, tests can be run with this
command:
vcregress tapcheck
---
 doc/src/sgml/install-windows.sgml  |  1 +
 src/Makefile.global.in |  2 +-
 src/bin/initdb/t/001_initdb.pl |  1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   | 67 --
 src/bin/pg_controldata/t/001_pg_controldata.pl |  5 +-
 src/bin/pg_ctl/t/001_start_stop.pl | 14 --
 src/bin/pg_ctl/t/002_status.pl | 12 -
 src/bin/pg_rewind/RewindTest.pm| 66 +++--
 src/bin/scripts/t/020_createdb.pl  |  3 ++
 src/test/perl/TestLib.pm   | 18 ---
 src/tools/msvc/Solution.pm |  1 +
 src/tools/msvc/config_default.pl   |  1 +
 src/tools/msvc/vcregress.pl| 48 +-
 13 files changed, 175 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index d154b44..2047790 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -439,6 +439,7 @@ $ENV{CONFIG}="Debug";
 vcregress modulescheck
 vcregress ecpgcheck
 vcregress isolationcheck
+vcregress tapcheck
 vcregress upgradecheck
 
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index c583b44..563d1d1 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -336,7 +336,7 @@ cd $(srcdir) && TESTDIR='$(CURDIR)' PATH="$(bindir):$$PATH" PGPORT='6$(DEF_PGPOR
 endef
 
 define prove_check
-cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
+cd $(srcdir) && TESTDIR='$(CURDIR)' TESTREGRESS='$(top_builddir)/src/test/regress/pg_regress' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef
 
 else
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 299dcf5..095cbf3 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -4,6 +4,7 @@
 
 use strict;
 use warnings;
+use Config;
 use TestLib;
 use Test::More tests => 14;
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index c8c9250..3fd297b 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -1,8 +1,9 @@
 use strict;
 use warnings;
 use Cwd;
+use Config;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => ($Config{osname} eq "MSWin32") ? 25 : 35;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -25,10 +26,20 @@ if (open BADCHARS, ">>$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
 	close BADCHARS;
 }
 
+# Use SSPI on Windows, node has been initialized already accordingly
+# by pg_regress --config-auth.
 open HBA, ">>$tempdir/pgdata/pg_hba.conf";
-print HBA "local replication all trust\n";
-print HBA "host replication all 127.0.0.1/32 trust\n";
-print HBA "host replication all ::1/128 trust\n";
+if ($Config{osname} ne "MSWin32")
+{
+	print HBA "local replication all trust\n";
+	print HBA "host replication all 127.0.0.1/32 trust\n";
+	print HBA "host replication all ::1/128 trust\n";
+}
+else
+{
+	print HBA "host replication all 127.0.0.1/32 sspi include_realm=1 map=regress\n";
+	print HBA "host replication all ::1/128 sspi include_realm=1 map=regress\n";
+}
 close HBA;
 system_or_bail 'pg_ctl', '-s', '-D', "$tempdir/pgdata", 'reload';
 
@@ -64,6 +75,33 @@ command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ],
 	'pg_basebackup tar with long name fails');
 unlink "$tempdir/pgdata/$superlongname";
 
+command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
+	'-T with empty old directory fails');
+command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=" ],
+	'-T with empty new directory fails');
+command_fails(
+	[   'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp',
+		"-T/foo=/bar=/baz" ],
+	'-T with multiple = fails');
+command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ],
+	'-T with old directory not absolute fails');
+command_fails(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ],
+	'-T with new directory no

Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Noah Misch
On Wed, Jun 24, 2015 at 10:03:50AM -0400, Robert Haas wrote:
> On Tue, Jun 23, 2015 at 10:15 PM, Noah Misch  wrote:
> >> That brings it back to the enforcing - would we want to enforce both those?
> >
> > May as well.  AuthorDate is the main source of trouble.  You would
> > need to go out of your way (e.g. git filter-branch) to push an old
> > CommitDate, but let's check it just the same.
> 
> Actually, you just need the system clock to be off.  I've noticed, for
> example, that when my VMware VMs go to sleep (because I close my
> laptop lid) their clocks don't run, so I have to remember to ntpdate
> afterwards if I want correct time.  I don't happen to use those for
> committing to PostgreSQL, but somebody else might have a similar setup
> that they do use for that purpose.

I didn't think of that.  So, yep, checking both is valuable.

> So +1 for sanity-checking the commit date, and +1 as well for Alvaro's
> proposal for checking for both past and future times.  I think we
> should tolerate a bit more in terms of past timestamps than future
> timestamps.  It's quite reasonable for someone to commit locally and
> then run make check-world or so before pushing; let's not make that
> unnecessarily annoying.  But future timestamps should really only ever
> happen because of clock slew, and I don't think we need to tolerate
> very much of that.

Sounds good.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improving log capture of TAP tests with IPC::Run

2015-06-24 Thread Michael Paquier
Hi all,

After looking at the issues with the TAP test suite that hamster faced
a couple of days ago, which is what has been discussed on this thread:
http://www.postgresql.org/message-id/13002.1434307...@sss.pgh.pa.us

I have developed a patch to improve log capture of the TAP tests by
collecting the stderr and stdout output of each command run in the
tests, using IPC::Run::run more extensively (instead of system(), which
does not help much here); it has already been sent on the thread above.

This patch looks worth having in the TAP suite to track problems, hence
this dedicated thread. The patch has also been added to the next CF.
Regards,
-- 
Michael
From 59fa3c2cd99bfbff970c6054ea279bb26a0a3413 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 20 Jun 2015 17:44:39 +0900
Subject: [PATCH] Improve log capture of TAP tests and fix race conditions

All tests have their logs stored as regress_log/$TEST_NAME, with content
captured from the many commands run during the tests. Commands that were
previously called in silent mode are made verbose and have their output
written in the newly-created log files.
---
 src/Makefile.global.in |  1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   |  4 +-
 src/bin/pg_controldata/t/001_pg_controldata.pl |  2 +-
 src/bin/pg_ctl/t/001_start_stop.pl |  2 +-
 src/bin/pg_ctl/t/002_status.pl |  6 +--
 src/bin/pg_rewind/.gitignore   |  1 -
 src/bin/pg_rewind/Makefile |  2 +-
 src/bin/pg_rewind/RewindTest.pm| 74 --
 src/bin/pg_rewind/t/001_basic.pl   |  1 -
 src/bin/pg_rewind/t/002_databases.pl   |  1 -
 src/bin/pg_rewind/t/003_extrafiles.pl  |  1 -
 src/test/perl/TestLib.pm   | 52 +-
 src/test/ssl/ServerSetup.pm|  6 +--
 13 files changed, 83 insertions(+), 70 deletions(-)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index c583b44..171d1e4 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -336,6 +336,7 @@ cd $(srcdir) && TESTDIR='$(CURDIR)' PATH="$(bindir):$$PATH" PGPORT='6$(DEF_PGPOR
 endef
 
 define prove_check
+rm -rf $(srcdir)/tmp_check/log
 cd $(srcdir) && TESTDIR='$(CURDIR)' $(with_temp_install) PGPORT='6$(DEF_PGPORT)' top_builddir='$(CURDIR)/$(top_builddir)' $(PROVE) $(PG_PROVE_FLAGS) $(PROVE_FLAGS) t/*.pl
 endef
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3476ea6..b87ef29 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -22,7 +22,7 @@ print HBA "local replication all trust\n";
 print HBA "host replication all 127.0.0.1/32 trust\n";
 print HBA "host replication all ::1/128 trust\n";
 close HBA;
-system_or_bail 'pg_ctl', '-s', '-D', "$tempdir/pgdata", 'reload';
+run_or_bail(['pg_ctl', '-D', "$tempdir/pgdata", 'reload']);
 
 command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
@@ -51,7 +51,7 @@ ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
 
 my $superlongname = "superlongname_" . ("x" x 100);
 
-system_or_bail 'touch', "$tempdir/pgdata/$superlongname";
+run_or_bail(['touch', "$tempdir/pgdata/$superlongname"]);
 command_fails([ 'pg_basebackup', '-D', "$tempdir/tarbackup_l1", '-Ft' ],
 	'pg_basebackup tar with long name fails');
 unlink "$tempdir/pgdata/$superlongname";
diff --git a/src/bin/pg_controldata/t/001_pg_controldata.pl b/src/bin/pg_controldata/t/001_pg_controldata.pl
index a4180e7..0d1407e 100644
--- a/src/bin/pg_controldata/t/001_pg_controldata.pl
+++ b/src/bin/pg_controldata/t/001_pg_controldata.pl
@@ -11,6 +11,6 @@ program_options_handling_ok('pg_controldata');
 command_fails(['pg_controldata'], 'pg_controldata without arguments fails');
 command_fails([ 'pg_controldata', 'nonexistent' ],
 	'pg_controldata with nonexistent directory fails');
-system_or_bail "initdb -D '$tempdir'/data -A trust >/dev/null";
+run_or_bail(['initdb', '-D', "$tempdir/data", '-A', 'trust']);
 command_like([ 'pg_controldata', "$tempdir/data" ],
 	qr/checkpoint/, 'pg_controldata produces output');
diff --git a/src/bin/pg_ctl/t/001_start_stop.pl b/src/bin/pg_ctl/t/001_start_stop.pl
index 6c9ec5c..57563d7 100644
--- a/src/bin/pg_ctl/t/001_start_stop.pl
+++ b/src/bin/pg_ctl/t/001_start_stop.pl
@@ -36,4 +36,4 @@ command_ok([ 'pg_ctl', 'restart', '-D', "$tempdir/data", '-w', '-m', 'fast' ],
 command_ok([ 'pg_ctl', 'restart', '-D', "$tempdir/data", '-w', '-m', 'fast' ],
 	'pg_ctl restart with server running');
 
-system_or_bail 'pg_ctl', '-s', 'stop', '-D', "$tempdir/data", '-m', 'fast';
+run_or_bail(['pg_ctl', 'stop', '-D', "$tempdir/data", '-m', 'fast']);
diff --git a/src/bin/pg_ctl/t/002_status.pl b/src/bin/pg_ctl/t/002_status.pl
index 0558854..d1192f6 100644
--- a/src/bin/pg_ctl/t/002_status.pl
+++ b/src/bin/pg_ctl/t/002_status.pl
@@ -18,9 +18,9 @@ close CONF;
 comm

Re: [HACKERS] 9.5 release notes

2015-06-24 Thread Amit Langote

Hi,

On 2015-06-11 PM 01:15, Bruce Momjian wrote:
> I have committed the first draft of the 9.5 release notes.  You can view
> the output here:
> 
>   http://momjian.us/pgsql_docs/release-9-5.html
>   
> and it will eventually appear here:
> 
>   http://www.postgresql.org/docs/devel/static/release.html
> 
> I am ready to make suggested adjustments, though I am traveling for
> conferences for the next ten days so there might a delay in my replies.
> 

Is it intentional that the following items are separate?

  Move pg_upgrade from contrib to src/bin (Peter Eisentraut)

  Move pg_upgrade_support code into backend and remove the module
  (Peter Eisentraut)

Or could they be made into one item?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-24 Thread Michael Paquier
On Thu, Jun 25, 2015 at 12:57 PM, Fujii Masao wrote:
> On Thu, Jun 25, 2015 at 12:15 PM, Michael Paquier wrote:
>> and that's actually equivalent to that in
>> the grammar: 1(AAA,BBB,CCC).
>
> I don't think that they are the same. In the case of 1(AAA,BBB,CCC), while
> the two servers AAA and BBB are running, the master server may return
> success for the transaction to the client just after it receives the ACK
> from BBB. OTOH, in the case of AAA,BBB, that never happens. The master must
> wait for the ACK from AAA to arrive before completing the transaction. And
> then, if AAA goes down, BBB should become the synchronous standby.

Ah. Right. I missed your point; that's a bad day... We could have
multiple separators to define group types then:
- "()" where the order of acknowledgement does not matter
- "[]" where it does.
You would get the old behavior with:
1[AAA,BBB,CCC]
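
For example, extending that idea (the grammar is still only a sketch):

    synchronous_standby_names = '2(AAA,BBB,CCC)'
    # wait for ACKs from any two of the three standbys

    synchronous_standby_names = '2[AAA,BBB,CCC]'
    # wait for ACKs from the first two alive standbys, in the order listed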
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-24 Thread Michael Paquier
On Wed, Jun 24, 2015 at 11:30 PM, Fujii Masao  wrote:
> On Fri, May 15, 2015 at 9:18 PM, Michael Paquier
>  wrote:
>> On Fri, May 15, 2015 at 8:55 PM, Beena Emerson  
>> wrote:
>>> There was a discussion on support for N synchronous standby servers started
>>> by Michael. Refer
>>> http://archives.postgresql.org/message-id/cab7npqr9c84ig0zuvhmqamq53vqsd4rc82vyci4dr27pvof...@mail.gmail.com
>>> . The use of hooks and dedicated language was suggested; however, it
>>> seemed to be overkill for the scenario and there was no consensus on
>>> this. Exploring GUC-land was preferred.
>>
>> Cool.
>>
>>> Please find attached a patch, built on Michael's patch from the
>>> above-mentioned thread, which supports choosing a different number of
>>> nodes from each set, i.e. k nodes from set 1, l nodes from set 2, and so
>>> on. The format of synchronous_standby_names has been updated to standby
>>> name followed by the required count, separated by a hyphen. Ex: 'aa-1,
>>> bb-3'.  The transaction waits for the specified number of standbys in
>>> each group. Any extra nodes with the same name will be considered
>>> potential. The special entry * for the standby name is also supported.
>>
>> I don't think that this is going in the right direction; what was
>> suggested, mainly by Robert, was to use a micro-language that would
>> allow far more extensibility than what you are proposing. See for
>> example ca+tgmobpwoenmmepfx0jwrvqufxvbqrv26ezq_xhk21gxrx...@mail.gmail.com
>> for some ideas.
>
> Doesn't this approach prevent us from specifying the "potential" synchronous
> standby server? For example, imagine the case where you want to treat
> the server AAA as synchronous standby. You also want to use the server BBB
> as synchronous standby only if the server AAA goes down. IOW, you want to
> prefer the server AAA as synchronous standby rather than BBB.
> Currently we can easily set up that case by just setting
> synchronous_standby_names as follows.
>
> synchronous_standby_names = 'AAA, BBB'
>
> However, after we adopt the quorum commit feature with the proposed
> micro-language, how can we set up that case? It seems impossible...
> I'm afraid that this might be a backward compatibility issue.

Like that:
synchronous_standby_names = 'AAA, BBB'
The thing is that we need to support the old grammar as well to be
fully backward compatible, and that's actually equivalent to that in
the grammar: 1(AAA,BBB,CCC). This is something I understood was
included in Robert's draft proposal.

> Or we should extend the proposed micro-language so that it also can handle
> the priority of each standby servers? Not sure that's possible, though.

I am not sure that's really necessary; we only need to be able to
manage priorities within each subgroup. Putting it in a shape that
users can understand easily in pg_stat_replication looks more
challenging, though. We are going to need a new view like
pg_stat_replication_group that shows the priority status of each
group, with one record per group, taking into account that a
group can be included in another one.
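
To make that a bit more concrete, a purely hypothetical sketch of how such
a view could be queried (all names invented):

    select group_name, parent_group, num_sync, sync_state
      from pg_stat_replication_group;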
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-06-24 Thread Kouhei Kaigai
Fujita-san,

> Does it make sense to put the result tuple of the remote join on every
> estate->es_epqTupleSet[] slot represented by this ForeignScan if
> scanrelid==0?
>
Sorry, I misunderstood behavior of the es_epqTupleSet[].

I'd like to suggest a solution that re-constructs the remote tuple
according to the fdw_scan_tlist in ExecScanFetch, if the given
scanrelid == 0. That enables running the local qualifiers associated with
the ForeignScan node, and it will also work for the case when the tuple
in es_epqTupleSet[] was a local heap tuple.

For details:
The es_epqTuple[] is set by EvalPlanQualSetTuple(). It puts a tuple that
exactly reflects a particular base relation (one with a positive rtindex).
Even if it is a foreign table, ExecLockRows() puts a tuple dynamically
constructed via a whole-row reference at EvalPlanQualFetchRowMarks().
So, regardless of copy or reference to heap, we can expect es_epqTuple[]
to keep a tuple for each of the base relations.

On the other hand, a ForeignScan that replaced a local join with a remote
join has a valid fdw_scan_tlist. It contains the expression nodes used to
construct the individual attributes of the pseudo scan target list.

So, all we need to do is: (1) if scanrelid == 0 in ExecScanFetch(),
(2) the node should be a ForeignScan or CustomScan with a *_scan_tlist;
(3) then, we reconstruct a tuple of the pseudo scan based on the
*_scan_tlist, instead of simply referencing es_epqTupleSet[];
(4) and, evaluate the local qualifiers of the node.

What do you think?

BTW, if you try a newer version of the postgres_fdw foreign join patch,
please let me know how to reproduce the problem.

Also, as an aside, postgres_fdw does not implement RefetchForeignRow()
at the moment. Does that cause a problem?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
> Sent: Wednesday, June 24, 2015 10:02 PM
> To: Etsuro Fujita
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> Does it make sense to put the result tuple of the remote join on every
> estate->es_epqTupleSet[] slot represented by this ForeignScan if
> scanrelid==0?
> 
> It allows the qualifier to be rechecked for each LockRows that intends
> to lock a base foreign table underlying the remote join.
> ForeignScan->fdw_relids tells us which rtindexes are represented
> by this ForeignScan, so the infrastructure side may be able to handle it.
> 
> Thanks,
> 
> 
> 2015-06-24 11:40 GMT+09:00 Etsuro Fujita :
> > Hi,
> >
> > While reviewing the foreign join pushdown core patch, I noticed that the
> > patch doesn't perform an EvalPlanQual recheck properly.  The example
> > that crashes the server will be shown below (it uses the postgres_fdw
> > patch [1]).  I think the reason for that is because the ForeignScan node
> > performing the foreign join remotely has scanrelid = 0 while
> > ExecScanFetch assumes that its scan node has scanrelid > 0.
> >
> > I think this is a bug.  I've not figured out how to fix this yet, but I
> > thought we would also need another plan that evaluates the join locally
> > for the test tuples for EvalPlanQual.  Maybe I'm missing something, though.
> >
> > Create an environment:
> >
> > postgres=# create table tab (a int, b int);
> > CREATE TABLE
> > postgres=# create foreign table foo (a int) server myserver options
> > (table_name 'foo');
> > CREATE FOREIGN TABLE
> > postgres=# create foreign table bar (a int) server myserver options
> > (table_name 'bar');
> > CREATE FOREIGN TABLE
> > postgres=# insert into tab values (1, 1);
> > INSERT 0 1
> > postgres=# insert into foo values (1);
> > INSERT 0 1
> > postgres=# insert into bar values (1);
> > INSERT 0 1
> > postgres=# analyze tab;
> > ANALYZE
> > postgres=# analyze foo;
> > ANALYZE
> > postgres=# analyze bar;
> > ANALYZE
> >
> > Run the example:
> >
> > [Terminal 1]
> > postgres=# begin;
> > BEGIN
> > postgres=# update tab set b = b + 1 where a = 1;
> > UPDATE 1
> >
> > [Terminal 2]
> > postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
> > foo.a and foo.a = bar.a for update;
> >
> >  QUERY PLAN
> > --------------------------------------------------------------------
> >  LockRows  (cost=100.00..101.18 rows=4 width=70)
> >Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
> >->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
> >  Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
> >  Join Filter: (foo.a = tab.a)
> >  ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
> >Output: tab.a, tab.b, tab.ctid
> >  ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
> >Output: foo.*, foo.a, bar.*,

Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-24 Thread Michael Paquier
On Wed, Jun 24, 2015 at 3:43 PM, Michael Paquier wrote:
>> I just realized another problem: We recently learned the hard way that some
>> people have files in the data directory that are not writeable by the
>> 'postgres' user
>> (http://www.postgresql.org/message-id/20150523172627.ga24...@msg.df7cb.de).
>> pg_rewind will try to overwrite all files it doesn't recognize as relation
>> files, so it's going to fail on those. A straightforward fix would be to
>> first open the destination file in read-only mode, and compare its contents,
>> and only open the file in write mode if it has changed. It would still fail
>> when the files really differ, but I think that's acceptable.
>
> If I am not missing anything, two code paths need to be patched here:
> copy_file_range and receiveFileChunks. copy_file_range is
> straight-forward. Now wouldn't it be better to write the contents into
> a temporary file, compare their content, and then switch if necessary
> for receiveFileChunks?

After sleeping on it, I have been looking at this issue again and came
up with the attached patch. Instead of checking whether the contents of
the target and source files are the same, which would still require
fetching chunk contents from the server in stream mode, I think it is
more robust to check whether the target file can be opened, checking for
EACCES on failure and bypassing the file if the process does not have
permissions on it. The patch contains a test case as well, and is
independent of the rest sent upthread.
Thoughts?
-- 
Michael
diff --git a/src/bin/pg_rewind/copy_fetch.c b/src/bin/pg_rewind/copy_fetch.c
index 224fad1..ef830ac 100644
--- a/src/bin/pg_rewind/copy_fetch.c
+++ b/src/bin/pg_rewind/copy_fetch.c
@@ -174,7 +174,8 @@ copy_file_range(const char *path, off_t begin, off_t end, bool trunc)
 	if (lseek(srcfd, begin, SEEK_SET) == -1)
 		pg_fatal("could not seek in source file: %s\n", strerror(errno));
 
-	open_target_file(path, trunc);
+	if (!open_target_file(path, trunc))
+		return;
 
 	while (end - begin > 0)
 	{
diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index c2d8aa1..1f68855 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -40,17 +40,17 @@ static void remove_target_symlink(const char *path);
  * Open a target file for writing. If 'trunc' is true and the file already
  * exists, it will be truncated.
  */
-void
+bool
 open_target_file(const char *path, bool trunc)
 {
 	int			mode;
 
 	if (dry_run)
-		return;
+		return true;
 
 	if (dstfd != -1 && !trunc &&
 		strcmp(path, &dstpath[strlen(datadir_target) + 1]) == 0)
-		return;	/* already open */
+		return true;/* already open */
 
 	close_target_file();
 
@@ -60,9 +60,19 @@ open_target_file(const char *path, bool trunc)
 	if (trunc)
 		mode |= O_TRUNC;
 	dstfd = open(dstpath, mode, 0600);
+
+	/* ignore errors for unreadable files */
+	if (dstfd < 0 && errno == EACCES)
+	{
+		pg_log(PG_PROGRESS, "could not open target file \"%s\", bypassing: %s\n",
+			   dstpath, strerror(errno));
+		return false;
+	}
 	if (dstfd < 0)
 		pg_fatal("could not open target file \"%s\": %s\n",
  dstpath, strerror(errno));
+
+	return true;
 }
 
 /*
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index f68c71d..e437cb1 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -12,7 +12,7 @@
 
 #include "filemap.h"
 
-extern void open_target_file(const char *path, bool trunc);
+extern bool open_target_file(const char *path, bool trunc);
 extern void write_target_range(char *buf, off_t begin, size_t size);
 extern void close_target_file(void);
 extern void truncate_target_file(const char *path, off_t newsize);
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index df71069..41c096b 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -283,7 +283,8 @@ receiveFileChunks(const char *sql)
 		pg_log(PG_DEBUG, "received chunk for file \"%s\", offset %d, size %d\n",
 			   filename, chunkoff, chunksize);
 
-		open_target_file(filename, false);
+		if (!open_target_file(filename, false))
+			continue;
 
 		write_target_range(chunk, chunkoff, chunksize);
 
@@ -406,8 +407,8 @@ libpq_executeFileMap(filemap_t *map)
 
 			case FILE_ACTION_COPY:
 /* Truncate the old file out of the way, if any */
-open_target_file(entry->path, true);
-fetch_file_range(entry->path, 0, entry->newsize);
+if (open_target_file(entry->path, true))
+	fetch_file_range(entry->path, 0, entry->newsize);
 break;
 
 			case FILE_ACTION_TRUNCATE:
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 032301f..77dddbe 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -496,7 +496,8 @@ createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli, XLogRecPtr checkpo
 		pg_fatal("backup label buffer too small\n");	/* shouldn't happen */
 
 	/* TODO: move old file out of the way, if any. */
-	open_

Re: [HACKERS] checkpointer continuous flushing

2015-06-24 Thread Amit Kapila
On Wed, Jun 24, 2015 at 9:50 AM, Fabien COELHO  wrote:

>>>>>
>>>>> flsh |  full speed tps   | percent of late tx, 4 clients
>>>>> /srt |  1 client   |  4 clients  |   100 |   200 |   400 |
>>>>>  N/N | 173 +- 289* | 198 +- 531* | 27.61 | 43.92 | 61.16 |
>>>>>  N/Y | 458 +- 327* | 743 +- 920* |  7.05 | 14.24 | 24.07 |
>>>>>  Y/N | 169 +- 166* | 187 +- 302* |  4.01 | 39.84 | 65.70 |
>>>>>  Y/Y | 546 +- 143  | 681 +- 459  |  1.55 |  3.51 |  2.84 |
>>>>>
>>>>> The effect of sorting is very positive (+150% to 270% tps). On this
>>>>> run, flushing has a positive (+20% with 1 client) or negative (-8% with
>>>>> 4 clients) effect on throughput, and late transactions are reduced by
>>>>> 92-95% when both options are activated.

>>>> Why is there a dip in performance with multiple clients,

>>>
>>> I'm not sure to see the "dip". The performances are better with 4 clients
>>> compared to 1 client?
>>>
>>
>> What do you mean by "negative (-8 % with 4 clients) on throughput" in the
>> above sentence? I thought by that you meant that there is a dip in TPS with
>> the patch as compared to HEAD at 4 clients.
>>
>
> Ok, I misunderstood your question. I thought you meant a dip between 1
> client and 4 clients. I meant that when flush is turned on tps goes down by
> 8% (743 to 681 tps) on this particular run.


This 8% might matter if the dip is bigger with more clients and a
more aggressive workload.  Do you know what could lead to this dip?
If we know the reason, then it will be easier to predict whether this
is the maximum dip that could happen or whether it could lead to a
bigger dip in other cases.


> Basically tps improvements mostly come from "sort", and "flush" has
> uncertain effects on tps (throughput), but much more on latency and
> performance stability (lower late rate, lower standard deviation).
>
>
I agree that performance stability is important, but I am not sure if it
is a good idea to sacrifice throughput for it.  If sort + flush always
gives better results, then isn't it better to perform these actions
together under one option?


> Note that I'm not comparing to HEAD in the above tests, but with the new
> options deactivated, which should be more or less comparable to current
> HEAD, i.e. there is no sorting nor flushing done, but this is not strictly
> speaking HEAD behavior. Probably I should get some figures with HEAD as
> well to check the "more or less" assumption.
>
>> Also I am not completely sure what "+-" means in your data above?
>
> The first figure before "+-" is the tps; the second, after it, is its
> standard deviation computed from per-second traces. Some runs are very
> bad, with pgbench stuck at times, resulting in a stddev larger than the
> average; they were noted with "*".
>
>> I understand your point and I also don't have any specific answer
>> for it at this moment; the point of worry is that it should not lead
>> to degradation of certain cases as compared to the current algorithm.
>> The workload where it could have an effect is when your data doesn't
>> fit in shared buffers, but can fit in RAM.
>>
>
> Hmmm. My point of view is still that the logical priority is to optimize
> for disk IO first, then look for compatible RAM optimisations later.
>

It is not only about RAM optimisation which we can do later, but also
about avoiding regression in existing use-cases.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Amit Langote

Peter,

On 2015-06-25 AM 02:35, Peter Geoghegan wrote:
> 
> Inheritance with triggers is a leaky abstraction, so this kind of
> thing is always awkward. Still, UPSERT has full support for
> *inheritance* -- that just doesn't help in this case.
> 

Could you clarify as to what UPSERT's support for inheritance entails?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_*_columns?

2015-06-24 Thread Tomas Vondra



On 06/24/2015 07:54 PM, Jeff Janes wrote:


Were the stories (or the experience which led to the stories) on 9.3 or
later?  Do they have a good way to reproduce it for testing purposes?


The per-db split can only improve things if there actually are multiple 
databases, and if the objects are somehow evenly distributed among them. 
If there's a single database, or if most of the objects are in a single 
database (but stored in multiple schemas, for example), it does not help 
much. So it's trivial to reproduce the previous issues.




When I was testing the stat file split patch, one thing I noticed is
that on the ext4 file system, when you atomically replace a file by
renaming a file over the top of an existing file name, it will
automatically fsync the file being renamed. This is exactly what we
don't want in the case of the temp stats files.  But there seems to be
no way to circumvent it, other than unlinking the target file before the
rename (which of course defeats the point of atomic replacement), or
moving to a different filesystem for the stats files.


Interesting. I don't think unlinking is a good idea - my understanding 
is that we do it this way because rename is supposed to be atomic or 
something like that. I don't know whether addressing this filesystem 
feature is worth it, or if pgstat.c is the right place to do that.


--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] less log level for success dynamic background workers for 9.5

2015-06-24 Thread Jim Nasby

On 6/23/15 8:11 PM, Craig Ringer wrote:

I've certainly had similar issues to you w.r.t. to debug messages from
user-level code, and wanted to be able to enable one particular log
line, all log output from a particular function, or all log output
from a particular extension / all functions in a schema.


I would also like the ability to potentially change the log level for 
all calls made out of a certain function/schema/etc. Basically, change 
it for everything below a certain entry in the call stack.



I think it's a whole separate topic to Pavel's original proposal
though, and it really merits a separate thread. For Pavel's issue I'm all
in favour of just changing the log message; I think LOG is way too
high for something that's an internal implementation detail.


+1.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 24, 2015 at 3:49 PM, Andres Freund  wrote:
>> On 2015-06-24 15:41:22 -0400, Peter Eisentraut wrote:
>>> One more argument for leaving everything alone.  If users don't like it,
>>> they can turn it off themselves.

>> Because it's so obvious to get there from "SSL error: unexpected
>> message", "SSL error: bad write retry" or "SSL error: unexpected record"
>> to disabling renegotiation. Right?  Search the archives and you'll find
>> plenty of those, mostly in relation to streaming rep. It took -hackers
>> years to figure out what causes those, how are normal users supposed to
>> a) correlate such errors with renegotiation b) evaluate what to do about
>> it?

> We could document the issues, create release-note entries suggesting a
> configuration change, and/or blog about it.
> I don't accept the argument that there are not ways to tell users
> about things they might want to do.

I think there's a strong argument for changing the default setting to
zero (no renegotiation), even in the back branches.
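
For reference, the setting in question is ssl_renegotiation_limit; users
who want this behavior today can already put

    ssl_renegotiation_limit = 0

in postgresql.conf to disable renegotiation entirely.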

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-06-24 Thread Peter Geoghegan
On Sun, Jun 14, 2015 at 2:08 PM, Peter Geoghegan  wrote:
> Really? The pre-check thing wasn't too complex for Magnus to have a
> couple of bullet points on it *specifically* in his high level NYC
> talk on Postgres 9.5 features [1]. I don't think it's hard to
> understand at all.
>
> Also, it's simply incorrect to describe abbreviation as: "Improve the
> speed of sorting character and numeric fields". Character fields
> presumably include character(n), and as I pointed out character(n)
> lacks abbreviation support.

Where are we on this? Bruce mentioned that he'd revisit this during pgCon.

Aside from the issue of whether or not the pre-check thing is
mentioned, and aside from the issue of correctly identifying which
types the abbreviation optimization applies to, I have another
concern: I cannot imagine why we'd fail to mention a totally
independent speed up of about 10% [1] for CREATE INDEX on integer
columns. This speed-up has nothing to do with abbreviation or anything
else mentioned in the 9.5 release notes currently; it's down to commit
5ea86e6e, which extended sortsupport to work with cases like CREATE
INDEX and CLUSTER.
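
To be concrete about which operations that covers, a trivial sketch
(table and index names here are just examples):

create table t (id integer);
-- ... populate t ...
create index t_id_idx on t (id);  -- CREATE INDEX now goes through sortsupport
cluster t using t_id_idx;         -- and so does CLUSTER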

[1] http://www.postgresql.org/message-id/545ac1d9.1040...@proxel.se
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Are we sufficiently clear that jsonb containment is nested?

2015-06-24 Thread Peter Geoghegan
I worry that "8.14.3. jsonb Containment and Existence" is not
sufficiently clear in explaining that jsonb containment is nested.
I've seen anecdata suggesting that this is unclear to users. We do
say:

"""
The general principle is that the contained object must match the
containing object as to structure and data contents, possibly after
discarding some non-matching array elements or object key/value pairs
from the containing object.
"""

I think that we could still do with an example showing *nested*
containment, where many non-matching elements/pairs at each of several
nesting levels are discarded. This could be back-patched to 9.4.
Something roughly like the delicious sample data, where queries like
the following are possible and useful:

postgres=# select jsonb_pretty(doc) from delicious where doc @>
'{"tags":[{"term":"Florence" }, {"term":"food"} ]  }' limit 1;
  jsonb_pretty
---------------------------------------------------------------------------
 {
     "id": "http://delicious.com/url/5f05d61a6e8519e9c9c8c557216375da#Avrami",
     "link": "http://www.foodnetwork.com/recipes/tyler-florence/the-ultimate-lasagna-recipe/index.html",
     "tags": [
         {
             "term": "Lasagna",
             "label": null,
             "scheme": "http://delicious.com/Avrami/"
         },
 *** SNIP ***
     "title": "The Ultimate Lasagna Recipe : Tyler Florence : Food Network",
     "author": "Avrami",
     "source": {
     },
     "updated": "Fri, 11 Sep 2009 17:09:20 +",
     "comments": "http://delicious.com/url/5f05d61a6e8519e9c9c8c557216375da",
     "guidislink": false,
     "title_detail": {
         "base": "http://feeds.delicious.com/v2/rss/recent?min=1&count=100",
         "type": "text/plain",
         "value": "The Ultimate Lasagna Recipe : Tyler Florence : Food Network",
         "language": null
     },
     "wfw_commentrss": "http://feeds.delicious.com/v2/rss/url/5f05d61a6e8519e9c9c8c557216375da"
 }
(1 row)

Obviously a real doc-patch example would have to be more worked out
and clearer than what I show here. My immediate concern is whether
users appreciate that jsonb is capable of this kind of complex, nested
containment-driven querying. I do not recall ever seeing an example
like this in the wild, which is where this concern comes from. It
would be a shame if they were working around a non-existent
limitation, especially given that this kind of thing can work
reasonably effectively with the jsonb_path_ops opclass.
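
For instance, a minimal self-contained example along these lines (sample
data invented here) could show the discarding at several nesting levels:

create table delicious (doc jsonb);
insert into delicious values (
  '{"title": "Lasagna",
    "author": "Avrami",
    "tags": [{"term": "Florence", "label": null},
             {"term": "food"},
             {"term": "recipe"}]}');

select count(*) from delicious
 where doc @> '{"tags": [{"term": "Florence"}, {"term": "food"}]}';
-- count = 1: the extra top-level keys, the extra array element, and the
-- extra "label" key inside a matched element are all discarded, each at
-- its own nesting level

create index on delicious using gin (doc jsonb_path_ops);
-- the same @> query is indexable with this opclass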

Opinions?
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oh, this is embarrassing: init file logic is still broken

2015-06-24 Thread Josh Berkus
On 06/23/2015 04:44 PM, Tom Lane wrote:
> Chasing a problem identified by my Salesforce colleagues led me to the
> conclusion that my commit f3b5565dd ("Use a safer method for determining
> whether relcache init file is stale") is rather borked.  It causes
> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
> because that index is not used by any syscache.  I had been aware of that
> actually, but considered it a minor issue.  It's not so minor though,
> because RelationCacheInitializePhase3 marks that index as nailed for
> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
> That means that load_relcache_init_file *always* decides that the init
> file is busted and silently(!) ignores it.  So we're taking a nontrivial
> hit in backend startup speed as of the last set of minor releases.

OK, this is pretty bad in its real performance effects.  On a workload
which is dominated by new connection creation, we've lost about 17%
throughput.

To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a
database which fits in shared_buffers on two different m3.large
instances on AWS (across the network, not on unix sockets).  A typical
run on 9.3.6 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 252322
tps = 210.267219 (including connections establishing)
tps = 31958.233736 (excluding connections establishing)
statement latencies in milliseconds:
0.002515\set naccounts 10 * :scale
0.000963\setrandom aid 1 :naccounts
19.042859   SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Whereas a typical run on 9.3.9 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 208180
tps = 173.482259 (including connections establishing)
tps = 31092.866153 (excluding connections establishing)
statement latencies in milliseconds:
0.002518\set naccounts 10 * :scale
0.000988\setrandom aid 1 :naccounts
23.076961   SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Numbers are pretty consistent on four runs each on two different
instances (+/- 4%), so I don't think this is Amazon variability we're
seeing.  I think the syscache invalidation is really costing us 17%.  :-(
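
(Doing the arithmetic on the connection-inclusive numbers above:
(210.267 - 173.482) / 210.267 is roughly 17.5%, hence "about 17%".)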

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 3:49 PM, Andres Freund  wrote:
> On 2015-06-24 15:41:22 -0400, Peter Eisentraut wrote:
>> On 6/24/15 3:13 PM, Andres Freund wrote:
>> > Meh. The relevant branches already exist, as you can disable it today.
>> >
>> > We could also just change the default in the back branches.
>>
>> One more argument for leaving everything alone.  If users don't like it,
>> they can turn it off themselves.
>
> Because it's so obvious to get there from "SSL error: unexpected
> message", "SSL error: bad write retry" or "SSL error: unexpected record"
> to disabling renegotiation. Right?  Search the archives and you'll find
> plenty of those, mostly in relation to streaming rep. It took -hackers
> years to figure out what causes those, how are normal users supposed to
> a) correlate such errors with renegotiation b) evaluate what to do about
> it?

We could document the issues, create release-note entries suggesting a
configuration change, and/or blog about it.

I don't accept the argument that there are not ways to tell users
about things they might want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 3:41 PM, Peter Eisentraut  wrote:
> On 6/24/15 3:13 PM, Andres Freund wrote:
>> Meh. The relevant branches already exist, as you can disable it today.
>>
>> We could also just change the default in the back branches.
>
> One more argument for leaving everything alone.  If users don't like it,
> they can turn it off themselves.

I find it very hard to disagree with that perspective.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-06-24 Thread Josh Berkus
On 06/21/2015 01:37 PM, Fabien COELHO wrote:
> 
>> The worst case with pgbench is that we break two people's test scripts,
>> they read the release notes, and fix them.
> 
> Sure, I agree that breaking pgbench custom scripts compatibility is no
> big deal, and having pgbench consistent with psql is useful.

... apparently nobody disagrees ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GIN: Implementing triConsistent and strategy number

2015-06-24 Thread Jeff Janes
Is there a way to implement triConsistent for only some of the strategy
numbers?

It looks like I would have to internally re-implement something like
shimTriConsistentFn for each strategy number for which I don't want to
implement the ternary system in full.  Am I missing a trick?

Thanks,

Jeff


Re: [HACKERS] pg_stat_*_columns?

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 2:03 PM, Jeff Janes  wrote:
> On Sat, Jun 20, 2015 at 7:20 AM, Robert Haas  wrote:
>>
>> On Sat, Jun 20, 2015 at 10:12 AM, Joel Jacobson  wrote:
>> > Is there any chance the project would accept a patch which adds the
>> > pg_stat_*_columns-feature without first optimizing the collector?
>>
>> I doubt it.  It's such a pain point already that massively increasing
>> the amount of data we need to store does not seem like a good plan.
>
> What if the increase only happened for people who turned on
> track_column_usage?

Hmm.  You know, what I think would really help is if the column
statistics were stored in different files from the table statistics,
and requested separately.  Since these are just there for the DBA, and
wouldn't be used by anything automated, the enormous explosion
wouldn't hurt much.

I'm skeptical of the proposal overall in any event: even if good work
is done to keep the performance impact small, how many DBAs will want
to increase the amount of statistical data by 1-2 orders of magnitude?
 It's a lot of numbers to keep around for something that you won't
look at very often and which may not be informative when you do look
at it (e.g. if your queries tend to use SELECT *). But I won't stand
in the way if there's a general consensus that users will find this
useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Peter Eisentraut
On 6/24/15 12:55 PM, Robert Haas wrote:
>> FWIW, I have been doing that, and I have not considered it a problem.
>> If the patch was submitted three months ago, reviewed, and then
>> committed unchanged, the date is what it is.  Also, when I cherry-pick a
>> commit from master to a back branch, I keep the author timestamp the
>> same.  I consider that a feature.
> 
> I don't, because it means that the timestamps you see when you run git
> log are non-linear.  I don't care myself if they are slightly out of
> order, although it seems that others do, but I do mind when they are
> months off.

Why is that a problem?

> Typically when this happens to me, it's not a case of the patch being
> unchanged.  I make changes on a branch and then use git rebase -i to
> squash them into a single patch which I then cherry-pick.  But git
> rebase -i keeps the timestamp of the first (oldest) commit, so I end
> up with a commit that is timestamped as to when I began development,
> not when I finished it.  So the date is just wrong.

Sure, but then it's up to you to fix it, just like it's up to you to
merge the commit messages and do other adjustments to the commit.  But
this is one of many cases, and we shouldn't throw out the whole concept
because of it.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Andres Freund
On 2015-06-24 15:41:22 -0400, Peter Eisentraut wrote:
> On 6/24/15 3:13 PM, Andres Freund wrote:
> > Meh. The relevant branches already exist, as you can disable it today.
> > 
> > We could also just change the default in the back branches.
> 
> One more argument for leaving everything alone.  If users don't like it,
> they can turn it off themselves.

Because it's so obvious to get there from "SSL error: unexpected
message", "SSL error: bad write retry" or "SSL error: unexpected record"
to disabling renegotiation. Right?  Search the archives and you'll find
plenty of those, mostly in relation to streaming rep. It took -hackers
years to figure out what causes those, how are normal users supposed to
a) correlate such errors with renegotiation b) evaluate what to do about
it?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Peter Eisentraut
On 6/24/15 3:13 PM, Andres Freund wrote:
> Meh. The relevant branches already exist, as you can disable it today.
> 
> We could also just change the default in the back branches.

One more argument for leaving everything alone.  If users don't like it,
they can turn it off themselves.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trustly PostgreSQL Data Corruption Bug Bounty Program

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 12:06 PM, Simon Riggs  wrote:
> Thanks Joel, good idea.

Any scheme that offers incentives for doing the unglamorous work of
writing a corruption test suite for PostgreSQL is good news, IMV. That
seems to be the idea here. I just hope that someone doesn't end up
incentivized to keep those cards close to their chest.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Andres Freund
On June 24, 2015 9:07:35 PM GMT+02:00, Peter Eisentraut  wrote:
>On 6/24/15 12:26 PM, Tom Lane wrote:
>> Andres Freund  writes:
>>> On 2015-06-24 11:57:53 -0400, Peter Eisentraut wrote:
 If Red Hat fixes their bug, then PostgreSQL doesn't have any actual
 problem anymore, does it?
>> 
>>> It does, there are numerous bugs around renegotiation that exist
>with
>>> upstream openssl and postgres. More in the older branches, but even
>in
>>> HEAD we break regularly. Most only occur in replication connections
>(due
>>> to copy both) and/or when using more complex clients where clients
>and
>>> servers send data at the same time due to pipelining.
>> 
>> The lesson to learn from the Red Hat fiasco is that vendors are not
>> adequately testing renegotiation either.  All the more reason to get
>> out from under it.  I did not like being told that "Postgres fails
>and
>> $randomapp doesn't, therefore it's Postgres' problem" when actually
>> the difference was that $randomapp doesn't invoke renegotiation.
>
>I'm fine with removing renegotiation.  But the original proposal was to
>backpatch renegotiation changes, which seemed like replacing one problem
>variation with another, and does not sound comfortable given the recent
>backpatching record.

Meh. The relevant branches already exist, as you can disable it today.

We could also just change the default in the back branches.

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] object_classes array is broken, again

2015-06-24 Thread Robert Haas
The transforms patch seems to have forgotten to add
TransformRelationId to object_classes[], much like the RLS patch
forgot to add PolicyRelationId in the same place.

Fixing this is easy, but ISTM that we need to insert some sort of a
guard to prevent people from continuing to forget this, because it's
apparently quite easy to do.  Perhaps add_object_address should
Assert(OidIsValid(object_classes[oclass])), plus a (static?) assert
someplace checking that OidIsValid(object_classes[MAX_OCLASS - 1])?
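
Concretely, a sketch of what those guards might look like in
add_object_address() (placement and surrounding code are illustrative,
but StaticAssertStmt, lengthof and Assert are existing backend macros):

static void
add_object_address(ObjectClass oclass, Oid objectId, int32 subId,
                   ObjectAddresses *addrs)
{
    /*
     * Guard against a new catalog being added to ObjectClass without a
     * matching entry in object_classes[]: check the array length at
     * compile time, and the looked-up OID at runtime.
     */
    StaticAssertStmt(lengthof(object_classes) == MAX_OCLASS,
                     "object_classes[] must cover all ObjectClass values");
    Assert(OidIsValid(object_classes[oclass]));

    /* ... existing body: enlarge addrs if needed, fill in the entry ... */
}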

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Peter Eisentraut
On 6/24/15 12:26 PM, Tom Lane wrote:
> Andres Freund  writes:
>> On 2015-06-24 11:57:53 -0400, Peter Eisentraut wrote:
>>> If Red Hat fixes their bug, then PostgreSQL doesn't have any actual
>>> problem anymore, does it?
> 
>> It does, there are numerous bugs around renegotiation that exist with
>> upstream openssl and postgres. More in the older branches, but even in
>> HEAD we break regularly. Most only occur in replication connections (due
>> to copy both) and/or when using more complex clients where clients and
>> servers send data at the same time due to pipelining.
> 
> The lesson to learn from the Red Hat fiasco is that vendors are not
> adequately testing renegotiation either.  All the more reason to get
> out from under it.  I did not like being told that "Postgres fails and
> $randomapp doesn't, therefore it's Postgres' problem" when actually
> the difference was that $randomapp doesn't invoke renegotiation.

I'm fine with removing renegotiation.  But the original proposal was to
backpatch renegotiation changes, which seemed like replacing one problem
variation with another, and does not sound comfortable given the recent
backpatching record.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trustly PostgreSQL Data Corruption Bug Bounty Program

2015-06-24 Thread Simon Riggs
On 24 June 2015 at 19:18, Josh Berkus  wrote:

> On 06/24/2015 10:15 AM, Joel Jacobson wrote:
> > You are most welcome to contact me at j...@trustly.com or by replying
> > to this thread, if you have any suggestions on how to improve this
> > bug bounty program, or if you have any other feedback in general.
>
> Members of the PostgreSQL Security Team need to be disqualified from
> eligibility, of course.
>

..err... this is for data corruption bugs, not security.

Editing data files by privileged users doesn't count, since it needs to be
reproducible.

Thanks Joel, good idea.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-06-24 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:

> Another rebased version.

There are a number of unrelated whitespace changes in this patch; also
please update the comment on top of check_for_column_name_collision.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trustly PostgreSQL Data Corruption Bug Bounty Program

2015-06-24 Thread Andres Freund
On 2015-06-24 11:18:03 -0700, Josh Berkus wrote:
> On 06/24/2015 10:15 AM, Joel Jacobson wrote:
> > You are most welcome to contact me at j...@trustly.com or by replying
> > to this thread, if you have any suggestions on how to improve this
> > bug bounty program, or if you have any other feedback in general.
> 
> Members of the PostgreSQL Security Team need to be disqualified from
> eligibility, of course.

Huh, why? I sure hope we have no security team member that'd use a
security exploit he learned about for this.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GiST support for UUIDs

2015-06-24 Thread Paul A Jungwirth
Hello,

I'm interested in adding GiST support for the UUID column type from
the uuid-ossp extension. This has been requested and attempted before:


http://dba.stackexchange.com/questions/83604/optimizing-postgres-row-overlap-constraints-involving-uuids-and-gist

http://www.postgresql.org/message-id/flat/cah3i69njj9ax1fzhwt6uouzcmbqnadbwhmaphrqzzlwifb2...@mail.gmail.com#cah3i69njj9ax1fzhwt6uouzcmbqnadbwhmaphrqzzlwifb2...@mail.gmail.com

I've used Postgres for a long time, but I've only dabbled a bit in the
source code. So I'm curious where this change would go? The btree_gist
extension? The uuid-ossp extension? Somewhere else?

If anyone has any advice I'd be grateful to hear it.

Thanks,
Paul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trustly PostgreSQL Data Corruption Bug Bounty Program

2015-06-24 Thread Josh Berkus
On 06/24/2015 10:15 AM, Joel Jacobson wrote:
> You are most welcome to contact me at j...@trustly.com or by replying
> to this thread, if you have any suggestions on how to improve this
> bug bounty program, or if you have any other feedback in general.

Members of the PostgreSQL Security Team need to be disqualified from
eligibility, of course.

Other than that, I'd be interested to see how it works out.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 11:02 AM, Peter Geoghegan  wrote:
> I think that the real way to fix this is, as you say, to make it so
> that it isn't necessary in general to write trigger functions like
> this to make inheritance work.

Excuse me -- I mean to make *partitioning* work.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_*_columns?

2015-06-24 Thread Jeff Janes
On Sat, Jun 20, 2015 at 7:20 AM, Robert Haas  wrote:

> On Sat, Jun 20, 2015 at 10:12 AM, Joel Jacobson  wrote:
> > Is there any chance the project would accept a patch which adds the
> > pg_stat_*_columns-feature without first optimizing the collector?
>
> I doubt it.  It's such a pain point already that massively increasing
> the amount of data we need to store does not seem like a good plan.
>

What if the increase only happened for people who turned on
track_column_usage?

Cheers,

Jeff


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 7:05 AM, Fujii Masao  wrote:
> How should we treat this problem for 9.5? If we want to fix this problem
> completely, probably we would need to make constraint_exclusion work with
> even UPSERT. Which sounds difficult to do at least for 9.5. Any other idea?
> Or we should just treat it as a limitation of UPSERT and add that document?

I think that the real way to fix this is, as you say, to make it so
that it isn't necessary in general to write trigger functions like
this to make inheritance work. I can imagine the system rewriting an
INSERT ... ON CONFLICT DO UPDATE query to target the correct child
partition based on the values proposed for insertion, much like
constraint exclusion. There are some problems with that idea, like
what to do about BEFORE INSERT triggers changing those values (the
values that we use to determine the appropriate child partition).
These seem like problems that can be fixed.
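
Concretely, using the tables from Fujii's test case, the rewrite would
turn the statement into roughly what a user can already write by hand
against the child table (the automatic routing is the part that doesn't
exist; the child's own primary key makes the inference work):

INSERT INTO hoge_20150601 VALUES ('2015-06-01 10:30:00', 1234)
ON CONFLICT (col1) DO UPDATE SET col2 = EXCLUDED.col2;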

What I think cannot work is making the trigger inheritance pattern
really robust with UPSERT. Sure, we can do a little bit, like expose
basic information about whether the INSERT is ON CONFLICT DO NOTHING
or ON CONFLICT DO UPDATE. But as you imply, it seems unwise to expect
the user to do the rewriting themselves, within their trigger
function. Maybe that's an argument against ever exposing even this
basic information to the user.

Thanks for testing.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_*_columns?

2015-06-24 Thread Jeff Janes
On Mon, Jun 15, 2015 at 7:47 PM, Jim Nasby  wrote:

> On 6/8/15 3:26 PM, Joel Jacobson wrote:
>
>> So I've heard from Magnus Hagander today IRL at our Stockholm PostgreSQL
>> User Group meeting where we discussed this idea. He told me the overhead
>> in the statistics collector is mainly when reading from it, not that
>> much when writing to it.
>>
>
> I've heard enough stories of people moving the stats files to faster
> storage that I'm not sure how true that really is...



Were the stories (or the experience which led to the stories) on 9.3 or
later?  Do they have a good way to reproduce it for testing purposes?

When I was testing the stat file split patch, one thing I noticed is that
on the ext4 file system, when you atomically replace a file by renaming a
file over the top of an existing file name, it will automatically fsync the
file being renamed. This is exactly what we don't want in the case of the
temp stats files.  But there seems to be no way to circumvent it, other
than unlinking the target file before the rename (which of course defeats
the point of atomic replacement), or moving to a different filesystem for
the stats files.

Perhaps the problem was related to that.
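
For reference, the replace-by-rename pattern in question, as a minimal C
sketch (file names are placeholders): on ext4 with auto_da_alloc, the
rename() alone can force the temp file's data out to disk, even though
the code deliberately never calls fsync().

#include <stdio.h>
#include <string.h>

/* Write a fresh copy, then atomically swap it into place for readers. */
static int
replace_file(const char *tmpname, const char *finalname, const char *data)
{
    FILE *fp = fopen(tmpname, "w");

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, strlen(data), fp) != strlen(data) ||
        fclose(fp) != 0)
        return -1;
    /* deliberately no fsync(): stats files favor speed over durability */
    return rename(tmpname, finalname);
}

int
main(void)
{
    return replace_file("pgstat.tmp", "pgstat.stat",
                        "example payload\n") == 0 ? 0 : 1;
}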

Cheers,

Jeff


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Andres Freund
On 2015-06-24 19:35:51 +0200, Andres Freund wrote:
> Our code currently uses crude hacks (c.f. comment around
> SSL_clear_num_renegotiations(), and the loop around SSL_do_handshake()
> in the back branches) to manage renegotiations. There's pending patches
> to substantially increase the amount of ugly hacking to cope with us
> misusing the SSL_read/write protocol.

C.f.
http://archives.postgresql.org/message-id/54DE6FAF.6050005%40vmware.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Andres Freund
On 2015-06-24 12:57:03 -0400, Robert Haas wrote:
> On Wed, Jun 24, 2015 at 11:11 AM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> I, by now, have come to a different conclusion. I think it's time to
> >> entirely drop the renegotiation support.
> >
> > Well, that's a radical proposal, but I think we should take it seriously.
> >
> > On balance I think I agree that SSL renegotiation has not been worth the
> > trouble.  And we definitely aren't testing it adequately, so if we wanted
> > to keep it then there's even *more* work that somebody ought to expend.
> 
> I'd like to know what factors we are balancing against each other.

Benefits:

SSL renegotiation limits the exposure of on-the-wire material that's
leaked if either client or server is hijacked during an existing
session. With renegotiation the client/server will hopefully only have
the currently negotiated symmetric key, covering only
ssl_renegotiation_limit bytes, in memory.

That's nice, but from a practical point of view it's not worth all that
much. If the server has been hacked, nearly all the data is accessible
anyway, and even if not, the hacker can just continue collecting data
going forward.  If the client has been hacked, it's likely that it has
been storing the session's data somewhere that's still
accessible.

For the server hacked case all that generally only matters if perfect
forward secrecy (PFS) has been employed, without that the session keys
can be recovered anyway as the private key will be accessible in memory.

All this only matters for hacks that access running processes.

Deficits:

SSL renegotiation has had several weaknesses on the protocol level
(cf. CVE-2009-3555/RFC 5746, although I'm not 100% sure it applies to
PG), at least partially leading to vulnerabilities much worse than the
ones renegotiation in the pg style fixes. The openssl implementation has
had several bugs, several of them unfixed years after they were reported
(#3712, #2481, #2146, ...). By my reading of openssl's code the current
state is entirely broken for duplex communication.

The current draft of the next version of the TLS standard removes
renegotiation entirely.

Additionally supporting SSL renegotiation requires SSL_write/read
callers to always call the SSL_write/read after either function has
reported the need for additional reads/writes (note that SSL_read can
require writes and the other way round). We don't comply with that rule,
especially in the backbranches, because it's really hard to do that.
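
To illustrate the rule, a minimal sketch of the read side (socket-wait
calls elided, error handling abbreviated):

#include <openssl/ssl.h>

static int
ssl_read_with_retry(SSL *ssl, char *buf, int len)
{
    for (;;)
    {
        int n = SSL_read(ssl, buf, len);

        if (n > 0)
            return n;           /* got data */

        switch (SSL_get_error(ssl, n))
        {
            case SSL_ERROR_WANT_READ:
                /* wait for the socket to become readable, then
                 * retry the same SSL_read() call */
                break;
            case SSL_ERROR_WANT_WRITE:
                /* mid-renegotiation: wait for the socket to become
                 * writable, then still retry SSL_read(), not SSL_write() */
                break;
            case SSL_ERROR_ZERO_RETURN:
                return 0;       /* clean shutdown */
            default:
                return -1;      /* hard error */
        }
    }
}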

Our code currently uses crude hacks (c.f. comment around
SSL_clear_num_renegotiations(), and the loop around SSL_do_handshake()
in the back branches) to manage renegotiations. There's pending patches
to substantially increase the amount of ugly hacking to cope with us
misusing the SSL_read/write protocol.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 7:38 AM, Robert Haas  wrote:
> Is the root of the problem that the trigger is called for an INSERT ..
> ON CONFLICT statement but it turns that into a plain INSERT?
>
> Is there any way of writing a partitioning trigger that doesn't have
> that defect?

We did discuss whether or not it was important to expose information
to triggers sufficient to route + UPSERT tuples into inheritance
children in an equivalent way (equivalent to an UPSERT into the
inheritance parent). At the time, you seemed to think that it could
wait [1].

Even if we added what was discussed as a minimal and practical way of
exposing this [2], it would still be awkward for an INSERT trigger to
examine the structure of the "UPDATE part" of the UPSERT -- there will
be no *tuple* for that case (yet).

Inheritance with triggers is a leaky abstraction, so this kind of
thing is always awkward. Still, UPSERT has full support for
*inheritance* -- that just doesn't help in this case.

[1] 
http://www.postgresql.org/message-id/ca+tgmozavxmwoebpk4aszonuwur3qpswwk6xetxmca+8h7c...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/cam3swzrvjyu8nnvw_jhexx4ymq9xaa7u0getlbcgym0oean...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Trustly PostgreSQL Data Corruption Bug Bounty Program

2015-06-24 Thread Joel Jacobson
Dear hackers,

The text below has also been published at:
https://trustly.com/en/postgresql-bug-bounty/

You are most welcome to contact me at j...@trustly.com or by replying
to this thread, if you have any suggestions on how to improve this
bug bounty program, or if you have any other feedback in general.

Many thanks.

Joel

--

Most people take for granted that PostgreSQL will never corrupt their data,
because the database has such an impressive track record and few users will
ever experience any data corruption problems, even after years of continuous
heavy load.

But just as we shouldn't take our beloved ones for granted, we shouldn't take
the reliability of PostgreSQL for granted. We want to give a small monetary
incentive to the people who look after our beloved PostgreSQL and make sure
that data corruption problems never ever reach a release candidate.

What's the bounty?
Trustly pays USD 1024 per verified data corruption bug in the HEAD of the master
branch of PostgreSQL.

Trustly's bug bounty program is not associated with the official PostgreSQL
project; it's a completely separate initiative having little to do with the
project, except that the bugs we are interested in finding happen to be in the
code the PostgreSQL project works on.

The bug must be demonstrated against the HEAD of the master branch of the
official PostgreSQL git repository, and must not have been reported or
discovered earlier.

Reproducing the bug must not require any type of hardware failure or
unreasonable actions by the administrator, such as turning off fsync or
full_page_writes. However, you are allowed to crash the server at will.

The submitted bug must be reproducible in a unit test and show any of the
following phenomena:
- database not starting up after a shutdown or a crash
- acknowledged commit not recoverable (data silently disappeared, or errors
  while trying to read or modify the data)
- data not written to the database appearing in SELECTs (duplicates of the
  same row, deleted rows reappear, or garbage data)

You are allowed to assume that PL/pgSQL is available, and bugs which can be
demonstrated to lead to any of the above phenomena only inside PL/pgSQL
functions are still eligible for the bounty, except for bugs which lead to
errors when trying to read or modify data.

- Why?
With this bug bounty program, our hope is to incentivize more people to work
on new clever ways of testing PostgreSQL or to invent other methods capable
of finding data corruption bugs in PostgreSQL.

The objective is to shorten the number of commits between the commit which
introduced a data corruption bug and the commit which fixed it.

If bugs in the master branch are not detected early enough, there is a risk
they will stick around undetected long enough to be included in a released
version of PostgreSQL.

If instead a new data corruption bug introduced in the master branch is fixed
before ever being included in the next release, then no harm is done from a
user perspective, since users who care much about their data only run released
versions of PostgreSQL in production.

- How?
To report a data corruption bug, please first submit it to the PostgreSQL
project by following the instructions at
http://www.postgresql.org/support/submitbug/.
Once the bug has been verified by the PostgreSQL project, go to
http://www.postgresql.org/list/pgsql-bugs/ to locate your bug report and its
message-id, then e-mail your bug report to bug-bou...@trustly.com to collect
your reward.

Bounties are awarded at the discretion of Trustly. We only pay individuals.
Please allow up to two weeks for us to verify a found bug.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-06-24 Thread Fabrízio de Royes Mello
On Thu, Apr 23, 2015 at 12:05 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Wed, Apr 22, 2015 at 3:48 PM, Payal Singh  wrote:
> >
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, failed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:not tested
> >
> > Seeing this when trying to apply the patch:
> >
> > Patching file src/backend/commands/tablecmds.c using Plan A...
> > Hunk #1 FAILED at 328.
> > Hunk #2 succeeded at 2294 (offset 11 lines).
> > Hunk #3 FAILED at 3399.
> > Hunk #4 FAILED at 3500.
> > Hunk #5 succeeded at 4658 with fuzz 1 (offset 65 lines).
> > Hunk #6 succeeded at 4753 (offset 66 lines).
> > Hunk #7 succeeded at 4989 with fuzz 2 (offset 66 lines).
> > Hunk #8 succeeded at 5003 (offset 69 lines).
> > Hunk #9 succeeded at 5017 (offset 69 lines).
> > Hunk #10 succeeded at 5033 (offset 69 lines).
> >
> > The new status of this patch is: Waiting on Author
> >
>
> The patch needs a "rebase". Done!
>

Another rebased version.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 207fec1..339320e 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE name
 
 where action is one of:
 
-ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] column_name [ SET DATA ] TYPE data_type [ COLLATE collation ] [ USING expression ]
 ALTER [ COLUMN ] column_name SET DEFAULT expression
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE name
 
   

-ADD COLUMN
+ADD COLUMN [ IF NOT EXISTS ]
 
  
   This form adds a new column to the table, using the same syntax as
-  .
+  . If IF NOT EXISTS
+  is specified and the column already exists, no error is thrown.
  
 

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..2257ca2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -306,14 +306,14 @@ static void createForeignKeyTriggers(Relation rel, Oid refRelOid,
 		 Constraint *fkconstraint,
 		 Oid constraintOid, Oid indexOid);
 static void ATController(AlterTableStmt *parsetree,
-			 Relation rel, List *cmds, bool recurse, LOCKMODE lockmode);
+		 Relation rel, List *cmds, bool recurse, LOCKMODE lockmode);
 static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		  bool recurse, bool recursing, LOCKMODE lockmode);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode);
 static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		  AlterTableCmd *cmd, LOCKMODE lockmode);
 static void ATRewriteTables(AlterTableStmt *parsetree,
-List **wqueue, LOCKMODE lockmode);
+			List **wqueue, LOCKMODE lockmode);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
@@ -328,8 +328,8 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname, bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -631,7 +631,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 
 			cooked = (CookedConstraint *) palloc(sizeof(CookedConstraint));
 			cooked->contype = CONSTR_DEFAULT;
-			cooked->conoid = InvalidOid;		/* until created */
+			cooked->conoid = InvalidOid;	/* until created */
 			cooked->name = NULL;
 			cooked->attnum = attnum;
 			cooked->expr = colDef->cooked_default;
@@ -1751,7 +1751,7 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 
 	cooked = (CookedConstraint *) 

Re: [HACKERS] problems on Solaris

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 8:42 AM, Andres Freund  wrote:
> I'm wondering wether we should add a #warning to atomic.c if either the
> fallback memory or compiler barrier is used? Might be annoying to people
> using -Werror, but I doubt that's possible anyway on such old systems.

#warning isn't totally portable, so I think it might be better not to
do that.  Yeah, it'll work in a lot of places, but the sorts of
obscure systems where the fallbacks are used are also more likely to
have funky compilers that just barf on the directive outright.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multixid hindsight design

2015-06-24 Thread Simon Riggs
On 24 June 2015 at 16:30, Simon Riggs  wrote:


> Though TED sounds nice, the way to avoid another round of on-disk bugs is
> by using a new kind of testing, not simply by moving the bits around.
>
> It might be argued that we are increasing the diagnostic/forensic
> capabilities by making CIDs more public. We can use that...
>
> The good thing I see from TED is it allows us to test the on-disk outcome
> of concurrent activity. Currently we have isolationtester, but that is not
> married in any way to the on-disk state, allowing the situation where
> isolationtester can pass yet we have corrupted on-disk state. We should
> specify the on-disk tuple representation as a state machine and work out
> how to recheck the new on-disk state matches the state transition that we
> performed.
>

To put some more flesh on this idea...

What I'm suggesting is moving from a 2-session isolationtester to a
3-session isolationtester

1. Session 1
2. Session 2
3. After-action confirmation that the planned state change exists correctly
on disk, rather than simply having the correct behaviour

The absence of the third step in our current testing is what has led us to
various bugs in the past (IMHO).

A fourth step would be to define the isolationtests in such a way that we
can run them as burn-in tests for billions of executions.
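
As a strawman, such a spec might look like this in today's isolationtester
format, with the third "session" inspecting the raw page via pageinspect
(the table, steps and checked columns are all invented for illustration):

setup
{
    CREATE EXTENSION IF NOT EXISTS pageinspect;
    CREATE TABLE t (id int PRIMARY KEY, val int);
    INSERT INTO t VALUES (1, 0);
}
teardown { DROP TABLE t; }

session "s1"
step "s1b" { BEGIN; }
step "s1u" { UPDATE t SET val = val + 1 WHERE id = 1; }
step "s1c" { COMMIT; }

session "s2"
step "s2b" { BEGIN; }
step "s2l" { SELECT * FROM t WHERE id = 1 FOR KEY SHARE; }
step "s2c" { COMMIT; }

session "s3"
# after-action check: the on-disk infomask/xmax state, not just behaviour
step "chk" { SELECT t_infomask, t_infomask2, t_xmax
             FROM heap_page_items(get_raw_page('t', 0)); }

permutation "s1b" "s2b" "s2l" "s1u" "s2c" "s1c" "chk"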

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 11:11 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> I, by now, have come to a different conclusion. I think it's time to
>> entirely drop the renegotiation support.
>
> Well, that's a radical proposal, but I think we should take it seriously.
>
> On balance I think I agree that SSL renegotiation has not been worth the
> trouble.  And we definitely aren't testing it adequately, so if we wanted
> to keep it then there's even *more* work that somebody ought to expend.

I'd like to know what factors we are balancing against each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 11:37 AM, Peter Eisentraut  wrote:
> On 6/12/15 9:31 AM, Robert Haas wrote:
>> Could we update our git hook to refuse a push of a new commit whose
>> timestamp is more than, say, 24 hours in the past?  Our commit history
>> has some timestamps in it now that are over a month off, and it's
>> really easy to do, because when you rebase a commit, it keeps the old
>> timestamp.  If you then merge or cherry-pick that into an official
>> branch rather than patch + commit, you end up with this problem unless
>> you are careful to fix it by hand.  It would be nice to prevent
>> further mistakes of this sort, as they create confusion.
>
> FWIW, I have been doing that, and I have not considered it a problem.
> If the patch was submitted three months ago, reviewed, and then
> committed unchanged, the date is what it is.  Also, when I cherry-pick a
> commit from master to a back branch, I keep the author timestamp the
> same.  I consider that a feature.

I don't, because it means that the timestamps you see when you run git
log are non-linear.  I don't care myself if they are slightly out of
order, although it seems that others do, but I do mind when they are
months off.

Typically when this happens to me, it's not a case of the patch being
unchanged.  I make changes on a branch and then use git rebase -i to
squash them into a single patch which I then cherry-pick.  But git
rebase -i keeps the timestamp of the first (oldest) commit, so I end
up with a commit that is timestamped as to when I began development,
not when I finished it.  So the date is just wrong.
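
For what it's worth, stamping the squashed commit with the current time
before cherry-picking is a one-liner (illustrative; plain git):

# reset the author date of the just-squashed commit to now
git commit --amend --date="$(date -R)"

# or reset author identity and date together
git commit --amend --reset-author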

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multixid hindsight design

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 11:30 AM, Simon Riggs  wrote:
> Though TED sounds nice, the way to avoid another round of on-disk bugs is by
> using a new kind of testing, not simply by moving the bits around.

I agree that we can do much better at testing than we traditionally
have done, and I think pretty much everyone in the room for the
developer unconference session on testing at PGCon was also in
agreement with that.  I really like the idea of taking purpose-built
testing frameworks - like the one that Heikki created for the WAL
format changes - and polishing them to the point where they can go
into core.  That's more work, of course, but very beneficial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Magnus Hagander
On Jun 24, 2015 5:13 PM, "Tom Lane"  wrote:
>
> Andres Freund  writes:
> > I, by now, have come to a different conclusion. I think it's time to
> > entirely drop the renegotiation support.
>
> Well, that's a radical proposal, but I think we should take it seriously.
>

Yes.

Just on my phone right now, but wasn't renegotiation also an attack vector
in one of the recent openssl bugs?

/Magnus


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Tom Lane
Andres Freund  writes:
> On 2015-06-24 11:57:53 -0400, Peter Eisentraut wrote:
>> If Red Hat fixes their bug, then PostgreSQL doesn't have any actual
>> problem anymore, does it?

> It does, there are numerous bugs around renegotiation that exist with
> upstream openssl and postgres. More in the older branches, but even in
> HEAD we break regularly. Most only occur in replication connections (due
> to copy both) and/or when using more complex clients where clients and
> servers send data at the same time due to pipelining.

The lesson to learn from the Red Hat fiasco is that vendors are not
adequately testing renegotiation either.  All the more reason to get
out from under it.  I did not like being told that "Postgres fails and
$randomapp doesn't, therefore it's Postgres' problem" when actually
the difference was that $randomapp doesn't invoke renegotiation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Andres Freund
On 2015-06-24 11:57:53 -0400, Peter Eisentraut wrote:
> On 6/23/15 2:33 PM, Tom Lane wrote:
> > I do not know at this point whether these behaviors are really the same
> > bug or not, but I wonder whether it's time to consider back-patching the
> > renegotiation fixes we did in 9.4.
> 
> If Red Hat fixes their bug, then PostgreSQL doesn't have any actual
> problem anymore, does it?

It does, there are numerous bugs around renegotiation that exist with
upstream openssl and postgres. More in the older branches, but even in
HEAD we break regularly. Most only occur in replication connections (due
to copy both) and/or when using more complex clients where clients and
servers send data at the same time due to pipelining.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Peter Eisentraut
On 6/23/15 2:33 PM, Tom Lane wrote:
> I do not know at this point whether these behaviors are really the same
> bug or not, but I wonder whether it's time to consider back-patching the
> renegotiation fixes we did in 9.4.

If Red Hat fixes their bug, then PostgreSQL doesn't have any actual
problem anymore, does it?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Peter Eisentraut
On 6/12/15 9:31 AM, Robert Haas wrote:
> Could we update our git hook to refuse a push of a new commit whose
> timestamp is more than, say, 24 hours in the past?  Our commit history
> has some timestamps in it now that are over a month off, and it's
> really easy to do, because when you rebase a commit, it keeps the old
> timestamp.  If you then merge or cherry-pick that into an official
> branch rather than patch + commit, you end up with this problem unless
> you are careful to fix it by hand.  It would be nice to prevent
> further mistakes of this sort, as they create confusion.

FWIW, I have been doing that, and I have not considered it a problem.
If the patch was submitted three months ago, reviewed, and then
committed unchanged, the date is what it is.  Also, when I cherry-pick a
commit from master to a back branch, I keep the author timestamp the
same.  I consider that a feature.

Note that a while ago we enabled author and committer to be
distinct.  But the above proposal would effectively require author date
and committer date to be (almost) the same, which would create
inconsistent information.

Also, rebasing updates the committer date but not the author date,
because, well, the commit was changed, but no authoring is involved.
git rebase has options to vary this behavior, but note that the
documentation uses the term "lie" in their description.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multixid hindsight design

2015-06-24 Thread Simon Riggs
On 24 June 2015 at 14:57, Robert Haas  wrote:

> On Fri, Jun 5, 2015 at 10:46 AM, Robert Haas 
> wrote:
> > It would be a great deal nicer if we didn't have to keep ANY of the
> > transactional data for a tuple around once it's all-visible.  Heikki
> > defined ephemeral as "only needed when xmin or xmax is in-progress",
> > but if we extended that definition slightly to "only needed when xmin
> > or xmax is in-progress or commited but not all-visible" then the
> > amount non-ephemeral data in the tuple header is 5 bytes (infomasks +
> > t_hoff).
>
> OK, I was wrong here: if you only have that stuff, you can't
> distinguish between a tuple that is visible to everyone and a tuple
> that is visible to no one.  I think the minimal amount of data we need
> in order to distinguish visibility once no relevant transactions are
> in progress is one XID: either XMIN, if the tuple was never updated at
> all or only by the inserting transaction or one of its subxacts; or
> XMAX, if the inserting transaction committed.  The other visibility
> information -- including (1) the other of XMIN and XMAX, (2) CMIN and
> CMAX, and (3) the CTID -- are only interesting until the transactions
> involved are no longer running and, if they committed, visible to all
> running transactions.
>
> Heikki's proposal is basically to merge the 4-byte CID field and the
> first 4 bytes of the CTID that currently store the block number into
> one 8-byte field that can store a pointer into this new TED structure.
> And after mulling it over, that sounds pretty good to me.  It's true
> (as has been pointed out by several people) that the TED will need to
> be persistent because of prepared transactions.  But it would still be
> a big improvement over the status quo, because:
>
> (1) We would no longer need to freeze MultiXacts.  TED wouldn't need
> to be frozen either.  You'd just truncate it whenever RecentGlobalXmin
> advances.
>
> (2) If the TED becomes horribly corrupted, you can recover by
> committing or aborting any prepared transactions, shutting the system
> down, and truncating it, with no loss of data integrity.  Nothing in
> the TED is required to determine whether tuples are visible to an
> unrelated transaction - you only need it (a) to determine whether
> tuples are visible to a particular command within a transaction that
> has inserted, updated, or deleted the tuple and (b) determine whether
> tuples are locked.
>
> (3) As a bonus, we'd eliminate combo CIDs, because the TED could have
> space to separately store CMIN and CMAX.  Combo CIDs required special
> handling for logical decoding, and they are one of the nastier
> barriers to making parallelism support writes (because they are stored
> in backend-local memory of unbounded size and therefore can't easily
> be shared with workers), so it wouldn't be very sad if they went away.
>
> I'm not quite sure how to decide whether something like this worth (a)
> the work and (b) the risk of creating new bugs, but the more I think
> about it, the more the principal of the thing seems sound to me.


Splitting multixacts into persistent (xmax) and ephemeral (TED) is
something I already proposed, so I support the concept; TED is a much better
suggestion, so I support TED.

Your addition of removing combocids is good also, since everything is
public.

I think we need to see a detailed design and we also need to understand the
size of this new beast. I'm worried it might become very big very quickly,
causing problems for us in other ways. We would need to be certain that
truncation can actually occur reasonably frequently and that there are no
edge cases that cause it to bloat.

Though TED sounds nice, the way to avoid another round of on-disk bugs is
by using a new kind of testing, not simply by moving the bits around.

It might be argued that we are increasing the diagnostic/forensic
capabilities by making CIDs more public. We can use that...

The good thing I see from TED is it allows us to test the on-disk outcome
of concurrent activity. Currently we have isolationtester, but that is not
married in any way to the on-disk state, allowing the situation where
isolationtester can pass yet we have corrupted on-disk state. We should
specify the on-disk tuple representation as a state machine and work out
how to recheck the new on-disk state matches the state transition that we
performed.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Andres Freund
On 2015-06-24 11:11:16 -0400, Tom Lane wrote:
> On balance I think I agree that SSL renegotiation has not been worth the
> trouble.  And we definitely aren't testing it adequately, so if we wanted
> to keep it then there's even *more* work that somebody ought to expend.

Right. Our code was nearly entirely broken for streaming replication for
*years* without anybody noticing. And even now it doesn't reliably
work. It's also pretty hard to test due to the required data volumes and
the vast number of different behaviours across openssl versions.

> I assume we'd back-patch it, too?  (Probably not remove the
> ssl_renegotiation_limit variable, but always act as though it were
> zero.)

Yes, I think so. Maybe log a warning at startup if set to nonzero
(startup is probably the best we can do).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Tom Lane
Andres Freund  writes:
> I, by now, have come to a different conclusion. I think it's time to
> entirely drop the renegotiation support.

Well, that's a radical proposal, but I think we should take it seriously.

On balance I think I agree that SSL renegotiation has not been worth the
trouble.  And we definitely aren't testing it adequately, so if we wanted
to keep it then there's even *more* work that somebody ought to expend.

I assume we'd back-patch it, too?  (Probably not remove the
ssl_renegotiation_limit variable, but always act as though it were zero.)
If we still have to maintain the code in the back branches then we'd
continue to have to deal with its bugs for some time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we back-patch SSL renegotiation fixes?

2015-06-24 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> >> On Tue, Jun 23, 2015 at 2:33 PM, Tom Lane  wrote:
> >>> I do not know at this point whether these behaviors are really the same
> >>> bug or not, but I wonder whether it's time to consider back-patching the
> >>> renegotiation fixes we did in 9.4.  Specifically, I think maybe we should
> >>> back-patch 31cf1a1a4, 86029b31e, and 36a3be654.
> 
> > Yes, +1 for backpatching.  Don't forget 5674460b and b1aebbb6.
> 
> Huh?  5674460b is ancient, and we concluded that b1aebbb6 didn't represent
> anything much more than cosmetic fixes.

Sorry, I mixed up 5674460b with 36a3be65, which you already mentioned ...
and I see that, given the conclusions from 272923a0a695, the
one-char change in b1aebbb6 is not very interesting.

I do think that perhaps we should simplify the code down to what
272923a0a695 + 1c2b7c0879d8 do.


I also agree that the other changes by Andres and Heikki, which involve
making all communication use a nonblocking socket, seem too invasive to
backpatch; even with the insurance provided by beta+release, the
disruptiveness of the changes seems a bit too high, considering that
387da18874a apparently cannot be used without 4f85fde8eb, which is a bit
scary in itself.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Andres Freund
On 2015-06-24 10:38:38 -0400, Robert Haas wrote:
> On Wed, Jun 24, 2015 at 10:29 AM, Andres Freund  wrote:
> > On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
> >> INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current 
> >> partitioning
> >> mechanism. For example, in the following SQL commands, the last UPSERT 
> >> command
> >> would fail with an error. The error message is
> >
> > I think that's pretty much inevitable without baking tuple routing
> > into the core system and supporting unique-constraints that span
> > partitions. In other words, I don't think this is upsert's fault.
> 
> Is the root of the problem that the trigger is called for an INSERT ..
> ON CONFLICT statement but it turns that into a plain INSERT?

If you'd not do that, you'd avoid errors when violating a unique key
inside a partition, so it'd help a bit.

But it'd not do anything sane when the conflict is actually *not*
aligned with the partitioning schema, because we'll obviously only
search for conflicts within the one table.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Removing SSL renegotiation (Was: Should we back-patch SSL renegotiation fixes?)

2015-06-24 Thread Andres Freund
On 2015-06-23 14:33:21 -0400, Tom Lane wrote:
> I do not know at this point whether these behaviors are really the same
> bug or not, but I wonder whether it's time to consider back-patching the
> renegotiation fixes we did in 9.4.

I, by now, have come to a different conclusion. I think it's time to
entirely drop the renegotiation support.

While there's a security benefit of renegotiation by limiting the amount
of leaked data in case either client or server is exploited while the
connection is ongoing, the reality is that the renegotiation support in
openssl just isn't up to the task.

Both Heikki and I have spent a considerable amount of time trying to
find workarounds for the renegotiation bugs in openssl, but so far I
don't think that's bulletproof. Additionally the track record of
renegotiation, both in the SSL specification and in the OpenSSL
implementation, is that it opens many more security holes than it fixes.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 10:29 AM, Andres Freund  wrote:
> On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
>> INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
>> mechanism. For example, in the following SQL commands, the last UPSERT 
>> command
>> would fail with an error. The error message is
>
> I think that's pretty much inevitable without baking tuple routing
> into the core system and supporting unique-constraints that span
> partitions. In other words, I don't think this is upsert's fault.

Is the root of the problem that the trigger is called for an INSERT ..
ON CONFLICT statement but it turns that into a plain INSERT?

Is there any way of writing a partitioning trigger that doesn't have
that defect?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-06-24 Thread Fujii Masao
On Fri, May 15, 2015 at 9:18 PM, Michael Paquier
 wrote:
> On Fri, May 15, 2015 at 8:55 PM, Beena Emerson  
> wrote:
>> There was a discussion on support for N synchronous standby servers started
>> by Michael. Refer
>> http://archives.postgresql.org/message-id/cab7npqr9c84ig0zuvhmqamq53vqsd4rc82vyci4dr27pvof...@mail.gmail.com
>> . The use of hooks and dedicated language was suggested, however, it seemed
>> to be an overkill for the scenario and there was no consensus on this.
>> Exploring GUC-land was preferred.
>
> Cool.
>
>> Please find attached a patch,  built on Michael's patch from above mentioned
>> thread, which supports choosing different number of nodes from each set i.e.
>> k nodes from set 1, l nodes from set 2, so on.
>> The format of synchronous_standby_names has been updated to standby name
>> followed by the required count separated by hyphen. Ex: 'aa-1, bb-3'.  The
>> transaction waits for all the specified number of standby in each group. Any
>> extra nodes with the same name will be considered potential. The special
>> entry * for the standby name is also supported.
>
> I don't think that this is going in the good direction, what was
> suggested mainly by Robert was to use a micro-language that would
> allow far more extensibility that what you are proposing. See for
> example ca+tgmobpwoenmmepfx0jwrvqufxvbqrv26ezq_xhk21gxrx...@mail.gmail.com
> for some ideas.

Doesn't this approach prevent us from specifying the "potential" synchronous
standby server? For example, imagine the case where you want to treat
the server AAA as synchronous standby. You also want to use the server BBB
as synchronous standby only if the server AAA goes down. IOW, you want to
prefer the server AAA as synchronous standby rather than BBB.
Currently we can easily set up that case by just setting
synchronous_standby_names as follows.

synchronous_standby_names = 'AAA, BBB'

However, after we adopt the quorum commit feature with the proposed
micro-language, how can we set up that case? It seems impossible...
I'm afraid that this might be a backward compatibility issue.

Or we should extend the proposed micro-language so that it also can handle
the priority of each standby server? Not sure that's possible, though.
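
In postgresql.conf terms (the quorum syntax below is purely hypothetical,
since no micro-language has been agreed on yet):

# today: a priority list; AAA is sync, BBB is "potential" and takes
# over only if AAA goes away
synchronous_standby_names = 'AAA, BBB'

# hypothetical quorum micro-language: wait for any one of AAA and BBB,
# with no preference between them -- the priority semantics are lost
#synchronous_standby_names = '1(AAA, BBB)'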

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Andres Freund
Hi,

On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
> INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
> mechanism. For example, in the following SQL commands, the last UPSERT command
> would fail with an error. The error message is

I think that's pretty much inevitable without baking tuple routing
into the core system and supporting unique-constraints that span
partitions. In other words, I don't think this is upsert's fault.

> Or we should just treat it as a limitation of UPSERT and add that document?

I think we should (IIRC there's actually already something) rather
document it as a limitation of the partitioning approach.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT on partition

2015-06-24 Thread Fujii Masao
Hi,

INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
mechanism. For example, in the following SQL commands, the last UPSERT command
would fail with an error. The error message is

ERROR:  duplicate key value violates unique constraint "hoge_20150601_pkey"
DETAIL:  Key (col1)=(2015-06-01 10:30:00) already exists.
CONTEXT:  SQL statement "INSERT INTO hoge_20150601 VALUES (($1).*)"
PL/pgSQL function hoge_insert_trigger() line 6 at EXECUTE statement


CREATE TABLE hoge (col1 TIMESTAMP PRIMARY KEY, col2 INT);

CREATE OR REPLACE FUNCTION hoge_insert_trigger () RETURNS trigger AS
$$
DECLARE
    part TEXT;
BEGIN
    part := 'hoge_' || to_char(new.col1, 'YYYYMMDD');
    EXECUTE 'INSERT INTO ' || part || ' VALUES (($1).*)' USING new;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_hoge_trigger
 BEFORE INSERT ON hoge
 FOR EACH ROW EXECUTE PROCEDURE hoge_insert_trigger();

CREATE TABLE hoge_20150601 (
LIKE hoge INCLUDING INDEXES
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS,
CHECK ('2015-06-01 00:00:00' <= col1 AND col1 < '2015-06-02 00:00:00')
)
INHERITS (hoge);

CREATE TABLE hoge_20150602 (
LIKE hoge INCLUDING INDEXES
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS,
CHECK ('2015-06-02 00:00:00' <= col1 AND col1 < '2015-06-03 00:00:00')
)
INHERITS (hoge);

INSERT INTO hoge VALUES ('2015-06-01 10:30:00', 1234)
ON CONFLICT (col1) DO UPDATE SET col2 = EXCLUDED.col2;

INSERT INTO hoge VALUES ('2015-06-01 10:30:00', 1234)
ON CONFLICT (col1) DO UPDATE SET col2 = EXCLUDED.col2;


How should we treat this problem for 9.5? If we want to fix this problem
completely, probably we would need to make constraint_exclusion work with
even UPSERT. Which sounds difficult to do at least for 9.5. Any other idea?
Or we should just treat it as a limitation of UPSERT and add that document?

Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Robert Haas
On Tue, Jun 23, 2015 at 10:15 PM, Noah Misch  wrote:
>> That brings it back to the enforcing - would we want to enforce both those?
>
> May as well.  AuthorDate is the main source of trouble.  You would need to go
> out of your way (e.g. git filter-branch) to push an old CommitDate, but let's
> check it just the same.

Actually, you just need the system clock to be off.  I've noticed, for
example, that when my VMware VMs go to sleep (because I close my
laptop lid) their clocks don't run, so I have to remember to ntpdate
afterwards if I want correct time.  I don't happen to use those for
committing to PostgreSQL, but somebody else might have a similar setup
that they do use for that purpose.

So +1 for sanity-checking the commit date, and +1 as well for Alvaro's
proposal to check both past and future times.  I think we
should tolerate a bit more in terms of past timestamps than future
timestamps.  It's quite reasonable for someone to commit locally and
then run make check-world or so before pushing; let's not make that
unnecessarily annoying.  But future timestamps should really only ever
happen because of clock slew, and I don't think we need to tolerate
very much of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multixid hindsight design

2015-06-24 Thread Robert Haas
On Fri, Jun 5, 2015 at 10:46 AM, Robert Haas  wrote:
> It would be a great deal nicer if we didn't have to keep ANY of the
> transactional data for a tuple around once it's all-visible.  Heikki
> defined ephemeral as "only needed when xmin or xmax is in-progress",
> but if we extended that definition slightly to "only needed when xmin
> or xmax is in-progress or commited but not all-visible" then the
> amount non-ephemeral data in the tuple header is 5 bytes (infomasks +
> t_hoff).

OK, I was wrong here: if you only have that stuff, you can't
distinguish between a tuple that is visible to everyone and a tuple
that is visible to no one.  I think the minimal amount of data we need
in order to distinguish visibility once no relevant transactions are
in progress is one XID: either XMIN, if the tuple was never updated at
all or only by the inserting transaction or one of its subxacts; or
XMAX, if the inserting transaction committed.  The other visibility
information -- including (1) the other of XMIN and XMAX, (2) CMIN and
CMAX, and (3) the CTID -- is only interesting until the transactions
involved are no longer running and, if they committed, visible to all
running transactions.
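
For illustration (my example, not from the mail), all of the fields
under discussion are exposed as system columns and can be inspected on
any table:

SELECT xmin, xmax, cmin, cmax, ctid FROM pg_class LIMIT 1;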

Heikki's proposal is basically to merge the 4-byte CID field and the
first 4 bytes of the CTID that currently store the block number into
one 8-byte field that can store a pointer into this new TED structure.
And after mulling it over, that sounds pretty good to me.  It's true
(as has been pointed out by several people) that the TED will need to
be persistent because of prepared transactions.  But it would still be
a big improvement over the status quo, because:

(1) We would no longer need to freeze MultiXacts.  TED wouldn't need
to be frozen either.  You'd just truncate it whenever RecentGlobalXmin
advances.

(2) If the TED becomes horribly corrupted, you can recover by
committing or aborting any prepared transactions, shutting the system
down, and truncating it, with no loss of data integrity.  Nothing in
the TED is required to determine whether tuples are visible to an
unrelated transaction - you only need it (a) to determine whether
tuples are visible to a particular command within a transaction that
has inserted, updated, or deleted the tuple and (b) determine whether
tuples are locked.

(3) As a bonus, we'd eliminate combo CIDs, because the TED could have
space to separately store CMIN and CMAX.  Combo CIDs required special
handling for logical decoding, and they are one of the nastier
barriers to making parallelism support writes (because they are stored
in backend-local memory of unbounded size and therefore can't easily
be shared with workers), so it wouldn't be very sad if they went away.

I'm not quite sure how to decide whether something like this is worth (a)
the work and (b) the risk of creating new bugs, but the more I think
about it, the more the principle of the thing seems sound to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] git push hook to check for outdated timestamps

2015-06-24 Thread Alvaro Herrera
Noah Misch wrote:
> On Sun, Jun 14, 2015 at 12:37:00PM +0200, Magnus Hagander wrote:

> > Would we in that case want to enforce linearity *and* recent-ness, or just
> > linearity? as in, do we care about the commit time even if it doesn't
> > change the order?
> 
> If a recency check is essentially free, let's check both.  Otherwise,
> enforcing linearity alone is a 95% solution that implicitly bounds recency.

Should there also be a check to reject times too far in the future?  I
think a clock skew of a few minutes (up to ten?) is acceptable, but more
than that would cause trouble for other committers later on.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-06-24 Thread Kohei KaiGai
Does it make sense to put the result tuple of the remote join on every
estate->es_epqTupleSet[] slot represented by this ForeignScan if
scanrelid == 0?

It would allow the qualifier to be rechecked for each LockRows node that
intends to lock a base foreign table underlying the remote join.
ForeignScan->fdw_relids tells us which rtindexes are represented by this
ForeignScan, so the infrastructure side may be able to handle it.

Thanks,


2015-06-24 11:40 GMT+09:00 Etsuro Fujita :
> Hi,
>
> While reviewing the foreign join pushdown core patch, I noticed that the
> patch doesn't perform an EvalPlanQual recheck properly.  The example
> that crashes the server will be shown below (it uses the postgres_fdw
> patch [1]).  I think the reason is that the ForeignScan node
> performing the foreign join remotely has scanrelid = 0 while
> ExecScanFetch assumes that its scan node has scanrelid > 0.
>
> I think this is a bug.  I've not figured out how to fix this yet, but I
> thought we would also need another plan that evaluates the join locally
> for the test tuples for EvalPlanQual.  Though I might be missing something.
>
> Create an environment:
>
> postgres=# create table tab (a int, b int);
> CREATE TABLE
> postgres=# create foreign table foo (a int) server myserver options
> (table_name 'foo');
> CREATE FOREIGN TABLE
> postgres=# create foreign table bar (a int) server myserver options
> (table_name 'bar');
> CREATE FOREIGN TABLE
> postgres=# insert into tab values (1, 1);
> INSERT 0 1
> postgres=# insert into foo values (1);
> INSERT 0 1
> postgres=# insert into bar values (1);
> INSERT 0 1
> postgres=# analyze tab;
> ANALYZE
> postgres=# analyze foo;
> ANALYZE
> postgres=# analyze bar;
> ANALYZE
>
> Run the example:
>
> [Terminal 1]
> postgres=# begin;
> BEGIN
> postgres=# update tab set b = b + 1 where a = 1;
> UPDATE 1
>
> [Terminal 2]
> postgres=# explain verbose select tab.* from tab, foo, bar where tab.a =
> foo.a and foo.a = bar.a for update;
>
>  QUERY PLAN
> --------------------------------------------------------------------------
>  LockRows  (cost=100.00..101.18 rows=4 width=70)
>Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
>->  Nested Loop  (cost=100.00..101.14 rows=4 width=70)
>  Output: tab.a, tab.b, tab.ctid, foo.*, bar.*
>  Join Filter: (foo.a = tab.a)
>  ->  Seq Scan on public.tab  (cost=0.00..1.01 rows=1 width=14)
>Output: tab.a, tab.b, tab.ctid
>  ->  Foreign Scan  (cost=100.00..100.08 rows=4 width=64)
>Output: foo.*, foo.a, bar.*, bar.a
>Relations: (public.foo) INNER JOIN (public.bar)
>Remote SQL: SELECT l.a1, l.a2, r.a1, r.a2 FROM (SELECT
> ROW(l.a9), l.a9 FROM (SELECT a a9 FROM public.foo FOR UPDATE) l) l (a1,
> a2) INNER
> JOIN (SELECT ROW(r.a9), r.a9 FROM (SELECT a a9 FROM public.bar FOR
> UPDATE) r) r (a1, a2) ON ((l.a2 = r.a2))
> (11 rows)
>
> postgres=# select tab.* from tab, foo, bar where tab.a = foo.a and foo.a
> = bar.a for update;
>
> [Terminal 1]
> postgres=# commit;
> COMMIT
>
> [Terminal 2]
> (After the commit in Terminal 1, Terminal 2 will show the following.)
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> Best regards,
> Etsuro Fujita
>
> [1]
> http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] problems on Solaris

2015-06-24 Thread Andres Freund
On 2015-05-31 01:09:18 +0200, Andres Freund wrote:
> On 2015-05-27 21:23:34 -0400, Robert Haas wrote:
> > > Oh wow, that's bad, and could explain a couple of the problems we're
> > > seing. One possible way to fix is to replace the sequence with if
> > > (!TAS(spin)) S_UNLOCK();. But that'd mean TAS() has to be a barrier,
> > > even if the lock isn't free - which e.g. isn't the case for PowerPC's
> > > implementation :(
> > 
> > Another possibility is to make the fallback barrier implementation a
> > system call, like maybe kill(PostmasterPid, 0).
> 
> It's not necessarily true that all system calls are effective
> barriers. I'm e.g. doubtful that kill(..., 0) is one as it only performs
> local error checking. It might be that the process existence check
> includes a lock that's sufficient, but I would not like to rely on
> it. Sending an actual signal probably would be, but has the potential of
> disrupting postmaster progress.

I thought about various other syscalls we could use, and your proposal
seems to be the least bad.  My idea of using waitpid() falls short because
it only works for child processes.  I think the kinds of systems that
don't have barrier support are unlikely to use complex stuff like RCU to
manage access to process hierarchies.

I reproduced the 'stuck' issue on x86 by #ifdef'ing out barrier support
- about 50% of the time test_shm_mq gets stuck. Replacing it with
kill(PostmasterPid, 0) "works". Unless somebody protests soon, that's
what I'm going to commit. It surely is better than easily reproducible
hangs.

I'm wondering whether we should add a #warning to atomic.c if either the
fallback memory or compiler barrier is used? That might be annoying to
people using -Werror, but I doubt that's possible anyway on such old
systems.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-06-24 Thread Uriy Zhuravlev
Hello Hackers.

Because changing the commutator and negator raised questions, I suggest
version 3 of the patch without them. In addition, RESTRICT and JOIN are
now much more important for us (for the "Selectivity estimation for
intarray" patch).
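
For reference, here is a minimal sketch of the intended usage (my
illustration, assuming the SET (...) syntax from the patch, with the
built-in eqsel/eqjoinsel estimators and a hypothetical === operator
standing in for real ones):

ALTER OPERATOR === (boolean, boolean)
    SET (RESTRICT = eqsel, JOIN = eqjoinsel);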

What do you think?

Thanks!

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..f7770fd 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -28,8 +28,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/parse_func.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -53,7 +55,7 @@ static Oid OperatorShellMake(const char *operatorName,
   Oid leftTypeId,
   Oid rightTypeId);
 
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
+static void ShellOperatorUpd(Oid baseId, Oid commId, Oid negId);
 
 static Oid get_other_operator(List *otherOp,
    Oid otherLeftTypeId, Oid otherRightTypeId,
@@ -563,7 +565,7 @@ OperatorCreate(const char *operatorName,
 		commutatorId = operatorObjectId;
 
 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		ShellOperatorUpd(operatorObjectId, commutatorId, negatorId);
 
 	return address;
 }
@@ -633,7 +635,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
 }
 
 /*
- * OperatorUpd
+ * ShellOperatorUpd
  *
  *	For a given operator, look up its negator and commutator operators.
  *	If they are defined, but their negator and commutator fields
@@ -642,7 +644,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
  *	which are the negator or commutator of each other.
  */
 static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+ShellOperatorUpd(Oid baseId, Oid commId, Oid negId)
 {
 	int			i;
 	Relation	pg_operator_desc;
@@ -864,3 +866,158 @@ makeOperatorDependencies(HeapTuple tuple)
 
 	return myself;
 }
+
+/*
+ * Operator update aka ALTER OPERATOR for RESTRICT, JOIN
+ */
+void OperatorUpd(Oid classId,
+ Oid baseId,
+ List *operator_params)
+{
+	int			i;
+	ListCell	*pl;
+	Relation	catalog;
+	HeapTuple	tup;
+	Oid 		operator_param_id = 0;
+	bool		nulls[Natts_pg_operator];
+	bool		replaces[Natts_pg_operator];
+	Datum		values[Natts_pg_operator];
+
+	for (i = 0; i < Natts_pg_operator; ++i)
+	{
+		values[i] = (Datum) 0;
+		replaces[i] = false;
+		nulls[i] = false;
+	}
+
+	catalog = heap_open(classId, RowExclusiveLock);
+	tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(baseId));
+	if (HeapTupleIsValid(tup))
+	{
+		/*
+		 * loop over the definition list and extract the information we need.
+		 */
+		foreach(pl, operator_params)
+		{
+			DefElem    *defel = (DefElem *) lfirst(pl);
+			List	   *param = defGetQualifiedName(defel);
+			int			param_type;
+
+			if (pg_strcasecmp(defel->defname, "restrict") == 0)
+				param_type = Anum_pg_operator_oprrest;
+			else if (pg_strcasecmp(defel->defname, "join") == 0)
+				param_type = Anum_pg_operator_oprjoin;
+			else
+			{
+				ereport(WARNING,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("operator attribute \"%s\" not recognized",
+								defel->defname)));
+				continue;
+			}
+
+			/* Resets if written NULL */
+			if (pg_strcasecmp(NameListToString(param), "null") == 0)
+			{
+				values[param_type - 1] = ObjectIdGetDatum(InvalidOid);
+				replaces[param_type - 1] = true;
+				continue;
+			}
+
+			if (param_type == Anum_pg_operator_oprrest)
+			{
+				if (PointerIsValid(param))
+				{
+					Oid			typeId[5];
+					AclResult	aclresult;
+					typeId[0] = INTERNALOID;	/* PlannerInfo */
+					typeId[1] = OIDOID;			/* operator OID */
+					typeId[2] = INTERNALOID;	/* args list */
+					typeId[3] = INT4OID;		/* varRelid */
+
+					operator_param_id = LookupFuncName(param, 4, typeId, false);
+
+					/* estimators must return float8 */
+					if (get_func_rettype(operator_param_id) != FLOAT8OID)
+						ereport(ERROR,
+								(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+								 errmsg("restriction estimator function %s must return type \"float8\"",
+										NameListToString(param))));
+
+					/* Require EXECUTE rights for the estimator */
+					aclresult = pg_proc_aclcheck(operator_param_id, GetUserId(), ACL_EXECUTE);
+					if (aclresult != ACLCHECK_OK)
+						aclcheck_error(aclresult, ACL_KIND_PROC,
+									   NameListToString(param));
+				}
+				else
+					operator_param_id = 0;
+			}
+			else if (param_type == Anum_pg_operator_oprjoin)
+			{
+				if (PointerIsValid(param))
+				{
+					Oid			typeId[5];
+					AclResult	aclresult;
+					typeId[0] = INTERNALOID;	/* PlannerInfo */
+					typeId[1] = OIDOID;			/* operator OID */
+					typeId[2] = INTERNALOID;	/* args list */
+					typeId[3] 

Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

2015-06-24 Thread Taiki Kondo
Hi, Merlin.

Thank you for your comment, and sorry for late response.

> *) how do you estimate %done and ETA when dumping?

As I mentioned in my reply to Andres, I think %done and ETA can be
estimated from the number of tuples in "pg_class.reltuples".
As you may know, pg_dump writes to the output file each time it reads one
tuple while executing COPY.
Therefore pg_dump can calculate %done and ETA by fetching
"pg_class.reltuples" and measuring the number of dumped tuples per second.
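
For example (my illustration, not part of the original proposal), the
per-table total could be fetched up front with a query like:

SELECT relname, reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'foo';

Note that reltuples is only an estimate maintained by VACUUM/ANALYZE, so
the resulting progress figures would be approximate.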

> *) what's the benefit of doing this instead of using a utility like 'pv'?

Thank you for giving me a new point of view; I had never heard of the
utility 'pv'. :)
I tried pg_dump with pv, and found that this approach uses the number of
characters passed through the pipe.
From my point of view, using 'pv' has some problems, listed below.
At least, I think points 1 through 4 are benefits of my proposal.

1) %done and ETA are calculated from the number of characters passed
through the pipe (mentioned above), and the total amount has to be
specified by hand.
   Therefore, if the specified total is completely wrong, %done and ETA
can be far from their true values.
2) Since 'pv' relies on pipe processing, pg_dump/pg_restore can't be used
together with the '-j' option.
   This forces pg_dump/pg_restore to run as a single process even when
running with two or more processes would be possible.
3) For the same reason, the command line for pg_dump/pg_restore becomes
longer and less convenient.
   This may spoil the user experience.
4) To pass data through a pipe, pg_dump can't be used together with the
'-f' option, and pg_restore can't be used together with the '-d' option.
   This also may spoil the user experience because the command line is
longer and less convenient.
5) Neither this approach nor my proposal resolves the concern about
"CREATE INDEX".
   We have to discuss that further.



regards,
--
Taiki Kondo



-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Friday, June 12, 2015 10:42 PM
To: Taiki Kondo
Cc: pgsql-hackers@postgresql.org; Akio Iwaasa
Subject: Re: [HACKERS] [Proposal] Progress bar for pg_dump/pg_restore

On Fri, Jun 12, 2015 at 7:45 AM, Taiki Kondo  wrote:
> Hi, all.
>
> I am a newbie on -hackers.
> I have an idea from my point of view as a user, and I would like to
> propose the following.
>
>
> Progress bar for pg_dump / pg_restore
> =====================================
>
> Motivation
> ----------
> "pg_dump" and "pg_restore" show nothing if users don't specify verbose (-v) 
> option.
> In too large table to finish in a few minutes, this behavior worries some 
> users about if this situation (nothing shows up) is all right.
>
> I propose this feature to free these users from worrying.
>
>
> Design & API
> ------------
> When pg_dump / pg_restore is running, a progress bar and the estimated
> time to finish are shown on screen, like the following.
>
>
> =>   (50%)  15:50
>
> The bar ("=>" in the above) and the percentage value ("50%") show the
> progress, and the time ("15:50") shows the estimated time to finish.
> (The percentage is relative to the whole process.)
>
> The percentage and time are recalculated and redrawn every second.
>
> In pg_dump, the information required for calculating the percentage and
> time comes from pg_class.
>
> In pg_restore, to calculate the same things, I want to record the total
> number of commands in the pg_dump file, so I would like to add a new
> element to the "Archive" structure.
> (This means the version number of the archive format changes.)
>
>
> Usage
> -----
> To use this feature, the user must specify the "-P" option on the
> command line.
> (This definition is tentative, so it can be changed if it causes
> problems.)
>
> $ pg_dump -Fc -P -f foo.pgdump foo
>
> I also think it would be better for this feature to be enabled by
> default, without forcing users to specify any option, but that means
> changing the default behavior and could cause problems in programs
> expecting no output on stdout.
>
>
> I will implement this feature if this proposal is accepted by hackers.
> (I will probably not use ncurses to implement this feature, because
> ncurses cannot be mixed with the standard printf family of functions.)
>
>
> Any comments are welcome.

*) how do you estimate %done and ETA when dumping?

*) what's the benefit of doing this instead of using a utility like 'pv'?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-06-24 Thread Andres Freund
On 2015-06-23 21:08:36 -0400, Robert Haas wrote:
> On Tue, Jun 23, 2015 at 5:48 PM, Kevin Grittner  wrote:
> > Andres Freund  wrote:
> >>>    Improve concurrent locking and buffer scan performance (Andres
> >>>    Freund, Kevin Grittner)
> >>
> >> If this is ab5194e6f, I don't think it makes sense to mention "buffer
> >> scan" - it's just any lwlock, and buffer locks aren't the primary
> >> benefit (ProcArrayLock, buffer mapping lock probably are that). I also
> >> don't think Kevin was involved?
> >
> > It seems likely that 2ed5b87f9 was combined with something else in
> > this reference.  By reducing buffer pins and buffer content locking
> > during btree index scans it shows a slight performance gain in
> > btree scans and avoids some blocking of btree index vacuuming.

Oh. That's what it was combined with. I don't think it makes sense to
throw these three items together into one note. Their benefit/risk
potential is pretty different.

> I think maybe we should separate that back out.  The list needs to be
> user-accessible, but if it's hard to understand what it's referring
> to, that's not good either.

Yea. And if Bruce then goes and compares feature counts... :)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 1:45 AM, Craig Ringer  wrote:
> WARNING: hash indexes are not crash-safe, not replicated, and their
> use is discouraged

+1


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2015-06-24 Thread Craig Ringer
On 18 October 2014 at 02:36, Bruce Momjian  wrote:
> On Fri, Oct 17, 2014 at 12:56:52PM -0400, Tom Lane wrote:
>> David G Johnston  writes:
>> > The question is whether we explain the implications of not being WAL-logged
>> > in an error message or simply state the fact and let the documentation
>> > explain the hazards - basically just output:
>> > "hash indexes are not WAL-logged and their use is discouraged"
>>
>> +1.  The warning message is not the place to be trying to explain all the
>> details.

While I don't think it should explain all the details, "WAL-logged"
will mean *nothing* to most users, including most of those who're
using streaming replication, PITR, etc.

I would strongly prefer to see something that conveys some meaning to
a user who doesn't know PostgreSQL's innards, since by the time "WAL
logged" means much to you, you've got a good chance of having already
learned that hash indexes aren't crash-safe. Or of reading the manual.
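
For reference, here is roughly what a user sees today (a sketch matching
the warning text in Michael's patch elsewhere in this thread):

postgres=# CREATE TABLE t (x int4);
CREATE TABLE
postgres=# CREATE INDEX ON t USING hash (x);
WARNING:  hash indexes are not WAL-logged and their use is discouraged
CREATE INDEX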

Perhaps:

WARNING: hash indexes are not crash-safe, not replicated, and their
use is discouraged

?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_*_columns?

2015-06-24 Thread Andres Freund
On 2015-06-23 16:32:54 -0400, Robert Haas wrote:
> On Tue, Jun 23, 2015 at 3:47 PM, Andres Freund  wrote:
> > On 2015-06-22 21:05:52 -0400, Robert Haas wrote:
> >> Interesting idea.  We could consider the set of stats files a database
> >> unto itself and reserve a low-numbered OID for it.  The obvious thing
> >> to do is use the database's OID as the relfilenode, but then how do
> >> you replace the stats file?
> >
> > The relmapper infrastructure should be able to take care of that.
> 
> How?  I think it assumes that the number of mapped entries is pretty small.

Well, we could just use one stats file for all of the stats - that'd not
necessarily be bad. The relmapper would just be there to be able, if we
need it, to rewrite them in the background. The other alternative is to
place the stats file in the target database instead of pg_global, in
which case we can just use the per database relmapper.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash index creation warning

2015-06-24 Thread Michael Paquier
On Wed, Jun 24, 2015 at 12:27 AM, Robert Haas wrote:
> I think you should be testing RelationNeedsWAL(), not the
> relpersistence directly.  The same point applies for temporary
> indexes.

Indeed. Patch updated attached.
-- 
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7340a1f..b450bcf 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -490,7 +490,8 @@ DefineIndex(Oid relationId,
 	accessMethodId = HeapTupleGetOid(tuple);
 	accessMethodForm = (Form_pg_am) GETSTRUCT(tuple);
 
-	if (strcmp(accessMethodName, "hash") == 0)
+	if (strcmp(accessMethodName, "hash") == 0 &&
+		RelationNeedsWAL(rel))
 		ereport(WARNING,
 (errmsg("hash indexes are not WAL-logged and their use is discouraged")));
 
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 5c2e67d..b72e65d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2342,6 +2342,9 @@ CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
 WARNING:  hash indexes are not WAL-logged and their use is discouraged
 CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
 WARNING:  hash indexes are not WAL-logged and their use is discouraged
+CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
+CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id int4_ops);
+DROP TABLE unlogged_hash_table;
 -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
 --
 -- Test functional index
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 67dd2f0..ff86953 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -684,6 +684,10 @@ CREATE INDEX hash_txt_index ON hash_txt_heap USING hash (random text_ops);
 
 CREATE INDEX hash_f8_index ON hash_f8_heap USING hash (random float8_ops);
 
+CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
+CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id int4_ops);
+DROP TABLE unlogged_hash_table;
+
 -- CREATE INDEX hash_ovfl_index ON hash_ovfl_heap USING hash (x int4_ops);
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers