Re: Online enabling of checksums

2018-02-28 Thread Andrey Borodin


> On 28 Feb 2018, at 6:22, Daniel Gustafsson wrote:
> 
>> Is there any way we could provide this functionality for previous versions 
>> (9.6, 10)? For example, implementing a utility for offline checksum enabling, 
>> without WAL-logging, surely.
> 
> While outside the scope of the patch in question (since it deals with enabling
> checksums online), such a utility should be perfectly possible to write.

I've tried to rebase this patch onto 10 and, despite minor rebase issues (OIDs, 
bgw_type, changes to specscanner), the patch works fine.
Do we provide backporting for such features?

Best regards, Andrey Borodin.


Re: Contention preventing locking

2018-02-28 Thread Konstantin Knizhnik



On 28.02.2018 16:32, Amit Kapila wrote:

On Mon, Feb 26, 2018 at 8:26 PM, Konstantin Knizhnik
 wrote:

On 26.02.2018 17:20, Amit Kapila wrote:

Can you please explain, how it can be done easily without extra tuple
locks?  I have tried to read your patch but due to lack of comments,
it is not clear what you are trying to achieve.  As far as I can see
you are changing the locktag passed to LockAcquireExtended by the
first waiter for the transaction.  How will it achieve the serial
waiting protocol (queue the waiters for tuple) for a particular tuple
being updated?


The idea of transaction lock chaining is very simple. I explained it
in the first mail in this thread.
Assume that transaction T1 has updated tuple R1.
Transaction T2 also tries to update this tuple and so waits for T1's XID.
If yet another transaction T3 then also tries to update R1, it should
wait for T2, not for T1.
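
For concreteness, the scenario can be reproduced with three sessions doing
something like this (the table and column names are just an example, not from
the original mail):

-- session T1
BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;  -- updates tuple R1

-- session T2, while T1 is still open
BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;  -- blocks waiting on T1's XID

-- session T3, while T1 and T2 are still open
BEGIN;
UPDATE accounts SET balance = balance + 1 WHERE id = 1;  -- with lock chaining, should queue behind T2 rather than T1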


Isn't this exactly what we try to do via tuple locks
(heap_acquire_tuplock)?  Currently, T2, before waiting for T1, will
acquire the tuple lock on R1, and T3 will wait on T2 (not on T1) to
release the tuple lock on R1; similarly, the other waiters form a
queue and are woken one by one.  As soon as T2 is woken up, it
releases the lock on the tuple and tries to fetch the updated tuple.
Now, T2 releasing the tuple lock allows T3 to proceed as well, and as
T3 was supposed to wait on T1 (according to the tuple-satisfies API),
it will immediately be released and will also try to do the same work
as T2.  One of those will succeed and the other has to re-fetch the
updated tuple again.


Yes, but two notes:
1. The tuple lock is used inside the heap_* functions, but not in 
EvalPlanQualFetch, where the transaction lock is also used.
2. The tuple lock is held only until the end of the update, not until commit 
of the transaction. So another transaction can receive control before this 
transaction is completed, and contention still takes place.
Contention is reduced and performance increases only if locks (either the 
tuple lock or the xid lock) are held until the end of the transaction.  
Unfortunately that may lead to deadlock.


My last attempt to reduce contention was to replace the shared lock with an 
exclusive one in XactLockTableWait and to remove the unlock from this function. 
So only one transaction can get the xact lock, and it will hold it until 
the end of the transaction. The tuple lock also seems to be unnecessary in this 
case. It shows better performance on the pgrw test, but on the YCSB benchmark 
with workload A (50% updates) performance was even worse than with 
vanilla Postgres. And worst of all, there are deadlocks in the 
pgbench tests.



I think in this whole process backends may need to wait multiple times
either on tuple lock or xact lock.  It seems the reason for these
waits is that we immediately release the tuple lock (acquired by
heap_acquire_tuplock) once the transaction on which we were waiting is
finished.  AFAICU, the reason for releasing the tuple lock immediately
instead of at end of the transaction is that we don't want to
accumulate too many locks as that can lead to the unbounded use of
shared memory.  How about if we release the tuple lock at end of the
transaction unless the transaction acquires more than a certain
threshold (say 10 or 50) of such locks in which case we will fall back
to current strategy?

Certainly, I have tested such a version. Unfortunately it doesn't help. 
The tuple lock uses the tuple's TID. But once a transaction has made the update, 
a new version of the tuple is produced with a different TID, and all new 
transactions will see that version, so they will not notice this lock at 
all. This is why my first attempt to address contention was to replace the TID 
lock with a PK (primary key) lock. And it really helps to reduce 
contention and the degradation of performance with an increasing number of 
connections. But it is not so easy to correctly extract the PK in all cases.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




RE: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-02-28 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Yes, it should not copy those WAL files.  Most of the time they are going
> to be meaningless.  See this recent thread:
> https://www.postgresql.org/message-id/20180126023609.GH17847%40paquier
> .xyz
> So I would rather go this way instead of having to worry about manipulating
> those WAL segments as you do.  Depending on the needs, I think that even
> a backpatch could be considered.

Thank you for the information.  I hadn't noticed those activities going on around 
pg_rewind.

It's a pity that Chen's patch, which limits the WAL to be copied, is not 
committed yet.  It looks to be in good shape and ready for committer.


> > Related to this, shouldn't pg_rewind avoid copying more files and
> > directories like pg_basebackup?  Currently, pg_rewind doesn't copy
> > postmaster.pid, postmaster.opts, and temporary files/directories
> > (pg_sql_tmp/).
> 
> Yes, it should not copy those files.  I have a patch in the current CF to
> do that:
> https://commitfest.postgresql.org/17/1507/

Wow, what a great patch.  I think I should look at it.  But I'm afraid it won't 
be backpatched because it's big...

Even with your patch and Chen's one, my small patch is probably necessary to 
avoid leaving 0-byte or half-baked files.  I'm not sure whether those strangely 
sized files would cause actual trouble, but maybe it would be healthy to try to 
clean things up as much as possible.  (files in pg_twophase/ might emit WARNING 
messages, garbage server log files might make the DBA worried, etc.; yes, these 
may be just FUD.)

Regards
Takayuki Tsunakawa






Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 13:39, Arseny Sher  wrote:

> Hello,
>
> In LogicalConfirmReceivedLocation two fields (data.catalog_xmin and
> effective_catalog_xmin) of ReplicationSlot structure are used for
> advancing the xmin of the slot. This avoids a window in which tuples might
> already have been vacuumed, but the slot's state was not yet flushed to
> disk: if we crash during this window, after restart we would assume that
> all tuples with xmax > old catalog_xmin are still there, while actually
> some of them might already be vacuumed. However, the same dodge is not
> used for restart_lsn advancement. This means that under somewhat
> unlikely circumstances it is possible that we will start decoding from a
> segment which was already recycled, making the slot broken. Shouldn't
> this be fixed?


You mean the small window between

MyReplicationSlot->data.restart_lsn =
MyReplicationSlot->candidate_restart_lsn;


and

ReplicationSlotMarkDirty();
ReplicationSlotSave();

in LogicalConfirmReceivedLocation ?

We do release the slot spinlock after updating the slot and before dirtying
and flushing it. But to make the change visible, someone else would have to
call ReplicationSlotsComputeRequiredLSN(). That's possible by the looks,
and could be caused by a concurrent slot drop, physical slot confirmation,
or another logical slot handling a concurrent confirmation.

For something to break, we'd have to

* hit this race to update XLogCtl->replicationSlotMinLSN  by a concurrent
call to ReplicationSlotsComputeRequiredLSN while in
LogicalConfirmReceivedLocation
* have the furthest-behind slot be the one in the race window in
LogicalConfirmReceivedLocation
* checkpoint here, before slot is marked dirty
* actually recycle/remove the needed xlog
* crash before writing the new slot state

Checkpoints write out slot state. But only for dirty slots. And we didn't
dirty the slot while we had its spinlock, we only dirty it just before
writing.

So I can't say it's definitely impossible. It seems astonishingly unlikely,
but that's not always good enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench randomness initialization

2018-02-28 Thread Fabien COELHO


Hello Tom,


Fabien COELHO  writes:

This is a simple patch that does what it says on the tin. I ran into
trouble with the pgbench TAP test *even before applying the patch*, but
only because I was doing a VPATH build as a user without 'write'
on the source tree (001_pgbench_with_server.pl tried to make pgbench
create log files there). Bad me. Oddly, that was the only test in the
whole tree to have such an issue, so here I add a pre-patch to fix that.
Now my review needs a review. :)



Yep. I find the multiple chdir solution a little bit too extreme.



ISTM that it should rather add the correct path to --log-prefix by
prepending $node->basedir, like the pgbench function does for -f scripts.
See attached.


Hm ... so I tried to replicate this problem, and failed to: the log files
get made under the VPATH build directory, as desired, even without this
patch.  Am I doing something wrong, or is this platform-dependent somehow?


As I recall, it indeed works if the source directories are rw, but fails 
if they are ro because then the local mkdir fails. So you would have to do 
a vpath build with sources that are ro to get the issue the patch is 
fixing. Otherwise, the issue would have been caught earlier by the 
buildfarm, which I guess is doing vpath compilation and full validation.


--
Fabien.



Re: server crash in nodeAppend.c

2018-02-28 Thread Rajkumar Raghuwanshi
On Wed, Feb 28, 2018 at 9:29 PM, Robert Haas  wrote:

> Nice test case.  I pushed commit
> ce1663cdcdbd9bf15c81570277f70571b3727dd3, including your test case, to
> fix this.


Thanks, Robert, for the fix and commit. I have re-verified the commit; this is
working fine now.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: CALL optional in PL/pgSQL

2018-02-28 Thread Pavel Stehule
2018-03-01 5:51 GMT+01:00 Peter Eisentraut :

> This seems to be a popular issue when porting from PL/SQL, so I'll throw
> it out here for discussion.  Apparently, in PL/SQL you can call another
> procedure without the CALL keyword.  Here is a patch that attempts to
> implement that in PL/pgSQL as well.  It's not very pretty.
>

The CALL is not merely optional in PL/SQL - I was surprised - it is required in
some environments, and it should not be used in others (like PL/SQL itself).

Please correct me if I am wrong.

SQL/PSM requires it.

I agree that in this case CALL can be optional - because procedures are
called by a different mechanism than functions - and there is no additional
overhead. It is not strictly necessary, though, because tools like ora2pg have
no problem with procedure identification and the related transformations.

But if we allow optional CALL in PL/pgSQL, then we will have an
inconsistency between PL/pgSQL and other environments where CALL is
required, which is not too nice.
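
To illustrate the inconsistency with a small sketch (using the someproc name
from the patch's commit message; not taken from the patch's tests):

CREATE PROCEDURE someproc() LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;

DO $$
BEGIN
  someproc();        -- accepted inside PL/pgSQL with the proposed patch
END $$;

someproc();          -- still a syntax error in plain SQL; CALL someproc(); is required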


> I seem to recall that there were past discussions about this, with
> respect to the PERFORM command, but I couldn't find them anymore.
>
> Also, I think PL/SQL allows you to call a procedure with no arguments
> without parentheses.  I have not implemented that.  I think it could be
> done, but it's not very appealing.
>

I don't like this feature; I don't see any benefit. Functions are a different
case - there users can implement pseudo-variables like
CURRENT_USER, etc.


> If anyone has more details about the PL/SQL side of this, that would be
> useful.  What I could find is that using CALL and not using CALL appear
> to be equivalent.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Synchronous replay take III

2018-02-28 Thread Thomas Munro
On Thu, Mar 1, 2018 at 2:39 PM, Thomas Munro
 wrote:
> I was pinged off-list by a fellow -hackers denizen interested in the
> synchronous replay feature and wanting a rebased patch to test.  Here
> it goes, just in time for a Commitfest.  Please skip to the bottom of
> this message for testing notes.

Moved to next CF based on
https://www.postgresql.org/message-id/24193.1519882945%40sss.pgh.pa.us
.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: 2018-03 Commitfest starts tomorrow

2018-02-28 Thread Tom Lane
David Steele  writes:
> I'll be starting the Commitfest at midnight AoE (07:00 ET, 13:00 CET) so
> please get your patches in before then.
> Please remember that if you drop a new and large (or invasive patch)
> into this CF it may be moved to the next CF.
> This last CF for PG11 should generally be restricted to patches that
> have gone through review in prior CFs or are modest in scope.

As of right now, there are 229 entries in this commitfest (not counting
the items already committed or RWF).

It's hard for me to tell for sure, because the activity log page doesn't
stretch back far enough, but I think about 40 of those have been created
in the last 24 hours.  It's certainly well over 20, because the log does
go back far enough to show 21 patch records created since about noon
Wednesday UTC.

This is NOT how it is supposed to work.  This is gaming the system.

I think that we should summarily bounce to the September 'fest anything
submitted in the last two days; certainly anything that's nontrivial.

There is no way that we can possibly handle 200+ CF entries in a month.
A large fraction of these are going to end up pushed to September no
matter what, and I think simple fairness demands that we spend our
time on the ones that are not Johnny-come-latelies.

We also need to be pretty hard-nosed about quickly bouncing anything
that's not realistically going to be able to get committed this month.
I've not gone through the list in detail, but there are at least several
waiting-on-author items that have seen no activity since the last 'fest.
I'd recommend marking those RWF pretty soon.

regards, tom lane



Re: PATCH: Unlogged tables re-initialization tests

2018-02-28 Thread Thomas Munro
On Thu, Mar 1, 2018 at 9:24 AM, David Steele  wrote:
> These tests were originally included in the exclude unlogged tables
> patch [1] to provide coverage for the refactoring of reinit.c.

Hi David,

+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.

Could you please explain this a bit more?  I don't see any symlinks
being used directly in this test.  I do see CREATE TABLESPACE and that
uses symlink(), but win32_port.h converts that to "junctions",
apparently the Windows equivalent.  Is there some reason that doesn't
work with this test?

If, by any chance, you are talking about whatever dark force prevents
the "tablespace" regression test from succeeding when run as a user
with administrative privileges on Windows, then I would *love* to hear
an explanation for that phenomenon and how to fix it, because it
currently prevents me from automatically testing all Commitfest
patches on the appveyor.com Windows build farm.  I know that it passes
as a non-administrative user.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Why chain of snapshots is used in ReorderBufferCommit?

2018-02-28 Thread Andres Freund
Hi,

On 2018-03-01 08:17:33 +0300, Arseny Sher wrote:
> While prowling through the snapbuild & reorderbuffer code, I wondered: why is
> a queue of snapshots used for replaying each transaction instead of just
> picking up the snapshot from the snapbuilder once, when the COMMIT record is
> read? I am not aware of any DDL/DML mix which would make this invalid: e.g. we
> can't insert something into a table in xact T1, then alter a column in xact
> T2, and then insert something more in T1. All ALTER TABLE forms which are
> currently interesting from the decoding POV obtain an ACCESS EXCLUSIVE lock,
> conflicting with any other manipulations on the table:
> https://www.postgresql.org/docs/devel/static/sql-altertable.html

I don't think that's right. For one, there are plenty of types of DDL where no
such locking exists; consider e.g. composite datums, where additional
columns can be added even after such a type is already used by another
table. For another, T1 might need to see a DDL change to a table that
was made by T2 after T1 started, which is not prohibited by locking
if T1 first uses the table after T2 commits.

- Andres



Re: Online enabling of checksums

2018-02-28 Thread Andrey Borodin


> On 28 Feb 2018, at 22:06, Robert Haas wrote:
> 
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander  wrote:
>> Also if that wasn't clear -- we only do the full page write if there isn't
>> already a checksum on the page and that checksum is correct.
> 
> Hmm.
> 
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect.  Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
> 
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.

This seems like a 100% valid concern. If pages can be binary-different (on master and 
standby), we have to WAL-log the act of checksumming a page,
and WAL replay would have to verify that this checksum is OK.
What is unclear to me is what the standby should do if it sees an incorrect checksum. 
Change its page? Request the page from the master? Shut down?

Or should we just WAL-log the whole page whenever it is checksummed by the worker, even if 
the checksum was correct?

Best regards, Andrey Borodin.


Re: CALL optional in PL/pgSQL

2018-02-28 Thread David G. Johnston
On Wednesday, February 28, 2018, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
>
>
> I seem to recall that there were past discussions about this, with
> respect to the PERFORM command, but I couldn't find them anymore.
>

I'm thinking you are thinking of this one.

https://www.postgresql.org/message-id/CAFjFpReVcC%2BRE3WBJb2X1-O%3D_%2BORiZggDZ-Orr0QafeuAC7k-w%40mail.gmail.com

David J.


RE: [bug fix] Produce a crash dump before main() on Windows

2018-02-28 Thread Tsunakawa, Takayuki
From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
> Another idea to add to the current patch is to move the call to SetErrorMode()
> to the below function, which is called first in main().  How about this?
> 
> void
> pgwin32_install_crashdump_handler(void)
> {
>   SetUnhandledExceptionFilter(crashDumpHandler);
> }

I moved SetErrorMode() to the beginning of main().  It should be placed before 
any code which could crash.  The current location is a bit late: in fact, 
write_stderr() crashed when WSAStartup() failed.

Regards
Takayuki Tsunakawa



crash_dump_before_main_v2.patch
Description: crash_dump_before_main_v2.patch


Re: [HACKERS] path toward faster partition pruning

2018-02-28 Thread Ashutosh Bapat
On Thu, Mar 1, 2018 at 6:57 AM, Amit Langote
 wrote:
> On 2018/02/28 19:14, Ashutosh Bapat wrote:
>> On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote:
>>> BTW, should there be a relevant test in partition_join.sql?  If yes,
>>> attached a patch (partitionwise-join-collation-test-1.patch) to add one.
>>
>> A partition-wise join path will be created but discarded because of
>> higher cost. This test won't see it in that case. So, please add some
>> data like other tests and add command to analyze the partitioned
>> tables. That kind of protects from something like that.
>
> Thanks for the review.
>
> Hmm, the added test is such that the partition collations won't match, so
> partition-wise join won't be considered at all due to differing
> PartitionSchemes, unless I'm missing something.
>

The point is that we wouldn't know whether PWJ was not selected because of a
PartitionScheme mismatch OR because the PWJ paths were more expensive than
non-PWJ paths, as happens with empty tables. In both cases we will see a
non-PWJ plan, although in one case PWJ was not possible and in the other
it was possible. I think what we want to test is that PWJ is not
possible with a collation mismatch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



CALL optional in PL/pgSQL

2018-02-28 Thread Peter Eisentraut
This seems to be a popular issue when porting from PL/SQL, so I'll throw
it out here for discussion.  Apparently, in PL/SQL you can call another
procedure without the CALL keyword.  Here is a patch that attempts to
implement that in PL/pgSQL as well.  It's not very pretty.

I seem to recall that there were past discussions about this, with
respect to the PERFORM command, but I couldn't find them anymore.

Also, I think PL/SQL allows you to call a procedure with no arguments
without parentheses.  I have not implemented that.  I think it could be
done, but it's not very appealing.

If anyone has more details about the PL/SQL side of this, that would be
useful.  What I could find is that using CALL and not using CALL appear
to be equivalent.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ffd93d16fd046bf0352eec914f3ae087390370ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 28 Feb 2018 23:22:17 -0500
Subject: [PATCH] PL/pgSQL: Allow calling procedures without CALL keyword

For compatibility with Oracle, allow calling procedures without the CALL
keyword.  So

BEGIN
  someproc();
END

is the same thing as

BEGIN
  CALL someproc();
END

This works as long as someproc is not a keyword.
---
 src/pl/plpgsql/src/expected/plpgsql_call.out |  6 -
 src/pl/plpgsql/src/pl_gram.y | 34 ++--
 src/pl/plpgsql/src/sql/plpgsql_call.sql  |  2 ++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index e2442c603c..11288ede87 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -43,6 +43,8 @@ AS $$
 BEGIN
 CALL test_proc3(y);
 CALL test_proc3($1);
+test_proc3(y);
+public.test_proc3(y);
 END;
 $$;
 CALL test_proc4(66);
@@ -51,7 +53,9 @@ SELECT * FROM test1;
 
  66
  66
-(2 rows)
+ 66
+ 66
+(4 rows)
 
 DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc2;
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 688fbd6531..3c9b3e9fce 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -1921,7 +1921,25 @@ stmt_execsql : K_IMPORT
 						plpgsql_push_back_token(tok);
 						if (tok == '=' || tok == COLON_EQUALS || tok == '[')
 							word_is_not_variable(&($1), @1);
-						$$ = make_execsql_stmt(T_WORD, @1);
+						else if (tok == '(' &&
+								 /* check for keywords that can be followed by ( */
+								 ($1.quoted || (strcmp($1.ident, "explain") != 0 &&
+												strcmp($1.ident, "reindex") != 0 &&
+												strcmp($1.ident, "select") != 0 &&
+												strcmp($1.ident, "vacuum") != 0 &&
+												strcmp($1.ident, "values") != 0)))
+						{
+							PLpgSQL_stmt_execsql *new;
+
+							new = palloc0(sizeof(PLpgSQL_stmt_execsql));
+							new->cmd_type = PLPGSQL_STMT_EXECSQL;
+							new->lineno = plpgsql_location_to_lineno(@1);
+							new->sqlstmt = read_sql_stmt(psprintf("CALL %s", quote_identifier(($1).ident)));
+
+							$$ = (PLpgSQL_stmt *)new;
+						}
+						else
+							$$ = make_execsql_stmt(T_WORD, @1);
 					}
 				| T_CWORD
 					{
@@ -1931,7 +1949,19 @@ stmt_execsql : K_IMPORT
 						plpgsql_push_back_token(tok);
 						if (tok == '=' || tok == COLON_EQUALS || tok == '[')
 							cword_is_not_variable(&($1), @1);
-						$$ = make_execsql_stmt(T_CWORD, @1);
+						else

Re: [HACKERS] SERIALIZABLE with parallel query

2018-02-28 Thread Thomas Munro
On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro
 wrote:
> I've now broken it into two patches.

Rebased.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v13.patch
Description: Binary data


0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v13.patch
Description: Binary data


fixing more format truncation issues

2018-02-28 Thread Peter Eisentraut
In 6275f5d28a1577563f53f2171689d4f890a46881, we fixed warnings from the
options -Wformat-overflow and -Wformat-truncation, which are part of
-Wall in gcc 7.

Here, I propose to dial this up a bit by adding -Wformat-overflow=2
-Wformat-truncation=2, which use some more worst-case approaches for
estimating the buffer sizes.

AFAICT, the issues addressed here either can't really happen without
trying very hard, or would cause harmless output truncation.  Still, it
seems good to clean this up properly and not rely on made-up buffer size
guesses that turn out to be wrong, even if we don't want to adopt the
warning options by default.

One issue that is of external interest is that I increase BGW_MAXLEN
from 64 to 96.  Apparently, the old value would cause the bgw_name of
logical replication workers to be truncated in some circumstances.  I
have also seen truncated background worker names with third-party
packages, so giving some more room here would be useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 99316e6db5a98f5daaaf61a75a01b88e7b2f39bf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 28 Feb 2018 23:11:24 -0500
Subject: [PATCH] Fix more format truncation issues

Add compiler warning options -Wformat-overflow=2 -Wformat-truncation=2,
supported since GCC 7, and fix the warnings that they create.

The issues are all harmless, but some dubious coding patterns are
cleaned up.
---
 configure| 71 
 configure.in |  3 ++
 contrib/pgstattuple/pgstattuple.c|  2 +-
 src/backend/commands/explain.c   |  5 ++-
 src/backend/utils/adt/dbsize.c   |  2 +-
 src/backend/utils/adt/float.c| 24 +--
 src/backend/utils/adt/formatting.c   | 33 +--
 src/backend/utils/misc/guc.c |  4 +-
 src/bin/initdb/initdb.c  |  6 +--
 src/bin/pg_dump/pg_backup_archiver.c |  2 +-
 src/bin/pg_dump/pg_backup_tar.c  |  2 +-
 src/bin/pgbench/pgbench.c|  4 +-
 src/include/postmaster/bgworker.h|  2 +-
 src/interfaces/libpq/fe-secure-openssl.c |  2 +-
 src/pl/tcl/pltcl.c   |  2 +-
 15 files changed, 111 insertions(+), 53 deletions(-)

diff --git a/configure b/configure
index 7dcca506f8..48a7546513 100755
--- a/configure
+++ b/configure
@@ -4595,6 +4595,77 @@ if test x"$pgac_cv_prog_cc_cflags__Wformat_security" = x"yes"; then
   CFLAGS="$CFLAGS -Wformat-security"
 fi
 
+  # gcc 7+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports -Wformat-overflow=2" >&5
+$as_echo_n "checking whether $CC supports -Wformat-overflow=2... " >&6; }
+if ${pgac_cv_prog_cc_cflags__Wformat_overflow_2+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS -Wformat-overflow=2"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_cc_cflags__Wformat_overflow_2=yes
+else
+  pgac_cv_prog_cc_cflags__Wformat_overflow_2=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_cc_cflags__Wformat_overflow_2" >&5
+$as_echo "$pgac_cv_prog_cc_cflags__Wformat_overflow_2" >&6; }
+if test x"$pgac_cv_prog_cc_cflags__Wformat_overflow_2" = x"yes"; then
+  CFLAGS="$CFLAGS -Wformat-overflow=2"
+fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports -Wformat-truncation=2" >&5
+$as_echo_n "checking whether $CC supports -Wformat-truncation=2... " >&6; }
+if ${pgac_cv_prog_cc_cflags__Wformat_truncation_2+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS -Wformat-truncation=2"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_cc_cflags__Wformat_truncation_2=yes
+else
+  pgac_cv_prog_cc_cflags__Wformat_truncation_2=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_cc_cflags__Wformat_truncation_2" >&5
+$as_echo "$pgac_cv_prog_cc_cflags__Wformat_truncation_2" >&6; }
+if test x"$pgac_cv_prog_cc_cflags__Wformat_truncation_2" = x"yes"; then
+  CFLAGS="$CFLAGS -Wformat-truncation=2"
+fi
+
   # Disable strict-aliasing rules; needed for gcc 3.3+
   { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports 

chained transactions

2018-02-28 Thread Peter Eisentraut
This feature is meant to help manage transaction isolation in
procedures.  I proposed elsewhere a patch that allows running SET
TRANSACTION in PL/pgSQL.  But if you have complex procedures that commit
many times in different branches perhaps, you'd have to do this in every
new transaction, which would create code that is difficult to manage.

The SQL standard offers the "chained transactions" feature to address
this.  The new command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN
immediately start a new transaction with the characteristics (isolation
level, read/write, deferrable) of the previous one.  So code that has
particular requirements regarding transaction isolation and such can use
this to simplify code management.
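
As a rough sketch of how the new syntax would be used (the table names are
made up, not from the patch):

BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT count(*) FROM orders;
COMMIT AND CHAIN;   -- the next transaction starts immediately, again REPEATABLE READ and READ ONLY
SELECT count(*) FROM order_lines;
COMMIT;             -- a plain COMMIT ends the chain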

In the patch series, 0001 through 0006 is some preparatory code cleanup
that is useful independently.  0007 is the implementation of the feature
for the main SQL environment, 0008 is for PL/pgSQL.  Support in other
PLs could be added easily.

The patch series also requires the "SET TRANSACTION in PL/pgSQL" patch
for use in the test suite.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ce07129f4d7ba376c59fa5d8244953a4eec05027 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 16 Feb 2018 20:29:20 -0500
Subject: [PATCH v1 1/8] Update function comments

After a6542a4b6870a019cd952d055d2e7af2da2fe102, some function comments
were misplaced.  Fix that.
---
 src/backend/access/transam/xact.c | 40 ++-
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index dbaaf8e005..d7688879a3 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -3190,12 +3190,25 @@ PreventTransactionChain(bool isTopLevel, const char *stmtType)
 }
 
 /*
- * These two functions allow for warnings or errors if a command is
- * executed outside of a transaction block.
+ * WarnNoTransactionChain
+ * RequireTransactionChain
+ *
+ * These two functions allow for warnings or errors if a command is executed
+ * outside of a transaction block.  This is useful for commands that have no
+ * effects that persist past transaction end (and so calling them outside a
+ * transaction block is presumably an error).  DECLARE CURSOR is an example.
+ * While top-level transaction control commands (BEGIN/COMMIT/ABORT) and SET
+ * that have no effect issue warnings, all other no-effect commands generate
+ * errors.
+ *
+ * If we appear to be running inside a user-defined function, we do not
+ * issue anything, since the function could issue more commands that make
+ * use of the current statement's results.  Likewise subtransactions.
+ * Thus this is an inverse for PreventTransactionChain.
  *
- * While top-level transaction control commands (BEGIN/COMMIT/ABORT) and
- * SET that have no effect issue warnings, all other no-effect commands
- * generate errors.
+ * isTopLevel: passed down from ProcessUtility to determine whether we are
+ * inside a function.
+ * stmtType: statement type name, for warning or error messages.
  */
 void
 WarnNoTransactionChain(bool isTopLevel, const char *stmtType)
@@ -3209,23 +3222,6 @@ RequireTransactionChain(bool isTopLevel, const char *stmtType)
 	CheckTransactionChain(isTopLevel, true, stmtType);
 }
 
-/*
- * RequireTransactionChain
- *
- * This routine is to be called by statements that must run inside
- * a transaction block, because they have no effects that persist past
- * transaction end (and so calling them outside a transaction block
- * is presumably an error).  DECLARE CURSOR is an example.
- *
- * If we appear to be running inside a user-defined function, we do not
- * issue anything, since the function could issue more commands that make
- * use of the current statement's results.  Likewise subtransactions.
- * Thus this is an inverse for PreventTransactionChain.
- *
- * isTopLevel: passed down from ProcessUtility to determine whether we are
- * inside a function.
- * stmtType: statement type name, for warning or error messages.
- */
 static void
 CheckTransactionChain(bool isTopLevel, bool throwError, const char *stmtType)
 {

base-commit: 51057feaa6bd24b51e6a4715c2090491ef037534
prerequisite-patch-id: 767f2e4d21b2ba347d4d08c3257abb91921fdb7b
-- 
2.16.2

From c1d5189f52537d4632d0a4069a732e8666c123e1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 16 Feb 2018 20:44:15 -0500
Subject: [PATCH v1 2/8] Rename TransactionChain functions

We call this thing a "transaction block" everywhere except in a few
functions, where it is mysteriously called a "transaction chain".  In
the SQL standard, a transaction chain is something different.  So rename
these functions to match 

Re: Support for ECDSA & ed25519 digital signatures in pgcrypto?

2018-02-28 Thread Bruce Momjian
On Sun, Feb  4, 2018 at 04:38:24PM +0530, Nilesh Trivedi wrote:
> I recently had to build ed25519 digital signature validation in PostgreSQL.
> Since pgcrypto doesn't support these methods, I had to look into PL/Python
> and PL/v8 based implementations. The experience turned out to be very poor
> (documented here:
> https://gist.github.com/nileshtrivedi/7cd622d4d521986593bff81bfa1e5893)
> 
> I think OpenSSL already supports these encryption methods and it would be
> great to have them supported within pgcrypto - especially with the advent of
> distributed systems like IPFS, public blockchains like BitCoin, Ethereum.
> Elliptic curve cryptography has some major advantages over RSA: for both
> security and usability. Some are listed here:
> https://ed25519.cr.yp.to/
> 
> Is somebody working on this? I'm not a C programmer but if needed, I can
> look into implementing this.

I agree there is going to be a lot more focus on ECDSA because elliptic
curve cryptography is much more efficient for large key sizes, see:

https://momjian.us/main/writings/pgsql/tls.pdf#page=17

and RSA can't support elliptic curve.  Chrome accessing mail.google.com
is already using ECDSA:

https://momjian.us/main/writings/pgsql/tls.pdf#page=16

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



faster testing with symlink installs

2018-02-28 Thread Peter Eisentraut
I'm proposing a way to make test runs a bit faster.

A fair amount of time is taken by creating the test installation.  Not
huge compared to the whole test run, but still long enough to get
boring.  The idea (that I stole from somewhere else) is that we make the
installation as symlinks pointing back to the source tree.  So after the
first run, the installation is automatically up to date after each
build.  Then we can skip the whole temp install run if we find that we
already have one (unless you change the set of installed files in a
major way, which is very rare).  This also helps if you use make
installcheck: You just need to rebuild and restart the server, skipping
the make install run.

So that's what the patch does: make install INSTALL_SYMLINKS=1 installs
files by making relative symlinks.  make check TEMP_INSTALL_SYMLINKS=1
creates the temporary installation using symlinks and skips the install
step if a tmp_install already exists.  It requires the "ln" program from
GNU coreutils.  So it would be optional.

Except ... this doesn't actually work.  find_my_exec() resolves symlinks
to find the actual program installation, and so for example the
installed initdb will look for postgres in src/bin/initdb/.  I would
like to revert this behavior, because it seems to do more harm than
good.  The original commit message indicates that this would allow
symlinking a program to somewhere outside of the installation tree and
still allow it to work and find its friends.  But that could just as
well be done with a shell script.

Reverting this behavior would also allow Homebrew-like installations to
work better.  The actual installation is in a versioned directory like
/usr/local/Cellar/postgresql/10.1/... but symlinked to
/usr/local/opt/postgresql/.  But because of the symlink resolution,
calling /usr/local/opt/postgresql/bin/pg_config returns paths under
Cellar, which then causes breakage of software built against that path
when postgresql upgrades change the version number.

Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f101d14195d5bc26f09a84e823f06b6c422fe444 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 28 Jan 2018 12:24:14 -0500
Subject: [PATCH 1/3] Remove duplicate installation of some header files

In non-vpath builds, some header files such as pg_config.h were actually
installed twice (to the same target location).  Rearrange the make rules
a bit to avoid that.
---
 src/include/Makefile | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/include/Makefile b/src/include/Makefile
index a689d352b6..2d1baca04a 100644
--- a/src/include/Makefile
+++ b/src/include/Makefile
@@ -39,13 +39,7 @@ install: all installdirs
 	$(INSTALL_DATA) $(srcdir)/port.h         '$(DESTDIR)$(includedir_internal)'
 	$(INSTALL_DATA) $(srcdir)/postgres_fe.h  '$(DESTDIR)$(includedir_internal)'
 	$(INSTALL_DATA) $(srcdir)/libpq/pqcomm.h '$(DESTDIR)$(includedir_internal)/libpq'
-# These headers are needed for server-side development
-	$(INSTALL_DATA) pg_config.h    '$(DESTDIR)$(includedir_server)'
-	$(INSTALL_DATA) pg_config_ext.h '$(DESTDIR)$(includedir_server)'
-	$(INSTALL_DATA) pg_config_os.h  '$(DESTDIR)$(includedir_server)'
-	$(INSTALL_DATA) utils/errcodes.h '$(DESTDIR)$(includedir_server)/utils'
-	$(INSTALL_DATA) utils/fmgroids.h '$(DESTDIR)$(includedir_server)/utils'
-	$(INSTALL_DATA) utils/fmgrprotos.h '$(DESTDIR)$(includedir_server)/utils'
+# Headers for server-side development
 # We don't use INSTALL_DATA for performance reasons --- there are a lot of files
 	cp $(srcdir)/*.h '$(DESTDIR)$(includedir_server)'/ || exit; \
 	chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/*.h  || exit; \
@@ -54,7 +48,8 @@ install: all installdirs
 	  chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$dir/*.h  || exit; \
 	done
 ifeq ($(vpath_build),yes)
-	for file in dynloader.h catalog/schemapg.h parser/gram.h storage/lwlocknames.h utils/probes.h; do \
+	for file in pg_config.h pg_config_ext.h pg_config_os.h utils/errcodes.h utils/fmgroids.h utils/fmgrprotos.h \
+	  dynloader.h catalog/schemapg.h parser/gram.h storage/lwlocknames.h utils/probes.h; do \
 	  cp $$file '$(DESTDIR)$(includedir_server)'/$$file || exit; \
 	  chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$file || exit; \
 	done

base-commit: 51057feaa6bd24b51e6a4715c2090491ef037534
-- 
2.16.2

From afc5738edc445f577d3b7da5503ff61d1debe953 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sun, 28 Jan 2018 12:24:37 -0500
Subject: [PATCH 2/3] Don't resolve symlinks in find_my_exec()

This reverts 336969e490d71c316a42fabeccda87f798e562dd.  That feature
allowed symlinking a binary to a location outside of the tree and still
be able to find the 

Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-02-28 Thread Michael Paquier
On Thu, Mar 01, 2018 at 01:26:32AM +, Tsunakawa, Takayuki wrote:
> BTW, should pg_rewind really copy WAL files from the primary?  If the
> sole purpose of pg_rewind is to recover an instance to use as a
> standby, can pg_rewind just remove all WAL files in the target
> directory, because the standby can get WAL files from the primary
> and/or archive?

Yes, it should not copy those WAL files.  Most of the time they are going
to be meaningless.  See this recent thread:
https://www.postgresql.org/message-id/20180126023609.GH17847%40paquier.xyz
So I would rather go this way instead of having to worry about
manipulating those WAL segments as you do.  Depending on the needs, I
think that even a backpatch could be considered.

> Related to this, shouldn't pg_rewind avoid copying more files and
> directories like pg_basebackup?  Currently, pg_rewind doesn't copy
> postmaster.pid, postmaster.opts, and temporary files/directories
> (pg_sql_tmp/). 

Yes, it should not copy those files.  I have a patch in the current CF
to do that:
https://commitfest.postgresql.org/17/1507/
--
Michael


signature.asc
Description: PGP signature


Synchronous replay take III

2018-02-28 Thread Thomas Munro
Hi hackers,

I was pinged off-list by a fellow -hackers denizen interested in the
synchronous replay feature and wanting a rebased patch to test.  Here
it goes, just in time for a Commitfest.  Please skip to the bottom of
this message for testing notes.

In previous threads[1][2][3] I called this feature proposal "causal
reads".  That was a terrible name, borrowed from MySQL.  While it is
probably a useful term of art, for one thing people kept reading it as
"casual", which it ain't, and more importantly this patch is only one
way to achieve read-follows-write causal consistency.  Several others
are proposed or exist in forks (user managed wait-for-LSN, global
transaction manager, ...).

OVERVIEW

For writers, it works a bit like RAID mirroring: when you commit a
write transaction, it waits until the data has become visible on all
elements of the array, and if an array element is not responding fast
enough it is kicked out of the array.  For readers, it's a little
different because you're connected directly to the array elements
(rather than going through a central controller), so it uses a system
of leases allowing read transactions to know instantly whether
they are running on an element that is currently in the array and are
therefore able to service synchronous_replay transactions, or should
raise an error telling you to go and ask some other element.

This is a design choice favouring read-mostly workloads at the expense
of write transactions.  Hot standbys' whole reason for existing is to
move *some* read-only workloads off the primary server.  This proposal
is for users who are prepared to trade increased primary commit
latency for a guarantee about visibility on the standbys, so that
*all* read-only work could be moved to hot standbys.

The guarantee is: When two transactions tx1, tx2 are run with
synchronous_replay set to on and tx1 reports successful commit before
tx2 begins, then tx2 is guaranteed either to see tx1 or to raise a new
error 40P02 if it is run on a hot standby.  I have joked that that
error means "snapshot too young".  You could handle it the same way
you handle deadlocks and serialization failures: by retrying, except
in this case you might want to avoid that node for a while.
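
Spelling the guarantee out with a made-up example (assuming the GUC is called
synchronous_replay, as in the patch; the table is hypothetical):

-- tx1, on the primary
SET synchronous_replay = on;
BEGIN;
INSERT INTO accounts VALUES (42, 100);
COMMIT;   -- only returns once the change is visible on every element of the array

-- tx2, on any standby, started only after tx1's COMMIT has returned
SET synchronous_replay = on;
SELECT * FROM accounts WHERE id = 42;   -- either sees the row, or fails with SQLSTATE 40P02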

Note that this feature is concerned with transaction visibility.  It
is not concerned with transaction durability.  It will happily kick
all of your misbehaving or slow standbys out of the array so that you
fall back to single-node commit durability.  You can express your
durability requirement (ie I must have N copies of the data on
disk before I tell any external party about a transaction) separately,
by configuring regular synchronous replication alongside this feature.
I suspect that this feature would be most popular with people who are
already using regular synchronous replication though, because they
already tolerate higher commit latency.

STATUS

Here's a quick summary of the status of this proposal as I see it:

* Simon Riggs, as the committer most concerned with the areas this
proposal touches -- namely streaming replication and specifically
syncrep -- has not so far appeared to be convinced by the value of
this approach, and has expressed a preference for pursuing client-side
or middleware tracked LSN tokens exclusively.  I am perceptive enough
to see that failing to sell the idea to Simon is probably fatal to the
proposal.  The main task therefore is to show convincingly that there
is a real use case for this high-level design and its set of
trade-offs, and that it justifies its maintenance burden.

* I have tried to show that there are already many users who route
their read-only queries to hot standby databases (not just "reporting
queries"), and libraries and tools to help people do that using
heuristics like "logged in users need fresh data, so primary only" or
"this session has written in the past N minutes, so primary only".
This proposal would provide a way for those users to do something
based on a guarantee instead of such flimsy heuristics.  I have tried
to show that the libraries used by Python, Ruby, Java etc to achieve
that sort of load balancing should easily be able to handle finding
read-only nodes, routing read-only queries and dealing with the new
error.  I do also acknowledge that such libraries could also be used
to provide transparent read-my-writes support by tracking LSNs and
injecting wait-for-LSN directives with alternative proposals, but that
is weaker than a global reads-follow-writes guarantee and the
difference can matter.

* I have argued that token-based systems are in fact rather
complicated[4] and by no means a panacea.  As usual, there are a whole
bunch of trade-offs.  I suspect that this proposal AND fully
user-managed causality tokens (no middleware) are both valuable sweet
spots for a non-GTM system.

* Ants Aasma pointed out that this proposal doesn't provide a
read-follows-read guarantee.  He is right, and I'm not sure to what
extent that is a 

Re: [HACKERS] path toward faster partition pruning

2018-02-28 Thread Amit Langote
On 2018/02/28 19:14, Ashutosh Bapat wrote:
> On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote:
>> BTW, should there be a relevant test in partition_join.sql?  If yes,
>> attached a patch (partitionwise-join-collation-test-1.patch) to add one.
> 
> A partition-wise join path will be created but discarded because of
> higher cost. This test won't see it in that case. So, please add some
> data like other tests and add command to analyze the partitioned
> tables. That kind of protects from something like that.

Thanks for the review.

Hmm, the added test is such that the partition collations won't match, so
partition-wise join won't be considered at all due to differing
PartitionSchemes, unless I'm missing something.

Thanks,
Amit




[bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-02-28 Thread Tsunakawa, Takayuki
Hello,

Our customer hit another bug in pg_rewind with PG 9.5.  The attached patch 
fixes this.


PROBLEM


After a long but successful run of pg_rewind, the synchronized standby could never 
catch up with the primary, emitting the following message repeatedly:

LOG:  XX000: could not read from log segment 0006028A0031, offset 16384: No error


CAUSE


If the primary removes WAL files that pg_rewind is going to get, pg_rewind 
leaves 0-byte WAL files in the target directory here:

[libpq_fetch.c]
		case FILE_ACTION_COPY:
			/* Truncate the old file out of the way, if any */
			open_target_file(entry->path, true);
			fetch_file_range(entry->path, 0, entry->newsize);
			break;

pg_rewind completes successfully, we create recovery.conf, and then start the 
standby in the target cluster.  walreceiver receives WAL records and writes them 
to the 0-byte WAL files.  Finally, the xlog reader complains that it cannot read a 
WAL page.


FIX


pg_rewind deletes the file when it finds that the primary has deleted it.


OTHER THOUGHTS


BTW, should pg_rewind really copy WAL files from the primary?  If the sole 
purpose of pg_rewind is to recover an instance to use as a standby, can 
pg_rewind just remove all WAL files in the target directory, because the 
standby can get WAL files from the primary and/or archive?

Related to this, shouldn't pg_rewind avoid copying more files and directories 
like pg_basebackup?  Currently, pg_rewind doesn't copy postmaster.pid, 
postmaster.opts, and temporary files/directories (pg_sql_tmp/).

Regards
Takayuki Tsunakawa




pg_rewind_corrupt_wal.patch
Description: pg_rewind_corrupt_wal.patch


Re: Online enabling of checksums

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 03:42, Alvaro Herrera  wrote:

> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
>
>
I see this mistake/misunderstanding enough that I'd quite like to change
how we generate relfilenode IDs, making them totally independent of the oid
space.

Unsure how practical it is, but it'd be so nice to get rid of that trap.
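
For anyone bitten by this, a quick way to see that the two numbers are not the
same thing (table name made up):

SELECT oid, relfilenode FROM pg_class WHERE relname = 'accounts';
VACUUM FULL accounts;   -- rewrites the relation into a new file
SELECT oid, relfilenode FROM pg_class WHERE relname = 'accounts';   -- relfilenode changed, oid did not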


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 09:00, Justin Pryzby  wrote:


>
> > >  - started in single user mode or with system indices disabled?
> > why?
>
> Some of these I suggested just as a datapoint (or other brainstorms I
> couldn't
> immediately reject).  A cluster where someone has UPDATED pg_* (even
> pg_statistic) or otherwise hacked on I would tend to think about
> differently
> than a "pristine" cluster that's never seen anything more interesting than
> a
> join.



How about "has run in a state where system catalog modifications are
permitted". Because I've had one seriously frustrating case where that
would've been extremely pertinent information.

That said, I'm not convinced it's worth the effort and I think this is
going off into the weeds a little. Let's focus on Andres's core suggestion
(and maybe refine the fsync part to get rid of false positives due to bulk
loading) and build on from there in subsequent work.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Online enabling of checksums

2018-02-28 Thread Daniel Gustafsson
> On 01 Mar 2018, at 05:07, Tomas Vondra  wrote:
> 
> On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
>> I noticed that pg_verify_checksum takes an "-o oid" argument to only
>> check the relation with that OID; but that's wrong, because the number
>> is a relfilenode, not an OID (since it's compared to the on-disk file
>> name).  I would suggest changing everything to clarify that it's a
>> pg_class.relfilenode value, otherwise it's going to be very confusing.
>> Maybe use "-f filenode" if -f is available?
> 
> I'd argue this is merely a mistake in the --help text.

Agreed, the --help text isn’t really clear in this case and should be updated
to say something along the lines of:

printf(_("  -o relfilenode  check only relation with specified relfilenode\n"));

cheers ./daniel


Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 06:28, Justin Pryzby  wrote:

> On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> > On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > > I can see why you'd want that, but as a DBA, I don't necessarily want
> > > all of that recorded, especially in a quasi-permanent way.
> >
> > Huh?  You're arguing that we should make it easier for DBAs to hide
> > potential causes of corruption? I fail to see when that's necessary /
> > desirable? That a cluster ran with fsync=off isn't an embarassing thing,
> > nor does it contain private data...
>
> The more fine grained these are the more useful they can be:
>
> Running with fsync=off is common advice while loading, so reporting that
> "fsync=off at some point" is much less useful than reporting "having to run
> recovery from an fsync=off database".
>
> I propose there could be two bits:
>
>  1. fsync was off at some point in history when the DB needed recovery;
>  2. fsync was off during the current instance; I assume the bit would be
> set
> whenever fsync=off was set, but could be cleared when the server was
> cleanly shut down.  If the bit is set during recovery, the first bit
> would
> be permanently set.  Not sure but if this bit is set and then fsync
> re-enabled, maybe this could be cleared after any checkpoint and not
> just
> shutdown ?
>
>
I think that's a very useful way to eliminate false positives and make this
diagnostic field more useful. But we'd need to guarantee that when you've
switched from fsync=off to fsync=on, we've actually fsync'd every extent of
every table and index, whether or not we think they're currently dirty and
whether or not the smgr currently has them open. Do something like initdb
does to flush everything. Without that, we could have WAL-vs-heap ordering
issues and so on *even after a Pg restart* if the system has lots of dirty
writeback to do.

A quick look at CheckPointGuts and CheckPointBuffers suggests we don't
presently do that but I admit I haven't read in depth.

We force a fsync of the current WAL segment when switching fsync on, but
don't do anything else AFAICS.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 05:43, Andres Freund  wrote:

> Hi,
>
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
>
> What do others think?
>
>
A huge +1 from me for the idea. I can't even count the number of black box
"WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
turned out to be down to the user doing something very silly and not saying
anything about it.

It's only some flags, so putting it in pg_control is arguably somewhat
wasteful but so minor as to be of no real concern. And that's probably the
best way to make sure it follows the cluster around no matter what
backup/restore/copy mechanisms are used and how "clever" they try to be.

What I'd _really_ love would be to blow the scope of this up a bit and turn
it into a key-events cluster journal, recording key param switches,
recoveries (and lsn ranges), pg_upgrade's, etc. But then we'd run into
people with weird workloads who managed to make it some massive file, we'd
have to make sure we had a way to stop it getting left out of
copies/backups, and it'd generally be irritating. So let's not do that.
Proper support for class-based logging and multiple outputs would be a good
solution for this at some future point.

What you propose is simple enough to be quick to implement, adds no admin
overhead, and will be plenty useful enough.

I'd like to add "postmaster.pid was absent when the cluster started" to
this list, please. Sure, it's not conclusive, and there are legit reasons
why that might be the case, but so often it's somebody kill -9'ing the
postmaster, then removing the postmaster.pid and starting up again without
killing the workers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Two small patches for the isolationtester lexer

2018-02-28 Thread Daniel Gustafsson
> On 01 Mar 2018, at 06:01, Tom Lane  wrote:
> Daniel Gustafsson  writes:

>> I agree, patch 0002 was broken and the correct fix is a much bigger project -
>> one too big for me to tackle right now (but hopefully at some point in the 
>> near
>> future).  Thanks for the review of it though!
> 
> OK.  I'm going to mark this commitfest entry closed, since the other patch
> is in and this one needs a good bit more work.  Please start a new thread,
> or at least a new CF entry, if you do more work in this area.

I absolutely agree, thanks!

cheers ./daniel


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-02-28 Thread Amit Langote
On 2018/03/01 1:03, Robert Haas wrote:
> On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
>  wrote:
>> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
>> Following the lead of edd44738bc88 ("Be lazier about partition tuple
>> routing.") this incarnation only does the necessary push-ups for the
>> specific partition that needs it, at execution time.  As far as I can
>> tell, it works as intended.
>>
>> I chose to refuse the case where the DO UPDATE clause causes the tuple
>> to move to another partition (i.e. you're updating the partition key of
>> the tuple).  While it's probably possible to implement that, it doesn't
>> seem a very productive use of time.
> 
> I would have thought that to be the only case we could support with
> the current infrastructure.  Doesn't a correct implementation for any
> other case require a global index?

I'm thinking that Alvaro is talking here about the DO UPDATE action part,
not the conflict checking part.  The latter will definitely require global
indexes if conflict were to be checked on columns not containing the
partition key.

The case Alvaro mentions arises after checking the conflict, presumably
using an inherited unique index whose keys must include the partition
keys. If the conflict action is DO UPDATE and its SET clause changes
partition key columns such that the row will have to change the partition,
then the current patch will result in an error.  I think that's because
making update row movement work in this case will require some adjustments
to what 2f178441044 (Allow UPDATE to move rows between partitions)
implemented.  We wouldn't have things set up in the ModifyTableState that
the current row-movement code depends on being set up; for example, there
wouldn't be per-subplan ResultRelInfo's in the ON CONFLICT DO UPDATE case.
 The following Assert in ExecUpdate() will fail for instance:

map_index = resultRelInfo - mtstate->resultRelInfo;
Assert(map_index >= 0 && map_index < mtstate->mt_nplans);
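
For concreteness, a sketch of the kind of statement that hits this case
(table and values are made up):

CREATE TABLE tp (a int PRIMARY KEY, b text) PARTITION BY RANGE (a);
CREATE TABLE tp1 PARTITION OF tp FOR VALUES FROM (1) TO (10);
CREATE TABLE tp2 PARTITION OF tp FOR VALUES FROM (10) TO (20);

INSERT INTO tp VALUES (5, 'old');

-- the conflicting row lives in tp1, but the DO UPDATE action would have to
-- move it to tp2, which is what the patch currently refuses with an error
INSERT INTO tp VALUES (5, 'new')
  ON CONFLICT (a) DO UPDATE SET a = 15, b = EXCLUDED.b;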

Thanks,
Amit




Re: row filtering for logical replication

2018-02-28 Thread Craig Ringer
On 1 March 2018 at 07:03, Euler Taveira  wrote:

> Hi,
>
> The attached patches add support for filtering rows in the publisher.
> The output plugin will do the work if a filter was defined in CREATE
> PUBLICATION command. An optional WHERE clause can be added after the
> table name in the CREATE PUBLICATION such as:
>
> CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <=
> 3000);
>
> Rows that don't match the WHERE clause will not be sent to the
> subscribers.
>
> Patches 0001 and 0002 are only refactors and can be applied
> independently. 0003 doesn't include row filtering on initial
> synchronization.
>


Good idea. I haven't read this yet, but one thing to make sure you've
handled is limiting the clause to referencing only the current tuple and
the catalogs. User-catalog tables are OK too, i.e. anything for which
RelationIsAccessibleInLogicalDecoding() is true.

This means only immutable functions may be invoked, since a stable or
volatile function might attempt to access a table. And views must be
prohibited or recursively checked. (We have tree walkers that would help
with this).

It might be worth looking at the current logic for CHECK expressions, since
the requirements are similar. In my opinion you could safely not bother
with allowing access to user catalog tables in the filter expressions and
limit them strictly to immutable functions and the tuple itself.
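
For example, with the syntax proposed here, the first filter below references
only the tuple's own columns and immutable operators, while the second
(hypothetical) one calls a volatile function and would presumably have to be
rejected:

CREATE PUBLICATION pub_ok  FOR TABLE departments WHERE (id > 2000 AND id <= 3000);
CREATE PUBLICATION pub_bad FOR TABLE departments WHERE (random() > 0.5);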

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: VPATH build from a tarball fails with some gmake versions

2018-02-28 Thread Andrew Dunstan
On Thu, Mar 1, 2018 at 9:24 AM, Tom Lane  wrote:
> I tried doing a VPATH build referencing a source directory that I'd 
> distclean'd
> but not maintainer-clean'd, which should simulate the case of a VPATH
> build from a tarball.  I was quite surprised that it fell over:
>
> ...
> gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
> -Wendif-labels -Wmissing-format-attribute -Wformat-security 
> -fno-strict-aliasing -fwrapv -g -O2 -I. -I/home/postgres/pgsql/src/bin/psql 
> -I/home/postgres/pgsql/src/interfaces/libpq -I../../../src/include 
> -I/home/postgres/pgsql/src/include  -D_GNU_SOURCE   -c -o sql_help.o 
> sql_help.c
> gcc: sql_help.c: No such file or directory
> gcc: no input files
> make[3]: *** [sql_help.o] Error 1
> make[3]: Leaving directory `/home/postgres/buildroot/src/bin/psql'
>
> gmake should be generating a path to sql_help.c, since that exists
> in the corresponding source directory, but it does not.  I can reproduce
> this on RHEL6 with make-3.81-23, but not on Fedora 26 with make-4.2.1-2,
> and (curiously) also not on High Sierra with Apple's copy of make 3.81.
> I wonder how many other people see it.
>
> The attached patch seems to fix it, indicating that there's something
> about the rules relating sql_help.c and sql_help.h that is confusing
> make, such that turning sql_help.c into the primary target for the
> create_help rule removes the confusion.
>


Very odd. I can't reproduce it on Centos6 with make 3.81.23 and a
downloaded tarball.

cheers

andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: row filtering for logical replication

2018-02-28 Thread David Fetter
On Wed, Feb 28, 2018 at 08:03:02PM -0300, Euler Taveira wrote:
> Hi,
> 
> The attached patches add support for filtering rows in the publisher.
> The output plugin will do the work if a filter was defined in CREATE
> PUBLICATION command. An optional WHERE clause can be added after the
> table name in the CREATE PUBLICATION such as:
> 
> CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);
> 
> Rows that don't match the WHERE clause will not be sent to the subscribers.
> 
> Patches 0001 and 0002 are only refactors and can be applied
> independently. 0003 doesn't include row filtering on initial
> synchronization.
> 
> Comments?

Great feature!  I think a lot of people will like to have the option
of trading a little extra CPU on the pub side for a bunch of network
traffic and some work on the sub side.

I noticed that the WHERE clause applies to all tables in the
publication.  Is that actually the right thing?  I'm thinking of a
case where we have foo(id, ...) and bar(foo_id, ...).  To slice that
correctly, we'd want to do the ids in the foo table and the foo_ids in
the bar table.  In the system as written, that would entail, at least
potentially, writing a lot of publications by hand.

Something like
WHERE (
(table_1,..., table_N) HAS (/* WHERE clause here */) AND
(table_N+1,..., table_M) HAS (/* WHERE clause here */) AND
...
)

could be one way to specify.
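
With the patch as posted, the foo/bar slicing above could presumably only be
expressed by writing a filtered publication (or table entry) per table, e.g.:

CREATE PUBLICATION pub_foo FOR TABLE foo WHERE (id > 2000 AND id <= 3000);
CREATE PUBLICATION pub_bar FOR TABLE bar WHERE (foo_id > 2000 AND foo_id <= 3000);

and then having the subscriber subscribe to both.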

I also noticed that in psql, \dRp+ doesn't show the WHERE clause,
which it probably should.

Does it need regression tests?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: prokind column (was Re: [HACKERS] SQL procedures)

2018-02-28 Thread Michael Paquier
On Wed, Feb 28, 2018 at 05:37:11PM -0500, Peter Eisentraut wrote:
> On 2/28/18 15:45, Tom Lane wrote:
>> I have reviewed this patch and attach an updated version below.
>> I've rebased it up to today, fixed a few minor errors, and adopted
>> most of Michael's suggestions.  Also, since I remain desperately
>> unhappy with putting zeroes into prorettype, I changed it to not
>> do that ;-) ... now procedures have VOIDOID as their prorettype,
>> and it will be substantially less painful to allow them to return
>> some other scalar result in future, should we wish to.  I believe
>> I've found all the places that were relying on prorettype == 0 as
>> a substitute for prokind == 'p'.
> 
> I have just posted "INOUT parameters in procedures", which contains some
> of those same changes.  So I think we're on the same page.  I will work
> on consolidating this.

Thanks Peter.

I have read the patch set that Tom has posted here and hunted for other
inconsistencies.  It seems to me that you have spotted everything.

The changes in ProcedureCreate() are nice.

I was wondering as well if it would be worth checking the contents of
pg_proc for prorettype = 0 in the regression tests, to make sure that this
never happens...  Until I saw that opr_sanity.sql is actually already doing
that, so procedures are in fact breaking that check on HEAD.
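
That is, roughly this kind of check (a sketch, not the exact opr_sanity query):

SELECT p1.oid, p1.proname
FROM pg_proc AS p1
WHERE p1.prorettype = 0;  -- expected to return zero rows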
--
Michael


signature.asc
Description: PGP signature


Re: IndexTupleDSize macro seems redundant

2018-02-28 Thread Tom Lane
Stephen Frost  writes:
> Updated (combined) patch attached for review.  I went through and looked
> again to make sure there weren't any cases of making an unaligned
> pointer to a struct and didn't see any, and I added some comments to
> _bt_restore_page().

This seems to have fallen through a crack, so I went ahead and pushed it.

regards, tom lane



Re: Cast jsonb to numeric, int, float, bool

2018-02-28 Thread Nikita Glukhov

On 01.03.2018 00:43, Darafei Praliaskouski wrote:


The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

We're using this patch and like it a lot.

We store a lot of log-like data in s3-hosted .json.gz files.
Sometimes we need to suddenly ingest them and calculate statistics and check 
the new what-ifs.
We ingest data to simple single-column table with jsonb field, and then perform 
our calculation on top of it.
Without this patch the only option to get data already parsed as numbers in 
jsonb into a calculation is via a binary-text-binary transformation.

We're relying on the patch for our custom spatial type extension and casting in 
it.
https://github.com/gojuno/lostgis/blob/master/sql/types/type_tpv.sql#L21

For postgres installations without the patch we do WITH INOUT cast stubbing,
https://github.com/gojuno/lostgis/blob/master/sql/types/__jsonb_casts.sql
- but without performance benefits of raw C processing :)

A thing this patch does not implement is cast from jsonb to bigint.
That would be useful for transforming stored osm_id OpenStreetMap object 
identifiers.
Right now we're stubbing it with jsonb::numeric::bigint cast, but the middle 
one would be nice to get rid of.

The new status of this patch is: Ready for Committer


Attached is a new version of the patch, in which I removed the duplicated code 
using a new subroutine JsonbExtractScalar(). I am not sure what is better to do 
when a JSON item has an unexpected type: throw an error or return SQL NULL. 
Also, JSON nulls could be converted to SQL NULLs.

I should note here that the expression (jb -> 'key')::datatype can be rewritten 
with the SQL/JSON function JSON_VALUE: JSON_VALUE(jb, '$.key' RETURNING datatype 
ERROR ON ERROR). But by the standard, JSON_VALUE tries to cast string JSON items 
to the specified datatype too, so JSON_VALUE('{"key": "123"}'::jsonb, '$.key' 
RETURNING int ERROR ON ERROR) does not throw an error but returns 123.

We already have the jsonpath operators @#, @*, so it might be very useful if our 
jsonb casts were equivalent to their SQL/JSON analogues. For example, 
(jb @# '$.key')::datatype could be equivalent to JSON_VALUE(jb, '$.key' 
RETURNING datatype ERROR ON ERROR) or JSON_VALUE(jb, '$.key' RETURNING 
datatype [NULL ON ERROR]). But if we want to have the NULL ON ERROR behavior 
(which is the default in SQL/JSON) in casts, then the casts should not throw any 
errors.
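
For reference, a usage sketch of the casts as currently proposed (the exact
error-vs-NULL behaviour is subject to the question above):

SELECT ('{"n": 42}'::jsonb -> 'n')::int;        -- 42
SELECT ('{"x": 1.5}'::jsonb -> 'x')::numeric;   -- 1.5
SELECT ('{"x": 1.5}'::jsonb -> 'x')::float8;    -- 1.5
SELECT ('{"b": true}'::jsonb -> 'b')::bool;     -- t
SELECT ('{"s": "123"}'::jsonb -> 's')::int;     -- ERROR:  jsonb value must be numeric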


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


>From fd71e22f087af2cd14c4a9101c96efb56888ab72 Mon Sep 17 00:00:00 2001
From: Anastasia 
Date: Thu, 1 Mar 2018 01:55:31 +0300
Subject: [PATCH] Cast jsonbNumeric to numeric, int4 and float8. Cast jsonbBool
 to bool.

---
 src/backend/utils/adt/jsonb.c | 95 +++
 src/include/catalog/pg_cast.h |  5 +++
 src/include/catalog/pg_proc.h |  9 
 3 files changed, 109 insertions(+)

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 0f70180..fb907f2 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1845,3 +1845,98 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(out);
 }
+
+
+/*
+ * Extract scalar value from raw-scalar pseudo-array jsonb.
+ */
+static JsonbValue *
+JsonbExtractScalar(JsonbContainer *jbc, JsonbValue *res)
+{
+	JsonbIterator *it;
+	JsonbIteratorToken tok PG_USED_FOR_ASSERTS_ONLY;
+	JsonbValue	tmp;
+
+	if (!JsonContainerIsArray(jbc) || !JsonContainerIsScalar(jbc))
+		return NULL;
+
+	/*
+	 * A root scalar is stored as an array of one element, so we get the
+	 * array and then its first (and only) member.
+	 */
+	it = JsonbIteratorInit(jbc);
+
+	tok = JsonbIteratorNext(&it, &tmp, true);
+	Assert(tok == WJB_BEGIN_ARRAY);
+	Assert(tmp.val.array.nElems == 1 && tmp.val.array.rawScalar);
+
+	tok = JsonbIteratorNext(&it, res, true);
+	Assert (tok == WJB_ELEM);
+	Assert(IsAJsonbScalar(res));
+
+	tok = JsonbIteratorNext(&it, &tmp, true);
+	Assert (tok == WJB_END_ARRAY);
+
+	tok = JsonbIteratorNext(&it, &tmp, true);
+	Assert(tok == WJB_DONE);
+
+	return res;
+}
+
+Datum
+jsonb_numeric(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbValue	v;
+
+	if (!JsonbExtractScalar(&in->root, &v) || v.type != jbvNumeric)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("jsonb value must be numeric")));
+
+	PG_RETURN_NUMERIC(v.val.numeric);
+}
+
+Datum
+jsonb_int4(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbValue  v;
+
+	if (!JsonbExtractScalar(&in->root, &v) || v.type != jbvNumeric)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("jsonb value must be numeric")));
+
+	PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4,
+  NumericGetDatum(v.val.numeric))));
+}
+
+Datum

Re: Missing comment edit

2018-02-28 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> I happened to find that the comment on formdesc is missing
> pg_subscription. Please find the attached patch (I'm sure:) to
> fix that.

Hmm ... certainly, that comment is now wrong, but I'm kind of inclined
to just remove it, because it'll certainly be wrong again in future.
It's not telling you anything you can't find out with a trivial search
in the same file, so is it worth the maintenance overhead?

regards, tom lane



unused includes in test_decoding

2018-02-28 Thread Euler Taveira
Hi,

This is a cosmetic patch that removes some unused includes in
test_decoding. It seems to have been like this since day 1 (9.4).


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From c50fa4a67276d423ba92cf85578f38533d1b68e0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 28 Feb 2018 23:44:51 +
Subject: [PATCH] Remove some unused includes

Those includes are not used by test_decoding.
---
 contrib/test_decoding/test_decoding.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index 0f18afa..f0b37f6 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -12,25 +12,15 @@
  */
 #include "postgres.h"
 
-#include "access/sysattr.h"
-
-#include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 
-#include "nodes/parsenodes.h"
-
-#include "replication/output_plugin.h"
 #include "replication/logical.h"
-#include "replication/message.h"
 #include "replication/origin.h"
 
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
-#include "utils/relcache.h"
-#include "utils/syscache.h"
-#include "utils/typcache.h"
 
 PG_MODULE_MAGIC;
 
-- 
2.7.4



Re: compress method for spgist - 2

2018-02-28 Thread Tom Lane
Alexander Korotkov  writes:
> On Thu, Jan 4, 2018 at 1:17 AM, Dagfinn Ilmari Mannsåker 
> wrote:
>> This patch added two copies of the poly_ops row to the "Built-in SP-GiST
>> Operator Classes" table in spgist.sgml.

> Thank for fixing this!  I'm sure that Teodor will push this after end of
> New Year holidays in Russia.

He didn't ... so I did.

regards, tom lane



row filtering for logical replication

2018-02-28 Thread Euler Taveira
Hi,

The attached patches add support for filtering rows in the publisher.
The output plugin will do the work if a filter was defined in CREATE
PUBLICATION command. An optional WHERE clause can be added after the
table name in the CREATE PUBLICATION such as:

CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000);

Rows that don't match the WHERE clause will not be sent to the subscribers.

Patches 0001 and 0002 are only refactors and can be applied
independently. 0003 doesn't include row filtering on initial
synchronization.

Comments?


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
From ae95eb51dd72c2e8ba278da950b478f6c6741fc0 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Tue, 27 Feb 2018 02:21:03 +
Subject: [PATCH 1/3] Refactor function create_estate_for_relation

Relation localrel is the only LogicalRepRelMapEntry structure member
that is useful for create_estate_for_relation.
---
 src/backend/replication/logical/worker.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 04985c9..6820c1a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -183,7 +183,7 @@ ensure_transaction(void)
  * This is based on similar code in copy.c
  */
 static EState *
-create_estate_for_relation(LogicalRepRelMapEntry *rel)
+create_estate_for_relation(Relation rel)
 {
 	EState	   *estate;
 	ResultRelInfo *resultRelInfo;
@@ -193,12 +193,12 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 
 	rte = makeNode(RangeTblEntry);
 	rte->rtekind = RTE_RELATION;
-	rte->relid = RelationGetRelid(rel->localrel);
-	rte->relkind = rel->localrel->rd_rel->relkind;
+	rte->relid = RelationGetRelid(rel);
+	rte->relkind = rel->rd_rel->relkind;
 	estate->es_range_table = list_make1(rte);
 
 	resultRelInfo = makeNode(ResultRelInfo);
-	InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
+	InitResultRelInfo(resultRelInfo, rel, 1, NULL, 0);
 
 	estate->es_result_relations = resultRelInfo;
 	estate->es_num_result_relations = 1;
@@ -584,7 +584,7 @@ apply_handle_insert(StringInfo s)
 	}
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel->localrel);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 		RelationGetDescr(rel->localrel));
 
@@ -688,7 +688,7 @@ apply_handle_update(StringInfo s)
 	check_relation_updatable(rel);
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel->localrel);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 		RelationGetDescr(rel->localrel));
 	localslot = ExecInitExtraTupleSlot(estate,
@@ -806,7 +806,7 @@ apply_handle_delete(StringInfo s)
 	check_relation_updatable(rel);
 
 	/* Initialize the executor state. */
-	estate = create_estate_for_relation(rel);
+	estate = create_estate_for_relation(rel->localrel);
 	remoteslot = ExecInitExtraTupleSlot(estate,
 		RelationGetDescr(rel->localrel));
 	localslot = ExecInitExtraTupleSlot(estate,
-- 
2.7.4

From 8421809e38d3e1d43b00feaac4b8bccaa6738079 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 24 Jan 2018 17:01:31 -0200
Subject: [PATCH 2/3] Rename a WHERE node

A WHERE clause will be used for row filtering in logical replication. We
already have a similar node: 'WHERE (condition here)'. Let's rename the
node to a generic name and use it for row filtering too.
---
 src/backend/parser/gram.y | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d99f2be..bf32362 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -468,7 +468,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	def_arg columnElem where_clause where_or_current_clause
 a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
 columnref in_expr having_clause func_table xmltable array_expr
-ExclusionWhereClause operator_def_arg
+OptWhereClause operator_def_arg
 %type 	rowsfrom_item rowsfrom_list opt_col_def_list
 %type  opt_ordinality
 %type 	ExclusionConstraintList ExclusionConstraintElem
@@ -3734,7 +3734,7 @@ ConstraintElem:
 	$$ = (Node *)n;
 }
 			| EXCLUDE access_method_clause '(' ExclusionConstraintList ')'
-opt_definition OptConsTableSpace ExclusionWhereClause
+opt_definition OptConsTableSpace OptWhereClause
 ConstraintAttributeSpec
 {
 	Constraint *n = makeNode(Constraint);
@@ -3831,7 +3831,7 @@ ExclusionConstraintElem: index_elem WITH any_operator
 			}
 		;
 
-ExclusionWhereClause:
+OptWhereClause:
 			WHERE '(' a_expr ')'	{ $$ = $3; }
 			| /*EMPTY*/{ $$ = 

Re: [HACKERS] pgbench randomness initialization

2018-02-28 Thread Tom Lane
Fabien COELHO  writes:
>> This is a simple patch that does what it says on the tin. I ran into
>> trouble with the pgbench TAP test *even before applying the patch*, but
>> only because I was doing a VPATH build as a user without 'write'
>> on the source tree (001_pgbench_with_server.pl tried to make pgbench
>> create log files there). Bad me. Oddly, that was the only test in the
>> whole tree to have such an issue, so here I add a pre-patch to fix that.
>> Now my review needs a review. :)

> Yep. I find the multiple chdir solution a little bit too extreme.

> ISTM that it should rather add the correct path to --log-prefix by 
> prepending $node->basedir, like the pgbench function does for -f scripts.
> See attached.

Hm ... so I tried to replicate this problem, and failed to: the log files
get made under the VPATH build directory, as desired, even without this
patch.  Am I doing something wrong, or is this platform-dependent somehow?

regards, tom lane



VPATH build from a tarball fails with some gmake versions

2018-02-28 Thread Tom Lane
I tried doing a VPATH build referencing a source directory that I'd distclean'd
but not maintainer-clean'd, which should simulate the case of a VPATH
build from a tarball.  I was quite surprised that it fell over:

...
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -g -O2 -I. -I/home/postgres/pgsql/src/bin/psql 
-I/home/postgres/pgsql/src/interfaces/libpq -I../../../src/include 
-I/home/postgres/pgsql/src/include  -D_GNU_SOURCE   -c -o sql_help.o sql_help.c
gcc: sql_help.c: No such file or directory
gcc: no input files
make[3]: *** [sql_help.o] Error 1
make[3]: Leaving directory `/home/postgres/buildroot/src/bin/psql'

gmake should be generating a path to sql_help.c, since that exists
in the corresponding source directory, but it does not.  I can reproduce
this on RHEL6 with make-3.81-23, but not on Fedora 26 with make-4.2.1-2,
and (curiously) also not on High Sierra with Apple's copy of make 3.81.
I wonder how many other people see it.

The attached patch seems to fix it, indicating that there's something
about the rules relating sql_help.c and sql_help.h that is confusing
make, such that turning sql_help.c into the primary target for the
create_help rule removes the confusion.

Comments?

regards, tom lane

diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index c8eb2f9..4482097 100644
*** a/src/bin/psql/Makefile
--- b/src/bin/psql/Makefile
*** psql: $(OBJS) | submake-libpq submake-li
*** 35,49 
  
  help.o: sql_help.h
  
! sql_help.c: sql_help.h ;
! sql_help.h: create_help.pl $(wildcard $(REFDOCDIR)/*.sgml)
  	$(PERL) $< $(REFDOCDIR) $*
  
  psqlscanslash.c: FLEXFLAGS = -Cfe -p -p
  psqlscanslash.c: FLEX_NO_BACKUP=yes
  psqlscanslash.c: FLEX_FIX_WARNING=yes
  
! distprep: sql_help.h psqlscanslash.c
  
  install: all installdirs
  	$(INSTALL_PROGRAM) psql$(X) '$(DESTDIR)$(bindir)/psql$(X)'
--- 35,49 
  
  help.o: sql_help.h
  
! sql_help.h: sql_help.c ;
! sql_help.c: create_help.pl $(wildcard $(REFDOCDIR)/*.sgml)
  	$(PERL) $< $(REFDOCDIR) $*
  
  psqlscanslash.c: FLEXFLAGS = -Cfe -p -p
  psqlscanslash.c: FLEX_NO_BACKUP=yes
  psqlscanslash.c: FLEX_FIX_WARNING=yes
  
! distprep: sql_help.c psqlscanslash.c
  
  install: all installdirs
  	$(INSTALL_PROGRAM) psql$(X) '$(DESTDIR)$(bindir)/psql$(X)'


Re: prokind column (was Re: [HACKERS] SQL procedures)

2018-02-28 Thread Peter Eisentraut
On 2/28/18 15:45, Tom Lane wrote:
> I have reviewed this patch and attach an updated version below.
> I've rebased it up to today, fixed a few minor errors, and adopted
> most of Michael's suggestions.  Also, since I remain desperately
> unhappy with putting zeroes into prorettype, I changed it to not
> do that ;-) ... now procedures have VOIDOID as their prorettype,
> and it will be substantially less painful to allow them to return
> some other scalar result in future, should we wish to.  I believe
> I've found all the places that were relying on prorettype == 0 as
> a substitute for prokind == 'p'.

I have just posted "INOUT parameters in procedures", which contains some
of those same changes.  So I think we're on the same page.  I will work
on consolidating this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



INOUT parameters in procedures

2018-02-28 Thread Peter Eisentraut
This patch set adds support for INOUT parameters to procedures.
Currently, INOUT and OUT parameters are not supported.

A top-level CALL returns the output parameters as a result row.  In
PL/pgSQL, I have added special support to pass the output back into the
variables, as one would expect.
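
For concreteness, a minimal sketch (the procedure itself is made up):

CREATE PROCEDURE double_it(INOUT x int)
LANGUAGE plpgsql
AS $$
BEGIN
    x := x * 2;
END;
$$;

CALL double_it(21);        -- at top level: returns a result row with x = 42

DO $$
DECLARE
    v int := 21;
BEGIN
    CALL double_it(v);     -- in PL/pgSQL: v contains 42 afterwards
    RAISE NOTICE 'v = %', v;
END;
$$;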

These patches apply on top of the "prokind" patch set v2.  (Tom has
submitted an updated version of that, which overlaps with some of the
changes I've made here.  I will work on consolidating that soon.)


So ... no OUT parameters, though.  I'm struggling to find a way to make
this compatible with everything else.  For functions, the OUT parameters
don't appear in the signature.  But that is not how this is specified in
the SQL standard for procedures (I think).  In PL/pgSQL, you'd expect that

CREATE PROCEDURE foo(a int, OUT b int) ...

could be called like

CALL foo(x, y);

but that would require a different way of parsing function invocation.

At the top-level, it's even more dubious.  In DB2, apparently you write

CALL foo(123, ?);

with a literal ? for the OUT parameters.

In Oracle, I've seen CALL ... INTO syntax.

Anyway, I'm leaving this out for now.  It can be worked around by using
INOUT parameters.  Future improvements would be mainly syntax/parsing
adjustments; the guts that I'm implementing here would remain valid.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 127f3716a28cceca5077786e2cb3717e36dbb426 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 27 Feb 2018 09:55:32 -0500
Subject: [PATCH v1 1/2] fixup! Add prokind column, replacing proisagg and
 proiswindow

---
 src/backend/commands/dropcmds.c |  2 +-
 src/backend/parser/parse_func.c |  6 +++---
 src/backend/utils/cache/lsyscache.c | 12 ++--
 src/include/utils/lsyscache.h   |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index fc4ce8d22a..45493abf57 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -92,7 +92,7 @@ RemoveObjects(DropStmt *stmt)
 */
if (stmt->removeType == OBJECT_FUNCTION)
{
-   if (get_func_isagg(address.objectId))
+   if (get_func_kind(address.objectId) == 
PROKIND_AGGREGATE)
ereport(ERROR,

(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("\"%s\" is an aggregate 
function",
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9dbf2c2b63..0b5145f70d 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -2078,7 +2078,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs 
*func, bool noError)
if (objtype == OBJECT_FUNCTION)
{
/* Make sure it's a function, not a procedure */
-   if (oid && get_func_rettype(oid) == InvalidOid)
+   if (oid && get_func_kind(oid) == PROKIND_PROCEDURE)
{
if (noError)
return InvalidOid;
@@ -2109,7 +2109,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs 
*func, bool noError)
}
 
/* Make sure it's a procedure */
-   if (get_func_rettype(oid) != InvalidOid)
+   if (get_func_kind(oid) != PROKIND_PROCEDURE)
{
if (noError)
return InvalidOid;
@@ -2145,7 +2145,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs 
*func, bool noError)
}
 
/* Make sure it's an aggregate */
-   if (!get_func_isagg(oid))
+   if (get_func_kind(oid) != PROKIND_AGGREGATE)
{
if (noError)
return InvalidOid;
diff --git a/src/backend/utils/cache/lsyscache.c 
b/src/backend/utils/cache/lsyscache.c
index 161470aa34..869a937d5a 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1600,20 +1600,20 @@ func_parallel(Oid funcid)
 }
 
 /*
- * get_func_isagg
- *Given procedure id, return whether the function is an aggregate.
+ * get_func_kind
+ *Given procedure id, return the function kind (prokind).
  */
-bool
-get_func_isagg(Oid funcid)
+char
+get_func_kind(Oid funcid)
 {
HeapTuple   tp;
-   boolresult;
+   charresult;
 
tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
if (!HeapTupleIsValid(tp))
elog(ERROR, "cache lookup failed for function %u", funcid);
 
-   result = ((Form_pg_proc) GETSTRUCT(tp))->prokind == PROKIND_AGGREGATE;
+   result = ((Form_pg_proc) 

Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Justin Pryzby
On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > I can see why you'd want that, but as a DBA, I don't necessarily want
> > all of that recorded, especially in a quasi-permanent way.
> 
> Huh?  You're arguing that we should make it easier for DBAs to hide
> potential causes of corruption? I fail to see when that's necessary /
> desirable? That a cluster ran with fsync=off isn't an embarassing thing,
> nor does it contain private data...

The more fine grained these are the more useful they can be:

Running with fsync=off is common advice while loading, so reporting that
"fsync=off at some point" is much less useful than reporting "having to run
recovery from an fsync=off database".

I propose there could be two bits:

 1. fsync was off at some point in history when the DB needed recovery;
 2. fsync was off during the current instance; I assume the bit would be set
whenever fsync=off was set, but could be cleared when the server was
cleanly shut down.  If the bit is set during recovery, the first bit would
be permanently set.  Not sure but if this bit is set and then fsync
re-enabled, maybe this could be cleared after any checkpoint and not just
shutdown ?

Justin



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Andres Freund
Hi,

On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote:

Unfortunately your list seems to raise the bar to a place I don't see us
going soon :(


>  - pg_control versions used on this cluster (hopefully a full list..obviously
>not going back before PG11);

That needs an arbitrary amount of space; that's not feasible for
pg_control. Needs to be <= 512 bytes.


>  - did recovery (you could use "needed recovery" instead, but then there's the
>question of how reliable that field would be);
>+ or: timestamp of most recent recovery (attempt?)

What'd that be useful for?


>  - index corruption detected (on system table?);  Note that "CREATE IF NOT
>EXIST" doesn't avoid unique key errors on system tables, so it's not useful
>to log every ERROR on system tables.

I don't think we have a easy way to diagnose this specifically enough


>  - page %u is uninitialized --- fixing

Doesn't really indicate a bug / problem.


>  - here's one I just dug up: ERROR: missing chunk number 0 for toast value 
> 10270298 in pg_toast_2619 

Hm.

>  - XID wraparound?

why?


>  - autovacuum disabled?

why?


>  - checksum failue (in system relation?)

Hm.


>  - local_preload_libraries?

Hm?


>  - started in single user mode or with system indices disabled?

why?


>  - hit assertion or PANIC ??

We certainly don't write to persistent data in those cases.


>  - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
>vacuum/analyze/cluster/reindex?)

Hm, the former I could see some use for, but it might end up being
pretty ugly locking and layering wise.

Greetings,

Andres Freund



Re: [HACKERS] MERGE SQL Statement for PG11

2018-02-28 Thread Peter Geoghegan
On Fri, Feb 9, 2018 at 6:36 AM, Robert Haas  wrote:
> Here's my $0.02: I think that new concurrency errors thrown by the
> merge code itself deserve strict scrutiny and can survive only if they
> have a compelling justification, but if the merge code happens to
> choose an action that leads to a concurrency error of its own, I think
> that deserves only mild scrutiny.
>
> On that basis, of the options I listed in
> http://postgr.es/m/ca+tgmozdl-caukhkwet7sr7sqr0-e2t91+devhqen5sfqsm...@mail.gmail.com
> I like #1 least.
>
> I also dislike #4 from that list for the reasons stated there.  For
> example, if you say WHEN MATCHED AND x.some_boolean and then WHEN
> MATCHED, you expect that every tuple that hits the latter clause will
> have that Boolean as false or null, but #4 makes that not true.
>
> I think the best options are #2 and #5 -- #2 because it's simple, and
> #5 because it's (maybe) more intuitive, albeit at the risk of
> livelock.  But I'm willing to convinced of some other option; like
> you, I'm willing to be open-minded about this.  But, as you say, we
> need a coherent, well-considered justification for the selected
> option, not just "well, this is what we implemented so you should like
> it".  The specification should define the implementation, not the
> reverse.

At first I hated option #2. I'm warming to #2 a lot now, though,
because I've come to understand the patch's approach a little better.
(Pavan and Simon should still verify that I got things right in my
mail from earlier today, though.)

It now seems like the patch throws a RC serialization error more or
less only due to concurrent deletions (rarely, it will actually be a
concurrent update that changed the join qual values of our MERGE).
We're already not throwing the error (we just move on to the next
input row from the join) when we happen to not have a WHEN NOT MATCHED
case. But why even make that distinction? Why not just go ahead and
WHEN NOT MATCHED ... INSERT when the MERGE happens to have such a
clause? The INSERT will succeed, barring any concurrent conflicting
insertion by a third session -- a hazard that has nothing to do with
RC conflict handling in particular.

Just inserting without throwing a RC serialization error is almost
equivalent to a new MVCC snapshot being acquired due to a RC conflict,
so all of this now looks okay to me. Especially because every other
MERGE implementation seems to have serious issues with doing anything
too fancy with the MERGE target table's quals within the main ON join.
I think that SQL Server actually assumes that you're using the
target's PK in a simple equi-join. All the examples look like that,
and this assumption is heavily suggested by the "Do not attempt to
improve query performance by filtering out rows in the target table in
the ON clause" weasel words from their docs, that I referenced in my
mail from earlier today.

I can get my head around all of this now only because I've come to
understand that once we've decided that a given input from the main
join is NOT MATCHED, we stick to that determination. We don't bounce
around between MATCHED and NOT MATCHED cases during an EPQ update
chain walk. We can bounce around between multiple alternative MATCHED
merge actions/WHEN cases, but that seems okay because it's still part
of essentially the same EPQ update chain walk -- no obvious livelock
hazard. It seems fairly clean to restart everything ("goto lmerge")
for each and every tuple in the update chain.

Maybe we should actually formalize that you're only supposed to do a
PK or unique index equi-join within the main ON join, though -- you
can do something fancy with the source table, but not the target
table.
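
Spelled out, a MERGE in the style that formalization would allow might look
like this (hypothetical tables); anything fancy goes into the source subquery
or the WHEN ... AND quals, while ON stays a plain equi-join on the target's PK:

MERGE INTO target t
USING (SELECT id, payload FROM source WHERE payload IS NOT NULL) s
ON t.id = s.id
WHEN MATCHED AND s.payload = 'obsolete' THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET payload = s.payload
WHEN NOT MATCHED THEN
    INSERT (id, payload) VALUES (s.id, s.payload);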

-- 
Peter Geoghegan



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Andres Freund
Hi,

On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> I can see why you'd want that, but as a DBA, I don't necessarily want
> all of that recorded, especially in a quasi-permanent way.

Huh?  You're arguing that we should make it easier for DBAs to hide
potential causes of corruption? I fail to see when that's necessary /
desirable? That a cluster ran with fsync=off isn't an embarrassing thing,
nor does it contain private data...

Greetings,

Andres Freund



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Justin Pryzby
On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

What about:

 - pg_control versions used on this cluster (hopefully a full list..obviously
   not going back before PG11);
 - did recovery (you could use "needed recovery" instead, but then there's the
   question of how reliable that field would be);
   + or: timestamp of most recent recovery (attempt?)
 - index corruption detected (on system table?);  Note that "CREATE IF NOT
   EXIST" doesn't avoid unique key errors on system tables, so it's not useful
   to log every ERROR on system tables.
 - page %u is uninitialized --- fixing
 - here's one I just dug up: ERROR: missing chunk number 0 for toast value 
10270298 in pg_toast_2619 
 - XID wraparound?
 - autovacuum disabled?
 - checksum failue (in system relation?)
 - local_preload_libraries?
 - started in single user mode or with system indices disabled?
 - hit assertion or PANIC ??
 - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
   vacuum/analyze/cluster/reindex?)

Justin



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Andres Freund
On 2018-02-28 23:13:44 +0100, Tomas Vondra wrote:
> 
> On 02/28/2018 10:43 PM, Andres Freund wrote:
> > Hi,
> > 
> > a significant number of times during investigations of bugs I wondered
> > whether running the cluster with various settings, or various tools
> > could've caused the issue at hand.  Therefore I'd like to propose adding
> > a 'tainted' field to pg_control, that contains some of the "history" of
> > the cluster. Individual bits inside that field that I can think of right
> > now are:
> > - pg_resetxlog was used non-passively on cluster
> > - ran with fsync=off
> > - ran with full_page_writes=off
> > - pg_upgrade was used
> > 
> > What do others think?

> Isn't the uncertainty usually that you know one of these things happened
> on the cluster, but you don't know if that's the actual root cause?
> That's my experience, at least.

My experience is that you get a bug report and you have no clue what
happened with the cluster. Often the "owner" doesn't either. Then you go
through various theories and end up blaming it on one of these, even
though it quite possibly never happened.

Of course this does *not* solve the issue when these actually
occurred. It's just a tool to narrow down possible causes / exclude
others.

Greetings,

Andres Freund



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Peter Eisentraut
On 2/28/18 16:43, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

I can see why you'd want that, but as a DBA, I don't necessarily want
all of that recorded, especially in a quasi-permanent way.

Maybe this could be stored in a separate location, like a small
text/log-like file in the data directory.  Then DBAs can save or keep or
remove that information as they wish.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Tomas Vondra

On 02/28/2018 10:43 PM, Andres Freund wrote:
> Hi,
> 
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
> 
> What do others think?
> 

Isn't the uncertainty usually that you know one of these things happened
on the cluster, but you don't know if that's the actual root cause?
That's my experience, at least.

That being said, I'm not strongly opposed to adding such field.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-02-28 Thread Nikita Glukhov

Attached 3rd version of kNN for SP-GiST.


On 09.03.2017 16:52, Alexander Korotkov wrote:


Hi, Nikita!

I took a look at this patchset.  My first notes follow.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for 
its name.  GiST supports ordering only by float4 and float8 
distances.  SP-GiST also goes that way. But in the future, access 
methods could support distances of other types.
Thus, I suggest to rename function to 
 index_store_float8_orderby_distances().

This function was renamed.


* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't apply cleanly.

Patches were rebased onto current master.



*** AUTHORS
*** 371,373 
--- 376,379 

      Teodor Sigaev >
      Oleg Bartunov >
+     Vlad Sterzhanov >


I don't think it's worth mentioning here the GSoC student who made just a
dirty prototype and abandoned this work right after the final evaluation.

The student's mention has been removed.

More generally, you switched SP-GiST from scan stack to pairing heap.  
We should check if it introduces overhead to non-KNN scans.

I decided to bring back scan stack for unordered scans.


Also some benchmarks comparing KNN SP-GiST vs KNN GiST would be nice now.


* 
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch


I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 
'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration]

return get_float8_nan();
 ^
1 warning generated.

Fixed.


* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable, 
but completely separate feature.  And it should be posted and 
discussed separately.
The compress method for SP-GiST with poly_ops has already been committed, 
so I left only the ordering operators for polygons in the last (6th) patch.
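
For reference, the kind of query this is meant to speed up (a sketch, assuming
the ordering operators from the 5th patch are installed for quad_point_ops):

CREATE TABLE pts (p point);
CREATE INDEX pts_idx ON pts USING spgist (p);

-- k nearest neighbours of the origin, now able to use the SP-GiST index
SELECT * FROM pts ORDER BY p <-> point '(0,0)' LIMIT 10;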


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index b30b931..da76a76 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
-#include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "lib/pairingheap.h"
@@ -540,7 +540,6 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
-	int			i;
 
 	if (scan->xs_hitup)
 	{
@@ -561,45 +560,10 @@ getNextNearest(IndexScanDesc scan)
 			/* found a heap item at currently minimal distance */
 			scan->xs_ctup.t_self = item->data.heap.heapPtr;
 			scan->xs_recheck = item->data.heap.recheck;
-			scan->xs_recheckorderby = item->data.heap.recheckDistances;
-			for (i = 0; i < scan->numberOfOrderBys; i++)
-			{
-if (so->orderByTypes[i] == FLOAT8OID)
-{
-#ifndef USE_FLOAT8_BYVAL
-	/* must free any old value to avoid memory leakage */
-	if (!scan->xs_orderbynulls[i])
-		

Re: Two small patches for the isolationtester lexer

2018-02-28 Thread Tom Lane
Daniel Gustafsson  writes:
> On 22 Feb 2018, at 05:10, Tom Lane  wrote:
>> Actually, looking closer, this would also trigger on '#' used inside a
>> SQL literal, which seems to move the problem cases into the "pretty
>> likely" category instead of the "far-fetched" one.  So I'd only be OK
>> with it if we made the lexer smart enough to distinguish inside-a-SQL-
>> literal from not.  That might be a good thing anyway, since it would
>> allow us to not choke on literals containing '}', but it'd be a great
>> deal more work.  You might be able to steal code from the psql lexer
>> though.

> I agree, patch 0002 was broken and the correct fix is a much bigger project -
> one too big for me to tackle right now (but hopefully at some point in the 
> near
> future).  Thanks for the review of it though!

OK.  I'm going to mark this commitfest entry closed, since the other patch
is in and this one needs a good bit more work.  Please start a new thread,
or at least a new CF entry, if you do more work in this area.

regards, tom lane



Re: Two small patches for the isolationtester lexer

2018-02-28 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 22 Feb 2018, at 05:12, Tom Lane  wrote:
>> Another idea is just to teach addlitchar to realloc the buffer bigger
>> when necessary.

> I think this is the best approach for the task, the attached patch changes the
> static allocation to instead realloc when required.  Having an upper limit on
> the allocation seemed like a good idea to me, but perhaps it’s overthinking 
> and
> complicating things for no good reason.

Yeah, it doesn't seem to me that we need any fixed limit on the length,
so I deleted that part of the logic, and pushed it with one or two other
cosmetic adjustments.

regards, tom lane



Re: Cast jsonb to numeric, int, float, bool

2018-02-28 Thread Darafei Praliaskouski
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

We're using this patch and like it a lot.

We store a lot of log-like data in s3-hosted .json.gz files.
Sometimes we need to suddenly ingest them and calculate statistics and check 
the new what-ifs.
We ingest data to simple single-column table with jsonb field, and then perform 
our calculation on top of it.
Without this patch the only option to get data already parsed as numbers in 
jsonb into a calculation is via a binary-text-binary transformation.

We're relying on the patch for our custom spatial type extension and casting in 
it.
https://github.com/gojuno/lostgis/blob/master/sql/types/type_tpv.sql#L21 

For postgres installations without the patch we do WITH INOUT cast stubbing,
https://github.com/gojuno/lostgis/blob/master/sql/types/__jsonb_casts.sql
- but without performance benefits of raw C processing :)

A thing this patch does not implement is cast from jsonb to bigint.
That would be useful for transforming stored osm_id OpenStreetMap object 
identifiers.
Right now we're stubbing it with jsonb::numeric::bigint cast, but the middle 
one would be nice to get rid of.
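
That is, for now something along these lines, where the intermediate numeric
cast is the step we would like to drop (table and key names are illustrative):

SELECT (obj -> 'osm_id')::numeric::bigint FROM raw_objects;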

The new status of this patch is: Ready for Committer


RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Andres Freund
Hi,

a significant number of times during investigations of bugs I wondered
whether running the cluster with various settings, or various tools
could've caused the issue at hand.  Therefore I'd like to propose adding
a 'tainted' field to pg_control, that contains some of the "history" of
the cluster. Individual bits inside that field that I can think of right
now are:
- pg_resetxlog was used non-passively on cluster
- ran with fsync=off
- ran with full_page_writes=off
- pg_upgrade was used

What do others think?

Perhaps we should rename 'tainted' to something less "blame-y" sounding,
but I can't quite think of something more descriptive than
'cluster_history'.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-02-28 Thread Tomas Vondra
On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
> 

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

  
   -o relfilenode
   

 Only validate checksums in the relation with specified relfilenode.
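
(For what it's worth, the value the option expects is the one that matches the
on-disk file name; it can be looked up with, e.g.

SELECT pg_relation_filenode('some_table');
-- or
SELECT relname, relfilenode FROM pg_class WHERE relname = 'some_table';

and it changes after relation rewrites such as TRUNCATE, CLUSTER or VACUUM
FULL, so it need not match the table's OID.)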

   
  


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2018-02-28 Thread Peter Geoghegan
On Wed, Feb 28, 2018 at 8:53 AM, Robert Haas  wrote:
> On Tue, Feb 27, 2018 at 5:07 PM, Peter Geoghegan  wrote:
>> I now feel like Simon's suggestion of throwing an error in corner
>> cases isn't so bad. It still seems like we could do better, but the
>> more I think about it, the less that seems like a cop-out. My reasons
>> are:
>
> I still think we really ought to try not to add a new class of error.

I do too. However, it's only fair to acknowledge that the lack of any
strong counterproposal after a month or so tells us something.

>> * We can all agree that *not* raising an error in the specific way
>> Simon proposes is possible, somehow or other. We also all agree that
>> avoiding the broader category of RC errors can only be taken so far
>> (e.g. in any event duplicate violations errors are entirely possible,
>> in RC mode, when a MERGE inserts a row). So this is a question of what
>> exact middle ground to take. Neither of the two extremes (throwing an
>> error on the first sign of a RC conflict, and magically preventing
>> concurrency anomalies) are actually on the table.
>
> Just because there's no certainty about which behavior is best doesn't
> mean that none of them are better than throwing an error.

Bear in mind that the patch doesn't throw an error at the first sign
of trouble. It throws an error after doing an EPQ style walk, which
individually verifies the extra "WHEN ... AND" quals for every tuple
in the update chain (ExecUpdate()/ExecDelete() return after first EPQ
tuple check for MERGE, so that an ExecMerge() caller can do that extra
part with special EPQ slot). The controversial serialization error
only comes when the original basic assessment of MATCHED needs to
change, without regard to extra "WHEN ... AND" quals (and only when
the MERGE statement was written in such a way that that matters -- it
must have a WHEN NOT MATCHED clause, too).

AFAICT, the patch will only throw an error when the join quals no
longer pass -- not when the extra "WHEN ... AND" quals no longer pass.
That would be far worse, IMV. ExecMerge() will retry until it reaches
the end of the update chain, possibly changing its mind about which
particular WHEN case applies repeatedly, going back and forth between
mergeActions (WHEN cases) as the update chain in walked. The patch can
do a lot to roll with conflicts before it gives up. Conflicts are
generally caused by concurrent DELETEs, so you could say that this
offers a kind of symmetry with concurrent INSERTs causing duplicate
violation errors. (Concurrent UPDATEs that change the join qual values
could provoke the error too, but presumably we're joining on the PK in
most cases, so that seems much less likely than a simple DELETE.)

Bear in mind that both the SQL Server and Oracle implementations have
many significant restrictions/caveats around doing something cute with
the main join quals. It's not surprising that we'd have something like
that, too. According to Rahila, Oracle doesn't allow MERGE to update
the columns in the join qual at all, and only allows a single WHEN
MATCHED case (a DELETE can only come after an UPDATE in Oracle, oddly
enough). SQL Server's docs have a warning that states: "Do not attempt
to improve query performance by filtering out rows in the target table
in the ON clause, such as by specifying AND NOT target_table.column_x
= value. Doing so may return unexpected and incorrect results.". What
does that even mean!?
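
Presumably something like the difference between these two (hypothetical
tables):

-- filtering the target inside ON, the pattern their docs warn against: a
-- matched row that fails the extra condition silently becomes NOT MATCHED
-- and is then a candidate for INSERT again
MERGE INTO target t
USING source s ON t.id = s.id AND t.column_x <> 42
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);

-- the shape they recommend: keep ON to the key join, move the filter into WHEN
MERGE INTO target t
USING source s ON t.id = s.id
WHEN MATCHED AND t.column_x <> 42 THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);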

I am slightly concerned that the patch may still have livelock hazards
in its approach to RC conflict handling. I haven't studied that aspect
in enough detail, though. Since we always make forward progress by
following an update chain, any problem here is probably fixable.

-- 
Peter Geoghegan



SET TRANSACTION in PL/pgSQL

2018-02-28 Thread Peter Eisentraut
Currently, you can't run SET TRANSACTION in PL/pgSQL.  A normal SQL
command run inside PL/pgSQL acquires a snapshot, but SET
TRANSACTION does not work anymore if a snapshot is set.  Here is a patch
to work around that by handling this command separately.  I have coded
this here bypassing SPI entirely.  But there is some overlap with the
no_snapshot option in the patch "PL/pgSQL nested CALL with
transactions", so maybe a better solution will arise.  This will also
inform how to tackle this in other PLs.
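
For illustration, here is a minimal sketch of the behavior the patch is meant
to enable (it mirrors the regression test included in the patch below):

DO LANGUAGE plpgsql $$
BEGIN
    PERFORM 1;         -- a normal statement, acquires a snapshot
    COMMIT;            -- end the transaction from inside PL/pgSQL
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- possible with the patch
    RAISE INFO '%', current_setting('transaction_isolation');
    COMMIT;
END;
$$;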

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8b4d1ca1a90c18eb3a797b3938e804478022543a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 26 Feb 2018 11:59:06 -0500
Subject: [PATCH v1] PL/pgSQL: Add support for SET TRANSACTION

A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET
TRANSACTION does not work anymore if a snapshot is set.  So we have to
handle this separately.
---
 .../plpgsql/src/expected/plpgsql_transaction.out   | 20 +
 src/pl/plpgsql/src/pl_exec.c   | 49 ++
 src/pl/plpgsql/src/pl_funcs.c  | 23 ++
 src/pl/plpgsql/src/pl_gram.y   | 32 +-
 src/pl/plpgsql/src/pl_scanner.c|  2 +
 src/pl/plpgsql/src/plpgsql.h   | 13 +-
 src/pl/plpgsql/src/sql/plpgsql_transaction.sql | 19 +
 7 files changed, 156 insertions(+), 2 deletions(-)

diff --git a/src/pl/plpgsql/src/expected/plpgsql_transaction.out 
b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
index 8ec22c646c..5f569dc64a 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_transaction.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
@@ -236,6 +236,26 @@ SELECT * FROM test3;
  1
 (1 row)
 
+-- SET TRANSACTION
+DO LANGUAGE plpgsql $$
+BEGIN
+PERFORM 1;
+RAISE INFO '%', current_setting('transaction_isolation');
+COMMIT;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+PERFORM 1;
+RAISE INFO '%', current_setting('transaction_isolation');
+COMMIT;
+SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+RESET TRANSACTION ISOLATION LEVEL;
+PERFORM 1;
+RAISE INFO '%', current_setting('transaction_isolation');
+COMMIT;
+END;
+$$;
+INFO:  read committed
+INFO:  repeatable read
+INFO:  read committed
 DROP TABLE test1;
 DROP TABLE test2;
 DROP TABLE test3;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 4ff87e0879..9a25ee9ad9 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -32,6 +32,7 @@
 #include "parser/scansup.h"
 #include "storage/proc.h"
 #include "tcop/tcopprot.h"
+#include "tcop/utility.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"
@@ -299,6 +300,8 @@ static int exec_stmt_commit(PLpgSQL_execstate *estate,
 PLpgSQL_stmt_commit *stmt);
 static int exec_stmt_rollback(PLpgSQL_execstate *estate,
   PLpgSQL_stmt_rollback *stmt);
+static int exec_stmt_set(PLpgSQL_execstate *estate,
+  PLpgSQL_stmt_set *stmt);
 
 static void plpgsql_estate_setup(PLpgSQL_execstate *estate,
 PLpgSQL_function *func,
@@ -1997,6 +2000,10 @@ exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
rc = exec_stmt_rollback(estate, (PLpgSQL_stmt_rollback 
*) stmt);
break;
 
+   case PLPGSQL_STMT_SET:
+   rc = exec_stmt_set(estate, (PLpgSQL_stmt_set *) stmt);
+   break;
+
default:
estate->err_stmt = save_estmt;
elog(ERROR, "unrecognized cmd_type: %d", 
stmt->cmd_type);
@@ -4566,6 +4573,48 @@ exec_stmt_rollback(PLpgSQL_execstate *estate, 
PLpgSQL_stmt_rollback *stmt)
return PLPGSQL_RC_OK;
 }
 
+/*
+ * exec_stmt_set
+ *
+ * Execute SET/RESET statement.
+ *
+ * We just parse and execute the statement normally, but we have to do it
+ * without going through the usual SPI functions, because we don't want to set
+ * a snapshot for things like SET TRANSACTION.
+ *
+ * XXX It might be nice to use SPI_execute(), but in order to not get a
+ * snapshot, we have to pass read_only = true, which in turn prevents SET
+ * commands.
+ */
+static int
+exec_stmt_set(PLpgSQL_execstate *estate, PLpgSQL_stmt_set *stmt)
+{
+   List   *raw_parsetree_list;
+   RawStmt*parsetree;
+   List   *stmt_list;
+   PlannedStmt *pstmt;
+
+   raw_parsetree_list = pg_parse_query(stmt->expr->query);
+   Assert(list_length(raw_parsetree_list) == 1);
+   parsetree = linitial_node(RawStmt, raw_parsetree_list);
+
+   stmt_list = pg_analyze_and_rewrite(parsetree, stmt->expr->query, NULL, 
0, NULL);
+   stmt_list = 

2018-03 Commitfest starts tomorrow

2018-02-28 Thread David Steele
Hackers!

I'll be starting the Commitfest at midnight AoE (07:00 ET, 13:00 CET) so
please get your patches in before then.

Please remember that if you drop a new and large (or invasive) patch
into this CF, it may be moved to the next CF.

This last CF for PG11 should generally be restricted to patches that
have gone through review in prior CFs or are modest in scope.

Thanks,
-- 
-David
da...@pgmasters.net



PATCH: Unlogged tables re-initialization tests

2018-02-28 Thread David Steele
These tests were originally included in the exclude unlogged tables
patch [1] to provide coverage for the refactoring of reinit.c.

After review we found a simpler implementation that did not require the
reinit.c refactor so I dropped the tests from that patch.

I did not include the refactor here because it's just noise now, but I
think the tests still have value.

I will add to the 2018-03 CF.

-- 
-David
da...@pgmasters.net

[1]https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net
diff --git a/src/test/recovery/t/014_unlogged_reinit.pl 
b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 00..ac2e251158
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,117 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+   # Create unlogged tables in a tablespace
+   $tablespaceDir = TestLib::tempdir . "/ts1";
+
+   mkdir($tablespaceDir)
+   or die "unable to mkdir \"$tablespaceDir\"";
+
+   $node->safe_psql('postgres',
+   "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+   $node->safe_psql('postgres',
+   'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+   $ts1UnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('ts1_unlogged')});
+
+   # Make sure main and init forks exist
+   ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+   ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+}
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+   'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+   'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+   or die "unable to remove \"${baseUnloggedPath}\"";
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+   # Write forks to test that they are removed by recovery
+   $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+   'touch vm fork in tablespace');
+   $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+   'touch fsm fork in tablespace');
+
+   # Remove main fork to test that it is recopied from init
+   unlink("$pgdata/${ts1UnloggedPath}")
+   or die "unable to remove \"${ts1UnloggedPath}\"";
+}
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 4 if ($windows_os);
+
+   # Check unlogged table in tablespace
+   ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+   ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+   ok(!-f "$pgdata/${ts1UnloggedPath}_vm", 'vm fork not in tablespace');
+   ok(!-f "$pgdata/${ts1UnloggedPath}_fsm", 'fsm fork not in tablespace');
+
+   # Drop unlogged table
+   $node->safe_psql('postgres', 'DROP TABLE ts1_unlogged');
+
+   # Drop tablespace
+   $node->safe_psql('postgres', 'DROP TABLESPACE ts1');
+   rmdir($tablespaceDir)
+   or die "unable to rmdir \"$tablespaceDir\"";
+}


PL/pgSQL nested CALL with transactions

2018-02-28 Thread Peter Eisentraut
So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed.  This patch fixes
that by handling CALL and DO specially in PL/pgSQL, passing the
atomic/nonatomic execution context through and doing the required
management around transaction boundaries.

This requires a new flag in SPI to run SPI_execute* without snapshot
management.  This is currently poked directly into the plan struct,
requiring access to spi_priv.h.  This could use some refinement ideas.

Other PLs are currently not supported, but they could be extended in the
future using the principles being worked out here.
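
As a rough illustration of what this is meant to allow (a sketch, not taken
from the patch; the table and procedure names are made up):

CREATE TABLE test_tx (a int);

CREATE PROCEDURE inner_proc() LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO test_tx VALUES (1);
    COMMIT;               -- transaction control inside a nested invocation
END;
$$;

CREATE PROCEDURE outer_proc() LANGUAGE plpgsql AS $$
BEGIN
    CALL inner_proc();    -- so far this context did not allow COMMIT/ROLLBACK
END;
$$;

CALL outer_proc();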

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fa13c429d1b401643af4fa87f553784edaa1515a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 28 Feb 2018 14:50:11 -0500
Subject: [PATCH v1] PL/pgSQL: Nested CALL with transactions

So far, a nested CALL or DO in PL/pgSQL would not establish a context
where transaction control statements were allowed.  This fixes that by
handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic
execution context through and doing the required management around
transaction boundaries.

XXX This requires a new flag in SPI to run SPI_execute* without snapshot
management.  This is currently poked directly into the plan struct,
requiring access to spi_priv.h.  This could use some refinement ideas.
---
 doc/src/sgml/plpgsql.sgml  | 13 +++-
 src/backend/executor/spi.c | 12 +--
 src/backend/tcop/utility.c |  4 +-
 src/include/executor/spi_priv.h|  1 +
 src/include/tcop/utility.h |  1 +
 .../plpgsql/src/expected/plpgsql_transaction.out   | 73 +-
 src/pl/plpgsql/src/pl_exec.c   | 86 --
 src/pl/plpgsql/src/pl_funcs.c  | 25 +++
 src/pl/plpgsql/src/pl_gram.y   | 37 +-
 src/pl/plpgsql/src/pl_handler.c|  4 +-
 src/pl/plpgsql/src/pl_scanner.c|  2 +
 src/pl/plpgsql/src/plpgsql.h   | 16 +++-
 src/pl/plpgsql/src/sql/plpgsql_transaction.sql | 42 +--
 13 files changed, 273 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index c1e3c6a19d..04aa1759cd 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3447,9 +3447,9 @@ Looping Through a Cursor's Result
Transaction Management
 

-In procedures invoked by the CALL command from the top
-level as well as in anonymous code blocks (DO command)
-called from the top level, it is possible to end transactions using the
+In procedures invoked by the CALL command
+as well as in anonymous code blocks (DO command),
+it is possible to end transactions using the
 commands COMMIT and ROLLBACK.  A new
 transaction is started automatically after a transaction is ended using
 these commands, so there is no separate START
@@ -3479,6 +3479,13 @@ Transaction Management
 

 
+   
+Transaction control is only possible in CALL or
+DO invocations from the top level or nested
+CALL or DO invocations without any
+other intervening command.
+   
+

 A transaction cannot be ended inside a loop over a query result, nor
 inside a block with exception handlers.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 9fc4431b80..174ec6cd5a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2042,7 +2042,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 * In the first two cases, we can just push the snap onto the stack once
 * for the whole plan list.
 */
-   if (snapshot != InvalidSnapshot)
+   if (snapshot != InvalidSnapshot && !plan->no_snapshots)
{
if (read_only)
{
@@ -2121,7 +2121,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 * In the default non-read-only case, get a new snapshot, 
replacing
 * any that we pushed in a previous cycle.
 */
-   if (snapshot == InvalidSnapshot && !read_only)
+   if (snapshot == InvalidSnapshot && !read_only && 
!plan->no_snapshots)
{
if (pushed_active_snap)
PopActiveSnapshot();
@@ -2172,7 +2172,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 * If not read-only mode, advance the command counter 
before each
 * command and update the snapshot.
 */
-   if (!read_only)
+   if (!read_only && !plan->no_snapshots)
{
 

Re: Online enabling of checksums

2018-02-28 Thread Alvaro Herrera
I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name).  I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: handling of heap rewrites in logical decoding

2018-02-28 Thread Peter Eisentraut
On 2/25/18 07:27, Craig Ringer wrote:
> I'm pretty sure we _will_ want the ability to decode and stream rewrite
> contents for non-IMMUTABLE table rewrites.
> 
> Filtering out by default is OK by me, but I think making it impossible
> to decode is a mistake. So I'm all for the oid option and had written a
> suggestion for it before I saw you already mentioned it  in the next
> part of your mail.
> 
> The main issue with filtering out rewrites by default is that I don't
> see how, if we default to ignore/filter-out, plugins would indicate
> "actually I want to choose about this one" or "I understand table
> rewrites". I'd prefer not to add another change callback.

Second version, which uses an OID.  I added another field to the output
plugin options (next to the output_type), to indicate whether the plugin
wants to receive these changes.  I added some test cases to
test_decoding to show how it works either way.  It's a bit messy to pass
this setting through to the ReorderBuffer; maybe there is a better idea.
 But the result seems pretty useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7463a354906744ec78d1537be6a3993f632f8a3c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 24 Feb 2018 17:03:57 -0500
Subject: [PATCH v2] Handle heap rewrites even better in logical decoding

Logical decoding should not publish anything about tables created as
part of a heap rewrite during DDL.  Those tables don't exist externally,
so consumers of logical decoding cannot do anything sensible with that
information.  In ab28feae2bd3d4629bd73ae3548e671c57d785f0, we worked
around this for built-in logical replication, but that was a hack.

This is a more proper fix: We mark such transient heaps using the new
field pg_class.relwrite, linking to the original relation OID.  By
default, we ignore them in logical decoding before they get to the
output plugin.  Optionally, a plugin can register their interest in
getting such changes, if they handle DDL specially, in which case the new
field will help them get information about the actual table.
---
 .../test_decoding/expected/concurrent_ddl_dml.out  | 82 --
 contrib/test_decoding/expected/ddl.out | 20 +++---
 .../test_decoding/specs/concurrent_ddl_dml.spec|  2 +-
 contrib/test_decoding/sql/ddl.sql  |  5 +-
 contrib/test_decoding/test_decoding.c  | 14 
 doc/src/sgml/catalogs.sgml | 12 
 doc/src/sgml/logicaldecoding.sgml  |  5 ++
 src/backend/bootstrap/bootparse.y  |  1 +
 src/backend/catalog/heap.c |  4 ++
 src/backend/catalog/toasting.c |  1 +
 src/backend/commands/cluster.c |  1 +
 src/backend/commands/tablecmds.c   |  1 +
 src/backend/replication/logical/logical.c  |  4 ++
 src/backend/replication/logical/reorderbuffer.c|  7 ++
 src/backend/replication/pgoutput/pgoutput.c| 26 ---
 src/include/catalog/heap.h |  1 +
 src/include/catalog/pg_class.h | 22 +++---
 src/include/replication/output_plugin.h|  1 +
 src/include/replication/reorderbuffer.h|  2 +
 19 files changed, 109 insertions(+), 102 deletions(-)

diff --git a/contrib/test_decoding/expected/concurrent_ddl_dml.out 
b/contrib/test_decoding/expected/concurrent_ddl_dml.out
index a15bfa292e..1f9e7661b7 100644
--- a/contrib/test_decoding/expected/concurrent_ddl_dml.out
+++ b/contrib/test_decoding/expected/concurrent_ddl_dml.out
@@ -10,7 +10,7 @@ step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 
1);
 step s2_alter_tbl2_float: ALTER TABLE tbl2 ALTER COLUMN val2 TYPE float;
 step s1_insert_tbl2: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
 step s1_commit: COMMIT;
-step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data 
FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 
'0', 'skip-empty-xacts', '1');
+step s2_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
 data   
 
 BEGIN  
@@ -32,16 +32,13 @@ step s2_alter_tbl1_float: ALTER TABLE tbl1 ALTER COLUMN 
val2 TYPE float; 
-step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data 
FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 
'0', 'skip-empty-xacts', '1');
+step s2_get_changes: SELECT data FROM 
pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 
'skip-empty-xacts', '1');
 data   
 
 BEGIN  
 table public.tbl1: INSERT: val1[integer]:1 val2[integer]:1
 table public.tbl2: INSERT: val1[integer]:1 val2[integer]:1
 COMMIT 
-BEGIN  
-table public.pg_temp: INSERT: val1[integer]:1 val2[double precision]:1

Re: [PATCH] Verify Checksums during Basebackups

2018-02-28 Thread David Steele
On 2/28/18 1:08 PM, Michael Banck wrote:
> 
> The attached small patch verifies checksums (in case they are enabled)
> during a basebackup. The rationale is that we are reading every block in
> this case anyway, so this is a good opportunity to check them as well.
> Other and complementary ways of checking the checksums are possible of
> course, like the offline checking tool that Magnus just submitted.

+1.  I've done some work in this area so I have signed up to review.

-- 
-David
da...@pgmasters.net



Re: [HACKERS] Cast jsonb to numeric, int, float, bool

2018-02-28 Thread Anastasia Lubennikova

01.02.2017 17:41, Anastasia Lubennikova:
Now the simplest way to extract booleans and numbers from json/jsonb is to
cast it to text and then cast to the appropriate type:

postgres=# select 'true'::jsonb::text::bool;
 bool
--
 t
postgres=# select '1.0'::jsonb::text::numeric;
 numeric
-
 1.0


This patch implements direct casts from jsonb numeric (jbvNumeric) to
numeric, int4 and float8, and from jsonb bool (jbvBool) to bool.

postgres=# select 'true'::jsonb::bool;
 bool
--
 t

postgres=# select '1.0'::jsonb::numeric;
 numeric
-
 1.0


Waiting for your feedback.
If you find it useful, I can also add support of json and other types, 
such as smallint and bigint.





I totally forgot about this thread, but it is a small but useful feature.
Maybe it's time to dust it off.

This patch was made at the request of one of our clients,
and though they already use a custom build, I think it would be better to
have these casts in core.


The patch is attached.
There are no tests yet, since I'm not really sure what set of types
we want to support.

Now the patch provides jsonb to numeric, int4, float8 and bool.
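
For example, with the pg_cast entries from the (truncated) patch in place, the
intended behavior is roughly the following sketch (the error text is the
errmsg used in the patch):

SELECT '42'::jsonb::int4;      -- 42
SELECT '1.5'::jsonb::float8;   -- 1.5
SELECT 'true'::jsonb::bool;    -- t
SELECT '{"a": 1}'::jsonb::numeric;
ERROR:  key value must be json numeric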

Also I have some doubts about castcontext. But 'explicit' seems to be
the correct one here.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 03b57ad50b9fe5bc785958bcf16badb58971d489
Author: Anastasia 
Date:   Wed Feb 28 21:14:31 2018 +0300

Cast jsonbNumeric to numeric, int4 and float8.
Cast jsonbBool to bool

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 0f70180..3678dbe 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1845,3 +1845,129 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(out);
 }
+
+Datum
+jsonb_numeric(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_NUMERIC(v.val.numeric);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_int4(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4,
+							NumericGetDatum(v.val.numeric))));
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_float8(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_FLOAT8(DatumGetFloat8(DirectFunctionCall1(numeric_float8,
+							NumericGetDatum(v.val.numeric))));
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_bool(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB_P(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvBool)
+			PG_RETURN_BOOL(v.val.boolean);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json boolean")));
+}
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index b447818..ea765df 100644
--- 

Re: Server won't start with fallback setting by initdb.

2018-02-28 Thread Robert Haas
On Fri, Feb 9, 2018 at 3:08 AM, Kyotaro HORIGUCHI
 wrote:
> I'm not sure such a case happens in the real world nowadays,
> initdb uses the fallback value of max_connections=10. But it is
> out of favor of server since it is not larger than
> max_wal_senders(10).
>
>> postgres: max_wal_senders must be less than max_connections
>
> I think that we can safely increase the fallback value to 20 with
> which regtests are known not to fail. I believe that is
> preferable to explicitly reducing max_wal_senders in the
> generated config file. I confirmed that regtest won't fail with
> the value. (Except with permanent failure of dynamic shared
> memory)

I propose an alternative fix: let's instead change the code like this:

if (max_wal_senders > MaxConnections)
{
    write_stderr("%s: max_wal_senders may not be more than max_connections\n",
                 progname);
    ExitPostmaster(1);
}

That way, the behavior would match the documentation, which says:

"WAL sender processes count towards the total number of connections,
so the parameter cannot be set higher than max_connections."

Alternatively, we could change the documentation to match the code and
then do this.  But it's not good that the documentation and the code
don't agree on whether equal values are acceptable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[PATCH] Verify Checksums during Basebackups

2018-02-28 Thread Michael Banck
Hi,

some installations have data which is only rarely read, and if they are
so large that dumps are not routinely taken, data corruption would only
be detected with some large delay even with checksums enabled.

The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

It probably makes sense to use the same approach for determining the
segment numbers as Magnus did in his patch, or refactor that out in a
utility function, but I'm sick right now so wanted to submit this for
v11 first.

I did some light benchmarking and it seems that the performance
degradation is minimal, but this could well be platform or
architecture-dependent. Right now, the checksums are always checked but
maybe this could be made optional, probably by extending the replication
protocol.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..841471cfce 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -30,6 +31,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1199,10 +1202,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char		page[BLCKSZ];
 	size_t		pad;
+	PageHeader  phdr;
+	BlockNumber segno = 0;
+	bool		verify_checksum;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1214,10 +1225,54 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
  errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/').
+	 */
+	if (DataChecksumsEnabled() &&
+		(strncmp(readfilename, "./global/", 9) == 0 ||
+		strncmp(readfilename, "./base/", 7) == 0 ||
+		strncmp(readfilename, "/", 1) == 0))
+		verify_checksum = 1;
+	else
+		verify_checksum = 0;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Determine segment number for possible checksum verification, as block
+	 * numbers are ongoing. The initial block number of a multi-segment file is
+	 * the segment number times RELSEG_SIZE.
+	 */
+	filename = basename(readfilename);
+	if (strstr(filename, "."))
+		segno = atoi(strstr(filename, ".")+1);
+	blkno = segno * RELSEG_SIZE;
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ.
+			 */
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+checksum = pg_checksum_page((char *) page, blkno);
+phdr = (PageHeader) page;
+if (phdr->pd_checksum != checksum)
+	ereport(WARNING,
+			(errmsg("checksum mismatch in file \"%s\", block %d: "
+			"expected %x, found %x", readfilename, blkno,
+			checksum, phdr->pd_checksum)));
+blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..b410311791 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 83;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my 

Re: Parallel index creation & pg_stat_activity

2018-02-28 Thread Andres Freund
Hi Peter,

On 2018-02-28 16:50:44 +, Phil Florent wrote:
> With an index creation (create index t1_i1 on t1(c1, c2);) I have this kind 
> of output :
> 
> ./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event"
> busy_pc | distinct_exe | pid  |  backend_type  |   query  
>  | wait_event_type |  wait_event
> -+--+--++---+-+--
>   68 | 1 / 136  | 8262 | client backend | create index t1_i1 on 
> t1(c1, c2); | IO  | DataFileRead
>   26 | 1 / 53   | 8262 | client backend | create index t1_i1 on 
> t1(c1, c2); | |
>6 | 1 / 11   | 8262 | client backend | create index t1_i1 on 
> t1(c1, c2); | IO  | BufFileWrite
> (3 rows)

> No parallel worker. At least one parallel worker was active though, I could 
> see its work with a direct query on pg_stat_activity or a ps -ef :
> 
> ...
> postgres  8262  8230  7 08:54 ?00:22:46 postgres: 11/main: postgres 
> postgres [local] CREATE INDEX
> ...
> postgres  9833  8230 23 14:17 ?00:00:33 postgres: 11/main: parallel 
> worker for PID 8262
> ...

Looks like we're not doing a pgstat_report_activity() in the workers?
Any argument for not doing so?
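
For what it's worth, the workers' pg_stat_activity entries can be inspected
directly by filtering on backend_type (available since v10), e.g.:

SELECT pid, backend_type, query, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';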

Greetings,

Andres Freund



Re: Changing the autovacuum launcher scheduling; oldest table first algorithm

2018-02-28 Thread Grigory Smolkin



On 02/28/2018 12:04 PM, Masahiko Sawada wrote:

Hi,

I've created the new thread for the changing AV launcher scheduling.
The problem of AV launcher scheduling is described on [1] but I
summarize it here.

If there is even one database that is at risk of wraparound, currently
AV launcher selects the database having the oldest datfrozenxid until
all tables in that database have been vacuumed. This means that tables
in other databases can bloat and not be frozen because those databases
are not selected by the AV launcher. It makes things worse if the database
has a large table that takes a long time to be vacuumed.

Attached patch changes the AV launcher scheduling algorithm based on
the proposed idea by Robert so that it selects a database first that
has the oldest table when the database is at risk of wraparound. For
detail of the algorithm please refer to [2].

In this algorithm, the running AV workers advertise the next
datfrozenxid in shared memory (a new area added by the patch). That way, AV
launcher can select a database that has the oldest table in the
database cluster. However, this algorithm doesn't support the case
where the age of next datfrozenxid gets greater than
autovacum_vacuum_max_age. This case can happen if users set
autovacvuum_vacuum_max_age to small value and vacuuming the whole
database takes a long time. But since it's not a common case and it
is not worse than the current behavior even if it happens, I think
it's not a big problem.
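
For context, the per-database wraparound ordering that the launcher keys on
can be inspected with the standard catalogs (nothing from the patch is needed
for this):

SELECT datname, age(datfrozenxid) AS datfrozenxid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;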

Feedback is very welcome.

[1] 
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05
[2] 
https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Hello!
I've tried to compile your patch on Fedora24 with gcc 6.3.1 20161221:

gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
-Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -g -g3 -O0 -I../../../src/include  
-D_GNU_SOURCE   -c -o autovacuum.o autovacuum.c

In file included from ../../../src/include/postgres.h:46:0,
 from autovacuum.c:62:
autovacuum.c: In function ‘get_next_xid_bounds’:
autovacuum.c:1979:9: warning: implicit declaration of function 
‘TransactionIdIsVaild’ [-Wimplicit-function-declaration]

  Assert(TransactionIdIsVaild(frozenxid));
 ^
../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’
   if (condition) \
   ^
autovacuum.c:1979:2: note: in expansion of macro ‘Assert’
  Assert(TransactionIdIsVaild(frozenxid));
  ^~
autovacuum.c:1980:28: error: ‘minmutli’ undeclared (first use in this 
function)

  Assert(MultiXactIdIsValid(minmutli));
^
../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’
   if (condition) \
   ^
autovacuum.c:1980:2: note: in expansion of macro ‘Assert’
  Assert(MultiXactIdIsValid(minmutli));
  ^~
autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’
  Assert(MultiXactIdIsValid(minmutli));
 ^~
autovacuum.c:1980:28: note: each undeclared identifier is reported only 
once for each function it appears in

  Assert(MultiXactIdIsValid(minmutli));
^
../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’
   if (condition) \
   ^
autovacuum.c:1980:2: note: in expansion of macro ‘Assert’
  Assert(MultiXactIdIsValid(minmutli));
  ^~
autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’
  Assert(MultiXactIdIsValid(minmutli));
 ^~
<builtin>: recipe for target 'autovacuum.o' failed
make[3]: *** [autovacuum.o] Error 1

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] path toward faster partition pruning

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 8:12 PM, Amit Langote
 wrote:
> Ah, OK. I was missing that there is no need to have both parttypcoll and
> partcollation in PartitionSchemeData, as the Vars in rel->partexprs are
> built from a bare PartitionKey (not PartitionSchemeData), and after that
> point, parttypcoll no longer needs to kept around.
>
> I noticed that there is a typo in the patch.
>
> +   memcpy(part_scheme->partcollation, partkey->parttypcoll,
>
> s/parttypcoll/partcollation/g

Committed your version.

> BTW, should there be a relevant test in partition_join.sql?  If yes,
> attached a patch (partitionwise-join-collation-test-1.patch) to add one.

I don't feel strongly about it, but I'm not going to try to prevent
you from adding one, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2018-02-28 Thread Shubham Barai
On 28 February 2018 at 05:51, Thomas Munro 
wrote:

> On Wed, Jan 3, 2018 at 4:31 AM, Shubham Barai 
> wrote:
> > I have created new isolation tests. Please have a look at
> > updated patch.
>
> Hi Shubham,
>
> Could we please have a rebased version of the gin one?
>

Sure. I have attached a rebased version

Regards,
Shubham


Predicate-Locking-in-gin-index_v5.patch
Description: Binary data


Re: Online enabling of checksums

2018-02-28 Thread Robert Haas
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander  wrote:
> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect.  Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] MERGE SQL Statement for PG11

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 5:07 PM, Peter Geoghegan  wrote:
> I now feel like Simon's suggestion of throwing an error in corner
> cases isn't so bad. It still seems like we could do better, but the
> more I think about it, the less that seems like a cop-out. My reasons
> are:

I still think we really ought to try not to add a new class of error.

> * We can all agree that *not* raising an error in the specific way
> Simon proposes is possible, somehow or other. We also all agree that
> avoiding the broader category of RC errors can only be taken so far
> (e.g. in any event duplicate violations errors are entirely possible,
> in RC mode, when a MERGE inserts a row). So this is a question of what
> exact middle ground to take. Neither of the two extremes (throwing an
> error on the first sign of a RC conflict, and magically preventing
> concurrency anomalies) are actually on the table.

Just because there's no certainty about which behavior is best doesn't
mean that none of them are better than throwing an error.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2018-02-28 Thread Tomas Vondra
OK, time to revive this old thread ...

On 09/23/2017 05:27 PM, Tom Lane wrote:
> Tomas Vondra  writes:
 [ scalarineqsel may fall over when used by extension operators ]
> 
>> What about using two-pronged approach:
> 
>> 1) fall back to mid bucket in back branches (9.3 - 10)
>> 2) do something smarter (along the lines you outlined) in PG11
> 
> Sure.  We need to test the fallback case anyway.
> 

Attached is a minimal fix adding a flag to convert_numeric_to_scalar,
tracking when it fails because of an unsupported data type. If any of the 3
calls (value + lo/hi boundaries) sets it to 'true' we simply fall back
to the default estimate (0.5) within the bucket.

>>> [ sketch of a more extensible design ]
> 
>> Sounds reasonable to me, I guess - I can't really think about anything
>> simpler giving us the same flexibility.
> 
> Actually, on further thought, that's still too simple.  If you look
> at convert_string_to_scalar() you'll see it's examining all three
> values concurrently (the probe value, of one datatype, and two bin
> boundary values of possibly a different type).  The reason is that
> it's stripping off whatever common prefix those strings have before
> trying to form a numeric equivalent.  While certainly
> convert_string_to_scalar is pretty stupid in the face of non-ASCII
> sort orders, the prefix-stripping is something I really don't want
> to give up.  So the design I sketched of considering each value
> totally independently isn't good enough.
> 
> We could, perhaps, provide an API that lets an operator estimation
> function replace convert_to_scalar in toto, but that seems like
> you'd still end up duplicating code in many cases.  Not sure about
> how to find a happy medium.
> 

I plan to work on this improvement next, once I polish a couple of other
patches for the upcoming commit fest.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index fcc8323..5f6019a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -176,7 +176,7 @@ static bool estimate_multivariate_ndistinct(PlannerInfo *root,
 static bool convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue,
   Datum lobound, Datum hibound, Oid boundstypid,
   double *scaledlobound, double *scaledhibound);
-static double convert_numeric_to_scalar(Datum value, Oid typid);
+static double convert_numeric_to_scalar(Datum value, Oid typid, bool *unknown_type);
 static void convert_string_to_scalar(char *value,
 		 double *scaledvalue,
 		 char *lobound,
@@ -4071,10 +4071,15 @@ convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue,
 		case REGDICTIONARYOID:
 		case REGROLEOID:
 		case REGNAMESPACEOID:
-			*scaledvalue = convert_numeric_to_scalar(value, valuetypid);
-			*scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
-			*scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
-			return true;
+		{
+			bool	unknown_type = false;
+
+			*scaledvalue = convert_numeric_to_scalar(value, valuetypid, &unknown_type);
+			*scaledlobound = convert_numeric_to_scalar(lobound, boundstypid, &unknown_type);
+			*scaledhibound = convert_numeric_to_scalar(hibound, boundstypid, &unknown_type);
+
+			return (!unknown_type);
+		}
 
 			/*
 			 * Built-in string types
@@ -4147,7 +4152,7 @@ convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue,
  * Do convert_to_scalar()'s work for any numeric data type.
  */
 static double
-convert_numeric_to_scalar(Datum value, Oid typid)
+convert_numeric_to_scalar(Datum value, Oid typid, bool *unknown_type)
 {
 	switch (typid)
 	{
@@ -4185,9 +4190,11 @@ convert_numeric_to_scalar(Datum value, Oid typid)
 
 	/*
 	 * Can't get here unless someone tries to use scalarineqsel() on an
-	 * operator with one numeric and one non-numeric operand.
+	 * operator with one numeric and one non-numeric operand. Or more
+	 * precisely, operand that we don't know how to transform to scalar.
 	 */
-	elog(ERROR, "unsupported type: %u", typid);
+	*unknown_type = true;
+
 	return 0;
 }
 


Re: Registering LWTRANCHE_PARALLEL_HASH_JOIN

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 3:58 PM, Thomas Munro
 wrote:
> On Wed, Feb 28, 2018 at 8:39 AM, Robert Haas  wrote:
>> On Sat, Feb 10, 2018 at 6:07 PM, Thomas Munro
>>  wrote:
>>> I forgot to register a display name for LWTRANCHE_PARALLEL_HASH_JOIN,
>>> the tranche ID used by the LWLock that Parallel Hash uses when handing
>>> out chunks of memory.  Please see attached.
>>
>> I think that you need to insert some weasel words into the
>> documentation for this, because I don't think it's really accurate to
>> say that it's only used when trying to acquire a new chunk of memory.
>>
>> Or maybe I'm wrong and it's altogether accurate ... but
>> ExecParallelHashMergeCounters doesn't look like an allocation to me,
>> and ExecParallelHashTuplePrealloc doesn't really look like an
>> allocation either.
>
> Ok.  How about this?
>
> I noticed that some of the descriptions don't attempt to explain what
> activity the lock protects at all, they just say "Waiting for $BLAH
> lock".  I went the other way and covered the various different uses.
> There are 4 uses for the lock but only three things in my list,
> because I think "allocate" covers both ExecParallelHashTupleAlloc()
> and ExecParallelHashTuplePrealloc().

Well, the trouble with that of course is that if something changes
later then we have to update the docs, whereas if we keep it vague
then we avoid that.  But I've committed that version as you have it
and maybe by the time it needs to be updated they'll have made you a
committer and you can be the poor shmuck who has to spend time
committing fixes like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
 wrote:
> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time.  As far as I can
> tell, it works as intended.
>
> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple).  While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

I would have thought that to be the only case we could support with
the current infrastructure.  Doesn't a correct implementation for any
other case require a global index?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

2018-02-28 Thread Anastasia Lubennikova

Hi,

I want to propose a bunch of patches which reduce the WAL traffic
generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
and RUM, we can now log index pages of these access methods only once,
at the end of the index build process. The implementation is based on generic_xlog.

Not only does it decrease the amount of WAL generated, it also completely
eliminates the WAL overhead in case of an error during index build.

I also attached sql scripts which I used to measure xlog size.
They show that pg_wal_lsn_diff for the patched version is 3 to 5 times
smaller.
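
(The scripts themselves are attachments and not reproduced here, but the
general shape is presumably something like the following sketch, using psql
variables; the table and column names are made up.)

SELECT pg_current_wal_lsn() AS start_lsn \gset
CREATE INDEX ON test_tbl USING gist (pt);
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn'))
       AS wal_generated;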



Not sure if regression tests are needed, since it is just an optimization.
But I do not mind adding them if someone feels that it is necessary.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 179285fb5175d715c20fc95eca3087b6a1899ed9
Author: Anastasia 
Date:   Wed Feb 28 17:45:54 2018 +0300

add function generate_xlog_for_rel()

diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index ce02354..dd2c041 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -545,3 +545,34 @@ generic_mask(char *page, BlockNumber blkno)
 
 	mask_unused_space(page);
 }
+
+/*
+ * Function to write generic xlog for every existing block of a relation.
+ * Caller is responsible for locking the relation exclusively.
+ */
+void
+generate_xlog_for_rel(Relation rel)
+{
+	BlockNumber blkno;
+	BlockNumber nblocks;
+
+	nblocks = RelationGetNumberOfBlocks(rel);
+
+	elog(DEBUG2, "generate_xlog_for_rel '%s', nblocks %u BEGIN.",
+		 RelationGetRelationName(rel), nblocks);
+
+	for (blkno = 0; blkno < nblocks; blkno++)
+	{
+		Buffer	buffer;
+		GenericXLogState *state;
+
+		CHECK_FOR_INTERRUPTS();
+
+		buffer = ReadBuffer(rel, blkno);
+		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+		state = GenericXLogStart(rel);
+		GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
+		GenericXLogFinish(state);
+
+		UnlockReleaseBuffer(buffer);
+	}
+	elog(DEBUG2, "generate_xlog_for_rel '%s' END.", RelationGetRelationName(rel));
+}
diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h
index b23e1f6..33be157 100644
--- a/src/include/access/generic_xlog.h
+++ b/src/include/access/generic_xlog.h
@@ -42,4 +42,7 @@ extern const char *generic_identify(uint8 info);
 extern void generic_desc(StringInfo buf, XLogReaderState *record);
 extern void generic_mask(char *pagedata, BlockNumber blkno);
 
+/* other utils */
+void generate_xlog_for_rel(Relation rel);
+
 #endif			/* GENERIC_XLOG_H */
commit e176bd8f650a4b112fa2e61960a27cb57329138c
Author: Anastasia 
Date:   Wed Feb 28 17:53:15 2018 +0300

optimal WAL for gin

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 37070b3..c615d3c 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -388,7 +388,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		/* It will fit, perform the insertion */
 		START_CRIT_SECTION();
 
-		if (RelationNeedsWAL(btree->index))
+		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
 		{
 			XLogBeginInsert();
 			XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD);
@@ -409,7 +409,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 			MarkBufferDirty(childbuf);
 		}
 
-		if (RelationNeedsWAL(btree->index))
+		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
 		{
 			XLogRecPtr	recptr;
 			ginxlogInsert xlrec;
@@ -566,7 +566,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
 		}
 
 		/* write WAL record */
-		if (RelationNeedsWAL(btree->index))
+		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
 		{
 			XLogRecPtr	recptr;
 
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index f9daaba..349baa7 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -592,7 +592,7 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 		 * Great, all the items fit on a single page.  If needed, prepare data
 		 * for a WAL record describing the changes we'll make.
 		 */
-		if (RelationNeedsWAL(btree->index))
+		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
 			computeLeafRecompressWALData(leaf);
 
 		/*
@@ -629,6 +629,7 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 		 * subsequent insertions will probably also go to the end. This packs
 		 * the index somewhat tighter when appending to a table, which is very
 		 * common.
+		 *
 		 */
 		if (!btree->isBuild)
 		{
@@ -718,7 +719,7 @@ dataExecPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 	dataPlaceToPageLeafRecompress(buf, leaf);
 
 	/* If needed, register WAL data built by computeLeafRecompressWALData */
-	if (RelationNeedsWAL(btree->index))
+	if 

Re: server crash in nodeAppend.c

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 5:24 AM, Rajkumar Raghuwanshi
 wrote:
> SET parallel_setup_cost=0;
> SET parallel_tuple_cost=0;
> create or replace function foobar() returns setof text as
> $$ select 'foo'::varchar union all select 'bar'::varchar ; $$
> language sql stable;
>
> postgres=# select foobar();
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> --logfile
> 2018-02-26 22:15:52.908 IST [61738] LOG:  database system is ready to accept
> connections
> TRAP: FailedAssertion("!(whichplan >= 0 && whichplan <= node->as_nplans)",
> File: "nodeAppend.c", Line: 365)
> 2018-02-26 22:17:50.441 IST [61738] LOG:  server process (PID 61748) was
> terminated by signal 6: Aborted
> 2018-02-26 22:17:50.441 IST [61738] DETAIL:  Failed process was running:
> select foobar();

Nice test case.  I pushed commit
ce1663cdcdbd9bf15c81570277f70571b3727dd3, including your test case, to
fix this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



PATCH: Exclude temp relations from base backup

2018-02-28 Thread David Steele
This is a follow-up patch from the exclude unlogged relations discussion
[1].

The patch excludes temporary relations during a base backup using the
existing looks_like_temp_rel_name() function for identification.

It shares code to identify database directories from [1], so for now
that has been duplicated in this patch to make it independent.  I'll
rebase depending on what gets committed first.

Thanks,
-- 
-David
da...@pgmasters.net

[1]
https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3cec9e0b0c..3880a9134b 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2549,7 +2549,7 @@ The commands accepted in walsender mode are:
 
  Various temporary files and directories created during the operation
  of the PostgreSQL server, such as any file or directory beginning
- with pgsql_tmp.
+ with pgsql_tmp and temporary relations.
 


diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 185f32a5f9..f0c3d13b2b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
 #include "nodes/pg_list.h"
 #include "pgtar.h"
 #include "pgstat.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
@@ -958,6 +959,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, 
List *tablespaces,
charpathbuf[MAXPGPATH * 2];
struct stat statbuf;
int64   size = 0;
+   const char  *lastDir;   /* Split last dir from 
parent path. */
+   boolisDbDir = false;/* Does this directory contain 
relations? */
+
+   /*
+* Determine if the current path is a database directory that can
+* contain relations.
+*
+* Start by finding the location of the delimiter between the parent
+* path and the current path.
+*/
+   lastDir = last_dir_separator(path);
+
+   /* Does this path look like a database path (i.e. all digits)? */
+   if (lastDir != NULL &&
+   strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+   {
+   /* Part of path that contains the parent directory. */
+   int parentPathLen = lastDir - path;
+
+   /*
+* Mark path as a database directory if the parent path is 
either
+* $PGDATA/base or a tablespace version path.
+*/
+   if (strncmp(path, "./base", parentPathLen) == 0 ||
+   (parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) 
- 1) &&
+strncmp(lastDir - 
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+TABLESPACE_VERSION_DIRECTORY,
+sizeof(TABLESPACE_VERSION_DIRECTORY) - 
1) == 0))
+   isDbDir = true;
+   }
 
dir = AllocateDir(path);
while ((de = ReadDir(dir, path)) != NULL)
@@ -1007,6 +1038,16 @@ sendDir(const char *path, int basepathlen, bool 
sizeonly, List *tablespaces,
if (excludeFound)
continue;
 
+   /* Exclude temporary relations */
+   if (isDbDir && looks_like_temp_rel_name(de->d_name))
+   {
+   elog(DEBUG2,
+"temporary relation file \"%s\" excluded from 
backup",
+de->d_name);
+
+   continue;
+   }
+
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
/* Skip pg_control here to back up it last */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 2a18e94ff4..d30a725f90 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -325,7 +325,6 @@ static void RemovePgTempFilesInDir(const char *tmpdirname, 
bool missing_ok,
   bool unlink_all);
 static void RemovePgTempRelationFiles(const char *tsdirname);
 static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
-static bool looks_like_temp_rel_name(const char *name);
 
 static void walkdir(const char *path,
void (*action) (const char *fname, bool isdir, int elevel),
@@ -3192,7 +3191,7 @@ RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
 }
 
 /* t<digits>_<digits>, or t<digits>_<digits>_<forkname> */
-static bool
+bool
 looks_like_temp_rel_name(const char *name)
 {
int pos;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..16bd91f63e 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl

Re: Let's remove DSM_INPL_NONE.

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 11:30 PM, Tom Lane  wrote:
> I'd be in favor of having some normally-ifdef'd-out facility for causing
> failures of this kind.  (As I mentioned upthread, I do not think "always
> fail" is sufficient.)  That's very different from having a user-visible
> option, though.  We don't expose a GUC that enables CLOBBER_FREED_MEMORY,
> to take a relevant example.

Sure.  Feel free to propose something you like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Kerberos test suite

2018-02-28 Thread Peter Eisentraut
On 2/27/18 00:56, Thomas Munro wrote:
> FWIW it passes for me if I add this:
> 
> +elsif ($^O eq 'freebsd')
> +{
> +   $krb5_bin_dir = '/usr/local/bin';
> +   $krb5_sbin_dir = '/usr/local/sbin';

I suppose you only need the second one, right?  The first one should be
in the path.

> +}
> 
> One thing that surprised me is that your test runs my system's
> installed psql command out of my $PATH, not the one under test.  Is
> that expected?

Oh, you probably have a psql in /usr/local/bin, which we prepend to the
path, per the above.  We should probably append instead.  The ldap test
suite has the same issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Direct converting numeric types to bool

2018-02-28 Thread Alex Ignatov

-Original Message-
From: n.zhuch...@postgrespro.ru [mailto:n.zhuch...@postgrespro.ru] 
Sent: Wednesday, February 28, 2018 6:04 PM
To: pgsql-hackers 
Subject: Direct converting numeric types to bool

The attached patch allows direct conversion of numeric types to bool, like
integer::bool.
Supported types:
  - smallint;
  - bigint;
  - real;
  - double precision;
  - decimal(numeric).

This functionality helps with migration from Oracle.

--
Nikita Zhuchkov
Postgres Professional: http://www.postgrespro.com The Russian Postgres
Company

Hello!

What prevents us from doing this:

postgres=# select 1::bigint::int::boolean;
 bool
--
 t
(1 row)

It is just one additional cast and requires no additional patching.
--
Alex Ignatov 
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company




Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane  wrote:
>> I ran the postgres_fdw regression test with no sleep two times in a
>> CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
>> the sleep (60 seconds) two times, but I couldn't reproduce that in both
>> cases.  I suspect the changes in the order of the RETURNING output there
>> was still caused by autovacuum kicking in partway through the run.  So
>> to make the regression test more stable against autovacuum, I'd propose
>> to modify the regression test to disable autovacuum for the remote table
>> (ie "S 1"."T 1") (and perform VACUUM ANALYZE to that table manually
>> instead) in hopes of getting that fixed.  Attached is a patch for that.
>>   I think changes added by the previous patch wouldn't make sense
>> anymore, so I removed those changes.
>
> Ping?  We're still seeing those failures on jaguarundi.

This is another patch that got past me.  Committed now.

I'd be lying if I said I was filled with confidence that this was
going to fix it, but I don't know what to do other than keep tinkering
with it until we find something that works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
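
For anyone reproducing this, the stabilization described in the quoted
proposal amounts to roughly the following for the remote-side table (a sketch
only; the statements in the committed regression test may differ):

ALTER TABLE "S 1"."T 1" SET (autovacuum_enabled = 'false');
VACUUM ANALYZE "S 1"."T 1";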



Re: Direct converting numeric types to bool

2018-02-28 Thread Pavel Stehule
2018-02-28 16:13 GMT+01:00 Pavel Stehule :

> Hi
>
> 2018-02-28 16:06 GMT+01:00 :
>
>> n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:
>>
>> Attached patch allow direct convertion of numeric types to bool like
>>> integer::bool.
>>> Supported types:
>>>  - smallint;
>>>  - bigint;
>>>  - real;
>>>  - double precision;
>>>  - decimal(numeric).
>>>
>>> This functionality is helped with migration from Oracle.
>>>
>>
> Looks little bit obscure to upstream code (can lives as extension outside)
>
> all work can be done be function
>
> CREATE OR REPLACE FUNCTION public.to_bool(anyelement)
>  RETURNS boolean
>  LANGUAGE sql
>  IMMUTABLE STRICT
> AS $function$
> select $1::int::boolean $function$
>
> I really doesn't see any sense to allow cast from double to boolean
>

For a long time Oracle did not have a boolean type, so some ugly tricks were
necessary there. There is no reason to do the same in Postgres.


>
> -1 from me
>
> Regards
>
> Pavel
>
>


Re: Direct converting numeric types to bool

2018-02-28 Thread Pavel Stehule
Hi

2018-02-28 16:06 GMT+01:00 :

> n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:
>
> Attached patch allow direct convertion of numeric types to bool like
>> integer::bool.
>> Supported types:
>>  - smallint;
>>  - bigint;
>>  - real;
>>  - double precision;
>>  - decimal(numeric).
>>
>> This functionality is helped with migration from Oracle.
>>
>
This looks a little too obscure for upstream code (it could live outside as an extension).

All the work can be done by a function:

CREATE OR REPLACE FUNCTION public.to_bool(anyelement)
 RETURNS boolean
 LANGUAGE sql
 IMMUTABLE STRICT
AS $function$
select $1::int::boolean $function$

I really don't see any sense in allowing a cast from double to boolean.

-1 from me

Regards

Pavel
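
For illustration, a quick sketch of how the existing casts (and the to_bool()
wrapper above) already cover the cases listed in the patch by going through an
intermediate int:

select 0::smallint::int::boolean;            -- false
select 42::bigint::int::boolean;             -- true
select 1.0::double precision::int::boolean;  -- true
select to_bool(0::bigint), to_bool(2.5::numeric);

Note that the numeric and float paths round to the nearest integer first,
which differs from the epsilon comparison used in the proposed patch.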


Re: Incorrect comments in partition.c

2018-02-28 Thread Robert Haas
On Wed, Feb 28, 2018 at 3:24 AM, Etsuro Fujita
 wrote:
> I'll add this to the upcoming commitfest.

Committed.  Sorry that I didn't notice this thread sooner (and that
the original commits didn't take care of it).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Direct converting numeric types to bool

2018-02-28 Thread Tom Lane
n.zhuch...@postgrespro.ru writes:
> Attached patch allow direct convertion of numeric types to bool like 
> integer::bool.
> Supported types:
>   - smallint;
>   - bigint;
>   - real;
>   - double precision;
>   - decimal(numeric).
> This functionality is helped with migration from Oracle.

I think you forgot to attach the patch, but in any case: is this
really a behavior we want?  "Oracle has it" is not a good argument
in my view, nor do I recall people complaining that they need such
a behavior to migrate.

regards, tom lane



Re: [HACKERS] [POC] Faster processing at Gather node

2018-02-28 Thread Robert Haas
On Tue, Feb 27, 2018 at 4:06 PM, Andres Freund  wrote:
>> OK, I'll try to check how feasible that would be.
>
> cool.

It's not too hard, but it doesn't really seem to help, so I'm inclined
to leave it alone.  To make it work, you need to keep two separate
counters in the shm_mq_handle, one for the number of bytes since we
did an increment and the other for the number of bytes since we sent a
signal.  I don't really want to introduce that complexity unless there
is a benefit.

With just 0001 and 0002: 3968.899 ms, 4043.428 ms, 4042.472 ms, 4142.226 ms
With two-separate-counters.patch added: 4123.841 ms, 4101.917 ms,
4063.368 ms, 3985.148 ms

If you take the total of the 4 times, that's a 0.4% slowdown with the
patch applied, but I think that's just noise.  It seems possible that
with a larger queue -- and maybe a different query shape -- it would
help, but I really just want to get the optimizations that I've got
committed, provided that you find them acceptable, rather than spend a
lot of time looking for new optimizations, because:

1. I've got other things to get done.

2. I think that the patches I've got here capture most of the available benefit.

3. This case isn't super-common in the first place -- we generally
want to avoid feeding tons of tuples through the Gather.

4. We might abandon the shm_mq approach entirely and switch to
something like sticking tuples in DSA using the flexible tuple slot
stuff you've proposed elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0002-shm-mq-reduce-receiver-latch-set-v3.patch
Description: Binary data


0001-shm-mq-less-spinlocks-v4.patch
Description: Binary data


two-separate-counters.patch
Description: Binary data


Re: Direct converting numeric types to bool

2018-02-28 Thread n . zhuchkov

n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:

The attached patch allows direct conversion of numeric types to bool, like
integer::bool.
Supported types:
 - smallint;
 - bigint;
 - real;
 - double precision;
 - decimal(numeric).

This functionality helps with migration from Oracle.
diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 686528c..6cbfcf5 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -2970,3 +2970,83 @@ CREATE VIEW user_mappings AS
 FROM _pg_user_mappings;
 
 GRANT SELECT ON user_mappings TO PUBLIC;
+
+-- bool --> smallint(int2)
+CREATE OR REPLACE FUNCTION bool_smallint(boolean)
+  RETURNS smallint AS
+  'bool_int2'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS smallint) WITH FUNCTION bool_smallint(boolean) as implicit;
+
+-- smallint(int2) --> bool
+CREATE OR REPLACE FUNCTION smallint_bool(smallint)
+  RETURNS bool AS
+  'int2_bool'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (smallint AS boolean) WITH FUNCTION smallint_bool(smallint) as implicit;
+
+-- bool --> bigint(int8)
+CREATE OR REPLACE FUNCTION bool_bigint(boolean)
+  RETURNS bigint AS
+  'bool_int8'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS bigint) WITH FUNCTION bool_bigint(boolean) as implicit;
+
+-- bigint(int8) --> bool
+CREATE OR REPLACE FUNCTION bigint_bool(bigint)
+  RETURNS bool AS
+  'int8_bool'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (bigint AS boolean) WITH FUNCTION bigint_bool(bigint) as implicit;
+
+-- double precision(float8) --> bool
+CREATE OR REPLACE FUNCTION double_precision_bool(double precision)
+  RETURNS bool AS
+  'float8_bool'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (double precision AS boolean) WITH FUNCTION double_precision_bool(double precision) as implicit;
+
+-- bool --> double precision(float8)
+CREATE OR REPLACE FUNCTION bool_double_precision(boolean)
+  RETURNS double precision AS
+  'bool_float8'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS double precision) WITH FUNCTION bool_double_precision(boolean) as implicit;
+
+-- real(float4) --> bool
+CREATE OR REPLACE FUNCTION real_bool(real)
+  RETURNS bool AS
+  'float4_bool'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (real AS boolean) WITH FUNCTION real_bool(real) as implicit;
+
+-- bool --> real(float4)
+CREATE OR REPLACE FUNCTION bool_real(boolean)
+  RETURNS real AS
+  'bool_float4'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS real) WITH FUNCTION bool_real(boolean) as implicit;
+
+-- numeric(decimal) --> bool
+CREATE OR REPLACE FUNCTION numeric_bool(decimal)
+  RETURNS bool AS
+  'numeric_bool'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (decimal AS boolean) WITH FUNCTION numeric_bool(decimal) as implicit;
+
+-- bool --> numeric(decimal)
+CREATE OR REPLACE FUNCTION bool_numeric(boolean)
+  RETURNS decimal AS
+  'bool_numeric'
+  LANGUAGE internal IMMUTABLE STRICT  PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS decimal) WITH FUNCTION bool_numeric(boolean) as implicit;
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index bc6a3e0..a687104 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1331,6 +1331,53 @@ i2tof(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4((float4) num);
 }
 
+ /*
+ *   float4_bool - converts a float4 number to a bool
+ *   epsilon - machine epsilon gives an upper bound on the relative error
+ *   due to rounding in floating point arithmetic.
+ */
+Datum
+float4_bool(PG_FUNCTION_ARGS)
+{
+ float epsilon = 5.96e-08;
+
+ if (fabs(PG_GETARG_FLOAT4(0)) <= epsilon)
+   PG_RETURN_BOOL(false);
+ else
+ PG_RETURN_BOOL(true);
+}
+
+/*
+ *   bool_float4 - converts a bool to a float4 number
+ */
+Datum
+bool_float4(PG_FUNCTION_ARGS)
+{
+ if (PG_GETARG_BOOL(0) == false)
+   PG_RETURN_FLOAT4(0);
+ else
+   PG_RETURN_FLOAT4(1);
+}
+
+/*
+ *   float8_bool - converts a float8 number to a bool
+ *   epsilon - machine epsilon gives an upper bound on the relative error
+ *   due to rounding in floating point arithmetic.
+ */
+Datum
+float8_bool(PG_FUNCTION_ARGS)
+{
+ float epsilon = 1.11e-16;
+
+ if (fabs(PG_GETARG_FLOAT8(0)) <= epsilon)
+   PG_RETURN_BOOL(false);
+ else
+   PG_RETURN_BOOL(true);
+}
+
+/*
+ *   bool_float8 - converts a bool to a float8 number
+ */
+Datum
+bool_float8(PG_FUNCTION_ARGS)
+{
+ if (PG_GETARG_BOOL(0) == false)
+   PG_RETURN_FLOAT8(0);
+ else
+   PG_RETURN_FLOAT8(1);
+}
 
 /*
  *		===
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 559c365..a6ea79b 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -354,6 +354,26 @@ 

Direct converting numeric types to bool

2018-02-28 Thread n . zhuchkov
The attached patch allows direct conversion of numeric types to bool, like
integer::bool.

Supported types:
 - smallint;
 - bigint;
 - real;
 - double precision;
 - decimal(numeric).

This functionality helps with migration from Oracle.

--
Nikita Zhuchkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-28 Thread Peter Eisentraut
On 2/24/18 18:29, Michael Paquier wrote:
> Sure.  But then I think that it would be nice to show up on screen the
> reason why the test failed if possible.  As of now if SSL is missing the
> whole run shows in red without providing much useful information.
> Instead of 0001 as shaped previously, why not using as well diag to show
> the failure on the screen?
> 
> For example the following block at the top of each test:
> if (!check_pg_config("#define USE_OPENSSL 1"))
> {
> diag "SSL tests not supported without support in build";
> die;
> }

I think BAIL_OUT() is intended for this.

>> What I had in mind would consist of something like this in
>> src/test/Makefile:
>>
>> ifeq ($(with_ldap),yes)
>> ifneq (,$(filter ldap,$(YOUR_VARIABLE_HERE)))
>> SUBDIRS += ldap
>> endif
>> endif
> 
> OK.  So let's call it PG_TEST_EXTRA_INSECURE or PG_TEST_EXTRA, which can
> take 'ldap', 'ssl' or 'ldap,ssl' as valid values.  Seeing that
> documented is really necessary in my opinion.  Any idea for a better
> name?

I don't have a great idea about the name.  The value should be
space-separated to work better with make functions.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Opclass parameters

2018-02-28 Thread Nikolay Shaplov
In a message of 28 February 2018 00:46:36, user Nikita Glukhov wrote:

> I would like to present patch set implementing opclass parameters.
> 
> This feature was recently presented at pgconf.ru:
> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
> 
> A analogous work was already done by Nikolay Shaplov two years ago:
> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
> But this patches are not based on it, although they are very similar.
Hi!

You know, I am still working on this issue.

When I started to work on it, I found out that the options code is not
flexible at all, and you can't reuse it for options that are not relOptions.

I have only given your patch a short glance for now, but as far as I can see
you start dividing options into global and local ones. I am afraid it will
create a greater mess than there is now.

What I am doing right now is creating a new reloption internal API that will
allow creating any option in any place using the very same code.

I think that should be done first, and then it can be used for opclass
parameters and any other kind of options you like.

The big patch is here: https://commitfest.postgresql.org/14/992/ (I am afraid
it will not apply to current master as it is; it is quite old. But you can get
the idea.)

The smaller parts of the patch that in current commitfest are

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

You can help by reviewing them; then the whole thing will go faster. The
second patch is quite trivial. The first will also need the attention of
someone who really understands postgres internals.

---

Concerning the patch that you've provided: I've only had a short look, but I
already have some questions.

1. I've seen you've added a new attribute to pg_index. Why??!!
As far as I can tell, if you have an index built on several columns (A1, A2,
A3), you can set your own opclass for each column, and set individual options
for each opclass if we are speaking about options. So I would expect to have
these options not in pg_index but in pg_attribute, and we already have one
there: attoptions. I just do not get how you have arrived at per-index
options. Maybe I should read the code more attentively...


2. Your patch does not provide any example of using the new tool. In my
prototype patch I showed the implementation of opclass options for intarray.
Maybe you should do the same (use my example, it will be easier than doing it
from scratch). It will give a better understanding of how this improvement can
be used.

3. 

--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -150,3 +150,8 @@ vacuum btree_tall_tbl;
 -- need to insert some rows to cause the fast root page to split.
 insert into btree_tall_tbl (id, t)
   select g, repeat('x', 100) from generate_series(1, 500) g;
+-- Test unsupported btree opclass parameters
+create index on btree_tall_tbl (id int4_ops(foo=1));
+ERROR:  access method "btree" does not support opclass options 
+create index on btree_tall_tbl (id default(foo=1));
+ERROR:  access method "btree" does not support opclass options 

Are you sure we really need these in postgres???

-

So my proposal is the following:
let's commit 

https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/

I will provide a final patch for the options engine refactoring in the commit
fest, and you can write your implementation of opclass options on top of it.
We can do it simultaneously. Or I will write the opclass options myself... I
do not care who does the opclass options as long as it is done using the
refactored options engine. If you do it, I can review it.

> 
> 
> Opclass parameters can give user ability to:
>   * Define the values of the constants that are hardcoded now in the
> opclasses depending on the indexed data.
>   * Specify what to index for non-atomic data types (arrays, json[b],
> tsvector). Partial index can only filter whole rows.
>   * Specify what indexing algorithm to use depending on the indexed data.
> 
> 
> Description of patches:
> 
> 1. Infrastructure for opclass parameters.
> 
> SQL grammar is changed only for CREATE INDEX statement: parenthesized
> parameters in reloptions format are added after column's opclass name. 
> Default opclass can be specified with DEFAULT keyword:
> 
> CREATE INDEX idx ON tab USING am (
> {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
> );
> 
> Example for contrib/intarray:
> 
> CREATE INDEX ON arrays USING gist (
>arr gist__intbig_ops (siglen = 32),
>arr DEFAULT (numranges = 100)
> );
> 
> \d arrays
>  Table "public.arrays"
>   Column |   Type| Collation | Nullable | Default
> +---+---+--+-
>   arr| integer[] |   |  |
> Indexes:
>  "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr
> gist__int_ops (numranges='100'))
> 
> 

Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2018-02-28 Thread Ivan Kartyshov

Thank you for your valuable comments. I've made a few adjustments.

The main goal of my changes is to let long read-only transactions run on a
replica if hot_standby_feedback is turned on.



Patch1 - hsfeedback_av_truncate.patch is made to stop
ResolveRecoveryConflictWithLock from occurring on the replica after autovacuum
lazily truncates the heap on the master, cutting some pages at the end. When
hot_standby_feedback is on, we know that autovacuum does not remove anything
superfluous that could be needed on the standby, so there is no need to raise
any ResolveRecoveryConflict*.


1) Add an isautovacuum flag to xl_standby_locks and xl_smgr_truncate, which
tells us that autovacuum generated them.


2) When autovacuum decides to trim the table (using lazy_truncate_heap), it
takes AccessExclusiveLock and sends this lock to the replica, but the replica
should ignore the AccessExclusiveLock if hot_standby_feedback=on.


3) When the autovacuum truncate WAL message is replayed on a replica, it takes
ExclusiveLock on the table, so as not to interfere with read-only requests.


We have two cases of resolving ResolveRecoveryConflictWithLock if the timers
(max_standby_streaming_delay and max_standby_archive_delay) have run out:
- the backend is idle in transaction (waiting for input): in this case the
  backend will be sent SIGTERM;
- the backend transaction is running a query: in this case the running
  transaction will be aborted.


How to test:
Make async replica, turn on feedback and reduce 
max_standby_streaming_delay.

Make autovacuum more aggressive.
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we put a load on the master and simulate a long transaction with
repeated 1-second seqscans on the replica (by calling pg_sleep with a
1-second duration every 6 seconds).

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (update test set value = value;)

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SELECT pg_sleep(value) FROM test;
  \watch 6

--- Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passed without fatal errors.

Only sometimes does an error occur, because this test is too synthetic:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
This is caused by ResolveRecoveryConflictWithSnapshot being raised while
redoing some visibility flags; to avoid this conflict we can run Test2 or
increase max_standby_streaming_delay.


Test2:
Here we put a load on the master and simulate a long transaction on the
replica (by taking a LOCK on the table).

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (update test set value = value;)

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 1s
  hot_standby_feedback = on
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  LOCK TABLE test IN ACCESS SHARE MODE;
  select * from test;
  \watch 6

--- Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passed without fatal errors.

Test3:
Here we put a load on the master and simulate a long transaction with
repeated 1-second seqscans on the replica (by calling pg_sleep with a
1-second duration every 6 seconds).

MASTER:
  start
  CREATE TABLE test AS (SELECT id, 200 AS value FROM generate_series(1,1) id);
  pgbench -T600 -P2 -n --file=master.sql postgres
  (update test set value = value;)

REPLICA:
  hot_standby = on
  max_standby_streaming_delay = 4s
  hot_standby_feedback = on
  start
  BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
  SELECT pg_sleep(value) FROM test;

--- Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passed without fatal errors.

This way we can run transactions with SEQSCAN, INDEXSCAN or BITMAPSCAN.


Patch2 - hsfeedback_noninvalide_xmin.patch
When the walsender is initialized, its xmin in PROCARRAY is set to
GetOldestXmin() in order to prevent autovacuum running on the master from
truncating a relation and removing some pages that are required by the
replica. This might happen if the master's autovacuum and the replica's query
started simultaneously, and the replica has not yet reported its xmin value.


How to test:
Make async replica, turn on feedback, reduce max_standby_streaming_delay 
and aggressive 

Re: Reopen logfile on SIGHUP

2018-02-28 Thread Grigory Smolkin
If there is already SIGUSR1 for logfile reopening, then shouldn't the
postmaster watch for it? The postmaster PID is easy to obtain.
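
For reference, the log file such a reopen affects can be checked from SQL, as
noted in the quoted message below:

select pg_current_logfile();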



On 02/28/2018 01:32 AM, Greg Stark wrote:

On 27 February 2018 at 14:41, Anastasia Lubennikova
 wrote:


Small patch in the attachment implements logfile reopeninig on SIGHUP.
It only affects the file accessed by logging collector, which name you can
check with pg_current_logfile().

HUP will cause Postgres to reload its config files. That seems like a
fine time to reopen the log files as well but it would be nice if
there was also some way to get it to *just* do that and not reload the
config files.

I wonder if it would be easiest to just have the syslogger watch for
some other signal as well (I'm guessing the the syslogger doesn't use
relcache invalidations so it could reuse USR1 for example). That would
be a bit inconvenient as the admins would have to find the syslogger
and deliver the signal directly, rather than through the postmaster
but it would be pretty easy for them.



--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Function to track shmem reinit time

2018-02-28 Thread Grigory Smolkin
It can be used to accurately calculate server uptime, since you can't
rely on pg_postmaster_start_time() for this.
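
For example, a monitoring agent could periodically run a check like the
following (a sketch, assuming the pg_shmem_init_time() function from the patch
quoted below is installed):

SELECT pg_shmem_init_time() AS shmem_init,
       pg_postmaster_start_time() AS postmaster_start,
       pg_shmem_init_time() <> pg_postmaster_start_time() AS crash_restart_detected;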



On 02/28/2018 03:11 PM, Anastasia Lubennikova wrote:


Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will
just restart.

And the only way to know that it happened is to regularly parse the logfile
or monitor it, catching restart messages. This approach is really
inconvenient for users, who have gigabytes of logs.

This new function can be periodically called by a monitoring agent, and,
if shmem_init_time doesn't match pg_postmaster_start_time,
we know that the server crash-restarted, and we also know exactly when.

Also, working on this patch, I noticed a bit of dead code
and some discordant comments in postmaster.c.
I see no reason to leave it as is.
So there is a small remove_dead_shmem_reinit_code_v0.patch.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Contention preventing locking

2018-02-28 Thread Amit Kapila
On Mon, Feb 26, 2018 at 8:26 PM, Konstantin Knizhnik
 wrote:
>
> On 26.02.2018 17:20, Amit Kapila wrote:
>>
>> Can you please explain, how it can be done easily without extra tuple
>> locks?  I have tried to read your patch but due to lack of comments,
>> it is not clear what you are trying to achieve.  As far as I can see
>> you are changing the locktag passed to LockAcquireExtended by the
>> first waiter for the transaction.  How will it achieve the serial
>> waiting protocol (queue the waiters for tuple) for a particular tuple
>> being updated?
>>
> The idea of transaction lock chaining was very simple. I have explained it
> in the first main in this thread.
> Assumed that transaction T1 has updated tuple R1.
> Transaction T2 also tries to update this tuple and so waits for T1 XID.
> If then yet another transaction T3 also tries to update R1, then it should
> wait for T2, not for T1.
>

Isn't this exactly we try to do via tuple locks
(heap_acquire_tuplock)?  Currently, T2 before waiting for T1 will
acquire tuple lock on R1 and T3 will wait on T2 (not on T-1) to
release the tuple lock on R-1 and similarly, the other waiters should
form a queue and will be waked one-by-one.  After this as soon T2 is
waked up, it will release the lock on tuple and will try to fetch the
updated tuple. Now, releasing the lock on tuple by T-2 will allow T-3
to also proceed and as T-3 was supposed to wait on T-1 (according to
tuple satisfies API), it will immediately be released and it will also
try to do the same work as is done by T-2.  One of those will succeed
and other have to re-fetch the updated-tuple again.

I think in this whole process backends may need to wait multiple times
either on tuple lock or xact lock.  It seems the reason for these
waits is that we immediately release the tuple lock (acquired by
heap_acquire_tuplock) once the transaction on which we were waiting is
finished.  AFAICU, the reason for releasing the tuple lock immediately
instead of at end of the transaction is that we don't want to
accumulate too many locks as that can lead to the unbounded use of
shared memory.  How about if we release the tuple lock at end of the
transaction unless the transaction acquires more than a certain
threshold (say 10 or 50) of such locks in which case we will fall back
to current strategy?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


