Re: [HACKERS] Causal reads take II

2017-09-06 Thread Thomas Munro
On Thu, Aug 10, 2017 at 2:02 PM, Thomas Munro
 wrote:
> Rebased after conflicting commit 030273b7.  Now using format-patch
> with a commit message to keep track of review/discussion history.

TAP test 006_logical_decoding.pl failed with that version.  I had
missed some places that know how to decode wire protocol messages I
modified.  Fixed in the attached version.

It might be a good idea to consolidate the message encoding/decoding
logic into reusable routines, independently of this work.

-- 
Thomas Munro
http://www.enterprisedb.com


synchronous-replay-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Amit Langote
On 2017/09/07 13:09, Michael Paquier wrote:
> On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane  wrote:
>> The idea I'd had was to apply the masking only if pd_lower >=
>> SizeOfPageHeaderData, or if you wanted to be stricter, only if
>> pd_lower != 0.
> 
> If putting a check, it seems to me that the stricter one makes the
> most sense.

OK, patches updated that way.

> pd_lower should remain at 0 on pre-10 servers.

Hasn't PageInit(), which is where any page gets initialized, always
set pd_lower to SizeOfPageHeaderData?

Thanks,
Amit
From 681c0ea82d0d9acd37f1979ebe1918d8636b0d26 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 23 Jun 2017 11:20:41 +0900
Subject: [PATCH 1/3] Set pd_lower correctly in the GIN metapage.

---
 src/backend/access/gin/ginutil.c |  7 +++
 src/backend/access/gin/ginxlog.c | 24 +---
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 136ea27718..ebac391818 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -374,6 +374,13 @@ GinInitMetabuffer(Buffer b)
metadata->nDataPages = 0;
metadata->nEntries = 0;
metadata->ginVersion = GIN_CURRENT_VERSION;
+
+	/*
+	 * Set pd_lower just past the end of the metadata.  This is not essential
+	 * but it makes the page look compressible to xlog.c.
+	 */
+	((PageHeader) page)->pd_lower =
+		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..f5c11b2d9a 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
 
-   GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+   GinInitMetabuffer(metabuffer);
	memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
 
-   GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+   GinInitMetabuffer(metabuffer);
 
	memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
PageSetLSN(metapage, lsn);
@@ -768,6 +768,7 @@ void
 gin_mask(char *pagedata, BlockNumber blkno)
 {
Pagepage = (Page) pagedata;
+   PageHeader  pagehdr = (PageHeader) page;
GinPageOpaque opaque;
 
mask_page_lsn(page);
@@ -776,18 +777,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
mask_page_hint_bits(page);
 
 	/*
-	 * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
-	 * we need to apply masking for those pages.
+	 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+	 * the page content.
 	 */
-	if (opaque->flags != GIN_META)
-	{
-		/*
-		 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
-		 * the page content.
-		 */
-		if (opaque->flags & GIN_DELETED)
-			mask_page_content(page);
-		else
-			mask_unused_space(page);
-	}
+	if (opaque->flags & GIN_DELETED)
+		mask_page_content(page);
+	else if (pagehdr->pd_lower != 0)
+		mask_unused_space(page);
 }
-- 
2.11.0

From a605ae99664138aa827500216567137be97d0460 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 26 Jun 2017 15:13:32 +0900
Subject: [PATCH 2/3] Set pd_lower correctly in the BRIN index metapage

---
 src/backend/access/brin/brin_pageops.c | 7 +++
 src/backend/access/brin/brin_xlog.c| 9 +++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e438..8762356433 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -491,6 +491,13 @@ brin_metapage_init(Page page, BlockNumber pagesPerRange, uint16 version)
 * revmap page to be created when the index is.
 */
metadata->lastRevmapPage = 0;
+
+   /*
+* Set pd_lower just past the end of the metadata.  This is to log full
+* page image of metapage in xloginsert.c.
+*/
+   ((PageHeader) page)->pd_lower =
+   ((char *) metadata + sizeof(BrinMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/brin/brin_xlog.c 

Re: [HACKERS] Parallel Append implementation

2017-09-06 Thread Amit Kapila
On Thu, Aug 31, 2017 at 12:47 PM, Amit Khandekar  wrote:
> The last updated patch needs a rebase. Attached is the rebased version.
>

Few comments on the first read of the patch:

1.
@@ -279,6 +347,7 @@ void
 ExecReScanAppend(AppendState *node)
 {
  int i;
+ ParallelAppendDesc padesc = node->as_padesc;

  for (i = 0; i < node->as_nplans; i++)
  {
@@ -298,6 +367,276 @@ ExecReScanAppend(AppendState *node)
  if (subnode->chgParam == NULL)
  ExecReScan(subnode);
  }
+
+ if (padesc)
+ {
+ padesc->pa_first_plan = padesc->pa_next_plan = 0;
+ memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ }
+

For rescan purposes, the parallel state should be reinitialized via
ExecParallelReInitializeDSM.  We need to do it that way mainly to
handle cases where the rescan machinery doesn't perform a rescan of
all the nodes.  See commit 41b0dd987d44089dc48e9c70024277e253b396b7.
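Following the pattern that commit introduced, such a callback might look roughly like the sketch below. This is illustrative only: the field names come from the patch under review, but the callback name and shape are assumptions, not code from the patch.

```
/*
 * Sketch of a DSM reinitialization callback for Append, in the style
 * of the ExecParallelReInitializeDSM machinery from commit
 * 41b0dd987d44.  The callback name here is illustrative.
 */
static void
ExecAppendReInitializeDSM(AppendState *node, ParallelContext *pcxt)
{
	ParallelAppendDesc padesc = node->as_padesc;

	if (padesc)
	{
		padesc->pa_first_plan = padesc->pa_next_plan = 0;
		memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans);
	}
}
```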

2.
+ * shared next_subplan counter index to start looking for unfinished plan,

Here, using both "counter" and "index" seems slightly confusing. I
think using just one of those terms would be better.

3.
+/* 
+ * exec_append_leader_next
+ *
+ * To be used only if it's a parallel leader. The backend should scan
+ * backwards from the last plan. This is to prevent it from taking up
+ * the most expensive non-partial plan, i.e. the first subplan.
+ * 
+ */
+static bool
+exec_append_leader_next(AppendState *state)

From the above explanation, it is clear that you don't want the
backend to pick an expensive plan for the leader, but the reason for
this different treatment is not clear.

4.
accumulate_partialappend_subpath()
{
..
+ /* Add partial subpaths, if any. */
+ return list_concat(partial_subpaths, apath_partial_paths);
..
+ return partial_subpaths;
..
+ if (is_partial)
+ return lappend(partial_subpaths, subpath);
..
}

In this function, instead of returning the partial_subpaths list from
multiple places, you can just return it at the end and, everywhere
else, append to it as required.  That way the code will be clearer
and simpler.

5.
 * is created to represent the case that a relation is provably empty.
+ *
  */
 typedef struct AppendPath

Spurious line addition.

6.
if (!node->as_padesc)
{
/*
* This is Parallel-aware append. Follow it's own logic of choosing
* the next subplan.
*/
if (!exec_append_seq_next(node))

I think this is the non-parallel-aware append case, but the comments
indicate the opposite.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] merge psql ef/ev sf/sv handling functions

2017-09-06 Thread Fabien COELHO



you can't do this sort of thing:

-psql_error("The server (version %s) does not support editing function source.\n",
+psql_error("The server (version %s) does not support editing %s.\n",
    formatPGVersionNumber(pset.sversion, false,
-                         sverbuf, sizeof(sverbuf)));
+                         sverbuf, sizeof(sverbuf)),
+   is_func ? "function source" : "view definitions");

It's too much of a pain in the rear for translators.


Argh, indeed, I totally forgot about translations. Usually there is a _() 
hint for gettext.


See 
https://www.postgresql.org/docs/devel/static/nls-programmer.html#nls-guidelines 
Usually we just use two independent messages, and that's what I did.


Yep, makes sense. Thanks for the fix.
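For reference, the two-independent-messages approach from the NLS guidelines boils down to something like the sketch below. This is illustrative only; the committed fix may be worded differently.

```
/*
 * Sketch of the NLS-friendly fix: each translatable string stays
 * whole, so translators never see a spliced-in "%s" fragment.
 */
if (is_func)
    psql_error("The server (version %s) does not support editing function source.\n",
               formatPGVersionNumber(pset.sversion, false,
                                     sverbuf, sizeof(sverbuf)));
else
    psql_error("The server (version %s) does not support editing view definitions.\n",
               formatPGVersionNumber(pset.sversion, false,
                                     sverbuf, sizeof(sverbuf)));
```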

--
Fabien.




Re: [HACKERS] ANALYZE command progress checker

2017-09-06 Thread Tatsuro Yamada

On 2017/04/05 10:17, Masahiko Sawada wrote:

On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:

On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
 wrote:

Hmm, you're right.  It could be counted with a separate variable
initialized to 0 and incremented every time we decide to add a row to the
final set of sampled rows, although different implementations of
AcquireSampleRowsFunc have different ways of deciding if a given row will
be part of the final set of sampled rows.

On the other hand, if we decide to count progress in terms of blocks as
you suggested, I'm afraid that FDWs won't be able to report the
progress.


I think it may be time to push this patch out to v11.  It was
submitted one day before the start of the last CommitFest, the design
wasn't really right, and it's not clear even now that we know what the
right design is.  And we're pretty much out of time.



+1
We're still encountering design issues, and it will take more time to
find the right design, including FDW support.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Hi Vinayak,

I found a typo in your patch: s/taht/that/

sampling.c
   /* Report total number of blocks taht will be sampled */

Then, regarding FDW support, I believe it means postgres_fdw support (is my
understanding right?).
You might want to put pgstat_progress_update_param() into these functions,
maybe:
  postgres_fdw.c
- postgresAnalyzeForeignTable
- postgresAcquireSampleRowsFunc

And PgFdwAnalyzeState has these variables, so hopefully you can get the
current number of rows as a progress indicator.

  struct PgFdwAnalyzeState
  ...
/* collected sample rows */
HeapTuple  *rows;   /* array of size targrows */
int targrows;   /* target # of sample rows */
int numrows;/* # of sample rows collected */

/* for random sampling */
double  samplerows; /* # of rows fetched */
double  rowstoskip; /* # of rows to skip before next sample */
  ...

I hope it will help you.
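For illustration, the suggestion above might look something like this inside postgres_fdw's sampling callback. This is only a sketch: the progress-parameter constant is invented here, since the real patch would define its own PROGRESS_ANALYZE_* indexes.

```
/*
 * Hypothetical: report rows sampled so far, e.g. called from
 * postgresAcquireSampleRowsFunc.  PROGRESS_ANALYZE_NUM_ROWS_SAMPLED
 * is an invented parameter index, not from the patch.
 */
static void
report_sample_progress(PgFdwAnalyzeState *astate)
{
	pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED,
								 astate->numrows);
}
```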

Regards,
Tatsuro Yamada





Re: [HACKERS] Restricting maximum keep segments by repslots

2017-09-06 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 1 Sep 2017 23:49:21 -0400, Peter Eisentraut 
 wrote in 
<751e09c4-93e0-de57-edd2-e64c4950f...@2ndquadrant.com>
> I'm still concerned about how the critical situation is handled.  Your
> patch just prints a warning to the log and then goes on -- doing what?
> 
> The warning rolls off the log, and then you have no idea what happened,
> or how to recover.

The victims should be complaining in their log files but, yes, I
must admit that it closely resembles /dev/null, and the catastrophe
comes suddenly.

> I would like a flag in pg_replication_slots, and possibly also a
> numerical column that indicates how far away from the critical point
> each slot is.  That would be great for a monitoring system.

Great! I'll do that right now.

> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 

Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-06 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 4:01 PM, Fabien COELHO  wrote:
>
> Applies, compiles, works for me.
>
> Very very minor comments that I should have noticed before, sorry for this
> additional round trip.

Thank you for the dedicated review!

>
> In the help line, move -I just after -i, to put short options in
> alphabetical and decreasing importance order. On this line, also add the
> information about the default, something like:
>
>   -i, --ini...   
>   -I, --custom=[...]+  (default "ctgvp")
>  ...

Agreed. Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v12.patch
Description: Binary data



[HACKERS] Re: DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-06 Thread Noah Misch
On Wed, Sep 06, 2017 at 03:28:47PM +0900, Masahiko Sawada wrote:
> On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher  wrote:
> > Arseny Sher  writes:
> >
> >> Attached patch fixes this by stopping workers before RO drop, as
> >> already done in case when we drop replication slot.
> >
> > Sorry, here is the patch.
> >
> 
> I could reproduce this issue, it's a bug. Added this to the open item.
> The cause of this is commit 7e174fa7 which make subscription ddls kill
> the apply worker only when transaction commit. I didn't look the patch
> deeply yet but I'm not sure the approach that kills apply worker
> before commit would be good.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] WIP: Failover Slots

2017-09-06 Thread Craig Ringer
On 14 August 2017 at 11:56, Craig Ringer  wrote:

>
> I don't want to block failover slots on decoding on standby just because
> decoding on standby would be nice to have.
>

However, during discussion with Thomas Munro a point has come up that does
block failover slots as currently envisioned - silent timeline divergence.
It's a solid reason why the current design and implementation is
insufficient to solve the problem. This issue exists both with the original
failover slots and with the model Robert and I were discussing.

Say a decoding client has replayed from master up to commit of xid 42 at
1/1000 and confirmed flush, then a failover slots standby of the master is
promoted. The standby has only received WAL from the failed master up to
1/500 with most recent xid 20. Now the standby does some other new xacts,
pushing xid up to 30 at 1/1000 then continuing to insert until xid 50 at
lsn 1/2000.

Then the logical client reconnects. The logical client will connect to the
failover slot fine, and start replay. But it'll ask for replay to start at
1/1000. The standby will happily fast-forward the slot (as it should), and
start replay after 1/1000.

But now we have silent divergence in timelines. The logical replica has
received and committed xacts 20...42 at lsn 1/500 through 1/1000, but these
are not present on the promoted master. And the replica has skipped over
the new-master's xids 20...30 with lsns 1/500 through 1/1000, so they're
present on the new master but not the replica.
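The failure condition in the scenario above can be stated compactly: divergence is possible whenever the logical client's confirmed flush LSN is ahead of what the promoted standby had received from the old master. A toy check, with made-up integer LSNs standing in for the 1/500 and 1/1000 positions above:

```c
#include <stdint.h>

/*
 * Divergence is possible when the logical client has confirmed flush
 * of WAL beyond the point the promoted standby received from the old
 * master: the client then holds commits the new master never saw, and
 * will skip the new master's replacement WAL in the same LSN range.
 */
static int
divergence_possible(uint64_t client_confirmed_flush_lsn,
                    uint64_t standby_received_upto_lsn)
{
    return client_confirmed_flush_lsn > standby_received_upto_lsn;
}
```

In the example above, the client confirmed up to 1/1000 while the standby had only received up to 1/500, so the check fires.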

IMO, this shows that not including the timeline in replication origins was
a bit of a mistake, since we'd trivially detect this if they were included
- but it's a bit late now.  And anyway, detection would just mean logical
rep would break, which doesn't help much.

The simplest fix, but rather limited, is to require that failover
candidates be in synchronous_standby_names, and delay ReorderBufferCommit
sending the actual commit message until all peers in s_s_n confirm flush of
the commit lsn. But that's not much good if you want sync rep for your
logical connections too, and is generally a hack.

A more general solution requires that masters be told which peers are
failover candidates, so they can ensure ordering between logical decoding
and physical failover candidates. Which effectively adds another kind of
sync rep, where we do "wait for physical failover candidates to flush, and
only then allow logical decoding". This actually seems pretty practical
with the design Robert and I discussed, but it's definitely an expansion in
scope.

Alternately, we could require the decoding clients to keep an eye on the
flush/replay positions of all failover candidates and delay commit+confirm
of decoded xacts until the upstream's failover candidates have received and
flushed up to that lsn. That starts to look a lot like a
decoding-on-standby model for logical failover, where the downstream
maintains slots on each failover candidate upstream.

So yeah. More work needed here. Even if we suddenly decided the original
failover slots model was OK, it's not sufficient to fully solve the problem.

(It's something I'd thought about for BDR failover, but never applied to
failover slots: the problem of detecting or preventing divergence when the
logical client is ahead of physical receive at the time the physical
standby is promoted.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane  wrote:
> The idea I'd had was to apply the masking only if pd_lower >=
> SizeOfPageHeaderData, or if you wanted to be stricter, only if
> pd_lower != 0.

If putting a check, it seems to me that the stricter one makes the
most sense. pd_lower should remain at 0 on pre-10 servers.
-- 
Michael




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane  wrote:
>> I looked briefly at these patches.  I'm not sure that it's safe for the
>> mask functions to assume that meta pages always have valid pd_lower.
>> What happens when replaying log data concerning an old index that doesn't
>> have that field filled?

> There will be inconsistency between the pages, and the masking check
> will complain.

That doesn't seem like a pleasant outcome to me.  The WAL consistency
check code is supposed to complain if there's some kind of replication
or replay failure, and this cannot be categorized as either.

The idea I'd had was to apply the masking only if pd_lower >=
SizeOfPageHeaderData, or if you wanted to be stricter, only if
pd_lower != 0.
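As a standalone sketch of that guard (SizeOfPageHeaderData is hard-coded as 24 bytes here purely for illustration):

```c
#include <stdint.h>

#define TOY_SIZE_OF_PAGE_HEADER_DATA 24	/* stand-in for SizeOfPageHeaderData */

/*
 * Mask unused space only when pd_lower looks initialized; a metapage
 * from an old cluster that never maintained pd_lower is left alone,
 * so the WAL consistency check does not complain about it.
 */
static int
should_mask_unused_space(uint16_t pd_lower, int strict)
{
    if (strict)
        return pd_lower != 0;           /* the stricter variant */
    return pd_lower >= TOY_SIZE_OF_PAGE_HEADER_DATA;
}
```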

regards, tom lane




Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-09-06 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund  wrote in 
<20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de>
> On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> > The problem is that the current ReadRecord needs the first one of
> > a series of continuation records from the same source with the
> > other part, the master in the case.
> 
> What's the problem with that?  We can easily keep track of the beginning
> of a record, and only confirm the address before that.

After a failure while reading a record locally, ReadRecord tries
streaming in order to read from the beginning of the record, which is
not on the master, then retries locally, and so on. This loops forever.

> > A (or the) solution closed in the standby side is allowing to
> > read a series of continuation records from multiple sources.
> 
> I'm not following. All we need to use is the beginning of the relevant
> records, that's easy enough to keep track of. We don't need to read the
> WAL or anything.

The beginning is already tracked, and there is nothing more to do.

I reconsidered that approach and found that it doesn't need such
destructive refactoring.

The first *problem* was that WaitForWALToBecomeAvailable requests the
beginning of a record, which is not on the page the function has
been told to fetch. While tliRecPtr is still required to determine
the TLI to request, the function should request RecPtr to be streamed.

The remaining work is to let XLogPageRead retry other sources
immediately. To do this I made ValidXLogPageHeader@xlogreader.c
public (and renamed it to XLogReaderValidatePageHeader).

The attached patch fixes the problem and passes the recovery
tests. However, a test for this problem is not added; it would need
to go to the last page in a segment, put a record that continues
into the next segment, then kill the standby after it has received
the previous segment but before it has received the whole record.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 8932a390bd3d1acfe5722bc62f42fc7e381ca617 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switching WAL source in the middle of a record.

The current recovery machinery assumes that the whole of a record is
available from a single source. This prevents a standby from restarting
under a certain condition. This patch allows switching the WAL source
while reading a series of continuation records.
---
 src/backend/access/transam/xlog.c   |  7 ++-
 src/backend/access/transam/xlogreader.c | 12 +---
 src/include/access/xlogreader.h |  5 +
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index df4843f..eef3a97 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11566,6 +11566,11 @@ retry:
 	Assert(reqLen <= readLen);
 
 	*readTLI = curFileTLI;
+
+	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
+	  (XLogPageHeader) readBuf))
+		goto next_record_is_invalid;
+
 	return readLen;
 
 next_record_is_invalid:
@@ -11700,7 +11705,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 		else
 		{
-			ptr = tliRecPtr;
+			ptr = RecPtr;
 			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 
 			if (curFileTLI > 0 && tli < curFileTLI)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0781a7b..aa05e3f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -27,8 +27,6 @@
 
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 
-static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-	XLogPageHeader hdr);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 	  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -545,7 +543,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 
 		hdr = (XLogPageHeader) state->readBuf;
 
-		if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+		if (!XLogReaderValidatePageHeader(state, targetSegmentPtr, hdr))
 			goto err;
 	}
 
@@ -582,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	/*
 	 * Now that we know we have the full header, validate it.
 	 */
-	if (!ValidXLogPageHeader(state, pageptr, hdr))
+	if (!XLogReaderValidatePageHeader(state, pageptr, hdr))
 		goto err;
 
 	/* update read state information */
@@ -709,9 +707,9 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 /*
  * Validate a page header
  */
-static bool
-ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-	XLogPageHeader hdr)
+bool
+XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
+			 XLogPageHeader hdr)
 {
 	XLogRecPtr	recaddr;
 	

Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Chapman Flack
On 09/06/17 18:33, Omar Kilani wrote:

> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?
> 
> We have several TB of data still using float datetimes and I'm trying
> to figure out how we can move to 10 (currently on 9.6.x) without
> massive amounts of $ in duplicated hardware or downtime.

I ran into an analogous issue with endianness of PL/Java-defined datatypes,
and ended up devising a procedure [1] for treating the type one way on
output and another way on input, and then constructing an UPDATE query
that just assigns the column to itself using a function that's essentially
identity, but forces the output-input conversion (and doesn't look like
a simple identity function to the query optimizer, which otherwise might
optimize the whole update away).

While the details of that were specific to PL/Java and byte order,
something similar might work in your case if you can afford some period,
either just before or just after migration, when your normal workload
is blocked, long enough to run such updates on your several TB of data.

Or if that's too big a chunk of downtime, you might be able to add
a column in advance, of some user-defined type the same width as an
integer datetime, populate that in advance, migrate, then do a quick
catalog switcheroo of the column names and types. I've never done that,
so someone else might have comments on its feasibility or the best way
of doing it.

> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if

That does seem important to get the details on. If the integer datetime
column won't represent your stuff (and the too-large or too-small values
are necessary to represent), then some schema change might be necessary.
Or, if the outlying values were sentinel values of some kind (I wonder,
how else would you even have datetimes outside of the int64 range?),
maybe they can be replaced with others that fit.

-Chap

[1]: https://tada.github.io/pljava/use/byteordermigrate.html




Re: [HACKERS] A design for amcheck heapam verification

2017-09-06 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 9:29 AM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
>  wrote:
>> Eh, if you want to optimize it for the case where debug output is not
>> enabled, make sure to use ereport() not elog().  ereport()
>> short-circuits evaluation of arguments, whereas elog() does not.
>
> I should do that, but it's still not really noticeable.

Since this patch has now bit-rotted, I attach a new revision, V2.
Apart from fixing some Makefile bitrot, this revision also makes some
small tweaks as suggested by Thomas and Alvaro. The documentation is
also revised and expanded, and now discusses practical aspects of the
set membership being tested using a Bloom filter, how that relates to
maintenance_work_mem, and so on.

Note that this revision does not let the Bloom filter caller use their
own dynamic shared memory, which is something that Thomas asked about.
While that could easily be added, I think it should happen later. I
really just wanted to make sure that my Bloom filter was not in some
way fundamentally incompatible with Thomas' planned enhancements to
(parallel) hash join.
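As background for reviewers, the data structure's contract can be illustrated with a toy, self-contained version. This is not the patch's implementation: the hash function, sizing, and seeding below are simplistic stand-ins for the patch's hash_any()-derived probes and work_mem-based sizing.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NBITS	1024	/* bitset size; the patch sizes from a memory hint */
#define TOY_NHASH	3		/* number of hash probes ("k" hash functions) */

typedef struct toy_bloom
{
	unsigned char bits[TOY_NBITS / 8];
} toy_bloom;

/* Simple mixing hash, seeded per probe; a stand-in for real k hashes. */
static uint64_t
toy_hash(uint64_t x, int seed)
{
	x ^= (uint64_t) seed * UINT64_C(0x9e3779b97f4a7c15);
	x *= UINT64_C(0xbf58476d1ce4e5b9);
	x ^= x >> 31;
	return x;
}

static void
toy_bloom_add(toy_bloom *f, uint64_t elem)
{
	for (int i = 0; i < TOY_NHASH; i++)
	{
		uint64_t	bit = toy_hash(elem, i) % TOY_NBITS;

		f->bits[bit / 8] |= (unsigned char) (1 << (bit % 8));
	}
}

/* Returns 1 for "definitely not in set", 0 for "possibly in set". */
static int
toy_bloom_lacks(toy_bloom *f, uint64_t elem)
{
	for (int i = 0; i < TOY_NHASH; i++)
	{
		uint64_t	bit = toy_hash(elem, i) % TOY_NBITS;

		if (!(f->bits[bit / 8] & (1 << (bit % 8))))
			return 1;
	}
	return 0;
}
```

Adding an element can never make toy_bloom_lacks() report it absent afterwards (no false negatives), while an element never added may still happen to probe all-set bits (a false positive), which is the trade-off the commit message describes.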

-- 
Peter Geoghegan
From d4dda95dd41204315dc12936fac83d2df8676992 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH 1/2] Add Bloom filter data structure implementation.

A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
---
 src/backend/lib/Makefile  |   4 +-
 src/backend/lib/README|   2 +
 src/backend/lib/bloomfilter.c | 297 ++
 src/include/lib/bloomfilter.h |  26 
 4 files changed, 327 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/bloomfilter.c
 create mode 100644 src/include/lib/bloomfilter.h

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index d1fefe4..191ea9b 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = binaryheap.o bipartite_match.o dshash.o hyperloglog.o ilist.o \
-	   knapsack.o pairingheap.o rbtree.o stringinfo.o
+OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
+   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index 5e5ba5e..376ae27 100644
--- a/src/backend/lib/README
+++ b/src/backend/lib/README
@@ -3,6 +3,8 @@ in the backend:
 
 binaryheap.c - a binary heap
 
+bloomfilter.c - probabilistic, space-efficient set membership testing
+
 hyperloglog.c - a streaming cardinality estimator
 
 pairingheap.c - a pairing heap
diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c
new file mode 100644
index 000..e93f9b0
--- /dev/null
+++ b/src/backend/lib/bloomfilter.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * bloomfilter.c
+ *		Minimal Bloom filter
+ *
+ * A Bloom filter is a probabilistic data structure that is used to test an
+ * element's membership of a set.  False positives are possible, but false
+ * negatives are not; a test of membership of the set returns either "possibly
+ * in set" or "definitely not in set".  This can be very space efficient when
+ * individual elements are larger than a few bytes, because elements are hashed
+ * in order to set bits in the Bloom filter bitset.
+ *
+ * Elements can be added to the set, but not removed.  The more elements that
+ * are added, the larger the probability of false positives.  Caller must hint
+ * an estimated total size of the set when its Bloom filter is initialized.
+ * This is used to balance the use of memory against the final false positive
+ * rate.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/bloomfilter.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/hash.h"
+#include "lib/bloomfilter.h"
+#include "utils/memutils.h"
+
+#define MAX_HASH_FUNCS	10
+#define NBITS(filt)		((1 << (filt)->bloom_power))
+
+typedef struct bloom_filter
+{
+	/* 2 ^ bloom_power is the size of the bitset (in bits) */
+	int			bloom_power;
+	unsigned char  *bitset;
+
+	/* K hash functions are used, which are randomly seeded */
+	int			k_hash_funcs;
+	

Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 10:42 AM, Amit Langote wrote:
> I too tend to think that any users who use this masking facility would
> know to expect to get these failures on upgraded clusters with invalid
> pd_lower in meta pages.

Yes, I don't think that an optimization reducing WAL that impacts all
users should be stopped by a small set of users who use an option for
development purposes.

> (PS: I wonder if it is reasonable to allow configuring the error level
> used when a masking failure occurs?  Currently, checkXLogConsistency()
> will abort the process (FATAL))

It definitely is worth it in my opinion, perhaps with an on/off switch
to trigger a warning instead. The reason why we use FATAL now is to
trigger more easily red flags for any potential buildfarm runs: a
startup process facing FATAL takes down the standby.

>> Perhaps we should document this point for wal_consistency_check?
>
> Do you mean permanently under wal_consistency_check parameter
> documentation or in the release notes under incompatibilities for the
> affected index types?

Under the parameter itself, in the spirit of a don't-do-that from an
upgraded instance.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 10:05 PM, Peter Eisentraut wrote:
> Please send a rebased patch.
>
> I propose splitting the single Perl script into three separate test
> files: one for basic command-line option handling and so on (I would
> like to expand that later), one for the main upgrade test, and one for
> the funny database names tests.

That makes sense. There will be additional overhead with the creation
of an extra server though.

> In the testing file, you removed the paragraph that explains how to do
> cross-version upgrade testing.  It's unfortunate that we would lose that
> functionality.  What can we do about that?

Right, simply removing support for something which has been here for a
long time is no fun. I think that we should add in PostgresNode
objects a new bindir variable which will be used to define path to
binaries. Any new node created needs to go through init() or
init_from_backup(), so a node created with init() would set this
bindir to what pg_config in PATH reports, or to the value defined by
the caller if it is defined (let's use an option for %params). A node
created from init_from_backup() inherits the path of its root node.
This requires a bit of refactoring first. This could also help with
cross-version tests outside of core.

In the existing scripts, there are the following variables:
- oldsrc, old version's source tree
- oldbindir, old version's installed bin dir
- bindir, this version's installed bin dir.
- libdir, this version's installed lib dir
bindir and libdir are pointing to the currently installed version in
the tree, so we could do without it, no? oldbindir and oldsrc need to
be kept around to enforce the position of binaries for the old
version, as well as a proper shape of the original dump being compared
(+ drop of the past functions).

Then, for the pg_upgrade tests, let's check for ENV{oldbindir} and
enforce the bindir value of the PostgresNode to-be-upgraded. And also
for ENV{oldsrc}, first check if it is defined, and then do the
existing psql/dump changes. So one, in order to run cross-version
checks, would just need to rely on the fact that the version where
installcheck runs is the new version. Does that sound acceptable?
Looking at 5bab198, those don't run that often, but I definitely agree
that breaking something for no apparent reason is not cool either ;p

> We also need to have a plan for handling the build farm.  Maybe keep the
> vcregress.pl upgradecheck target as a thin wrapper for the time being?

Or we could make upgradecheck a noop, then remove it once all the MSVC
animals have upgraded to a newer version of the buildfarm client which
does not use upgradecheck anymore (I am fine to send a patch or a pull
request to Andrew for that).
-- 
Michael




Re: [HACKERS] Pluggable storage

2017-09-06 Thread Thomas Munro
On Fri, Sep 1, 2017 at 1:51 PM, Haribabu Kommi  wrote:
> Here I attached new set of patches that are rebased to the latest master.

Hi Haribabu,

Could you please post a new rebased version?
0007-Scan-functions-are-added-to-storage-AM.patch conflicts with
recent changes.

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Amit Langote
On 2017/09/07 8:51, Michael Paquier wrote:
> On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane  wrote:
>> Amit Langote  writes:
>>> On 2017/08/22 9:39, Michael Paquier wrote:
>>>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote wrote:
>>>>> I updated brin_mask() and spg_mask() in the attached updated patches so
>>>>> that they consider meta pages as containing unused space.
>>
>> I looked briefly at these patches.  I'm not sure that it's safe for the
>> mask functions to assume that meta pages always have valid pd_lower.
>> What happens when replaying log data concerning an old index that doesn't
>> have that field filled?
> 
> There will be inconsistency between the pages, and the masking check
> will complain. My point here is that wal_consistency_checking is
> primarily used by developers on newly-deployed clusters to check WAL
> consistency by using installcheck. So an upgraded cluster would see
> diffs because of that, but I would think that nobody would really face
> them.

I too tend to think that any users who use this masking facility would
know to expect to get these failures on upgraded clusters with invalid
pd_lower in meta pages.


(PS: I wonder if it is reasonable to allow configuring the error level
used when a masking failure occurs?  Currently, checkXLogConsistency()
will abort the process (FATAL))

> Perhaps we should document this point for wal_consistency_check?

Do you mean permanently under wal_consistency_check parameter
documentation or in the release notes under incompatibilities for the
affected index types?

Thanks,
Amit





Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 11:44 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Peter Eisentraut wrote:
>>> We also need to have a plan for handling the build farm.  Maybe keep the
>>> vcregress.pl upgradecheck target as a thin wrapper for the time being?
>
>> The buildfarm already runs "make check" in src/bin/ when TAP tests are
>> enabled, which should be enough to trigger the rewritten test, no?
>
> I think Peter's on about the Windows case.  Not sure how that's handled,
> but it's not "make check".

For MSVC, one can use "vcregress.bat upgradecheck". So perhaps we
could keep upgradecheck for a short time but make it a noop instead
with this patch, and then remove it once buildfarm animals are
upgraded to a newer client version? I would prefer seeing a simple
removal of upgradecheck at the end, and put all TAP tests for binaries
under the bincheck path. This feels more natural.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 10:11 AM, Peter Eisentraut wrote:
> On 9/4/17 06:03, Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Michael Paquier  writes:
 I don't like breaking the abstraction of pg_log() with the existing
 flags with some kind of pg_debug() layer. The set of APIs present now
 in pg_rewind for logging is nice to have, and I think that those debug
 messages should be translated. So what about the attached?
>>>
>>> Your point about INT64_FORMAT not necessarily working with fprintf
>>> is an outstanding reason not to keep it like it is.  I've not reviewed
>>> this patch in detail but I think this is basically the way to fix it.
>>
>> Actually this code goes through vsnprintf, not fprintf, which should be
>> safe, so I removed that part of the comment, and pushed.
>
> Is there a reason this was not backpatched to 9.5?

Indeed. Please note that cherry-picking the fix from 23c1f0a works just fine.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-09-06 Thread Peter Eisentraut
On 9/4/17 06:03, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Michael Paquier  writes:
>>> I don't like breaking the abstraction of pg_log() with the existing
>>> flags with some kind of pg_debug() layer. The set of APIs present now
>>> in pg_rewind for logging is nice to have, and I think that those debug
>>> messages should be translated. So what about the attached?
>>
>> Your point about INT64_FORMAT not necessarily working with fprintf
>> is an outstanding reason not to keep it like it is.  I've not reviewed
>> this patch in detail but I think this is basically the way to fix it.
> 
> Actually this code goes through vsnprintf, not fprintf, which should be
> safe, so I removed that part of the comment, and pushed.

Is there a reason this was not backpatched to 9.5?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Minor code improvement to postgresGetForeignPlan

2017-09-06 Thread Tatsuro Yamada

On 2017/09/07 6:52, Tom Lane wrote:

Tatsuro Yamada  writes:

The declaration of postgresGetForeignPlan uses baserel, but
the actual definition uses foreignrel. It would be better to sync.


Pushed, thanks.

regards, tom lane


Thanks!

Regards,
Tatsuro Yamada






Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane  wrote:
> Amit Langote  writes:
>> On 2017/08/22 9:39, Michael Paquier wrote:
>>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote wrote:
 I updated brin_mask() and spg_mask() in the attached updated patches so
 that they consider meta pages as containing unused space.
>
> I looked briefly at these patches.  I'm not sure that it's safe for the
> mask functions to assume that meta pages always have valid pd_lower.
> What happens when replaying log data concerning an old index that doesn't
> have that field filled?

There will be inconsistency between the pages, and the masking check
will complain. My point here is that wal_consistency_checking is
primarily used by developers on newly-deployed clusters to check WAL
consistency by using installcheck. So an upgraded cluster would see
diffs because of that, but I would think that nobody would really face
them. Perhaps we should document this point for wal_consistency_check?
Getting the patch discussed on this thread into v10 would have saved
the day here, but this boat has sailed already.
-- 
Michael




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 1:43 AM, Robert Haas  wrote:
> On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs  wrote:
 I'd personally be fine with --no-whatever for any whatever that might
 be a subsidiary property of database objects.  We've got
 --no-security-labels, --no-tablespaces, --no-owner, and
 --no-privileges already, so what's wrong with --no-comments?

 (We've also got --no-publications; I think it's arguable whether that
 is the same kind of thing.)
>>>
>>> And --no-subscriptions in the same bucket.
>>
>> Yes, it is. I was suggesting that we remove those as well.

FWIW, I do too. They are useful for given application code paths.

> That seems like a non-starter to me.  I have used those options many
> times to solve real problems, and I'm sure other people have as well.
> We wouldn't have ended up with all of these options if users didn't
> want to control such things.

As there begins to be many switches of this kind and much code
duplication, I think that some refactoring into a more generic switch
infrastructure would be nicer.
-- 
Michael




Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Tom Lane
Omar Kilani  writes:
> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?

Um, not really.  You may be stuck on 9.6 until you can spare the effort
to convert.  The physical representations of timestamps are totally
different in the two cases.

> I did attempt a pg_dumpall | pg_restore at one point but for whatever
> reason we had data in tables that integer datetimes fails on (I forget
> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if
> it would be helpful).

I'm pretty sure the minimum values are the same in both cases, to wit
Julian day zero.  As for the max, according to the old code comments

 * The upper limit for dates is 5874897-12-31, which is a bit less than what
 * the Julian-date code can allow.  We use that same limit for timestamps when
 * using floating-point timestamps ... For integer timestamps, the upper
 * limit is 294276-12-31.

I would hope that any timestamps you've got beyond 294276AD are erroneous
data that you need to clean up anyway.

regards, tom lane




Re: [HACKERS] Red-Black tree traversal tests

2017-09-06 Thread Tom Lane
[ btw, please don't cc pgsql-hackers-owner, the list moderators don't
need the noise ]

Aleksander Alekseev  writes:
> I would say that this patch is in a pretty good shape now. And I see a
> 99% code coverage of rbtree.c. Let's see what committers think.

I took a quick look through the patch --- haven't tried to compile it
or anything like that --- and have a few comments:

* There's some typos, eg extention should be extension, triversal
should be traversal.  Maybe try a spell checker?

* The diff complains about lack of file-ending newlines in several
places.

* There's something weird at the start of test.c:

@@ -0,0 +1,577 @@
+/*--

Maybe your compiler thinks that's a BOM?  It's hard to see how it
compiles otherwise.

* I think it might be simpler to have the module expose just one SQL
function that invokes all these individual tests internally.  Less
boilerplate text that way, and less to change if you add more tests
later.  Also, you might consider passing in TEST_SIZE as an argument
of the SQL function instead of having it be hard-wired.

* We don't do C++-style (//) comments around here.  Please use C
style.  (You might consider running the code through pgindent,
which will convert those comments automatically.)

* It's also generally not per project style to use malloc/calloc/free
directly in backend code; and it's certainly not cool to use malloc or
calloc and then not check for a null result.  Use palloc instead.  Given
the short runtime of this test, you likely needn't be too worried about
pfree'ing stuff.

* _PG_init() declaration seems to be a leftover?  The bare system-header
#include doesn't belong here either, as postgres.h will bring that in for you.

* I know next to zip about red-black trees, but it disturbs me that
the traversal tests use trees whose values were inserted in strictly
increasing order.  Seems like that's not proving as much as it could.
I wonder how hard it would be to insert those integers in random order.

* I'm not too pleased that the rb_find calls mostly just assume that
the find won't return NULL.  You should be testing for that and reporting
the problem, not just dying on a null pointer crash if it happens.

* Possibly the tests should exercise rb_delete on keys *not* present.
And how about insertion of already-existing keys?  Although that's
a bit outside the scope of testing traversal, so if you want to leave
it for some future expansion, that'd be OK.

I'll set this back to Waiting on Author.  I do encourage you to finish
it up.

regards, tom lane




Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Omar Kilani
Hi,

I know I'm 7 months late to this, but only just read the beta 4 release notes.

Is there anything people using float datetimes can do that isn't a
pg_dumpall | pg_restore to do a less painful update?

We have several TB of data still using float datetimes and I'm trying
to figure out how we can move to 10 (currently on 9.6.x) without
massive amounts of $ in duplicated hardware or downtime.

I did attempt a pg_dumpall | pg_restore at one point but for whatever
reason we had data in tables that integer datetimes fails on (I forget
the exact crash, but the datetime values were either too small or too
large to fit into the integer datetimes field -- I can retry this if
it would be helpful).

Thanks.

Regards,
Omar

On Mon, Feb 27, 2017 at 5:13 PM, Andres Freund  wrote:
> On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote:
>> On 02/22/2017 02:45 PM, Tom Lane wrote:
>> > Andres Freund  writes:
>> > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
>> > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes
>> > > > in HEAD, and just agreeing that in the back branches, use of 
>> > > > replication
>> > > > protocol with float-timestamp servers is not supported and we're not
>> > > > going to bother looking for related bugs there.  Given the lack of 
>> > > > field
>> > > > complaints, I do not believe anyone cares.)
>> >
>> > What I *am* willing to spend time on is removing float-timestamp code
>> > in HEAD.  I've not yet heard anybody speak against doing that (or at
>> > least, nothing I interpreted as a vote against it).  If I've not heard
>> > any complaints by tomorrow, I'll get started on that.
>>
>> Rip it out!
>
> Already happened: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3
>
>




Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro wrote:
>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>
> This breaks the documentation build, because
> doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id.

Thanks for looking into it.

I suppose that the solution is to change the 9.6 release notes to no
longer have a replacement_sort_tuples link. Anyone else have an
opinion on that?

-- 
Peter Geoghegan




Re: [HACKERS] Minor code improvement to postgresGetForeignPlan

2017-09-06 Thread Tom Lane
Tatsuro Yamada  writes:
> The declaration of postgresGetForeignPlan uses baserel, but
> the actual definition uses foreignrel. It would be better to sync.

Pushed, thanks.

regards, tom lane




Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Thomas Munro
On Fri, Sep 1, 2017 at 6:00 AM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan  wrote:
>> I may submit the simple patch to remove replacement selection, if
>> other contributors are receptive. Apart from everything else, the
>> "incrementalism" of replacement selection works against cleverer batch
>> memory management of the type I just outlined, which seems like a
>> useful direction to take tuplesort in.
>
> I attach a patch to remove replacement selection, which I'll submit to CF 1.

This breaks the documentation build, because
doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] merge psql ef/ev sf/sv handling functions

2017-09-06 Thread Tom Lane
Fabien COELHO  writes:
> Here is an attempt at merging some functions which removes 160 lines of 
> code.

Pushed with minor adjustments.  I thought a couple of variable names
could be better chosen, but mostly, you can't do this sort of thing:

-        psql_error("The server (version %s) does not support editing function source.\n",
+        psql_error("The server (version %s) does not support editing %s.\n",
                    formatPGVersionNumber(pset.sversion, false,
-                                         sverbuf, sizeof(sverbuf)));
+                                         sverbuf, sizeof(sverbuf)),
+                   is_func ? "function source" : "view definitions");

It's too much of a pain in the rear for translators.  See
https://www.postgresql.org/docs/devel/static/nls-programmer.html#nls-guidelines
Usually we just use two independent messages, and that's what I did.
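
The guideline amounts to selecting a complete message rather than splicing fragments into one format string. A minimal illustration (hypothetical helper, not the committed psql code; _() is stubbed out so the sketch stands alone):

```c
#include <string.h>

#define _(x) (x)				/* gettext marker, stubbed for this sketch */

/* Return a whole translatable sentence; translators never see spliced
 * fragments like "function source" / "view definitions" in isolation. */
static const char *
no_edit_support_message(int is_func)
{
	return is_func
		? _("The server (version %s) does not support editing function source.\n")
		: _("The server (version %s) does not support editing view definitions.\n");
}
```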

regards, tom lane




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Tom Lane
Amit Langote  writes:
> On 2017/08/22 9:39, Michael Paquier wrote:
>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote wrote:
>>> I updated brin_mask() and spg_mask() in the attached updated patches so
>>> that they consider meta pages as containing unused space.

I looked briefly at these patches.  I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?

regards, tom lane




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane  wrote:
>> I'm not entirely following.  I thought that add_path was set up to treat
>> "can be parallelized" as an independent dimension of merit, so that
>> parallel paths would always survive.

> Here, the Gather path is not parallel-safe, but rather
> parallel-restricted:

Ah, right, the problem is with the Gather not its sub-paths.

>> Might be a tad messy to rearrange things that way.

> Why do you think I wanted you to do it?  :-)

I'll think about it.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote:
> Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
> although now that you mention it, it's not clear to me why we
> couldn't simplify

> - return *(&ptr->value);
> + return ptr->value;

Just to check, I applied that change to pg_atomic_read_u32_impl and
pg_atomic_read_u64_impl, and recompiled.  I get bit-for-bit the
same backend executable.  Maybe it would have an effect on some
other compiler, but I doubt it, except perhaps at -O0.

regards, tom lane




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> In particular, as Jeff and Amit point out, it
>> may well be that (a) before apply_projection_to_path(), the cheapest
>> plan is non-parallel and (b) after apply_projection_to_path(), the
>> cheapest plan would be a Gather plan, except that it's too late
>> because we've already thrown that path out.
>
> I'm not entirely following.  I thought that add_path was set up to treat
> "can be parallelized" as an independent dimension of merit, so that
> parallel paths would always survive.

It treats parallel-safety as an independent dimension of merit; a
parallel-safe plan is more meritorious than one of equal cost which is
not.  We need that because, for example, forming a partial
path for a join means joining a partial path to a parallel-safe path.
But that doesn't help us here; that's to make sure we can build the
necessary stuff *below* the Gather.  IOW, if we threw away
parallel-safe paths because there was a cheaper parallel-restricted
path, we might be unable to build a partial path for the join *at
all*.

Here, the Gather path is not parallel-safe, but rather
parallel-restricted: it's OK for it to exist in a plan that uses
parallelism (duh), but it can't be nested under another Gather (also
duh, kinda).  So before accounting for the differing projection cost,
the Gather path is doubly inferior: it is more expensive AND not
parallel-safe, whereas the competing non-parallel plan is both cheaper
AND parallel-safe.  After applying the expensive target list, the
parallel-safe plan gets a lot more expensive, but the Gather path gets
more expensive to a lesser degree because the projection step ends up
below the Gather and thus happens in parallel, so now the Gather plan,
still a loser on parallel-safety, is a winner on total cost and thus
ought to have been retained and, in fact, ought to have won.  Instead,
we threw it out too early.
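
The bookkeeping described above can be caricatured with a two-dimensional dominance test (a toy struct, not the planner's Path or add_path logic):

```c
#include <stdbool.h>

/* Toy model: a path survives unless a rival is at least as good on
 * every dimension of merit, cost *and* parallel safety. */
typedef struct ToyPath
{
	double		total_cost;
	bool		parallel_safe;
} ToyPath;

static bool
dominates(const ToyPath *a, const ToyPath *b)
{
	return a->total_cost <= b->total_cost &&
		(a->parallel_safe || !b->parallel_safe);
}
```

Before the expensive target list is applied, a {cost 10, safe} path dominates a {cost 12, unsafe} Gather outright; after projection the same pair might look like {cost 50, safe} versus {cost 20, unsafe}, where neither dominates, so discarding the Gather at the earlier costing loses the eventual winner.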

>> What we ought to do, I think, is avoid generating gather paths until
>> after we've applied the target list (and the associated costing
>> changes) to both the regular path list and the partial path list.
>
> Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> In particular, as Jeff and Amit point out, it
> may well be that (a) before apply_projection_to_path(), the cheapest
> plan is non-parallel and (b) after apply_projection_to_path(), the
> cheapest plan would be a Gather plan, except that it's too late
> because we've already thrown that path out.

I'm not entirely following.  I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

> What we ought to do, I think, is avoid generating gather paths until
> after we've applied the target list (and the associated costing
> changes) to both the regular path list and the partial path list.

Might be a tad messy to rearrange things that way.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
>> I think we can just use "old = ptr->value" to set up for the cmpxchg
>> loop in every generic.h function that uses such a loop.

> I think we might have been talking past each other - I thought you were
> talking about changing the pg_atomic_read_u64_impl implementation for
> external users.

Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
although now that you mention it, it's not clear to me why we
couldn't simplify

-   return *(&ptr->value);
+   return ptr->value;

AFAIK, the compiler is entitled to, and does, simplify away that
take-a-pointer-and-immediately-dereference-it dance.  If it did
not, a whole lot of standard array locutions would be much less
efficient than they should be.  What matters here is the volatile
qualifier, which we've already got.
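
For illustration, the self-correcting loop under discussion looks roughly like this (a sketch using GCC/Clang __atomic builtins rather than PostgreSQL's port layer):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	volatile uint64_t value;
} my_atomic_u64;

/* Exchange built from a compare-and-swap loop.  The seed is a plain
 * read of the volatile field; if it is stale, the first CAS fails and
 * refreshes 'old', so no stronger read is needed up front. */
static uint64_t
my_atomic_exchange_u64(my_atomic_u64 *ptr, uint64_t newval)
{
	uint64_t	old = ptr->value;

	while (!__atomic_compare_exchange_n(&ptr->value, &old, newval,
										false,
										__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
		;						/* 'old' now holds the current value */
	return old;
}
```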

regards, tom lane




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 3:18 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
>>> If somebody's applying apply_projection_to_path to a path that's already
>>> been add_path'd, that's a violation of the documented restriction.
>
>> /me is confused.  Isn't that exactly what grouping_planner() is doing,
>> and has done ever since your original pathification commit
>> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
>> current_rel->pathlist, so surely everything in there has been
>> add_path()'d.
>
> I think the assumption there is that we no longer care about validity of
> the input Relation, since we won't be looking at it any more (and
> certainly not adding more paths to it).  If there's some reason why
> that's not true, then maybe grouping_planner has a bug there.

Right, that's sorta what I assumed.  But I think that thinking is
flawed in the face of parallel query, because of the fact that
apply_projection_to_path() pushes down target list projection below
Gather when possible.  In particular, as Jeff and Amit point out, it
may well be that (a) before apply_projection_to_path(), the cheapest
plan is non-parallel and (b) after apply_projection_to_path(), the
cheapest plan would be a Gather plan, except that it's too late
because we've already thrown that path out.

What we ought to do, I think, is avoid generating gather paths until
after we've applied the target list (and the associated costing
changes) to both the regular path list and the partial path list.
Then the cost comparison is apples-to-apples.  The use of
apply_projection_to_path() on every path in the pathlist would be fine
if it were adjusting all the costs by a uniform amount, but it isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> >> It looks to me like two of the three implementations promise no such
> >> thing.
> 
> > They're volatile vars, so why not?
> 
> Yeah, but so are the caller's variables.  That is, in
> 
> pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
> {
>   uint64 old;
>   old = ptr->value;
> 
> ISTM that the compiler is required to actually fetch ptr->value, not
> rely on some previous read of it.  I do not think that (the first
> version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
> there already.
> 
> >> Even if they somehow do, it hardly matters given that the cmpxchg loop
> >> would be self-correcting.
> 
> > Well, in this one instance maybe, hardly in others.
> 
> All the functions involved use nigh-identical cmpxchg loops.
> 
> > What are you suggesting as an alternative?
> 
> I think we can just use "old = ptr->value" to set up for the cmpxchg
> loop in every generic.h function that uses such a loop.

I think we might have been talking past each other - I thought you were
talking about changing the pg_atomic_read_u64_impl implementation for
external users.

Greetings,

Andres Freund




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
>> It looks to me like two of the three implementations promise no such
>> thing.

> They're volatile vars, so why not?

Yeah, but so are the caller's variables.  That is, in

pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
{
uint64 old;
old = ptr->value;

ISTM that the compiler is required to actually fetch ptr->value, not
rely on some previous read of it.  I do not think that (the first
version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
there already.

>> Even if they somehow do, it hardly matters given that the cmpxchg loop
>> would be self-correcting.

> Well, in this one instance maybe, hardly in others.

All the functions involved use nigh-identical cmpxchg loops.

> What are you suggesting as an alternative?

I think we can just use "old = ptr->value" to set up for the cmpxchg
loop in every generic.h function that uses such a loop.

regards, tom lane




Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-09-06 Thread Andres Freund
On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> Hi,
> 
> At Mon, 4 Sep 2017 17:17:19 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> >> However, if that's the reasoning, why don't we make all of these
> >> use simple reads?  It seems unlikely that a locked read is free.
> 
> > We don't really use locked reads? All the _atomic_ wrapper forces is an
> > actual read from memory rather than a register.
> 
> It looks to me like two of the three implementations promise no such
> thing.

They're volatile vars, so why not?


> Even if they somehow do, it hardly matters given that the cmpxchg loop
> would be self-correcting.

Well, in this one instance maybe, hardly in others.


> Mostly, though, I'm looking at the fallback pg_atomic_read_u64_impl
> implementation (with a CAS), which seems far more expensive than can
> be justified for this.

What are you suggesting as an alternative?

- Andres




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
>> If somebody's applying apply_projection_to_path to a path that's already
>> been add_path'd, that's a violation of the documented restriction.

> /me is confused.  Isn't that exactly what grouping_planner() is doing,
> and has done ever since your original pathification commit
> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
> current_rel->pathlist, so surely everything in there has been
> add_path()'d.

I think the assumption there is that we no longer care about validity of
the input Relation, since we won't be looking at it any more (and
certainly not adding more paths to it).  If there's some reason why
that's not true, then maybe grouping_planner has a bug there.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
>> However, if that's the reasoning, why don't we make all of these
>> use simple reads?  It seems unlikely that a locked read is free.

> We don't really use locked reads? All the _atomic_ wrapper forces is an
> actual read from memory rather than a register.

It looks to me like two of the three implementations promise no such
thing.  Even if they somehow do, it hardly matters given that the cmpxchg
loop would be self-correcting.  Mostly, though, I'm looking at the
fallback pg_atomic_read_u64_impl implementation (with a CAS), which
seems far more expensive than can be justified for this.

regards, tom lane




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
>>> Okay, now I understand your point, but I think we already change the
>>> cost of paths in apply_projection_to_path which is done after add_path
>>> for top level scan/join paths.
>
>> Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)
>
> Yeah, and it's also documented:
>
>  * This has the same net effect as create_projection_path(), except that if
>  * a separate Result plan node isn't needed, we just replace the given path's
>  * pathtarget with the desired one.  This must be used only when the caller
>  * knows that the given path isn't referenced elsewhere and so can be modified
>  * in-place.
>
> If somebody's applying apply_projection_to_path to a path that's already
> been add_path'd, that's a violation of the documented restriction.

/me is confused.  Isn't that exactly what grouping_planner() is doing,
and has done ever since your original pathification commit
(3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
current_rel->pathlist, so surely everything in there has been
add_path()'d.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Fabien COELHO



>> Thus short, simple but meaningful examples which show how to do something
>> useful with all that in the documentation may help people take advantage
>> of these new features.
>
> I don't have an objection to providing an example.  I wasn't terribly
> impressed with Simon's version, but maybe we can do better.

That was just a quick idea to show how they could be used, probably it can
be improved.

> I think it should be concrete, not partly abstract; and likely it needs
> to be with \if not the variable proper.

ISTM that currently there is no "not". Maybe I do not understand your
sentence.

>> Given my experience with "\d*", I'm not sure I would assume that a new
>> psql feature would work with older servers.
>
> I don't recall anyone questioning whether PQserverVersion() would work
> with older servers.  Being able to tell what version the server is is
> sort of the point, no?  If there were a restriction on what it would
> work with, I would agree that that needs to be documented.  But I
> don't think "yes, it really works" is a helpful use of documentation
> space.

Sure.

I can use psql without knowing that there is a libpq underneath, that it
contains a PQserverVersion function implemented since pg7, and that the
SERVER_VERSION_* variables introduced in pg10 or pg11 rely on that thus it
should work with pg7/8/9 as well. It is not obvious, as a new feature
could depend on any combination of server side functions, protocol
extension, client library functions or client code...


--
Fabien.




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Pavel Stehule
>
>
> I think a GUC is a decent, though not perfect, mechanism for this.
> This problem isn't restricted to PL/pgsql; indeed, the cases I've seen
> have come via prepared queries, not PL/pgsql functions.  Even without
> that, one advantage of a GUC is that they are fairly broadly
> understood and experienced users understand what they can do with
> them.  They can be set at various different scopes (system, user,
> database, SET clause for a particular function) and it's relatively
> convenient to do so.  Some kind of PL/pgsql-specific PRAGMA syntax is
> more likely to be overlooked by users who would actually benefit from
> it, and also won't cover non-PL/pgsql cases.  If we were going to go
> the PRAGMA route, it would make more sense to me to define that as a
> way of setting a GUC for the scope of one PL/pgsql block, like PRAGMA
> SETTING(custom_plan_tries, 0).  I think it is in general unfortunate
> that we don't have a mechanism to change a GUC for the lifespan of one
> particular query, like this:
>
> LET custom_plan_tries = 0 IN SELECT ...
>
> I bet a lot of people would find that quite convenient.  The problem
> of needing to set a planner GUC for one particular query is pretty
> common, and Pavel is absolutely right that having to do SET beforehand
> and RESET afterward is ugly.
>

Currently only prepared statements and PL/pgSQL use the plan cache, as far
as I know - so a special GUC makes sense only for these two environments.

I understand that a GUC can be used, and makes sense, when users cannot
modify the source code or when they want to apply the change globally.

For PL/pgSQL there should be block- or statement-level syntax - PRAGMA is
the well-known alternative from the Ada language. Maybe another syntax,
like

BEGIN SET xxx = 1, yyy = 2 .. END ... could work; theoretically we can
introduce a block-level GUC. I don't like it, because there is already
successful syntax in Ada and PL/SQL.

It might also be useful to enhance the PREPARE command to accept a second
optional parameter controlling the plan cache - I have no idea about the
syntax yet.

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
Hi,

On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> I wrote:
> > Anyway, I don't have a big objection to applying this.  My concern
> > is more that we need to be taking a harder look at other parts of
> > the atomics infrastructure, because tweaks there are likely to buy
> > much more.
> 
> I went ahead and pushed these patches, adding __sync_fetch_and_sub
> since gcc seems to provide that on the same footing as these other
> functions.
> 
> Looking at these generic functions a bit closer, I notice that
> most of them are coded like
> 
>   old = pg_atomic_read_u32_impl(ptr);
>   while (!pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_))
>   /* skip */;
> 
> but there's an exception: pg_atomic_exchange_u64_impl just does
> 
>   old = ptr->value;
>   while (!pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_))
>   /* skip */;
> 
> That's interesting.  Why not pg_atomic_read_u64_impl there?

Because our platforms don't generally guarantee that 64bit reads are atomic:

/*
 * 64 bit reads aren't safe on all platforms. In the generic
 * implementation implement them as a compare/exchange with 0. That'll
 * fail or succeed, but always return the old value. Possible might store
 * a 0, but only if the prev. value also was a 0 - i.e. harmless.
 */
pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);

so I *guess* I decided doing an unnecessary cmpxchg wasn't useful. But
since then we've added an optimization for platforms where we know 64bit
reads have single copy atomicity. So we could change that now.


> I think that the simple read is actually okay as it stands: if we
> are unlucky enough to get a torn read, the first compare/exchange
> will fail to compare equal, and it will replace "old" with a valid
> atomically-read value, and then the next loop iteration has a chance
> to succeed.  Therefore there's no need to pay the extra cost of a
> locked read instead of an unlocked one.
> 
> However, if that's the reasoning, why don't we make all of these
> use simple reads?  It seems unlikely that a locked read is free.

We don't really use locked reads? All the _atomic_ wrapper forces is an
actual read from memory rather than a register.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Tom Lane
Fabien COELHO  writes:
> Thus short, simple but meaningful examples which show how to do something 
> useful with all that in the documentation may help people take advantage 
> of these new features.

I don't have an objection to providing an example.  I wasn't terribly
impressed with Simon's version, but maybe we can do better.  I think
it should be concrete, not partly abstract; and likely it needs to be
with \if not the variable proper.

> Given my experience with "\d*", I'm not sure I would assume that a new 
> psql feature would work with older servers.

I don't recall anyone questioning whether PQserverVersion() would work
with older servers.  Being able to tell what version the server is is
sort of the point, no?  If there were a restriction on what it would
work with, I would agree that that needs to be documented.  But I
don't think "yes, it really works" is a helpful use of documentation
space.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote:
> Anyway, I don't have a big objection to applying this.  My concern
> is more that we need to be taking a harder look at other parts of
> the atomics infrastructure, because tweaks there are likely to buy
> much more.

I went ahead and pushed these patches, adding __sync_fetch_and_sub
since gcc seems to provide that on the same footing as these other
functions.

Looking at these generic functions a bit closer, I notice that
most of them are coded like

old = pg_atomic_read_u32_impl(ptr);
while (!pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_))
/* skip */;

but there's an exception: pg_atomic_exchange_u64_impl just does

old = ptr->value;
while (!pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_))
/* skip */;

That's interesting.  Why not pg_atomic_read_u64_impl there?

I think that the simple read is actually okay as it stands: if we
are unlucky enough to get a torn read, the first compare/exchange
will fail to compare equal, and it will replace "old" with a valid
atomically-read value, and then the next loop iteration has a chance
to succeed.  Therefore there's no need to pay the extra cost of a
locked read instead of an unlocked one.

However, if that's the reasoning, why don't we make all of these
use simple reads?  It seems unlikely that a locked read is free.

If there's actually a reason for pg_atomic_exchange_u64_impl to be
different from the rest, it needs to have a comment explaining why.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Fabien COELHO


Hello,


>> * Clarification that this will work for current AND past server versions
>
> The short answer is it works.  I do not think we need a longer answer.

To have something operational you have to know quite a bit of psql
details (:-substitutions, backslash command logic, gset, if, quit...).

Thus short, simple but meaningful examples which show how to do something
useful with all that in the documentation may help people take advantage
of these new features.

Given my experience with "\d*", I'm not sure I would assume that a new
psql feature would work with older servers.


--
Fabien.




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
>> Okay, now I understand your point, but I think we already change the
>> cost of paths in apply_projection_to_path which is done after add_path
>> for top level scan/join paths.

> Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)

Yeah, and it's also documented:

 * This has the same net effect as create_projection_path(), except that if
 * a separate Result plan node isn't needed, we just replace the given path's
 * pathtarget with the desired one.  This must be used only when the caller
 * knows that the given path isn't referenced elsewhere and so can be modified
 * in-place.

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

It might be that we should just get rid of apply_projection_to_path and
use create_projection_path, which is less mistake-prone at the cost of
manufacturing another level of Path node.  Now that that has the dummypp
flag, it really shouldn't make any difference in terms of the accuracy of
the cost estimates.

> I'd feel a lot happier if Tom were to decide how this ought to be
> fixed, because - in spite of some modifications by various parallel
> query code - this is basically all his design and mostly his code.

I can take a look, but not right away.

regards, tom lane




Re: [HACKERS] [PATCH] Add citext_pattern_ops to citext contrib module

2017-09-06 Thread Jacob Champion
On Tue, Jul 18, 2017 at 5:18 AM, Alexey Chernyshov
 wrote:
> Hi all,

Hi Alexey, I took a look at your patch. Builds fine here, and passes
the new tests.

I'm new to this code, so take my review with a grain of salt.

> The attached patch introduces citext_pattern_ops for citext extension type
> like text_pattern_ops for text type. Here are operators ~<~, ~<=~, ~>~, ~>=~
> combined into citext_pattern_ops operator class. These operators simply
> compare underlying citext values as C strings with memcmp() function.

Are there any cases where performing the str_tolower with the default
collation, then comparing byte-by-byte, could backfire? The added test
cases don't make use of any multibyte/accented characters, so it's not
clear to me yet what *should* be happening in those cases.

It also might be a good idea to add some test cases that compare
strings of different lengths, to exercise all the paths in
internal_citext_pattern_cmp().

> +-- test citext_pattern_cmp() function explicetily.

Spelling nitpick in the new SQL: s/explicetily/explicitly .

--Jacob




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Jesper Pedersen

Hi Jeff,

On 09/05/2017 03:47 PM, Jeff Janes wrote:

>> I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using
>> both logged and unlogged tables. Also ran an internal benchmark which
>> didn't show anything either.
>
> What scale factor and client count? How many cores per socket?  It looks
> like Sokolov was just starting to see gains at 200 clients on 72 cores,
> using -N transaction.

I have done a run with scale factor 300, and another with 3000 on a
2S/28C/56T/256Gb w/ 2 x RAID10 SSD machine; up to 200 clients.

I would consider the runs as "noise" as I'm seeing +-1% for all client
counts, so nothing like Yura is seeing in [1] for the higher client counts.

I did a run with -N too using scale factor 300, using the settings in
[1], but with same result (+-1%).

[1]
https://www.postgresql.org/message-id/d62d7d9d473d07e172d799d5a57e7...@postgrespro.ru


Best regards,
 Jesper




Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 10:27, Tom Lane  wrote:
> Simon Riggs  writes:
>> Other than that, I'm good to commit.
>> Any last minute objections?
>
> The docs and comments could stand a bit closer copy-editing by a
> native English speaker.  Other than that, it seemed sane in a
> quick read-through.

Will do

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
> Okay, now I understand your point, but I think we already change the
> cost of paths in apply_projection_to_path which is done after add_path
> for top level scan/join paths.

Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)

It's used in various places with comments like this:

/*
 * The path might not return exactly what we want, so fix that.  (We
 * assume that this won't change any conclusions about which was the
 * cheapest path.)
 */

And in another place:

 * In principle we should re-run set_cheapest() here to identify the
 * cheapest path, but it seems unlikely that adding the same tlist
 * eval costs to all the paths would change that, so we don't bother.

I think these assumptions were a little shaky even before parallel
query came along, but they're now outright false, because we're not
adding the *same* tlist eval costs to all paths any more.  The
parallel paths are getting smaller costs.  That probably doesn't
matter much if the expressions in questions are things like a + b, but
when as in Jeff's example it's slow(a), then it matters a lot.

I'd feel a lot happier if Tom were to decide how this ought to be
fixed, because - in spite of some modifications by various parallel
query code - this is basically all his design and mostly his code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Other than that, I'm good to commit.
> Any last minute objections?

The docs and comments could stand a bit closer copy-editing by a
native English speaker.  Other than that, it seemed sane in a
quick read-through.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> It's not a question of whether the return value is used, but of
> whether the updated value of *old is used.

Right, but if we re-read "old" in the loop, and if the primitive
doesn't return "old" (or does, but call site ignores it) then in
principle the CAS might be strength-reduced a bit.  Whether that
can win enough to be better than removing the unlocked read,
I dunno.

In the case at hand, looking at a loop like

while (count-- > 0)
{
(void) pg_atomic_fetch_or_u32(myptr, 1);
}

with our HEAD code and RHEL6's gcc I get this for the inner loop:

.L9:
        movl    (%rdx), %eax
        movl    %eax, %ecx
        orl     $1, %ecx
        lock
        cmpxchgl %ecx, (%rdx)
        setz    %cl
        testb   %cl, %cl
        je      .L9
        subq    $1, %rbx
        testq   %rbx, %rbx
        jg      .L9

Applying the proposed generic.h patch, it becomes

.L10:
        movl    (%rsi), %eax
.L9:
        movl    %eax, %ecx
        orl     $1, %ecx
        lock
        cmpxchgl %ecx, (%rdx)
        setz    %cl
        testb   %cl, %cl
        je      .L9
        subq    $1, %rbx
        testq   %rbx, %rbx
        jg      .L10

Note that in both cases the cmpxchgl is coming out of the asm construct in
pg_atomic_compare_exchange_u32_impl from atomics/arch-x86.h, so that even
if a simpler assembly instruction were possible given that we don't need
%eax to be updated, there's no chance of that actually happening.  This
gets back to the point I made in the other thread that maybe the
arch-x86.h asm sequences are not an improvement over the gcc intrinsics.

The code is the same if the loop is modified to use the result of
pg_atomic_fetch_or_u32, so I won't bother showing that.

Adding the proposed generic-gcc.h patch, the loop reduces to

.L11:
        lock orl $1, (%rax)
        subq    $1, %rbx
        testq   %rbx, %rbx
        jg      .L11

or if we make the loop do
junk += pg_atomic_fetch_or_u32(myptr, 1);
then we get

.L13:
        movl    (%rsi), %eax
.L10:
        movl    %eax, %edi
        movl    %eax, %ecx
        orl     $1, %ecx
        lock cmpxchgl %ecx, (%rdx)
        jne     .L10
        addl    %edi, %r8d
        subq    $1, %rbx
        testq   %rbx, %rbx
        jg      .L13

So that last is slightly better than the generic.h + asm CAS version,
perhaps not meaningfully so --- but it's definitely better when
the compiler can see the result isn't used.

In short then, given the facts on the ground here, in particular the
asm-based CAS primitive at the heart of these loops, it's clear that
taking the read out of the loop can't hurt.  But that should be read
as a rather narrow conclusion.  With a different compiler and/or a
different version of pg_atomic_compare_exchange_u32_impl, maybe the
answer would be different.  And of course it's moot once the
generic-gcc.h patch is applied.

Anyway, I don't have a big objection to applying this.  My concern
is more that we need to be taking a harder look at other parts of
the atomics infrastructure, because tweaks there are likely to buy
much more.

regards, tom lane




Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-09-06 Thread Simon Riggs
On 4 September 2017 at 10:30, Adrien Nayrat  wrote:
> On 09/04/2017 06:16 PM, Alexander Korotkov wrote:
>> Looks good for me.  I've integrated those changes in the patch.
>> New revision is attached.
>
> Thanks, I changed status to "ready for commiter".

This looks useful and good to me.

I've changed this line of code to return NULL rather than return tuple:

    if (!HeapTupleIsValid(tuple))
        return tuple;

Other than that, I'm good to commit.

Any last minute objections?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs  wrote:
>>> I'd personally be fine with --no-whatever for any whatever that might
>>> be a subsidiary property of database objects.  We've got
>>> --no-security-labels, --no-tablespaces, --no-owner, and
>>> --no-privileges already, so what's wrong with --no-comments?
>>>
>>> (We've also got --no-publications; I think it's arguable whether that
>>> is the same kind of thing.)
>>
>> And --no-subscriptions in the same bucket.
>
> Yes, it is. I was suggesting that we remove those as well.

That seems like a non-starter to me.  I have used those options many
times to solve real problems, and I'm sure other people have as well.
We wouldn't have ended up with all of these options if users didn't
want to control such things.

> But back to the main point which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

/me shrugs.

I agree that there could be better solutions to the original problem,
but I think there's no real argument that a user can't exclude
comments from a backup if they wish to do so.  As the OP already
pointed out, it can still be done without the switch; it's just more
annoying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Variable substitution in psql backtick expansion

2017-09-06 Thread Fabien COELHO



>> Seeing it as is, it calls for having "SERVER_VERSION" as well, but I'm not
>> sure of the better way to get it. I tried with "SELECT VERSION() AS
>> SERVER_VERSION \gset" but varnames are lowerized.
>
> The problem there is you can't get version() without an extra round trip
> to the server --- and an extra logged query --- which people are going to
> complain about.

Here is a PoC that does it through a guc, just like "server_version" (the
short version) is transmitted, with a fallback if it is not there.

Whether it is worth it is debatable, but I like the symmetry of having
the same information accessible the same way for client and server sides.

--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5f59a38..8b69ed1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7961,8 +7961,8 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
   

-Reports the version number of the server. It is determined by the
-value of PG_VERSION when building the server.
+Reports the version number of the server as a short string. It is determined
+by the value of PG_VERSION when building the server.

   
  
@@ -7981,6 +7981,20 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  server_version_raw (string)
+  
+   server_version_raw configuration parameter
+  
+  
+  
+   
+Reports the version of the server as a long string. It is determined
+by the value of PG_VERSION_STR when building the server.
+   
+  
+ 
+
  
   wal_block_size (integer)
   
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 5bdbc1e..1be57d2 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3690,11 +3690,14 @@ bar
   
 
   
+SERVER_VERSION
 SERVER_VERSION_NAME
 SERVER_VERSION_NUM
 
 
-The server's version number as a string, for
+The server's version number as a long string, for
+example PostgreSQL 11devel ...,
+as a short string, for
 example 9.6.2, 10.1 or 11beta1,
 and in numeric form, for
 example 90602 or 11.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 246fea8..fd843d4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -500,6 +500,7 @@ static char *locale_collate;
 static char *locale_ctype;
 static char *server_encoding_string;
 static char *server_version_string;
+static char *server_version_raw_string;
 static int	server_version_num;
 static char *timezone_string;
 static char *log_timezone_string;
@@ -3298,6 +3299,18 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		/* Can't be set in postgresql.conf */
+		{"server_version_raw", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows the server version string."),
+			NULL,
+			GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&server_version_raw_string,
+		PG_VERSION_STR,
+		NULL, NULL, NULL
+	},
+
+	{
 		/* Not for general use --- used by SET ROLE */
 		{"role", PGC_USERSET, UNGROUPED,
 			gettext_noop("Sets the current role."),
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index fe0b83e..e2ba8ee 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3358,7 +3358,8 @@ void
 SyncVariables(void)
 {
 	char		vbuf[32];
-	const char *server_version;
+	const char *server_version,
+			   *server_version_raw;
 
 	/* get stuff from connection */
 	pset.encoding = PQclientEncoding(pset.db);
@@ -3385,6 +3386,17 @@ SyncVariables(void)
 	snprintf(vbuf, sizeof(vbuf), "%d", pset.sversion);
 	SetVariable(pset.vars, "SERVER_VERSION_NUM", vbuf);
 
+	server_version_raw = PQparameterStatus(pset.db, "server_version_raw");
+	/* fall back again */
+	if (!server_version_raw)
+	{
+		snprintf(vbuf, sizeof(vbuf), "PostgreSQL ");
+		formatPGVersionNumber(pset.sversion, true, vbuf + strlen(vbuf),
+			  sizeof(vbuf) - strlen(vbuf));
+		server_version_raw = vbuf;
+	}
+	SetVariable(pset.vars, "SERVER_VERSION", server_version_raw);
+
 	/* send stuff to it, too */
 	PQsetErrorVerbosity(pset.db, pset.verbosity);
 	PQsetErrorContextVisibility(pset.db, pset.show_context);
@@ -3403,6 +3415,7 @@ UnsyncVariables(void)
 	SetVariable(pset.vars, "HOST", NULL);
 	SetVariable(pset.vars, "PORT", NULL);
 	SetVariable(pset.vars, "ENCODING", NULL);
+	SetVariable(pset.vars, "SERVER_VERSION", NULL);
 	SetVariable(pset.vars, "SERVER_VERSION_NAME", NULL);
 	SetVariable(pset.vars, "SERVER_VERSION_NUM", NULL);
 }
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a58f701..4aa18fd 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -281,6 +281,10 @@ pqSetenvPoll(PGconn *conn)
 		{
 			char	   *ptr;
 
+			/* keep 

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-06 Thread Simon Riggs
On 1 September 2017 at 22:08, Michael Paquier  wrote:
> On Sat, Sep 2, 2017 at 1:53 AM, Robert Haas  wrote:
>> On Mon, Aug 21, 2017 at 5:30 PM, Simon Riggs  wrote:
>>> Thinking ahead, are we going to add a new --no-objecttype switch every
>>> time someone wants it?
>>
>> I'd personally be fine with --no-whatever for any whatever that might
>> be a subsidiary property of database objects.  We've got
>> --no-security-labels, --no-tablespaces, --no-owner, and
>> --no-privileges already, so what's wrong with --no-comments?
>>
>> (We've also got --no-publications; I think it's arguable whether that
>> is the same kind of thing.)
>
> And --no-subscriptions in the same bucket.

Yes, it is. I was suggesting that we remove those as well.

But back to the main point which is that --no-comments discards ALL
comments simply to exclude one pointless and annoying comment. That
runs counter to our stance that we do not allow silent data loss. I
want to solve the problem too. I accept that not everyone uses
comments, but if they do, spilling them all on the floor is a user
visible slip up that we should not be encouraging. Sorry y'all.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Create replication slot in pg_basebackup if requested and not yet present

2017-09-06 Thread Peter Eisentraut
On 8/18/17 05:28, Michael Banck wrote:
>>> Rebased, squashed and slighly edited version attached. I've added this
>>> to the 2017-07 commitfest now as well:
>>>
>>> https://commitfest.postgresql.org/14/1112/
>> Can you rebase this past some conflicting changes?
> Thanks for letting me know, PFA a rebased version.

I have reviewed the thread so far.  I think there is general agreement
that something like this would be good to have.

I have not found any explanation, however, why the "if not exists"
behavior is desirable, let alone as the default.  I can only think of
two workflows here:  Either you have scripts for previous PG versions
that create the slot externally, in which case you omit --create, or you
use the new functionality to have pg_basebackup create the slot.  I
don't see any use for pg_basebackup to opportunistically use a slot if
it happens to exist.  Even if there is one, it should not be the
default.  So please change that.

A minor point: I suggest printing the message about the replication slot
being created *after* the slot has been created.  This aligns with how
logical replication subscriptions work.  I don't see the need for
printing a message about temporary slots.  Since they are temporary,
they will go away automatically, so there is nothing the user needs to
know there.

With these two changes, I think I'd be content with this patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPDATE of partition key

2017-09-06 Thread Dilip Kumar
On Mon, Sep 4, 2017 at 10:52 AM, Amit Khandekar  wrote:
> On 4 September 2017 at 07:43, Amit Kapila  wrote:
> Oops sorry. Now attached.

I have done some basic testing and initial review of the patch.  I
have some comments/doubts.  I will continue the review.

+ if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture)
+ ExecARUpdateTriggers(estate, resultRelInfo, InvalidOid,

For passing an invalid ItemPointer we are using InvalidOid, which seems
a bit odd to me. Are we using a similar convention some other place? I
think it would be better to just pass 0?

--

- if ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
- (event == TRIGGER_EVENT_UPDATE && update_old_table))
+ if (oldtup != NULL &&
+ ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
+ (event == TRIGGER_EVENT_UPDATE && update_old_table)))
  {
  Tuplestorestate *old_tuplestore;

- Assert(oldtup != NULL);

Only for TRIGGER_EVENT_UPDATE is it possible that oldtup can be NULL,
so we have added an extra check for oldtup and removed the Assert; but
for TRIGGER_EVENT_DELETE we never expect it to be NULL.

Would it be better to put an Assert outside the condition check
(Assert(oldtup != NULL || event == TRIGGER_EVENT_UPDATE))?
Same for newtup.

I think we should also explain in comments why oldtup or newtup can be
NULL in the TRIGGER_EVENT_UPDATE case.

---

+triggers affect the row being moved. As far as AFTER ROW
+triggers are concerned, AFTER DELETE and
+AFTER INSERT triggers are applied; but
+AFTER UPDATE triggers are not applied
+because the UPDATE has been converted to a
+DELETE and INSERT.

The comment above says that the AFTER UPDATE trigger is not fired, but
the code below calls ExecARUpdateTriggers:

+ if (mtstate->operation == CMD_UPDATE && mtstate->mt_transition_capture)
+ ExecARUpdateTriggers(estate, resultRelInfo, InvalidOid,
+ NULL,
+ tuple,
+ NULL,
+ mtstate->mt_transition_capture);


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> * Clarification that this will work for current AND past server versions

The short answer is it works.  I do not think we need a longer answer.

(The existing comment in libpq says that the "select version()" method
works in any server version that supports protocol v2, which means that
we will derive a correct server version ID for any server that modern
libpq is capable of connecting to at all.  I've not gone back to re-verify
what I wrote in 2003, but I see no reason to doubt it.  We'd have been
considerably more fussed about whether PQserverVersion worked correctly
for ancient server versions back then than we need be today.)

> * Clarification to avoid confusion between VERSION and SERVER_VERSION

In what way are these variables not adequately specified already?

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 12:01 PM, Tom Lane  wrote:
>> This seems like a pretty sound argument to me.  I think Tom's probably
>> right that the changes in generic-gcc.h are the important ones, but
>> I'm not sure that's an argument against patching generics.h.  Given
>> that pg_atomic_compare_exchange_u32_impl is defined to update *old
>> there seems to be no reason to call pg_atomic_read_u32_impl every time
>> through the loop.
>
> Probably not.  I'm not quite 100% convinced of that, because of my
> observation that gcc is capable of generating different and better
> code for some of these primitives if it can prove that the return
> value is not needed.  It's not clear that that could apply in any
> of these uses of pg_atomic_compare_exchange_u32_impl, though.
> In any case, by my own argument, it shouldn't matter, because if
> any of these are really performance-critical then we should be
> looking for better ways.

It's not a question of whether the return value is used, but of
whether the updated value of *old is used.

Unless I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura
>  wrote:
>> But I think, generic version still should be "fixed".
>> If generic version is not reached on any platform, then why it is kept?
>> If it is reached somewhere, then it should be improved.

> This seems like a pretty sound argument to me.  I think Tom's probably
> right that the changes in generic-gcc.h are the important ones, but
> I'm not sure that's an argument against patching generics.h.  Given
> that pg_atomic_compare_exchange_u32_impl is defined to update *old
> there seems to be no reason to call pg_atomic_read_u32_impl every time
> through the loop.

Probably not.  I'm not quite 100% convinced of that, because of my
observation that gcc is capable of generating different and better
code for some of these primitives if it can prove that the return
value is not needed.  It's not clear that that could apply in any
of these uses of pg_atomic_compare_exchange_u32_impl, though.
In any case, by my own argument, it shouldn't matter, because if
any of these are really performance-critical then we should be
looking for better ways.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 06:38, Tom Lane  wrote:
> Simon Riggs  writes:
>> Based upon input from Tom and Fabien, I propose this additional doc patch.
>
> I do not think any of this is appropriate, particularly not the reference
> to 7.0.3.

OK, no problem.

SERVER_VERSION_NUM is a great new feature. I think these points need
further changes

* An example of the intended use of SERVER_VERSION_NUM in psql
* Clarification that this will work for current AND past server versions
* Clarification to avoid confusion between VERSION and SERVER_VERSION

Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Assert that the correct locks are held when calling PageGetLSN()

2017-09-06 Thread Jacob Champion
On Tue, Sep 5, 2017 at 10:49 PM, Michael Paquier
 wrote:
> On Wed, Sep 6, 2017 at 1:15 AM, Jacob Champion  wrote:
>> On Sun, Sep 3, 2017 at 11:14 PM, Michael Paquier
>>  wrote:
>>> +#if defined(USE_ASSERT_CHECKING) && !defined(FRONTEND)
>>> +void
>>> +AssertPageIsLockedForLSN(Page page)
>>> +{
>>> From a design point of view, bufmgr.c is an API interface for buffer
>>> management on the backend-size, so just using FRONTEND there looks
>>> wrong to me (bufmgr.h is already bad enough IMO now).
>>
>> The comments suggested that bufmgr.h could be incidentally pulled into
>> frontend code through other headers. Or do you mean that I don't need
>> to check for FRONTEND in the C source file (since, I assume, it's only
>> ever being compiled for the backend)? I'm not sure what you mean by
>> "bufmgr.h is already bad enough".
>
> I mean that this should not become worse without a better reason. This
> patch makes it so.
>
>>> This bit in src/backend/access/transam/README may interest you:
>>> Note that we must only use PageSetLSN/PageGetLSN() when we know the action
>>> is serialised. Only Startup process may modify data blocks during recovery,
>>> so Startup process may execute PageGetLSN() without fear of serialisation
>>> problems. All other processes must only call PageSet/GetLSN when holding
>>> either an exclusive buffer lock or a shared lock plus buffer header lock,
>>> or be writing the data block directly rather than through shared buffers
>>> while holding AccessExclusiveLock on the relation.
>>>
>>> So I see no problem with the exiting caller.
>>
>> Is heap_xlog_visible only called during startup?
>
> Recovery is more exact. When replaying a XLOG_HEAP2_VISIBLE record.
>
>> If so, is there a
>> general rule for which locks are required during startup and which
>> aren't?
>
> You can surely say that a process is fine to read a variable without a
> lock if it is the only one updating it, other processes would need a
> lock to read a variable. In this case the startup process is the only
> one updating blocks, so that's at least one code path where the is no
> need to take a lock when looking at a page LSN with PageGetLSN.

Is there a way to programmatically determine whether or not we're in
such a situation? That could be added to the assertion.

The code here is pretty inconsistent on whether or not PageGetLSN is
called inside or outside a lock -- see XLogReadBufferForRedoExtended
for instance.

> Another example is pageinspect which works on a copy of a page and
> uses PageGetLSN, so no direct locks on the buffer page is needed.

And again -- this case should be correctly handled by the new
assertion. Copies of pages are not checked for locks; we can't recover
a buffer header from a page that isn't part of the BufferBlocks array.

> In short, it seems to me that this patch should be rejected in its
> current shape.

Is the half of the patch that switches PageGetLSN to
BufferGetLSNAtomic correct, at least?

> There is no need to put more constraints on a API which
> does not need more constraints and which is used widely by extensions.

The goal here is not to add more constraints, but to verify that the
existing constraints are followed. Perhaps this patch doesn't do that
well/correctly, but I want to make sure we're on the same page
regarding the end goal.

--Jacob




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 9:42 AM, Sokolov Yura
 wrote:
> Yes, you're right.
>
> But I think, generic version still should be "fixed".
> If generic version is not reached on any platform, then why it is kept?
> If it is reached somewhere, then it should be improved.

This seems like a pretty sound argument to me.  I think Tom's probably
right that the changes in generic-gcc.h are the important ones, but
I'm not sure that's an argument against patching generics.h.  Given
that pg_atomic_compare_exchange_u32_impl is defined to update *old
there seems to be no reason to call pg_atomic_read_u32_impl every time
through the loop.  Whether doing so hurts performance in practice is
likely to depend on a bunch of stuff, but we don't normally refuse to
remove unnecessary code on the grounds that it will rarely be reached.
Given that CPUs are weird, it's possible that there is some system
where throwing an extra read of a value we already have into the loop
works out to a win, but in the absence of evidence I'm reluctant to
presume it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 11:03 AM, Tom Lane  wrote:
> That's fair enough.  We need to have a discussion about exactly what
> the knob does, which is distinct from the question of how you spell
> the incantation for twiddling it.  I'm dubious that a dumb "force a
> custom plan" setting is going to solve all that many cases usefully.

I think what people need is the ability to force the behavior in
either direction - either insist on a custom plan, or insist on a
generic plan.  The former is what you need if the plan stinks, and the
latter is what you need if replanning is a waste of effort.  I have
seen both cases.  The latter has been a bigger problem than the
former, because the former can be hacked around in various ugly and
inefficient ways, but if we're adding a knob I think it should have
three settings.  There is perhaps an argument for even more
configurability, like altering the number of plan tries from 5 to some
other value, but I'm not clear that there's any use case for values
other than 0, 5, and +infinity.

>> I think it is in general unfortunate that we don't have a mechanism to
>> change a GUC for the lifespan of one particular query, like this:
>
>> LET custom_plan_tries = 0 IN SELECT ...
>
> Hmm.  I think the core problem here is that we're trying to control
> the plancache, which is a pretty much behind-the-scenes mechanism.
> Except in the case of an explicit PREPARE, you can't even see from
> SQL that the cache is being used, or when it's used.  So part of what
> needs to be thought about, if we use the GUC approach, is when the
> GUC's value is consulted.  If we don't do anything special then
> the GUC(s) would be consulted when retrieving plans from the cache,
> and changes in their values from one retrieval to the next might
> cause funny behavior.  Maybe the relevant settings need to be captured
> when the plancache entry is made ... not sure.

What sort of funny behavior are you concerned about?  It seems likely
to me that in most cases the GUC will have the same value every time
through, but if it doesn't, I'm not sure why we'd need to use the old
value rather than the current one.  Indeed, if the user changes the
GUC from "force custom" to "force generic" and reruns the function, we
want the new value to take effect, lest a POLA violation occur.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> I don't think we can just indefinitely continue to resist
> providing manual control over this behavior on the theory that some
> day we'll fix it.

That's fair enough.  We need to have a discussion about exactly what
the knob does, which is distinct from the question of how you spell
the incantation for twiddling it.  I'm dubious that a dumb "force a
custom plan" setting is going to solve all that many cases usefully.

> I think a GUC is a decent, though not perfect, mechanism for this.
> This problem isn't restricted to PL/pgsql; indeed, the cases I've seen
> have come via prepared queries, not PL/pgsql functions.

That's 100% correct, and is actually the best reason not to consider
a PRAGMA (or any other plpgsql-specific mechanism) as the incantation
spelling.

> I think it is in general unfortunate that we don't have a mechanism to
> change a GUC for the lifespan of one particular query, like this:

> LET custom_plan_tries = 0 IN SELECT ...

Hmm.  I think the core problem here is that we're trying to control
the plancache, which is a pretty much behind-the-scenes mechanism.
Except in the case of an explicit PREPARE, you can't even see from
SQL that the cache is being used, or when it's used.  So part of what
needs to be thought about, if we use the GUC approach, is when the
GUC's value is consulted.  If we don't do anything special then
the GUC(s) would be consulted when retrieving plans from the cache,
and changes in their values from one retrieval to the next might
cause funny behavior.  Maybe the relevant settings need to be captured
when the plancache entry is made ... not sure.

regards, tom lane




Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Tom Lane
Alvaro Herrera  writes:
> Peter Eisentraut wrote:
>> We also need to have a plan for handling the build farm.  Maybe keep the
>> vcregress.pl upgradecheck target as a thin wrapper for the time being?

> The buildfarm already runs "make check" in src/bin/ when TAP tests are
> enabled, which should be enough to trigger the rewritten test, no?

I think Peter's on about the Windows case.  Not sure how that's handled,
but it's not "make check".

regards, tom lane




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Robert Haas
On Tue, Sep 5, 2017 at 1:38 PM, Tom Lane  wrote:
> The complaint I have about PRAGMA is that it's yet another syntax for
> accomplishing pretty much the same thing.  If you don't like the GUC
> solution, we've already got the "comp_option" syntax for static options
> in plpgsql.  Sure, that's not too pretty, but that's not a good reason
> to invent yet another way to do it.

On the general question of whether we should have something like this,
I expressed a lot of doubt when
e6faf910d75027bdce7cd0f2033db4e912592bcc first went in about whether
that algorithm was really going to work, and nothing has happened
since then to remove any of that doubt.  It's pretty clear to me from
experiences with customer problems that the heuristics we have often
fail to do the right thing, either because queries with no hope of
benefiting still replan 5 times - which can waste a ton of CPU when
there are many different queries in the plan cache and many sessions -
or because queries that would benefit in some cases give up on
replanning before they hit a case where a parameter-specific plan
helps.  I don't think we can just indefinitely continue to resist
providing manual control over this behavior on the theory that some
day we'll fix it.  It's been six years and we haven't made any
significant progress.  In some cases, a long delay without any
progress might just point to a lack of effort that should have been
applied, but in this case I think it's because the problem is
incredibly hard.

I think what we ideally want to do is notice whether the new bind
variables cause a change in selectivity which is sufficient to justify
a re-plan.  If we annotated the original plan with markers indicating
that it was valid for all values with a frequency of more than X and
less than Y, for example, we could cover most cases involving
equality; range queries would need some other kind of annotation.
However, it's unclear how the planner could produce such annotations,
and it's unclear how expensive checking against them would be if we
had them.  Barring somebody having a brilliant insight about how to
make some system that's way better than what we have right now, I
think we can't hold out much hope of any better fix than a manual
knob.

I think a GUC is a decent, though not perfect, mechanism for this.
This problem isn't restricted to PL/pgsql; indeed, the cases I've seen
have come via prepared queries, not PL/pgsql functions.  Even without
that, one advantage of a GUC is that they are fairly broadly
understood and experienced users understand what they can do with
them.  They can be set at various different scopes (system, user,
database, SET clause for a particular function) and it's relatively
convenient to do so.  Some kind of PL/pgsql-specific PRAGMA syntax is
more likely to be overlooked by users who would actually benefit from
it, and also won't cover non-PL/pgsql cases.  If we were going to go
the PRAGMA route, it would make more sense to me to define that as a
way of setting a GUC for the scope of one PL/pgsql block, like PRAGMA
SETTING(custom_plan_tries, 0).  I think it is in general unfortunate
that we don't have a mechanism to change a GUC for the lifespan of one
particular query, like this:

LET custom_plan_tries = 0 IN SELECT ...

I bet a lot of people would find that quite convenient.  The problem
of needing to set a planner GUC for one particular query is pretty
common, and Pavel is absolutely right that having to do SET beforehand
and RESET afterward is ugly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix bloom WAL tap test

2017-09-06 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

I just realized that these lines of contrib/bloom/t/001_wal.pl don't
check that queries give the same results on master and standby.  They just
check that the *return codes* of psql are equal.
>
> # Run test queries and compare their result
>> my $master_result = $node_master->psql("postgres", $queries);
>> my $standby_result = $node_standby->psql("postgres", $queries);
>> is($master_result, $standby_result, "$test_name: query result matches");
>
>
> Attached patch fixes this problem by using safe_psql() which returns psql
> output instead of return code.  For safety, this patch replaces psql() with
> safe_psql() in other places too.
>
> I think this should be backpatched to 9.6 as bugfix.
>

Also, it would be nice to run wal-check as part of the bloom check
target (currently wal-check isn't run even by check-world).
See attached patch.  My changes to the Makefile could be cumbersome;
sorry for that, I don't have much experience with them...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


wal-check-on-bloom-check.patch
Description: Binary data



Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Alvaro Herrera
Peter Eisentraut wrote:

> I propose splitting the single Perl script into three separate test
> files: one for basic command-line option handling and so on (I would
> like to expand that later), one for the main upgrade test, and one for
> the funny database names tests.

Check.

> We also need to have a plan for handling the build farm.  Maybe keep the
> vcregress.pl upgradecheck target as a thin wrapper for the time being?

The buildfarm already runs "make check" in src/bin/ when TAP tests are
enabled, which should be enough to trigger the rewritten test, no?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Walsender timeouts and large transactions

2017-09-06 Thread Yura Sokolov
I've changed the status to "Needs review" to gain more attention from others.


[HACKERS] Is anything preventing us from allowing write to foreign tables from standby?

2017-09-06 Thread Alexander Korotkov
Hi!

We currently block writing queries on a standby even if they only
modify the contents of foreign tables.  But do we have serious reasons
for that?
Keeping FDW-sharding in mind, making FDW tables writable from a standby
would be good to prevent a single-master bottleneck.
I wrote a simple patch enabling writes to foreign tables from standbys.
At first glance it works for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


standby-fdw-writable.patch
Description: Binary data



Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 16:36, Tom Lane wrote:
> Sokolov Yura  writes:
>> On 2017-09-06 15:56, Tom Lane wrote:
>>> The point I'm trying to make is that if tweaking generic.h improves
>>> performance then it's an indicator of missed cases in the less-generic
>>> atomics code, and the latter is where our attention should be focused.
>>> I think basically all of the improvement Sokolov got was from upgrading
>>> the coverage of generic-gcc.h.
>>
>> Not exactly. I've checked, that new version of generic
>> pg_atomic_fetch_or_u32
>> loop also gives improvement.
>
> But once you put in the generic-gcc version, that's not reached anymore.

Yes, you're right.

But I think the generic version should still be "fixed".
If the generic version is not reached on any platform, why keep it?
If it is reached somewhere, it should be improved.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Based upon input from Tom and Fabien, I propose this additional doc patch.

I do not think any of this is appropriate, particularly not the reference
to 7.0.3.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Sokolov Yura  writes:
> On 2017-09-06 15:56, Tom Lane wrote:
>> The point I'm trying to make is that if tweaking generic.h improves
>> performance then it's an indicator of missed cases in the less-generic
>> atomics code, and the latter is where our attention should be focused.
>> I think basically all of the improvement Sokolov got was from upgrading
>> the coverage of generic-gcc.h.

> Not exactly. I've checked, that new version of generic 
> pg_atomic_fetch_or_u32
> loop also gives improvement.

But once you put in the generic-gcc version, that's not reached anymore.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 15:56, Tom Lane wrote:
> Simon Riggs  writes:
>> On 5 September 2017 at 21:23, Tom Lane  wrote:
>>> Moreover, it matters which primitive you're testing, on which platform,
>>> with which compiler, because we have a couple of layers of atomic ops
>>> implementations.
>
>> If there is no gain on 2-socket, at least there is no loss either.
>
> The point I'm trying to make is that if tweaking generic.h improves
> performance then it's an indicator of missed cases in the less-generic
> atomics code, and the latter is where our attention should be focused.
> I think basically all of the improvement Sokolov got was from upgrading
> the coverage of generic-gcc.h.
>
> 			regards, tom lane

Not exactly. I've checked that the new version of the generic
pg_atomic_fetch_or_u32 loop also gives an improvement. Without that check
I'd not have suggested fixing the generic atomic functions. Of course, the
gcc intrinsic gives more gain.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Simon Riggs
On 5 September 2017 at 11:58, Fabien COELHO  wrote:
>
> Hello Simon,
>
>> Does raise the further question of how psql behaves when we connect to
>> a pre-10 server, so we have SERVER_VERSION_NUM but yet it is not set.
>> How does this
>> \if SERVER_VERSION_NUM < x
>
>
> The if does not have expressions (yet), it just handles TRUE/ON/1 and
> FALSE/0/OFF.
>
>> Do we need some macro or suggested special handling?
>
>
> If "SERVER_VERSION_NUM" is available in 10, then:
>
>   -- exit if version < 10 (\if is ignored and \q is executed)
>   \if false \echo "prior 10" \q \endif
>
>   -- then test version through a server side expression, will work
>   SELECT :SERVER_VERSION_NUM < 11 AS prior_11 \gset
>   \if :prior_11
> -- version 10
>   \else
> -- version 11 or more
>   \endif


Based upon input from Tom and Fabien, I propose this additional doc patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


clarify_psql_server_version.v1.patch
Description: Binary data



[HACKERS] Fix bloom WAL tap test

2017-09-06 Thread Alexander Korotkov
Hi!

I just realized that these lines of contrib/bloom/t/001_wal.pl don't check
that queries give same results on master and standby.  They just check that
*return codes* of psql are equal.

> # Run test queries and compare their result
> my $master_result = $node_master->psql("postgres", $queries);
> my $standby_result = $node_standby->psql("postgres", $queries);
> is($master_result, $standby_result, "$test_name: query result matches");


Attached patch fixes this problem by using safe_psql(), which returns psql
output instead of the return code.  For safety, this patch replaces psql()
with safe_psql() in other places too.

I think this should be backpatched to 9.6 as a bugfix.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-bloom-wal-check.patch
Description: Binary data



Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Peter Eisentraut
On 4/14/17 02:00, Michael Paquier wrote:
> Attached is an updated patch to use --no-sync with pg_dumpall calls.

Please send a rebased patch.

I propose splitting the single Perl script into three separate test
files: one for basic command-line option handling and so on (I would
like to expand that later), one for the main upgrade test, and one for
the funny database names tests.

In the testing file, you removed the paragraph that explains how to do
cross-version upgrade testing.  It's unfortunate that we would lose that
functionality.  What can we do about that?

We also need to have a plan for handling the build farm.  Maybe keep the
vcregress.pl upgradecheck target as a thin wrapper for the time being?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Why isn't this an open item for PG10?

Why should it be?  This behavior has existed for a long time.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> On 5 September 2017 at 21:23, Tom Lane  wrote:
>> Moreover, it matters which primitive you're testing, on which platform,
>> with which compiler, because we have a couple of layers of atomic ops
>> implementations.

> If there is no gain on 2-socket, at least there is no loss either.

The point I'm trying to make is that if tweaking generic.h improves
performance then it's an indicator of missed cases in the less-generic
atomics code, and the latter is where our attention should be focused.
I think basically all of the improvement Sokolov got was from upgrading
the coverage of generic-gcc.h.

regards, tom lane




Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 04:14, Ashutosh Bapat
 wrote:
> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
>  wrote:
>>
>> Thanks.  Bearing all that in mind, I ran through a series of test
>> scenarios and discovered that my handling for JOIN_ANTI was wrong: I
>> thought that I had to deal with inverting the result, but I now see
>> that that's handled elsewhere (calc_joinrel_size_estimate() I think).
>> So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
>> same way.
>
> I agree, esp. after looking at eqjoinsel_semi(), which is used for
> both semi and anti joins, it becomes more clear.
>
>>
>> That just leaves the question of whether we should try to handle the
>> empty RHS and single-value RHS cases using statistics.  My intuition
>> is that we shouldn't, but I'll be happy to change my intuition and
>> code that up if that is the feedback from planner gurus.
>
> Empty RHS can result from dummy relations also, which are produced by
> constraint exclusion, so may be that's an interesting case. Single
> value RHS may be interesting with partitioned table with all rows in a
> given partition end up with the same partition key value. But may be
> those are just different patches. I am not sure.
>
>>
>> Please find attached a new version, and a test script I used, which
>> shows a bunch of interesting cases.  I'll add this to the commitfest.
>
> I added some "stable" tests to your patch taking inspiration from the
> test SQL file. I think those will be stable across machines and runs.
> Please let me know if those look good to you.

Why isn't this an open item for PG10?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Simon Riggs
On 5 September 2017 at 21:23, Tom Lane  wrote:
> Jeff Janes  writes:
>> What scale factor and client count? How many cores per socket?  It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.

...
> Moreover, it matters which primitive you're testing, on which platform,
> with which compiler, because we have a couple of layers of atomic ops
> implementations.
...

I think Sokolov was aiming at 4-socket servers specifically, rather
than as a general performance gain.

If there is no gain on 2-socket, at least there is no loss either.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] document and use SPI_result_code_string()

2017-09-06 Thread Tom Lane
Michael Paquier  writes:
> Fine for 0002. This reminds me of LockGXact and RemoveGXact in
> twophase.c, as well as _hash_squeezebucket that have some code paths
> that cannot return... Any thoughts about having some kind of
> PG_NOTREACHED defined to 0 which could be put in an assertion?

Generally we just do "Assert(false)", maybe with "not reached" in a
comment.  I don't feel a strong need to invent a new way to do that.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 14:54, Sokolov Yura wrote:
> On 2017-09-06 07:23, Tom Lane wrote:
>> Jeff Janes  writes:
>>> What scale factor and client count? How many cores per socket?  It looks
>>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>>> using -N transaction.
>
>> This means that Sokolov's proposed changes in atomics/generic.h
>> ought to be uninteresting for performance on any platform we care about
>> --- at least for pg_atomic_fetch_add_u32().
>
> May I cite you this way:

I apologize for the tone of my previous message.
My tongue is my enemy.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-09-06 Thread Jeevan Ladhe
Hi Ashutosh,

I have tried to address your comments in V27 patch series[1].
Please find my comments inlined.


> >>
> >> The current set of patches contains 6 patches as below:
> >>
> >> 0001:
> >> Refactoring existing ATExecAttachPartition  code so that it can be used
> >> for
> >> default partitioning as well
>

If I understand correctly these comments refer to patch 0001 of V25_rebase
series, which is related to "Fix assumptions that get_qual_from_partbound()"
and not refactoring. This is patch 0001 in V27 now.

>   * Returns an expression tree describing the passed-in relation's partition
> - * constraint.
> + * constraint. If there are no partition constraints returns NULL, e.g. in
> + * case default partition is the only partition.
> The first sentence uses singular constraint. The second uses plural. Given
> that partition bounds together form a single constraint we should use
> singular constraint in the second sentence as well.
>

I have changed the wording now.


>
> Do we want to add a similar comment in the prologue of
> generate_partition_qual()? The current wording there seems to cover this
> case, but do we want to explicitly mention this case?
>

I have added a comment there.


>
> +if (!and_args)
> +result = NULL;
> While this is correct, I am increasingly seeing (and_args != NIL) usage.
>

Changed this to:
+   if (and_args == NIL)
+   result = NULL;


> get_partition_qual_relid() is called from pg_get_partition_constraintdef(),
> constr_expr = get_partition_qual_relid(relationId);
>
> /* Quick exit if not a partition */
> if (constr_expr == NULL)
> PG_RETURN_NULL();
> The comment is now wrong since a default partition may have no
> constraints. Maybe rewrite it as simply, "Quick exit if no partition
> constraint."
>

Fixed.


>
> generate_partition_qual() has three callers and all of them are capable of
> handling NIL partition constraint for default partition. Maybe it's better
> to mention in the commit message that we have checked that the callers of
> this function can handle NIL partition constraint.
>

Added in commit message.


> >>
> >> 0002:
> >> This patch teaches the partitioning code to handle the NIL returned by
> >> get_qual_for_list().
> >> This is needed because a default partition will not have any constraints
> >> in case
> >> it is the only partition of its parent.
>

Comments below refer to patch 0002 in V25_rebase(0003 in V25), which
adds basic support for default partition, which is now 0002 in V27.


> If the partition being dropped is the default partition,
> heap_drop_with_catalog() locks the default partition twice, once as the
> default partition and the second time as the partition being dropped. So,
> it will be counted as locked twice. There doesn't seem to be any harm in
> this, since the lock will be held till the transaction ends, by when all
> the locks will be released.
>

 Fixed.


> Same is the case with cache invalidation message. If we are dropping the
> default partition, the cache invalidation message on "default partition"
> is not required. Again this might be harmless, but better to avoid it.


Fixed.


> Similar problems exist with ATExecDetachPartition(): the default partition
> will be locked twice if it's being detached.
>

In ATExecDetachPartition() we do not have the OID of the relation being
detached available at the time we lock the default partition. Moreover,
here we are taking an exclusive lock on the default partition's OID and a
share lock on the partition being detached. As you correctly said in your
earlier comment, it will be counted as locked twice, which to me also
seems harmless. As these locks are held until commit of the transaction,
nobody else is supposed to release them in between. I am not able to
visualize a problem here, but I have still tried to avoid locking the
default partition table twice; please review the changes and let me know
your thoughts.


> +/*
> + * If this is a default partition, pg_partitioned_table must have
> it's
> + * OID as value of 'partdefid' for it's parent (i.e. rel) entry.
> + */
> +if (castNode(PartitionBoundSpec, boundspec)->is_default)
> +{
> +Oidpartdefid;
> +
> +partdefid = get_default_partition_oid(RelationGetRelid(rel));
> +Assert(partdefid == inhrelid);
> +}
> Since an accidental change or database corruption may change the default
> partition OID in pg_partition_table, an Assert won't help in such a case.
> Maybe we should throw an error or at least report a warning. If we throw
> an error, the table will become useless (or even the database will become
> useless: RelationBuildPartitionDesc is called from
> RelationCacheInitializePhase3() on such a corrupted table). To avoid that
> we may raise a warning.

I have added a warning here instead of Assert.


> I am wondering whether we could avoid call to 

Re: [HACKERS] 【ECPG】strncpy function does not set the end character '\0'

2017-09-06 Thread Tom Lane
Michael Meskes  writes:
>> In the function ECPGnoticeReceiver, we use the strncpy function to copy
>> the sqlstate to sqlca->sqlstate. The sqlca->sqlstate field is defined
>> with a size of 5, and the copy size is sizeof(sqlca->sqlstate). However,
>> from the preceding strcmp call, the sqlstate size may be 5, such as
>> ECPG_SQLSTATE_INVALID_CURSOR_NAME. So the end character for
>> sqlca->sqlstate may be missing.

> Why do you think there should be one? My memory might be wrong but I
> don't think it's supposed to be a null terminated string.

That field is defined as char[5] in struct sqlca_t, so the intent is
clearly that it not be null terminated.  However, it looks to me like
there'd be at least one alignment-padding byte after it, and that byte
is likely to be 0 in a lot of situations (particularly for statically
allocated sqlca_t's).  So a lot of the time, you could get away with
using strcmp() or other functions that expect null termination.

I'm thinking therefore that there's probably code out there that tries
to do strcmp(sqlca->sqlstate, "22000") or suchlike, and it works often
enough that the authors haven't identified their bug.  The question is
do we want to try to make that be valid code.

Changing the field declaration to char[5+1] would be easy enough, but
I have no idea how many places in ecpglib would need to change to make
sure that the last byte gets set to 0.

regards, tom lane




Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 07:23, Tom Lane wrote:
> Jeff Janes  writes:
>> What scale factor and client count? How many cores per socket?  It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.
>
> This means that Sokolov's proposed changes in atomics/generic.h
> ought to be uninteresting for performance on any platform we care about
> --- at least for pg_atomic_fetch_add_u32().

May I cite you this way:
"Tom Lane says the PostgreSQL core team doesn't care for 4-socket 72-core
Intel Xeon servers"
???

To be honest, I didn't measure the exact impact of changing
pg_atomic_fetch_add_u32.

I found an issue with pg_atomic_fetch_or_u32, because it is highly
contended both in LWLock and in buffer page state management. I found that
changing the loop for the generic pg_atomic_fetch_or_u32 already gives an
improvement. And I changed the loop for the other generic atomic functions
to make all of them uniform.

It is bad style not to change worse (but correct) code into better (and
also correct) code just because you can't measure the difference (in my
opinion; perhaps I am mistaken).

(And yes: the gcc intrinsic gives more improvement.)

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
 wrote:
>
> Thanks.  Bearing all that in mind, I ran through a series of test
> scenarios and discovered that my handling for JOIN_ANTI was wrong: I
> thought that I had to deal with inverting the result, but I now see
> that that's handled elsewhere (calc_joinrel_size_estimate() I think).
> So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
> same way.

I agree, esp. after looking at eqjoinsel_semi(), which is used for
both semi and anti joins, it becomes more clear.

>
> That just leaves the question of whether we should try to handle the
> empty RHS and single-value RHS cases using statistics.  My intuition
> is that we shouldn't, but I'll be happy to change my intuition and
> code that up if that is the feedback from planner gurus.

Empty RHS can result from dummy relations also, which are produced by
constraint exclusion, so may be that's an interesting case. Single
value RHS may be interesting with partitioned table with all rows in a
given partition end up with the same partition key value. But may be
those are just different patches. I am not sure.

>
> Please find attached a new version, and a test script I used, which
> shows a bunch of interesting cases.  I'll add this to the commitfest.

I added some "stable" tests to your patch taking inspiration from the
test SQL file. I think those will be stable across machines and runs.
Please let me know if those look good to you.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 23e5526..4c1bae6 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2697,26 +2697,63 @@ neqjoinsel(PG_FUNCTION_ARGS)
 	Oid			eqop;
 	float8		result;
 
-	/*
-	 * We want 1 - eqjoinsel() where the equality operator is the one
-	 * associated with this != operator, that is, its negator.
-	 */
-	eqop = get_negator(operator);
-	if (eqop)
+
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
 	{
-		result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel,
-	PointerGetDatum(root),
-	ObjectIdGetDatum(eqop),
-	PointerGetDatum(args),
-	Int16GetDatum(jointype),
-	PointerGetDatum(sjinfo)));
+		VariableStatData leftvar;
+		VariableStatData rightvar;
+		double		nullfrac;
+		bool		reversed;
+		HeapTuple	statsTuple;
+
+		get_join_variables(root, args, sjinfo, , , );
+		statsTuple = reversed ? rightvar.statsTuple : leftvar.statsTuple;
+		if (HeapTupleIsValid(statsTuple))
+			nullfrac = ((Form_pg_statistic) GETSTRUCT(statsTuple))->stanullfrac;
+		else
+			nullfrac = 0.0;
+		ReleaseVariableStats(leftvar);
+		ReleaseVariableStats(rightvar);
+
+		/*
+		 * For semi-joins, if there is more than one distinct value in the RHS
+		 * relation then every non-null LHS row must find a row to join since
+		 * it can only be equal to one of them.  We'll assume that there is
+		 * always more than one distinct RHS value for the sake of stability,
+		 * though in theory we could have special cases for empty RHS
+		 * (selectivity = 0) and single-distinct-value RHS (selectivity =
+		 * fraction of LHS that has the same value as the single RHS value).
+		 *
+		 * For anti-joins, if we use the same assumption that there is more
+		 * than one distinct key in the RHS relation, then every non-null LHS
+		 * row must be suppressed by the anti-join leaving only nullfrac.
+		 */
+		result = 1.0 - nullfrac;
 	}
 	else
 	{
-		/* Use default selectivity (should we raise an error instead?) */
-		result = DEFAULT_EQ_SEL;
+		/*
+		 * We want 1 - eqjoinsel() where the equality operator is the one
+		 * associated with this != operator, that is, its negator.
+		 */
+		eqop = get_negator(operator);
+		if (eqop)
+		{
+			result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel,
+		PointerGetDatum(root),
+		ObjectIdGetDatum(eqop),
+		PointerGetDatum(args),
+		Int16GetDatum(jointype),
+		PointerGetDatum(sjinfo)));
+		}
+		else
+		{
+			/* Use default selectivity (should we raise an error instead?) */
+			result = DEFAULT_EQ_SEL;
+		}
+		result = 1.0 - result;
 	}
-	result = 1.0 - result;
+
 	PG_RETURN_FLOAT8(result);
 }
 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9f4c88d..10bfb68 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1845,6 +1845,89 @@ SELECT '' AS "xxx", *
  | 1 | 4 | one | -1
 (1 row)
 
+-- SEMI and ANTI join neq selectivity
+ANALYZE J1_TBL;
+ANALYZE J2_TBL;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT * FROM J1_TBL t1
+WHERE EXISTS (SELECT * FROM J2_TBL t2 WHERE t1.i <> t2.i);
+QUERY PLAN 
+---
+ Nested Loop Semi Join (actual rows=9 

Re: [HACKERS] additional contrib test suites

2017-09-06 Thread Thomas Munro
On Sat, Aug 12, 2017 at 1:20 PM, Peter Eisentraut
 wrote:
> Here are some small test suites for some contrib modules as well as
> pg_archivecleanup that didn't have one previously, as well as one patch
> to improve code coverage in a module.
>
> Will add to commit fest.  Testing on different platforms and with
> different build configurations would be useful.

After applying these patches cleanly on top of
0b554e4e63a4ba4852c01951311713e23acdae02 and running "./configure
--enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
--with-icu && make && make check-world" I saw this failure:

cd . && 
TESTDIR='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup'
PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/bin:$PATH"
LD_LIBRARY_PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/lib"
PGPORT='65432' 
PG_REGRESS='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/regress/pg_regress'
/usr/bin/prove -I ../../../src/test/perl/ -I .  t/*.pl
t/010_pg_archivecleanup.pl .. 1/42
#   Failed test 'fails if restart file name is not specified: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: must specify oldest kept WAL file
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:must specify restartfilename)'
#   Failed test 'fails with too many parameters: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: too many command-line arguments
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:too many parameters)'
#   Failed test 'fails with invalid restart file name: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: invalid file name argument
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:invalid filename)'
# Looks like you failed 3 tests of 42.
t/010_pg_archivecleanup.pl .. Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/42 subtests
Test Summary Report
---
t/010_pg_archivecleanup.pl (Wstat: 768 Tests: 42 Failed: 3)
  Failed tests:  12, 16, 18
  Non-zero exit status: 3
Files=1, Tests=42,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.05 cusr
 0.00 csys =  0.08 CPU)
Result: FAIL

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-09-06 Thread Thomas Munro
On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshov
 wrote:
> the following patch transfers functionality from gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
> GIN indexes.
>
> Functions added:
>  - gist_stat(text) - shows statistics on GiST Tree
>  - gist_tree(text) - shows GiST tree
>  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
>  - gist_print(text) - prints objects stored in GiST tree
>  - spgist_stat(text) - shows statistics on SP-GiST
>  - spgist_print(text) - prints objects stored in index
>  - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
>  - gin_stats() - originally gin_statpage(text) - shows statistics
>  - gin_count_estimate(text, tsquery) - shows number of indexed rows matched
> query
>
> Tests also transferred, docs for new functions are added. I run pgindent
> over the code, but the result is different from those I expected, so I leave
> pgindented one.
> The patch is applicable to the commit
> 866f4a7c210857aa342bf901558d170325094dde.

Hi Alexey,

This patch still applies but doesn't build after commits 2cd70845 and c6293249.

ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have
‘FormData_pg_attribute’)
   st->index->rd_att->attrs[st->attnum]->attbyval,
...several similar errors...

For example, that expression needs to be changed to:

   TupleDescAttr(st->index->rd_att, attnum)->attbyval

Here is some superficial review since I'm here:

+/*
+ * process_tuples
+ * Retrieves the number of TIDs in a tuple.
+ */
+static void
+process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup)

"process_tuples" vs "process_tuple"

+ /* do no distiguish various null category */

"do not distinguish various null categories"

+ st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? false : true;

That's a long way to write (st->category != GIN_CAT_NORM_KEY)!

+ * We scaned the whole page, so we should take right page

"scanned"

+/*
+ * refind_position
+ * Refinds a previois position, at returns it has correctly
+ * set offset and buffer is locked.
+ */

"previous", s/, at returns/.  On return/

+ memset(st->nulls, false, 2 * sizeof(*st->nulls));

"false" looks strange here where an int is expected.  The size
expression is weird.  I think you should just write:

st->nulls[0] = false;
st->nulls[1] = false;

Investigating the types more closely, I see that 'nulls' is declared
like this (in struct GinStatState):

+ char nulls[2];

Then later you do this:

+ htuple = heap_form_tuple(funcctx->attinmeta->tupdesc,
+ st->dvalues,
+ st->nulls);

But that's not OK, heap_form_tuple takes bool *.  bool and char are
not interchangeable (they may have different sizes on some platforms,
among other problems, even though usually they are both 1 byte).  So I
think you should declare it as "bool nulls[2]".

Meanwhile in TypeStorage you have a member "bool *nulls", which you
then initialise like so:

+ st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));

That cast is wrong.

+/*
+ * gin_count_estimate
+ * Outputs number of indexed rows matched query. It doesn't touch heap at
+ * all.

Maybe "... number of indexed rows matching query."?

+ if (attnum < 0 || attnum >= st->index->rd_att->natts)
+ elog(ERROR, "Wrong column's number");
+ st->attnum = attnum;

Maybe "invalid column number" or "invalid attribute number"?

+ elog(ERROR, "Column type is not a tsvector");

s/C/c/ (according to convention).

+ * Prints various stat about index internals.

s/stat/stats/

+ elog(ERROR, "Relation %s.%s is not an index",

s/R/r/

+ elog(ERROR, "Index %s.%s has wrong type",

s/I/i/

+  gin_value_count prints estimated counts for each
+  indexed values, single-argument function will return result for a first
+  column of index. For example:

"... for each indexed value.  The single argument version will return
results for the first column of an index.  For example:"

+  gin_count_estimate outputs number of indexed
+ rows matched query. It doesn't touch heap at all. For example:

"... outputs the number of indexed rows matched by a query. ..."

+  gist_print prints objects stored in
+  GiST tree, works only if objects in index have
+  textual representation (type_out functions should be
+  implemented for the given object type). It's known to work with R-tree
+  GiST based index. Note, in example below, objects are
+  of type box. For example:

"... prints objects stored in a GiST tree.  it works only if the
objects in the index have textual representation (type_out functions
should be implemented for the given object type).  It's known to work
with R-tree GiST-based indexes.  Note: in the example below, objects
are of type box.  For example:"

+  

Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-06 Thread Rafia Sabih
On Fri, Jun 2, 2017 at 6:31 PM, Amit Kapila  wrote:
>
> Your reasoning sounds sensible to me.  I think the other way to attack
> this problem is that we can maintain some local queue in each of the
> workers when the shared memory queue becomes full.  Basically, we can
> extend your "Faster processing at Gather node" patch [1] such that
> instead of fixed sized local queue, we can extend it when the shm
> queue become full.  I think that way we can handle both the problems
> (worker won't stall if shm queues are full and workers can do batched
> writes in shm queue to avoid the shm queue communication overhead) in
> a similar way.
>
>
> [1] - 
> https://www.postgresql.org/message-id/CAOGQiiMwhOd5-iKZnizn%2BEdzZmB0bc3xa6rKXQgvhbnQ29zCJg%40mail.gmail.com
>

I worked on this idea of using a local queue as a temporary buffer to
write tuples to when the master is busy and the shared queue is full,
and it gives quite a good improvement in query performance.

Design:
At a basic level, the design of this patch can be explained as
follows: similar to shm_mq, there is a new structure, local_mq, which
is private to each worker.  Once the shared queue is full, we write
tuples to the local queue.  Since the local queue is never shared, no
locking is needed to write to it, which makes such writes cheap.

Once the local queue is at least 5% full (the threshold in this
version; we might need to tune it), we copy the data from the local
to the shared queue.  If both queues are full, we wait until the
master reads from the shared queue, copy data from the local to the
shared queue until the required space is available, and then write
the tuple to the local queue.  If at any instant the local queue
becomes empty, we write the tuple directly to the shared queue,
provided there is space.  At worker shutdown we copy all remaining
data from the local queue to the shared queue.

For this version of the patch I have kept the local queue size at
100 * PARALLEL_TUPLE_QUEUE_SIZE = 6553600 bytes, which might not be
the best choice, and I am open to reasons for modifying it.  It is
sized that way for scenarios where the gather/gather-merge node is
slow: when the master is busy it may stay busy for a long time, or
the amount of data to process may be high, and we would not want the
workers to wait that long.
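For illustration, the write path described above can be sketched as a standalone simulation (made-up structure and function names, fixed small sizes, and no real shared memory or locking involved — this is not the patch's shm_mq code):

```c
#include <stdbool.h>

/* Illustrative sizes; the patch uses PARALLEL_TUPLE_QUEUE_SIZE for the
 * shared queue and 100x that for the local one. */
#define SHM_QUEUE_SIZE   8
#define LOCAL_QUEUE_SIZE 64
#define FLUSH_THRESHOLD  (LOCAL_QUEUE_SIZE / 20)    /* ~5% full */

typedef struct SimpleQueue
{
    int         items[LOCAL_QUEUE_SIZE];
    int         head;
    int         tail;
    int         count;
    int         capacity;
} SimpleQueue;

static bool
queue_put(SimpleQueue *q, int item)
{
    if (q->count == q->capacity)
        return false;
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % q->capacity;
    q->count++;
    return true;
}

static bool
queue_get(SimpleQueue *q, int *item)
{
    if (q->count == 0)
        return false;
    *item = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return true;
}

/* Move as many buffered tuples as fit from the local to the shared queue. */
static void
flush_local(SimpleQueue *local, SimpleQueue *shm)
{
    int         item;

    while (local->count > 0 && shm->count < shm->capacity)
    {
        queue_get(local, &item);
        queue_put(shm, item);
    }
}

/*
 * Worker-side write: use the shared queue directly while the local queue
 * is empty (preserving tuple order); otherwise buffer locally, flushing
 * opportunistically once the local queue passes the threshold.
 */
static void
worker_write(SimpleQueue *local, SimpleQueue *shm, int tuple)
{
    if (local->count == 0 && queue_put(shm, tuple))
        return;                 /* fast path: shared queue had room */
    if (local->count >= FLUSH_THRESHOLD)
        flush_local(local, shm);
    queue_put(local, tuple);    /* the real patch waits if this is full too */
}
```

Because the worker writes to the shared queue only while the local queue is empty, tuples still reach the master in the order they were produced.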

Performance:
These experiments are on TPC-H at scale factor 20.  The patch gives
around a 20-30% performance improvement for queries with selectivity
of around 20-30%.

Head:
Default plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

QUERY PLAN
-
 Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..334367.85 rows=10313822 width=129) (actual
time=0.057..26389.587 rows=10258702 loops=1)
   Index Cond: (l_orderkey < 1500)
   Filter: (l_extendedprice < '5'::numeric)
   Rows Removed by Filter: 4737888
 Planning time: 1.686 ms
 Execution time: 27402.801 ms
(6 rows)

Force parallelism plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

  QUERY PLAN
-
 Gather  (cost=0.57..193789.78 rows=10313822 width=129) (actual
time=0.354..41153.916 rows=10258702 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..193789.78 rows=2578456 width=129) (actual
time=0.062..6530.167 rows=2051740 loops=5)
 Index Cond: (l_orderkey < 1500)
 Filter: (l_extendedprice < '5'::numeric)
 Rows Removed by Filter: 947578
 Planning time: 0.383 ms
 Execution time: 42027.645 ms
(9 rows)

Patch:
Force parallelism plan

 explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

  QUERY PLAN
-
 Gather  (cost=0.57..193789.78 rows=10313822 width=129) (actual
time=0.413..16690.294 rows=10258702 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..193789.78 rows=2578456 width=129) (actual
time=0.047..6185.527 rows=2051740 loops=5)
 Index Cond: (l_orderkey < 1500)
 Filter: (l_extendedprice < '5'::numeric)
 Rows Removed by Filter: 947578
 Planning time: 0.406 ms
 Execution time: 17616.750 ms
(9 rows)

Head:
Default plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 3000;

QUERY PLAN

Re: [HACKERS] Copyright in partition.h and partition.c

2017-09-06 Thread Daniel Gustafsson
> On 06 Sep 2017, at 02:56, Amit Langote  wrote:
> 
> On 2017/09/05 21:14, Tom Lane wrote:
>> Amit Langote  writes:
>>> On 2017/09/05 15:48, Etsuro Fujita wrote:
 Here is the copyright in partition.h:
 
  * Copyright (c) 2007-2017, PostgreSQL Global Development Group
 
 I think it's reasonable that that matches the copyright in partition.c,
 but partition.c has:
 
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
 
 Is that intentional?
>> 
>>> No, it's unintentional.  The difference may have resulted from copying
>>> different files to become partition.h and partition.c, respectively.
>> 
>>> Maybe, we should change both to say 2016-2017?
>> 
>>> I don't know the exact rule for how we determine those years.  Is there
>>> some rule in place about that?  When I look at execParallel.c, which
>>> supposedly got introduced into the tree recently, I see 1996-2017.  OTOH,
>>> the files in contrib/bloom all have 2016-2017.
>> 
>> Our usual practice is to write the copyright like it is in partition.c
>> even in new files.  This avoids any question about whether any of the
>> code was copied-and-pasted from somewhere else in PG.  Even if not one
>> word in the file can be traced to code that was somewhere else before,
>> it seems to me that this is an appropriate thing to do, to give due
>> credit to those who came before us.
> 
> Agreed.
> 
>> In short: we should make partition.h's copyright look like partition.c's
>> not vice versa.
> 
> Attached patch does that.

This reminded me that I’d seen one of these before while hacking, and with some
grep and xargs abuse I spotted one more (there might be more that my command
line fu didn’t catch, though).  The attached could perhaps be included with
the above patch?

Perhaps the copyright script should be expanded to catch these?  (and I
volunteer to attempt that unless it’s deemed an uninteresting feature)

cheers ./daniel



header_copyright.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-09-06 Thread Amit Langote
On 2017/09/04 10:10, Amit Langote wrote:
> On 2017/09/02 2:52, Robert Haas wrote:
>> It strikes me that this patch set is doing two things but maybe in the
>> opposite order that I would have chosen to attack them.  First,
>> there's getting partition pruning to use something other than
>> constraint exclusion.  Second, there's deferring work that is
>> currently done at an early stage of the process until later, so that
>> we waste less effort on partitions that are ultimately going to be
>> pruned.
> 
> OK.
> 
>> Therefore, IMHO, it would be best to focus first on how we're going to
>> identify the partitions that survive pruning, and then afterwards work
>> on transposing that logic to happen before partitions are opened and
>> locked.  That way, we get some incremental benefit sooner, and also
>> unblock some other development work.
> 
> Alright, I will try to do it that way.

Attached is a set of patches that does things that way.  The individual
patches are described below:

[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

This will allow us to perform scan and join planning separately for each
partition sub-tree, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.

This is essential, because we will be doing partition pruning for every
partitioned table in the tree by matching the query's scan keys with its
partition key.  We wouldn't be able to do that if the intermediate
partitioned tables didn't have RelOptInfos.

[PATCH 2/5] WIP: planner-side changes for partition-pruning

This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.

Importantly, it implements the logic within the planner to match query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.

Also, it adds certain fields to RelOptInfos of the partitioned tables that
reflect its partitioning properties.

[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

This patch modifies the partition bound comparison function so that the
caller can pass an incomplete partition key tuple that is potentially a
prefix of a multi-column partition key.  Currently, the input tuple must
contain all of key->partnatts values, but that may not hold for queries,
which may not place restrictions on all the partition key columns.
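As a hedged illustration of that interface change, here is a standalone sketch of a prefix-aware bound comparison, with int columns standing in for Datums and made-up names (this is not PostgreSQL's partition_bound_cmp):

```c
/*
 * Compare a query's bounding key, which may cover only a prefix of the
 * partition key columns, against a stored partition bound.  Returns
 * -1/0/1; 0 means the bound is equal on every column the query supplied,
 * ignoring the unconstrained suffix.
 */
static int
partition_prefix_cmp(const int *bound_vals, int bound_natts,
                     const int *query_vals, int nkeys)
{
    int         natts = (nkeys < bound_natts) ? nkeys : bound_natts;

    for (int i = 0; i < natts; i++)
    {
        if (query_vals[i] < bound_vals[i])
            return -1;
        if (query_vals[i] > bound_vals[i])
            return 1;
    }
    /* All supplied columns compare equal on the prefix. */
    return 0;
}
```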

[PATCH 4/5] WIP: Implement get_partitions_for_keys()

This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.
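A rough standalone sketch of that search for the range-partitioning case (a linear scan here, where the real patch binary-searches boundinfo; the integer bounds and all names are assumptions for illustration):

```c
/*
 * Given ascending range-partition bounds, partition i holds keys in
 * [bounds[i], bounds[i + 1]); bounds has nparts + 1 entries.  Compute
 * the inclusive index range [*first, *last] of partitions overlapping
 * the half-open query range [minkey, maxkey).  *first > *last means no
 * partition matches.
 */
static void
get_partitions_in_range(const int *bounds, int nparts,
                        int minkey, int maxkey,
                        int *first, int *last)
{
    int         lo = 0;
    int         hi = nparts - 1;

    while (lo < nparts && bounds[lo + 1] <= minkey)
        lo++;                   /* skip partitions entirely below minkey */
    while (hi >= 0 && bounds[hi] >= maxkey)
        hi--;                   /* skip partitions entirely at/above maxkey */
    *first = lo;
    *last = hi;
}
```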

[PATCH 5/5] Add more tests for the new partitioning-related planning

More tests.


Some TODO items still remain to be done:

* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests

Thanks,
Amit
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 6 Sep 2017 09:28:14 +0900
Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

...except when the partitioned table in question is the result rel
of the query.

This allows us to perform scan and join planning for each sub-tree in a
given partition tree, with each sub-tree's root partitioned table
getting its own RelOptInfo.  Previously only the root of the whole
partition tree got a RelOptInfo, along with the leaf partitions,
with each leaf partition's AppendRelInfo pointing to the former as
its parent.
---
 src/backend/optimizer/path/allpaths.c  |  34 ++-
 src/backend/optimizer/plan/planner.c   |   3 +-
 src/backend/optimizer/prep/prepunion.c | 166 +
 src/backend/optimizer/util/plancat.c   |   7 +-
 4 files changed, 146 insertions(+), 64 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..6c3511bd47 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo 
*rel,
RangeTblEntry *rte;
 
rte = planner_rt_fetch(rel->relid, root);
+
+   /*
+* Get the partitioned_rels list from root->pcinfo_list after
+* confirming that rel is actually a root partitioned table.
+*/
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
-   partitioned_rels = get_partitioned_child_rels(root, rel->relid);
-   /* The root partitioned table is included as a child rel */
-   Assert(list_length(partitioned_rels) >= 1);
+   
