Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Peter Geoghegan
On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro
 wrote:
> 2.  All participants: parallel sequential scan, sort, spool to disk;
> barrier; leader: merge spooled tuples and build btree.
>
> This patch is doing the 2nd thing.  My understanding is that some
> systems might choose to do that if they don't have or don't like the
> table's statistics, since repartitioning for balanced load requires
> carefully chosen ranges and is highly sensitive to distribution
> problems.

The second thing here seems to offer comparable scalability to other
systems' implementations of the first thing. They seem to have reused
"partitioning to sort in parallel" for B-Tree builds, at least in some
cases, despite this. WAL logging is the biggest serial bottleneck here
for other systems, I've heard -- that's still going to be pretty much
serial.

I think that the fact that some systems do partitioning for parallel
B-Tree builds might have as much to do with their ability to create
B-Tree indexes in place as anything else. Apparently, some systems
don't use temp files, instead writing out what is for all intents and
purposes part of a finished B-Tree as runs (no use of
temp_tablespaces). That may be a big part of what makes it worthwhile
to try to use partitioning. I understand that only the highest client
counts will see much direct performance benefit relative to the first
approach.

> It's pretty clear that approach 1 is a difficult project.  From my
> research into dynamic repartitioning in the context of hash joins, I
> can see that that infrastructure is a significant project in its own
> right: subproblems include super efficient tuple exchange, buffering,
> statistics/planning and dealing with/adapting to bad outcomes.  I also
> suspect that repartitioning operators might need to be specialised for
> different purposes like sorting vs hash joins, which may have
> differing goals.  I think it's probably easy to build a slow dynamic
> repartitioning mechanism that frequently results in terrible worst
> case scenarios where you paid a fortune in IPC overheads and still
> finished up with one worker pulling most of the whole load.  Without
> range partitioning, I don't believe you can merge the resulting
> non-disjoint btrees efficiently so you'd probably finish up writing a
> complete new btree to mash them together.  As for merging disjoint
> btrees, I assume there are ways to do a structure-preserving merge
> that just rebuilds some internal pages and incorporates the existing
> leaf pages directly, a bit like tree manipulation in functional
> programming languages; that'll take some doing.

I agree with all that. "Stitching together" disjoint B-Trees does seem
to have some particular risks, which users of other systems are
cautioned against in their documentation. You can end up with an
unbalanced B-Tree.

> So I'm in favour of this patch, which is relatively simple and gives us
> faster index builds soon.  Eventually we might also be able to have
> approach 1.  From what I gather, it's entirely possible that we might
> still need 2 to fall back on in some cases.

Right. And it can form the basis of an implementation of 1, which in
any case seems to be much more compelling for parallel query, when a
great deal more can be pushed down, and we are not particularly likely
to be I/O bound (usually not much writing to the heap, or WAL
logging).

> Will you move the BufFile changes to a separate patch in the next revision?

That is the plan. I need to get set up with a new machine here, having
given back my work laptop to Heroku, but it shouldn't take too long.

Thanks for the review.
-- 
Peter Geoghegan




Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-01-31 Thread Kyotaro HORIGUCHI
Hello, I'll add the rebased version to the next CF.

At Fri, 20 Jan 2017 11:07:29 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170120.110729.107284864.horiguchi.kyot...@lab.ntt.co.jp>
> > > > - Delaying recycling a segment until the last partial record on it
> > > >   completes. This seems doable page-wise (coarse resolution)
> > > >   but would cost additional reads of past xlog files (the page
> > > >   headers of past pages are required).
> > > 
> > > Hm, yes. That looks like the least invasive way to go. At least that
> > > looks more correct than the others.
> > 
> > The attached patch does that. Usually it reads page headers only
> > on segment boundaries, but once a continuation record is found (or
> > the next page header cannot be read, that is, the first record on
> > the first page in the next segment has not been replicated yet),
> > this happens on every page boundary until a non-continuation
> > page comes.
> > 
> > I left debug output (at LOG level) in the attached file, emitted on
> > every state change of the keep pointer. At least for pgbench, the
> > cost seems negligible.
> 
> I revised it. It became neater and less invasive.
> 
>  - Removed the added keep pointer from struct WalSnd. It is never
>    referenced from other processes; it is a static variable now.
> 
>  - Restore keepPtr from the replication slot at startup.

keepPtr is renamed to a more meaningful name restartLSN.

>  - Moved the main part to a more appropriate position.

- Removed the debug print code.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 47f9f867305934cbc5fdbd9629e61be65353fc9c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 1 Feb 2017 16:07:22 +0900
Subject: [PATCH] Fix a bug of physical replication slot.

A physical-replication standby can stop exactly at a WAL segment
boundary, in which case restart_lsn of the slot on the master points to
the same location. The preceding segment on the master will then be
removed after a few checkpoints. If the first record of the next segment
is a continuation record, its continuation exists only on the master and
its beginning exists only on the standby, so the standby cannot restart:
the record it has to read is scattered across the two sources.

This patch holds restart_lsn back in the last segment while the first
page of the next segment is a continuation record.
---
 src/backend/replication/walsender.c | 107 +---
 1 file changed, 100 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5909b7d..cfbe70e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -188,6 +188,12 @@ static volatile sig_atomic_t replication_active = false;
 static LogicalDecodingContext *logical_decoding_ctx = NULL;
 static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
+/*
+ * Segment keep pointer for physical slots. Has a valid value only when it
+ * differs from the current flush pointer.
+ */
+static XLogRecPtr	   keepPtr = InvalidXLogRecPtr;
+
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndXLogSendHandler(SIGNAL_ARGS);
@@ -220,7 +226,7 @@ static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, Tran
 static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
 
-static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+static bool XLogRead(char *buf, XLogRecPtr startptr, Size count, bool noutfoundok);
 
 
 /* Initialize walsender process before entering the main command loop */
@@ -541,6 +547,9 @@ StartReplication(StartReplicationCmd *cmd)
 			ereport(ERROR,
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 (errmsg("cannot use a logical replication slot for physical replication";
+
+		/* Restore keepPtr from replication slot */
+		keepPtr = MyReplicationSlot->data.restart_lsn;
 	}
 
 	/*
@@ -556,6 +565,10 @@ StartReplication(StartReplicationCmd *cmd)
 	else
 		FlushPtr = GetFlushRecPtr();
 
+	/* Set InvalidXLogRecPtr if catching up */
+	if (keepPtr == FlushPtr)
+		keepPtr = InvalidXLogRecPtr;
+
 	if (cmd->timeline != 0)
 	{
 		XLogRecPtr	switchpoint;
@@ -777,7 +790,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 		count = flushptr - targetPagePtr;
 
 	/* now actually read the data, we know it's there */
-	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ, false);
 
 	return count;
 }
@@ -1563,7 +1576,7 @@ static void
 ProcessStandbyReplyMessage(void)
 {
 	XLogRecPtr	writePtr,
-flushPtr,
+flushPtr, oldFlushPtr,
 applyPtr;
 	bool		replyRequested;
 
@@ -1592,6 +1605,7 @@ ProcessStandbyReplyMessage(void)
 		WalSnd	   *walsnd = MyWalSnd;
 
		SpinLockAcquire(&walsnd->mutex);
+		oldFlushPtr = walsnd->flush;
 		

Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Rahila Syed
Hello Robert,

>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE.  It seems like one of those two things has
>to be wrong.

btps_pageStatus is set to BTPARALLEL_DONE only by the first worker that is
performing the scan for the latest array key and that has encountered the
end of the scan. This is ensured by the following check in
_bt_parallel_done():

   if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
       btscan->btps_pageStatus != BTPARALLEL_DONE)

Thus, BTPARALLEL_DONE marks the end of the scan only for the latest array
keys. This ensures that when any worker reaches _bt_advance_array_keys(),
it advances the latest scan (marked by btscan->btps_arrayKeyCount) only
when that scan has ended, which it checks via
if (btps_pageStatus == BTPARALLEL_DONE) in _bt_advance_array_keys().
Otherwise, the worker just advances its local so->arrayKeyCount.
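For reference, here is a simplified sketch (my paraphrase, not the exact
patch code) of the advance-scan side that pairs with the check above; the
field names are taken from the patch, the function name and the
BTPARALLEL_NOT_STARTED value are my assumptions:

static void
sketch_bt_parallel_advance_scan(BTScanOpaque so, BTParallelScanDesc btscan)
{
	/* Each worker always advances its own local key-set counter. */
	so->arrayKeyCount++;

	SpinLockAcquire(&btscan->btps_mutex);
	if (btscan->btps_pageStatus == BTPARALLEL_DONE)
	{
		/*
		 * The scan for the latest array keys has ended, so reset the
		 * shared state and let the next key set begin.
		 */
		btscan->btps_scanPage = InvalidBlockNumber;
		btscan->btps_pageStatus = BTPARALLEL_NOT_STARTED;
		btscan->btps_arrayKeyCount++;
	}
	SpinLockRelease(&btscan->btps_mutex);
}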

I hope this provides some clarification.

Thank you,
Rahila Syed





On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas  wrote:

> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas 
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila 
> wrote:
> >> In spite of being careful, I missed reorganizing the functions in
> >> genam.h which I have done in attached patch.
> >
> > Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.
>
> Reviewing parallel_index_scan_v6.patch:
>
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
>
> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
>
> The hunk in indexam.c looks like a leftover that can be omitted.
>
> +/*
> + * Below flags are used to indicate the state of parallel scan.
>
> They aren't really flags any more; they're members of an enum.  I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values.  But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
>
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel
> btree scan
>
> Initializing -> initialize
>
> + *  btparallelrescan() -- reset parallel scan
>
> Previous two prototypes have one dash, this one has two.  Make it
> consistent, please.
>
> + * Ideally, we don't need to acquire spinlock here, but being
> + * consistent with heap_rescan seems to be a good idea.
>
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
>
> + * _bt_parallel_seize() -- returns the next block to be scanned for
> forward
> + *  scans and latest block scanned for backward scans.
>
> I think the description should be more like "Begin the process of
> advancing the scan to a new page.  Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()."  And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page.  We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
>
> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE.  It seems like one of those two things has
> to be wrong.
>
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
>
> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument.  It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward?  That
> is a little odd.  Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
>
> /* nope, keep going */
> vs.
> /*
>  * For parallel scans, get the last page scanned as it is quite
>  * possible that 

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Thomas Munro
On Wed, Feb 1, 2017 at 5:37 PM, Michael Paquier
 wrote:
> On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan  wrote:
>> On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
>>  wrote:
>>> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
 Attached is V7 of the patch.
>>>
>>> I am doing some testing.  First, some superficial things from first pass:
>>>
>>> [Various minor cosmetic issues]
>>
>> Oops.
>
> As this review is very recent, I have moved the patch to CF 2017-03.

 ParallelContext *
-CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
+CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers,
+ bool serializable_okay)
 {
MemoryContext oldcontext;
ParallelContext *pcxt;
@@ -143,7 +144,7 @@ CreateParallelContext(parallel_worker_main_type
entrypoint, int nworkers)
 * workers, at least not until somebody enhances that mechanism to be
 * parallel-aware.
 */
-   if (IsolationIsSerializable())
+   if (IsolationIsSerializable() && !serializable_okay)
nworkers = 0;

That's a bit weird but I can't think of a problem with it.  Workers
run with MySerializableXact == InvalidSerializableXact, even though
they may have the snapshot of a SERIALIZABLE leader.  Hopefully soon
the restriction on SERIALIZABLE in parallel queries can be lifted
anyway, and then this could be removed.

Here are some thoughts on the overall approach.  Disclaimer:  I
haven't researched the state of the art in parallel sort or btree
builds.  But I gather from general reading that there are a couple of
well known approaches, and I'm sure you'll correct me if I'm off base
here.

1.  All participants: parallel sequential scan, repartition on the fly
so each worker has tuples in a non-overlapping range, sort, build
disjoint btrees; barrier; leader: merge disjoint btrees into one.

2.  All participants: parallel sequential scan, sort, spool to disk;
barrier; leader: merge spooled tuples and build btree.

This patch is doing the 2nd thing.  My understanding is that some
systems might choose to do that if they don't have or don't like the
table's statistics, since repartitioning for balanced load requires
carefully chosen ranges and is highly sensitive to distribution
problems.

It's pretty clear that approach 1 is a difficult project.  From my
research into dynamic repartitioning in the context of hash joins, I
can see that that infrastructure is a significant project in its own
right: subproblems include super efficient tuple exchange, buffering,
statistics/planning and dealing with/adapting to bad outcomes.  I also
suspect that repartitioning operators might need to be specialised for
different purposes like sorting vs hash joins, which may have
differing goals.  I think it's probably easy to build a slow dynamic
repartitioning mechanism that frequently results in terrible worst
case scenarios where you paid a fortune in IPC overheads and still
finished up with one worker pulling most of the whole load.  Without
range partitioning, I don't believe you can merge the resulting
non-disjoint btrees efficiently so you'd probably finish up writing a
complete new btree to mash them together.  As for merging disjoint
btrees, I assume there are ways to do a structure-preserving merge
that just rebuilds some internal pages and incorporates the existing
leaf pages directly, a bit like tree manipulation in functional
programming languages; that'll take some doing.

So I'm in favour of this patch, which is relatively simple and gives us
faster index builds soon.  Eventually we might also be able to have
approach 1.  From what I gather, it's entirely possible that we might
still need 2 to fall back on in some cases.

Will you move the BufFile changes to a separate patch in the next revision?

Still testing and reviewing, more soon.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-01-31 Thread Kyotaro HORIGUCHI
At Tue, 31 Jan 2017 14:38:39 +0300, Nikita Glukhov  
wrote in <1622dc9f-cecf-cee3-b71e-b2bf34649...@postgrespro.ru>
> On 31.01.2017 13:04, Kyotaro HORIGUCHI wrote:
> > The following comment,
> >
> >> /* Can any range from range_box to be overlower than this argument? */
> >
> > This might be better to be using the same wording to its users,
> > for example the comment for overLeft4D is the following.
> >
> > | /* Can any rectangle from rect_box does not extend the right of this
> > | argument? */
> >
> > And the documentation uses the following sentence,
> >
> > https://www.postgresql.org/docs/current/static/functions-geometry.html
> > | Does not extend to the right of?
> >
> > So I think "Can any range from range_box does not extend the
> > upper of this argument?" is better than the overlower. What do
> > you think?
> 
> I think "does not extend the upper" is better than unclear "overlower"
> too.
> So I've corrected this comment in the attached v03 patch.

Thank you. I think this patch is already in good shape. Let's
wait a while for some committer to pick it up. If you're
afraid that it might be forgotten, it might be better to
register it as an entry in the next commit fest.

> > I confirmed other functions in geo_spgist but found no bugs like this.
> 
> I found no bugs in other functions too.
> 
> 
> > The minimal bad example for the '&<' case is the following.
> >
> > =# create table t (b box);
> > =# create index on t using spgist(b);
> > =# insert into t values ('((215, 305),(210,300))'::box);
> > =# select * from t where b &< '((100,200),(300,400))'::box;
> >   b
> > -
> >  (215,305),(210,300)
> > (1 row)
> >
> > =# set enable_seqscan to false;
> > =# select * from t where b &< '((100,200),(300,400))'::box;
> >  b
> > ---
> > (0 rows)
> 
> I don't know how you were able to reproduce this bug in
> spg_box_quad_inner_consistent() using only one box, because the index
> tree must have at least one inner node (that is, 157 boxes) before
> spg_box_quad_inner_consistent() starts to be called. That's why the
> existing tests for box_ops didn't detect this bug: the box_temp table
> has only 56 rows.

SP-GiST seems to split the space even for the first tuple. I saw
spg_box_quad_inner_consistent() called several times during the SELECT,
and fortunately the scan for the tuple went into the wrong node.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 8:01 AM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so?
>
Robert's suggestion surely makes the approach more general. But the
existing approach makes it easier to decide the RMs for which we
support the consistency check facility. Surely, we can use a list to
track the RMs which should (or should not) be masked. But I think we
always have to mask at least the LSN of the pages. Isn't that so?
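For illustration, a minimal sketch (mine, not code from the patch) of
what even a trivial mask routine would still have to do, since the page
LSN legitimately differs between the original page and the one
reconstructed from WAL replay:

static void
sketch_noop_mask(char *pagebytes, BlockNumber blkno)
{
	Page		page = (Page) pagebytes;

	/* Blank out the LSN with an arbitrary fixed value before comparing. */
	PageSetLSN(page, (XLogRecPtr) 0);
}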

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).
+ 1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
At Wed, 01 Feb 2017 12:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170201.121304.267734380.horiguchi.kyot...@lab.ntt.co.jp>
> > >  I tried a committed Logical Replication environment. I found
> > >  that replication between databases of different encodings did
> > >  not convert encodings in character type columns. Is this
> > >  behavior correct?
> > 
> > The output plugin for subscriptions is pgoutput, and it currently
> > doesn't consider encoding, but that could easily be added if the
> > desired encoding is made known.
> > 
> > The easiest (but seemingly somewhat fragile) way I can think of is:
> > 
> > - The subscriber connects with a client_encoding specification, and
> >   the output plugin pgoutput decides whether it accepts that encoding
> >   or not. If not, pgoutput sends the data without conversion.
> > 
> > The attached small patch does this and works with the following
> > CREATE SUBSCRIPTION.
> 
> Oops. It forgot to handle conversion failures. That is amended in the
> attached patch.
> 
> > CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
> > client_encoding=EUC_JP' PUBLICATION pub1;
> > 
> > 
> > Also we may have explicit negotiation on, for example,
> > CREATE_REPLICATION_SLOT.
> > 
> >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> > 
> > Or output plugin may take options.
> > 
> >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> > 
> > 
> > Any opinions?

This patch chokes replication when the publisher finds an
inconvertible character in a tuple to be sent. In that case,
dropping and recreating the subscription is necessary to move forward.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Parallel Index-only scan

2017-01-31 Thread Michael Paquier
On Thu, Jan 19, 2017 at 9:07 PM, Rafia Sabih
 wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-31 Thread Kyotaro HORIGUCHI
Hello, thank you for moving this to the next CF.

At Wed, 1 Feb 2017 13:09:51 +0900, Michael Paquier  
wrote in 
> On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
>  wrote:
> > Six new syscaches in 665d1fa was conflicted and 3-way merge
> > worked correctly. The new syscaches don't seem to be targets of
> > this patch.
> 
> To be honest, I am not completely sure what to think about this patch.
> Moved to next CF as there is a new version, and no new reviews to make
> the discussion perhaps move on.

I think the following is the status of this topic.

- The patch still applies without conflicts.

- This is not a holistic measure for the memory leak, but it surely
  helps in some existing cases.

- A shared catcache is a separate discussion (and won't really be
  proposed any time soon due to the locking issues).

- As I mentioned, a patch that caps the number of negative entries is
  available (in a first-created, first-deleted manner), but it has a
  loose end: how to determine the limit.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside ExecEndGather)

2017-01-31 Thread Michael Paquier
On Fri, Dec 2, 2016 at 9:20 PM, Haribabu Kommi  wrote:
> Moved to next CF with "needs review" status.

There has not been much interest in this patch, moved again, this time
to CF 2017-03.
-- 
Michael




Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-31 Thread Michael Paquier
On Fri, Dec 30, 2016 at 12:55 PM, Haribabu Kommi
 wrote:
> Any Comments on the approach?

I have moved this patch to CF 2017-03.
-- 
Michael




Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 7:20 PM, Etsuro Fujita
 wrote:
> Attached is the new version of the patch.  I also addressed other comments
> from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c,
> added/revised comments, and added regression tests for the case where a
> pushed down UPDATE/DELETE on a join has RETURNING.
>
> My apologies for having been late to work on this.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Logical replication existing data copy

2017-01-31 Thread Michael Paquier
On Fri, Jan 20, 2017 at 3:03 AM, Petr Jelinek
 wrote:
> Okay, here is v3 with some small fixes and rebased on top of rename.
> Also it's rebased without the
> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch as I
> don't expect that one to go further for now.
>
> Thanks for testing!

This patch needs a rebase, moved to next CF with "waiting on author".
-- 
Michael




Re: [HACKERS] Logical decoding on standby

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 7:37 AM, Craig Ringer  wrote:
> Rebased series attached, on top of current master (which includes
> logical replicaiton).
>
> I'm inclined to think I should split out a few of the changes from
> 0005, roughly along the lines of the bullet points in its commit
> message. Anyone feel strongly about how granular this should be?
>
> This patch series is a pre-requisite for supporting logical
> replication using a physical standby as a source, but does not its
> self enable logical replication from a physical standby.

There are patches but no reviews yet so moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 4:02 PM, Craig Ringer  wrote:
> If we want to save the 4 bytes per xmin advance (probably not worth
> caring) we can instead skip setting it on the standby, in which case
> it'll be potentially wrong until the next checkpoint. I'd rather make
> sure it stays correct.

Those patches still apply and no reviews have come in yet, so moved to
CF 2017-03.
-- 
Michael




Re: [HACKERS] Parallel Append implementation

2017-01-31 Thread Michael Paquier
On Tue, Jan 17, 2017 at 2:40 PM, Amit Langote
 wrote:
> Hi Amit,
>
> On 2016/12/23 14:21, Amit Khandekar wrote:
>> Currently an Append plan node does not execute its subplans in
>> parallel. There is no distribution of workers across its subplans. The
>> second subplan starts running only after the first subplan finishes,
>> although the individual subplans may be running parallel scans.
>>
>> Secondly, we create a partial Append path for an appendrel, but we do
>> that only if all of its member subpaths are partial paths. If one or
>> more of the subplans is a non-parallel path, there will be only a
>> non-parallel Append. So whatever node is sitting on top of Append is
>> not going to do a parallel plan; for example, a select count(*) won't
>> divide it into partial aggregates if the underlying Append is not
>> partial.
>>
>> The attached patch removes both of the above restrictions.  There has
>> already been a mail thread [1] that discusses an approach suggested by
>> Robert Haas for implementing this feature. This patch uses this same
>> approach.
>
> I was looking at the executor portion of this patch and I noticed that in
> exec_append_initialize_next():
>
> if (appendstate->as_padesc)
> return parallel_append_next(appendstate);
>
> /*
>  * Not parallel-aware. Fine, just go on to the next subplan in the
>  * appropriate direction.
>  */
> if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
> appendstate->as_whichplan++;
> else
> appendstate->as_whichplan--;
>
> which seems to mean that executing Append in parallel mode disregards the
> scan direction.  I am not immediately sure what implications that has, so
> I checked what heap scan does when executing in parallel mode, and found
> this in heapgettup():
>
> else if (backward)
> {
> /* backward parallel scan not supported */
> Assert(scan->rs_parallel == NULL);
>
> Perhaps, AppendState.as_padesc would not have been set if scan direction
> is backward, because parallel mode would be disabled for the whole query
> in that case (PlannerGlobal.parallelModeOK = false).  Maybe add an
> Assert() similar to one in heapgettup().
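For concreteness, one hedged guess at what such an assertion might look
like, using only the fields already shown in the hunks above (an
illustration, not code from the patch):

	/* Hypothetical sketch of the suggested check, mirroring heapgettup(): */
	if (!ScanDirectionIsForward(appendstate->ps.state->es_direction))
	{
		/* backward parallel Append not supported */
		Assert(appendstate->as_padesc == NULL);
	}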

There have been some reviews, but the patch has not been updated in
two weeks. Marking as "returned with feedback".
-- 
Michael




Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 8:16 AM, Tom Lane  wrote:
> [ in the service of closing out this thread... ]
>
> Peter Geoghegan  writes:
>> Finally, 0003-* is a Valgrind suppression borrowed from my parallel
>> CREATE INDEX patch. It's self-explanatory.
>
> Um, I didn't find it all that self-explanatory.  Why wouldn't we want
> to avoid writing undefined data?  I think the comment at least needs
> to explain exactly what part of the written data might be uninitialized.
> And I'd put the comment into valgrind.supp, too, not in the commit msg.
>
> Also, the suppression seems far too broad.  It would for instance
> block any complaint about a write() invoked via an elog call from
> any function invoked from any LogicalTape* function, no matter
> how far removed.

It seems like a new patch will be provided, so moved to next CF with
"waiting on author".
-- 
Michael




Re: [HACKERS] asynchronous execution

2017-01-31 Thread Kyotaro HORIGUCHI
Thank you.

At Wed, 1 Feb 2017 14:11:58 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Amit Kapila
On Tue, Jan 31, 2017 at 5:53 PM, Rahila Syed  wrote:
> Hello,
>
>>Agreed that it makes sense to consider only the number of pages to
>>scan for the computation of parallel workers.  I think for index scans we
>>should consider both index and heap pages that need to be scanned
>>(the costing of an index scan considers both index and heap pages).  I think
>>where considering heap pages matters more is when the finally selected
>>rows are scattered across heap pages or we need to apply a filter on
>>rows after fetching them from the heap.  OTOH, we can consider just pages
>>in the index as that is where the parallelism mainly works
> IMO, considering just index pages will give a better estimate of the work
> to be done in parallel, as the amount of work (the number of pages divided
> amongst workers) is independent of the number of heap pages scanned.
>

Yeah, I understand that point and I can see there is a strong argument
for doing it that way, but let's wait and see what others, including
Robert, have to say about this point.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Gather Merge

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
 wrote:
> On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
>  wrote:
>>
> The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.

Is an update going to be provided? I have moved this patch to next CF
with "waiting on author" as status.
-- 
Michael




Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
Thanks, Robert, for taking the time to review.  I'll update the
patch with the changes suggested by you.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PATCH: recursive json_populate_record()

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
> Nikita Glukhov  writes:
>> On 25.01.2017 23:58, Tom Lane wrote:
>>> I think you need to take a second look at the code you're producing
>>> and realize that it's not so clean either.  This extract from
>>> populate_record_field, for example, is pretty horrid:
>
>> But what if we introduce some helper macros like this:
>
>> #define JsValueIsNull(jsv) \
>>  ((jsv)->is_json ? !(jsv)->val.json.str \
>>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>
>> #define JsValueIsString(jsv) \
>>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)
>
> Yeah, I was wondering about that too.  I'm not sure that you can make
> a reasonable set of helper macros that will fix this, but if you want
> to try, go for it.
>
> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
> to go back to the manual every darn time to convince myself whether
> that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
> the reader (... or the author) having memorized C's precedence rules
> in quite that much detail.  Extra parens help.
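For the record, a tiny standalone example of the precedence Tom mentions
("||" binds tighter than "?:", so the unparenthesized form reads as
a ? b : (c || d)); this is just my illustration, unrelated to the patch:

#include <stdio.h>

int
main(void)
{
	int		a = 1, b = 0, c = 0, d = 1;

	printf("%d\n", a ? b : c || d);		/* prints 0: parsed as a ? b : (c || d) */
	printf("%d\n", (a ? b : c) || d);	/* prints 1: the other reading differs */
	return 0;
}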

Moved to CF 2017-03 as discussion is going on and more review is
needed on the last set of patches.
-- 
Michael




Re: [HACKERS] proposal: session server side variables

2017-01-31 Thread Pavel Stehule
2017-02-01 6:05 GMT+01:00 Michael Paquier :

> On Wed, Jan 11, 2017 at 10:42 PM, Craig Ringer 
> wrote:
> > There is no code yet. Code review and testing is where things get firmer.
> >
> > My personal stance right now is that I'd like to see catalog-declared
> > typed variables. I would prefer them to be transactional and would at
> > least oppose anything that didn't allow future room for that capability.
> > I'd prefer that non-transactional vars be clearly declared as such.
> >
> > In the end though... I'm not the one implementing it. I can have some
> > influence through the code review process. But it's whoever steps up
> > with a proposed implementation that has the biggest say. The rest of us
> > can say yes or no to some degree... but nobody can make someone else
> > implement something they don't want.
> > they don't want.
>
> The last patch is from the 6th of December and does not apply anymore:
> https://www.postgresql.org/message-id/CAFj8pRA9w_AujBAYdLR0UVfXwwoxhmn%2BFbNHnD3_NL%3DJ9x3y8w%40mail.gmail.com
> I don't have a better idea than marking this patch as "returned with
> feedback" for now, as the thread has died 3 weeks ago as well.
>

There is no agreement yet on the form of session variables.

Regards

Pavel


> --
> Michael
>


Re: [HACKERS] Cache Hash Index meta page.

2017-01-31 Thread Michael Paquier
On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy  wrote:
>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>> force_refresh);
>>
>> If the cache is initialized and force_refresh is not true, then this
>> just returns the cached data, and the metabuf argument isn't used.
>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>> the metabuffer, pin and lock it, use it to set the cache, release the
>> lock, and return with the pin still held.  If *metabuf !=
>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>
> Thanks, Robert. I have made a new patch which tries to do the same. Now I
> think the code looks less complicated.
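To make the described behaviour concrete, here is a hedged sketch of that
control flow (an illustration only, not the attached patch; it assumes the
cached copy lives in rel->rd_amcache):

HashMetaPage
_hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh)
{
	Page		page;

	/* Fast path: cache already populated and the caller doesn't insist. */
	if (rel->rd_amcache != NULL && !force_refresh)
		return (HashMetaPage) rel->rd_amcache;

	/* Pin and share-lock the metapage, unless the caller already pinned it. */
	if (!BufferIsValid(*metabuf))
		*metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
	else
		LockBuffer(*metabuf, HASH_READ);

	page = BufferGetPage(*metabuf);

	/* (Re)build the relcache copy from the locked metapage. */
	if (rel->rd_amcache == NULL)
		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
											 sizeof(HashMetaPageData));
	memcpy(rel->rd_amcache, HashPageGetMeta(page), sizeof(HashMetaPageData));

	/* Drop the lock but keep the pin, as described above. */
	LockBuffer(*metabuf, BUFFER_LOCK_UNLOCK);

	return (HashMetaPage) rel->rd_amcache;
}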

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 11:05 AM, Claudio Freire  wrote:
> Updated and rebased v7 attached.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 6:32 PM, Peter Moser  wrote:
> [reviews and discussions]

The proposed patch has rotted; please provide a rebase. By the way, I
am having a hard time applying your patches with patch(1) or any other
method... I am moving this to CF 2017-03 because of the lack of
reviews.
-- 
Michael




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera 
wrote:

>
> > > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> > > +do { \
> > > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> > > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> > > +} while (0)
> >
> > > Actually, I think this macro could just return the TID so that it can
> be
> > > used as struct assignment, just like ItemPointerCopy does internally --
> > > callers can do
> > >ctid = HeapTupleHeaderGetNextTid(tup);
> >
> > While I agree with your proposal, I wonder why we have ItemPointerCopy()
> in
> > the first place because we freely copy TIDs as struct assignment. Is
> there
> > a reason for that? And if there is, does it impact this specific case?
>
> I dunno.  This macro is present in our very first commit d31084e9d1118b.
> Maybe it's an artifact from the Lisp to C conversion.  Even then, we had
> some cases of iptrs being copied by struct assignment, so it's not like
> it didn't work.  Perhaps somebody envisioned that the internal details
> could change, but that hasn't happened in two decades so why should we
> worry about it now?  If somebody needs it later, it can be changed then.
>
>
May I suggest, in that case, that we apply the attached patch, which
removes all references to ItemPointerCopy as well as its definition? This
will avoid confusion in the future too. No issues were noticed in the
regression tests.
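(A trivial illustration of the point, assuming only the standard
itemptr.h definitions; the function name here is made up:)

#include "storage/itemptr.h"

static void
tid_copy_example(void)
{
	ItemPointerData a;
	ItemPointerData b;

	ItemPointerSet(&b, 10, 3);		/* block 10, offset 3 */

	a = b;							/* plain struct assignment */
	ItemPointerCopy(&b, &a);		/* the macro does exactly the same copy */
}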


> > There is one issue that bothers me. The current implementation lacks
> > ability to convert WARM chains into HOT chains. The README.WARM has some
> > proposal to do that. But it requires additional free bit in tuple header
> > (which we don't have) and of course, it needs to be vetted and
> implemented.
> > If the heap ends up with many WARM tuples, then index-only-scans will
> > become ineffective because index-only-scan can not skip a heap page, if
> it
> > contains a WARM tuple. Alternate ideas/suggestions and review of the
> design
> > are welcome!
>
> t_infomask2 contains one last unused bit,


Umm, WARM is using 2 unused bits from t_infomask2. You mean there is
another free bit after that too?


> and we could reuse vacuum
> full's bits (HEAP_MOVED_OUT, HEAP_MOVED_IN), but that will need some
> thinking ahead.  Maybe now's the time to start versioning relations so
> that we can ensure clusters upgraded to pg10 do not contain any of those
> bits in any tuple headers.
>

Yeah, IIRC the old VACUUM FULL was removed in 9.0, which is a good 6 years
old. Obviously, there is still a chance that a pre-9.0 binary-upgraded
cluster exists and upgrades to 10, so we still need to do something about
those bits if we reuse them. I'm surprised to see that we don't have any
mechanism in place to clear them, so maybe we should add something to do that.

I had some other ideas (and a patch too) to reuse bits from t_ctid.ip_pos,
given that offset numbers can be represented in just 13 bits, even with the
maximum block size. I can look at that if it comes to finding more bits.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_itempointercopy.patch
Description: Binary data



Re: [HACKERS] ICU integration

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:44 AM, Peter Eisentraut
 wrote:
> On 1/15/17 5:53 AM, Pavel Stehule wrote:
>> the regress test fails
>>
>>  Program received signal SIGSEGV, Segmentation fault.
>> 0x007bbc2b in pattern_char_isalpha (locale_is_c=0 '\000',
>> locale=0x1a73220, is_multibyte=1 '\001', c=97 'a') at selfuncs.c:5291
>> 5291return isalpha_l((unsigned char) c, locale->lt);
>
> Here is an updated patch that fixes this crash and is rebased on top of
> recent changes.

Patch that still applies + no reviews = moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] asynchronous execution

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 12:45 PM, Kyotaro HORIGUCHI
 wrote:
> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile
> conflicted.

The patches still apply, moved to CF 2017-03. Be aware of that:
$ git diff HEAD~6 --check
contrib/postgres_fdw/postgres_fdw.c:388: indent with spaces.
+   PendingAsyncRequest *areq,
contrib/postgres_fdw/postgres_fdw.c:389: indent with spaces.
+   bool reinit);
src/backend/utils/resowner/resowner.c:1332: new blank line at EOF.
-- 
Michael




Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2017-01-31 Thread Michael Paquier
On Wed, Jan 11, 2017 at 11:55 PM, Pavel Stehule  wrote:
> I'll mark this patch as ready for commiter

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] proposal: session server side variables

2017-01-31 Thread Michael Paquier
On Wed, Jan 11, 2017 at 10:42 PM, Craig Ringer  wrote:
> There is no code yet. Code review and testing is where things get firmer.
>
> My personal stance right now is that I'd like to see catalog-declared typed
> variables. I would prefer them to be transactional and would at least oppose
> anything that didn't allow future room for that capability. I'd prefer that
> non-transactional vars be clearly declared as such.
>
> In the end though... I'm not the one implementing it. I can have some
> influence through the code review process. But it's whoever steps up with a
> proposed implementation that has the biggest say. The rest of us can say yes
> or no to some degree... but nobody can make someone else implement something
> they don't want.

The last patch is from the 6th of December and does not apply anymore:
https://www.postgresql.org/message-id/CAFj8pRA9w_AujBAYdLR0UVfXwwoxhmn%2BFbNHnD3_NL%3DJ9x3y8w%40mail.gmail.com
I don't have a better idea than marking this patch as "returned with
feedback" for now, as the thread has died 3 weeks ago as well.
-- 
Michael




Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 12:57 AM, Vitaly Burovoy
 wrote:
> Hello,
>
> I've reviewed the patch[1].
>
Nothing to add from my side as well.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Microvacuum support for Hash Index

2017-01-31 Thread Michael Paquier
On Sat, Jan 28, 2017 at 8:02 PM, Amit Kapila  wrote:
> On Fri, Jan 27, 2017 at 5:15 PM, Ashutosh Sharma  
> wrote:
>>>
>>> Don't you think we should try to identify the reason of the deadlock
>>> error reported by you up thread [1]?  I know that you and Ashutosh are
>>> not able to reproduce it, but still I feel some investigation is
>>> required to find the reason.  It is quite possible that the test case
>>> is such that the deadlock is expected in rare cases, if that is the
>>> case then it is okay.  I have not spent enough time on that to comment
>>> whether it is a test or code issue.
>>
>> I am finally able to reproduce the issue using the attached script
>> file (deadlock_report). Basically, once I was able to reproduce the
>> issue with a hash index, I also thought of checking it with a non-unique
>> B-Tree index and was able to reproduce it with the B-Tree index as well.
>> This certainly tells us that there is nothing wrong at the code level;
>> rather, there is something wrong with the test script which is causing
>> this deadlock issue. Well, I have run pgbench with two different
>> configurations and my observations are as follows:
>>
>> 1) With primary keys, i.e. unique values: I have never encountered the
>> deadlock issue with this configuration.
>>
>> 2) With non-unique indexes (be it hash or B-Tree): I have seen the
>> deadlock many times with this configuration. Basically, when we have
>> non-unique values associated with a column, there is a high
>> probability that multiple records will get updated by a single
>> 'UPDATE' statement, and when there are a huge number of backends trying
>> to do so there is a high chance of getting a deadlock, which I assume is
>> expected behaviour in a database.
>>
>
> I agree with your analysis; surely trying to update multiple rows with
> the same values from different backends can lead to a deadlock.

Moved that to CF 2017-03.
-- 
Michael




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-01-31 Thread Michael Paquier
On Fri, Jan 13, 2017 at 12:23 PM, Amit Kapila  wrote:
> On Fri, Jan 13, 2017 at 1:04 AM, Jesper Pedersen
>  wrote:
>> On 12/27/2016 01:58 AM, Amit Kapila wrote:
>>>
>>> After recent commit's 7819ba1e and 25216c98, this patch requires a
>>> rebase.  Attached is the rebased patch.
>>>
>>
>> This needs a rebase after commit e898437.
>>
>
> Attached find the rebased patch.

The patch applies, and there have been no reviews since the last
version has been submitted. So moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Indirect indexes

2017-01-31 Thread Michael Paquier
On Sat, Dec 31, 2016 at 7:35 AM, Alvaro Herrera
 wrote:
> Attached is v4, which fixes a couple of relatively minor bugs.  There
> are still things to tackle before this is committable, but coding review
> of the new executor node would be welcome.

Moved to CF 2017-03 because of a lack of reviews. The patch fails to
apply and needs a rebase, so it is marked as "waiting on author".
-- 
Michael




Re: [HACKERS] Bug in to_timestamp().

2017-01-31 Thread Michael Paquier
On Mon, Dec 5, 2016 at 11:35 AM, Haribabu Kommi
 wrote:
> Moved to next CF with "needs review" status.

Same, this time to CF 2017-03.
-- 
Michael




Re: [HACKERS] Push down more full joins in postgres_fdw

2017-01-31 Thread Michael Paquier
On Mon, Jan 30, 2017 at 8:30 PM, Etsuro Fujita
 wrote:
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery, which
> are flags to indicate whether to deparse the input relations as subqueries.
> is_subquery_rel would work well for handling the cases of full joins with
> restrictions on the input relations, but I noticed that that wouldn't work
> well when extending to handle the cases where we deparse the input relations
> as subqueries to evaluate PHVs remotely.
> * Since appendSubqueryAlias in the previous patch is pretty small, I
> included the code into deparseRangeTblRef.

The patch is very fresh, so moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 9:36 AM, Alvaro Herrera  wrote:
>> I propose that we should finish the job by inventing CatalogTupleDelete(),
>> which for the moment would be a trivial wrapper around
>> simple_heap_delete(), maybe just a macro for it.
>>
>> If there's no objections I'll go make that happen in a day or two.
>
> Sounds good.

As you are on it, I have moved the patch to CF 2017-03.
-- 
Michael




Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:56 PM, Etsuro Fujita
 wrote:
> Other changes:
>
> * Also modified CreateLocalJoinPath so that we pass outer/inner paths, not
> outer/inner rels, because it would be more flexible for the FDW to build the
> local-join path from paths it chose.
> * Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

The patch is moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-31 Thread Ashutosh Bapat
>
>> However, ExecHashIncreaseNumBatches() may change the
>> number of buckets; the patch does not seem to account for spaceUsed changes
>> because of that.
>
> That's what this hunk is intended to do:
>
> @@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
> TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
>
> hashtable->nbuckets_optimal);
>
> +   /* account for the increase in space that will be used by buckets */
> +   hashtable->spaceUsed += sizeof(HashJoinTuple) *
> +   (hashtable->nbuckets_optimal - hashtable->nbuckets);
> +   if (hashtable->spaceUsed > hashtable->spacePeak)
> +   hashtable->spacePeak = hashtable->spaceUsed;
> +

Sorry, I missed that hunk. You are right, it's getting accounted for.

>>
>> In ExecHashTableReset(), do we want to update spacePeak while setting
>> spaceUsed.
>
> I figured there was no way that the new spaceUsed value could be
> bigger than spacePeak, because we threw out all chunks and have just
> the bucket array, and we had that number of buckets before, so
> spacePeak must at least have been set to a number >= this number
> either when we expanded to this many buckets, or when we created the
> hashtable in the first place.  Perhaps I should
> Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help; better if you explain this with a comment before the Assert.
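Something along these lines is what I had in mind (just a sketch, taking
the explanation from your previous mail; the wording of the comment is up
to you):

	/*
	 * Only the bucket array remains, and we had at least this many buckets
	 * before, so spacePeak was already raised to cover this much -- either
	 * when we grew to this bucket count or when the hash table was created.
	 */
	Assert(hashtable->spaceUsed <= hashtable->spacePeak);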

>
>> While this patch tracks space usage more accurately, I am afraid we might be
>> overdoing it; a reason why we don't track space usage accurately now. But I
>> think I will leave it to be judged by someone who is more familiar with the
>> code and possibly has historical perspective.
>
> Well it's not doing more work; it doesn't make any practical
> difference whatsoever but it's technically doing less work than
> master, by doing memory accounting only when acquiring a new 32KB
> chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now that happens only after the hash table is built
completely. But that's fine; I am not worried about whether it's less
work or more.

> But if by overdoing it you mean that no one really cares about
> the tiny increase in accuracy so the patch on its own is a bit of a
> waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it better.

> Depending on tuple size, you
> could imagine something like 64 bytes of header and unused space per
> 32KB chunk that we're not accounting for, and that's only 0.2%.  So I
> probably wouldn't propose this refactoring just on accuracy grounds
> alone.
>
> This refactoring is intended to pave the way for shared memory
> accounting in the later patches, which would otherwise generate ugly
> IPC if done for every time a tuple is allocated.  I considered using
> atomic add to count space per tuple, or maintaining per-backend local
> subtotals and periodically summing.  Then I realised that switching to
> per-chunk accounting would fix the IPC problem AND be justifiable on
> theoretical grounds.  When we allocate a new 32KB chunk, we really are
> using 32KB more of your memory.

+1.

Thanks for considering the comments.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Reporting planning time with EXPLAIN

2017-01-31 Thread Michael Paquier
On Wed, Jan 4, 2017 at 7:59 PM, Ashutosh Bapat
 wrote:
> Here are patches for following

Those patches have received no code-level reviews, so moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:53 PM, Michael Paquier
 wrote:
> The latest patch available still applies, one person has added his
> name (John G) in October though there have been no reviews. There have
> been a couple of arguments against this patch, and the thread has had
no activity for the last month, so I would be inclined to mark it as
> returned with feedback. Thoughts?

And done as such. Feel free to update if there is something fresh.
-- 
Michael




Re: [HACKERS] Proposal : Parallel Merge Join

2017-01-31 Thread Michael Paquier
On Thu, Jan 5, 2017 at 12:57 AM, Dilip Kumar  wrote:
> On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila  wrote:
>> Review comments:
>> 1.
>> + bool is_partial);
>> +
>>
>> Seems additional new line is not required.
> Fixed

This thread has a new patch but no new reviews. Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan  wrote:
> On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
>  wrote:
>> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
>>> Attached is V7 of the patch.
>>
>> I am doing some testing.  First, some superficial things from first pass:
>>
>> [Various minor cosmetic issues]
>
> Oops.

As this review is very recent, I have moved the patch to CF 2017-03.
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-31 Thread Michael Paquier
On Tue, Jan 17, 2017 at 9:18 PM, Amit Kapila  wrote:
> On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar  wrote:
>> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
>>> I have reviewed the latest patch and I don't have any more comments.
>>> So if there is no objection from other reviewers I can move it to
>>> "Ready For Committer"?
>>
>> Seeing no objections, I have moved it to Ready For Committer.
>>
>
> Thanks for the review.

Moved to CF 2017-03, the 8th commit fest of this patch.
-- 
Michael




Re: [HACKERS] pg_xlogdump follow into the future

2017-01-31 Thread Michael Paquier
On Fri, Dec 2, 2016 at 1:36 PM, Haribabu Kommi  wrote:
> Patch received feedback at the end of commitfest.
> Closed in 2016-11 commitfest with "moved to next CF".
> Please feel free to update the status once you submit the updated patch.

And the thread died weeks ago as well. I am marking it as
"returned with feedback" as there are no new patches.
-- 
Michael




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:31 PM, Michael Paquier
 wrote:
> On Tue, Nov 29, 2016 at 1:33 PM, Michael Paquier
>  wrote:
>> Patch moved to CF 2017-01.
>
> And nothing has happened since, the patch rotting a bit because of a
> conflict in pg_dump's TAP tests. Attached is a rebased version.

Moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2017-01-31 Thread Michael Paquier
On Tue, Nov 29, 2016 at 1:33 PM, Michael Paquier
 wrote:
> On Fri, Nov 25, 2016 at 4:41 PM, Michael Paquier
>  wrote:
>> On Fri, Nov 25, 2016 at 4:02 PM, Ideriha, Takeshi
>>  wrote:
>>> I applied your fixed patch and new one, and confirmed the applied source 
>>> passed the tests successfully. And I also checked manually the error 
>>> messages were emitted successfully when cr/lf are included in dbname or 
>>> rolename or data_directory.
>>>
>>> AFAICT, this patch satisfies the concept discussed before. So I’ve switched 
>>> this patch “Ready for Committer”.
>>
>> Thanks for the review, Ideriha-san. (See you next week perhaps?)
>
> Patch moved to CF 2017-01.

And nothing has happened since, the patch rotting a bit because of a
conflict in pg_dump's TAP tests. Attached is a rebased version.
-- 
Michael


0001-Forbid-newline-and-carriage-return-characters-in-dat.patch
Description: Binary data


0002-Ensure-clean-up-of-data-directory-even-with-restrict.patch
Description: Binary data



Re: [HACKERS] Measuring replay lag

2017-01-31 Thread Michael Paquier
On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
 wrote:
> Ok.  I see that there is a new compelling reason to move the ring
> buffer to the sender side: then I think lag tracking will work
> automatically for the new logical replication that just landed on
> master.  I will try it that way.  Thanks for the feedback!

Seeing no new patches, marked as returned with feedback. Feel free of
course to refresh the CF entry once you have a new patch!
-- 
Michael




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-01-31 Thread Michael Paquier
On Tue, Dec 13, 2016 at 10:32 AM, Petr Jelinek
 wrote:
> Okay, this version looks good to me, marked as RfC.

The patches still apply, moved to CF 2017-03 with same status: RfC.
-- 
Michael




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-31 Thread Michael Paquier
On Fri, Jan 27, 2017 at 10:26 AM, Haribabu Kommi
 wrote:
> Thanks for the review.
> Let's wait for the committer's opinion.

I have moved this patch to CF 2017-03 to wait for this to happen.
-- 
Michael




Re: [HACKERS] Supporting huge pages on Windows

2017-01-31 Thread Michael Paquier
On Mon, Jan 30, 2017 at 10:46 AM, Tsunakawa, Takayuki
 wrote:
> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
>> I think it is better to document in some way if we decide to go-ahead with
>> the patch.
>
> Sure, I added these sentences.

Patch has been moved to CF 2017-03. There is a recent new patch, and
the discussion is moving on.
-- 
Michael




Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
 wrote:
> Six new syscaches in 665d1fa conflicted, but the 3-way merge
> worked correctly. The new syscaches don't seem to be targets of
> this patch.

To be honest, I am not completely sure what to think about this patch.
Moved to the next CF as there is a new version but no new reviews, in
the hope that the discussion moves on.
-- 
Michael




Re: [HACKERS] pageinspect: Hash index support

2017-01-31 Thread Michael Paquier
On Sun, Jan 29, 2017 at 11:09 AM, Ashutosh Sharma  wrote:
> okay. Thanks. I have done changes on top of this patch.

Moved to CF 2017-03 as there is a new patch, no reviews.
-- 
Michael




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 6:49 PM, Masahiko Sawada  wrote:
> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
> use attached patch.

This patch has been moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
At Wed, 01 Feb 2017 12:05:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170201.120540.183393194.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> At Tue, 31 Jan 2017 12:46:18 +, "Shinoda, Noriyoshi" 
>  wrote in 
> 
> >  I tried a committed Logical Replication environment. I found
> >  that replication between databases of different encodings did
> >  not convert encodings in character type columns. Is this
> >  behavior correct?
> 
> The output plugin used for subscriptions is pgoutput, and it currently
> doesn't consider encoding, but conversion could easily be added if the
> desired encoding is made known.
> 
> The easiest (but somewhat fragile-seeming) way I can think of is:
> 
> - The subscriber connects with a client_encoding specification, and the
>   output plugin pgoutput decides whether it accepts that encoding
>   or not. If it doesn't, pgoutput sends the data without
>   conversion.
> 
> The attached small patch does this and works with the following
> CREATE SUBSCRIPTION.

Oops. It forgot to handle conversion failure. That is fixed in the
attached patch.

> CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
> client_encoding=EUC_JP' PUBLICATION pub1;
> 
> 
> Also we may have explicit negotiation on, for example,
> CREATE_REPLICATION_SLOT.
> 
>  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> 
> Or output plugin may take options.
> 
>  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> 
> 
> Any opinions?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 1f30de6..3c457a2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,6 +443,20 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+		{
+			char *p;
+
+			p = pg_server_to_client(outputstr, strlen(outputstr));
+
+			/* Send the converted string on when succeeded */
+			if (p)
+			{
+				pfree(outputstr);
+				outputstr = p;
+			}
+		}
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
 		appendBinaryStringInfo(out, outputstr, len); /* data */



Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 31 Jan 2017 12:46:18 +, "Shinoda, Noriyoshi" 
 wrote in 

>  I tried a committed Logical Replication environment. I found
>  that replication between databases of different encodings did
>  not convert encodings in character type columns. Is this
>  behavior correct?

The output plugin used for subscriptions is pgoutput, and it currently
doesn't consider encoding, but conversion could easily be added if the
desired encoding is made known.

The easiest (but somewhat fragile-seeming) way I can think of is:

- The subscriber connects with a client_encoding specification, and the
  output plugin pgoutput decides whether it accepts that encoding
  or not. If it doesn't, pgoutput sends the data without
  conversion.

The attached small patch does this and works with the following
CREATE SUBSCRIPTION.

CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
client_encoding=EUC_JP' PUBLICATION pub1;


Also we may have explicit negotiation on, for example,
CREATE_REPLICATION_SLOT.

 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'

Or output plugin may take options.

 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'


Any opinions?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 1f30de6..6a235d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,6 +443,9 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+			outputstr = pg_server_to_client(outputstr, strlen(outputstr));
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
 		appendBinaryStringInfo(out, outputstr, len); /* data */



Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
>
> Can this patch check contrib/bloom?

Only full pages are applied at redo by the generic WAL facility, so
you would just end up comparing a page with itself in generic_redo.

> +/*
> + * For a speculative tuple, the content of t_ctid is conflicting
> + * between the backup page and current page. Hence, we set it
> + * to the current block number and current offset.
> + */
>
> Why does it differ?  Is that a bug?

This has been discussed twice in this thread, once by me, once by Alvaro:
https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com
https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com

> +/*
> + * Leave if no masking functions defined, this is possible in the case
> + * resource managers generating just full page writes, comparing an
> + * image to itself has no meaning in those cases.
> + */
> +if (RmgrTable[rmid].rm_mask == NULL)
> +return;
>
> ...and also...
>
> +/*
> + * This feature is enabled only for the resource managers where
> + * a masking function is defined.
> + */
> +for (i = 0; i <= RM_MAX_ID; i++)
> +{
> +if (RmgrTable[i].rm_mask != NULL)
>
> Why do we assume that the feature is only enabled for RMs that have a
> mask function?  Why not instead assume that if there's no masking
> function, no masking is required?

Not all RMs work on full pages. Tracking in smgr.h the list of RMs
that are no-ops when masking things is easier than having empty
routines declared all over the code base. Don't you think so?

> +void(*rm_mask) (char *page, BlockNumber blkno);
>
> Could the page be passed as type "Page" rather than a "char *" to make
> things more convenient for the masking functions?  If not, could those
> functions at least do something like "Page page = (Page) pagebytes;"
> rather than "Page page_norm = (Page) page;"?

xlog_internal.h is used by frontends as well; this keeps the header
dependencies cleaner. (I did look at using Page when hacking on this
stuff, but the header dependencies were not worth it; I don't recall
all the details though, as this was a couple of months back.)
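
To illustrate the cast idiom mentioned above, a masking routine with the
current signature would look roughly like this (a sketch only, with an
invented name, not code from the patch):

static void
foo_mask(char *page, BlockNumber blkno)
{
	Page		page_norm = (Page) page;

	/* hide fields that may legitimately differ at redo, e.g. the LSN */
	PageSetLSN(page_norm, (XLogRecPtr) 0);
}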
-- 
Michael




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:17 AM, Robert Haas  wrote:
> Well, the question even for that case is whether it really costs
> anything.  My bet is that it is nearly free when it doesn't help, but
> that could be wrong.  My experience running pgbench tests is that
> prewarming all of pgbench_accounts on a scale factor that fits in
> shared_buffers using "dd" took just a few seconds, but when accessing
> the blocks in random order the cache took many minutes to heat up.

And benchmarks like dbt-1 have a pre-warming period added in the test
itself, where the user can specify a number of seconds over which to
linearly increase the load from 0% to 100%, just to fill the OS and
PG caches... This feature would be helpful.

> Now, I assume that this patch sorts the I/O (although I haven't
> checked that) and therefore I expect that the prewarm completes really
> fast.  If that's not the case, then that's bad.  But if it is the
> case, then it's not really hurting you even if the workload changes
> completely.

Having that working fast would be really nice.
-- 
Michael




Re: [HACKERS] logical decoding of two-phase transactions

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 6:22 PM, Craig Ringer  wrote:
> That's where you've misunderstood - it isn't committed yet. The point or
> this change is to allow us to do logical decoding at the PREPARE TRANSACTION
> point. The xact is not yet committed or rolled back.

Yes, I got that. I was looking for a why or an actual use-case.

> Stas wants this for a conflict-free logical semi-synchronous replication
> multi master solution.

This sentence is hard to decrypt, less so without "multi master", as
the concept applies basically to only one master node.

> At PREPARE TRANSACTION time we replay the xact to
> other nodes, each of which applies it and PREPARE TRANSACTION, then replies
> to confirm it has successfully prepared the xact. When all nodes confirm the
> xact is prepared it is safe for the origin node to COMMIT PREPARED. The
> other nodes then see that the first node has committed and they commit too.

OK, this is the argument I was looking for. So in your scheme the
origin node, the one generating the changes, is itself in charge of
deciding whether the 2PC should go ahead or not. There are two channels
between the origin node and the replicas replaying the logical changes:
one is for the logical decoder with a receiver, and the second is used
to communicate the WAL apply status. I had been thinking of something
like postgres_fdw doing this job with a transaction that writes across
several nodes, which is why I got confused about this feature. There,
everything goes through one channel, so the failure handling is
simpler.

> Alternately if any node replies "could not replay xact" or "could not
> prepare xact" the origin node knows to ROLLBACK PREPARED. All the other
> nodes see that and rollback too.

The origin node could just issue the ROLLBACK or COMMIT and the
logical replicas would just apply this change.

> To really make it rock solid you also have to send the old and new values of
> a row, or have row versions, or send old row hashes. Something I also want
> to have, but we can mostly get that already with REPLICA IDENTITY FULL.

On a primary key (or a unique index), the default replica identity is
enough I think.

> It is of interest to me because schema changes in MM logical replication are
> more challenging awkward and restrictive without it. Optimistic conflict
> resolution doesn't work well for schema changes and once the conflicting
> schema changes are committed on different nodes there is no going back. So
> you need your async system to have a global locking model for schema changes
> to stop conflicts arising. Or expect the user not to do anything silly /
> misunderstand anything and know all the relevant system limitations and
> requirements... which we all know works just great in practice. You also
> need a way to ensure that schema changes don't render
> committed-but-not-yet-replayed row changes from other peers nonsensical. The
> safest way is a barrier where all row changes committed on any node before
> committing the schema change on the origin node must be fully replayed on
> every other node, making an async MM system temporarily sync single master
> (and requiring all nodes to be up and reachable). Otherwise you need a way
> to figure out how to conflict-resolve incoming rows with missing columns /
> added columns / changed types / renamed tables  etc which is no fun and
> nearly impossible in the general case.

That's one vision of things; FDW-like approaches would be a second,
but those are not able to pass down utility statements natively,
though that could be done with the utility hook.

> I think the purpose of having the GID available to the decoding output
> plugin at PREPARE TRANSACTION time is that it can co-operate with a global
> transaction manager that way. Each node can tell the GTM "I'm ready to
> commit [X]". It is IMO not crucial since you can otherwise use a (node-id,
> xid) tuple, but it'd be nice for coordinating with external systems,
> simplifying inter node chatter, integrating logical deocding into bigger
> systems with external transaction coordinators/arbitrators etc. It seems
> pretty silly _not_ to have it really.

Well, Postgres-XC/XL save the 2PC GID for this purpose in the GTM;
this way the COMMIT/ABORT PREPARED can be issued from any node, and
there is centralized conflict resolution, though the latter comes at a
huge cost and is a major bottleneck for scaling performance.

> Personally I don't think lack of access to the GID justifies blocking 2PC
> logical decoding. It can be added separately. But it'd be nice to have
> especially if it's cheap.

Reading this thread, I think it should be added.
-- 
Michael




Re: [HACKERS] Refactoring of replication commands using printsimple

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 11:59 PM, Robert Haas  wrote:
> Sorry, I have a little more nitpicking.

Thanks for the input.

> How about having
> printsimple() use pq_sendcountedtext() instead of pq_sendint()
> followed by pq_sendbytes(), as it does for TEXTOID?
>
> Other than that, this looks fine to me now.

pq_sendcountedtext() does some encoding conversion, which is why I
haven't used it, as we deal only with integers in this patch... Now,
if you wish to switch to that, I really have no argument against it.
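
For reference, the difference would be roughly the following (a sketch
with made-up variable names, not the actual patch):

/* current approach for an integer column: length word + raw bytes */
pq_sendint(&buf, len, 4);			/* column length */
pq_sendbytes(&buf, str, len);		/* column value, no conversion */

/* suggested approach, as already done for TEXTOID columns */
pq_sendcountedtext(&buf, str, len, false);	/* length + value, converted */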
-- 
Michael


refactor-repl-cmd-output-v4.patch
Description: Binary data



Re: [HACKERS] sequence data type

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:11 AM, Peter Eisentraut
 wrote:
> And here is a rebased patch for the original feature.  I think this
> addresses all raised concerns and suggestions now.

Thanks for the new version. That looks good to me after an extra lookup.
-- 
Michael




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Tom Lane wrote:

> BTW, the reason I think it's good cleanup is that it's something that my
> colleagues at Salesforce also had to do as part of putting PG on top of a
> different storage engine that had different ideas about index handling.
> Essentially it's providing a bit of abstraction as to whether catalog
> storage is exactly heaps or not (a topic I've noticed Robert is starting
> to take some interest in, as well).

Yeah, I remembered that too.  Of course, we'd need to change the whole
idea of mapping tuples to C structs too, but this seemed a nice step
forward.  (I renamed Pavan's proposed routine precisely to avoid the
word "Heap" in it.)

> However, the patch misses an
> important part of such an abstraction layer by not also converting
> catalog-related simple_heap_delete() calls into some sort of
> CatalogTupleDelete() operation.  It is certainly a peculiarity of
> PG heaps that deletions don't require any immediate index work --- most
> other storage engines would need that.

> I propose that we should finish the job by inventing CatalogTupleDelete(),
> which for the moment would be a trivial wrapper around
> simple_heap_delete(), maybe just a macro for it.
> 
> If there's no objections I'll go make that happen in a day or two.

Sounds good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-31 Thread Thomas Munro
On Wed, Feb 1, 2017 at 2:10 AM, Ashutosh Bapat
 wrote:
>>
>> 0003-hj-refactor-memory-accounting-v4.patch:
>> [...]
>>
> I looked at this patch. I agree that it accounts the memory usage more
> accurately. Here are few comments.

Thanks for the review!

> spaceUsed is defined with comment
> Size		spaceUsed;		/* memory space currently used by tuples */
>
> In ExecHashTableCreate(), although the space is allocated for buckets, no
> tuples are yet inserted, so no space is used by the tuples, so going strictly
> by the comment, spaceUsed should be 0 in that function. But I think the patch
> is accounting the spaceUsed more accurately. Without this patch, the actual
> allocation might cross spaceAllowed without being noticed. With this patch
> that's not possible. Probably we should change the comment to say memory space
> currently allocated.

Right, that comment is out of date.  It is now the space used by the
bucket array and the tuples.  I will fix that in the next version.
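
Something along the lines of (sketch):

	Size		spaceUsed;		/* memory space currently used by buckets and tuples */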

> However, ExecHashIncreaseNumBatches() may change the
> number of buckets; the patch does not seem to account for spaceUsed changes
> because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,

hashtable->nbuckets_optimal);

+   /* account for the increase in space that will be used by buckets */
+   hashtable->spaceUsed += sizeof(HashJoinTuple) *
+   (hashtable->nbuckets_optimal - hashtable->nbuckets);
+   if (hashtable->spaceUsed > hashtable->spacePeak)
+   hashtable->spacePeak = hashtable->spaceUsed;
+
hashtable->nbuckets = hashtable->nbuckets_optimal;
hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;

It knows that spaceUsed already includes the old bucket array
(nbuckets), so it figures out how much bigger the new bucket array
will be (nbuckets_optimal - nbuckets) and adds that.
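
For instance, on a 64-bit build (8-byte HashJoinTuple pointers), growing
from 1024 to 4096 buckets would add (4096 - 1024) * 8 = 24576 bytes to
spaceUsed.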

> Without this patch ExecHashTableInsert() used to account for the space used by
> a single tuple inserted. The patch moves this calculation in dense_alloc() and
> accounts for out-of-bound allocation for larger tuples. That's good.
>
> The change in ExecChooseHashTableSize() too looks fine.
>
> In ExecHashTableReset(), do we want to update spacePeak while setting
> spaceUsed.

I figured there was no way that the new spaceUsed value could be
bigger than spacePeak, because we threw out all chunks and have just
the bucket array, and we had that number of buckets before, so
spacePeak must at least have been set to a number >= this number
either when we expanded to this many buckets, or when we created the
hashtable in the first place.  Perhaps I should
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

> While this patch tracks space usage more accurately, I am afraid we might be
> overdoing it; a reason why we don't track space usage accurately now. But I
> think I will leave it to be judged by someone who is more familiar with the
> code and possibly has historical perspective.

Well it's not doing more work; it doesn't make any practical
difference whatsoever but it's technically doing less work than
master, by doing memory accounting only when acquiring a new 32KB
chunk.  But if by overdoing it you mean that no one really cares about
the tiny increase in accuracy so the patch on its own is a bit of a
waste of time, you're probably right.  Depending on tuple size, you
could imagine something like 64 bytes of header and unused space per
32KB chunk that we're not accounting for, and that's only 0.2%.  So I
probably wouldn't propose this refactoring just on accuracy grounds
alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly
IPC if done for every time a tuple is allocated.  I considered using
atomic add to count space per tuple, or maintaining per-backend local
subtotals and periodically summing.  Then I realised that switching to
per-chunk accounting would fix the IPC problem AND be justifiable on
theoretical grounds.  When we allocate a new 32KB chunk, we really are
using 32KB more of your memory.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Andres Freund
On 2017-01-31 17:21:28 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
> > fairly widely used in extensions - it seems like a pretty harsh change
> > to not leave some backward compatibility layer in place.
> 
> If an extension is doing that, it is probably constructing tuples to put
> into the catalog, which means it'd be equally (and much more quietly)
> broken by any change to the catalog's schema.  We've never considered
> such an argument as a reason not to change catalog schemas, though.

I know of several extensions that use CatalogUpdateIndexes() to update
their own tables. Citus included (It's trivial to change on our side, so
that's not a reason to do or not do something).  There really is no
convenient API to do so without it.

> (I'm a little more concerned by Alvaro's apparent position that WARM
> is a done deal; I didn't think so.  This particular change seems like
> good cleanup anyhow, however.)

Yea, I don't think we're even close to that either.

Andres




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> (I'm a little more concerned by Alvaro's apparent position that WARM
>> is a done deal; I didn't think so.  This particular change seems like
>> good cleanup anyhow, however.)

> Agreed.

BTW, the reason I think it's good cleanup is that it's something that my
colleagues at Salesforce also had to do as part of putting PG on top of a
different storage engine that had different ideas about index handling.
Essentially it's providing a bit of abstraction as to whether catalog
storage is exactly heaps or not (a topic I've noticed Robert is starting
to take some interest in, as well).  However, the patch misses an
important part of such an abstraction layer by not also converting
catalog-related simple_heap_delete() calls into some sort of
CatalogTupleDelete() operation.  It is certainly a peculiarity of
PG heaps that deletions don't require any immediate index work --- most
other storage engines would need that.

I propose that we should finish the job by inventing CatalogTupleDelete(),
which for the moment would be a trivial wrapper around
simple_heap_delete(), maybe just a macro for it.
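
Roughly this, for illustration (a sketch only; it could equally be a
macro):

void
CatalogTupleDelete(Relation heapRel, ItemPointer tid)
{
	simple_heap_delete(heapRel, tid);
}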

If there's no objections I'll go make that happen in a day or two.

regards, tom lane




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund  writes:
> > Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
> > fairly widely used in extensions - it seems like a pretty harsh change
> > to not leave some backward compatibility layer in place.
> 
> If an extension is doing that, it is probably constructing tuples to put
> into the catalog, which means it'd be equally (and much more quietly)
> broken by any change to the catalog's schema.  We've never considered
> such an argument as a reason not to change catalog schemas, though.
> 
> In short, I've got mighty little sympathy for that argument.

+1

> (I'm a little more concerned by Alvaro's apparent position that WARM
> is a done deal; I didn't think so.  This particular change seems like
> good cleanup anyhow, however.)

Agreed.

Thanks!

Stephen




Re: [HACKERS] Should `pg_upgrade --check` check relation filenodes are present?

2017-01-31 Thread Tom Lane
Craig de Stigter  writes:
> We attempted to pg_upgrade a database on a replication slave, and got the
> error:

> error while creating link for relation "."
> ("/var/lib/postgresql-ext/PG_9.2_201204301/19171/141610397" to
> "/var/lib/postgresql-ext/PG_9.5_201510051/16401/9911696"): No such file or
> directory
> The missing table turned out to be an unlogged table, and the data file for
> it was not present on the slave machine. That's reasonable. In our case we
> are able to start over from a snapshot and drop all the unlogged tables
> before trying again.

> However, this problem was not caught by the `--check` command. I'm looking
> at the source code and it appears that pg_upgrade does not attempt to
> verify relation filenodes actually exist before proceeding, whether using
> --check or not.

> Should it? I assume the reasoning is because it would take a long time and
> perhaps the benefit of doing so would be minimal?

This failure would occur before we'd done anything irretrievable to the
source DB, so I'm not all that concerned.  You could have just re-initdb'd
the target directory and started over (after dropping the unlogged tables
of course).

regards, tom lane




Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Robert Haas
On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas  wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila  wrote:
>> In spite of being careful, I missed reorganizing the functions in
>> genam.h which I have done in attached patch.
>
> Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.

Reviewing parallel_index_scan_v6.patch:

I think it might be better to swap the return value and the out
parameter for _bt_parallel_seize; that is, return a bool, and have
callers ignore the value of the out parameter (e.g. *pageno).
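
In other words, roughly this signature (a sketch of the suggestion, not
the patch's current code):

	extern bool _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno);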

I think _bt_parallel_advance_scan should be renamed something that
includes the word "keys", like _bt_parallel_advance_array_keys.

The hunk in indexam.c looks like a leftover that can be omitted.

+/*
+ * Below flags are used to indicate the state of parallel scan.

They aren't really flags any more; they're members of an enum.  I
think you could just leave this sentence out entirely and start right
in with descriptions of the individual values.  But maybe all of those
descriptions should end in a period (instead of one ending in a period
but not the other three) since they read like complete sentences.

+ * btinitparallelscan - Initializing BTParallelScanDesc for parallel btree scan

Initializing -> initialize

+ *  btparallelrescan() -- reset parallel scan

Previous two prototypes have one dash, this one has two.  Make it
consistent, please.

+ * Ideally, we don't need to acquire spinlock here, but being
+ * consistent with heap_rescan seems to be a good idea.

How about: In theory, we don't need to acquire the spinlock here,
because there shouldn't be any other workers running at this point,
but we do so for consistency.

+ * _bt_parallel_seize() -- returns the next block to be scanned for forward
+ *  scans and latest block scanned for backward scans.

I think the description should be more like "Begin the process of
advancing the scan to a new page.  Other scans must wait until we call
bt_parallel_release() or bt_parallel_done()."  And likewise
_bt_parallel_release() should say something like "Complete the process
of advancing the scan to a new page.  We now have the new value for
btps_scanPage; some other backend can now begin advancing the scan."
And _bt_parallel_done should say something like "Mark the parallel
scan as complete."

I am a bit mystified about how this manages to work with array keys.
_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
is already BTPARALLEL_DONE.  It seems like one of those two things has
to be wrong.

_bt_readpage's header comment should be updated to note that, in the
case of a parallel scan, _bt_parallel_seize should have been called
prior to entering this function, and _bt_parallel_release will be
called prior to return (or this could alternatively be phrased in
terms of btps_pageStatus on entry/exit).

_bt_readnextpage isn't very clear about the meaning of its blkno
argument.  It looks like it always has to be valid when scanning
forward, but only in the parallel case when scanning backward?  That
is a little odd.  Another, possibly related thing that is odd is that
when _bt_steppage() finds no tuples and decides to advance to a new
page again, there's a very short comment in the forward case and a
very long comment in the backward case:

/* nope, keep going */
vs.
/*
 * For parallel scans, get the last page scanned as it is quite
 * possible that by the time we try to fetch previous page, other
 * worker has also decided to scan that previous page.  We could
 * avoid that by doing _bt_parallel_release once we have read the
 * current page, but it is bad to make other workers wait till we
 * read the page.
 */

Now it seems to me that these cases are symmetric and the issues are
identical.  It's basically that, while we were judging whether the
current page has useful contents, some other process could have
advanced the scan (since _bt_readpage un-seized it).

-/* check for deleted page */
 page = BufferGetPage(so->currPos.buf);
 TestForOldSnapshot(scan->xs_snapshot, rel, page);
 opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+/* check for deleted page */

This is an independent change; committed.

What kind of testing has been done to ensure that this doesn't have
concurrency bugs?  What's the highest degree of parallelism that's
been tested?  Did that test include array keys?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Tom Lane
Andres Freund  writes:
> Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
> fairly widely used in extensions - it seems like a pretty harsh change
> to not leave some backward compatibility layer in place.

If an extension is doing that, it is probably constructing tuples to put
into the catalog, which means it'd be equally (and much more quietly)
broken by any change to the catalog's schema.  We've never considered
such an argument as a reason not to change catalog schemas, though.

In short, I've got mighty little sympathy for that argument.

(I'm a little more concerned by Alvaro's apparent position that WARM
is a done deal; I didn't think so.  This particular change seems like
good cleanup anyhow, however.)

regards, tom lane




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Andres Freund
On 2017-01-31 19:10:05 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2017-01-31 14:10:01 -0300, Alvaro Herrera wrote:
> 
> > > Hmm, I was thinking we would get rid of CatalogUpdateIndexes altogether.
> > > Two of the callers are in the new routines (which I propose to rename to
> > > CatalogTupleInsert and CatalogTupleUpdate); the only remaining one is in
> > > InsertPgAttributeTuple.  I propose that we inline the three lines into
> > > all those places and just remove CatalogUpdateIndexes.  Half the out-of-
> > > core places that are using this function will be broken as soon as WARM
> > > lands anyway.  I see no reason to keep it.  (I have already modified the
> > > patch this way -- no need to resend).
> > > 
> > > Unless there are objections I will push this later this afternoon.
> > 
> > Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
> > fairly widely used in extensions - it seems like a pretty harsh change
> > to not leave some backward compatibility layer in place.
> 
> Yeah, I can put it back if there's pushback about the removal, but I
> think it's going to break due to WARM anyway.

I'm a bit doubtful (but not extremely so) that that's ok.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Tom Lane
BTW ... while I've been fooling with this issue, I've gotten a bit
annoyed at the fact that "\set" prints the variables in, essentially,
creation order.  That makes the list ugly and hard to find things in,
and it exposes some psql implementation details to users.  I propose
the attached simple patch to keep the list in name order instead.

(This is on top of my previous patch, but it'd be pretty trivial
to modify to apply against HEAD.)

regards, tom lane

diff --git a/src/bin/psql/variables.c b/src/bin/psql/variables.c
index b9b8fcb..9ca1000 100644
*** a/src/bin/psql/variables.c
--- b/src/bin/psql/variables.c
*** valid_variable_name(const char *name)
*** 43,48 
--- 43,51 
  /*
   * A "variable space" is represented by an otherwise-unused struct _variable
   * that serves as list header.
+  *
+  * The list entries are kept in name order (according to strcmp).  This
+  * is mainly to make the results of PrintVariables() more pleasing.
   */
  VariableSpace
  CreateVariableSpace(void)
*** GetVariable(VariableSpace space, const c
*** 74,84 
  
  	for (current = space->next; current; current = current->next)
  	{
! 		if (strcmp(current->name, name) == 0)
  		{
  			/* this is correct answer when value is NULL, too */
  			return current->value;
  		}
  	}
  
  	return NULL;
--- 77,91 
  
  	for (current = space->next; current; current = current->next)
  	{
! 		int			cmp = strcmp(current->name, name);
! 
! 		if (cmp == 0)
  		{
  			/* this is correct answer when value is NULL, too */
  			return current->value;
  		}
+ 		if (cmp > 0)
+ 			break;/* it's not there */
  	}
  
  	return NULL;
*** SetVariable(VariableSpace space, const c
*** 247,253 
  		 current;
  		 previous = current, current = current->next)
  	{
! 		if (strcmp(current->name, name) == 0)
  		{
  			/*
  			 * Found entry, so update, unless assign hook returns false.
--- 254,262 
  		 current;
  		 previous = current, current = current->next)
  	{
! 		int			cmp = strcmp(current->name, name);
! 
! 		if (cmp == 0)
  		{
  			/*
  			 * Found entry, so update, unless assign hook returns false.
*** SetVariable(VariableSpace space, const c
*** 293,298 
--- 302,309 
  
  			return confirmed;
  		}
+ 		if (cmp > 0)
+ 			break;/* it's not there */
  	}
  
  	/* not present, make new entry ... unless we were asked to delete */
*** SetVariable(VariableSpace space, const c
*** 303,309 
  		current->value = pg_strdup(value);
  		current->substitute_hook = NULL;
  		current->assign_hook = NULL;
! 		current->next = NULL;
  		previous->next = current;
  	}
  	return true;
--- 314,320 
  		current->value = pg_strdup(value);
  		current->substitute_hook = NULL;
  		current->assign_hook = NULL;
! 		current->next = previous->next;
  		previous->next = current;
  	}
  	return true;
*** SetVariableHooks(VariableSpace space, co
*** 343,349 
  		 current;
  		 previous = current, current = current->next)
  	{
! 		if (strcmp(current->name, name) == 0)
  		{
  			/* found entry, so update */
  			current->substitute_hook = shook;
--- 354,362 
  		 current;
  		 previous = current, current = current->next)
  	{
! 		int			cmp = strcmp(current->name, name);
! 
! 		if (cmp == 0)
  		{
  			/* found entry, so update */
  			current->substitute_hook = shook;
*** SetVariableHooks(VariableSpace space, co
*** 354,359 
--- 367,374 
  (void) (*ahook) (current->value);
  			return;
  		}
+ 		if (cmp > 0)
+ 			break;/* it's not there */
  	}
  
  	/* not present, make new entry */
*** SetVariableHooks(VariableSpace space, co
*** 362,368 
  	current->value = NULL;
  	current->substitute_hook = shook;
  	current->assign_hook = ahook;
! 	current->next = NULL;
  	previous->next = current;
  	if (shook)
  		current->value = (*shook) (current->value);
--- 377,383 
  	current->value = NULL;
  	current->substitute_hook = shook;
  	current->assign_hook = ahook;
! 	current->next = previous->next;
  	previous->next = current;
  	if (shook)
  		current->value = (*shook) (current->value);



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-01-31 14:10:01 -0300, Alvaro Herrera wrote:

> > Hmm, I was thinking we would get rid of CatalogUpdateIndexes altogether.
> > Two of the callers are in the new routines (which I propose to rename to
> > CatalogTupleInsert and CatalogTupleUpdate); the only remaining one is in
> > InsertPgAttributeTuple.  I propose that we inline the three lines into
> > all those places and just remove CatalogUpdateIndexes.  Half the out-of-
> > core places that are using this function will be broken as soon as WARM
> > lands anyway.  I see no reason to keep it.  (I have already modified the
> > patch this way -- no need to resend).
> > 
> > Unless there are objections I will push this later this afternoon.
> 
> Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
> fairly widely used in extensions - it seems like a pretty harsh change
> to not leave some backward compatibility layer in place.

Yeah, I can put it back if there's pushback about the removal, but I
think it's going to break due to WARM anyway.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-01-31 Thread Jim Nasby

On 11/29/16 9:58 AM, Jeff Janes wrote:


Considering a single SSD can do 70% of that limit, I would say yes.


Next question becomes... should there even be an upper limit?


Where the contortions needed to prevent calculation overflow become
annoying?

I'm not a big fan of nannyism in general, but the limits on this
parameter seem particularly pointless.  You can't write out more buffers
than exist in the dirty state, nor more than implied
by bgwriter_lru_multiplier.  So what is really the worse that can happen
if you make it too high?


Attached is a patch that ups the limit to INT_MAX / 2, which is the same 
as shared_buffers.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
commit 37e42579c5bdf8868109b157827bcac5ed83af7e
Author: Jim Nasby 
Date:   Tue Jan 31 15:55:15 2017 -0600

Increase upper limit for bgwriter_lru_maxpages

The old maximum limited the bgwriter to 780MB/s with 8k pages and a
bgwriter_delay of 10ms. Modern hardware can easily exceed that.

Current code does not depend on any limits to this setting (at most it
will make one complete pass through shared buffers per round), but
setting it higher than the maximum for shared_buffers is surely an
error, so treat it as such.

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 74ca4e7432..e9642fe609 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1843,7 +1843,7 @@ static struct config_int ConfigureNamesInt[] =
GUC_UNIT_BLOCKS
},
&NBuffers,
-   1024, 16, INT_MAX / 2,
+   1024, 16, INT_MAX / 2, /* See also bgwriter_lru_maxpages. */
NULL, NULL, NULL
},
 
@@ -2419,7 +2419,7 @@ static struct config_int ConfigureNamesInt[] =
NULL
},
&bgwriter_lru_maxpages,
-   100, 0, 1000,
+   100, 0, INT_MAX / 2, /* Same upper limit as shared_buffers */
NULL, NULL, NULL
},
 



[HACKERS] Should `pg_upgrade --check` check relation filenodes are present?

2017-01-31 Thread Craig de Stigter
Hi list

We attempted to pg_upgrade a database on a replication slave, and got the
error:

    error while creating link for relation "."
    ("/var/lib/postgresql-ext/PG_9.2_201204301/19171/141610397" to
    "/var/lib/postgresql-ext/PG_9.5_201510051/16401/9911696"): No such file or
    directory
The missing table turned out to be an unlogged table, and the data file for
it was not present on the slave machine. That's reasonable. In our case we
are able to start over from a snapshot and drop all the unlogged tables
before trying again.

However, this problem was not caught by the `--check` command. I'm looking
at the source code and it appears that pg_upgrade does not attempt to
verify relation filenodes actually exist before proceeding, whether using
--check or not.

Should it? I assume the reasoning is because it would take a long time and
perhaps the benefit of doing so would be minimal?


-- 
Regards,
Craig de Stigter

Developer
Koordinates

+64 21 256 9488 <+64%2021%20256%209488> / koordinates.com / @koordinates



Re: [HACKERS] ICU integration

2017-01-31 Thread Thomas Munro
On Tue, Jan 10, 2017 at 9:45 AM, Peter Geoghegan  wrote:
> * I think it's worth looking into ucol_nextSortKeyPart(), and using
> that as an alternative to ucol_getSortKey(). It doesn't seem any
> harder, and when I tested it it was clearly faster. (I think that
> ucol_nextSortKeyPart() is more or less intended to be used for
> abbreviated keys.)

+1

I assume (but haven't checked) that ucol_nextSortKeyPart accesses only
the start of the string via the UCharIterator passed in, unless you
have the rare reverse-accent-sort feature enabled (maybe used only in
fr_CA, it looks like it is required to scan the whole string looking
for the last accent).  So I assume that uiter_setUTF8 and
ucol_nextSortKeyPart would usually do a small fixed amount of work,
whereas this patch's icu_to_uchar allocates space and converts the
whole variable length string every time.
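
To make that concrete, the kind of thing I have in mind is roughly this
(an untested sketch with invented names, not code from the patch):

#include <string.h>
#include <unicode/ucol.h>
#include <unicode/uiter.h>

/* Fill "dest" with up to "destlen" bytes of sort key for NUL-terminated
 * UTF-8 string "str"; returns the number of bytes written. */
static int32_t
abbrev_key_part(const UCollator *coll, const char *str,
				uint8_t *dest, int32_t destlen)
{
	UCharIterator iter;
	uint32_t	state[2] = {0, 0};
	UErrorCode	status = U_ZERO_ERROR;

	uiter_setUTF8(&iter, str, (int32_t) strlen(str));	/* no up-front copy */
	return ucol_nextSortKeyPart(coll, &iter, state, dest, destlen, &status);
}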

On the other hand, I see that this patch's icu_to_uchar can deal with
encodings other than UTF8.  At first glance, the UCharIterator
interface provides a way to make a UCharIterator that iterates over a
string of UChar or a string of UTF8, but I'm not sure how to make it
do character-by-character transcoding from arbitrary encodings via the
C API.  It seems like that should be possible using ucnv_getNextUChar
as the source of transcoded characters, but I'm not sure how to wire
those two things together (in C++ I think you'd subclass
icu::CharacterIterator).

That's about abbreviation, but I note that you can also compare
strings using iterators with ucol_strcollIter, avoiding the need to
allocate and transcode up front.  I have no idea whether that'd pay
off.
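
For instance (again just a sketch, with invented variable names):

	UCharIterator iter1, iter2;
	UErrorCode status = U_ZERO_ERROR;
	UCollationResult cmp;

	uiter_setUTF8(&iter1, str1, len1);
	uiter_setUTF8(&iter2, str2, len2);
	cmp = ucol_strcollIter(collator, &iter1, &iter2, &status);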

+   A change in collation definitions can lead to corrupt indexes and other
+   problems where the database system relies on stored objects having a
+   certain sort order.  Generally, this should be avoided, but it can happen
+   in legitimate circumstances, such as when
+   using pg_upgrade to upgrade to server binaries linked
+   with a newer version of ICU.  When this happens, all objects depending on
+   the collation should be rebuilt, for example,
+   using REINDEX.  When that is done, the collation version
+   can be refreshed using the command ALTER COLLATION ... REFRESH
+   VERSION.  This will update the system catalog to record the
+   current collator version and will make the warning go away.  Note that this
+   does not actually check whether all affected objects have been rebuilt

I think this is a pretty reasonable first approach to this problem.
It's a simple way to flag up a problem to the DBA, but leaves all the
responsibility for figuring out how to fix it to the DBA.  I think we
should considering going further in later patches (tracking the
version used at last rebuild per index etc as discussed, so that the
condition is cleared only by rebuilding the affected things).

(REPARTITION anyone?)

As far as I know there are two things moving: ICU code and CLDR data.
Here we can see the CLDR versions being pulled into ICU:

http://bugs.icu-project.org/trac/log/icu/trunk/source/data/locales?rev=39273

Clearly when you upgrade your system from (say) Debian 8 to Debian 9
and the ICU major version changes we expect to have to REINDEX, but
does anyone know if such data changes are ever pulled into the minor
version package upgrades you get from regular apt-get update of (say)
a Debian 8 or CentOS 7 or FreeBSD 11 system?  In other words, do we
expect collversion changes to happen basically any time in response to
regular system updates, or only when you're doing a major upgrade of
some kind, as the above-quoted documentation patch implies?

+   collversion = ntohl(*((uint32 *) versioninfo));

UVersionInfo is an array of four uint8_t.  That doesn't sound like
something that needs to be bit-swizzled... is it really?  Even if it
is arranged differently on different architectures, I'm not sure why
you care since we only ever use it to compare for equality on the same
system.  But aside from that, I don't love this cast to uint32.  I
wonder if we should use u_versionToString to respect boundaries a bit
better?

I have another motivation for wanting to model collation versions as
strings: I have been contemplating a version check for system-provided
collations too, piggy-backing on your work here.  Obviously PostgreSQL
can't directly know anything about system collation versions, but I'm
thinking of a GUC system_collation_version_command which you could
optionally set to a script that knows how to inspect the local
operating system.  For example, a package maintainer might be
interested in writing such a script that knows how to ask the package
system for a locale data version.  A brute force approach that works
on many systems could be 'tar c /usr/share/locale/*/LC_COLLATE | md5'.
A string would provide more leeway than a measly int32.  That's
independent of this patch and you might hate the whole idea, but seems
to be the kind of thing you anticipated when you 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Andres Freund
On 2017-01-31 14:10:01 -0300, Alvaro Herrera wrote:
> Pavan Deolasee wrote:
> 
> > Two new APIs added.
> > 
> > - CatalogInsertHeapAndIndex which does a simple_heap_insert followed by
> > catalog updates
> > - CatalogUpdateHeapAndIndex which does a simple_heap_update followed by
> > catalog updates
> > 
> > There are only a handful of callers remaining for simple_heap_insert/update after
> > this patch. They are typically working with already opened indexes and
> > hence I left them unchanged.
> 
> Hmm, I was thinking we would get rid of CatalogUpdateIndexes altogether.
> Two of the callers are in the new routines (which I propose to rename to
> CatalogTupleInsert and CatalogTupleUpdate); the only remaining one is in
> InsertPgAttributeTuple.  I propose that we inline the three lines into
> all those places and just remove CatalogUpdateIndexes.  Half the out-of-
> core places that are using this function will be broken as soon as WARM
> lands anyway.  I see no reason to keep it.  (I have already modified the
> patch this way -- no need to resend).
> 
> Unless there are objections I will push this later this afternoon.

Hm, sorry for missing this earlier.  I think CatalogUpdateIndexes() is
fairly widely used in extensions - it seems like a pretty harsh change
to not leave some backward compatibility layer in place.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Unless there are objections I will push this later this afternoon.

Done.  Let's get on with the show -- please post a rebased WARM.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-31 Thread Pavel Stehule
Hi

2017-01-31 14:57 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2017-01-24 21:38 GMT+01:00 Alvaro Herrera :
>
> > > I think it would be good to have a more complex test case in regress --
> > > let's say there is a table with some simple XML values, then we use
> > > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > > large document, and then XMLTABLE uses that document as input document.
> >
> > I have a real 16K-line XML document, about 6 MB. Probably we would not want
> > to append it to the regress tests.
> >
> > It is really fast - the original customer implementation 20 min, our nested
> > xpath implementation 10 sec, the PL/Python xml reader 5 sec, xmltable 400 ms
>
> That's great numbers, kudos for the hard work here.  That will make for
> a nice headline in pg10 PR materials.  But what I was getting at is that
> I would like to exercise a bit more of the expression handling in
> xmltable execution, to make sure it doesn't handle just string literals.
>

done


>
> > I have a plan to create tests based on pg_proc and CTE - if all works,
> then
> > the query must be empty
> >
> > with x as (select proname, proowner, procost, pronargs,
> > array_to_string(proargnames,',') as proargnames,
> > array_to_string(proargtypes,',') as proargtypes from pg_proc), y as
> (select
> > xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> > proargnames, proargtypes)) as proc from x), z as (select xmltable.* from
> y,
> > lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> > procost float, pronargs int, proargnames text, proargtypes text)) select
> *
> > from z except select * from x;
>
> Nice one :-)
>

please see attached patch

* enhanced regress tests
* clean memory context work

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


xmltable-42.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Tom Lane
I wrote:
> Attached is a draft patch for that.  I chose to make a second hook rather
> than complicate the assign hook API, mainly because it allows more code
> sharing --- all the bool vars can share the same substitute hook, and
> so can the three-way vars as long as "on" and "off" are the appropriate
> substitutes.

> I only touched the behavior for bool vars here, but if people like this
> solution it could be fleshed out to apply to all the built-in variables.

Attached is a finished version that includes hooks for all the variables
for which they were clearly sensible.  (FETCH_COUNT doesn't seem to really
need one, and I didn't do anything with HISTSIZE or IGNOREEOF either.
It might be worth bringing the latter two into the hooks paradigm, but
that seems like fit material for a separate patch.)

I updated the documentation as well.  I think this is committable if
there are not objections.

regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 4e51e90..b9c8fcc 100644
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** EOF
*** 455,462 
any, by an equal sign on the command line. To unset a variable,
leave off the equal sign. To set a variable with an empty value,
use the equal sign but leave off the value. These assignments are
!   done during a very early stage of start-up, so variables reserved
!   for internal purposes might get overwritten later.


  
--- 455,462 
any, by an equal sign on the command line. To unset a variable,
leave off the equal sign. To set a variable with an empty value,
use the equal sign but leave off the value. These assignments are
!   done during command line processing, so variables that reflect
!   connection state will get overwritten later.


  
*** lo_import 152801
*** 2692,2698 
  class="parameter">name to value, or if more than one value
  is given, to the concatenation of all of them. If only one
! argument is given, the variable is set with an empty value. To
  unset a variable, use the \unset command.
  
  
--- 2692,2698 
  class="parameter">name to value, or if more than one value
  is given, to the concatenation of all of them. If only one
! argument is given, the variable is set to an empty-string value. To
  unset a variable, use the \unset command.
  
  
*** lo_import 152801
*** 2709,2717 
  
  
  
! Although you are welcome to set any variable to anything you
! want, psql treats several variables
! as special. They are documented in the section about variables.
  
  
  
--- 2709,2719 
  
  
  
! Certain variables are special, in that they
! control psql's behavior or are
! automatically set to reflect connection state.  These variables are
! documented in , below.
  
  
  
*** testdb= \setenv LESS -imx
*** 2835,2840 
--- 2837,2850 
  Unsets (deletes) the psql variable name.
  
+ 
+ 
+ Most variables that control psql's behavior
+ cannot be unset; instead, an \unset command is interpreted
+ as setting them to their default values.
+ See , below.
+ 
  

  
*** bar
*** 3053,3059 
  
  
  If you call \set without a second argument, the
! variable is set, with an empty string as value. To unset (i.e., delete)
  a variable, use the command \unset.  To show the
  values of all variables, call \set without any argument.
  
--- 3063,3069 
  
  
  If you call \set without a second argument, the
! variable is set to an empty-string value. To unset (i.e., delete)
  a variable, use the command \unset.  To show the
  values of all variables, call \set without any argument.
  
*** bar
*** 3082,3089 
  By convention, all specially treated variables' names
  consist of all upper-case ASCII letters (and possibly digits and
  underscores). To ensure maximum compatibility in the future, avoid
! using such variable names for your own purposes. A list of all specially
! treated variables follows.
 
  
  
--- 3092,3114 
  By convention, all specially treated variables' names
  consist of all upper-case ASCII letters (and possibly digits and
  underscores). To ensure maximum compatibility in the future, avoid
! using such variable names for your own purposes.
!
! 
!
! Variables that control psql's behavior
! generally cannot be unset or set to invalid values.  An \unset
! command is allowed but is interpreted as setting the variable to its
! default value.  A \set command 

Re: [HACKERS] logical decoding of two-phase transactions

2017-01-31 Thread Craig Ringer
On 31 Jan. 2017 22:43, "Konstantin Knizhnik" 
wrote:



On 31.01.2017 09:29, Michael Paquier wrote:

> On Fri, Jan 27, 2017 at 8:52 AM, Craig Ringer 
> wrote:
>
>> Now, if it's simpler to just xlog the gid at COMMIT PREPARED time when
>> wal_level >= logical I don't think that's the end of the world. But
>> since we already have almost everything we need in memory, why not
>> just stash the gid on ReorderBufferTXN?
>>
> I have been through this thread... And to be honest, I have a hard
> time understanding for which purpose the information of a 2PC
> transaction is useful in the case of logical decoding. The prepare and
> commit prepared have been received by a node which is at the root of
> the cluster tree, a node of the cluster at an upper level, or a
> client, being in charge of issuing all the prepare queries, and then
> issuing the commit prepared to finish the transaction across a cluster.
> In short, even if you do logical decoding from the root node, or the
> one at a higher level, you would care just about the fact that it has
> been committed.
>

 in any state. So there are three records in the WAL: PREPARE, PRECOMMIT and
COMMIT_PREPARED, and
recovery can involve either all of them, or PRECOMMIT+COMMIT_PREPARED,
or just COMMIT_PREPARED.


That's your modified Pg though.

This 2pc logical decoding patch proposal is for core and I think it just
confused things to introduce discussion of unrelated changes made by your
product to the codebase.





-- 
Konstantin Knizhnik

Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-01-31 Thread Vitaly Burovoy
Hello,

I've reviewed the patch[1].

Result of testing:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed


The patch introduces a new type macaddr8 for EUI-64 addresses[2]
(assuming the OUI field is 24 bits wide) with interoperability with
EUI-48 (the existing "macaddr" type).
It is mostly a copy-pasted macaddr implementation with the changes
necessary for the increased range.
Consensus on this implementation was reached earlier in this thread.
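
For example, the EUI-48 interoperability is provided through casts, roughly
like this (an illustrative sketch of how the conversion is intended to work;
I have not re-verified the exact output formatting here):

    SELECT '08:00:2b:01:02:03'::macaddr::macaddr8;
    -- 08:00:2b:ff:fe:01:02:03   (EUI-48 promoted to EUI-64 by inserting FF:FE)

    SELECT '08:00:2b:ff:fe:01:02:03'::macaddr8::macaddr;
    -- 08:00:2b:01:02:03         (the reverse cast requires the FF:FE stuffing)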

There are two patch files for convenient reviewing: the base macaddr8
implementation and its support in the btree-gin and btree-gist indexes.

The patch:
* cleanly applies to the current master
(6af8b89adba16f97bee0d3b01256861e10d0e4f1);
* passes tests;
* looks fine, follows the PostgreSQL style guide;
* has documentation changes;
* has tests.

All notes and requirements were taken into account and the patch was
changed accordingly.
I have no suggestions on improving it.

The new status of this patch is: Ready for Committer

P.S.:
1. The doc and error/hint messages should be proof-read by a native speaker.
2. A committer should bump catversion. It is not bumped in the patch
because it is unclear when the patch will be committed.

[1]https://postgr.es/m/CAJrrPGeT8zrGPMcRVk_wRvYD-ETcgUz6WRrc2C=inubmrkr...@mail.gmail.com
[2]http://standards.ieee.org/develop/regauth/tut/eui64.pdf
-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Robert Haas
On Tue, Jan 31, 2017 at 1:06 PM, Emre Hasegeli  wrote:
> This is what he wrote:
>
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases, because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
>
> I re-implemented the "is point on line" operator in my patch so that it
> would, at least, work in the most trivial cases.

Right.  So you and he do not agree on the goal.  He wants it to work
in MORE than the most trivial cases, and you wrote it so that it will
work in AT LEAST the most trivial cases.  Those are not the same.

> I listed the problems I am trying to solve in here:
>
> https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

Those problems boil down to "let's get rid of the tolerance
completely", and there's no consensus on that being a good thing to
do.  In particular, I can't remember anybody but you being
unequivocally in favor of it.

> We couldn't agree on how we should treat floating point numbers.  I
> think we should treat them just like the "float" datatype.

Got it, but if other people don't agree then this is going nowhere.

Personally, I've got my doubts about how sane it is to make anything
work here with the built-in floating point types.  The CPU is going to
do some kind of black magic, which may differ between machines, and
good luck getting consistent semantics that work everywhere.  For
example it seems entirely plausible (with some implementation) that
you might find that (1, 2) is on the line from (0, 0) to (2, 4) but
that (1, 3) is not on the line from (0, 0) to (2, 6) because the slope
1/2 can be represented exactly in base 2 and the slope 1/3 cannot.  If
you try to fix that kind of thing by introducing a tolerance, then you
might get it wrong when the line goes from (0, 0) to (10,
21).  Now, if we represented values using home-grown
arbitrary-precision arithmetic - like numeric does - then it might be
possible to get such cases right, especially if you supported exact
rather than approximate representations of fractional quantities.
Otherwise, I think there are inevitably going to be artifacts, and
tinkering with this is just changing which artifacts we get without
really solving the problem.  But I am notorious for my low opinion of
floating-point arithmetic; maybe somebody who understands it better or
likes it more can come up with something that's more clearly an
improvement.  Regardless, we have to have an agreement in order to
commit anything here, and I don't see one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-31 Thread Tom Lane
Corey Huinker  writes:
> On Tue, Jan 31, 2017 at 1:04 AM, Fabien COELHO  wrote:
>> The ParseVariableBool function has been updated, and the new version is
>> much cleaner, including all fixes that I suggested in your copy, so you can
>> use it in your patch.

> I see there's still a lot of activity in the thread, I can't tell if it's
> directly related to ParseVariableBool() or in the way it is called. Should
> I wait for the dust to settle over there?

I think ParseVariableBool is only likely to change to reject a NULL value
rather than silently interpreting it as FALSE, which is the way it is in
HEAD right now.  That behavior is a leftover hack, really, and moving the
treatment of unset values upstream seems a lot cleaner.  See my draft
patch at
https://www.postgresql.org/message-id/30629.1485881...@sss.pgh.pa.us

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Tom Lane
"Daniel Verite"  writes:
> I notice that in the commited patch, you added the ability
> for DeleteVariable() to reject the deletion if the hook
> disagrees. 

Right.

> But this can't happen in practice because as mentioned just upthread
> the hook called with NULL doesn't know if the variable is getting unset
> or initialized, so rejecting on NULL is not an option.

It would have required the caller to set a value before installing the
hook, which would require some reshuffling of responsibility.  In the
draft patch I sent a little bit ago, this is handled by installing a
"substitution hook" first, and that hook transmogrifies NULL into an
allowed setting.  That gets to the same place in a slightly different
way, but it also covers allowing "\unset FOO", which inserting initial
values wouldn't.

> Attached is a proposed patch to add initial values to
> SetVariableAssignHook() to solve this problem, and an example for
> \unset AUTOCOMMIT being denied by the hook.

I think disallowing \unset is a nonstarter on compatibility grounds.
We should allow \unset but treat it like setting to "off" (or whatever
the default value is).
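
With the substitute-hook approach in my draft patch, that looks roughly like
this for a default-off boolean such as ON_ERROR_STOP (a sketch of the
intended behavior, not actual psql output):

    \unset ON_ERROR_STOP      (substituted to:  \set ON_ERROR_STOP off)
    \set ON_ERROR_STOP        (substituted to:  \set ON_ERROR_STOP on)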

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Emre Hasegeli
> Backing up a bit here, have we lost track of the problem that we're
> trying to solve?  Tom gave his opinion here:
>
> https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us
>
> But I don't see that the latest patch I can find does anything to fix
> that.

This is what he wrote:

> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I re-implemented "is point on line" operator on my patch so that it
would, at least, work on the most trivial cases.  This is not very
nice, but it is predictable.  I have tried to prevent it cascade to
larger problems like object-containment tests failing unexpectedly.  I
have gone through all of the geometric operations and re-implemented
the ones with similar problems, too.  It is no work at all compared to
discussions we have to have :-).

My initial idea was to keep the fuzzy behaviour for some operators
like "is point on line", but the more I get into it the less value I
see in doing so.  The same family operators like "is point on line" is
currently badly broken:

> postgres=# select '(1,0)'::point ## '{1,2,0}'::line;
> ?column?
> --
> (2,2)
> (1 row)

This makes me wonder if anybody is ever using those operators.  In the
end, I don't care about those operators.  They are unlikely to be
supported by indexes.  I can simplify my patch to leave them as they
are.

> Now maybe that's not the problem that Emre is trying to solve,
> but then it is not very clear to me what problem he IS trying to
> solve.  And I think Kyotaro Horiguchi is trying to solve yet another
> problem which is again different.  So IMHO the first task here is to
> agree on a clear statement of what we'd like to fix, and then, given a
> patch, we can judge whether it's fixed.

I listed the problems I am trying to solve in here:

https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

> Maybe I'm being dumb here and it's clear to you guys what the issues
> under discussion are.  If so, apologies for that, but the thread has
> gotten too theoretical for me and I can't figure out what the
> top-level concern is any more.  I believe we all agree these macros
> are bad, but there seems to be no agreement that I can discern on what
> would be better or why.

We couldn't agree on how we should treat floating point numbers.  I
think we should treat them just like the "float" datatype.
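
That is, accept the same representability artifacts that plain float8
comparisons already have (standard behaviour, not something my patch
changes):

    SELECT 0.5::float8 + 0.25::float8 = 0.75::float8 AS exact,    -- true: all values representable in base 2
           0.1::float8 + 0.2::float8  = 0.3::float8  AS inexact;  -- false: none of them are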


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speedup twophase transactions

2017-01-31 Thread Robert Haas
On Mon, Jan 23, 2017 at 7:00 AM, Nikhil Sontakke
 wrote:
> 4) Minor nit-pick on existing code.
>
> (errmsg_plural("%u two-phase state file was written "
>   "for
> long-running prepared transactions",
>   "%u
> two-phase state files were written "
>   "for
> long-running prepared transactions",
>   serialized_xacts,
>   serialized_xacts)
>
> Shouldn’t the singular part of the message above be:
> "%u two-phase state file was written for a long-running prepared transaction"
>
> But, then, English is not my native language, so I might be off here :-)

If there's one file per long-running prepared transaction, which I
think is true, then I agree with you.
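
Something like this, I suppose (just the suggested wording restated as code,
not compiled):

    (errmsg_plural("%u two-phase state file was written "
                   "for a long-running prepared transaction",
                   "%u two-phase state files were written "
                   "for long-running prepared transactions",
                   serialized_xacts,
                   serialized_xacts))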

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-31 Thread Corey Huinker
>
>
> Here is a 4th option: refactor BeginCopyFrom so that, instead of a Relation,
> it takes a TupleDesc, a default expression list, and a relation name. You
> could also make NextCopyFrom() smarter so that it does nothing if no
> expression contexts are given by the caller, which is the case for your
> function here. To be honest, I find it a bit weird to use
> NextCopyFromRawFields when there is already a bunch of sanity checks
> happening in NextCopyFrom(), which you surely want applied even for your
> function.
>
> Looking at the last patch proposed, I find the current patch immature, as
> rsTupDesc basically overlaps with the relation a caller can define in
> BeginCopyFrom(), so I am marking this patch as "returned with feedback".
> --
> Michael
>

I like that suggestion and will move forward on it. I wasn't sure there
would be support for such a refactoring.

As for the copy from stdin case, Andrew Gierth laid out why that's nearly
impossible to unwind in the network protocol (the act of starting the copy
causes the query result to end before any rows were returned), and that
might make STDIN input a dead issue.


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-31 Thread Corey Huinker
On Tue, Jan 31, 2017 at 1:04 AM, Fabien COELHO  wrote:

>
> This is code lifted from variable.c's ParseVariableBool().  When the other
>>> patch for "psql hooks" is committed (the one that detects when the string
>>> wasn't a valid boolean), this code will go away and we'll just use
>>> ParseVariableBool() again.
>>>
>>
> The ParseVariableBool function has been updated, and the new version is
> much cleaner, including all fixes that I suggested in your copy, so you can
> use it in your patch.
>
> --
> Fabien.
>

I see there's still a lot of activity in the thread, I can't tell if it's
directly related to ParseVariableBool() or in the way it is called. Should
I wait for the dust to settle over there?


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Daniel Verite
I wrote:

> This would allow the hook to distinguish between initialization and
> unsetting, which in turn will allow it to deny the \unset in the
> cases when it doesn't make any sense conceptually (like AUTOCOMMIT).

I notice that in the commited patch, you added the ability
for DeleteVariable() to reject the deletion if the hook
disagrees. 
+   {
+   /* Allow deletion only if hook is okay with NULL value */
+   if (!(*current->assign_hook) (NULL))
+   return false;   /* message printed by hook */

But this can't happen in practice because as mentioned just upthread
the hook called with NULL doesn't know if the variable is getting unset
or initialized, so rejecting on NULL is not an option.

Attached is a proposed patch to add initial values to
SetVariableAssignHook() to solve this problem, and an example for
\unset AUTOCOMMIT being denied by the hook.

As a side effect, we see all built-in variables when issuing \set
rather than just a few.

Does it make sense?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 0574b5b..631b3f6 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -166,14 +166,6 @@ main(int argc, char *argv[])
 
SetVariable(pset.vars, "VERSION", PG_VERSION_STR);
 
-   /* Default values for variables */
-   SetVariableBool(pset.vars, "AUTOCOMMIT");
-   SetVariable(pset.vars, "VERBOSITY", "default");
-   SetVariable(pset.vars, "SHOW_CONTEXT", "errors");
-   SetVariable(pset.vars, "PROMPT1", DEFAULT_PROMPT1);
-   SetVariable(pset.vars, "PROMPT2", DEFAULT_PROMPT2);
-   SetVariable(pset.vars, "PROMPT3", DEFAULT_PROMPT3);
-
parse_psql_options(argc, argv, );
 
/*
@@ -785,6 +777,11 @@ showVersion(void)
 static bool
 autocommit_hook(const char *newval)
 {
+   if (newval == NULL)
+   {
+   psql_error("built-in variable %s cannot be unset.\n", 
"AUTOCOMMIT");
+   return false;
+   }
return ParseVariableBool(newval, "AUTOCOMMIT", );
 }
 
@@ -1002,20 +999,20 @@ EstablishVariableSpace(void)
 {
pset.vars = CreateVariableSpace();
 
-   SetVariableAssignHook(pset.vars, "AUTOCOMMIT", autocommit_hook);
-   SetVariableAssignHook(pset.vars, "ON_ERROR_STOP", on_error_stop_hook);
-   SetVariableAssignHook(pset.vars, "QUIET", quiet_hook);
-   SetVariableAssignHook(pset.vars, "SINGLELINE", singleline_hook);
-   SetVariableAssignHook(pset.vars, "SINGLESTEP", singlestep_hook);
-   SetVariableAssignHook(pset.vars, "FETCH_COUNT", fetch_count_hook);
-   SetVariableAssignHook(pset.vars, "ECHO", echo_hook);
-   SetVariableAssignHook(pset.vars, "ECHO_HIDDEN", echo_hidden_hook);
-   SetVariableAssignHook(pset.vars, "ON_ERROR_ROLLBACK", 
on_error_rollback_hook);
-   SetVariableAssignHook(pset.vars, "COMP_KEYWORD_CASE", 
comp_keyword_case_hook);
-   SetVariableAssignHook(pset.vars, "HISTCONTROL", histcontrol_hook);
-   SetVariableAssignHook(pset.vars, "PROMPT1", prompt1_hook);
-   SetVariableAssignHook(pset.vars, "PROMPT2", prompt2_hook);
-   SetVariableAssignHook(pset.vars, "PROMPT3", prompt3_hook);
-   SetVariableAssignHook(pset.vars, "VERBOSITY", verbosity_hook);
-   SetVariableAssignHook(pset.vars, "SHOW_CONTEXT", show_context_hook);
+   SetVariableAssignHook(pset.vars, "AUTOCOMMIT", autocommit_hook, "on");
+   SetVariableAssignHook(pset.vars, "ON_ERROR_STOP", on_error_stop_hook, 
"off");
+   SetVariableAssignHook(pset.vars, "QUIET", quiet_hook, "off");
+   SetVariableAssignHook(pset.vars, "SINGLELINE", singleline_hook, "off");
+   SetVariableAssignHook(pset.vars, "SINGLESTEP", singlestep_hook, "off");
+   SetVariableAssignHook(pset.vars, "FETCH_COUNT", fetch_count_hook, "-1");
+   SetVariableAssignHook(pset.vars, "ECHO", echo_hook, "none");
+   SetVariableAssignHook(pset.vars, "ECHO_HIDDEN", echo_hidden_hook, 
"off");
+   SetVariableAssignHook(pset.vars, "ON_ERROR_ROLLBACK", 
on_error_rollback_hook, "off");
+   SetVariableAssignHook(pset.vars, "COMP_KEYWORD_CASE", 
comp_keyword_case_hook, "preserve-upper");
+   SetVariableAssignHook(pset.vars, "HISTCONTROL", histcontrol_hook, 
"none");
+   SetVariableAssignHook(pset.vars, "PROMPT1", prompt1_hook, 
DEFAULT_PROMPT1);
+   SetVariableAssignHook(pset.vars, "PROMPT2", prompt2_hook, 
DEFAULT_PROMPT2);
+   SetVariableAssignHook(pset.vars, "PROMPT3", prompt3_hook, 
DEFAULT_PROMPT3);
+   SetVariableAssignHook(pset.vars, "VERBOSITY", verbosity_hook, 
"default");
+   SetVariableAssignHook(pset.vars, "SHOW_CONTEXT", show_context_hook, 
"errors");
 }
diff --git a/src/bin/psql/variables.c b/src/bin/psql/variables.c
index 91e4ae8..2e5c3e0 100644
--- a/src/bin/psql/variables.c
+++ 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Pavan Deolasee wrote:

> Two new APIs added.
> 
> - CatalogInsertHeapAndIndex which does a simple_heap_insert followed by
> catalog updates
> - CatalogUpdateHeapAndIndex which does a simple_heap_update followed by
> catalog updates
> 
> There are only a handful of callers remaining for simple_heap_insert/update after
> this patch. They are typically working with already opened indexes and
> hence I left them unchanged.

Hmm, I was thinking we would get rid of CatalogUpdateIndexes altogether.
Two of the callers are in the new routines (which I propose to rename to
CatalogTupleInsert and CatalogTupleUpdate); the only remaining one is in
InsertPgAttributeTuple.  I propose that we inline the three lines into
all those places and just remove CatalogUpdateIndexes.  Half the out-of-
core places that are using this function will be broken as soon as WARM
lands anyway.  I see no reason to keep it.  (I have already modified the
patch this way -- no need to resend).

Unless there are objections I will push this later this afternoon.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> One possible compromise that would address your concern about display
>> is to modify the hook API some more so that variable hooks could actually
>> substitute new values.  Then for example the bool-variable hooks could
>> effectively replace "\set AUTOCOMMIT" by "\set AUTOCOMMIT on" and
>> "\unset AUTOCOMMIT" by "\set AUTOCOMMIT off", ensuring that interrogation
>> of their values always produces sane results.

> Agreed, that looks like a good compromise.

Attached is a draft patch for that.  I chose to make a second hook rather
than complicate the assign hook API, mainly because it allows more code
sharing --- all the bool vars can share the same substitute hook, and
so can the three-way vars as long as "on" and "off" are the appropriate
substitutes.

I only touched the behavior for bool vars here, but if people like this
solution it could be fleshed out to apply to all the built-in variables.

Maybe we should merge SetVariableSubstituteHook and SetVariableAssignHook
into one function?

regards, tom lane

diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 0574b5b..985cfcb 100644
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*** showVersion(void)
*** 777,787 
  
  
  /*
!  * Assign hooks for psql variables.
   *
   * This isn't an amazingly good place for them, but neither is anywhere else.
   */
  
  static bool
  autocommit_hook(const char *newval)
  {
--- 777,804 
  
  
  /*
!  * Substitute hooks and assign hooks for psql variables.
   *
   * This isn't an amazingly good place for them, but neither is anywhere else.
   */
  
+ static char *
+ bool_substitute_hook(char *newval)
+ {
+ 	if (newval == NULL)
+ 	{
+ 		/* "\unset FOO" becomes "\set FOO off" */
+ 		newval = pg_strdup("off");
+ 	}
+ 	else if (newval[0] == '\0')
+ 	{
+ 		/* "\set FOO" becomes "\set FOO on" */
+ 		pg_free(newval);
+ 		newval = pg_strdup("on");
+ 	}
+ 	return newval;
+ }
+ 
  static bool
  autocommit_hook(const char *newval)
  {
*** EstablishVariableSpace(void)
*** 1002,1015 
--- 1019,1039 
  {
  	pset.vars = CreateVariableSpace();
  
+ 	SetVariableSubstituteHook(pset.vars, "AUTOCOMMIT", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "AUTOCOMMIT", autocommit_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ON_ERROR_STOP", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ON_ERROR_STOP", on_error_stop_hook);
+ 	SetVariableSubstituteHook(pset.vars, "QUIET", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "QUIET", quiet_hook);
+ 	SetVariableSubstituteHook(pset.vars, "SINGLELINE", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "SINGLELINE", singleline_hook);
+ 	SetVariableSubstituteHook(pset.vars, "SINGLESTEP", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "SINGLESTEP", singlestep_hook);
  	SetVariableAssignHook(pset.vars, "FETCH_COUNT", fetch_count_hook);
  	SetVariableAssignHook(pset.vars, "ECHO", echo_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ECHO_HIDDEN", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ECHO_HIDDEN", echo_hidden_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ON_ERROR_ROLLBACK", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ON_ERROR_ROLLBACK", on_error_rollback_hook);
  	SetVariableAssignHook(pset.vars, "COMP_KEYWORD_CASE", comp_keyword_case_hook);
  	SetVariableAssignHook(pset.vars, "HISTCONTROL", histcontrol_hook);
diff --git a/src/bin/psql/variables.c b/src/bin/psql/variables.c
index 91e4ae8..61c4ccc 100644
*** a/src/bin/psql/variables.c
--- b/src/bin/psql/variables.c
*** CreateVariableSpace(void)
*** 52,57 
--- 52,58 
  	ptr = pg_malloc(sizeof *ptr);
  	ptr->name = NULL;
  	ptr->value = NULL;
+ 	ptr->substitute_hook = NULL;
  	ptr->assign_hook = NULL;
  	ptr->next = NULL;
  
*** ParseVariableBool(const char *value, con
*** 101,111 
  	size_t		len;
  	bool		valid = true;
  
  	if (value == NULL)
! 	{
! 		*result = false;		/* not set -> assume "off" */
! 		return valid;
! 	}
  
  	len = strlen(value);
  
--- 102,110 
  	size_t		len;
  	bool		valid = true;
  
+ 	/* Treat "unset" as an empty string, which will lead to error below */
  	if (value == NULL)
! 		value = "";
  
  	len = strlen(value);
  
*** ParseVariableNum(const char *value, cons
*** 152,159 
  	char	   *end;
  	long		numval;
  
  	if (value == NULL)
! 		return false;
  	errno = 0;
  	numval = strtol(value, , 0);
  	if (errno == 0 && *end == '\0' && end != value && numval == (int) numval)
--- 151,160 
  	char	   *end;
  	long		numval;
  
+ 	/* Treat "unset" as an empty string, which will lead to error below */
  	if (value == NULL)
! 		value = "";
! 
  	errno = 0;
  	numval = strtol(value, , 0);
  	if (errno == 0 && *end == '\0' && end != value && numval == (int) numval)
*** SetVariable(VariableSpace space, const c
*** 

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-31 Thread Mithun Cy
On Tue, Jan 31, 2017 at 9:47 PM, Robert Haas  wrote:
> Now, I assume that this patch sorts the I/O (although I haven't
> checked that) and therefore I expect that the prewarm completes really
> fast.  If that's not the case, then that's bad.  But if it is the
> case, then it's not really hurting you even if the workload changes
> completely.

-- JFYI Yes in the patch we load the sorted
.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:37 PM, Alvaro Herrera 
wrote:

> Pavan Deolasee wrote:
>
> >
> > Sounds good. Should I submit that as a separate patch on current master?
>
> Yes, please.
>
>
Attached.

Two new APIs added.

- CatalogInsertHeapAndIndex which does a simple_heap_insert followed by
catalog updates
- CatalogUpdateHeapAndIndex which does a simple_heap_update followed by
catalog updates

There are only a handful of callers remaining for simple_heap_insert/update after
this patch. They are typically working with already opened indexes and
hence I left them unchanged.
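
The insert-side helper is essentially just this (a simplified sketch, not
the exact patch text):

    Oid
    CatalogInsertHeapAndIndex(Relation heapRel, HeapTuple tup)
    {
        Oid     oid = simple_heap_insert(heapRel, tup);

        /* ... followed by the catalog index updates ... */
        CatalogUpdateIndexes(heapRel, tup);
        return oid;
    }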

make check-world passes with the patch.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


catalog_update.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Robert Haas
On Thu, Jan 26, 2017 at 5:53 AM, Emre Hasegeli  wrote:
> Though, I know the community is against behaviour-changing GUCs.  I
> will not spend more time on this before I get positive feedback from
> others.

As if on cue, let me say that a behavior-changing GUC sounds like a
terrible idea to me.  It's almost never good when a GUC can cause the
same queries to return answers in different sessions, and even worse,
it seems like the GUC might have the effect of letting us build
indexes that are only valid for the value of the GUC with which they
were built.

Backing up a bit here, have we lost track of the problem that we're
trying to solve?  Tom gave his opinion here:

https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us

But I don't see that the latest patch I can find does anything to fix
that.  Now maybe that's not the problem that Emre is trying to solve,
but then it is not very clear to me what problem he IS trying to
solve.  And I think Kyotaro Horiguchi is trying to solve yet another
problem which is again different.  So IMHO the first task here is to
agree on a clear statement of what we'd like to fix, and then, given a
patch, we can judge whether it's fixed.

Maybe I'm being dumb here and it's clear to you guys what the issues
under discussion are.  If so, apologies for that, but the thread has
gotten too theoretical for me and I can't figure out what the
top-level concern is any more.  I believe we all agree these macros
are bad, but there seems to be no agreement that I can discern on what
would be better or why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-31 Thread Robert Haas
On Tue, Jan 31, 2017 at 1:48 AM, Michael Paquier
 wrote:
> I partially agree with this paragraph, at least there are advantages
> to do so for cases where the data fits in shared buffers. Even for
> data sets fitting in RAM that can be an advantage as the buffers would
> get evicted from Postgres' cache but not the OS. Now for cases where
> there are many load patterns on a given database (I have some here),
> it's hard to make this thing on by default.

Well, the question even for that case is whether it really costs
anything.  My bet is that it is nearly free when it doesn't help, but
that could be wrong.  My experience running pgbench tests is that
prewarming all of pgbench_accounts on a scale factor that fits in
shared_buffers using "dd" took just a few seconds, but when accessing
the blocks in random order the cache took many minutes to heat up.
Now, I assume that this patch sorts the I/O (although I haven't
checked that) and therefore I expect that the prewarm completes really
fast.  If that's not the case, then that's bad.  But if it is the
case, then it's not really hurting you even if the workload changes
completely.
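
For comparison, the "dd" run above is just a crude stand-in for something
like the existing pg_prewarm extension, e.g.:

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('pgbench_accounts');   -- loads the relation into shared_buffers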

Again, I'm not really arguing for enable-by-default, but I think if
this is well-implemented the chances of it actually hurting anything
are very low, so you'll either win or it'll make no difference.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-01-31 Thread Peter Eisentraut
And here is a rebased patch for the original feature.  I think this
addresses all raised concerns and suggestions now.
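
For reference, usage looks like this (a quick sketch; the patch below is
authoritative for the exact syntax):

    CREATE SEQUENCE s AS smallint;            -- min/max default to the smallint range
    ALTER SEQUENCE s AS integer NO MAXVALUE;  -- change the type and re-default the max to the new range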

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From ce2680ef072a9a4dc2cb879a70610d71ad24 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 27 Jan 2017 23:52:53 -0500
Subject: [PATCH v5] Add CREATE SEQUENCE AS  clause

This stores a data type, required to be an integer type, with the
sequence.  The sequence's min and max values default to the range
supported by the type, and they cannot be set to values exceeding that
range.  The internal implementation of the sequence is not affected.

Change the serial types to create sequences of the appropriate type.
This makes sure that the min and max values of the sequence for a serial
column match the range of values supported by the table column.  So the
sequence can no longer overflow the table column.

This also makes monitoring for sequence exhaustion/wraparound easier,
which currently requires various contortions to cross-reference the
sequences with the table columns they are used with.

This commit also effectively reverts the pg_sequence column reordering
in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid
column allows us to fill the hole in the struct and create a more
natural overall column ordering.
---
 doc/src/sgml/catalogs.sgml  |  14 +++-
 doc/src/sgml/information_schema.sgml|   4 +-
 doc/src/sgml/ref/alter_sequence.sgml|  30 ++--
 doc/src/sgml/ref/create_sequence.sgml   |  36 ++
 src/backend/catalog/information_schema.sql  |   4 +-
 src/backend/commands/sequence.c |  89 +---
 src/backend/parser/gram.y   |   6 +-
 src/backend/parser/parse_utilcmd.c  |   2 +-
 src/bin/pg_dump/pg_dump.c   | 103 +++-
 src/bin/pg_dump/t/002_pg_dump.pl|   2 +
 src/include/catalog/catversion.h|   2 +-
 src/include/catalog/pg_proc.h   |   2 +-
 src/include/catalog/pg_sequence.h   |   8 ++-
 src/test/modules/test_pg_dump/t/001_base.pl |   1 +
 src/test/regress/expected/sequence.out  |  49 +
 src/test/regress/sql/sequence.sql   |  13 
 16 files changed, 266 insertions(+), 99 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 086fafc694..f57937a17a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5774,10 +5774,11 @@ pg_sequence Columns
  
 
  
-  seqcycle
-  bool
+  seqtypid
+  oid
+  pg_type.oid
   
-  Whether the sequence cycles
+  Data type of the sequence
  
 
  
@@ -5814,6 +5815,13 @@ pg_sequence Columns
   
   Cache size of the sequence
  
+
+ 
+  seqcycle
+  bool
+  
+  Whether the sequence cycles
+ 
 

   
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index c43e325d06..a3a19ce8ce 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -4653,9 +4653,7 @@ sequences Columns
   data_type
   character_data
   
-   The data type of the sequence.  In
-   PostgreSQL, this is currently always
-   bigint.
+   The data type of the sequence.
   
  
 
diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml
index 3b52e875e3..252a668189 100644
--- a/doc/src/sgml/ref/alter_sequence.sgml
+++ b/doc/src/sgml/ref/alter_sequence.sgml
@@ -23,7 +23,9 @@
 
  
 
-ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ]
+ALTER SEQUENCE [ IF EXISTS ] name
+[ AS data_type ]
+[ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
 [ START [ WITH ] start ]
 [ RESTART [ [ WITH ] restart ] ]
@@ -81,6 +83,26 @@ Parameters
  
 
  
+  data_type
+  
+   
+The optional
+clause AS data_type
+changes the data type of the sequence.  Valid types are
+are smallint, integer,
+and bigint.
+   
+
+   
+Note that changing the data type does not automatically change the
+minimum and maximum values.  You can use the clauses NO
+MINVALUE and NO MAXVALUE to adjust the
+minimum and maximum values to the range of the new data type.
+   
+  
+ 
+
+ 
   increment
   

@@ -102,7 +124,7 @@ Parameters
 class="parameter">minvalue determines
 the minimum value a sequence can generate. If NO
 MINVALUE is specified, the defaults of 1 and
--263 for ascending and descending sequences,
+the minimum value of the data type for ascending and descending sequences,
 respectively, will be used.  If neither option is 

Re: [HACKERS] multivariate statistics (v19)

2017-01-31 Thread Tomas Vondra

On 01/31/2017 07:52 AM, Amit Langote wrote:
> On 2017/01/31 6:57, Tomas Vondra wrote:
>> On 01/30/2017 09:37 PM, Alvaro Herrera wrote:
>>> Looks good to me.  I don't think we need to keep the names very short --
>>> I would propose "standistinct", "stahistogram", "stadependencies".
>>
>> Yeah, I got annoyed by the short names too.
>>
>> This however reminds me that perhaps pg_mv_statistic is not the best name.
>> I know others proposed pg_statistic_ext (and pg_stats_ext), and while I
>> wasn't a big fan initially, I think it's a better name. People generally
>> don't know what 'multivariate' means, while 'extended' is better known
>> (e.g. because Oracle uses it for similar stuff).
>>
>> So I think I'll switch to that name too.
>
> +1 to pg_statistics_ext. Maybe, even pg_statistics_extended, however
> being that verbose may not be warranted.

Yeah, I think pg_statistic_extended / pg_stats_extended seems fine.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Robert Haas
On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
 wrote:
> I've attached the patch with the modified changes. PFA.

Can this patch check contrib/bloom?

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+/*
+ * For GIN_DELETED page, the page is initialized to empty.
+ * Hence mask everything.
+ */
+if (opaque->flags & GIN_DELETED)
+memset(page_norm, MASK_MARKER, BLCKSZ);
+else
+mask_unused_space(page_norm);

If the page is initialized to empty, why do we need to mask
anything/everything?  Anyway, it doesn't seem right to mask the
GinPageOpaque structure itself. Or at least it doesn't seem right to
mask the flags word.

+/*
+ * For gist leaf pages, mask some line pointer bits, particularly
+ * those marked as used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+if (!HeapTupleHeaderXminFrozen(page_htup))
+page_htup->t_infomask |= HEAP_XACT_MASK;
+else
+page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
HEAP_XMAX_INVALID;

Comment doesn't address this logic.  Also, the "else" case shouldn't
exist at all, I think.

+/*
+ * For a speculative tuple, the content of t_ctid is conflicting
+ * between the backup page and current page. Hence, we set it
+ * to the current block number and current offset.
+ */

Why does it differ?  Is that a bug?

+/*
+ * Mask everything on a DELETED page since it will be re-initialized
+ * during replay.
+ */
+if ((maskopaq->btpo_flags & BTP_DELETED) != 0)
+{
+/* Mask Page Content */
+memset(page_norm + SizeOfPageHeaderData, MASK_MARKER,
+   BLCKSZ - SizeOfPageHeaderData);
+
+/* Mask pd_lower and pd_upper */
+memset(&((PageHeader) page_norm)->pd_lower, MASK_MARKER,
+   sizeof(uint16));
+memset(&((PageHeader) page_norm)->pd_upper, MASK_MARKER,
+   sizeof(uint16));

This isn't consistent with the GIN_DELETE case - it is more selective
about what it masks.  Probably that logic should be adapted to look
more like this.

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment (still) doesn't explain why we need to do this.

+/*
+ * During replay of a btree page split, we don't set the BTP_SPLIT_END
+ * flag of the right sibling and initialize th cycle_id to 0 for the same
+ * page.
+ */

A reference to the name of the redo function might be helpful here
(and in some other places), unless it's just ${AMNAME}_redo.  "th" is
a typo for "the".

+appendStringInfoString(buf, " (full page
image, apply)");
+else
+appendStringInfoString(buf, " (full page image)");

How about "(full page image)" and "(full page image, for WAL verification)"?

Similarly in XLogDumpDisplayRecord, I think we should assume that
"FPW" means what it has always meant, and leave that output alone.
Instead, distinguish the WAL-consistency-checking case when it
happens, e.g. "(FPW for consistency check)".

+checkConsistency(XLogReaderState *record)

How about CheckXLogConsistency()?

+ * If needs_backup is true or wal consistency check is enabled for

...or WAL checking is enabled...

+ * If WAL consistency is enabled for the resource manager of

If WAL consistency checking is enabled...

+ * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag

with the BKPIMAGE_APPLY flag

- * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
- * with all-zeroes pages up to the referenced block number.  In
- * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
+ * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag
+ * is not set for the backup block, the relation is extended with all-zeroes
+ * pages up to the referenced block number.

OK, I'm puzzled by this.  Surely we don't want the WAL consistency
checking facility to cause the relation to be extended like this.  And
I don't see why it would, because the WAL consistency checking happens
after the page changes have already been applied.  I wonder if this
hunk is in error and should be dropped.

+PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX);
+phdr->pd_prune_xid = PG_UINT32_MAX;
+phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES;
+phdr->pd_flags |= PD_ALL_VISIBLE;
+#define MASK_MARKER0xFF
(and many others)

Why do we mask by setting bits rather than clearing bits?  My
intuition would have been to zero things we want to mask, rather than
setting them 

Re: [HACKERS] patch: function xmltable

2017-01-31 Thread Pavel Stehule
2017-01-31 14:57 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2017-01-24 21:38 GMT+01:00 Alvaro Herrera :
>
> > > I think it would be good to have a more complex test case in regress --
> > > let's say there is a table with some simple XML values, then we use
> > > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > > large document, and then XMLTABLE uses that document as input document.
> >
> > I have a real 16K-line XML document, about 6 MB. Probably we would not want
> > to append it to the regress tests.
> >
> > It is really fast - the original customer implementation 20 min, our nested
> > xpath implementation 10 sec, the PL/Python xml reader 5 sec, xmltable 400 ms
>
> That's great numbers, kudos for the hard work here.  That will make for
> a nice headline in pg10 PR materials.  But what I was getting at is that
> I would like to exercise a bit more of the expression handling in
> xmltable execution, to make sure it doesn't handle just string literals.
>

I'll try to write some more dynamic examples.


>
> > I have a plan to create tests based on pg_proc and CTE - if all works,
> then
> > the query must be empty
> >
> > with x as (select proname, proowner, procost, pronargs,
> > array_to_string(proargnames,',') as proargnames,
> > array_to_string(proargtypes,',') as proargtypes from pg_proc), y as
> (select
> > xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> > proargnames, proargtypes)) as proc from x), z as (select xmltable.* from
> y,
> > lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> > procost float, pronargs int, proargnames text, proargtypes text)) select
> *
> > from z except select * from x;
>
> Nice one :-)
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

