Re: BUG #15112: Unable to run pg_upgrade with earthdistance extension

2018-04-05 Thread Noah Misch
On Mon, Apr 02, 2018 at 04:24:02PM -0400, Bruce Momjian wrote:
> > I am not sure we can fix this without requiring people to drop and
> > recreate such indexes.  However, I am even at a loss in how to fix the
> > CREATE FUNCTION to reference a cast in the same schema as the function,
> > in this case 'public'.  We can rewrite the cast to not use :: and use a
> > function call with schema qualification. e.g. public.earth(), but how do
> > we know what schema that is in, i.e. what if the extension is loaded
> > into a schema other than public?  

The task is to convert it into a non-relocatable extension that uses
@extschema@, like here:
https://www.postgresql.org/docs/devel/static/extend-extensions.html#EXTEND-EXTENSIONS-EXAMPLE

Keiko Oda, you can hack around this by modifying the problematic extension
function in each database:

CREATE OR REPLACE FUNCTION public.ll_to_earth(float8, float8)
RETURNS public.earth
LANGUAGE SQL
IMMUTABLE STRICT
PARALLEL SAFE
AS 'SELECT 
public.cube(public.cube(public.cube(public.earth()*cos(radians($1))*cos(radians($2))),public.earth()*cos(radians($1))*sin(radians($2))),public.earth()*sin(radians($1)))::public.earth';

There's no need to drop and recreate indexes.  However, if the table is small
and nothing depends on the index, that is an easy workaround that one can
employ right now.  (Drop before upgrade and re-create after upgrade.)

> > FYI, earthdistance is certainly not the only case of this problem.  

True; I've been expecting a report like this for contrib/xml2 especially.



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-05 Thread Craig Ringer
Summary to date:


It's worse than I thought originally, because:

- Most widely deployed kernels have cases where they don't tell you about
losing your writes at all; and
- Information about loss of writes can be masked by closing and re-opening
a file

So the checkpointer cannot trust that a successful fsync() means ... a
successful fsync().

Also, it's been reported to me off-list that anyone on the system calling
sync(2) or the sync shell command will also generally consume the write
error, causing us not to see it when we fsync(). The same is true
for /proc/sys/vm/drop_caches. I have not tested these yet.

There's some level of agreement that we should PANIC on fsync() errors, at
least on Linux, but likely everywhere. But we also now know it's
insufficient to be fully protective.

I previously thought that errors=remount-ro was a sufficient safeguard. It
isn't. There doesn't seem to be anything that is, for ext3, ext4, btrfs or
xfs.

It's not clear to me yet why data_err=abort isn't sufficient in
data=ordered or data=writeback mode on ext3 or ext4; that needs more digging.
(In my test tools that's:
make FSTYPE=ext4 MKFSOPTS="" MOUNTOPTS="errors=remount-ro,
data_err=abort,data=journal"
as of the current version d7fe802ec). AFAICS that's because
data_error=abort only affects data=ordered, not data=journal. If you use
data=ordered, you at least get retries of the same write failing. This post
https://lkml.org/lkml/2008/10/10/80 added the option and has some
explanation, but doesn't explain why it doesn't affect data=journal.

zfs is probably not affected by the issues, per Thomas Munro. I haven't run
my test scripts on it yet because my kernel doesn't have zfs support and
I'm prioritising the multi-process / open-and-close issues.

So far none of the FSes and options I've tried exhibit the behaviour I
actually want, which is to make the fs readonly or inaccessible on I/O
error.

ENOSPC doesn't seem to be a concern during normal operation of major file
systems (ext3, ext4, btrfs, xfs) because they reserve space before
returning from write(). But if a buffered write does manage to fail due to
ENOSPC we'll definitely see the same problems. This makes ENOSPC on NFS a
potentially data corrupting condition since NFS doesn't preallocate space
before returning from write().

I think what we really need is a block-layer fix, where an I/O error flips
the block device into read-only mode, as if blockdev --setro had been used.
Though I'd settle for a kernel panic, frankly. I don't think anybody really
wants this, but I'd rather have either of those than silent data loss.

I'm currently tweaking my test to close and reopen the file between each
write() and fsync(), and to support running on NFS.

I've also just found the device-mapper "flakey" driver, which looks
fantastic for simulating unreliable I/O with intermittent faults. I've been
using the "error" target in a mapping, which lets me remap some of the
device to always error, but "flakey" looks very handy for actual PostgreSQL
testing.

For the sake of Google, these are errors known to be associated with the
problem:

ext4, and ext3 mounted with ext4 driver:

[42084.327345] EXT4-fs warning (device dm-0): ext4_end_bio:323: I/O error
10 writing to inode 12 (offset 0 size 0 starting block 59393)
[42084.327352] Buffer I/O error on device dm-0, logical block 59393

xfs:

[42193.771367] XFS (dm-0): writeback error on sector 118784
[42193.784477] XFS (dm-0): writeback error on sector 118784

jfs: (nil, silence in the kernel logs)

You should also beware of "lost page write" or "lost write" errors.


Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Heikki Linnakangas
I started looking at the "Improve compactify_tuples and 
PageRepairFragmentation" patch, and set up a little performance test of 
WAL replay. I ran pgbench, scale 5, to generate about 1 GB of WAL, and 
timed how long it takes to replay that WAL. To focus purely on CPU 
overhead, I kept the data directory in /dev/shm/.


Profiling that, without any patches applied, I noticed that a lot of 
time was spent in read()s on the postmaster-death pipe, i.e. in 
PostmasterIsAlive(). We call that between *every* WAL record.


As a quick test to see how much that matters, I commented out the 
PostmasterIsAlive() call from HandleStartupProcInterrupts(). On 
unpatched master, replaying that 1 GB of WAL takes about 20 seconds on 
my laptop. Without the PostmasterIsAlive() call, 17 seconds.


That seems like an utter waste of time. I'm almost inclined to call that 
a performance bug. As a straightforward fix, I'd suggest that we call 
HandleStartupProcInterrupts() in the WAL redo loop, not on every record, 
but only e.g. every 32 records. That would make the main redo loop less 
responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. 
There are also calls to HandleStartupProcInterrupts() in the various 
other loops, that wait for new WAL to arrive or recovery delay, so this 
would only affect the case where we're actively replaying records.


- Heikki
>From 596d3804c784b06f2125aa7727b82a265b08ccfb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 5 Apr 2018 10:18:01 +0300
Subject: [PATCH 1/1] Call HandleStartupProcInterrupts() less frequently in WAL
 redo.

HandleStartupProcInterrupts() calls PostmasterIsAlive(), which calls read()
on the postmaster death watch pipe. That's relatively expensive, when we do
it between every WAL record.
---
 src/backend/access/transam/xlog.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..3406c58c9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7072,6 +7072,7 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			uint32		records_replayed = 0;
 
 			InRedo = true;
 
@@ -7105,8 +7106,13 @@ StartupXLOG(void)
 }
 #endif
 
-/* Handle interrupt signals of startup process */
-HandleStartupProcInterrupts();
+/*
+ * Handle interrupt signals of startup process. This includes
+ * a call to PostmasterIsAlive(), which isn't totally free, so
+ * don't do this on every record, to avoid the overhead.
+ */
+if (records_replayed % 32 == 0)
+	HandleStartupProcInterrupts();
 
 /*
  * Pause WAL replay, if requested by a hot-standby session via
@@ -7269,6 +7275,7 @@ StartupXLOG(void)
 
 /* Remember this record as the last-applied one */
 LastRec = ReadRecPtr;
+records_replayed++;
 
 /* Allow read-only connections if we're consistent now */
 CheckRecoveryConsistency();
-- 
2.11.0



Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-04-05 Thread Amit Langote
Fuiita-san,

On 2018/04/05 15:56, Etsuro Fujita wrote:
> (2018/04/05 15:37), Amit Langote wrote:
>> I noticed that the 2nd patch (foreign-routing-fdwapi-5.patch) fails to
>> apply to copy.c:
> 
> I forgot to mention this: the second patch is created on top of the first
> patch (postgres-fdw-refactoring-5.patch) and the patch in [1] as before.

Ah, sorry I hadn't noticed that in your previous email.

Might be a good idea to attach the bug-fix patch here as well, and perhaps
add numbers to the file names like:

0001_postgres-fdw-refactoring-5.patch
0002_BUGFIX-copy-from-check-constraint-fix.patch
0003_foreign-routing-fdwapi-5.patch


Just one minor comment:

I wonder why you decided not to have the CheckValidResultRel() call and
the statement that sets ri_PartitionReadyForRouting inside the newly added
ExecInitRoutingInfo itself.  If ExecInitRoutingInfo does the last
necessary steps for a ResultRelInfo (and hence the partition) to be ready
to be used for routing, why not finish everything there.  So the changes
to ExecPrepareTupleRouting which look like this in the patch:

+if (!partrel->ri_PartitionReadyForRouting)
+{
+CheckValidResultRel(partrel, CMD_INSERT);
+
+/* Set up information needed for routing tuples to the partition */
+ExecInitRoutingInfo(mtstate, estate, proute, partrel, partidx);
+
+partrel->ri_PartitionReadyForRouting = true;
+}

will become:

+if (!partrel->ri_PartitionReadyForRouting)
+ExecInitRoutingInfo(mtstate, estate, proute, partrel, partidx);


As I see no other issues, I will mark this as Ready for Committer.

Thanks,
Amit




Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 07:01, Pavan Deolasee  wrote:

>> +/*
>> + * Given OID of the partition leaf, return the index of the leaf in the
>> + * partition hierarchy.
>> + */
>> +int
>> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
>> +{
>> +   int i;
>> +
>> +   for (i = 0; i < proute->num_partitions; i++)
>> +   {
>> +   if (proute->partition_oids[i] == partoid)
>> +   break;
>> +   }
>> +
>> +   Assert(i < proute->num_partitions);
>> +   return i;
>> +}
>>
>> Shouldn't we at least warn in a comment that this is O(N)? And document
>> that it does weird stuff if the OID isn't found?
>
>
> Yeah, added a comment. Also added an ereport(ERROR) if we do not find the
> partition. There was already an Assert, but maybe ERROR is better.
>
>>
>>
>> Perhaps just introduce a PARTOID syscache?
>>
>
> Probably as a separate patch. Anything more than a handful of partitions is
> already known to be too slow, and I doubt this code will have any material
> impact on that.

There are a few others trying to change that now, so I think we should
consider working on this now.

PARTOID syscache sounds like a good approach.

>> diff --git a/src/backend/executor/nodeMerge.c
>> b/src/backend/executor/nodeMerge.c
>> new file mode 100644
>> index 000..0e0d0795d4d
>> --- /dev/null
>> +++ b/src/backend/executor/nodeMerge.c
>> @@ -0,0 +1,575 @@
>>
>> +/*-
>> + *
>> + * nodeMerge.c
>> + *   routines to handle Merge nodes relating to the MERGE command
>>
>> Isn't this file misnamed and it should be execMerge.c?  The rest of the
>> node*.c files are for nodes that are invoked via execProcnode /
>> ExecProcNode().  This isn't an actual executor node.
>
>
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. Maybe not; just raising that
> concern.)

My review notes suggest a file called execMerge.c. I didn't spot the
filename change.

I think it's important to do that because there is no executor node
called Merge. That is especially confusing because there *is* an
executor node called MergeAppend and we want some cognitive distance
between those things.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgres stucks in deadlock detection

2018-04-05 Thread Konstantin Knizhnik

Hi,
Thanks for your feedback.

On 04.04.2018 21:15, Andres Freund wrote:

Hi,

On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:

Several times we and our customers have suffered from the problem that
Postgres gets stuck in deadlock detection.
One scenario is a YCSB workload with Zipfian distribution, when many clients
try to update the same record and compete for its lock.
Another scenario is a large number of clients performing inserts into the
same table. In this case the source of the problem is the relation extension
lock. In both cases the number of connections is large: several hundred.

Have you ever observed that in the field? This sounds more artificial
than real to me.


Yes, it was a real customer problem. And it is not so difficult to
reproduce: you just need a system with a large number of cores (in our case
72 physical cores) and 1000 clients performing normal inserts into the same
table:


cat test.sql
begin;
insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
end;

pgbench -U postgres test -c 1000 -j 36 -T 3600 -P 1 -f test.sql


With the deadlock detection timeout set to 100 ms it happens immediately.
With the default 1-second deadlock timeout it takes about 12 minutes,
after which the following messages can be found in the log:


2018-03-23 20:25:27.287 MSK [41469] STATEMENT: insert into test select 
i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.287 MSK [41707] LOG: process 41707 acquired 
ExclusiveLock on extension of relation 16387 of database 16384 after 
61294.863 ms
2018-03-23 20:25:27.287 MSK [41707] STATEMENT: insert into test select 
i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.303 MSK [42251] LOG: process 42251 still waiting for 
ExclusiveLock on extension of relation 16387 of database 16384 after 
1121.274 ms
2018-03-23 20:25:27.303 MSK [42251] DETAIL: Process holding the lock: 
41707. Wait queue: 41577, 41880, 41482, 41649, 42275, 42237, 42117, 
41430, 42121, 42091, 41507, 42265, 41738, 42018, 41490, 42048, 42269, 
42108, 41843, 42213, 42134, 42341, 42083, 41763, 41536, 41980, 41793, 
41878, 42065, 42152, 42193, 42022, 42103, 42354, 41849, 42120, 41520, 
41593, 42020, 42123, 42012, 42235, 42242, 41982, 41661, 41734, 42385, 
41756, 41733, 41415, 41874, 42161, 41749, 41431, 42175, 42100, 4, 
41435, 41762, 42352, 41840, 41879, 42348, 41445, 42088, 42187, 41992, 
42327, 42258, 41586, 42034, 41851, 41440, 42192, 41726, 41471, 42185, 
41607, 41844, 42016, 41868, 41397, 41932, 42343, 41545, 41690, 41549, 
41684, 42304, 42105, 41551, 41580, 41728, 42172, 41898, 41560, 41411, 
41657, 41444, 42170, 41481, 42241, 42263, 41884, 42014, 41608, 42289, 
42191, 42067, 42011, 41959, 41681, 41768, 42194, 42090, 41527, 41655, 
41638, 41458, 41552, 41446, 41403, 41666, 42021, 41614, 41542, 41588, 
41937, 42008, 42280, 42071, 41390, 41483...


At this point TPS drops to zero and it is not even possible to log in to
Postgres.

Increasing deadlock timeout to one minute still doesn't help:

2018-04-02 17:16:59 MSK [644699] [6-1] 
db=bp,user=bp,app=[unknown],client=172.29.224.48 LOG: process 644699 
acquired ExclusiveLock on extension of relation 23554 of database 16385 
after 585248.495 ms
2018-04-02 17:16:59 MSK [629624]: [6-1] 
db=bp,user=bp,app=[unknown],client=172.29.224.64 LOG: process 629624 
acquired ExclusiveLock on extension of relation 23554 of database 16385 
after 585132.924 ms
2018-04-02 17:16:59 MSK [646297]: [6-1] 
db=bp,user=bp,app=[unknown],client=172.29.224.57 LOG: process 646297 
acquired ExclusiveLock on extension of relation 23554 of database 16385 
after 584982.438 ms
2018-04-02 17:16:59 MSK [629264]: [6-1] 
db=bp,user=bp,app=[unknown],client=172.29.224.66 LOG: process 629264 
acquired ExclusiveLock on extension of relation 23554 of database 16385 
after 584959.859 ms



And this is the backtrace of the backend that blocks the system:

#0  0x7fde34a73576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x7fde34a73668 in __new_sem_wait_slow.constprop.0 () from 
/lib64/libpthread.so.0
#2  0x00686b02 in PGSemaphoreLock (sema=0x7fcdc7748618) at pg_sema.c:310
#3  0x006ec0d4 in LWLockAcquire (lock=0x7fcdc774ee80, 
mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1233
#4  0x006e99c5 in CheckDeadLock () at proc.c:1671
#5  ProcSleep (locallock=locallock@entry=0x118a158, 
lockMethodTable=lockMethodTable@entry=0x9538a0 ) at 
proc.c:1261
#6  0x006e49ef in WaitOnLock (locallock=locallock@entry=0x118a158, 
owner=owner@entry=0x12490f0) at lock.c:1702
#7  0x006e5c0b in LockAcquireExtended 
(locktag=locktag@entry=0x7ffc5b5f7d10, lockmode=lockmode@entry=7, 
sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000',
reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998
#8  0x006e6221 in LockAcquire (locktag=locktag@entry=0x7ffc5b5f7d10, 
lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=0 '\000', 
dontWait

Re: CALL optional in PL/pgSQL

2018-04-05 Thread Heikki Linnakangas

On 27/03/18 03:00, Andrew Dunstan wrote:

On Fri, Mar 2, 2018 at 2:01 AM, Tom Lane  wrote:

I think this is an actively bad idea.  It introduces an inherent ambiguity
into the grammar; for instance

 PERFORM (2);

now has two valid interpretations.  The only way to resolve that is with
heuristics or treating a bunch more words as reserved keywords, neither of
which are appetizing.  (I didn't look to see which way Peter did it, but
his description of his patch as "not very pretty" doesn't fill me with
happiness.)  And it would likely cause headaches down the road whenever
we attempt to add new syntax to plpgsql.

I think we should reject the idea.


Well, the upside would be increased Oracle compatibility. I don't
think that's worthless.

I haven't dug deeply into it, but Peter's patch didn't look
desperately ugly to me at first glance.


I don't much like this either. The ambiguity it introduces in the 
grammar is bad. I'll mark this as rejected in the commitfest.


- Heikki



Re: [HACKERS] path toward faster partition pruning

2018-04-05 Thread Amit Langote
Hi.

On 2018/04/05 0:45, Jesper Pedersen wrote:
> Hi,
> 
> On 04/04/2018 09:29 AM, David Rowley wrote:
>> Thanks for updating. I've made a pass over v49 and I didn't find very
>> much wrong with it.
>>
>> The only real bug I found was a missing IsA(rinfo->clause, Const) in
>> the pseudoconstant check inside
>> generate_partition_pruning_steps_internal.

Fixed.

>> Most of the changes are comment fixes with a few stylistic changes
>> thrown which are pretty much all there just to try to shrink the code
>> a line or two or reduce indentation.
>>
>> I feel pretty familiar with this code now and assuming the attached is
>> included I'm happy for someone else, hopefully, a committer to take a
>> look at it.

Thank you, your changes look good to me.

>> I'll leave the following notes:
>>
>> 1. Still not sure about RelOptInfo->has_default_part. This flag is
>> only looked at in generate_partition_pruning_steps. The RelOptInfo and
>> the boundinfo is available to look at, it's just that the
>> partition_bound_has_default macro is defined in partition.c rather
>> than partition.h.

Hmm, it might not be such a bad idea to bring out the
PartitionBoundInfoData into partition.h.  If we do that, we won't need the
has_default_part that the patch adds to RelOptInfo.

In the attached v50 set, 0002 does that.

>> 2. Don't really like the new isopne variable name. It's not very
>> simple to decode, perhaps something like is_not_eq is better?

isopne does sound a bit unintelligible.  I propose op_is_ne so that it
sounds consistent with the preceding member of the struct that's called
opno.  I want to keep "ne" and not start calling it not_eq, as a few other
places use the string "ne" to refer to a similar thing, like:

/* inequality */
Datum
range_ne(PG_FUNCTION_ARGS)

Datum
timestamptz_ne_date(PG_FUNCTION_ARGS)

Since the field is local to partprune.c, I guess that it's fine as the
comment where it's defined tells what it is.

>> 3. The part of the code I'm least familiar with is
>> get_steps_using_prefix_recurse(). I admit to not having had time to
>> fully understand that and consider ways to break it.

The purpose of that code is to generate *all* needed steps to be combined
using COMBINE_INTERSECT such that the pruning will occur using the most
restrictive set of clauses in cases where the same key is referenced in
multiple restriction clauses containing non-equality operators.  So, for a
range partitioned table on (a, b):

For a query like

explain select * from foo where a <= 1 and a <= 3 and b < 5 and b <= 10

Pruning steps generated to be combined with an enclosing INTERSECT step
will be as follows:

<= (1, 10)
<  (1, 5)
<= (3, 10)
<  (3, 5)

>> Marking as ready for committer.

Thank you!

> Passes check-world, and CommitFest app has been updated to reflect the
> current patch set. Trivial changes attached.

Merged these changes.  Thanks again Jesper.

Attached v50.

Thanks,
Amit




Re: chained transactions

2018-04-05 Thread Heikki Linnakangas

On 15/03/18 18:39, Alvaro Herrera wrote:

 From 517bc6d9fefdee9135857d9562f644f2984ace32 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Sun, 18 Feb 2018 09:33:53 -0500
Subject: [PATCH v1 6/8] Turn transaction_isolation into GUC enum

XXX no idea why it was done the way it was, but this seems much simpler
and apparently doesn't change any functionality.

Enums are recent -- 52a8d4f8f7e2, only 10 years old, and the commit
didn't convert all cases, leaving some for later.  Funnily enough,
default_transaction_isolation was changed afterwards by ad6bf716baa7 but
not this one ... I guess "later" is now upon us for it.


With this patch, this stops working:

set transaction_isolation='default';

Other than that, +1 on this patch. I haven't looked closely at the rest 
of the patches yet.


- Heikki



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-05 Thread Craig Ringer
On 5 April 2018 at 15:09, Craig Ringer  wrote:


> Also, it's been reported to me off-list that anyone on the system calling
> sync(2) or the sync shell command will also generally consume the write
> error, causing us not to see it when we fsync(). The same is true
> for /proc/sys/vm/drop_caches. I have not tested these yet.
>

I just confirmed this with a tweak to the test that:

- records the file position
- close()s the fd
- sync()s
- open()s the file
- lseek()s back to the recorded position

This causes the test to completely ignore the I/O error, which is not
reported to it at any time.

Fair enough, really, when you look at it from the kernel's point of view.
What else can it do? Nobody has the file open. It'd have to mark the file
itself as bad somehow. But that's pretty bad for our robustness AFAICS.


> There's some level of agreement that we should PANIC on fsync() errors, at
> least on Linux, but likely everywhere. But we also now know it's
> insufficient to be fully protective.
>


If dirty writeback fails between our close() and re-open() I see the same
behaviour as with sync(). To test that I set dirty_writeback_centisecs
and dirty_expire_centisecs to 1 and added a usleep(3*100*1000) between
close() and open(). (It's still plenty slow). So sync() is a convenient way
to simulate something other than our own fsync() writing out the dirty
buffer.

If I omit the sync() then we get the error reported by fsync() once when we
re-open() the file and fsync() it, because the buffers weren't written out
yet, so the error wasn't generated until we re-open()ed the file. But I
doubt that'll happen much in practice because dirty writeback will get to
it first so the error will be seen and discarded before we reopen the file
in the checkpointer.

In other words, it looks like *even with a new kernel with the error
reporting bug fixes*, if I understand how the backends and checkpointer
interact when it comes to file descriptors, we're unlikely to notice I/O
errors and fail a checkpoint. We may notice I/O errors if a backend does
its own eager writeback for large I/O operations, or if the checkpointer
fsync()s a file before the kernel's dirty writeback gets around to trying
to flush the pages that will fail.

I haven't tested anything with multiple processes / multiple FDs yet, where
we keep one fd open while writing on another.

But at this point I don't see any way to make Pg reliably detect I/O errors
and fail a checkpoint then redo and retry. To even fix this by PANICing
like I proposed originally, we need to know we have to PANIC.

AFAICS it's completely unsafe to write(), close(), open() and fsync() and
expect that the fsync() makes any promises about the write(). Which if I
read Pg's low level storage code right, makes it completely unable to
reliably detect I/O errors.

When you put it that way, it sounds fair enough too. How long is the kernel
meant to remember that there was a write error on the file triggered by a
write initiated by some seemingly unrelated process, some unbounded time
ago, on a since-closed file?

But it seems to put Pg on the fast track to O_DIRECT.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] logical decoding of two-phase transactions

2018-04-05 Thread Tomas Vondra


On 4/5/18 8:50 AM, Nikhil Sontakke wrote:
> Hi Tomas,
> 
>> 1) There's a race condition in LogicalLockTransaction. The code does
>> roughly this:
>>
>>   if (!BecomeDecodeGroupMember(...))
>>  ... bail out ...
>>
>>   Assert(MyProc->decodeGroupLeader);
>>   lwlock = LockHashPartitionLockByProc(MyProc->decodeGroupLeader);
>>   ...
>>
>> but AFAICS there is no guarantee that the transaction does not commit
>> (or even abort) right after the become decode group member. In which
>> case LogicalDecodeRemoveTransaction might have already reset our pointer
>> to a leader to NULL. In which case the Assert() and lock will fail.
>>
>> I've initially thought this can be fixed by setting decodeLocked=true in
>> BecomeDecodeGroupMember, but that's not really true - that would fix the
>> race for aborts, but not commits. LogicalDecodeRemoveTransaction skips
>> the wait for commits entirely, and just resets the flags anyway.
>>
>> So this needs a different fix, I think. BecomeDecodeGroupMember also
>> needs the leader PGPROC pointer, but it does not have the issue because
>> it gets it as a parameter. I think the same thing would work for here
>> too - that is, use the AssignDecodeGroupLeader() result instead.
>>
> 
> That's a good catch. One of the earlier patches had a check for this
> (it also had an ill-placed assert above though) which we removed as
> part of the ongoing review.
> 
> Instead of doing the above, we can just re-check if the
> decodeGroupLeader pointer has become NULL and if so, re-assert that
> the leader has indeed gone away before returning false. I propose a
> diff like below.
> 
> /*
>  * If we were able to add ourself, then Abort processing will
> -* interlock with us.
> +* interlock with us. If the leader was done in the meanwhile
> +* it could have removed us and gone away as well.
>  */
> -   Assert(MyProc->decodeGroupLeader);
> +   if (MyProc->decodeGroupLeader == NULL)
> +   {
> +   Assert(BackendXidGetProc(txn->xid) == NULL);
> +   return false
> +   }
> 
> 

Uh? Simply rechecking if MyProc->decodeGroupLeader is NULL obviously
does not fix the race condition - it might get NULL right after the
check. So we need to either lookup the PROC again (and then get the
associated lwlock), or hold some other type of lock.


>>
>> 3) I don't quite understand why BecomeDecodeGroupMember does the
>> cross-check using PID. In which case would it help?
>>
> 
> When I wrote this support, I had written it with the intention of
> supporting both 2PC (in which case pid is 0) and in-progress regular
> transactions. That's why the presence of PID in these functions. The
> current use case is just for 2PC, so we could remove it.
> 

Sure, but why do we need to cross-check the PID at all? I may be missing
something here, but I don't see what does this protect against?

> 
>>
>> 5) ReorderBufferCommitInternal does elog(LOG) about interrupting the
>> decoding of aborted transaction only in one place. There are about three
>> other places where we check LogicalLockTransaction. Seems inconsistent.
>>
> 
> Note that I have added it for the OPTIONAL test_decoding test cases
> (which AFAIK we don't plan to commit in that state) which demonstrate
> concurrent rollback interlocking with the lock/unlock APIs. The first
> ELOG was enough to catch the interaction. If we think these elogs
> should be present in the code, then yes, I can add it elsewhere as
> well as part of an earlier patch.
> 

Ah, I see. Makes sense. I've been looking at the patch as a whole and
haven't realized it's part of this piece.

> 
> Ok, I am looking at your provided patch and incorporating relevant
> changes from it. WIll submit a patch set soon.
> 

OK.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra 
wrote:

> On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> > On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander  > > wrote:
> >
> > On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
> > mailto:tomas.von...@2ndquadrant.com>>
> > wrote:
>> > And if you try this with a temporary table (not
> > hidden in transaction,
> > > > so the bgworker can see it), the worker will fail
> > with this:
> > > >
> > > >   ERROR:  cannot access temporary tables of other
> > sessions
> > > >
> > > > But of course, this is just another way how to crash
> > without updating
> > > > the result for the launcher, so checksums may end up
> > being enabled
> > > > anyway.
> > > >
> > > >
> > > > Yeah, there will be plenty of side-effect issues from
> that
> > > > crash-with-wrong-status case. Fixing that will at least
> > make things
> > > > safer -- in that checksums won't be enabled when not put
> > on all pages.
> > > >
> > >
> > > Sure, the outcome with checksums enabled incorrectly is a
> > consequence of
> > > bogus status, and fixing that will prevent that. But that
> > wasn't my main
> > > point here - not articulated very clearly, though.
> > >
> > > The bigger question is how to handle temporary tables
> > gracefully, so
> > > that it does not terminate the bgworker like this at all.
> > This might be
> > > even bigger issue than dropped relations, considering that
> > temporary
> > > tables are pretty common part of applications (and it also
> > includes
> > > CREATE/DROP).
> > >
> > > For some clusters it might mean the online checksum
> > enabling would
> > > crash+restart infinitely (well, until reaching
> MAX_ATTEMPTS).
> > >
> > > Unfortunately, try_relation_open() won't fix this, as the
> > error comes
> > > from ReadBufferExtended. And it's not a matter of simply
> > creating a
> > > ReadBuffer variant without that error check, because
> > temporary tables
> > > use local buffers.
> > >
> > > I wonder if we could just go and set the checksums anyway,
> > ignoring the
> > > local buffers. If the other session does some changes,
> > it'll overwrite
> > > our changes, this time with the correct checksums. But it
> > seems pretty
> > > dangerous (I mean, what if they're writing stuff while
> > we're updating
> > > the checksums? Considering the various short-cuts for
> > temporary tables,
> > > I suspect that would be a boon for race conditions.)
> > >
> > > Another option would be to do something similar to running
> > transactions,
> > > i.e. wait until all temporary tables (that we've seen at
> > the beginning)
> > > disappear. But we're starting to wait on more and more
> stuff.
> > >
> > > If we do this, we should clearly log which backends we're
> > waiting for,
> > > so that the admins can go and interrupt them manually.
> > >
> > >
> > >
> > > Yeah, waiting for all transactions at the beginning is pretty
> > simple.
> > >
> > > Making the worker simply ignore temporary tables would also be
> > easy.
> > >
> > > One of the bigger issues here is temporary tables are
> > *session* scope
> > > and not transaction, so we'd actually need the other session
> > to finish,
> > > not just the transaction.
> > >
> > > I guess what we could do is something like this:
> > >
> > > 1. Don't process temporary tables in the checksumworker,
> period.
> > > Instead, build a list of any temporary tables that existed
> > when the
> > > worker started in this particular database (basically anything
> > that we
> > > got in our scan). Once we have processed the complete
> > database, keep
> > > re-scanning pg_class until those particular tables are gone
> > (search by oid).
> > >
> > > That means that any temporary tables that are created *while*
> > we are
> > > processing a database are ignored, but they should already be
> > receiving
> > > checksums.
> > >
> > > It definitely leads to a potential issue with long running
> > temp tables.
> > 

Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-05 Thread Kyotaro HORIGUCHI
Hello.

The commit leaves three warnings for
-Wunused-but-set-variable. Two of them are not assertion-only but
really not used at all.

I also found that nodeMerge.c has one such variable.

regards.

At Thu, 5 Apr 2018 15:43:55 +0900, Masahiko Sawada  
wrote in 
> On Thu, Apr 5, 2018 at 2:40 PM, Masahiko Sawada  wrote:
> > On Thu, Apr 5, 2018 at 1:30 AM, Teodor Sigaev  wrote:
> >> Thanks for everyone, pushed with minor editorization
> >>
> >
> > Thank you for committing!
> > I found a typo in nbtpage.c and attached a patch fixes it.
> >
> 
> I also found an incorrect documentation in create_index.sgml as follows.
> 
>  vacuum_cleanup_index_scale_factor
>  
>  
>   Per-table value for <xref linkend="guc-vacuum-cleanup-index-scale-factor"/>.
>  
>  
> 
> 
> I think it should be "Per-index". Attached a patch for fixing it. And
> sorry for missing it at review.
> 
> Regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 505a67e6ed..b920d66731 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -85,7 +85,7 @@ void
 _bt_upgrademetapage(Page page)
 {
 	BTMetaPageData *metad;
-	BTPageOpaque metaopaque;
+	BTPageOpaque metaopaque PG_USED_FOR_ASSERTS_ONLY;
 
 	metad = BTPageGetMeta(page);
 	metaopaque = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -118,7 +118,6 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
 {
 	Buffer			metabuf;
 	Page			metapg;
-	BTPageOpaque	metaopaque;
 	BTMetaPageData *metad;
 	bool			needsRewrite = false;
 	XLogRecPtr		recptr;
@@ -126,7 +125,6 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
 	/* read the metapage and check if it needs rewrite */
 	metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
 	metapg = BufferGetPage(metabuf);
-	metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
 	metad = BTPageGetMeta(metapg);
 
 	/* outdated version of metapage always needs rewrite */
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 06badc90ba..66a66f2dad 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -786,13 +786,11 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
 {
 	Buffer			metabuf;
 	Page			metapg;
-	BTPageOpaque	metaopaque;
 	BTMetaPageData *metad;
 	bool			result = false;
 
 	metabuf = _bt_getbuf(info->index, BTREE_METAPAGE, BT_READ);
 	metapg = BufferGetPage(metabuf);
-	metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
 	metad = BTPageGetMeta(metapg);
 
 	if (metad->btm_version < BTREE_VERSION)
diff --git a/src/backend/executor/nodeMerge.c b/src/backend/executor/nodeMerge.c
index 0e0d0795d4..b21e69903d 100644
--- a/src/backend/executor/nodeMerge.c
+++ b/src/backend/executor/nodeMerge.c
@@ -485,7 +485,7 @@ ExecMerge(ModifyTableState *mtstate, EState *estate, TupleTableSlot *slot,
 	ItemPointer tupleid;
 	ItemPointerData tuple_ctid;
 	bool		matched = false;
-	char		relkind;
+	char		relkind PG_USED_FOR_ASSERTS_ONLY;
 	Datum		datum;
 	bool		isNull;
 


Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-04-05 Thread Amit Langote
On 2018/04/05 16:31, Amit Langote wrote:
> Fuiita-san,

Oops, sorry about misspelling your name here, Fujita-san.

- Amit




Re: Online enabling of checksums

2018-04-05 Thread Tomas Vondra


On 4/5/18 11:07 AM, Magnus Hagander wrote:
> 
> 
> On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
>
> ...
>
> It however still fails to initialize the attempts field after allocating
> the db entry in BuildDatabaseList, so if you try running with
> -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
> 
>  WARNING:  attempts = -1684366952
>  WARNING:  attempts = 1010514489
>  WARNING:  attempts = -1145390664
>  WARNING:  attempts = 1162101570
> 
> I guess those are not the droids we're looking for?
> 
> 
> When looking at that and after a quick discussion, we just decided it's
> better to completely remove the retry logic. As you mentioned in some
> earlier mail, we had all this logic to retry databases (unlikely) but
> not relations (likely). Attached patch simplifies it to only detect the
> "database was dropped" case (which is fine), and consider every other
> kind of failure a permanent one and just not turn on checksums in those
> cases.
> 

OK, works for me.

> 
> 
> Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
> initialized. I think it only ever gets set in launcher_exit(), but that
> does not seem sufficient. I suspect it's the reason for this behavior:
> 
> 
> It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
> (The whole memset-to-zero)
> 

OK, seems fine now.

> 
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  database "template0" does not allow connections
>     HINT:  Allow connections using ALTER DATABASE and try again.
>     test=# alter database template0 allow_connections = true;
>     ALTER DATABASE
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: already running
>     test=# select pg_disable_data_checksums();
>      pg_disable_data_checksums
>     ---
> 
>     (1 row)
> 
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: has been cancelled
> 
> 
> Turns out that wasn't the problem. The problem was that we *set* it
> before erroring out with the "does not allow connections", but never
> cleared it. In that case, it would be listed as launcher-is-running even
> though the launcher was never started.
> 
> Basically the check for datallowconn was put in the wrong place. That
> check should go away completely once we merge (because we should also
> merge the part that allows us to bypass it), but for now I have moved
> the check to the correct place.
> 

Ah, OK. I was just guessing.

> 
> 
> Attached is a diff with a couple of minor comment tweaks, and correct
> initialization of the attempts field.
> 
> 
> Thanks. This is included in the attached update, along with the above
> fixes and some other small touches from Daniel. 
> 

This patch version seems fine to me. I'm inclined to mark it RFC.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: New files for MERGE

2018-04-05 Thread Robert Haas
On Wed, Apr 4, 2018 at 3:09 PM, Tom Lane  wrote:
> Well, what's on the table is reverting this patch and asking you to try
> again in the v12 cycle.  Given Andres' concerns about the executor design,
> and mine about the way the parsing end is built, there's certainly no way
> that that's all getting fixed by Saturday.  Given pretty much everybody's
> unhappiness with the way this patch was forced through at the last minute,
> I do not think you should expect that we'll say, "okay, we'll let you ship
> a bad version of MERGE because there's no more time in this cycle".

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: New files for MERGE

2018-04-05 Thread Simon Riggs
On 4 April 2018 at 21:28, Simon Riggs  wrote:
> On 4 April 2018 at 21:14, Andres Freund  wrote:
>
>>> The normal way is to make review comments that allow change. Your
>>> request for change of the parser data structures is fine and can be
>>> done, possibly by Saturday
>>
>> I did request changes, and you've so far ignored those requests.
>
> Pavan tells me he has replied to you and is working on specific changes.

Specific changes requested have now been implemented by Pavan and
committed by me.

My understanding is that he is working on a patch for Tom's requested
parser changes, will post on other thread.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-04-05 Thread Teodor Sigaev
Thanks to everyone; fixes are pushed, except for nodeMerge.c, as I don't wish
to increase the entropy around the MERGE patch :)


Kyotaro HORIGUCHI wrote:

Hello.

The commit leaves three warnings for
-Wunused-but-set-variable. Two of them are not assertion-only but
really not used at all.

I also found that nodeMerge.c has one such variable.

regards.

At Thu, 5 Apr 2018 15:43:55 +0900, Masahiko Sawada  wrote in 


On Thu, Apr 5, 2018 at 2:40 PM, Masahiko Sawada  wrote:

On Thu, Apr 5, 2018 at 1:30 AM, Teodor Sigaev  wrote:

Thanks for everyone, pushed with minor editorization



Thank you for committing!
I found a typo in nbtpage.c and attached a patch fixes it.



I also found an incorrect documentation in create_index.sgml as follows.

  vacuum_cleanup_index_scale_factor
  
  
   Per-table value for <xref linkend="guc-vacuum-cleanup-index-scale-factor"/>.
  
  
 

I think it should be "Per-index". Attached a patch for fixing it. And
sorry for missing it at review.

Regards,




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Add support for tuple routing to foreign partitions

2018-04-05 Thread Etsuro Fujita

(2018/04/05 16:31), Amit Langote wrote:

Might be a good idea to attach the bug-fix patch here as well, and perhaps
add numbers to the file names like:

0001_postgres-fdw-refactoring-5.patch
0002_BUGFIX-copy-from-check-constraint-fix.patch
0003_foreign-routing-fdwapi-5.patch


OK


Just one minor comment:

I wonder why you decided not to have the CheckValidResultRel() call and
the statement that sets ri_PartitionReadyForRouting inside the newly added
ExecInitRoutingInfo itself.  If ExecInitRoutingInfo does the last
necessary steps for a ResultRelInfo (and hence the partition) to be ready
to be used for routing, why not finish everything there.  So the changes
to ExecPrepareTupleRouting which look like this in the patch:

+if (!partrel->ri_PartitionReadyForRouting)
+{
+CheckValidResultRel(partrel, CMD_INSERT);
+
+/* Set up information needed for routing tuples to the partition */
+ExecInitRoutingInfo(mtstate, estate, proute, partrel, partidx);
+
+partrel->ri_PartitionReadyForRouting = true;
+}

will become:

+if (!partrel->ri_PartitionReadyForRouting)
+ExecInitRoutingInfo(mtstate, estate, proute, partrel, partidx);


Good idea!  Modified that way.


As I see no other issues, I will mark this as Ready for Committer.


Thanks!

Attached is an updated version of the patch set plus the patch in [1]. 
Patch 0003_foreign-routing-fdwapi-6.patch can be applied on top of patch 
0001_postgres-fdw-refactoring-6.patch and 
0002_copy-from-check-constraint-fix.patch.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/5aba4074.1090...@lab.ntt.co.jp
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 376,387  static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
--- 376,396 
  static void create_cursor(ForeignScanState *node);
  static void fetch_more_data(ForeignScanState *node);
  static void close_cursor(PGconn *conn, unsigned int cursor_number);
+ static PgFdwModifyState *create_foreign_modify(EState *estate,
+ 	  ResultRelInfo *resultRelInfo,
+ 	  CmdType operation,
+ 	  Plan *subplan,
+ 	  char *query,
+ 	  List *target_attrs,
+ 	  bool has_returning,
+ 	  List *retrieved_attrs);
  static void prepare_foreign_modify(PgFdwModifyState *fmstate);
  static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
  		 ItemPointer tupleid,
  		 TupleTableSlot *slot);
  static void store_returning_result(PgFdwModifyState *fmstate,
  	   TupleTableSlot *slot, PGresult *res);
+ static void finish_foreign_modify(PgFdwModifyState *fmstate);
  static List *build_remote_returning(Index rtindex, Relation rel,
  	   List *returningList);
  static void rebuild_fdw_scan_tlist(ForeignScan *fscan, List *tlist);
***
*** 1681,1698  postgresBeginForeignModify(ModifyTableState *mtstate,
  		   int eflags)
  {
  	PgFdwModifyState *fmstate;
! 	EState	   *estate = mtstate->ps.state;
! 	CmdType		operation = mtstate->operation;
! 	Relation	rel = resultRelInfo->ri_RelationDesc;
! 	RangeTblEntry *rte;
! 	Oid			userid;
! 	ForeignTable *table;
! 	UserMapping *user;
! 	AttrNumber	n_params;
! 	Oid			typefnoid;
! 	bool		isvarlena;
! 	ListCell   *lc;
! 	TupleDesc	tupdesc = RelationGetDescr(rel);
  
  	/*
  	 * Do nothing in EXPLAIN (no ANALYZE) case.  resultRelInfo->ri_FdwState
--- 1690,1699 
  		   int eflags)
  {
  	PgFdwModifyState *fmstate;
! 	char	   *query;
! 	List	   *target_attrs;
! 	bool		has_returning;
! 	List	   *retrieved_attrs;
  
  	/*
  	 * Do nothing in EXPLAIN (no ANALYZE) case.  resultRelInfo->ri_FdwState
***
*** 1701,1782  postgresBeginForeignModify(ModifyTableState *mtstate,
  	if (eflags & EXEC_FLAG_EXPLAIN_ONLY)
  		return;
  
- 	/* Begin constructing PgFdwModifyState. */
- 	fmstate = (PgFdwModifyState *) palloc0(sizeof(PgFdwModifyState));
- 	fmstate->rel = rel;
- 
- 	/*
- 	 * Identify which user to do the remote access as.  This should match what
- 	 * ExecCheckRTEPerms() does.
- 	 */
- 	rte = rt_fetch(resultRelInfo->ri_RangeTableIndex, estate->es_range_table);
- 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
- 
- 	/* Get info about foreign table. */
- 	table = GetForeignTable(RelationGetRelid(rel));
- 	user = GetUserMapping(userid, table->serverid);
- 
- 	/* Open connection; report that we'll create a prepared statement. */
- 	fmstate->conn = GetConnection(user, true);
- 	fmstate->p_name = NULL;		/* prepared statement not made yet */
- 
  	/* Deconstruct fdw_private data. */
! 	fmstate->query = strVal(list_nth(fdw_private,
! 	 FdwModifyPrivateUpdateSql));
! 	fmstate->target_attrs = (List *) list_nth(fdw_private,
! 			  FdwModifyPrivateTargetAttnums);
! 	fmstate->has_returning = intVal(list_nth(fdw_private,
! 			 FdwModifyPrivateHasReturning));
! 	fmstate->retrieved_attrs = (List *) list_nth(fdw_private,
!  FdwModifyPrivateRetriev

Re: Add support for printing/reading MergeAction nodes

2018-04-05 Thread Pavan Deolasee
On Wed, Apr 4, 2018 at 11:21 PM, Tom Lane  wrote:

> Simon Riggs  writes:
> > On 4 April 2018 at 17:19, Tom Lane  wrote:
> >> If the MERGE patch has broken this, I'm going to push back on that
> >> and push back on it hard, because it probably means there are a
> >> whole bunch of other raw-grammar-output-only node types that can
> >> now get past the parser stage as well, as children of these nodes.
> >> The answer to that is not to add a ton of new infrastructure, it's
> >> "you did MERGE wrong".
>
> > MERGE contains multiple actions of Insert, Update and Delete and these
> > could be output in various debug modes. I'm not clear what meaning we
> > might attach to them if we looked since that differs from normal
> > INSERTs, UPDATEs, DELETEs, but lets see.
>
> What I'm complaining about is that that's a very poorly designed parsetree
> representation.  It may not be the worst one I've ever seen, but it's
> got claims in that direction.  You're repurposing InsertStmt et al to
> do something that's *not* an INSERT statement, but by chance happens
> to share some (not all) of the same fields.  This is bad because it
> invites confusion, and then bugs of commission or omission (eg, assuming
> that some particular processing has happened or not happened to subtrees
> of that parse node).  The most egregious way in which it's a bad idea
> is that, as I said, InsertStmt doesn't even exist in post-parse-analysis
> trees so far as the normal type of INSERT is concerned.  This just opens
> a whole batch of ways to screw up.  We have some types of raw parse nodes
> that are replaced entirely during parse analysis, and we have some others
> where it's convenient to use the same node type before and after parse
> analysis, but we don't have any that are one way in one context and the
> other way in other contexts.  And this is not the place to start.
>
> I think you'd have been better off to fold all of those fields into
> MergeAction, or perhaps make several variants of MergeAction.
>
>
Hi Tom,

Thanks for the review comments.

Attached patch refactors the grammar/parser side per your comments. We no
longer use InsertStmt/UpdateStmt/DeleteStmt/SelectStmt as part of
MergeAction. Instead we only collect the necessary information for running
the INSERT/UPDATE/DELETE actions. Speaking of MergeAction itself, I decided
to use a new parser-only node named MergeWhenClause and removed unnecessary
members from the MergeAction node which now gets to planner/executor.

Regarding the original report by Marina, I suspect she may have turned
debug_print_parse=on while running the regression tests. I could reproduce
the failures in the isolation tests by doing the same. The attached patch,
however, passes all tests with the following additional GUCs. So I am more
confident that we have got the outfuncs.c support right.

debug_print_parse=on
debug_print_rewritten=on
debug_print_plan=on

Also, I am now running tests with -DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST since the buildfarm had earlier uncovered
some issues with those flags. No problem there too.

This now also enforces a single VALUES clause in the grammar itself instead
of doing that check at parse-analysis time. So that's a net improvement too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0001-Separate-raw-parse-representation-from-rest.patch
Description: Binary data


Get the name of the target Relation from Query struct?

2018-04-05 Thread Ernst-Georg Schmid
Hello,

I want to get the target Relation name for an UPDATE / INSERT / DELETE in a
planner_hook. Do I understand struct Query correctly that:

Query->resultRelation will be the index into Query->rtable to give me the 
target Relation?

And if yes, what would rtable give me as a list entry? An OID, a Relation, or
something else?
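
For reference, a sketch of one way to do this (not from the thread): the
rtable entries are RangeTblEntry nodes, so resultRelation is a 1-based index
into that list, the entry's relid field holds the target relation's OID, and
get_rel_name() turns that into a name. The hook signature below is the
PG11-era one, and the module/function names are made up for illustration:

#include "postgres.h"

#include "fmgr.h"
#include "nodes/parsenodes.h"
#include "optimizer/planner.h"
#include "parser/parsetree.h"
#include "utils/lsyscache.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
log_target_rel_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	/* resultRelation is 0 for plain SELECT, otherwise a 1-based rtable index */
	if (parse->resultRelation > 0)
	{
		RangeTblEntry *rte = rt_fetch(parse->resultRelation, parse->rtable);

		elog(LOG, "target relation is \"%s\" (OID %u)",
			 get_rel_name(rte->relid), rte->relid);
	}

	if (prev_planner_hook)
		return prev_planner_hook(parse, cursorOptions, boundParams);
	return standard_planner(parse, cursorOptions, boundParams);
}

void
_PG_init(void)
{
	prev_planner_hook = planner_hook;
	planner_hook = log_target_rel_planner;
}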

Best regards,

Ernst-Georg




Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck 
wrote:

> Hi,
>
> On Wed, Apr 04, 2018 at 07:25:11PM +0200, Magnus Hagander wrote:
> > On Wed, Apr 4, 2018 at 3:47 PM, Alvaro Herrera 
> > wrote:
> > > Michael Banck wrote:
> > >
> > > > However, the pg_basebackup testsuite takes up 800+ MB to run,
> > >
> > > Uh, you're right.  This seems a bit over the top.  Can we reduce that
> > > without losing coverage?  We've gone to great lengths to avoid large
> > > amounts of data in tests elsewhere.
> >
> > That didn't come out of this patch, right? This is a pre-existing issue?
>
> It was around that ballpark before, but this patch probably made
> things worse as it adds four additional datadirs (around 40 MB each
> here) and we are close to 1 GB now.
>
> I haven't looked at the other testsuites, but if it is ok to cleanup the
> basebackups after each set of tests suceeded, that would alleviate the
> problem.
>
> Otherwise, I had a quick look and there is no obvious outlier; the
> pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> could be cut down somehow?) and the base backups are 22-40 MB each, and
> there is around 20 of them, so that adds up to more than 750 MB.
>
>
It certainly seems reasonable to delete the base backups once they're made,
after each step, rather than keeping them around forever.

Do we have a precedent somewhere for how we do this, or does our test
framework already have a way to do it? How are all the actual data
directories etc cleaned up?

Or should it just be a matter of sprinkling some unlink() calls throughout
the test file?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Add support for printing/reading MergeAction nodes

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 11:31, Pavan Deolasee  wrote:

> Attached patch refactors the grammar/parser side per your comments. We no
> longer use InsertStmt/UpdateStmt/DeleteStmt/SelectStmt as part of
> MergeAction. Instead we only collect the necessary information for running
> the INSERT/UPDATE/DELETE actions. Speaking of MergeAction itself, I decided
> to use a new parser-only node named MergeWhenClause and removed unnecessary
> members from the MergeAction node which now gets to planner/executor.

That looks good to me. Simple separation of duties.

> Regarding the original report by Marina I suspect she may have turned
> debug_print_parse=on while running regression. I could reproduce the
> failures in the isolation tests by doing same. The attached patch however
> passes all tests with the following additional GUCs. So I am more confident
> that we should have got the outfuncs.c support ok.
>
> debug_print_parse=on
> debug_print_rewritten=on
> debug_print_plan=on
>
> Also, I am now running tests with -DCOPY_PARSE_PLAN_TREES
> -DRAW_EXPRESSION_COVERAGE_TEST since the buildfarm had earlier uncovered
> some issues with those flags. No problem there too.

OK, so $OP fixed.

> This now also enforces single VALUES clause in the grammar itself instead of
> doing that check at parse-analyse time. So that's a net improvement too.

OK, that's good. I've updated the docs to show this restriction correctly.

I'll commit this tomorrow morning unless further comments or edits.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] btree_gin, add support for uuid, bool, name, bpchar and anyrange types

2018-04-05 Thread Teodor Sigaev

Somehow you missed some parts in the 0001 patch; at least, regression tests fail:

  CREATE EXTENSION btree_gin;
+ ERROR:  could not find function "gin_extract_value_uuid" in file 
"/usr/local/pgsql/lib/btree_gin.so"



Matheus de Oliveira wrote:

Hi all.

On Wed, Mar 21, 2018 at 1:47 PM, Tomas Vondra > wrote:



Do you plan to post an updated version of the patch, of what is your
response to the points raised last week?


Very sorry about the long delay. I've been on a long trip, with no time to
look at this carefully.


I still haven't made up my mind regarding the usefulness of range opclasses,
so I suggest splitting the patch into two parts - 0001 for the opclasses
we're confident are useful, and 0002 for those extras. The committer
then may apply either 0001 or 0001+0002, depending on his judgment.


I liked the idea. So, here are the patches:
- 0001-btree_gin-uuid--base.v2.patch - all types but anyrange, with the
adjustments to the comments that you proposed
- 0002-btree_gin-uuid--anyrange.v2.patch - adding the anyrange type (must be
applied after 0001)


Anything else missing?

Best regards,
--
Matheus de Oliveira




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Jesper Pedersen

Hi Simon and Pavan,

On 04/04/2018 08:46 AM, Jesper Pedersen wrote:

On 03/30/2018 07:10 AM, Simon Riggs wrote:

No problems found, but moving proposed commit to 2 April pm



There is a warning for this, as attached.



Updated version due to latest refactoring.

Best regards,
  Jesper

diff --git a/src/backend/executor/execMerge.c b/src/backend/executor/execMerge.c
index 471f64361d..e35025880d 100644
--- a/src/backend/executor/execMerge.c
+++ b/src/backend/executor/execMerge.c
@@ -48,7 +48,6 @@ ExecMerge(ModifyTableState *mtstate, EState *estate, TupleTableSlot *slot,
 	ItemPointer tupleid;
 	ItemPointerData tuple_ctid;
 	bool		matched = false;
-	char		relkind;
 	Datum		datum;
 	bool		isNull;
 


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 12:38, Jesper Pedersen  wrote:
> Hi Simon and Pavan,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>>
>>> No problems found, but moving proposed commit to 2 April pm
>>>
>>
>> There is a warning for this, as attached.
>>
>
> Updated version due to latest refactoring.

Thanks for your input. Removing that seems to prevent compilation.

Did something change in between?

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Pavan Deolasee
On Thu, Apr 5, 2018 at 5:08 PM, Jesper Pedersen wrote:

> Hi Simon and Pavan,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>
>>> No problems found, but moving proposed commit to 2 April pm
>>>
>>>
>> There is a warning for this, as attached.
>>
>>
> Updated version due to latest refactoring.
>


Hi Jesper,

The variable would become unused in non-assert builds. I see that. But
simply removing it is not a solution and I don't think the code will
compile that way. We should either rewrite that assertion, put it inside
a #ifdef USE_ASSERT_CHECKING block, or simply remove that assertion because we
already check relkind in parse_merge.c. Will check.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Jesper Pedersen

Hi,

On 04/05/2018 07:48 AM, Simon Riggs wrote:

Updated version due to latest refactoring.


Thanks for your input. Removing that seems to prevent compilation.

Did something change in between?



Updated for non-assert build.

Best regards,
 Jesper
diff --git a/src/backend/executor/execMerge.c b/src/backend/executor/execMerge.c
index 471f64361d..53f4afff0f 100644
--- a/src/backend/executor/execMerge.c
+++ b/src/backend/executor/execMerge.c
@@ -48,13 +48,11 @@ ExecMerge(ModifyTableState *mtstate, EState *estate, TupleTableSlot *slot,
 	ItemPointer tupleid;
 	ItemPointerData tuple_ctid;
 	bool		matched = false;
-	char		relkind;
 	Datum		datum;
 	bool		isNull;
 
-	relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
-	Assert(relkind == RELKIND_RELATION ||
-		   relkind == RELKIND_PARTITIONED_TABLE);
+	Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind ||
+		   resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
 	/*
 	 * Reset per-tuple memory context to free any expression evaluation


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 12:56, Jesper Pedersen  wrote:
> Hi,
>
> On 04/05/2018 07:48 AM, Simon Riggs wrote:
>>>
>>> Updated version due to latest refactoring.
>>
>>
>> Thanks for your input. Removing that seems to prevent compilation.
>>
>> Did something change in between?
>>
>
> Updated for non-assert build.

Thanks, pushed. Sorry to have you wait til v3

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] btree_gin, add support for uuid, bool, name, bpchar and anyrange types

2018-04-05 Thread Tomas Vondra
Damn, I should have noticed that during the last review, but I missed
that somehow. Not sure Matheus will have time to look into it, so here
is a (hopefully) fixed version.

regards

On 4/5/18 1:16 PM, Teodor Sigaev wrote:
> somehow you missed some parts in 0001 patch, at least regression tests
> fail:
> 
>   CREATE EXTENSION btree_gin;
> + ERROR:  could not find function "gin_extract_value_uuid" in file
> "/usr/local/pgsql/lib/btree_gin.so"
> 
> 

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e966395ce95de914105f542bc6c5700bcde4b29f Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 5 Apr 2018 14:02:58 +0200
Subject: [PATCH 1/2] Add support for uuid, bool, bpchar and name to btree_gin

These data types all have a btree opclass, but were unsupported by
btree_gin for some reason.
---
 contrib/btree_gin/Makefile|   4 +-
 contrib/btree_gin/btree_gin--1.2--1.3.sql | 128 ++
 contrib/btree_gin/btree_gin.c |  31 +++-
 contrib/btree_gin/btree_gin.control   |   2 +-
 contrib/btree_gin/expected/bool.out   | 119 +++
 contrib/btree_gin/expected/bpchar.out | 109 +
 contrib/btree_gin/expected/name.out   |  97 ++
 contrib/btree_gin/expected/uuid.out   | 104 
 contrib/btree_gin/sql/bool.sql|  27 +++
 contrib/btree_gin/sql/bpchar.sql  |  22 +
 contrib/btree_gin/sql/name.sql|  21 +
 contrib/btree_gin/sql/uuid.sql|  28 +++
 doc/src/sgml/btree-gin.sgml   |   3 +-
 13 files changed, 690 insertions(+), 5 deletions(-)
 create mode 100644 contrib/btree_gin/btree_gin--1.2--1.3.sql
 create mode 100644 contrib/btree_gin/expected/bool.out
 create mode 100644 contrib/btree_gin/expected/bpchar.out
 create mode 100644 contrib/btree_gin/expected/name.out
 create mode 100644 contrib/btree_gin/expected/uuid.out
 create mode 100644 contrib/btree_gin/sql/bool.sql
 create mode 100644 contrib/btree_gin/sql/bpchar.sql
 create mode 100644 contrib/btree_gin/sql/name.sql
 create mode 100644 contrib/btree_gin/sql/uuid.sql

diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
index 690e1d7..a9e9925 100644
--- a/contrib/btree_gin/Makefile
+++ b/contrib/btree_gin/Makefile
@@ -5,13 +5,13 @@ OBJS = btree_gin.o $(WIN32RES)
 
 EXTENSION = btree_gin
 DATA = btree_gin--1.0.sql btree_gin--1.0--1.1.sql btree_gin--1.1--1.2.sql \
-	btree_gin--unpackaged--1.0.sql
+	 btree_gin--1.2--1.3.sql btree_gin--unpackaged--1.0.sql
 PGFILEDESC = "btree_gin - B-tree equivalent GIN operator classes"
 
 REGRESS = install_btree_gin int2 int4 int8 float4 float8 money oid \
 	timestamp timestamptz time timetz date interval \
 	macaddr macaddr8 inet cidr text varchar char bytea bit varbit \
-	numeric enum
+	numeric enum uuid name bool bpchar
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/btree_gin/btree_gin--1.2--1.3.sql b/contrib/btree_gin/btree_gin--1.2--1.3.sql
new file mode 100644
index 000..db675b7
--- /dev/null
+++ b/contrib/btree_gin/btree_gin--1.2--1.3.sql
@@ -0,0 +1,128 @@
+/* contrib/btree_gin/btree_gin--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION btree_gin UPDATE TO '1.3'" to load this file. \quit
+
+-- uuid datatype support new in 1.3.
+CREATE FUNCTION gin_extract_value_uuid(uuid, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION gin_compare_prefix_uuid(uuid, uuid, int2, internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION gin_extract_query_uuid(uuid, internal, int2, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE OPERATOR CLASS uuid_ops
+DEFAULT FOR TYPE uuid USING gin
+AS
+OPERATOR1   <,
+OPERATOR2   <=,
+OPERATOR3   =,
+OPERATOR4   >=,
+OPERATOR5   >,
+FUNCTION1   uuid_cmp(uuid,uuid),
+FUNCTION2   gin_extract_value_uuid(uuid, internal),
+FUNCTION3   gin_extract_query_uuid(uuid, internal, int2, internal, internal),
+FUNCTION4   gin_btree_consistent(internal, int2, anyelement, int4, internal, internal),
+FUNCTION5   gin_compare_prefix_uuid(uuid,uuid,int2, internal),
+STORAGE uuid;
+
+-- name datatype support new in 1.3.
+CREATE FUNCTION gin_extract_value_name(name, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION gin_compare_prefix_name(name, name, int2, internal)
+RETURNS int4
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT IMMUTABLE;
+
+CREATE FUNCTION gin_extract_query_name(name, internal, int2, internal, internal)
+RETURNS internal
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT 

Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Teodor Sigaev
The variable would become unused in non-assert builds. I see that. But simply 
removing it is not a solution and I don't think the code will compile that way. 
We should either rewrite that assertion, put it inside a #ifdef 
USE_ASSERT_CHECKING block, or simply remove that assertion because we already check 
relkind in parse_merge.c. Will check.




That is noted by Kyotaro HORIGUCHI
https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp

and his suggestion to use a special macro looks better to me:
-   char        relkind;
+   char        relkind PG_USED_FOR_ASSERTS_ONLY;
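
For reference, a minimal sketch of how the ExecMerge() hunk would read with that
macro (illustrative only, not the committed wording; PG_USED_FOR_ASSERTS_ONLY
expands to an "unused" attribute when assertions are disabled):

	/* silences the unused-variable warning in non-assert builds */
	char		relkind PG_USED_FOR_ASSERTS_ONLY;

	relkind = resultRelInfo->ri_RelationDesc->rd_rel->relkind;
	Assert(relkind == RELKIND_RELATION ||
		   relkind == RELKIND_PARTITIONED_TABLE);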


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Jesper Pedersen

Hi,

On 04/05/2018 08:04 AM, Simon Riggs wrote:

On 5 April 2018 at 12:56, Jesper Pedersen  wrote:

Updated for non-assert build.


Thanks, pushed. Sorry to have you wait til v3



That patch was a bit rushed, and cut off too much.

As attached.

Best regards,
 Jesper
diff --git a/src/backend/executor/execMerge.c b/src/backend/executor/execMerge.c
index 53f4afff0f..d39ddd3034 100644
--- a/src/backend/executor/execMerge.c
+++ b/src/backend/executor/execMerge.c
@@ -51,7 +51,7 @@ ExecMerge(ModifyTableState *mtstate, EState *estate, TupleTableSlot *slot,
 	Datum		datum;
 	bool		isNull;
 
-	Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind ||
+	Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_RELATION ||
 		   resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 
 	/*


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 13:19, Jesper Pedersen  wrote:
> Hi,
>
> On 04/05/2018 08:04 AM, Simon Riggs wrote:
>>
>> On 5 April 2018 at 12:56, Jesper Pedersen wrote:
>>>
>>> Updated for non-assert build.
>>
>>
>> Thanks, pushed. Sorry to have you wait til v3
>>
>
> That patch was a bit rushed, and cut off too much.

Yes, noted, already fixed. Thanks

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 13:18, Teodor Sigaev  wrote:
>> The variable would become unused in non-assert builds. I see that. But
>> simply removing it is not a solution and I don't think the code will compile
>> that way. We should either rewrite that assertion, put it inside a #ifdef
>> USE_ASSERT_CHECKING block, or simply remove that assertion because we already
>> check relkind in parse_merge.c. Will check.
>>
>
> That is noted by Kyotaro HORIGUCHI
> https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp
>
> and his suggestion to use a special macro looks better to me:
> -   char        relkind;
> +   char        relkind PG_USED_FOR_ASSERTS_ONLY;

Thanks both, I already fixed that.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Michael Banck
Hi,

On Thu, Apr 05, 2018 at 01:02:27PM +0200, Magnus Hagander wrote:
> On Wed, Apr 4, 2018 at 8:22 PM, Michael Banck wrote:
> > Otherwise, I had a quick look and there is no obvious outlier; the
> > pgdata is 220 MB after the testrun (195 MB of which is WAL, maybe that
> > could be cut down somehow?) and the base backups are 22-40 MB each, and
> > there is around 20 of them, so that adds up to more than 750 MB.
>
> It certainly seems reasonable to delete the base backups once they're made,
> after each step, rather than keeping them around forever.

I had a look at this and found a copy-pasto in one of the test cases
while testing, patch attached.

I've also attached a second patch (that applies on top of the first)
that removes the base backups once they are no longer needed, also
attached (but see below).
 
> Do we have a precedent somewhere for how we do this, or does our test
> framework already have a way to do it? How are all the actual data
> directories etc cleaned up?

They (and the base backups) are getting purged on success of the whole
testsuite. So to be clear - we are not leaving behind 1 GB of disk space
on success, but we use 1 GB of disk space during the test.
 
> Or should it just be a matter of sprinkling some unlink() calls throughout
> the test file?

I used rmtree() from File::Path (which is also used by PostgresNode to
clean up) to remove them during the run.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e2f1465472..8e2d1ddb2c 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -334,7 +334,7 @@ ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
 $node->command_ok(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxs", '-X', 'stream' ],
 	'pg_basebackup -X stream runs');
-ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxf/pg_wal")),
+ok(grep(/^[0-9A-F]{24}$/, slurp_dir("$tempdir/backupxs/pg_wal")),
 	'WAL files copied');
 $node->command_ok(
 	[ 'pg_basebackup', '-D', "$tempdir/backupxst", '-X', 'stream', '-Ft' ],
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index e2f1465472..afb392dbb3 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -3,6 +3,7 @@ use warnings;
 use Cwd;
 use Config;
 use File::Basename qw(basename dirname);
+use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
 use Test::More tests => 104;
@@ -135,6 +136,7 @@ foreach my $filename (@tempRelationFiles)
 # Make sure existing backup_label was ignored.
 isnt(slurp_file("$tempdir/backup/backup_label"),
 	'DONOTCOPY', 'existing backup_label not copied');
+rmtree("$tempdir/backup");
 
 $node->command_ok(
 	[   'pg_basebackup', '-D', "$tempdir/backup2", '--waldir',
@@ -142,10 +144,13 @@ $node->command_ok(
 	'separate xlog directory');
 ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
 ok(-d "$tempdir/xlog2/", 'xlog directory was created');
+rmtree("$tempdir/backup2");
+rmtree("$tempdir/xlog2");
 
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
 	'tar format');
 ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
+rmtree("$tempdir/tarbackup");
 
 $node->command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
@@ -212,6 +217,7 @@ SKIP:
 	ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created');
 	my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar";
 	is(scalar(@tblspc_tars), 1, 'one tablespace tar was created');
+	rmtree("$tempdir/tarbackup2");
 
 	# Create an unlogged table to test that forks other than init are not copied.
 	$node->safe_psql('postgres',
@@ -281,6 +287,7 @@ SKIP:
 
 	ok( -d "$tempdir/backup1/pg_replslot",
 		'pg_replslot symlink copied as directory');
+	rmtree("$tempdir/backup1");
 
 	mkdir "$tempdir/tbl=spc2";
 	$node->safe_psql('postgres', "DROP TABLE test1;");
@@ -295,6 +302,7 @@ SKIP:
 	ok(-d "$tempdir/tbackup/tbl=spc2",
 		'tablespace with = sign was relocated');
 	$node->safe_psql('postgres', "DROP TABLESPACE tblspc2;");
+	rmtree("$tempdir/backup3");
 
 	mkdir "$tempdir/$superlongname";
 	$node->safe_psql('postgres',
@@ -303,12 +311,14 @@ SKIP:
 		[ 'pg_basebackup', '-D', "$tempdir/tarbackup_l3", '-Ft' ],
 		'pg_basebackup tar with long symlink target');
 	$node->safe_psql('postgres', "DROP TABLESPACE tblspc3;");
+	rmtree("$tempdir/tarbackup_l3");
 }
 
 $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/backupR", '-R' ],
 	'pg_basebackup -R runs');
 ok(-f "$tempdir/backupR/recovery.conf", 'recovery.conf was created');
 my $recover

Re: [PATCH] btree_gin, add support for uuid, bool, name, bpchar and anyrange types

2018-04-05 Thread Matheus de Oliveira
On Thu, Apr 5, 2018 at 8:16 AM, Teodor Sigaev  wrote:

> somehow you missed some parts in 0001 patch, at least regression tests
> fail:
>
>   CREATE EXTENSION btree_gin;
> + ERROR:  could not find function "gin_extract_value_uuid" in file
> "/usr/local/pgsql/lib/btree_gin.so"
>
>
Ouch... My fault, filterdiff is acting weird for some reason...

Here are the corrected versions... I tested them here, applying on a clean master
just to be sure, and all looks good.

Very sorry about that mess. I hope it can get into v11; it is such a simple
patch, but so useful for many people.

Best regards,
-- 
Matheus de Oliveira
diff --git a/contrib/btree_gin/Makefile b/contrib/btree_gin/Makefile
new file mode 100644
index 690e1d7..a9e9925
*** a/contrib/btree_gin/Makefile
--- b/contrib/btree_gin/Makefile
*** OBJS = btree_gin.o $(WIN32RES)
*** 5,17 
  
  EXTENSION = btree_gin
  DATA = btree_gin--1.0.sql btree_gin--1.0--1.1.sql btree_gin--1.1--1.2.sql \
! 	btree_gin--unpackaged--1.0.sql
  PGFILEDESC = "btree_gin - B-tree equivalent GIN operator classes"
  
  REGRESS = install_btree_gin int2 int4 int8 float4 float8 money oid \
  	timestamp timestamptz time timetz date interval \
  	macaddr macaddr8 inet cidr text varchar char bytea bit varbit \
! 	numeric enum
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 5,17 
  
  EXTENSION = btree_gin
  DATA = btree_gin--1.0.sql btree_gin--1.0--1.1.sql btree_gin--1.1--1.2.sql \
! 	 btree_gin--1.2--1.3.sql btree_gin--unpackaged--1.0.sql
  PGFILEDESC = "btree_gin - B-tree equivalent GIN operator classes"
  
  REGRESS = install_btree_gin int2 int4 int8 float4 float8 money oid \
  	timestamp timestamptz time timetz date interval \
  	macaddr macaddr8 inet cidr text varchar char bytea bit varbit \
! 	numeric enum uuid name bool bpchar
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
diff --git a/contrib/btree_gin/btree_gin--1.2--1.3.sql b/contrib/btree_gin/btree_gin--1.2--1.3.sql
new file mode 100644
index ...db675b7
*** a/contrib/btree_gin/btree_gin--1.2--1.3.sql
--- b/contrib/btree_gin/btree_gin--1.2--1.3.sql
***
*** 0 
--- 1,128 
+ /* contrib/btree_gin/btree_gin--1.2--1.3.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "ALTER EXTENSION btree_gin UPDATE TO '1.3'" to load this file. \quit
+ 
+ -- uuid datatype support new in 1.3.
+ CREATE FUNCTION gin_extract_value_uuid(uuid, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_compare_prefix_uuid(uuid, uuid, int2, internal)
+ RETURNS int4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_extract_query_uuid(uuid, internal, int2, internal, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR CLASS uuid_ops
+ DEFAULT FOR TYPE uuid USING gin
+ AS
+ OPERATOR1   <,
+ OPERATOR2   <=,
+ OPERATOR3   =,
+ OPERATOR4   >=,
+ OPERATOR5   >,
+ FUNCTION1   uuid_cmp(uuid,uuid),
+ FUNCTION2   gin_extract_value_uuid(uuid, internal),
+ FUNCTION3   gin_extract_query_uuid(uuid, internal, int2, internal, internal),
+ FUNCTION4   gin_btree_consistent(internal, int2, anyelement, int4, internal, internal),
+ FUNCTION5   gin_compare_prefix_uuid(uuid,uuid,int2, internal),
+ STORAGE uuid;
+ 
+ -- name datatype support new in 1.3.
+ CREATE FUNCTION gin_extract_value_name(name, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_compare_prefix_name(name, name, int2, internal)
+ RETURNS int4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_extract_query_name(name, internal, int2, internal, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE OPERATOR CLASS name_ops
+ DEFAULT FOR TYPE name USING gin
+ AS
+ OPERATOR1   <,
+ OPERATOR2   <=,
+ OPERATOR3   =,
+ OPERATOR4   >=,
+ OPERATOR5   >,
+ FUNCTION1   btnamecmp(name,name),
+ FUNCTION2   gin_extract_value_name(name, internal),
+ FUNCTION3   gin_extract_query_name(name, internal, int2, internal, internal),
+ FUNCTION4   gin_btree_consistent(internal, int2, anyelement, int4, internal, internal),
+ FUNCTION5   gin_compare_prefix_name(name,name,int2, internal),
+ STORAGE name;
+ 
+ -- bool datatype support new in 1.3.
+ CREATE FUNCTION gin_extract_value_bool(bool, internal)
+ RETURNS internal
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_compare_prefix_bool(bool, bool, int2, internal)
+ RETURNS int4
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C STRICT IMMUTABLE;
+ 
+ CREATE FUNCTION gin_extract_query_bool(bool, internal, int2, internal, internal)

typcategory for regconfig

2018-04-05 Thread Dmitry Dolgov
Hi,

Does anyone know why the `typcategory` value for the text-search type `regconfig` is
`TYPCATEGORY_NUMERIC`, even though in all the tests it's used in string form?
It's probably not a big deal, but in this thread [1] it prevents me from
adopting the nice solution with a boolean flag for the `to_tsvector` function,
because Postgres can't distinguish between `to_tsvector(regconfig, text)` and
`to_tsvector(jsonb, boolean)` in the expression:

to_tsvector('english', 'some text')

If its value were `TYPCATEGORY_STRING`, then everything would be fine,
since a string type would win. Also, changing it doesn't break any existing tests, so I
wonder whether it should be this way or not.

1: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcXJQbS1b4kJ_HeAOoOc%3DunfnOrUEL%3DKGgE32QKDww7d8g%40mail.gmail.com



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Amit Langote
On 2018/04/05 12:14, Amit Langote wrote:
> I will post comments on your v19 later today.

I looked at it and here are my thoughts on it after having for the first
time looked very closely at it.

* Regarding PartitionPruneInfo:

I think the names of the fields could be improved -- like pruning_steps
instead of prunesteps, unpruned_parts instead of allpartindexs.  The latter
is even a bit misleading because it doesn't in fact contain *all*
partition indexes, only those that are unpruned, because each either has a
subpath or it appears in (unpruned) partitioned_rels list.  Also, I didn't
like the name subnodeindex and subpartindex very much.  subplan_indexes
and parent_indexes would sound more informative to me.

* make_partition_pruneinfo has a parameter resultRelations that's not used
anywhere

* In make_partition_pruneinfo()

allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
allsubpartindex = palloc(sizeof(int) * root->simple_rel_array_size);

I think these arrays need to have root->simple_rel_array_size + 1
elements, because they're subscripted using RT indexes which start at 1.

* Also in make_partition_pruneinfo()

/* Initialize to -1 to indicate the rel was not found */
for (i = 0; i < root->simple_rel_array_size; i++)
{
allsubnodeindex[i] = -1;
allsubpartindex[i] = -1;
}

Maybe allocate the arrays mentioned above using palloc0 and skip this
initialization.  Instead, make the indexes that are stored in them start
at 1 and treat 0 as an invalid entry.

* Regarding the code added in execPartition.c and execPartition.h:

I wondered why PartitionedRelPruning is named the way it is.  I saw many
parallels with PartitionDispatchData both in terms of the main thing it
consists of, that is, the map that translates partition indexes as in
partition descriptor to that of subplans or of some other executor
structure.  Also, I imagine you tried to mimic PartitionTupleRouting with
PartitionPruning but not throughout.  For example, tuple routing struct
pointer variables are throughout called proute, whereas PartitionPruning
ones are called partprune instead of, say, pprune.  Consistency would
help, imho.

* Instead of nesting PartitionedRelPruning inside another, just store them
in a global flat array in the PartitionPruning, like PartitionTupleRouting
does for PartitionDispatch of individual partitioned tables in the tree.

typedef struct PartitionedRelPruning
{
int nparts;
int*subnodeindex;
struct PartitionedRelPruning **subpartprune;

* I don't see why there needs to be nparts in the above, because it
already has a PartitionPruneContext member which has that information.

In fact, I made most of changes myself while going through the code.
Please see attached the delta patch.  It also tweaks quite a few other
things including various comments.  I think parts of it apply to 0001,
0003, and 0004 patches.  See if this looks good to you.

Thanks,
Amit
diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 17da8cdbd3..1041871e51 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -39,11 +39,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation 
rel,
 bool 
*isnull,
 int 
maxfieldlen);
 static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map);
-static void find_subplans_for_extparams_recurse(
-   
PartitionedRelPruning *partrelprune,
+static void find_subplans_for_extparams_recurse(PartitionPruningDispatch 
*all_ppd,
+   int 
dispatch_offset,

Bitmapset **validsubplans);
-static void find_subplans_for_allparams_recurse(
-   
PartitionedRelPruning *partrelprune,
+static void find_subplans_for_allparams_recurse(PartitionPruningDispatch 
*all_ppd,
+   int 
dispatch_offset,

Bitmapset **validsubplans);
 
 
@@ -1343,27 +1343,27 @@ adjust_partition_tlist(List *tlist, TupleConversionMap 
*map)
 PartitionPruning *
 ExecSetupPartitionPruning(PlanState *planstate, List *partitionpruneinfo)
 {
-   PartitionedRelPruning *partrelprunes;
-   PartitionPruning *partprune;
+   PartitionPruning *pprune;
+   PartitionPruningDispatch *all_ppd;
ListCell   *lc;
int i;
 
Assert(partitionpruneinfo != NIL);
 
-   partprune = (PartitionPruning *) palloc(sizeof(PartitionPruning));
-   partrelprunes = (PartitionedRelPruning *)
-

Re: Get the name of the target Relation from Query struct?

2018-04-05 Thread David Rowley
On 5 April 2018 at 22:34, Ernst-Georg Schmid wrote:
> I want to get the target Relation name for a UPDATE / INSERT / DELETE in a 
> planner_hook. Do I understand struct Query correctly that:
>
> Query->resultRelation will be the index into Query->rtable to give me the 
> target Relation?

Yes

> And if yes, what would rtable give me as list entry? An OID or a Relation or 
> something else?

The list_nth(query->rtable, query->resultRelation) will give you a
RangeTblEntry which has a property called relid, which is the
Relation's OID as per pg_class.oid.

If you want the relation name from the OID then you'll need something
like get_rel_name().

You'll probably also want to check the query->commandType to ensure
the command is one that will actually have a valid resultRelation.
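
To put those pieces together, here is a minimal sketch of a planner_hook that
logs the schema-qualified name of the target relation. This is illustrative
only: it assumes the PG 10/11-era three-argument hook signature, the names
my_planner/prev_planner_hook are placeholders, rt_fetch() is used because
resultRelation is a 1-based range-table index, and the hook installation in
_PG_init() is omitted.

#include "postgres.h"

#include "nodes/params.h"
#include "optimizer/planner.h"
#include "parser/parsetree.h"		/* rt_fetch() */
#include "utils/lsyscache.h"		/* get_rel_name() and friends */

static planner_hook_type prev_planner_hook = NULL;

static PlannedStmt *
my_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	if (parse->resultRelation > 0 &&
		(parse->commandType == CMD_INSERT ||
		 parse->commandType == CMD_UPDATE ||
		 parse->commandType == CMD_DELETE))
	{
		/* resultRelation is a 1-based index into parse->rtable */
		RangeTblEntry *rte = rt_fetch(parse->resultRelation, parse->rtable);

		elog(DEBUG1, "target relation: %s.%s",
			 get_namespace_name(get_rel_namespace(rte->relid)),
			 get_rel_name(rte->relid));
	}

	if (prev_planner_hook != NULL)
		return prev_planner_hook(parse, cursorOptions, boundParams);

	return standard_planner(parse, cursorOptions, boundParams);
}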


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Validate page level checksums in base backups

2018-04-05 Thread Alvaro Herrera
Michael Banck wrote:
> Hi,

> > Do we have a precedent somewhere for how we do this, or does our test
> > framework already have a way to do it? How are all the actual data
> > directories etc cleaned up?
> 
> They (and the base backups) are getting purged on success of the whole
> testsuite. So to be clear - we are not leaving behind 1 GB of disk space
> on success, but we use 1 GB of disk space during the test.

Yeah, but it means you force the OS to keep 1 GB of useless dung in
cache.  I would hope that creating 40 MB, deleting those, then creating
further 40 MB (lather, rinse, repeat) would not overall evict 1 GB from
my OS cache.  Unless the OS keeps the unlinked files in cache, which
would be stupid.

-- 
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Get the name of the target Relation from Query struct?

2018-04-05 Thread Ernst-Georg Schmid
Thank you David, so I am on the right track.
 
>If you want the relation name from the OID then you'll need something
>like get_rel_name().

Hm, lsyscache.c says for get_rel_name():

  * NOTE: since relation name is not unique, be wary of code that uses this
  * for anything except preparing error messages.

How do I get the schema name too, then?

Call get_rel_namespace() and then get_rel_name() again with the Oid that was 
returned?

Best regards,

Ernst-Georg


Re: typcategory for regconfig

2018-04-05 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> Does anyone know, why `typcategory` value for tsvector `regconfig` is
> `TYPCATEGORY_NUMERIC`,

Because OID is.  I think we need all the OID-alias types to be the same
category as OID, else we're likely to have issues with queries like

... where oid = 'foo'::regwhatever

It's conceivable that we could move OID and all the reg* types into
their own category, but then the other time-honored locution of

... where oid = 25

would possibly give issues.

> It's probably not a big deal, but in this thread [1] it prevents me from
> adopting the nice solution with a boolean flag for `to_tsvector` function,
> because Postgres can't distinguish between `to_tsvector(regconfig, text)` and
> `to_tsvector(jsonb, boolean)` in the expression:

We are *not* putting these in category string.  They are not strings.
Furthermore, if we did so because

> ... then everything will be fine,
> since a string type will win.

then the odds are very good that these types would start to "win" some
other cases that we'd rather they didn't.

> Also, it doesn't break any existing tests

Doesn't prove a thing.  We do not have a suite of test cases exercising
whether the type resolution code will avoid doing the wrong thing.

I think you need to bite the bullet and just provide the flag in
the 3-argument case (regconfig,json[b],bool).

regards, tom lane



Query Rewrite for Materialized Views (FDW Extension)

2018-04-05 Thread Dent John
Hi,

I wanted to share the project I've been working on which dynamically rewrites 
queries to target materialized views when views are available that can satisfy 
part of a query with lower cost plans.

I embarked upon it as an interesting side project. It took me a bit more time than 
I anticipated, but the result works for my use case. Because of that, I thought 
it worth sharing. However, I would caution that my use case is not exactly of a 
commercial scale... so please heed the following obligatory warning:

**NOTE: this is not "production ready" code — if it works for you, then great, 
but use it after thorough testing, and with appropriate caution.**

There are some limitations to the rewrite opportunities it takes up, and it 
will almost certainly fail on complex materialized views composed of deeply 
nested queries.

The extension does not have extensive (actually: any) documentation, but the 
few test cases should make obvious to the inclined reader how it works. This is 
deliberate at this early a stage: I don't want to encourage uninformed adoption 
because of the possibility of data loss or incorrect query rewrites.

The extension is written against a Postgres 10.1 source tree.

Source code: https://github.com/d-e-n-t-y/pg_fdw_mv_rewrite

Questions or comments are very welcome.

denty.



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread David Rowley
Hi Amit,

Thanks for having a look at this.

On 6 April 2018 at 00:54, Amit Langote  wrote:
> I looked at it and here are my thoughts on it after having for the first
> time looked very closely at it.
>
> * Regarding PartitionPruneInfo:
>
> I think the names of the fields could be improved -- like pruning_steps
> instead of prunesteps, unpruned_parts instead of allpartindexs.  The latter
> is even a bit misleading because it doesn't in fact contain *all*
> partition indexes, only those that are unpruned, because each either has a
> subpath or it appears in (unpruned) partitioned_rels list.  Also, I didn't
> like the name subnodeindex and subpartindex very much.  subplan_indexes
> and parent_indexes would sound more informative to me.

Seems mostly fair. I'm not a fan of using the term "unpruned" though.
I'll have a think.  The "all" is meant in terms of what exists as
subnodes.

subplan_indexes and parent_indexes seem like better names, I agree.

> * make_partition_pruneinfo has a parameter resultRelations that's not used
> anywhere

It gets used in 0005.

I guess I could only add it in 0005, but make_partition_pruneinfo is
only used in 0003, so you could say the same about that entire
function.

Do you think I should delay adding that parameter until the 0005 patch?

> * In make_partition_pruneinfo()
>
> allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
> allsubpartindex = palloc(sizeof(int) * root->simple_rel_array_size);
>
> I think these arrays need to have root->simple_rel_array_size + 1
> elements, because they're subscripted using RT indexes which start at 1.

RT indexes are always 1-based. See setup_simple_rel_arrays. It already
sets the array size to list_length(rtable) + 1.
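
To make that convention concrete, a small sketch (reconstructed here for
illustration, not the patch's exact code; rti and subplan_index are placeholder
names) of how such per-RT-index arrays end up being sized and filled:

/* Arrays subscripted directly by 1-based RT index.
 * root->simple_rel_array_size is list_length(rtable) + 1
 * (see setup_simple_rel_arrays()), so element 0 simply goes unused. */
int		   *allsubnodeindex = (int *) palloc(sizeof(int) * root->simple_rel_array_size);
int			i;

for (i = 0; i < root->simple_rel_array_size; i++)
	allsubnodeindex[i] = -1;	/* -1 means "rel not found" */

/* later, for an unpruned partition with RT index rti (rti >= 1): */
allsubnodeindex[rti] = subplan_index;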

> * Also in make_partition_pruneinfo()
>
> /* Initialize to -1 to indicate the rel was not found */
> for (i = 0; i < root->simple_rel_array_size; i++)
> {
> allsubnodeindex[i] = -1;
> allsubpartindex[i] = -1;
> }
>
> Maybe allocate the arrays mentioned above using palloc0 and skip this
> initialization.  Instead, make the indexes that are stored in them start
> at 1 and treat 0 as an invalid entry.

0 is a valid subplan index. It is possible to make this happen, but
I'd need to subtract 1 everywhere I used the map. That does not seem
very nice. Seems more likely to result in bugs where we might forget
to do the - 1.

Did you want this because you'd rather have the palloc0() than the for
loop setting the array elements to -1? Or is there another reason?

> * Regarding the code added in execPartition.c and execPartition.h:
>
> I wondered why PartitionedRelPruning is named the way it is.  I saw many
> parallels with PartitionDispatchData both in terms of the main thing it
> consists of, that is, the map that translates partition indexes as in
> partition descriptor to that of subplans or of some other executor
> structure.  Also, I imagine you tried to mimic PartitionTupleRouting with
> PartitionPruning but not throughout.  For example, tuple routing struct
> pointer variables are throughout called proute, whereas PartitionPruning
> ones are called partprune instead of, say, pprune.  Consistency would
> help, imho.

Yes, I saw similarities and did end up moving all the code into
execPartition a while back.

I'll look into this renaming.

> * Instead of nesting PartitionedRelPruning inside another, just store them
> in a global flat array in the PartitionPruning, like PartitionTupleRouting
> does for PartitionDispatch of individual partitioned tables in the tree.
>
> typedef struct PartitionedRelPruning
> {
> int nparts;
> int*subnodeindex;
> struct PartitionedRelPruning **subpartprune;

There is a flat array in PartitionPruning. subpartprune contains
pointers into that array. I want to have this pointer array so I can
directly reference the flat array in PartitionPruning.

Maybe I've misunderstood what you mean here.

> * I don't see why there needs to be nparts in the above, because it
> already has a PartitionPruneContext member which has that information.

Good point. I'll remove that.

> In fact, I made most of changes myself while going through the code.
> Please see attached the delta patch.  It also tweaks quite a few other
> things including various comments.  I think parts of it apply to 0001,
> 0003, and 0004 patches.  See if this looks good to you.

Thanks. I'll look.

It's late over this side now, so will look tomorrow.

Thanks again for reviewing this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Alvaro Herrera
Heikki Linnakangas wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

I think calling PostmasterIsAlive only every 32 records is sensible, but
I'm not so sure about the other two things happening in
HandleStartupProcInterrupts (which are much cheaper anyway, since they
only read a local flag); if records are large or otherwise slow to
replay, I'd rather not wait for 32 of them before the process honors
my shutdown request.  Why not split the function in two, or maybe add a
flag "check for postmaster alive too", passed as true every 32 records?

The lesson seems to be that PostmasterIsAlive is moderately expensive
(one syscall).  It's also used in WaitLatchOrSocket when
WL_POSTMASTER_DEATH is given.  I didn't audit its callers terribly
carefully but I think these uses are not as performance-sensitive.
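
For concreteness, a rough sketch of Heikki's straightforward throttling variant
(the function name MaybeHandleStartupProcInterrupts and the static counter are
made up here, and backend context with postgres.h is assumed; Alvaro's
refinement above would instead keep the cheap flag checks per record and put
only the PostmasterIsAlive() syscall behind the counter, via a split function
or an extra boolean parameter):

#include "postmaster/startup.h"		/* HandleStartupProcInterrupts() */

/* Called once per record from the main redo apply loop. */
static void
MaybeHandleStartupProcInterrupts(void)
{
	static uint32 nrecords = 0;

	/* Only pay for the interrupt/postmaster-alive checks every 32 records. */
	if ((++nrecords % 32) == 0)
		HandleStartupProcInterrupts();
}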

-- 
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: typcategory for regconfig

2018-04-05 Thread Dmitry Dolgov
> On 5 April 2018 at 15:27, Tom Lane  wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
>> Does anyone know, why `typcategory` value for tsvector `regconfig` is
>> `TYPCATEGORY_NUMERIC`,
>
> Because OID is.  I think we need all the OID-alias types to be the same
> category as OID, else we're likely to have issues with queries like

Ok, I see, thanks.

> I think you need to bite the bullet and just provide the flag in
> the 3-argument case (regconfig,json[b],bool).

Well, it's already like that. I have now:

to_tsvector(json(b), boolean)
to_tsvector(regconfig, json(b), boolean)

and as I mentioned above the first one is conflicting with
to_tsvector(regconfig, text).



Re: typcategory for regconfig

2018-04-05 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> On 5 April 2018 at 15:27, Tom Lane  wrote:
>> I think you need to bite the bullet and just provide the flag in
>> the 3-argument case (regconfig,json[b],bool).

> Well, it's already like that. I have now:

> to_tsvector(json(b), boolean)
> to_tsvector(regconfig, json(b), boolean)

> and as I mentioned above the first one is conflicting with
> to_tsvector(regconfig, text).

Right.  So you need to either drop that form, or consider doing
something other than add-a-bool.  Maybe the alternate behavior
should have a different function name, instead of being selected
by an argument?

regards, tom lane



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 08:23, Heikki Linnakangas  wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

+1

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Get the name of the target Relation from Query struct?

2018-04-05 Thread David Rowley
On 6 April 2018 at 01:20, Ernst-Georg Schmid wrote:
>>If you want the relation name from the OID then you'll need something
>>like get_rel_name().
>
> Hm, lsyscache.c says for get_rel_name():
>
>   * NOTE: since relation name is not unique, be wary of code that uses this
>   * for anything except preparing error messages.
>
> How do I get the schema name too, then?
>
> Call get_rel_namespace() and then get_rel_name() again with the Oid that was 
> returned?

No, get_rel_name returns the relname from pg_class. What you need is
in pg_namespace:

namespace from relid: get_namespace_name(get_rel_namespace(relid))
relname from relid: get_rel_name(relid)

If you're going to relation_open the rel, then you might want to use:

get_namespace_name(RelationGetNamespace(rel))

and

RelationGetRelationName(rel)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: typcategory for regconfig

2018-04-05 Thread Dmitry Dolgov
> On 5 April 2018 at 15:48, Tom Lane  wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On 5 April 2018 at 15:27, Tom Lane  wrote:
>>> I think you need to bite the bullet and just provide the flag in
>>> the 3-argument case (regconfig,json[b],bool).
>
>> Well, it's already like that. I have now:
>
>> to_tsvector(json(b), boolean)
>> to_tsvector(regconfig, json(b), boolean)
>
>> and as I mentioned above the first one is conflicting with
>> to_tsvector(regconfig, text).
>
> Right.  So you need to either drop that form, or consider doing
> something other than add-a-bool.  Maybe the alternate behavior
> should have a different function name, instead of being selected
> by an argument?

Yep, I'll swallow my perfectionism and go with a new function.



Re: WIP: Covering + unique indexes.

2018-04-05 Thread Erik Rijkers

On 2018-04-05 00:09, Alexander Korotkov wrote:

Hi!

Thank you for review!  Revised patchset is attached.
[0001-Covering-core-v12.patch]
[0002-Covering-btree-v12.patch]
[0003-Covering-amcheck-v12.patch]
[0004-Covering-natts-v12.patch]


Really nice performance gains.

I read through the docs and made some changes.  I hope it counts as an 
improvement.


It would probably also be a good idea to add the term "covering index" 
somewhere, at least in the documentation's index; the term does not currently 
occur anywhere.  (This doc-patch does not add it.)


thanks,

Erik Rijkers
--- doc/src/sgml/ref/create_index.sgml.orig	2018-04-05 14:36:00.904617793 +0200
+++ doc/src/sgml/ref/create_index.sgml	2018-04-05 15:49:03.778805965 +0200
@@ -148,31 +148,27 @@
   INCLUDE
   

-An optional INCLUDE clause allows to specify the
-list of columns which will be included in the non-key part of the index.
-Columns listed in this clause cannot co-exist as index key columns,
-and vice versa.  The INCLUDE columns exist solely to
+The optional INCLUDE clause specifies a
+list of columns which will be included as a non-key part in the index.
+Columns listed in this clause cannot also be present as index key columns.
+The INCLUDE columns exist solely to
 allow more queries to benefit from index-only scans
-by including specified columns into the index.  Values of these columns
+by including the values of the specified columns in the index.  These values
 would otherwise have to be obtained by reading the table's heap.
-Having these columns in the INCLUDE clause
-in some cases allows PostgreSQL to skip
-the heap read completely.

 

-In the UNIQUE indexes, uniqueness is only enforced
+In UNIQUE indexes, uniqueness is only enforced
 for key columns.  Columns listed in the INCLUDE
-clause have no influence to uniqueness enforcement.  Other constraints
+clause have no effect on uniqueness enforcement.  Other constraints
 (PRIMARY KEY and EXCLUDE) work the same way.

 

-Columns listed in the INCLUDE clause doesn't need
-appropriate operator class to exist.  Therefore,
-INCLUDE clause if useful to add non-key index
-columns, whose data types don't have operator classes defined for
-given access method.
+Columns listed in the INCLUDE clause don't need
+appropriate operator classes; the clause can contain non-key index
+columns whose data types don't have operator classes defined for
+a given access method.

 

@@ -182,12 +178,12 @@
 

 Currently, only the B-tree index access method supports this feature.
-In B-tree indexes, values of columns listed in the
-INCLUDE clause are included into leaf tuples which
-are linked to the heap tuples, but aren't included into pivot tuples
+In B-tree indexes, the values of columns listed in the
+INCLUDE clause are included in leaf tuples which
+are linked to the heap tuples, but are not included into pivot tuples
 used for tree navigation.  Therefore, moving columns from the list of
 key columns to the INCLUDE clause can slightly
-reduce index size and improve tree branching factor.
+reduce index size and improve the tree branching factor.

   
  


Re: typcategory for regconfig

2018-04-05 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
> On 5 April 2018 at 15:48, Tom Lane  wrote:
>> Right.  So you need to either drop that form, or consider doing
>> something other than add-a-bool.  Maybe the alternate behavior
>> should have a different function name, instead of being selected
>> by an argument?

> Yep, I'll swallow my perfectionism and go with a new function.

There's plenty of nearby precedent for that, eg plainto_tsquery and
phraseto_tsquery, so I'm not even sure this is a less attractive option.

regards, tom lane



Re: WIP: Covering + unique indexes.

2018-04-05 Thread Alexander Korotkov
On Thu, Apr 5, 2018 at 5:02 PM, Erik Rijkers  wrote:

> On 2018-04-05 00:09, Alexander Korotkov wrote:
>
>> Thank you for review!  Revised patchset is attached.
>> [0001-Covering-core-v12.patch]
>> [0002-Covering-btree-v12.patch]
>> [0003-Covering-amcheck-v12.patch]
>> [0004-Covering-natts-v12.patch]
>>
>
> Really nice performance gains.
>
> I read through the docs and made some changes.  I hope it can count as
> improvement.
>

Thank you for your improvements to the docs.  Your changes will be
incorporated into the new revision of the patchset, which I'm going to post today.


> It would probably also be a good idea to add the term "covering index"
> somewhere, at least in the documentation's index; the term does now not
> occur anywhere.  (This doc-patch does not add it)
>

I'll think about it.  Maybe we'll define "covering index" in the docs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Flexible configuration for full-text search

2018-04-05 Thread Teodor Sigaev

Some notices:

0) The patch conflicts with the latest changes in gram.y; the conflicts are trivial.

1) jsonb in catalog. I'm ok with it, any opinions?

2) In pg_ts_config_map.h, "jsonb   mapdicts" isn't decorated with #ifdef 
CATALOG_VARLEN like other varlena columns in the catalog. If that's right, please 
explain and add a comment.


3) I see changes in pg_catalog, including dropping a column, changing its type, 
changing an index, changing a function, etc. Did you pay attention to pg_upgrade? 
I don't see it handled in the patch.


4) It seems this could work:
ALTER TEXT SEARCH CONFIGURATION russian
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
  word, hword, hword_part
WITH english_stem union (russian_stem, simple);
 ^ simple way instead of
WITH english_stem union (case russian_stem when match then keep else simple 
end);

5) The initial approach suggested distinguishing three states of a dictionary result: 
null (unknown word), stopword, and usual word. Now there are only two, so we lost the 
possibility to catch stopwords. One way to use stopwords is this: suppose we have two 
identical FTS configurations, except that one skips stopwords and the other doesn't. The 
second configuration is used for indexing, and the first one for search by default. But if 
we can't find anything ('to be or to be' - the phrase contains only stopwords) 
then we can fall back to the second configuration. For now, we need to keep two variants of 
each dictionary - with and without stopwords. But if it were possible to 
distinguish stop and non-stop words in the configuration then we wouldn't need to keep 
duplicated dictionaries.



Aleksandr Parfenov wrote:

On Fri, 30 Mar 2018 14:43:30 +
Aleksander Alekseev  wrote:


The following review has been posted through the commitfest
application: make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

LGTM.

The new status of this patch is: Ready for Committer


It seems that after d204ef6 (MERGE SQL Command) in master the patch
doesn't apply due to a conflict in keywords lists (grammar and header).
The new version of the patch without conflicts is attached.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] path toward faster partition pruning

2018-04-05 Thread Alvaro Herrera
Amit Langote wrote:

> >> 1. Still not sure about RelOptInfo->has_default_part. This flag is
> >> only looked at in generate_partition_pruning_steps. The RelOptInfo and
> >> the boundinfo is available to look at, it's just that the
> >> partition_bound_has_default macro is defined in partition.c rather
> >> than partition.h.
> 
> Hmm, it might not be such a bad idea to bring out the
> PartitionBoundInfoData into partition.h.  If we do that, we won't need the
> has_default_part that the patch adds to RelOptInfo.
> 
> In the Attached v50 set, 0002 does that.

After looking at this for a moment, I again come to the conclusion that
the overall layout of partitioning code and definitions is terrible.
But we already know that, and there's a patch in commitfest to improve
things.  So my intention right now is to hold my nose and get this
pushed; we'll fix it afterwards.

-- 
Álvaro Herrera   https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor
On 4/5/18, Tom Lane  wrote:
> Here are the results of an evening's desultory hacking on v13.
>
> [numeric function oids with overloaded name]

Thank you for the detailed review and for improving the function
references (not to mention the type references I somehow left on the
table). I was also not quite satisfied with just the regproc columns.

> I did not like the hard-wired handling of proargtypes and proallargtypes
> in genbki.pl; it hardly seems impossible that we'll want similar
> conversions for other array-of-OID columns in future.  After a bit of
> thought, it seemed like we could allow
>
> oidvectorproargtypes BKI_LOOKUP(pg_type);
>
> Oid  proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>
> and just teach genbki.pl that if a lookup rule is attached to
> an oidvector or Oid[] column, it means to apply the rule to
> each array element individually.

I think that's a good idea. I went an extra step and extracted the
common logic into a function (attached draft patch to be applied on
top of yours). It treats all lookups as operating on arrays. The
common case is that we pass a single-element array. That may seem
awkward, but I think it's clear. The code is slimmer, and the lines
now fit within 80 characters.

> I also changed genbki.pl so that it'd warn about entries that aren't
> recognized by the lookup rules.  This seems like a good idea for
> catching errors, such as (ahem) applying BKI_LOOKUP to a column
> that isn't even an OID.

Yikes, I must have fat-fingered that during the comment reformatting.

Unrelated, I noticed my quoting of defaults that contain back-slashes
was half-baked, so I'll include that fix in the next patchset. I'll
put out a new one in a couple days, to give a chance for further
review and discussion of the defaults. I didn't feel the need to
respond to the other messages, but yours and Andres' points are well
taken.

-John Naylor
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index f6be50a..8d47109 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -343,34 +343,21 @@ EOM
 			#
 			# If the column type is oidvector or oid[], we have to replace
 			# each element of the array as per the lookup rule.
-			#
-			# If we don't have a unique value to substitute, warn and
-			# leave the entry unchanged.
 			if ($column->{lookup})
 			{
 my $lookup = $lookup_kind{ $column->{lookup} };
+my @lookupnames;
+my @lookupoids;
 
 die "unrecognized BKI_LOOKUP type " . $column->{lookup}
   if !defined($lookup);
 
 if ($atttype eq 'oidvector')
 {
-	my @lookupnames = split /\s+/, $bki_values{$attname};
-	my @lookupoids;
-	foreach my $lookupname (@lookupnames)
-	{
-		my $lookupoid = $lookup->{ $lookupname };
-		if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-		{
-			$lookupname = $lookupoid;
-		}
-		else
-		{
-			warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-if $lookupname ne '-' && $lookupname ne '0';
-		}
-		push @lookupoids, $lookupname;
-	}
+	@lookupnames = split /\s+/, $bki_values{$attname};
+	@lookupoids = lookup_oids($lookup, $catname,
+\%bki_values, @lookupnames);
+
 	$bki_values{$attname} = join(' ', @lookupoids);
 }
 elsif ($atttype eq 'oid[]')
@@ -378,39 +365,21 @@ EOM
 	if ($bki_values{$attname} ne '_null_')
 	{
 		$bki_values{$attname} =~ s/[{}]//g;
-		my @lookupnames = split /,/, $bki_values{$attname};
-		my @lookupoids;
-		foreach my $lookupname (@lookupnames)
-		{
-			my $lookupoid = $lookup->{ $lookupname };
-			if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-			{
-$lookupname = $lookupoid;
-			}
-			else
-			{
-warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-	if $lookupname ne '-' && $lookupname ne '0';
-			}
-			push @lookupoids, $lookupname;
-		}
+		@lookupnames = split /,/, $bki_values{$attname};
+		@lookupoids = lookup_oids($lookup, $catname,
+	\%bki_values, @lookupnames);
+
 		$bki_values{$attname} =
 			sprintf "{%s}", join(',', @lookupoids);
 	}
 }
 else
 {
-	my $lookupname = $bki_values{$attname};
-	my $lookupoid = $lookup->{ $lookupname };
-	if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
-	{
-		$bki_values{$attname} = $lookupoid;
-	}
-	else
-	{
-		warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',', values(%bki_values))
-			if $lookupname ne '-' && $lookupname ne '0';
-	}
+	$lookupnames[0] = $bki_values{$attname};
+	@lookupoids = lookup_oids($lookup, $catname,
+\%bki_values, @lookupnames);
+
+	$bki_values{$attname} = $lookupoids[0];
 }
 			}
 		}
@@ -759,6 +728,32 @@ sub morph_row_for_schemapg
 	}
 }
 
+# Perfo

Re: Online enabling of checksums

2018-04-05 Thread Andrey Borodin


> On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> 
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend 
cancellation, and the only suspicious thing I found is that you can start 
multiple workers by cancelling the launcher and not cancelling the worker. Is that 
problematic behavior? If we run pg_enable_data_checksums() it checks for an 
existing launcher for a reason; maybe it should check for a worker too?

I think it's worth capitalizing WAL in "re-write the page to wal".

Best regards, Andrey Borodin.


Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin  wrote:

>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend
> cancelation and the only suspicious thing I found is that you can start
> multiple workers by cancelling launcher and not cancelling worker. Is it
> problematic behavior? If we run pg_enable_data_checksums() it checks for
> existing launcher for a reason, m.b. it should check for worker too?
>

I don't think it's a problem in itself -- it will cause pointless work, but
not actually cause any problems I think (whereas duplicate launchers could
cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

I think it's worth capitalizing WAL in "re-write the page to wal".
>

In the comment, right? Yeah, easy fix.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-04-05 Thread Ildus Kurbangaliev
On Wed, 04 Apr 2018 16:07:25 +0300
Marina Polyakova  wrote:

> Hello, hackers!
> 
> Here there's a seventh version of the patch for error handling and 
> retrying of transactions with serialization/deadlock failures in
> pgbench (based on the commit
> a08dc711952081d63577fc182fcf955958f70add). I added the option
> --max-tries-time which is an implementation of Fabien Coelho's
> proposal in [1]: the transaction with serialization or deadlock
> failure can be retried if the total time of all its tries is less
> than this limit (in ms). This option can be combined with the option
> --max-tries. But if none of them are used, failed transactions are
> not retried at all.
> 
> Also:
> * Now when the first failure occurs in the transaction it is always 
> reported as a failure since only after the remaining commands of this 
> transaction are executed we find out whether we can try again or not. 
> Therefore add the messages about retrying or ending the failed 
> transaction to the "fails" debugging level so you can distinguish 
> failures (which are retried) and errors (which are not retried).
> * Fix a report on the latency average because the total time includes 
> time for both errors and successful transactions.
> * Code cleanup (including tests).
> 
> [1] 
> https://www.postgresql.org/message-id/alpine.DEB.2.20.1803292134380.16472%40lancre
> 
> > Maybe the max retry should rather be expressed in time rather than 
> > number
> > of attempts, or both approach could be implemented?  
> 

Hi, I did a little review of your patch. It seems to work as
expected; documentation and tests are there. Still, I have a few comments.

There are a lot of checks like "if (debug_level >= DEBUG_FAILS)" with a
corresponding fprintf(stderr, ...). I think it's time to do it like the
main code does: wrap them in some function like log(level, msg).
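For illustration, a minimal sketch of the kind of wrapper meant here (the
names DebugLevel, debug_level and pgbench_log are placeholders, not
identifiers taken from the patch):

#include <stdarg.h>
#include <stdio.h>

typedef enum
{
	DEBUG_NO,
	DEBUG_FAILS,
	DEBUG_ALL
} DebugLevel;

static DebugLevel debug_level = DEBUG_NO;

/* Print the message only when the current debug level is high enough. */
static void
pgbench_log(DebugLevel level, const char *fmt, ...)
{
	va_list		ap;

	if (debug_level < level)
		return;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

A call site then collapses from an if + fprintf pair into something like
pgbench_log(DEBUG_FAILS, "client %d repeats the failed transaction\n", id).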

In the CSTATE_RETRY state, used_time is used only for printing but is
calculated more often than needed.

In my opinion, Debuglevel should be renamed to DebugLevel, which looks
nicer; there is also DEBUGLEVEl (where the last letter is lower case),
which is very confusing.

I have checked overall functionality of this patch, but haven't checked
any special cases yet.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Online enabling of checksums

2018-04-05 Thread Andrey Borodin


> On 5 Apr 2018, at 19:58, Magnus Hagander wrote:
> 
> 
> 
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin  wrote:
> 
> 
> > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend 
> cancelation and the only suspicious thing I found is that you can start 
> multiple workers by cancelling launcher and not cancelling worker. Is it 
> problematic behavior? If we run pg_enable_data_checksums() it checks for 
> existing launcher for a reason, m.b. it should check for worker too?
> 
> I don't think it's a problem in itself -- it will cause pointless work, but 
> not actually cause any problems I think (whereas duplicate launchers could 
> cause interesting things to happen).
> 
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  | backend_type  
---+---
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is then a way to shoot yourself in the foot by calling 
pg_disable_data_checksums(), but that would be an extremely stupid thing 
for a user to do.

Best regards, Andrey Borodin.


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Alvaro Herrera
Quick item: parse_clause.h fails cpluspluscheck because it has a C++
keyword as a function argument name:

./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ before 
‘namespace’
   List **namespace);
  ^

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] btree_gin, add support for uuid, bool, name, bpchar and anyrange types

2018-04-05 Thread Teodor Sigaev

Thanks to everyone, first patch is pushed.

The range opclass seems unusable because its comparison function is close to a 
dummy, and a btree opclass is only useful for implementing a unique check 
constraint. So, to be useful for ranges, a different index structure would be 
needed.


Matheus de Oliveira wrote:



On Thu, Apr 5, 2018 at 8:16 AM, Teodor Sigaev wrote:


somehow you missed some parts in 0001 patch, at least regression tests fail:

   CREATE EXTENSION btree_gin;
+ ERROR:  could not find function "gin_extract_value_uuid" in file
"/usr/local/pgsql/lib/btree_gin.so"


Ouch... My fault, filterdiff is acting weird for some reason...

Here are the corrected versions... I tested here, applying on a clean master 
just to be sure; all looks good.


Very sorry about that mess. I hope it can get into v11; it is such a simple 
patch, but so useful for many people.


Best regards,
--
Matheus de Oliveira




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane
John Naylor  writes:
> On 4/5/18, Tom Lane  wrote:
>> I did not like the hard-wired handling of proargtypes and proallargtypes
>> in genbki.pl; it hardly seems impossible that we'll want similar
>> conversions for other array-of-OID columns in future.  After a bit of
>> thought, it seemed like we could allow
>>  oidvectorproargtypes BKI_LOOKUP(pg_type);
>>  Oid  proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>> and just teach genbki.pl that if a lookup rule is attached to
>> an oidvector or Oid[] column, it means to apply the rule to
>> each array element individually.

> I think that's a good idea. I went an extra step and extracted the
> common logic into a function (attached draft patch to be applied on
> top of yours). It treats all lookups as operating on arrays. The
> common case is that we pass a single-element array. That may seem
> awkward, but I think it's clear. The code is slimmer, and the lines
> now fit within 80 characters.

Sounds good.  I too was bothered by the duplication of code, but
I'm not a good enough Perl programmer to have thought of that solution.

Something that bothered me a bit while writing the warning-producing code
is that showing %bki_values isn't actually that great a way of identifying
the trouble spot.  By this point we've expanded out defaults and possibly
replaced some other macros, so it doesn't look that much like what was
in the .dat file.  I think what would be ideal, both here and in some
other places like AddDefaultValues, is to be able to finger the location
of the bad tuple by filename and line number, but I have no idea whether
it's practical to annotate the tuples with that while reading the .dat
files.  Any thoughts?

(Obviously, better error messages could be a future improvement; it's not
something we have to get done before the conversion.)

> Unrelated, I noticed my quoting of defaults that contain back-slashes
> was half-baked, so I'll include that fix in the next patchset. I'll
> put out a new one in a couple days, to give a chance for further
> review and discussion of the defaults. I didn't feel the need to
> respond to the other messages, but yours and Andres' points are well
> taken.

We're getting down to the wire here --- I think the plan is to close
the CF on Saturday or Sunday, and then push the bootstrap changes right
after that.  So please turn around whatever you're planning to do ASAP.
I'm buckling down to a final review today and tomorrow.

regards, tom lane



Re: Removing useless DISTINCT clauses

2018-04-05 Thread Melanie Plageman
Hi David,
The updated patch looks good to me.
I've changed the status to "ready for committer"
Thanks


Re: json(b)_to_tsvector with numeric values

2018-04-05 Thread Dmitry Dolgov
> On 4 April 2018 at 16:09, Teodor Sigaev  wrote:
>
>>> Hm, seems, it's useful feature, but I suggest to make separate function
>>> jsonb_any_to_tsvector and add support for boolean too (if you know better
>>> name for function, do not hide it). Changing behavior of existing
>>> function
>>> is not obvious for users and, seems, should not backpatched.
>>
>>
>> What do you think about having not a separate function, but a flag
>> argument to
>> the existing one (like `create` in `jsonb_set`), that will have false as
>> default value? The result would be the same, but without an extra function
>> with
>> almost the same implementation.
>
>
> tsvector jsonb_to_tsvector(jsonb[, bool]) ?
> Agreed. Second arg should be optional.

Unfortunately, this idea with a flag argument can't be implemented easily
(related discussion is here [1]). So I've modified the patch according to
your original suggestion about having separate functions
`json(b)_all_to_tsvector`.

1: 
https://www.postgresql.org/message-id/flat/CA%2Bq6zcVJ%2BWx%2B-%3DkkN5UC0T-LtsJWnx0g9S0xSnn3jUWkriufDA%40mail.gmail.com
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c4..895b60a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9696,6 +9696,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 
+ json(b)_all_to_tsvector( config regconfig ,  document json(b))
+
+tsvector
+
+  reduce each string, numeric or boolean value in the document to a tsvector,
+  and then concatenate those in document order to produce a single tsvector
+
+json_all_to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::json)
+'123':5 'fat':2 'rat':3
+   
+   
+
  
   ts_delete
  
diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
index ea5947a..02c2b00 100644
--- a/src/backend/tsearch/to_tsany.c
+++ b/src/backend/tsearch/to_tsany.c
@@ -267,12 +267,12 @@ to_tsvector(PG_FUNCTION_ARGS)
 		PointerGetDatum(in)));
 }
 
-Datum
-jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+/*
+ * Worker function for jsonb(_all)_to_tsvector(_byid)
+ */
+static TSVector
+jsonb_to_tsvector_worker(Oid cfgId, Jsonb *jb, bool allTypes)
 {
-	Oid			cfgId = PG_GETARG_OID(0);
-	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
-	TSVector	result;
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -281,11 +281,24 @@ jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_jsonb_string_values(jb, &state, add_to_tsvector);
+	if (allTypes)
+		iterate_jsonb_all_values(jb, &state, add_to_tsvector);
+	else
+		iterate_jsonb_string_values(jb, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(jb, 1);
 
-	result = make_tsvector(&prs);
+	return make_tsvector(&prs);
+}
+
+Datum
+jsonb_to_tsvector_byid(PG_FUNCTION_ARGS)
+{
+	Oid			cfgId = PG_GETARG_OID(0);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
+	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
 
 	PG_RETURN_TSVECTOR(result);
 }
@@ -295,19 +308,48 @@ jsonb_to_tsvector(PG_FUNCTION_ARGS)
 {
 	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
 	Oid			cfgId;
+	TSVector	result;
 
 	cfgId = getTSCurrentConfig(true);
-	PG_RETURN_DATUM(DirectFunctionCall2(jsonb_to_tsvector_byid,
-		ObjectIdGetDatum(cfgId),
-		JsonbPGetDatum(jb)));
+	result = jsonb_to_tsvector_worker(cfgId, jb, false);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
 }
 
 Datum
-json_to_tsvector_byid(PG_FUNCTION_ARGS)
+jsonb_all_to_tsvector_byid(PG_FUNCTION_ARGS)
 {
 	Oid			cfgId = PG_GETARG_OID(0);
-	text	   *json = PG_GETARG_TEXT_P(1);
+	Jsonb	   *jb = PG_GETARG_JSONB_P(1);
 	TSVector	result;
+
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+Datum
+jsonb_all_to_tsvector(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
+	Oid			cfgId;
+	TSVector	result;
+
+	cfgId = getTSCurrentConfig(true);
+	result = jsonb_to_tsvector_worker(cfgId, jb, true);
+	PG_FREE_IF_COPY(jb, 1);
+
+	PG_RETURN_TSVECTOR(result);
+}
+
+/*
+ * Worker function for json(_all)_to_tsvector(_byid)
+ */
+static TSVector
+json_to_tsvector_worker(Oid cfgId, text *json, bool allTypes)
+{
 	TSVectorBuildState state;
 	ParsedText	prs;
 
@@ -316,11 +358,20 @@ json_to_tsvector_byid(PG_FUNCTION_ARGS)
 	state.prs = &prs;
 	state.cfgId = cfgId;
 
-	iterate_json_string_values(json, &state, add_to_tsvector);
+	iterate_json_values(json, allTypes, &state, add_to_tsvector);
 
-	PG_FREE_IF_COPY(json, 1);
+	return make_tsvector(&prs);
+}
+
+Datum
+json_to_tsvector_byid(PG_FUNCTION_ARGS)
+{
+	Oid			cfgId = PG_GETARG_OID(0);
+	text	   *json = PG_GETARG_TEXT_P(1);
+	TSVector	result;
 
-	result = make_tsvector(&prs);
+	result = json_to_tsvector_worker(cfgId, json, false);
+	PG_FREE_IF_COPY(json, 1);
 
 	PG_RETURN_TSVECTOR(result);
 }
@@ -330,11 +381,40 @@ json_to_tsvec

Re: [PATCH] Logical decoding of TRUNCATE

2018-04-05 Thread Peter Eisentraut
On 4/3/18 22:31, Peter Eisentraut wrote:
> The problem I see now is that when we WAL-log a truncate, we include all
> the relids in one record.  That seems useful.  But during decoding we
> split this into one change per relid.  So at the receiving end, truncate
> can only process one relation at a time, which will fail if there are
> foreign keys involved.  The solution that had been proposed here was to
> ignore foreign keys on the subscriber, which is clearly problematic.

> I'm going to try to hack up an alternative approach and see how it works
> out.

Done here.  I added a separate callback for decoding truncates, which
receives all the relations at once.  That information can then be
shipped off together and applied together on the receiving side.  So
foreign keys are not a problem anymore.  This ended up being a bit
larger than the original patch, but I think it's sound behavior and
future-proof.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From df101e32c358ac9243285d4e8f125803988e5508 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 5 Apr 2018 11:46:41 -0400
Subject: [PATCH v19 1/2] Logical decoding of TRUNCATE

---
 contrib/test_decoding/Makefile|  2 +-
 contrib/test_decoding/expected/truncate.out   | 25 ++
 contrib/test_decoding/sql/truncate.sql| 10 +++
 contrib/test_decoding/test_decoding.c | 61 +
 doc/src/sgml/logicaldecoding.sgml | 27 +-
 src/backend/access/heap/heapam.c  |  7 ++
 src/backend/access/rmgrdesc/heapdesc.c| 14 +++
 src/backend/commands/tablecmds.c  | 87 +--
 src/backend/replication/logical/decode.c  | 41 +
 src/backend/replication/logical/logical.c | 43 +
 .../replication/logical/reorderbuffer.c   | 35 
 src/include/access/heapam_xlog.h  | 23 -
 src/include/commands/tablecmds.h  |  2 +
 src/include/replication/output_plugin.h   | 10 +++
 src/include/replication/reorderbuffer.h   | 24 -
 15 files changed, 398 insertions(+), 13 deletions(-)
 create mode 100644 contrib/test_decoding/expected/truncate.out
 create mode 100644 contrib/test_decoding/sql/truncate.sql

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 6c18189d9d..1d601d8144 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -39,7 +39,7 @@ submake-test_decoding:
 
 REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
decoding_into_rel binary prepared replorigin time messages \
-   spill slot
+   spill slot truncate
 
 regresscheck: | submake-regress submake-test_decoding temp-install
$(pg_regress_check) \
diff --git a/contrib/test_decoding/expected/truncate.out 
b/contrib/test_decoding/expected/truncate.out
new file mode 100644
index 00..be85178206
--- /dev/null
+++ b/contrib/test_decoding/expected/truncate.out
@@ -0,0 +1,25 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+CREATE TABLE tab1 (id serial unique, data int);
+CREATE TABLE tab2 (a int primary key, b int);
+TRUNCATE tab1;
+TRUNCATE tab1, tab1 RESTART IDENTITY CASCADE;
+TRUNCATE tab1, tab2;
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');
+ data 
+--
+ BEGIN
+ table public.tab1: TRUNCATE: (no-flags)
+ COMMIT
+ BEGIN
+ table public.tab1: TRUNCATE: restart_seqs cascade
+ COMMIT
+ BEGIN
+ table public.tab1, public.tab2: TRUNCATE: (no-flags)
+ COMMIT
+(9 rows)
+
diff --git a/contrib/test_decoding/sql/truncate.sql 
b/contrib/test_decoding/sql/truncate.sql
new file mode 100644
index 00..88f113fd5b
--- /dev/null
+++ b/contrib/test_decoding/sql/truncate.sql
@@ -0,0 +1,10 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
+
+CREATE TABLE tab1 (id serial unique, data int);
+CREATE TABLE tab2 (a int primary key, b int);
+
+TRUNCATE tab1;
+TRUNCATE tab1, tab1 RESTART IDENTITY CASCADE;
+TRUNCATE tab1, tab2;
+
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');
diff --git a/contrib/test_decoding/test_decoding.c 
b/contrib/test_decoding/test_decoding.c
index a94aeeae29..c238f12e66 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -52,6 +52,10 @@ static void pg_decode_commit_txn(LogicalDecodingContext *ctx,
 static void pg_decode_change(LogicalDecodingContext *ctx,
 ReorderBufferTXN *txn, Relation rel,
 ReorderBufferChange *change);
+static void pg_decode_truncate(LogicalDecodingContext *ctx,

Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 16:09, Alvaro Herrera  wrote:
> Quick item: parse_clause.h fails cpluspluscheck because it has a C++
> keyword as a function argument name:
>
> ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ 
> before ‘namespace’
>List **namespace);
>   ^

How's this as a fix?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fnamespace.v1.patch
Description: Binary data


Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Alvaro Herrera
Simon Riggs wrote:
> On 5 April 2018 at 16:09, Alvaro Herrera  wrote:
> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++
> > keyword as a function argument name:
> >
> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ 
> > before ‘namespace’
> >List **namespace);
> >   ^
> 
> How's this as a fix?

WFM

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-04-05 Thread David Steele
Hi Michael,

On 4/5/18 2:55 AM, Michael Paquier wrote:
> On Wed, Apr 04, 2018 at 08:03:54PM -0400, David Steele wrote:
> 
>> Instead I have created variables in file_perm.c
>> that hold the current file create mode, dir create mode, and mode mask.
>> All call sites use those variables (e.g. pg_dir_create_mode), which I
>> think are much clearer.
> 
> Hm.  I personally find that even more confusing, especially with
> SetDataDirectoryCreatePerm which basically is the same as
> SetConfigOption for the backend.  

The GUC shows the current mode of the data directory, while the
variables in file_perm.c store the mode that should be used to create
new dirs/files.  One is certainly based on the other but I thought it
best to split them for clarity.

> In your case pg_dir_create_mode is not
> aimed at remaining a constant as you may change it depending on if
> grouping is enabled or not...  And all the other iterations of such
> variables in src/common/ are constants (see NumScanKeywords or
> forkNames[] for example), so I cannot see with a good eye this change.

The idea is to have a variable that gives the dir/file create mode
without having to trust that umask() is set correctly or do mode masking
on chmod().  This makes a number of call sites simpler and, I hope,
easier to read.
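For illustration, a rough sketch of the call-site pattern this enables
(simplified; pg_dir_create_mode, pg_file_create_mode and
SetDataDirectoryCreatePerm are names from the patch, the helper functions
below are made up):

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Defaults for a cluster without group access; SetDataDirectoryCreatePerm()
 * would switch these to the group-readable variants when appropriate. */
int			pg_dir_create_mode = S_IRWXU;				/* 0700 */
int			pg_file_create_mode = S_IRUSR | S_IWUSR;	/* 0600 */

/* Call sites no longer mask the mode themselves or trust umask(): */
static int
make_pgdata_subdir(const char *path)
{
	return mkdir(path, pg_dir_create_mode);
}

static int
create_pgdata_file(const char *path)
{
	return open(path, O_CREAT | O_EXCL | O_WRONLY, pg_file_create_mode);
}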

> You also forgot a call to SetDataDirectoryCreatePerm or
> pg_dir_create_mode remains to its default.

Are you saying *if* a call is forgotten?

Yes, the file/dir create modes will use the default restrictive
permissions unless set otherwise.  I see this as a good thing.  Worst
case, we miss some areas where group access should be enabled and we
find those over the next few months.

What we don't want is a regression in the current, default behavior.
This design is intended to avoid that outcome.

> The interface of file_perm.h that you are introducing is not confusing
> anymore though.

Yes, that was the idea.  I think it makes it clearer what we are trying
to do and centralizes variables related to create modes in one place.

I've attached new patches that exclude Windows from permissions tests
and deal with bootstrap permissions in a better way.  PostgresMain() and
AuxiliaryProcessMain() now use checkDataDir() to set permissions.

Thanks!
-- 
-David
da...@pgmasters.net
From 30599b54e0f31e6c6842e028eeca3048b9a371ca Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 5 Apr 2018 11:34:58 -0400
Subject: [PATCH 1/2] Refactor file permissions in backend/frontend

Adds a new header (file_perm.h) and makes all front-end utilities use the new 
constants for setting umask. Converts mkdir() calls in the backend to 
MakeDirectoryDefaultPerm() if the original call used default permissions. Adds 
tests to make sure permissions in PGDATA are set correctly by the front-end 
tools.
---
 src/backend/access/transam/xlog.c|  2 +-
 src/backend/commands/tablespace.c| 18 ---
 src/backend/postmaster/postmaster.c  |  5 +-
 src/backend/postmaster/syslogger.c   |  5 +-
 src/backend/replication/basebackup.c |  5 +-
 src/backend/replication/slot.c   |  5 +-
 src/backend/storage/file/copydir.c   |  2 +-
 src/backend/storage/file/fd.c| 38 +--
 src/backend/storage/ipc/dsm_impl.c   |  3 +-
 src/backend/storage/ipc/ipc.c|  4 ++
 src/backend/utils/init/miscinit.c|  5 +-
 src/bin/initdb/initdb.c  | 24 -
 src/bin/initdb/t/001_initdb.pl   | 11 -
 src/bin/pg_basebackup/pg_basebackup.c|  9 ++--
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 14 +-
 src/bin/pg_basebackup/t/020_pg_receivewal.pl | 14 +-
 src/bin/pg_basebackup/walmethods.c   |  6 ++-
 src/bin/pg_ctl/pg_ctl.c  |  4 +-
 src/bin/pg_ctl/t/001_start_stop.pl   | 18 ++-
 src/bin/pg_resetwal/pg_resetwal.c|  5 +-
 src/bin/pg_resetwal/t/001_basic.pl   | 12 -
 src/bin/pg_rewind/RewindTest.pm  |  4 ++
 src/bin/pg_rewind/file_ops.c |  7 +--
 src/bin/pg_rewind/t/001_basic.pl | 11 -
 src/bin/pg_upgrade/file.c|  5 +-
 src/bin/pg_upgrade/pg_upgrade.c  |  3 +-
 src/bin/pg_upgrade/test.sh   | 11 +
 src/common/Makefile  |  4 +-
 src/common/file_perm.c   | 19 
 src/include/common/file_perm.h   | 32 
 src/include/storage/fd.h |  3 ++
 src/test/perl/PostgresNode.pm|  3 ++
 src/test/perl/TestLib.pm | 73 
 src/tools/msvc/Mkvcbuild.pm  |  4 +-
 34 files changed, 315 insertions(+), 73 deletions(-)
 create mode 100644 src/common/file_perm.c
 create mode 100644 src/include/common/file_perm.h

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index b4fd8395b7..5398d2f925 10064

Re: new function for tsquery creartion

2018-04-05 Thread Teodor Sigaev

Thanks to everyone, pushed with some editing:

1) translate the Russian test to prevent potential problems with encoding
2) fix the inconsistency between 'or cat' and 'cat or': the second example 
doesn't treat OR as a lexeme, but the first one does.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: Allow workers to override datallowconn

2018-04-05 Thread Magnus Hagander
On Thu, Mar 29, 2018 at 4:39 PM, Tomas Vondra 
wrote:

>
>
> On 03/02/2018 12:16 PM, Magnus Hagander wrote:
> > On Fri, Feb 23, 2018 at 7:55 PM, Magnus Hagander wrote:
> >
> > On Fri, Feb 23, 2018 at 7:52 PM, Tom Lane wrote:
> >
> > Magnus Hagander writes:
> > > Here's another attempt at moving this one forward. Basically
> this adds a
> > > new GucSource being GUC_S_CLIENT_EARLY. It now runs through
> the parameters
> > > once before CheckMyDatabase, with source set to
> GUC_S_CLIENT_EARLY. In this
> > > source, *only* parameters that are flagged as GUC_ALLOW_EARLY
> will be set,
> > > any other parameters are ignored (without error). For now,
> only the
> > > ignore_connection_restriction is allowed at this stage. Then
> it runs
> > > CheckMyDatabase(), and after that it runs through all the
> parameters again,
> > > now with the GUC_S_CLIENT source as usual, which will now
> process all
> > > other  variables.
> >
> > Ick.  This is an improvement over the other way of attacking the
> > problem?
> > I do not think so.
> >
> >
> > Nope, I'm far from sure that it is. I just wanted to show what it'd
> > look like.
> >
> > I personally think the second patch (the one adding a parameter to
> > BackendWorkerInitializeConnection) is the cleanest one. It doesn't
> > solve Andres' problem, but perhaps that should be the job of a
> > different patch.
> >
> >
> >
> > FWIW, I just realized that the poc patch that adds the GUC also breaks
> > a large part of the regression tests, as a side-effect of it breaking
> > how DateStyle works. That's obviously a fixable problem, but it seems
> > not worth spending time on if that's not the way forward anyway.
> >
> > Andres, do you have any other ideas of directions to look that would
> > make you withdraw your objection? I'm happy to try to write up a patch
> > that solves it in a way that everybody can agree with. But right now it
> > seems to be stuck between one way that's strongly objected to by you and
> > one way that's strongly objected to by Tom. And I'd rather not have that
> > end up with not getting the problem solved at all for *any* of the
> > usecases...
> >
>
> My 2c: I think we should just go with the simplest solution, that is the
> patch sent on 22/2 19:54 (i.e. BackgroundWorkerInitializeConnectionByOid
> with an extra parameter).
>
> It would be nice to have something more generic that also works for the
> Andres' use case, but the patches submitted to this thread are not
> particularly pretty. Also, Tom suggested there may be safety issues when
> the GUCs are processed earlier - I agree Andres is right the GUCs are
> effectively ASCII-only so the encoding is not an issue, but perhaps
> there are other issues (Tom suggested this merely as an example).
>
> So I agree with Magnus, the extra flag seems to be perfectly fine for
> bgworkers, and I'd leave the more generic solution for a future patch if
> anyone wants to hack on it.
>

With no further feedback on this, I have pushed this version of the patch
(rebased). Thanks!
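For completeness, usage from a background worker ends up looking roughly
like the sketch below.  This assumes the committed interface is a flags
argument on BackgroundWorkerInitializeConnection() with a
BGWORKER_BYPASS_ALLOWCONN bit; if the final naming differs, adjust
accordingly.

#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

/*
 * Sketch of a worker connecting to a database whose datallowconn is false
 * (e.g. template0).  The three-argument init call and the
 * BGWORKER_BYPASS_ALLOWCONN flag are assumptions about the committed API.
 */
void
my_worker_main(Datum main_arg)
{
	BackgroundWorkerUnblockSignals();

	BackgroundWorkerInitializeConnection("template0", NULL,
										 BGWORKER_BYPASS_ALLOWCONN);

	/* ... do the actual work here ... */
}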


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [PATCH] Logical decoding of TRUNCATE

2018-04-05 Thread Alvaro Herrera
This sounds like a good approach.

> +static void
> +pg_decode_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> +int nrelations, Relation relations[], 
> ReorderBufferChange *change)
> +{

> + for (i = 0; i < nrelations; i++)
> + {
> + Oid relid = RelationGetRelid(relations[i]);
> +
> + if (i > 0)
> + appendStringInfoString(ctx->out, ", ");
> +
> + appendStringInfoString(ctx->out,
> +
> quote_qualified_identifier(
> +
> get_namespace_name(get_rel_namespace(relid)),
> +
> get_rel_name(relid)));

Note that you start the loop already having the Relation; yet you go to extra
lengths to grab the relnamespace and relname from the syscache instead of
just using relations[i]->rd_rel->relname etc.

pgoutput doesn't do it that way, so it doesn't affect logical
replication, but I think it's best not to create awkward code in
test_decoding, since it's so widely copied.
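In other words, something along these lines (an untested sketch of the
suggested simplification; get_namespace_name() is still needed for the
schema, since the Relation only carries the namespace OID):

	for (i = 0; i < nrelations; i++)
	{
		Relation	rel = relations[i];

		if (i > 0)
			appendStringInfoString(ctx->out, ", ");

		/* use the already-open Relation instead of extra syscache lookups */
		appendStringInfoString(ctx->out,
							   quote_qualified_identifier(
								   get_namespace_name(RelationGetNamespace(rel)),
								   RelationGetRelationName(rel)));
	}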


> + else if (info == XLOG_HEAP_TRUNCATE)
> + {
> + xl_heap_truncate *xlrec = (xl_heap_truncate *) rec;
> +
> + if (xlrec->flags & XLH_TRUNCATE_CASCADE)
> + appendStringInfo(buf, "cascade ");
> + if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS)
> + appendStringInfo(buf, "restart_seqs ");
> + appendStringInfo(buf, "nrelids %u", xlrec->nrelids);
> + /* skip the list of relids */
> + }

Maybe not a big deal, but for future pg_waldump users I'm sure it'll be
nice to have the OIDs here.

> +void
> +ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
> + DropBehavior behavior, 
> bool restart_seqs)
> +{

Please add a comment atop this function.

  
> + /*
> +  * Write a WAL record to allow this set of actions to be logically 
> decoded.
> +  *
> +  * Assemble an array of relids so we can write a single WAL record for 
> the
> +  * whole action.
> +  */
> + if (list_length(relids_logged) > 0)
> + {
> + xl_heap_truncate xlrec;
> + int i = 0;

I wonder if this should happen only if logical decoding is enabled?  (Maybe it
already works out that way because relids_logged would be empty?  Worth a
comment in that case.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane
I experimented with converting all frontend code to include just the
catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
proposed new policy.  I soon found that we'd overlooked one thing:
some clients expect to see the relation OID macros, eg
LargeObjectRelationId.  Attached is a patch that changes things around
so that those appear in the _d files instead of the master files.
This is cleaner anyway because it removes duplication of the OIDs in
the master files, with attendant risk of error.  For example we
have this change in pg_aggregate.h:

-#define AggregateRelationId  2600
-
-CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
+CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS

Some of the CATALOG lines spill well past 80 characters with this,
although many of the affected ones already were overlength, eg

-#define DatabaseRelationId 1262
-#define DatabaseRelation_Rowtype_Id  1248
-
-CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) 
BKI_SCHEMA_MACRO
+CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION 
BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

I thought about improving that by removing the restriction that these
BKI annotations appear on the same line as the CATALOG macro, so that
we could break the above into several lines.  I think the original key
reason for the restriction was to avoid accidentally taking some bit
of a DATA line as a BKI annotation.  With the DATA lines gone from these
files, that's no longer a significant hazard (although passing references
to BKI keywords in comments might still be hazards for the Perl scripts).
However, if we try to format things like

CATALOG(pg_database,1262,DatabaseRelationId)
BKI_SHARED_RELATION
BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
{
fields...
}

I'm afraid that neither pgindent nor a lot of common editors would indent
that very nicely.  So at least for the moment I'm inclined to just keep
it all on one line ... we know how that behaves, anyway.

regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 519247e..fb3d62a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -104,18 +104,29 @@ sub ParseHeader
 			{
 push @{ $catalog{indexing} }, "build indices\n";
 			}
-			elsif (/^CATALOG\(([^,]*),(\d+)\)/)
+			elsif (/^CATALOG\(([^,]*),(\d+),(\w+)\)/)
 			{
 $catalog{catname} = $1;
 $catalog{relation_oid} = $2;
+$catalog{relation_oid_macro} = $3;
 
 $catalog{bootstrap} = /BKI_BOOTSTRAP/ ? ' bootstrap' : '';
 $catalog{shared_relation} =
   /BKI_SHARED_RELATION/ ? ' shared_relation' : '';
 $catalog{without_oids} =
   /BKI_WITHOUT_OIDS/ ? ' without_oids' : '';
-$catalog{rowtype_oid} =
-  /BKI_ROWTYPE_OID\((\d+)\)/ ? " rowtype_oid $1" : '';
+if (/BKI_ROWTYPE_OID\((\d+),(\w+)\)/)
+{
+	$catalog{rowtype_oid} = $1;
+	$catalog{rowtype_oid_clause} = " rowtype_oid $1";
+	$catalog{rowtype_oid_macro} = $2;
+}
+else
+{
+	$catalog{rowtype_oid} = '';
+	$catalog{rowtype_oid_clause} = '';
+	$catalog{rowtype_oid_macro} = '';
+}
 $catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0;
 $declaring_attributes = 1;
 			}
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index f6be50a..fe8c3ca 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -228,6 +228,7 @@ my @tables_needing_macros;
 # produce output, one catalog at a time
 foreach my $catname (@catnames)
 {
+	my $catalog = $catalogs{$catname};
 
 	# Create one definition header with macro definitions for each catalog.
 	my $def_file = $output_path . $catname . '_d.h';
@@ -258,13 +259,21 @@ foreach my $catname (@catnames)
 
 EOM
 
+	# Emit OID macros for catalog's OID and rowtype OID, if wanted
+	print $def
+	  sprintf("#define %s %s\n", $catalog->{relation_oid_macro}, $catalog->{relation_oid})
+	  if $catalog->{relation_oid_macro} ne '';
+	print $def
+	  sprintf("#define %s %s\n", $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid})
+	  if $catalog->{rowtype_oid_macro} ne '';
+	print $def "\n";
+
 	# .bki CREATE command for this catalog
-	my $catalog = $catalogs{$catname};
 	print $bki "create $catname $catalog->{relation_oid}"
 	  . $catalog->{shared_relation}
 	  . $catalog->{bootstrap}
 	  . $catalog->{without_oids}
-	  . $catalog->{rowtype_oid} . "\n";
+	  . $catalog->{rowtype_oid_clause} . "\n";
 
 	my $first = 1;
 
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 9732f61..0e6285f 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -15,7 +15,7 @@ while (<>)
 	next if /^CATALOG\(.*BKI_BOOTSTRAP/;
 	next
 	  unless /\boid *=> *'(\d+)'/
-		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+)\)/
+		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/

Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin  wrote:

>
>
> > On 5 Apr 2018, at 19:58, Magnus Hagander wrote:
> >
> >
> >
> > On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin 
> wrote:
> >
> >
> > > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> > >
> > > This patch version seems fine to me. I'm inclined to mark it RFC.
> > +1
> > The patch works fine for me. I've tried different combinations of
> backend cancelation and the only suspicious thing I found is that you can
> start multiple workers by cancelling launcher and not cancelling worker. Is
> it problematic behavior? If we run pg_enable_data_checksums() it checks for
> existing launcher for a reason, m.b. it should check for worker too?
> >
> > I don't think it's a problem in itself -- it will cause pointless work,
> but not actually cause any problems I think (whereas duplicate launchers
> could cause interesting things to happen).
> >
> > How did you actually cancel the launcher to end up in this situation?
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
>
> select pid,backend_type from pg_stat_activity where backend_type ~'checks';
>   pid  | backend_type
> ---+---
>  98587 | checksumhelper worker
>  98589 | checksumhelper worker
>  98591 | checksumhelper worker
> (3 rows)
>
> There is a way to shoot yourself in a leg then by calling
> pg_disable_data_checksums(), but this is extremely stupid for a user
>


Ah, I didn't consider query cancel. I'm not sure how much we should actually
care about it, but it's easy enough to trap that signal and just do a clean
shutdown on it, so I've done that.
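For reference, a minimal sketch of the usual pattern for that kind of clean
shutdown on cancel (illustrative only; the variable and function names below
are not taken from the patch):

static volatile sig_atomic_t got_sigint = false;

static void
launcher_sigint_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_sigint = true;
	SetLatch(MyLatch);
	errno = save_errno;
}

/* in the launcher's main function */
	pqsignal(SIGINT, launcher_sigint_handler);
	BackgroundWorkerUnblockSignals();

	for (;;)
	{
		if (got_sigint)
			break;				/* clean shutdown instead of leaving workers behind */

		/* ... launch workers / wait on the latch ... */
	}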

PFA a patch that does that, and also rebased over the datallowconn patch
just landed (which also removes some docs).



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 122f034f17..6257563eaa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19540,6 +19540,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   
 
+  
+   Data Checksum Functions
+
+   
+The functions shown in  can
+be used to enable or disable data checksums in a running cluster.
+See  for details.
+   
+
+   
+Checksum SQL Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+  
+ 
+ 
+  
+   
+
+ pg_enable_data_checksums
+
+pg_enable_data_checksums(cost_delay int, cost_limit int)
+   
+   
+void
+   
+   
+
+ Initiates data checksums for the cluster. This will switch the data checksums mode
+ to in progress and start a background worker that will process
+ all data in the database and enable checksums for it. When all data pages have had
+ checksums enabled, the cluster will automatically switch to checksums
+ on.
+
+
+ If cost_delay and cost_limit are
+ specified, the speed of the process is throttled using the same principles as
+ Cost-based Vacuum Delay.
+
+   
+  
+  
+   
+
+ pg_disable_data_checksums
+
+pg_disable_data_checksums()
+   
+   
+void
+   
+   
+Disables data checksums for the cluster.
+   
+  
+ 
+
+   
+
+  
+
   
Database Object Management Functions
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4e01e5641c..7cd6ee85dc 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -211,6 +211,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation

 Use checksums on data pages to help detect corruption by the
 I/O system that would otherwise be silent. Enabling checksums
-may incur a noticeable performance penalty. This option can only
-be set during initialization, and cannot be changed later. If
-set, checksums are calculated for all objects, in all databases.
+may incur a noticeable performance penalty. If set, checksums
+are calculated f

Re: pgsql: New files for MERGE

2018-04-05 Thread Andres Freund
On 2018-04-05 06:09:31 -0400, Robert Haas wrote:
> On Wed, Apr 4, 2018 at 3:09 PM, Tom Lane  wrote:
> > Well, what's on the table is reverting this patch and asking you to try
> > again in the v12 cycle.  Given Andres' concerns about the executor design,
> > and mine about the way the parsing end is built, there's certainly no way
> > that that's all getting fixed by Saturday.  Given pretty much everybody's
> > unhappiness with the way this patch was forced through at the last minute,
> > I do not think you should expect that we'll say, "okay, we'll let you ship
> > a bad version of MERGE because there's no more time in this cycle".
> 
> +1.

+1.



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 10:23:43 +0300, Heikki Linnakangas wrote:
> Profiling that, without any patches applied, I noticed that a lot of time
> was spent in read()s on the postmaster-death pipe, i.e. in
> PostmasterIsAlive(). We call that between *every* WAL record.

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records.

I agree this is a performance problem. I do however not like the fix.
ISTM the better approach would be to try to reduce the cost of
PostmasterIsAlive() on common platforms - it should be nearly free if
done right.

One way to achieve that would be, e.g., to stop ignoring SIGPIPE and instead
check for postmaster death inside the handler, without reacting to
it. Then the actual PostmasterIsAlive() checks are just a check of a
single sig_atomic_t.
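Schematically, the cheap-check idea would look something like this (a sketch
only; as discussed below, the event that sets the flag has to be something
other than SIGPIPE, and PostmasterIsAliveInternal() here just stands for the
existing expensive pipe-read check):

static volatile sig_atomic_t postmaster_possibly_dead = false;

/* installed for whatever signal ends up announcing postmaster death */
static void
postmaster_death_handler(SIGNAL_ARGS)
{
	postmaster_possibly_dead = true;
}

bool
PostmasterIsAlive(void)
{
	/* common case: a single flag test, no system call */
	if (!postmaster_possibly_dead)
		return true;

	/* rare case: fall back to the expensive read() on the death pipe */
	return PostmasterIsAliveInternal();
}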

Greetings,

Andres Freund



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread Tom Lane
BTW, I experimented with adding blank lines between the hash items in the
.dat files, and that seemed to make a nice improvement in readability,
converting masses of rather gray text into visibly distinct stanzas.
I'm not dead set on that, but try it and see what you think.

A small example from pg_aggregate.dat:

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },
{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

versus

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },

{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },

{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

regards, tom lane



Re: Flexible configuration for full-text search

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 17:26:10 +0300, Teodor Sigaev wrote:
> Some notices:
> 
> 0) patch conflicts with last changes in gram.y, conflicts are trivial.
> 
> 1) jsonb in catalog. I'm ok with it, any opinions?
> 
> 2) pg_ts_config_map.h: "jsonb   mapdicts" isn't decorated with #ifdef
> CATALOG_VARLEN like other varlena columns in the catalog. If that's right,
> please explain and add a comment.
> 
> 3) I see changes in pg_catalog, including dropping a column, changing its
> type, changing an index, changing a function, etc. Did you pay attention to
> pg_upgrade? I don't see it in the patch.
> 
> 4) Seems, it could work:
> ALTER TEXT SEARCH CONFIGURATION russian
>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>   word, hword, hword_part
> WITH english_stem union (russian_stem, simple);
>^ simple way instead of
> WITH english_stem union (case russian_stem when match then keep else simple 
> end);
> 
> 4) The initial approach suggested distinguishing three states of a dictionary
> result: null (unknown word), stopword, and usual word. Now there are only two,
> so we have lost the possibility to catch stopwords. One way to use stopwords
> is: suppose we have two identical fts configurations, except that one skips
> stopwords and the other doesn't. The second configuration is used for
> indexing, and the first one for search by default. But if we can't find
> anything ('to be or to be' - the phrase contains stopwords only) then we can
> use the second configuration. For now, we need to keep two variants of each
> dictionary - with and without stopwords. But if it were possible to
> distinguish stop and non-stop words in the configuration then we wouldn't
> need to have duplicated dictionaries.

Just to be clear: I object to attempting to merge this into v11. This
introduces new user interface, arrived late in the development cycle,
and hasn't seen that much review. Not something that should be merged
two minutes before midnight.

I think it's good to continue reviewing, don't get me wrong.


- Andres



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Tom Lane
Andres Freund  writes:
> ISTM the better approach would be to try to reduce the cost of
> PostmasterIsAlive() on common platforms - it should be nearly free if
> done right.

+1 if it's doable.

> One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> check for postmaster death inside the handler, without reacting to
> it. Then the actual PostmasterIsAlive() checks are just a check of a
> single sig_atomic_t.

AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
ever writes to it.  So this sketch seems off to me, even assuming that
not-ignoring SIGPIPE causes no problems elsewhere.

While it's not POSIX, at least some platforms are capable of delivering
a separate signal on parent process death.  Perhaps using that where
available would be enough of an answer.

regards, tom lane



Re: some last patches breaks plan cache

2018-04-05 Thread Peter Eisentraut
On 4/4/18 14:03, Tomas Vondra wrote:
>> If there's really no other way, you could use a PG_TRY block to 
>> ensure that the pointer gets reset on the way out. But I question
>> why we've got a design that requires that in the first place. It's
>> likely to have more problems than this.
> 
> I agree it needs a solution that does not require us to track and
> manually reset pointers on random places. No argument here.

I've committed a fix with PG_TRY.

A more complete solution would be to able to keep the plan independent
of a resowner.  That would require a bit more deep surgery in SPI, it
seems.  I'll take a look if it's doable.
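Schematically, the PG_TRY-based fix is the usual reset-on-error-exit pattern
(illustrative only; tracked_ptr and risky_operation() are made-up names, not
the committed code):

static void *tracked_ptr = NULL;	/* pointer that must not be left dangling */

static void
do_guarded_work(void *resource)
{
	tracked_ptr = resource;

	PG_TRY();
	{
		risky_operation(resource);	/* may elog(ERROR) */
	}
	PG_CATCH();
	{
		/* reset on the error path before re-throwing */
		tracked_ptr = NULL;
		PG_RE_THROW();
	}
	PG_END_TRY();

	/* and on the normal path too */
	tracked_ptr = NULL;
}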

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Aggregates for string_agg and array_agg

2018-04-05 Thread Tels
Moin,

On Wed, April 4, 2018 11:41 pm, David Rowley wrote:
> Hi Tomas,
>
> Thanks for taking another look.
>
> On 5 April 2018 at 07:12, Tomas Vondra 
> wrote:
>> Other than that, the patch seems fine to me, and it's already marked as
>> RFC so I'll leave it at that.
>
> Thanks.

I have one more comment - sorry for not writing sooner, the flu got to me ...

Somewhere in the code there is a new allocation of memory when the string
grows beyond the current size - and that doubles the size. This can lead
to a lot of wasted space (think: constructing a string that is a bit over
1 GByte, which would presumably allocate 2 GByte).

The same issue happens when each worker allocates 512 MByte for a 256
MByte + 1 result.

IMHO a factor like 1.4 or 1.2 would work better here - not sure what
the current standard in PG is for situations like this.
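To make the trade-off concrete, a small sketch of the two growth policies
(not PostgreSQL code, just the allocation-size arithmetic; both assume a
non-zero starting size):

#include <stddef.h>

/* Doubling: asking for 1 GB + 1 byte ends up allocating 2 GB. */
static size_t
grow_double(size_t cur, size_t needed)
{
	while (cur < needed)
		cur *= 2;
	return cur;
}

/*
 * A gentler factor (1.25 here) wastes at most ~25% instead of ~100%,
 * at the price of more realloc/copy cycles while the buffer grows.
 */
static size_t
grow_gentle(size_t cur, size_t needed)
{
	while (cur < needed)
		cur += cur / 4 + 1;
	return cur;
}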

>> The last obstacle seems to be the argument about the risks of the patch
>> breaking queries of people relying on the ordering. Not sure what's are
>> the right next steps in this regard ...
>
> yeah, seems like a bit of a stalemate.
>
> Personally, I think if the group of people Tom mentions does exist, then
> they've already been through some troubled times since Parallel Query
> was born. I'd hope they've already put up their defenses due to the
> advent of that feature. I stand by my thoughts that it's pretty
> bizarre to draw the line here when we've probably been causing these
> people issues for many years already. I said my piece on this already
> so likely not much point in going on about it. These people are also
> perfectly capable of sidestepping the whole issue with setting
> max_parallel_workers_per_gather TO 0.

Coming from the Perl side, this issue has popped up there a lot with the
ordering of hash keys (e.g. "for my $key (keys %$hash) { ... }"), and even
though there was never a guarantee, it often caught people by surprise.
Mostly in test suites, though, because real output often needs some ordering
anyway.

However, changes were nevertheless made, and code had to be adapted. But
the world didn't end, so to speak.

For PG, I think the benefits greatly outweigh the problems with the
sorting order - after all, if you have code that relies on an implicit
order, you've already got a problem, no?

So my favourite would be something along these lines:

* add the string_agg
* document it in the release notes, and document workarounds/solutions (add
ORDER BY, disabling workers, etc.)
* if not already done, stress in the documentation that if you don't ORDER
things, they might not come out in the order you expect - especially
with parallel queries, new processing nodes, etc.

Best wishes,

Tels

PS: We use string_agg() in a case where we first aggregate each row, then
string_agg() all rows, and the resulting string is really huge. We did run
into the "out of memory" problem, so we now use a LIMIT and assemble the
resulting parts on the client side. My wish here would be to know better
how large the LIMIT can be; I found it quite difficult to predict with how
many rows PG runs out of memory for the string buffer, even though all rows
have almost the same length as text. But that aside, getting the parts
faster with parallel agg would be very cool, too.





Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 17:07, Alvaro Herrera  wrote:
> Simon Riggs wrote:
>> On 5 April 2018 at 16:09, Alvaro Herrera  wrote:
>> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++
>> > keyword as a function argument name:
>> >
>> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ 
>> > before ‘namespace’
>> >List **namespace);
>> >   ^
>>
>> How's this as a fix?
>
> WFM

Pushed

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
On 2018-04-05 14:39:27 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > ISTM the better approach would be to try to reduce the cost of
> > PostmasterIsAlive() on common platforms - it should be nearly free if
> > done right.
> 
> +1 if it's doable.
> 
> > One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> > check for postmaster death inside the handler, without reacting to
> > it. Then the actual PostmasterIsAlive() checks are just a check of a
> > single sig_atomic_t.
> 
> AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
> ever writes to it.  So this sketch seems off to me, even assuming that
> not-ignoring SIGPIPE causes no problems elsewhere.

Yea, you're probably right. I'm mostly brainstorming here.

(FWIW, I don't think not ignoring SIGPIPE would be a huge issue if we
don't immediately take any action on its account)


> While it's not POSIX, at least some platforms are capable of delivering
> a separate signal on parent process death.  Perhaps using that where
> available would be enough of an answer.

Yea, that'd work on Linux, which is probably the platform 80-95% of
performance-critical PG workloads run on.  There's
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on Windows, which might also work,
but I'm not sure it provides enough opportunity for cleanup.
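On Linux that would be prctl(PR_SET_PDEATHSIG); roughly like the sketch below
(the choice of SIGUSR2 is arbitrary here, and a real implementation would need
to handle the race where the parent exits before prctl() runs, plus re-arming
in children after fork, since the setting is cleared on fork):

#include <signal.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>

static volatile sig_atomic_t parent_died = false;

static void
parent_death_handler(int signo)
{
	parent_died = true;
}

static void
arm_parent_death_signal(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = parent_death_handler;
	sigaction(SIGUSR2, &sa, NULL);

	/* deliver SIGUSR2 to this process when its parent exits */
	prctl(PR_SET_PDEATHSIG, SIGUSR2);

	/* close the race: the parent may already be gone */
	if (getppid() == 1)
		parent_died = true;
}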

Greetings,

Andres Freund



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-05 Thread Andres Freund
On 2018-04-04 22:10:06 -0700, David G. Johnston wrote:
> On Wednesday, April 4, 2018, Amit Kapila  wrote:
> 
> > On Thu, Apr 5, 2018 at 7:14 AM, Andres Freund  wrote:
> > >
> >
> > > Questions:
> > >
> > > - I'm not perfectly happy with
> > >   "tuple to be locked was already moved to another partition due to
> > concurrent update"
> > >   as the error message. If somebody has a better suggestions.
> > >
> >
> > I don't have any better suggestion, but I have noticed a small
> > inconsistency in the message.  In case of delete, the message is
> > "tuple to be updated was ...". I think here it should be "tuple to be
> > deleted was ..."
> >
> 
> The whole "moved to another partition" explains why and seems better placed
> in the errdetail.  The error itself should indicate which attempted action
> failed.  And the attempted action for the end user usually isn't the scope
> of "locked tuple" - it's the insert or update, the locking is a side effect
> (why).

Well, update/delete have their own messages; I don't think you can get
this for inserts (there'd be no tuple to follow across EPQ). The case I
copied from above was locking a tuple, hence the reference to that.

I don't agree with moving "moved to another partition" to errdetail,
that's *the* crucial detail. If there's anything in the error message,
it should be that.

Greetings,

Andres Freund



Re: SET TRANSACTION in PL/pgSQL

2018-04-05 Thread Peter Eisentraut
On 4/4/18 13:53, Tomas Vondra wrote:
>> Here is the same patch rewritten using SPI, using the new no_snapshots
>> facility recently introduced.
> 
> Yeah, doing that using SPI seems much cleaner and more like the rest of
> the commands. Most of the patch is boilerplate to support the grammar,
> and the one interesting piece exec_stmt_set seems fine to me.
> 
> Barring any objections, I'll mark it as RFC tomorrow morning.

You apparently didn't, but I committed it anyway. ;-)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-05 Thread Bruce Momjian
On Thu, Apr  5, 2018 at 03:09:57PM +0800, Craig Ringer wrote:
> ENOSPC doesn't seem to be a concern during normal operation of major file
> systems (ext3, ext4, btrfs, xfs) because they reserve space before returning
> from write(). But if a buffered write does manage to fail due to ENOSPC we'll
> definitely see the same problems. This makes ENOSPC on NFS a potentially data
> corrupting condition since NFS doesn't preallocate space before returning from
> write().

This does explain why NFS has a reputation for unreliability for
Postgres.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-05 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Thu, Apr 5, 2018 at 7:14 AM, Andres Freund  wrote:

> +   /*
> +* As long as we don't support an UPDATE of INSERT ON CONFLICT for
> +* a partitioned table we shouldn't reach to a case where tuple to
> +* be lock is moved to another partition due to concurrent update
> +* of the partition key.
> +*/
> +   Assert(!ItemPointerIndicatesMovedPartitions(&hufd.ctid));
> +
> 
> This is no longer true; at least not entirely. We still don't support ON
> CONFLICT DO UPDATE to move a row to a different partition, but otherwise it
> works now. See 555ee77a9668e3f1b03307055b5027e13bf1a715.

Right.  So I think the assert() should remain, but the comment should
say "As long as we don't update moving a tuple to a different partition
during INSERT ON CONFLICT DO UPDATE on a partitioned table, ..."
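Something along these lines, say (just a sketch of the wording; the Assert
itself stays as in the patch):

/*
 * As long as we don't support moving a tuple to a different partition
 * during INSERT ON CONFLICT DO UPDATE on a partitioned table, we should
 * not reach a case where the tuple to be locked was moved to another
 * partition by a concurrent update of the partition key.
 */
Assert(!ItemPointerIndicatesMovedPartitions(&hufd.ctid));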

FWIW I think the code flow is easier to read with the renamed macros.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-04-05 Thread Andres Freund
On 2018-04-05 10:17:59 +0530, Amit Kapila wrote:
> On Thu, Apr 5, 2018 at 7:14 AM, Andres Freund  wrote:
> Why?  tid is both an input and output parameter.  The input tid is
> valid and is verified at the top of the function; now, if no row
> version is visible, it should have the same value as the passed tid.
> I am not saying that it was super important to have that assertion,
> but if it is valid then it can catch a case where we might have missed
> checking a tuple that has an invalid block number (essentially the case
> introduced by the patch).

You're right. It's bonkers that the output parameter isn't set to an
invalid value if the tuple isn't found. Makes the whole function
entirely useless.

> > - I'm not perfectly happy with
> >   "tuple to be locked was already moved to another partition due to 
> > concurrent update"
> >   as the error message. If somebody has a better suggestions.
> >
> 
> I don't have any better suggestion, but I have noticed a small
> inconsistency in the message.  In case of delete, the message is
> "tuple to be updated was ...". I think here it should be "tuple to be
> deleted was ...".

Yea, I noticed that too. Note that the message a few lines up is
similarly wrong:
ereport(ERROR,
        (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
         errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
         errhint("Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows.")));


> > - should heap_get_latest_tid() error out when the chain ends in a moved
> >   tuple?
> 
> Won't the same question apply to the similar usage in
> EvalPlanQualFetch and heap_lock_updated_tuple_rec?

I don't think so?


> In EvalPlanQualFetch, we consider such a tuple to be deleted and will
> silently miss/skip it, which seems contradictory to the places where we
> have detected such a situation and raised an error.

if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("tuple to be locked was already moved to another partition due to concurrent update")));


> In heap_lock_updated_tuple_rec, we will skip locking the versions of a
> tuple after we encounter a tuple version that is moved to another
> partition.

I don't think that's true? We'll not lock *any* tuple in that case, but
return HeapTupleUpdated. Which callers then interpret in whatever way
they need to?

Greetings,

Andres Freund



Re: [HACKERS] Runtime Partition Pruning

2018-04-05 Thread Jesper Pedersen

Hi David,

First of all: Solid patch set with good documentation.

On 04/05/2018 09:41 AM, David Rowley wrote:

Seems mostly fair. I'm not a fan of using the term "unpruned" though.
I'll have a think.  The "all" is meant in terms of what exists as
subnodes.



'included_parts' / 'excluded_parts' probably isn't better...


subplan_indexes and parent_indexes seem like better names, I agree.



More clear.


* Also in make_partition_pruneinfo()

 /* Initialize to -1 to indicate the rel was not found */
 for (i = 0; i < root->simple_rel_array_size; i++)
 {
 allsubnodeindex[i] = -1;
 allsubpartindex[i] = -1;
 }

Maybe allocate the above-mentioned arrays using palloc0 and skip this
initialization.  Instead, make the indexes that are stored in these start
at 1 and treat 0 as an invalid entry.


0 is a valid subplan index. It is possible to make this happen, but
I'd need to subtract 1 everywhere I used the map. That does not seem
very nice. Seems more likely to result in bugs where we might forget
to do the - 1.

Did you want this because you'd rather have the palloc0() than the for
loop setting the array elements to -1? Or is there another reason?



I think that doing palloc0 would be confusing; -1 is more clear, 
especially since it is used with 'allpartindexes' which is a Bitmapset.
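
To spell out the two options being compared (a hypothetical sketch, not
code from the patch; subplan_index and relid are made-up names):

/* Option A (current patch): explicit -1 sentinel, 0-based subplan indexes */
allsubnodeindex = palloc(sizeof(int) * root->simple_rel_array_size);
for (i = 0; i < root->simple_rel_array_size; i++)
    allsubnodeindex[i] = -1;            /* -1 means "rel not found" */
...
allsubnodeindex[relid] = subplan_index; /* readers use the value directly */

/* Option B (suggestion): palloc0, store index + 1 so that 0 means "not found" */
allsubnodeindex = palloc0(sizeof(int) * root->simple_rel_array_size);
...
allsubnodeindex[relid] = subplan_index + 1; /* every reader must remember the - 1 */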


Doing the variable renames as Amit suggests would be good.

I ran some tests (v50_v20; make check-world passes), with and without 
choose_custom_plan() being false, and I'm seeing good performance results 
without running into issues.


Maybe 0005 should be expanded in partition_prune.sql with the supported 
cases to make those more clear.


Thanks !

Best regards,
 Jesper



Re: [HACKERS] MERGE SQL Statement for PG11

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 11:31:48 +0530, Pavan Deolasee wrote:
> > +/*-
> > 
> > + *
> > + * nodeMerge.c
> > + *   routines to handle Merge nodes relating to the MERGE command
> >
> > Isn't this file misnamed and it should be execMerge.c?  The rest of the
> > node*.c files are for nodes that are invoked via execProcnode /
> > ExecProcNode().  This isn't an actual executor node.
> >
> 
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. Maybe not; just raising that
> concern.)

It absolutely definitely needed to be renamed. But that's been done, so
...


> > What's the reasoning behind making this an anomalous type of
> > executor node?

> Didn't quite get that. I think the naming of the file was bad (fixed now), but
> I think it's a good idea to move the new code into a new file, from a
> maintainability as well as a coverage perspective. If you have something very
> different in mind, can you explain in more detail?

Well, it was kinda modeled as an executor node, including the file
name. That's somewhat fixed now.   I still am extremely suspicious of
the code flow here.

My impression is that this simply shouldn't be going through
nodeModifyTable, but be its own nodeMerge node. The trigger handling
would need to be abstracted into execTrigger.c or such to avoid
duplication.  We're now going from nodeModifyTable.c:ExecModifyTable()
into execMerge.c:ExecMerge(), back to
nodeModifyTable.c:ExecUpdate/Insert(). To avoid ExecInsert() doing
things that aren't appropriate for merge, we then pass an actionState,
which neuters part of ExecUpdate/Insert(). This is just bad.

I think this should be cleaned up before it can be released, which means
this feature should be reverted and cleaned up nicely before being
re-committed. Otherwise we'll sit on this bad architecture and it'll
make future features harder. And if somebody cleans it up, Simon will
complain that things are needlessly being destabilized (hello xlog.c
with its 1000+ LOC functions).


> > +   /*
> > +* If there are not WHEN MATCHED actions, we are done.
> > +*/
> > +   if (mergeMatchedActionStates == NIL)
> > +   return true;
> >
> > Maybe I'm confused, but why is mergeMatchedActionStates attached to
> > per-partition info? How can this differ in number between partitions,
> > requiring us to re-check it below fetching the partition info above?
> >
> >
> Because each partition may have columns in a different order, dropped
> attributes, etc. So we need to give treatment to the quals/targetlists. See
> ON CONFLICT DO UPDATE for similar code.

But the count wouldn't change, no? So we return before building the
partition info if there's no MATCHED action?


> > It seems fairly bad architecturally to me that we now have
> > EvalPlanQual() loops in both this routine *and*
> > ExecUpdate()/ExecDelete().
> >
> >
> This was done after review by Peter, and I think I like the new way too.
> It also keeps the regular UPDATE/DELETE code paths least changed and lets
> MERGE handle concurrency issues specific to it.

It also makes the whole code barely readable. A minimal amount of changes
isn't a bad goal, but if the consequence of that is poor layering and
repeated code, it's bad as well.


> > +   /*
> > +* Initialize hufdp. Since the caller is only interested in the failure
> > +* status, initialize with the state that is used to indicate
> > successful
> > +* operation.
> > +*/
> > +   if (hufdp)
> > +   hufdp->result = HeapTupleMayBeUpdated;
> > +
> >
> > This signals for me that the addition of the result field to HUFD wasn't
> > architecturally the right thing. HUFD is supposed to be
> > returned by heap_update(); reusing and setting it from other places
> > seems like a layering violation to me.

> I am not sure I agree. Sure we can keep adding more parameters to
> ExecUpdate/ExecDelete and such routines, but I thought passing back all
> information via a single struct makes more sense.

You can just wrap HUFD in another struct that has the necessary
information.
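
E.g. something like this (hypothetical struct name, just to illustrate the
wrapping):

/*
 * Wrapper carrying the extra status, so HeapUpdateFailureData itself stays
 * something that only heap_update()/heap_delete() fill in.
 */
typedef struct MergeUpdateFailureData
{
    HeapUpdateFailureData hufd;
    HTSU_Result result;         /* what the MERGE code path wants to inspect */
} MergeUpdateFailureData;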


> > +
> > +   /*
> > +* Add a whole-row-Var entry to support references to "source.*".
> > +*/
> > +   var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
> > +   te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList)
> > + 1,
> > +NULL, true);
> >
> > Why can't this properly dealt with by transformWholeRowRef() etc?
> >
> >
> I just followed the ON CONFLICT style. Maybe there is a better way, but it's
> not clear how.

What code are you referring to?



> > Why is this, and not building a proper executor node for merge that
> > knows how to get the tuples, the right approach?  We did a rough
> > equivalent for matview updates, and I think it turned out to be a pretty
> > bad plan.
> >
> >
> I am still not sure why that would be any better. Can you explain in detail
> what exactly you have in mind?

Re: Vacuum: allow usage of more than 1GB of work mem

2018-04-05 Thread Heikki Linnakangas

On 03/04/18 17:20, Claudio Freire wrote:

Ok, rebased patches attached


Thanks! I took a look at this.

First, now that the data structure is more complicated, I think it's 
time to abstract it, and move it out of vacuumlazy.c. The Tid Map needs 
to support the following operations:


* Add TIDs, in order (in 1st phase of vacuum)
* Random lookup, by TID (when scanning indexes)
* Iterate through all TIDs, in order (2nd pass over heap)

Let's add a new source file to hold the code for the tid map data 
structure, with functions corresponding to those operations.
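
Roughly, the interface could look something like this (names here are just
a sketch; the attached patch may spell them differently):

typedef struct VacuumTidMap VacuumTidMap;

extern VacuumTidMap *vacuum_tidmap_create(Size max_bytes);
extern void vacuum_tidmap_add(VacuumTidMap *map, ItemPointer tid);    /* TIDs added in order */
extern bool vacuum_tidmap_lookup(VacuumTidMap *map, ItemPointer tid); /* random lookup, e.g. lazy_tid_reaped() */
extern void vacuum_tidmap_begin_iterate(VacuumTidMap *map);
extern bool vacuum_tidmap_get_next(VacuumTidMap *map, ItemPointer tid); /* iterate in TID order */
extern void vacuum_tidmap_free(VacuumTidMap *map);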


I took a stab at doing that, and I think it makes vacuumlazy.c nicer.

Secondly, I'm not a big fan of the chosen data structure. I think the 
only reason the segmented "multi-array" was chosen is that each 
"segment" works similarly to the simple array that we used to have. 
After putting it behind the abstraction, it seems rather ad hoc. There 
are many standard textbook data structures that we could use instead, 
which would be easier to understand and reason about, I think.


So, I came up with the attached patch. I used a B-tree as the data 
structure. Not sure it's the best one, I'm all ears for suggestions and 
bikeshedding on alternatives, but I'm pretty happy with that. I would 
expect it to be pretty close to the simple array with binary search in 
performance characteristics. It'd be pretty straightforward to optimize 
further, and e.g. use a bitmap of OffsetNumbers or some other more dense 
data structure in the B-tree leaf pages, but I resisted doing that as 
part of this patch.


I haven't done any performance testing of this (and not much testing in 
general), but at least the abstraction seems better this way. 
Performance testing would be good, too. In particular, I'd like to know 
how this might affect the performance of lazy_tid_reaped(). That's a hot 
spot when vacuuming indexes, so we don't want to add any cycles there. 
Were there any ready-made test kits for that in this thread? I didn't see 
any at a quick glance, but it's a long thread..


- Heikki
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e090..634059b8ff 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o vacuumtidmap.o variable.o view.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index d2a006671a..0cc6b98831 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -4,23 +4,21 @@
  *	  Concurrent ("lazy") vacuuming.
  *
  *
- * The major space usage for LAZY VACUUM is storage for the array of dead tuple
- * TIDs.  We want to ensure we can vacuum even the very largest relations with
- * finite memory space usage.  To do that, we set upper bounds on the number of
- * tuples we will keep track of at once.
+ * The major space usage for LAZY VACUUM is storage of dead tuple TIDs.
+ * We want to ensure we can vacuum even the very largest relations with
+ * finite memory space usage.  To do that, we set upper bounds on the amount
+ * of memory used to track the TIDs at any one time.
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
- * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
- * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * use a specialized data structure to hold the TIDs, which keeps track of
+ * the memory used.  If the memory limit is about to be exceeded, we suspend
+ * the heap scan phase and perform a pass of index cleanup and page
+ * compaction, then resume the heap scan with an empty tid map.
  *
  * If we're processing a table with no indexes, we can just vacuum each page
  * as we go; there's no need to save up multiple tuples to minimize the number
- * of index scans performed.  So we don't use maintenance_work_mem memory for
- * the TID array, just enough to hold as many heap tuples as fit on one page.
+ * of index scans performed.
  *
  *
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
@@ -94,13 +92,6 @@
 	((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
 
 /*
- * Guesstimation of number of dead tuples per page.  This is used to
- * provide an upper limit to memory allocated when vacuuming small
- * 

Re: pgsql: New files for MERGE

2018-04-05 Thread Bruce Momjian
On Thu, Apr  5, 2018 at 11:15:20AM +0100, Simon Riggs wrote:
> On 4 April 2018 at 21:28, Simon Riggs  wrote:
> > On 4 April 2018 at 21:14, Andres Freund  wrote:
> >
> >>> The normal way is to make review comments that allow change. Your
> >>> request for change of the parser data structures is fine and can be
> >>> done, possibly by Saturday
> >>
> >> I did request changes, and you've so far ignored those requests.
> >
> > Pavan tells me he has replied to you and is working on specific changes.
> 
> Specific changes requested have now been implemented by Pavan and
> committed by me.
> 
> My understanding is that he is working on a patch for Tom's requested
> parser changes, will post on other thread.

Simon, you have three committers in this thread suggesting this patch be
reverted.  Are you just going to barrel ahead with the fixes without
addressing their emails?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 7:30 PM, Magnus Hagander  wrote:

> On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin 
> wrote:
>
>>
>>
>> > On 5 Apr 2018, at 19:58, Magnus Hagander wrote:
>> >
>> >
>> >
>> > On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin 
>> wrote:
>> >
>> >
>> > > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
>> > >
>> > > This patch version seems fine to me. I'm inclined to mark it RFC.
>> > +1
>> > The patch works fine for me. I've tried different combinations of
>> backend cancelation and the only suspicious thing I found is that you can
>> start multiple workers by cancelling the launcher and not cancelling the
>> worker. Is that problematic behavior? If we run pg_enable_data_checksums()
>> it checks for an existing launcher for a reason; maybe it should check for
>> a worker too?
>> >
>> > I don't think it's a problem in itself -- it will cause pointless work,
>> but not actually cause any problems I think (whereas duplicate launchers
>> could cause interesting things to happen).
>> >
>> > How did you actually cancel the launcher to end up in this situation?
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>>
>> select pid,backend_type from pg_stat_activity where backend_type
>> ~'checks';
>>   pid  | backend_type
>> ---+---
>>  98587 | checksumhelper worker
>>  98589 | checksumhelper worker
>>  98591 | checksumhelper worker
>> (3 rows)
>>
>> There is a way to shoot yourself in the foot then by calling
>> pg_disable_data_checksums(), but that would be extremely stupid for a user
>>
>
>
> Ah, didn't consider query cancel. I'm not sure how much  we should
> actually care about it, but it's easy enough to trap that signal and just
> do a clean shutdown on it, so I've done that.
>
> PFA a patch that does that, and also rebased over the datallowconn patch
> just landed (which also removes some docs).
>
>

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.
> 
> Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool.  We have other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

- Andres



Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> > 
> > Thanks for all the reviews!
> 
> I want to be on the record that I think merging a nontrivial feature that
> got submitted 2018-02-21, just before the start of the last CF, is
> an abuse of process, and not cool.  We have other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
> 
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund  wrote:

> On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> > On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > > I have now pushed this latest version with some minor text adjustments
> and
> > > a catversion bump.
> > >
> > > Thanks for all the reviews!
> >
> > I want to be on the record that I think merging a nontrivial feature that
> > got submitted 2018-02-21, just before the start of the last CF, is
> > an abuse of process, and not cool.  We have other people working hard to
> > follow the process, and circumventing it like this just signals to
> > people trying to follow the rules that they're fools.
> >
> > Merging ~2kloc patches like that is going to cause pain. And even if
> > not, it causes procedural damage.
>

I can understand those arguments, and if that's the view of others as well,
I can of course revert that.


> And even worse, without even announcing an intent to commit and giving
> people a chance to object.
>

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such a
rule.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Andres Freund


On April 5, 2018 1:20:52 PM PDT, Magnus Hagander  wrote:
>On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund 
>wrote:
>
>>And even worse, without even announcing an intent to commit and giving
>> people a chance to object.
>>
>
>At least this patch was posted on the lists before commit, unlike many
>others from many different people. And AFAIK there has never been such a
>rule.

The more debatable a decision is, the more important it IMO becomes to give 
people a chance to object. Don't think there needs to be a hard rule to always 
announce an intent to commit.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor
On 4/6/18, Tom Lane  wrote:
> I experimented with converting all frontend code to include just the
> catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
> proposed new policy.  I soon found that we'd overlooked one thing:
> some clients expect to see the relation OID macros, eg
> LargeObjectRelationId.  Attached is a patch that changes things around
> so that those appear in the _d files instead of the master files.
> This is cleaner anyway because it removes duplication of the OIDs in
> the master files, with attendant risk of error.  For example we
> have this change in pg_aggregate.h:
>
> -#define AggregateRelationId  2600
> -
> -CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
> +CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS
>
> Some of the CATALOG lines spill well past 80 characters with this,
> although many of the affected ones already were overlength, eg
>
> -#define DatabaseRelationId   1262
> -#define DatabaseRelation_Rowtype_Id  1248
> -
> -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248)
> BKI_SCHEMA_MACRO
> +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION
> BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

It seems most of the time the FooRelationId labels are predictable,
although not as pristine as the Anum_* constants. One possibility that
came to mind is to treat these like pg_type OID #defines -- have a
simple rule that can be overridden for historical reasons. In this
case the pg_database change would simply be:

-#define DatabaseRelationId 1262
-#define DatabaseRelation_Rowtype_Id  1248
-

and genbki.pl would know what to do. But for pg_am:

-#define AccessMethodRelationId 2601
-
-CATALOG(pg_am,2601)
+CATALOG(pg_am,2601) BKI_REL_LABEL(AccessMethod)

I haven't thought this through yet. I imagine it will add as well as
remove a bit of complexity, code-wise. The upside is most CATALOG
lines will remain unchanged, and those that do won't end up quite as
long. I can try a draft tomorrow to see how it looks, unless you see
an obvious downside.

-John Naylor



Re: WIP: a way forward on bootstrap data

2018-04-05 Thread John Naylor
On 4/5/18, Tom Lane  wrote:
> I've generalized the BKI_LOOKUP(pg_proc) code so that
> you can use either regproc-like or regprocedure-like notation, and then
> applied that to relevant columns.
> [...]
> bootstrap-v13-delta.patch is a diff atop your patch series for the
> in-tree files, and convert_oid2name.patch adjusts that script to
> make use of the additional conversion capability.

Looking at convert_oid2name.patch again, I see this:

+   elsif ($catname eq 'pg_am')
+   {
+       $values{aggfnoid} = lookup_procname($values{aggfnoid});
+   }

aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup.
Do you remember the intent here?

-John Naylor



Re: Parallel Aggregates for string_agg and array_agg

2018-04-05 Thread Tomas Vondra


On 04/05/2018 09:10 PM, Tels wrote:
> Moin,
> 
> On Wed, April 4, 2018 11:41 pm, David Rowley wrote:
>> Hi Tomas,
>>
>> Thanks for taking another look.
>>
>> On 5 April 2018 at 07:12, Tomas Vondra 
>> wrote:
>>> Other than that, the patch seems fine to me, and it's already marked as
>>> RFC so I'll leave it at that.
>>
>> Thanks.
> 
> I have one more comment - sorry for not writing sooner, the flu got to me ...
> 
> Somewhere in the code there is a new allocation of memory when the string
> grows beyond the current size - and that doubles the size. This can lead
> to a lot of wasted space (think: constructing a string that is a bit over
> 1 GByte, which would presumably allocate 2 GByte).
> 

I don't think we support memory chunks above 1GB, so that's likely going
to fail anyway. See

  #define MaxAllocSize   ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
  #define AllocSizeIsValid(size) ((Size) (size) <= MaxAllocSize)

But I get your point - we may be wasting space here. But that's hardly
something this patch should mess with - that's a more generic allocation
question.

> The same issue happens when each worker allocates 512 MByte for a 256
> MByte + 1 result.
> 
> IMHO a factor like 1.4 or 1.2 would work better here - not sure what
> the current standard for situations like this in PG is.
> 

With a 2x scale factor, we only waste 25% of the space on average.
Consider that you're growing because you've reached the current size,
and you double the size - say, from 1MB to 2MB. But the 1MB wasted space
is the worst case - in reality we'll use something between 1MB and 2MB,
so 1.5MB on average. At which point we've wasted just 0.5MB, i.e. 25%.

That sounds perfectly reasonable to me. A lower factor would be more
expensive in terms of repalloc calls, for example.
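
As a generic sketch of the usual doubling pattern (not code from the patch;
the variable names are made up):

/*
 * Grow geometrically: O(log n) repallocs overall, and on average about
 * 25% of the allocation ends up unused at any point in time.
 */
if (used + needed > allocated)
{
    while (used + needed > allocated)
        allocated *= 2;
    buf = repalloc(buf, allocated);
}
memcpy(buf + used, data, needed);
used += needed;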

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


