Re: Regarding ambulkdelete, amvacuumcleanup index methods

2018-01-23 Thread Masahiko Sawada
On Wed, Jan 24, 2018 at 1:27 PM, Abinaya k  wrote:
> Hai all,
>   We are building In-memory index extension for postgres. We would
> capture table inserts, updates, deletes using triggers. During vacuum
> operation, postgres would give calls to ambulkdelete, amvacuumcleanup (as
> part of index cleanup). As we handle all updates, deletes using triggers, we
> don't have to do any index cleanup in ambulkdelete. But, what stats should i
> return from ambulkdelete and amvacuumcleanup? Is that necessary to return
> stats from ambulkdelete and amvacuumcleanup ?

Both ambulkdelete and amvacuumcleanup return an IndexBulkDeleteResult.
If you return a non-NULL value, the fields of the returned
IndexBulkDeleteResult are used to update the index statistics and to
report the bulk-deletion statistics in lazy_cleanup_index. For
example, num_pages and num_index_tuples are used to update
pg_class.relpages and pg_class.reltuples. If you return NULL from
both functions, these steps are skipped.
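
For the case you describe (no cleanup done by the index method itself),
a minimal sketch could look like the following. This is only an
illustration, not code from an existing AM; my_tuple_count and the page
count are assumed to come from your in-memory structure (see
access/amapi.h for the ambulkdelete signature):

  IndexBulkDeleteResult *
  myam_bulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
                  IndexBulkDeleteCallback callback, void *callback_state)
  {
      /* deletes are already handled by triggers, nothing to clean up */
      if (stats == NULL)
          stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));

      stats->num_pages = 1;                      /* size of the in-memory index, in pages */
      stats->num_index_tuples = my_tuple_count;  /* assumed counter maintained by the AM */

      return stats;
  }

If you return NULL instead, vacuum simply skips the pg_class update and
the statistics report for this index, as described above.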

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Help needed in using 'on_dsm_detach' callback

2018-01-23 Thread Gaddam Sai Ram
Hello people,

  We are trying to build an in-memory index in postgres using DSA.



Here is how we implemented the DSA part:

1. We have a PROC_DSA_AREA global variable (a process-specific DSA area pointer).

2. We have a piece of traditional postgres shared memory to store the dsa_handle.

3. Each process that needs to use DSA creates/attaches the DSA area (based on the
   dsa_handle stored in shmem).

4. Once created/attached, the process stores the resulting dsa_area pointer in its
   PROC_DSA_AREA variable.

5. Subsequent DSA access in that process uses the PROC_DSA_AREA variable without
   trying to create/attach again.



The problem here is: on DSA detach, I would like to reset the PROC_DSA_AREA
variable to NULL. Otherwise, subsequent DSA calls try to use the stale pointer
and may end up in segmentation faults. How can I do that?

I found that there is a callback for DSM detach, but that function requires a
segment pointer as an argument. It should be something like:

on_dsm_detach(PROC_DSA_AREA->segment_maps[0].segment, detach_func);



** detach_func will set PROC_DSA_AREA variable to NULL.


But I couldn't access that segment, because the dsa_area struct is defined (and
therefore opaque) in dsa.c, so I cannot get at its members from outside.



Is there any other way to get a DSA detach event, or to access the DSA segment pointer?



I got stuck here; kindly help me to proceed further.
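
For reference, a minimal sketch of the attach-and-cache pattern described above
(names such as shared_state are illustrative only). Perhaps dsa_pin_mapping(),
which keeps the mapping alive for the life of the backend, would avoid the
stale-pointer problem altogether:

  #include "utils/dsa.h"

  static dsa_area *PROC_DSA_AREA = NULL;

  static dsa_area *
  get_proc_dsa_area(void)
  {
      if (PROC_DSA_AREA == NULL)
      {
          /* shared_state->handle is the dsa_handle kept in plain shmem */
          PROC_DSA_AREA = dsa_attach(shared_state->handle);

          /* keep the mapping until backend exit */
          dsa_pin_mapping(PROC_DSA_AREA);
      }
      return PROC_DSA_AREA;
  }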

Thank you,
G. Sai Ram



Re: [HACKERS] Subscription code improvements

2018-01-23 Thread Masahiko Sawada
On Tue, Jan 23, 2018 at 7:58 AM, Stephen Frost  wrote:
> Greetings,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Fri, Aug 18, 2017 at 2:09 PM, Masahiko Sawada  
>> wrote:
>> > On Thu, Aug 17, 2017 at 8:17 PM, Masahiko Sawada  
>> > wrote:
>> >> On Thu, Aug 17, 2017 at 9:10 AM, Peter Eisentraut
>> >>  wrote:
>> >>> On 8/8/17 05:58, Masahiko Sawada wrote:
>>  Are you planning to work on remaining patches 0005 and 0006 that
>>  improve the subscription codes in PG11 cycle? If not, I will take over
>>  them and work on the next CF.
>> >>>
>> >>> Based on your assessment, the remaining patches were not required bug
>> >>> fixes.  So I think preparing them for the next commit fest would be 
>> >>> great.
>> >>>
>> >>
>> >> Thank you for the comment.
>> >>
>> >> After more thought, since 0001 and 0003 patches on the first mail also
>> >> improve the subscription codes and are worth to be considered, I
>> >> picked total 4 patches up and updated them. I'm planning to work on
>> >> these patches in the next CF.
>> >>
>> >
>> > Added this item to the next commit fest.
>>
>> The patch set fails to apply. Please provide a rebased version. I am
>> moving this entry to next CF with waiting on author as status.
>
> Masahiko Sawada, this patch is still in Waiting on Author and hasn't
> progressed in a very long time.  Is there any chance you'll be able to
> provide an updated patch soon for review?  Or should this patch be
> closed out?
>

Thank you for the notification. Since it seems to me that no one is
interested in this patch, it would be better to close it out.
It is a refactoring patch, and the subscription code seems to work
fine so far. If a problem appears around subscriptions, I might
propose it again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



JIT compiling with LLVM v9.0

2018-01-23 Thread Andres Freund
Hi,

I've spent the last weeks working on my LLVM compilation patchset. In
the course of that I *heavily* revised it. While still a good bit away
from committable, it's IMO definitely not a prototype anymore.

There are too many small changes to list, so I'm only going to cover the major
things. A good bit of that is new. The actual LLVM IR emission itself
hasn't changed that drastically.  Since I've not described things in
detail before, I'll describe a few areas from scratch, even where they
haven't fully changed.


== JIT Interface ==

To avoid emitting code in very small increments (increases mmap/mremap
rw vs exec remapping, compile/optimization time), code generation
doesn't happen for every single expression individually, but in batches.

The basic object through which code is emitted is a JIT context, created with:
  extern LLVMJitContext *llvm_create_context(bool optimize);
which, in the case of expressions, is stored on demand in the EState. For other
use cases that might not be the right location.

To emit LLVM IR (i.e. the portable representation that LLVM then optimizes and
generates native code for), one gets a module from that context with:
  extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);

to which "arbitrary" numbers of functions can be added. In case of
expression evaluation, we get the module once for every expression, and
emit one function for the expression itself, and one for every
applicable/referenced deform function.

As explained above, we do not want to emit code immediately from within
ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed
expression sets its function pointer to a callback, which resolves the actual
native function on the first call.  That allows batching together the
generation of all native functions that are defined before the first
expression is evaluated - in a lot of queries that'll be all of them.

Said callback then calls
  extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
which'll emit code for the "in progress" mutable module if necessary,
and then searches all generated functions for the name. The generated
names are currently "evalexpr" and "deform", each with a generation and
counter suffix.

Currently, expressions which do not have access to an EState - basically
all "parent"-less expressions - aren't JIT compiled. That could be
changed, but so far I do not see a huge need.


== Error handling ==

There's two aspects to error handling.

Firstly, generated (LLVM IR) and emitted functions (mmap()ed segments)
need to be cleaned up both after a successful query execution and after
an error.  I've settled on a fairly boring resowner-based mechanism. On
error, all expressions owned by a resowner are released; upon success,
expressions are reassigned to the parent / released on commit (unless
executor shutdown has already cleaned them up, of course).


A second, less pretty and newly developed, aspect of error handling is
OOM handling inside LLVM itself. The above resowner based mechanism
takes care of cleaning up emitted code upon ERROR, but there's also the
chance that LLVM itself runs out of memory. LLVM by default does *not*
use any C++ exceptions. Its allocations are primarily funneled through
the standard "new" operator, plus some direct use of malloc() and
mmap(). For the former a 'new handler' exists
(http://en.cppreference.com/w/cpp/memory/new/set_new_handler); for the
latter LLVM provides callbacks that get called upon failure
(unfortunately mmap() failures are treated as fatal rather than OOM
errors).
What I've chosen to do, and I'd be interested to get some input about
that, is to have two functions that LLVM-using code must use:
  extern void llvm_enter_fatal_on_oom(void);
  extern void llvm_leave_fatal_on_oom(void);
Before interacting with LLVM (i.e. emitting IR, or using the above
functions), llvm_enter_fatal_on_oom() needs to be called.

When a libstdc++ new or LLVM error occurs, the handlers set up by the
above functions trigger a FATAL error. We have to use FATAL rather than
ERROR, as we *cannot* reliably throw ERROR inside a foreign library
without risking corrupting its internal state.

Users of the above sections do *not* have to use PG_TRY/CATCH blocks;
the handlers are instead reset at the toplevel sigsetjmp() level.


Using a relatively small enter/leave protected section of code, rather
than setting up these handlers globally, avoids negative interactions
with extensions that might use C++ like e.g. postgis. As LLVM code
generation should never execute arbitrary code, just setting these
handlers temporarily ought to suffice.
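
In practice call sites then look roughly like this (sketch only, using
just the two functions above):

  llvm_enter_fatal_on_oom();
  /* ... emit IR / call into the LLVM C-API here ... */
  llvm_leave_fatal_on_oom();

No PG_TRY/PG_CATCH is needed around the section; an allocation failure
inside it is turned into FATAL as described.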


== LLVM Interface / patches ==

Unfortunately some required LLVM functionality, particularly around
error handling but also initialization, isn't currently fully exposed
via LLVM's C API.  A bit more *optional* API isn't exposed either.

Instead of requiring a brand-new version of LLVM that has exposed this
functionality I decided it's better to have a 

Re: Doc tweak for huge_pages?

2018-01-23 Thread Justin Pryzby
On Wed, Jan 24, 2018 at 07:46:41AM +0100, Catalin Iacob wrote:
> I see Peter assigned himself as committer, some more information below
> for him to decide on the strength of the anti THP message.
Thanks for digging this up!

> And it would be good if somebody could run benchmarks on pre 4.6 and
> post 4.6 kernels. I would love to but have no access to big (or
> medium) hardware.
I should be able to do this, since I have a handful of kernel upgrades on my
todo list.  Can you recommend a test?  Otherwise I'll come up with something
for pgbench.

But I think any test should be independent of, and not influence, the doc change
(I don't know of anywhere else in the docs that talks about the behavior of specific
kernel versions, which often have vendor patches backpatched anyway).

> So maybe we should weaken the language against THP. Maybe present the
> known facts so far, even if the post 4.6 situation is vague/unknown:
> before Linux 4.6 there were repeated reports of THP problems with
> Postgres, Linux >= 4.6 might improve things but this isn't confirmed.
> And it would be good if somebody could run benchmarks on pre 4.6 and
> post 4.6 kernels. I would love to but have no access to big (or
> medium) hardware.
I think all the details should go elsewhere in the docs; config.sgml already
references this:
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-HUGE-PAGES
..but it doesn't currently mention "transparent" hugepages.

Justin



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2018-01-23 Thread amul sul
On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila  wrote:
> On Fri, Jan 12, 2018 at 11:43 AM, amul sul  wrote:
[]
> I have asked to change the message "tuple to be updated .." after
> heap_lock_tuple call not in nodeModifyTable.c, so please revert the
> message in nodeModifyTable.c.
>

Understood, fixed in the attached version, sorry I'd missed it.

> Have you verified the changes in execReplication.c in some way? It is
> not clear to me how do you ensure to set the special value
> (InvalidBlockNumber) in CTID for delete operation via logical
> replication?  The logical replication worker uses function
> ExecSimpleRelationDelete to perform Delete and there is no way it can
> pass the correct value of row_moved in heap_delete.  Am I missing
> something due to which we don't need to do this?
>

You are correct: from ExecSimpleRelationDelete, heap_delete will always
receive row_moved = false, so InvalidBlockNumber will never be set.
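
(To restate the mechanism under discussion: when a row is moved to another
partition, the old tuple version's ctid block number is set to the special
value, roughly as below. This is an illustration of the idea only, not the
patch itself; "tp" stands in for the old tuple's HeapTupleData.)

  if (row_moved)
      ItemPointerSetBlockNumber(&tp.t_data->t_ctid, InvalidBlockNumber);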

I couldn't find any test case that hits the changes in execReplication.c.  I am not
sure what we are supposed to do here, or how bad merely emitting a LOG message is.

What do you think - should we add an assert like the one in EvalPlanQualFetch(), or
is the current LOG message fine?

Thanks for the review.

Regards,
Amul Sul


0001-Invalidate-ip_blkid-v5-wip.patch
Description: Binary data


0002-isolation-tests-v3.patch
Description: Binary data


Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.

2018-01-23 Thread Kyotaro HORIGUCHI
Thank you for looking this.

At Wed, 24 Jan 2018 00:13:51 +0300, Sergei Kornilov  wrote in 
<348951516742...@web54j.yandex.ru>
> Hello
> I tested this patch and think it can be commited to master. Is there a CF 
> record? I can not find one.

Not yet. I'm thinking of creating an entry in the next CF after
waiting for someone to pick this up.

> Should we also make backport to older versions? I test on REL_10_STABLE - 
> patch builds and works ok, but "make check" fails on new testcase with error:
> >  CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
> >+ ERROR:  missing support function 4 for attribute 1 of index "t_a_a1_idx"
> and with different explain results.

Thank you for checking that. d3a4f89 allowed that, and
inet_gist_decompress was removed at the same time. Unfortunately I
couldn't find a type on master that has both decompress and fetch
methods, so I prefer to split the regression patch among the target
versions.

master has its own. 9.6 and 10 share the same patch.  9.5 has
a different one, since the signature of the consistent method
changed as of 9.6. This is not required for 9.4 and earlier, since
they don't check canreturn on a per-attribute basis. (And they don't
try index-only scans on GiST.)

On the other hand the fix patch applies to all affected versions.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 0e221b3396215d67167622cbc07d04b5185d6f66 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 24 Jan 2018 15:25:48 +0900
Subject: [PATCH] Regression test for the failure of check_index_only.

check_index_only forgets the case where two or more index columns on
the same table column but with different operator class.
---
 src/test/regress/expected/create_index.out | 42 ++
 src/test/regress/sql/create_index.sql  | 27 +++
 2 files changed, 69 insertions(+)

diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 9c0f3e3..d95157c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2913,6 +2913,48 @@ explain (costs off)
 (2 rows)
 
 --
+-- Check for duplicate indexkey with different opclasses
+--
+-- opclass that doesn't have function 9 (GIST_FETCH_PROC) for GiST
+CREATE OPERATOR CLASS test_inet_ops FOR TYPE inet USING gist AS
+    OPERATOR        3       &&,
+    FUNCTION        1       inet_gist_consistent (internal, inet, integer, oid, internal),
+    FUNCTION        2       inet_gist_union (internal, internal),
+    FUNCTION        3       inet_gist_compress (internal),
+    FUNCTION        4       inet_gist_decompress (internal),
+    FUNCTION        5       inet_gist_penalty (internal, internal, internal),
+    FUNCTION        6       inet_gist_picksplit (internal, internal),
+    FUNCTION        7       inet_gist_same (inet, inet, internal);
+CREATE TABLE t (a inet);
+CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
+INSERT INTO t VALUES ('192.168.0.1');
+VACUUM t;
+SELECT * FROM t WHERE a && '192.168.0.1'; -- should retuan a value
+  a  
+-
+ 192.168.0.1
+(1 row)
+
+-- enfoce GiST
+SET enable_seqscan TO false;
+SET enable_bitmapscan TO false;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a && '192.168.0.1'; -- shoudn't be index only scan
+QUERY PLAN
+--
+ Index Scan using t_a_a1_idx on t
+   Index Cond: (a && '192.168.0.1'::inet)
+(2 rows)
+
+SELECT * FROM t WHERE a && '192.168.0.1'; -- also should return a value
+  a  
+-
+ 192.168.0.1
+(1 row)
+
+-- cleanup
+DROP TABLE t;
+DROP OPERATOR CLASS test_inet_ops USING gist;
+--
 -- REINDEX (VERBOSE)
 --
 CREATE TABLE reindex_verbose(id integer primary key);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index b199c23..95a5060 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -982,6 +982,33 @@ explain (costs off)
   select * from tenk1 where (thousand, tenthous) in ((1,1001), (null,null));
 
 --
+-- Check for duplicate indexkey with different opclasses
+--
+-- opclass that doesn't have function 9 (GIST_FETCH_PROC) for GiST
+CREATE OPERATOR CLASS test_inet_ops FOR TYPE inet USING gist AS
+    OPERATOR        3       &&,
+    FUNCTION        1       inet_gist_consistent (internal, inet, integer, oid, internal),
+    FUNCTION        2       inet_gist_union (internal, internal),
+    FUNCTION        3       inet_gist_compress (internal),
+    FUNCTION        4       inet_gist_decompress (internal),
+    FUNCTION        5       inet_gist_penalty (internal, internal, internal),
+    FUNCTION        6       inet_gist_picksplit (internal, internal),
+    FUNCTION        7       inet_gist_same (inet, inet, internal);
+CREATE TABLE t (a inet);
+CREATE INDEX 

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 6:43 PM, Amit Kapila  wrote:
> On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro
>  wrote:
>> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila  wrote:
> I am going to repeat my previous suggest that we use a Barrier here.
> Given the discussion subsequent to my original proposal, this can be a
> lot simpler than what I suggested originally.  Each worker does
> BarrierAttach() before beginning to read tuples (exiting if the phase
> returned is non-zero) and BarrierArriveAndDetach() when it's done
> sorting.  The leader does BarrierAttach() before launching workers and
> BarrierArriveAndWait() when it's done sorting.
>>>
>>> How does leader detect if one of the workers does BarrierAttach and
>>> then fails (either exits or error out) before doing
>>> BarrierArriveAndDetach?
>>
>> If you attach and then exit cleanly, that's a programming error and
>> would cause anyone who runs BarrierArriveAndWait() to hang forever.
>>
>
> Right, but what if the worker dies due to something proc_exit(1) or
> something like that before calling BarrierArriveAndWait.  I think this
> is part of the problem we have solved in
> WaitForParallelWorkersToFinish such that if the worker exits abruptly
> at any point due to some reason, the system should not hang.

Actually what I said before is no longer true: after commit 2badb5af,
if you exit unexpectedly then the new ParallelWorkerShutdown() exit
hook delivers PROCSIG_PARALLEL_MESSAGE (apparently after detaching
from the error queue) and the leader aborts when it tries to read the
error queue.  I just hacked Parallel Hash like this:

BarrierAttach(build_barrier);
+   if (ParallelWorkerNumber == 0)
+   {
+   pg_usleep(100);
+   proc_exit(1);
+   }

Now I see:

postgres=# select count(*) from foox r join foox s on r.a = s.a;
ERROR:  lost connection to parallel worker

Using a debugger I can see the leader raising that error with this stack:

HandleParallelMessages at parallel.c:890
ProcessInterrupts at postgres.c:3053
ConditionVariableSleep(cv=0x00010a62e4c8,
wait_event_info=134217737) at condition_variable.c:151
BarrierArriveAndWait(barrier=0x00010a62e4b0,
wait_event_info=134217737) at barrier.c:191
MultiExecParallelHash(node=0x7ffcd9050b10) at nodeHash.c:312
MultiExecHash(node=0x7ffcd9050b10) at nodeHash.c:112
MultiExecProcNode(node=0x7ffcd9050b10) at execProcnode.c:502
ExecParallelHashJoin [inlined]
ExecHashJoinImpl(pstate=0x7ffcda01baa0, parallel='\x01') at
nodeHashjoin.c:291
ExecParallelHashJoin(pstate=0x7ffcda01baa0) at nodeHashjoin.c:582

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Doc tweak for huge_pages?

2018-01-23 Thread Catalin Iacob
On Tue, Jan 23, 2018 at 7:13 PM, Catalin Iacob  wrote:
> By the way, Fedora 27 does disable THP by default, they deviate from
> upstream in this regard:

> When I have some time I'll try to do some digging into history of the
> Fedora kernel package to see if they provide a rationale for changing
> the default. That might hint whether it's likely that future RHEL will
> change as well.

I see Peter assigned himself as committer, some more information below
for him to decide on the strength of the anti THP message.

commit 9a031d5070d9f8f5916c48637bd0c237cd52eaf9
Author: Josh Boyer 
Date:   Thu Mar 27 18:31:16 2014 -0400

Switch to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE instead of always on

The benefit of THP has been somewhat questionable overall for a while,
and it's been known to cause performance issues with some workloads.
Upstream also considers it to be overly complicated and really not worth
it on machines with memory in the amounts found on typical desktops/SMB
servers.

Switch to using it via madvise, which most applications that care about
it should likely already be doing.

Debian 9 also seems to default to madvise instead of always.

Digging more into it, there were changes in the 4.6 kernel (released
May 2016) that should improve THP, more precisely:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=444eb2a449ef36fe115431ed7b71467c4563c7f1

This also led Debian to change their default in September 2017 (so
for the future Debian release) back to always, referencing the 444eb2a449ef
improvements:
https://anonscm.debian.org/cgit/kernel/linux.git/commit/debian/changelog?id=611a8e67260e8b8190ab991206a3867681d6df91

Ben Hutchings 2017-09-29 14:32:09 (GMT)
thp: Enable TRANSPARENT_HUGEPAGE_ALWAYS instead of TRANSPARENT_HUGEPAGE_MADVISE
As advised by Andrea Arcangeli - since commit 444eb2a449ef "mm: thp:
set THP defrag by default to madvise and add a stall-free defrag
option" this will generally be best for performance.

So maybe we should weaken the language against THP. Maybe present the
known facts so far, even if the post 4.6 situation is vague/unknown:
before Linux 4.6 there were repeated reports of THP problems with
Postgres, Linux >= 4.6 might improve things but this isn't confirmed.
And it would be good if somebody could run benchmarks on pre 4.6 and
post 4.6 kernels. I would love to but have no access to big (or
medium) hardware.



Re: [HACKERS] taking stdbool.h into use

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 11:33:56AM -0500, Peter Eisentraut wrote:
> Here is a proposed patch set to clean this up.  First, add some test
> coverage for record_image_cmp.  (There wasn't any, only for
> record_image_eq as part of MV testing.)  Then, remove the GET_ macros
> from record_image_{eq,cmp}.  And finally remove all that byte-masking
> stuff from postgres.h.

Good catch. Coverage reports mark those areas as empty! Similarly the
functions for record_* are mostly not used. Some tests could be added
for them at the same time. The four error paths of those functions are
tested as well, which is cool to see. Even if the buildfarm explodes
after 0002 and 0003, 0001 is still a good addition. The set looks good
to me by the way.
--
Michael


signature.asc
Description: PGP signature


Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.

2018-01-23 Thread Andrey Borodin
Hi!
> On 24 Jan 2018, at 2:13, Sergei Kornilov wrote:
> 
> Should we also make backport to older versions? I test on REL_10_STABLE - 
> patch builds and works ok, but "make check" fails on new testcase with error:
>> CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
>> + ERROR:  missing support function 4 for attribute 1 of index "t_a_a1_idx"
> and with different explain results.

To be usable for 10, the patch must use a full-featured opclass in its tests:
test_inet_ops has no GiST decompress function, which was required before 11.
I think the explain results will then be identical.

Best regards, Andrey Borodin.


Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 9:38 PM, Amit Kapila  wrote:
> On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan  wrote:
>> The leader can go ahead and sort before calling something like a new
>> WaitForParallelWorkersToAttach() function (or even
>> WaitForParallelWorkersToFinish()). If we did add a
>> WaitForParallelWorkersToAttach() function, then the performance hit
>> would probably not be noticeable in practice. The
>> parallel_leader_participates case would still work without special
>> care (that's the main hazard that makes using a barrier seem
>> unappealing).
>>
>
> +1. I also think so.  Another advantage is that even if one of the
> workers is not able to start, we don't need to return an error at the
> end of the query, rather we can detect that situation early and can
> execute the query successfully.  OTOH, if we are not convinced about
> performance implications, then WaitForParallelWorkersToFinish should
> serve the purpose which can be called at later point of time.

There is another disadvantage to just using
WaitForParallelWorkersToFinish() to wait for nbtsort.c workers to
finish (and not inventing a WaitForParallelWorkersToAttach() function,
and calling that instead), which is: there can be no custom wait event
that specifically mentions parallel CREATE INDEX.

-- 
Peter Geoghegan



Re: Incorrect comments in partition.c

2018-01-23 Thread Etsuro Fujita
(2018/01/24 13:06), Amit Langote wrote:
> On 2018/01/23 20:43, Etsuro Fujita wrote:
>> Here is a comment for get_qual_for_list in partition.c:
>>
>>* get_qual_for_list
>>*
>>* Returns an implicit-AND list of expressions to use as a list partition's
>> - * constraint, given the partition key and bound structures.
>>
>> I don't think the part "given the partition key and bound structures."
>> is correct because we pass the *parent relation* and partition bound
>> structure to that function.  So I think we should change that part as
>> such.  get_qual_for_range has the same issue, so I think we need this
>> change for that function as well.
> 
> Yeah.  It seems 6f6b99d1335 [1] changed their signatures to support
> handling default partitions.
> 
>> Another one I noticed in comments in partition.c is:
>>
>>   * get_qual_for_hash
>>   *
>>   * Given a list of partition columns, modulus and remainder
>> corresponding to a
>>   * partition, this function returns CHECK constraint expression Node for
>> that
>>   * partition.
>>
>> I think the part "Given a list of partition columns, modulus and
>> remainder corresponding to a partition" is wrong because we pass to that
>> function the parent relation and partition bound structure the same way
>> as for get_qual_for_list/get_qual_for_range.  So what about changing the
>> above to something like this, similarly to
>> get_qual_for_list/get_qual_for_range:
>>
>>   Returns a CHECK constraint expression to use as a hash partition's
>>   constraint, given the parent relation and partition bound structure.
> 
> Makes sense.
> 
>> Also:
>>
>>   * The partition constraint for a hash partition is always a call to the
>>   * built-in function satisfies_hash_partition().  The first two
>> arguments are
>>   * the modulus and remainder for the partition; the remaining arguments
>> are the
>>   * values to be hashed.
>>
>> I also think the part "The first two arguments are the modulus and
>> remainder for the partition;" is wrong (see satisfies_hash_partition).
>> But I don't think we need to correct that here because we have described
>> about the arguments in comments for that function.  So I'd like to
>> propose removing the latter comment entirely from the above.
> 
> Here, too.  The comment seems to have been obsoleted by f3b0897a121 [2].

Thanks for the review and related git info!

Best regards,
Etsuro Fujita



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro
 wrote:
> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila  wrote:
 I am going to repeat my previous suggest that we use a Barrier here.
 Given the discussion subsequent to my original proposal, this can be a
 lot simpler than what I suggested originally.  Each worker does
 BarrierAttach() before beginning to read tuples (exiting if the phase
 returned is non-zero) and BarrierArriveAndDetach() when it's done
 sorting.  The leader does BarrierAttach() before launching workers and
 BarrierArriveAndWait() when it's done sorting.
>>
>> How does leader detect if one of the workers does BarrierAttach and
>> then fails (either exits or error out) before doing
>> BarrierArriveAndDetach?
>
> If you attach and then exit cleanly, that's a programming error and
> would cause anyone who runs BarrierArriveAndWait() to hang forever.
>

Right, but what if the worker dies due to something like proc_exit(1)
before calling BarrierArriveAndWait?  I think this
is part of the problem we have solved in
WaitForParallelWorkersToFinish, such that if the worker exits abruptly
at any point for some reason, the system should not hang.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro
>  wrote:
>>> Yes, this is what I am trying to explain on parallel create index
>>> thread.  I think there we need to either use
>>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>>> new API as proposed in that thread) if we don't want to use barriers.
>>> I see a minor disadvantage in using WaitForParallelWorkersToFinish
>>> which I will say on that thread.
>>
>> Ah, I see.  So if you wait for them to attach you can detect
>> unexpected dead workers (via shm_mq_receive), at the cost of having
>> the leader wasting time waiting around for forked processes to say
>> hello when it could instead be sorting tuples.
>
> The leader can go ahead and sort before calling something like a new
> WaitForParallelWorkersToAttach() function (or even
> WaitForParallelWorkersToFinish()). If we did add a
> WaitForParallelWorkersToAttach() function, then the performance hit
> would probably not be noticeable in practice. The
> parallel_leader_participates case would still work without special
> care (that's the main hazard that makes using a barrier seem
> unappealing).
>

+1. I also think so.  Another advantage is that even if one of the
workers is not able to start, we don't need to return an error at the
end of the query; rather, we can detect that situation early and still
execute the query successfully.  OTOH, if we are not convinced about
the performance implications, then WaitForParallelWorkersToFinish,
which can be called at a later point in time, should serve the purpose.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Possible performance regression in version 10.1 with pgbench read-write tests.

2018-01-23 Thread Mithun Cy
On Wed, Jan 24, 2018 at 7:36 AM, Amit Kapila  wrote:

> Both the cases look identical, but from the document attached, it
> seems the case-1 is for scale factor 300.

Oops, sorry, it was a typo. CASE 1 is scale factor 300, which will fit in
shared_buffers = 8GB.



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro
 wrote:
>> Yes, this is what I am trying to explain on parallel create index
>> thread.  I think there we need to either use
>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>> new API as proposed in that thread) if we don't want to use barriers.
>> I see a minor disadvantage in using WaitForParallelWorkersToFinish
>> which I will say on that thread.
>
> Ah, I see.  So if you wait for them to attach you can detect
> unexpected dead workers (via shm_mq_receive), at the cost of having
> the leader wasting time waiting around for forked processes to say
> hello when it could instead be sorting tuples.

The leader can go ahead and sort before calling something like a new
WaitForParallelWorkersToAttach() function (or even
WaitForParallelWorkersToFinish()). If we did add a
WaitForParallelWorkersToAttach() function, then the performance hit
would probably not be noticeable in practice. The
parallel_leader_participates case would still work without special
care (that's the main hazard that makes using a barrier seem
unappealing).
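
Spelling out the ordering (sketch only; WaitForParallelWorkersToAttach()
is the proposed, not-yet-existing API, and leader_state is illustrative):

  /* leader, after setting up shared state */
  LaunchParallelWorkers(pcxt);

  /* do the leader's own share of the sorting first ... */
  tuplesort_performsort(leader_state);

  /*
   * ... and only then block until every launched worker has either
   * attached or been detected as failed.
   */
  WaitForParallelWorkersToAttach(pcxt);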

-- 
Peter Geoghegan



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila  wrote:
>>> I am going to repeat my previous suggest that we use a Barrier here.
>>> Given the discussion subsequent to my original proposal, this can be a
>>> lot simpler than what I suggested originally.  Each worker does
>>> BarrierAttach() before beginning to read tuples (exiting if the phase
>>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>>> sorting.  The leader does BarrierAttach() before launching workers and
>>> BarrierArriveAndWait() when it's done sorting.
>
> How does leader detect if one of the workers does BarrierAttach and
> then fails (either exits or error out) before doing
> BarrierArriveAndDetach?

If you attach and then exit cleanly, that's a programming error and
would cause anyone who runs BarrierArriveAndWait() to hang forever.
If you attach and raise an error, the leader will receive an error
message via CFI() and will then raise an error itself and terminate
all workers during cleanup.
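
For reference, the protocol being discussed would look roughly like this
(sketch only; build_barrier is assumed to live in the shared sort state,
set up with BarrierInit(&shared->build_barrier, 0), and the wait event is
illustrative):

  /* worker */
  if (BarrierAttach(&shared->build_barrier) > 0)
      proc_exit(0);       /* sorting phase already over, nothing to do */
  /* ... feed tuples into this worker's tuplesort ... */
  BarrierArriveAndDetach(&shared->build_barrier);

  /* leader */
  BarrierAttach(&shared->build_barrier);
  LaunchParallelWorkers(pcxt);
  /* ... leader sorts its own share ... */
  BarrierArriveAndWait(&shared->build_barrier, WAIT_EVENT_PARALLEL_FINISH);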

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 5:43 PM, Amit Kapila  wrote:
>> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just
>> confused and managed to confuse Robert too.  If you make
>> fork_process() fail randomly (see attached), I see that there are a
>> couple of easily reachable failure modes (example session at bottom of
>> message):
>>
>
> In short, we are good with committed code. Right?

Yep.  Sorry for the noise.

> Yes, this is what I am trying to explain on parallel create index
> thread.  I think there we need to either use
> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
> new API as proposed in that thread) if we don't want to use barriers.
> I see a minor disadvantage in using WaitForParallelWorkersToFinish
> which I will say on that thread.

Ah, I see.  So if you wait for them to attach you can detect
unexpected dead workers (via shm_mq_receive), at the cost of having
the leader wasting time waiting around for forked processes to say
hello when it could instead be sorting tuples.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 12:20 AM, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas  wrote:
>> As Amit says, what remains is the case where fork() fails or the
>> worker dies before it reaches the line in ParallelWorkerMain that
>> reads shm_mq_set_sender(mq, MyProc).  In those cases, no error will be
>> signaled until you call WaitForParallelWorkersToFinish().  If you wait
>> prior to that point for a number of workers equal to
>> nworkers_launched, you will wait forever in those cases.
>
> Another option might be to actually call
> WaitForParallelWorkersToFinish() in place of a condition variable or
> barrier, as Amit suggested at one point.
>

Yes, the only thing that is slightly worrying about using
WaitForParallelWorkersToFinish is that the leader backend needs to wait
for the workers to finish rather than just finishing the sort-related work.  I
think there shouldn't be much difference between when the sort is done
and when the workers actually finish the remaining resource cleanup.
OTOH, if we are not okay with this solution and want to go
with some kind of barrier usage to solve this problem, then we can
evaluate that as well, but I feel it is better if we can use the
method that the other parallelism code uses to solve this problem
(which is to use WaitForParallelWorkersToFinish).

>> I am going to repeat my previous suggest that we use a Barrier here.
>> Given the discussion subsequent to my original proposal, this can be a
>> lot simpler than what I suggested originally.  Each worker does
>> BarrierAttach() before beginning to read tuples (exiting if the phase
>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>> sorting.  The leader does BarrierAttach() before launching workers and
>> BarrierArriveAndWait() when it's done sorting.
>>

How does leader detect if one of the workers does BarrierAttach and
then fails (either exits or error out) before doing
BarrierArriveAndDetach?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 9:55 AM, Thomas Munro
 wrote:
> On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila  wrote:
>> On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas  wrote:
>>> In the case of Gather, for example, we won't
>>> WaitForParallelWorkersToFinish() until we've read all the tuples.  If
>>> there's a tuple queue that does not yet have a sender, then we'll just
>>> wait for it to get one, even if the sender fell victim to a fork
>>> failure and is never going to show up.
>>>
>>
>> Hmm, I think that case will be addressed because tuple queues can
>> detect if the leader is not attached.  It does in code path
>> shm_mq_receive->shm_mq_counterparty_gone.  In
>> shm_mq_counterparty_gone, it can detect if the worker is gone by using
>> GetBackgroundWorkerPid.  Moreover, I have manually tested this
>> particular case before saying your patch is fine.  Do you have some
>> other case in mind which I am missing?
>
> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just
> confused and managed to confuse Robert too.  If you make
> fork_process() fail randomly (see attached), I see that there are a
> couple of easily reachable failure modes (example session at bottom of
> message):
>

In short, we are good with committed code. Right?

> 1.  HandleParallelMessages() is reached and raises a "lost connection
> to parallel worker" error because shm_mq_receive() returns
> SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked
> GetBackgroundWorkerPid() just as you said.  I guess that's happening
> because some other process is (coincidentally) sending
> PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a
> process is unexpectedly stopped.
>
> 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel
> worker failed to initialize" error.  TupleQueueReaderNext() set done
> to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once
> again, that is because shm_mq_counterparty_gone() returned true.  This
> is the bit Robert and I missed in our off-list discussion.
>
> As long as we always get our latch set by the postmaster after a fork
> failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is
> guaranteed to return BGWH_STOPPED after that, and as long as we only
> ever use latch/CFI loops to wait, and as long as we try to read from a
> shm_mq, then I don't see a failure mode that hangs.
>
> This doesn't help a parallel.c client that doesn't try to read from a
> shm_mq though, like parallel CREATE INDEX (using the proposed design
> without dynamic Barrier).
>

Yes, this is what I am trying to explain on parallel create index
thread.  I think there we need to either use
WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
new API as proposed in that thread) if we don't want to use barriers.
I see a minor disadvantage in using WaitForParallelWorkersToFinish
which I will say on that thread.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 10:03 AM, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro
>  wrote:
>>> Hmm, I think that case will be addressed because tuple queues can
>>> detect if the leader is not attached.  It does in code path
>>> shm_mq_receive->shm_mq_counterparty_gone.  In
>>> shm_mq_counterparty_gone, it can detect if the worker is gone by using
>>> GetBackgroundWorkerPid.  Moreover, I have manually tested this
>>> particular case before saying your patch is fine.  Do you have some
>>> other case in mind which I am missing?
>>
>> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just
>> confused and managed to confuse Robert too.  If you make
>> fork_process() fail randomly (see attached), I see that there are a
>> couple of easily reachable failure modes (example session at bottom of
>> message):
>>
>> 1.  HandleParallelMessages() is reached and raises a "lost connection
>> to parallel worker" error because shm_mq_receive() returns
>> SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked
>> GetBackgroundWorkerPid() just as you said.  I guess that's happening
>> because some other process is (coincidentally) sending
>> PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a
>> process is unexpectedly stopped.
>>
>> 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel
>> worker failed to initialize" error.  TupleQueueReaderNext() set done
>> to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once
>> again, that is because shm_mq_counterparty_gone() returned true.  This
>> is the bit Robert and I missed in our off-list discussion.
>>
>> As long as we always get our latch set by the postmaster after a fork
>> failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is
>> guaranteed to return BGWH_STOPPED after that, and as long as we only
>> ever use latch/CFI loops to wait, and as long as we try to read from a
>> shm_mq, then I don't see a failure mode that hangs.
>
> What about the parallel_leader_participation=off case?
>

There is nothing special about that case; there shouldn't be any
problem as long as we can detect worker failures appropriately.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro
 wrote:
>> Hmm, I think that case will be addressed because tuple queues can
>> detect if the leader is not attached.  It does in code path
>> shm_mq_receive->shm_mq_counterparty_gone.  In
>> shm_mq_counterparty_gone, it can detect if the worker is gone by using
>> GetBackgroundWorkerPid.  Moreover, I have manually tested this
>> particular case before saying your patch is fine.  Do you have some
>> other case in mind which I am missing?
>
> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just
> confused and managed to confuse Robert too.  If you make
> fork_process() fail randomly (see attached), I see that there are a
> couple of easily reachable failure modes (example session at bottom of
> message):
>
> 1.  HandleParallelMessages() is reached and raises a "lost connection
> to parallel worker" error because shm_mq_receive() returns
> SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked
> GetBackgroundWorkerPid() just as you said.  I guess that's happening
> because some other process is (coincidentally) sending
> PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a
> process is unexpectedly stopped.
>
> 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel
> worker failed to initialize" error.  TupleQueueReaderNext() set done
> to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once
> again, that is because shm_mq_counterparty_gone() returned true.  This
> is the bit Robert and I missed in our off-list discussion.
>
> As long as we always get our latch set by the postmaster after a fork
> failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is
> guaranteed to return BGWH_STOPPED after that, and as long as we only
> ever use latch/CFI loops to wait, and as long as we try to read from a
> shm_mq, then I don't see a failure mode that hangs.

What about the parallel_leader_participation=off case?

-- 
Peter Geoghegan



Regarding ambulkdelete, amvacuumcleanup index methods

2018-01-23 Thread Abinaya k
Hi all,
  We are building an in-memory index extension for postgres. We
capture table inserts, updates, and deletes using triggers. During vacuum
operation, postgres calls ambulkdelete and amvacuumcleanup (as
part of index cleanup). As we handle all updates and deletes using triggers,
we don't have to do any index cleanup in ambulkdelete. But what stats
should I return from ambulkdelete and amvacuumcleanup? Is it necessary to
return stats from them at all?


Re: PATCH: Configurable file mode mask

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 10:48:07PM -0500, David Steele wrote:
> Sorry - it means "level of effort".  I was trying to get an idea if it
> is something that could be available in the PG11 development cycle, or
> if I should be looking at other ways to get the testing for this patch
> completed.

I don't think that it would be more than a couple of hours of digging
things out, but that may be optimistic. All that needs to be done is
extending get_new_node with an optional argument to define the bin
directory of a PostgresNode and registering the path. If the binary
directory is undefined, just rely on PATH and call pg_config --bindir to
get the information, then register it. What we need to be careful about is
that all the binary calls in PostgresNode.pm need to be changed to use
the binary path, which is not really complicated, but it is easy to
miss some.

There is one small issue with TestLib::check_pg_config but I think that
we can live with the existing version relying on PATH. Once this
refactoring is done, you need the patch set I sent previously here with
some tiny modifications:
https://www.postgresql.org/message-id/CAB7nPqRJXz0sEuUL36eBsF7iZtOQGMJoJPGFWxHLuS6TYPxf5w%40mail.gmail.com

Those modifications involve just looking at the environment variables
specified in pg_upgrade's TESTING and change the node binaries if those
are defined. It is also necessary to tweak probin's paths for the dump
consistency checks, which is something that my last patch set was not
doing.

It is tiring to see this topic come up again and again, so you know
what, let's bite the bullet. I commit to coding this thing as
a patch set for the next commit fest. The only thing I want to be sure
about is whether folks on this list are fine with the plan in this email.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila  wrote:
> On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas  wrote:
>> In the case of Gather, for example, we won't
>> WaitForParallelWorkersToFinish() until we've read all the tuples.  If
>> there's a tuple queue that does not yet have a sender, then we'll just
>> wait for it to get one, even if the sender fell victim to a fork
>> failure and is never going to show up.
>>
>
> Hmm, I think that case will be addressed because tuple queues can
> detect if the leader is not attached.  It does in code path
> shm_mq_receive->shm_mq_counterparty_gone.  In
> shm_mq_counterparty_gone, it can detect if the worker is gone by using
> GetBackgroundWorkerPid.  Moreover, I have manually tested this
> particular case before saying your patch is fine.  Do you have some
> other case in mind which I am missing?

Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just
confused and managed to confuse Robert too.  If you make
fork_process() fail randomly (see attached), I see that there are a
couple of easily reachable failure modes (example session at bottom of
message):

1.  HandleParallelMessages() is reached and raises a "lost connection
to parallel worker" error because shm_mq_receive() returns
SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked
GetBackgroundWorkerPid() just as you said.  I guess that's happening
because some other process is (coincidentally) sending
PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a
process is unexpectedly stopped.

2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel
worker failed to initialize" error.  TupleQueueReaderNext() set done
to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once
again, that is because shm_mq_counterparty_gone() returned true.  This
is the bit Robert and I missed in our off-list discussion.

As long as we always get our latch set by the postmaster after a fork
failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is
guaranteed to return BGWH_STOPPED after that, and as long as we only
ever use latch/CFI loops to wait, and as long as we try to read from a
shm_mq, then I don't see a failure mode that hangs.

This doesn't help a parallel.c client that doesn't try to read from a
shm_mq though, like parallel CREATE INDEX (using the proposed design
without dynamic Barrier).  We can't rely on a coincidental
PROCSIG_PARALLEL_MESSAGE like in (1), and it's not going to try to
read from a queue like in (2).  So wouldn't it need a way to perform
its own similar check for unexpectedly BGWH_STOPPED processes whenever
its latch is set?

If there were some way for the postmaster to cause reason
PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
just notification via kill(SIGUSR1) when it fails to fork a parallel
worker, we'd get (1) for free in any latch/CFI loop code.  But I
understand that we can't do that by project edict.

===

postgres=# create table foox as select generate_series(1, 1) as a;
SELECT 1
postgres=# alter table foox set (parallel_workers = 4);
ALTER TABLE
postgres=# set max_parallel_workers_per_gather = 4;
SET
postgres=# select count(*) from foox;
ERROR:  lost connection to parallel worker
postgres=# select count(*) from foox;
ERROR:  parallel worker failed to initialize
HINT:  More details may be available in the server log.
postgres=# select count(*) from foox;
 count
---
 1
(1 row)

postgres=# select count(*) from foox;
ERROR:  lost connection to parallel worker
postgres=# select count(*) from foox;
ERROR:  lost connection to parallel worker
postgres=# select count(*) from foox;
ERROR:  lost connection to parallel worker
postgres=# select count(*) from foox;
ERROR:  lost connection to parallel worker

-- 
Thomas Munro
http://www.enterprisedb.com


chaos-monkey-fork-process.patch
Description: Binary data


Re: documentation is now XML

2018-01-23 Thread Tom Lane
"David G. Johnston"  writes:
> On Tuesday, January 23, 2018, Bruce Momjian  wrote:
>> All agreed, but what alternatives are being developed?

> I seem to recall a proposal a while back to gain margin on some of the
> limits by pruning the release notes section down to at least this century
> and archiving putting the older ones elsewhere.

Yeah; I did and still do think that's a good idea.  But so far as the
toolchain is concerned, that's just a band-aid.

Anyway, we're on XML now, and it seems to be working fairly well.
I don't feel a need to revisit that.  It's probably true that the
TeX-based toolchain was potentially capable of producing finer
typesetting results than the XML chain ... but, honestly, who's
printing the PG manual on dead trees anymore?  I find the PDF output
to be mostly a legacy thing in the first place.

regards, tom lane



Re: [HACKERS] MERGE SQL Statement for PG11

2018-01-23 Thread Simon Riggs
On 24 January 2018 at 01:35, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 5:51 AM, Simon Riggs  wrote:
>>> Not yet working
>>> * Partitioning
>>> * RLS
>>>
>>> Based on this successful progress I imagine I'll be looking to commit
>>> this by the end of the CF, allowing us 2 further months to bugfix.
>>
>> This is complete and pretty clean now. 1200 lines of code, plus docs and 
>> tests.
>
> That timeline seems aggressive to me.

Thank you for the review.

> Also, the patch appears to have
> bitrot. Please rebase, and post a new version.

Will do, though I'm sure that's only minor since we rebased only a few days ago.

> Some feedback based on a short read-through of your v11:
>
> * What's the postgres_fdw status?

MERGE currently works with normal relations and materialized views only.

> * Relatedly, when/where does applying the junkfilter happen, if not here?:

A few lines later in the same file. Junkfilters are still used.

> * Isn't "consider INSERT ... ON CONFLICT DO UPDATE" obsolete in these
> doc changes?:
>
>> -F312   MERGE statement NO  consider INSERT ... ON CONFLICT DO UPDATE
>> -F313   Enhanced MERGE statementNO
>> -F314   MERGE statement with DELETE branch  NO
>> +F312   MERGE statement YES consider INSERT ... ON CONFLICT DO UPDATE
>> +F313   Enhanced MERGE statementYES
>> +F314   MERGE statement with DELETE branch  YES

I think that it is still useful to highlight the existence of a
non-standard feature of which people might otherwise be unaware.

If you wish me to remove a reference to your work, I will bow to your wish.

> * What's the deal with transition tables? Your docs say this:
>
>> +   <varlistentry>
>> +    <term><replaceable class="parameter">source_table_name</replaceable></term>
>> +    <listitem>
>> +     <para>
>> +      The name (optionally schema-qualified) of the source table, view or
>> +      transition table.
>> +     </para>
>> +    </listitem>
>> +   </varlistentry>
>
> But the code says this:
>
>> +   /*
>> +* XXX if we support transition tables this would need to move 
>> earlier
>> +* before ExecSetupTransitionCaptureState()
>> +*/
>> +   switch (action->commandType)

Changes made by MERGE can theoretically be visible in a transition
table. When I tried to implement that there were problems caused by
the way INSERT ON CONFLICT has been implemented, so it will take a
fair amount of study and lifting to make that work. For now, they are
not supported and generate an error message. That behaviour was not
documented, but I've done that now.

SQL Std says "Transition tables are not generally updatable (and
therefore not simply updatable) and their columns are not updatable."

So MERGE cannot be a target of a transition table, but it can be the
source of one. So the docs you cite are correct, as is the comment.

> * Can UPDATEs themselves accept an UPDATE-style WHERE clause, in
> addition to the WHEN quals, and the main ON join quals?
>
> I don't see any examples of this in your regression tests.

No, they can't. Oracle allows it but it isn't SQL Standard.

> * You added a new MERGE command tag. Shouldn't there be changes to
> libpq's fe-exec.c, to go along with that?

Nice catch. One liner patch and docs updated in repo for next patch.

> * I noticed this:
>
>> + else if (node->operation == CMD_MERGE)
>> + {
>> + /*
>> + * XXX Add more detailed instrumentation for MERGE changes
>> + * when running EXPLAIN ANALYZE?
>> + */
>> + }
>
> I think that doing better here is necessary.

Please define better. It may be possible.

> * I'm not a fan of stored rules, but I still think that this needs to
> be discussed:
>
>> -   product_queries = fireRules(parsetree,
>> +   /*
>> +* First rule of MERGE club is we don't talk about rules
>> +*/
>> +   if (event == CMD_MERGE)
>> +   product_queries = NIL;
>> +   else
>> +   product_queries = fireRules(parsetree,
>> result_relation,
>> event,
>> locks,

It has been discussed in my recent proposal, and it is mentioned in the doc
page for clarity.

It has also been discussed previously in discussions around MERGE.
It's not clear how they would work, but we already have an example of
a statement type that doesn't support rules.

> * This seems totally unnecessary at first blush:
>
>> +   /*
>> +* Lock tuple for update.
>> +*
>> +* XXX Is this really needed? I put this in
>> +* just to get hold of the existing tuple.
>> +* But if we do need, then we probably
>> +* should be looking at the return value of
>> +* heap_lock_tuple() and take appropriate
>> +* action.
>> +

Re: [PATCH] fix for C4141 warning on MSVC

2018-01-23 Thread Tom Lane
Thomas Munro  writes:
> Here's one like that.

Pushed; we'll soon see if the buildfarm likes it.  I added a tweak to
prevent forced inlining at -O0, as discussed in the other thread;
and worked on the comments a bit.

regards, tom lane



Re: Incorrect comments in partition.c

2018-01-23 Thread Amit Langote
On 2018/01/23 20:43, Etsuro Fujita wrote:
> Here is a comment for get_qual_for_list in partition.c:
> 
>   * get_qual_for_list
>   *
>   * Returns an implicit-AND list of expressions to use as a list partition's
> - * constraint, given the partition key and bound structures.
> 
> I don't think the part "given the partition key and bound structures."
> is correct because we pass the *parent relation* and partition bound
> structure to that function.  So I think we should change that part as
> such.  get_qual_for_range has the same issue, so I think we need this
> change for that function as well.

Yeah.  It seems 6f6b99d1335 [1] changed their signatures to support
handling default partitions.

> Another one I noticed in comments in partition.c is:
> 
>  * get_qual_for_hash
>  *
>  * Given a list of partition columns, modulus and remainder
> corresponding to a
>  * partition, this function returns CHECK constraint expression Node for
> that
>  * partition.
> 
> I think the part "Given a list of partition columns, modulus and
> remainder corresponding to a partition" is wrong because we pass to that
> function the parent relation and partition bound structure the same way
> as for get_qual_for_list/get_qual_for_range.  So what about changing the
> above to something like this, similarly to
> get_qual_for_list/get_qual_for_range:
> 
>  Returns a CHECK constraint expression to use as a hash partition's
>  constraint, given the parent relation and partition bound structure.

Makes sense.

> Also:
> 
>  * The partition constraint for a hash partition is always a call to the
>  * built-in function satisfies_hash_partition().  The first two
> arguments are
>  * the modulus and remainder for the partition; the remaining arguments
> are the
>  * values to be hashed.
> 
> I also think the part "The first two arguments are the modulus and
> remainder for the partition;" is wrong (see satisfies_hash_partition).
> But I don't think we need to correct that here because we have described
> about the arguments in comments for that function.  So I'd like to
> propose removing the latter comment entirely from the above.

Here, too.  The comment seems to have been obsoleted by f3b0897a121 [2].

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f6b99d1335

[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f3b0897a121




Re: PATCH: Configurable file mode mask

2018-01-23 Thread David Steele
On 1/23/18 9:22 PM, Michael Paquier wrote:
> On Tue, Jan 23, 2018 at 09:18:51AM -0500, David Steele wrote:
>> On 1/20/18 5:47 PM, Michael Paquier wrote:
>>> Making this possible would require first some
>>> refactoring of PostgresNode.pm so as a node is aware of the binary paths
>>> it uses to be able to manipulate multiple instances with different
>>> install paths at the same time.
>>
>> Any idea of what the LOE would be for that?
> 
> What does LOE means?

Sorry - it means "level of effort".  I was trying to get an idea if it
is something that could be available in the PG11 development cycle, or
if I should be looking at other ways to get the testing for this patch
completed.

-- 
-David
da...@pgmasters.net





Re: documentation is now XML

2018-01-23 Thread David G. Johnston
On Tuesday, January 23, 2018, Bruce Momjian  wrote:

> On Tue, Jan 23, 2018 at 10:22:53PM -0500, Tom Lane wrote:
> (a) it's got hard
> > limits we're approaching,



> All agreed, but what alternatives are being developed?
>
>
I seem to recall a proposal a while back to gain margin on some of the
limits by pruning the release notes section down to at least this century
and putting the older ones in an archive elsewhere.

David J.


Re: documentation is now XML

2018-01-23 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Nov 23, 2017 at 03:39:24PM -0500, Tom Lane wrote:
>> Also, we're way overdue for getting out from under the creaky TeX-based
>> toolchain for producing PDFs.

> I am coming in late here, but I am not aware of any open source
> professional typesetting software that has output quality as good as
> TeX.

I like TeX as much as the next guy --- I wrote my thesis with it,
a long time ago --- but there's no denying that (a) it's got hard
limits we're approaching, (b) the downstream conversion to PDF is
buggy, and (c) nobody is working on fixing it.

regards, tom lane



Re: copy.c allocation constant

2018-01-23 Thread Bruce Momjian
On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote:
> 
> 
> While reading copy.c I noticed this line:
> 
> 
> #define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
> 
> 
> Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
> 65535 so we palloc a power of 2?
> 
> 
> I have no idea if this affects performance, but it did strike me as strange.

Coming in late here, but it does seem very odd.
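
To make the comparison concrete, here is a tiny sketch of the two options
(illustration only; the _CURRENT/_SUGGESTED names are made up and the palloc
call is paraphrased from copy.c):

#include "postgres.h"

/* as it stands: the palloc'd size is 65537 bytes, just past a power of two */
#define RAW_BUF_SIZE_CURRENT	65536

/* the suggestion above: palloc exactly 65536 bytes, a power of two */
#define RAW_BUF_SIZE_SUGGESTED	65535

static char *
load_raw_buf(void)
{
	/* copy.c allocates one extra byte so the buffer can be NUL-terminated */
	return (char *) palloc(RAW_BUF_SIZE_CURRENT + 1);
}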

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: Allow UPDATE to move rows between partitions.

2018-01-23 Thread Amit Kapila
On Tue, Jan 23, 2018 at 9:59 PM, Robert Haas  wrote:
> On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapila  wrote:
>> On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas  wrote:
>>> Allow UPDATE to move rows between partitions.
>>
>> +If an UPDATE on a partitioned table causes a row to 
>> move
>> +to another partition, it will be performed as a 
>> DELETE
>> +from the original partition followed by an INSERT 
>> into
>> +the new partition. In this case, all row-level BEFORE
>> +UPDATE triggers and all row-level
>> +BEFORE DELETE triggers are fired 
>> on
>> +the original partition.
>>
>> Do we need to maintain triggers related behavior for logical
>> replication?  In logical replication, we use ExecSimpleRelationDelete
>> to perform Delete operation which is not aware of this special
>> behavior (execute before update trigger for this case).
>
> Hmm.  I don't think there's any way for the logical decoding
> infrastructure to identify this case at present.  I suppose if we want
> that behavior, we'd need to modify the WAL format, and the changes
> might not be too straightforward because the after-image of the tuple
> wouldn't be available in the DELETE record.  I think the only reason
> we fire the UPDATE triggers is because we can't decide until after
> they've finished executing that we really want to DELETE and INSERT
> instead; by the time we are replicating the changes, we know what the
> final shape of the operation ended up being, so it's not clear to me
> that firing UPDATE triggers at that point would be useful.  I fear
> that trying to change this is going to cost performance (and developer
> time) for no real benefit, so my gut feeling is to leave it alone.
>

I think the problem is not only about trigger behavior but also about the
pending patch which takes care of concurrent updates/deletes.  I think
that in the case of logical replication the logical worker won't be able
to detect that the row has been moved to another partition while
replaying the actions done on the master.  See my response to Amul's
patch [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LHVnNWYF53F1gUGx6CTxuvznozvU-Lr-dfE%3DQeu1gEcg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Amit Kapila
On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas  wrote:
> On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas  wrote:
>> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila  
>> wrote:
>>
>> OK.  I've committed this patch and back-patched it to 9.6.
>> Back-patching to 9.5 didn't looks simple because nworkers_launched
>> doesn't exist on that branch, and it doesn't matter very much anyway
>> since the infrastructure has no in-core users in 9.5.
>
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem.  I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attached to the error queue, this patch is
> sufficient to make that reliably turn into an error.  But what if it
> fails before that - e.g. fork() failure?  In that case, this process
> will catch the problem if the calling process calls
> WaitForParallelWorkersToFinish, but does that actually happen in any
> interesting cases?  Maybe not.
>
> In the case of Gather, for example, we won't
> WaitForParallelWorkersToFinish() until we've read all the tuples.  If
> there's a tuple queue that does not yet have a sender, then we'll just
> wait for it to get one, even if the sender fell victim to a fork
> failure and is never going to show up.
>

Hmm, I think that case will be addressed because tuple queues can
detect if the leader is not attached.  It does in code path
shm_mq_receive->shm_mq_counterparty_gone.  In
shm_mq_counterparty_gone, it can detect if the worker is gone by using
GetBackgroundWorkerPid.  Moreover, I have manually tested this
particular case before saying your patch is fine.  Do you have some
other case in mind which I am missing?
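
For reference, a stripped-down sketch of that detection path (simplified;
the real shm_mq.c logic also deals with the already-detached case):

#include "postgres.h"
#include "postmaster/bgworker.h"

/*
 * When the receiver waits on a queue whose sender has not attached yet,
 * it can ask whether that worker has already stopped.
 */
static bool
counterparty_gone(BackgroundWorkerHandle *handle)
{
	pid_t		pid;

	return handle != NULL &&
		GetBackgroundWorkerPid(handle, &pid) == BGWH_STOPPED;
}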

> We could *almost* fix this by having execParallel.c include a Barrier
> in the DSM, similar to what I proposed for parallel CREATE INDEX.
> When a worker starts, it attaches to the barrier; when it exits, it
> detaches.  Instead of looping until the leader is done and all the
> tuple queues are detached, Gather or Gather Merge could loop until the
> leader is done and everyone detaches from the barrier.  But that would
> require changes to the Barrier API, which isn't prepared to have the
> caller alternate waiting with other activity like reading the tuple
> queues, which would clearly be necessary in this case.  Moreover, it
> doesn't work at all when parallel_leader_participation=off, because in
> that case we'll again just wait forever for some worker to show up,
> and if none of them do, then we're hosed.
>
> One idea to fix this is to add a new function in parallel.c with a
> name like CheckForFailedWorkers() that Gather and Gather Merge call
> just before they WaitLatch().  If a worker fails to start, the
> postmaster will send SIGUSR1 to the leader, which will set the latch,
> so we can count on the function being run after every worker death.
> The function would go through and check each worker for which
> pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
> see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, ) ==
> BGWH_STOPPED.  If so, ERROR.
>
> This lets us *run* for arbitrary periods of time without detecting a
> fork failure, but before we *wait* we will notice.  Getting more
> aggressive than that sounds expensive, but I'm open to ideas.
>
> If we adopt this approach, then Peter's patch could probably make use
> of it, too.  It's a little tricky because ConditionVariableSleep()
> tries not to return spuriously, so we'd either have to change that or
> stick CheckForFailedWorkers() into that function's loop.  I suspect
> the former is a better approach.  Or maybe that patch should still use
> the Barrier approach and avoid all of this annoyance.
>

I think we can discuss your proposed solution once the problem is
clear.  As of now, it is not very clear to me whether there is any
pending problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Handling better supported channel binding types for SSL implementations

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 12:08:37PM -0500, Peter Eisentraut wrote:
> On 1/22/18 02:29, Michael Paquier wrote:
>> However there is as well the argument that this list's contents are not
>> directly used now, and based on what I saw from the MacOS SSL and GnuTLS
>> patches that would not be the case after either.
> 
> Right, there is no facility for negotiating the channel binding type, so
> a boolean result should be enough.

I am not completely convinced either that we need to complicate the code
to handle channel binding type negotiation.

> In which case we wouldn't actually need this for GnuTLS yet.

Sure. This depends mainly on how the patch for Mac's Secure Transport
moves forward.
--
Michael




Re: PATCH: Configurable file mode mask

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 09:18:51AM -0500, David Steele wrote:
> On 1/20/18 5:47 PM, Michael Paquier wrote:
>> Making this possible would require first some
>> refactoring of PostgresNode.pm so as a node is aware of the binary paths
>> it uses to be able to manipulate multiple instances with different
>> install paths at the same time.
> 
> Any idea of what the LOE would be for that?

What does LOE means?
--
Michael




Re: [PATCH] fix for C4141 warning on MSVC

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 1:42 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Wed, Jan 24, 2018 at 1:16 PM, Michail Nikolaev
>>  wrote:
>>> Just very small fix for C4141 warning
>
>> Thanks.  This is similar to the fix I proposed over here:
>> https://www.postgresql.org/message-id/CAEepm%3D2iTKvbebiK3CdoczQk4_FfDt1EeU4c%2BnGE340JH7gQ0g%40mail.gmail.com
>
>> Perhaps we should combine these?
>
> +1.  The previous thread seems to have died off with nothing happening,
> but we still have those issues to resolve.  I like the idea of making
> the macro be a direct drop-in substitute for "inline" rather than
> something you use in addition to "inline".  And if we do that, it
> definitely should expand to plain "inline" if we have no other idea.

Here's one like that.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-warnings-about-always-inline-v2.patch
Description: Binary data


Re: Failed to request an autovacuum work-item in silence

2018-01-23 Thread Masahiko Sawada
On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
 wrote:
>
> On Tue, Jan 23, 2018 at 03:36, Masahiko Sawada
> wrote:
>>
>> Hi all,
>>
>> While reading the code, I realized that the requesting an autovacuum
>> work-item could fail in silence if work-item array is full. So the
>> users cannot realize that work-item is never performed.
>> AutoVacuumRequestWork() seems to behave so from the initial
>> implementation but is there any reason of such behavior? It seems to
>> me that it can be a problem even now that there is only one kind of
>> work-item. Attached patch for fixing it.
>
>
> Seems reasonable, but maybe you can use the word "worker" instead of "work
> item" for the report message.
>

Thank you for the comment.
Or can we use the word "work-item" since the commit log and source
code use this word?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: pgsql: Add parallel-aware hash joins.

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 12:10 PM, Tom Lane  wrote:
> There is a very clear secular trend up in the longer data series,
> which indicates that we're testing more stuff,

+1

> which doesn't bother
> me in itself as long as the time is well spent.  However, the trend
> over the last two months is very bad, and I do not think that we can
> point to any large improvement in test coverage that someone committed
> since November.

I'm not sure if coverage.postgresql.org has a way to view historical
reports so we can see the actual percentage change, but as I recall
it, commit fa330f9ad "Add some regression tests that exercise hash
join code." pushed nodeHash.c and possibly nodeHashJoin.c into green
territory on here:

https://coverage.postgresql.org/src/backend/executor/index.html

> Looking more closely at the shorter series, there are four pretty obvious
> step changes since 2016-09.  The PNG's x-axis doesn't have enough
> resolution to match these up to commits, but looking at the underlying
> data, they clearly correspond to:
>
> Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
> Improve regression test coverage for hash indexes.
>
> Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
> Speed up hash_index regression test.
>
> Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
> Add some regression tests that exercise hash join code.

Joining check runtimes with the commit log (see attached), I see:

 2017-11-30 | fa330f9a | Add some regression tests that exercise  | 00:08:30
 2017-11-29 | 84940644 | New C function: bms_add_range| 00:08:18

That's +2.4%.

> Branch: master [180428404] 2017-12-21 00:43:41 -0800
> Add parallel-aware hash joins.

 2017-12-21 | cce1ecfc | Adjust assertion in GetCurrentCommandId. | 00:09:03
 2017-12-21 | 6719b238 | Rearrange execution of PARAM_EXTERN Para |
 2017-12-21 | 8a0596cb | Get rid of copy_partition_key|
 2017-12-21 | 9ef6aba1 | Fix typo |
 2017-12-21 | c98c35cd | Avoid putting build-location-dependent s |
 2017-12-21 | 59d1e2b9 | Cancel CV sleep during subtransaction ab |
 2017-12-21 | 18042840 | Add parallel-aware hash joins.   |
 2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45

That's +3.4%.  That's a bit more than I expected.  I saw 2.5% on my
development box and hoped that'd be OK for a complex feature with a
lot of paths to test.

But hang on a minute -- how did we get to 08:45 from 08:30 between
those commits?  Of course this is all noisy data and individual
samples are all over the place, but I think I see some signal here:

 2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45
 2017-12-19 | 7d3583ad | Test instrumentation of Hash nodes with  | 00:08:43
 2017-12-19 | 8526bcb2 | Try again to fix accumulation of paralle |
 2017-12-19 | 38fc5470 | Re-fix wrong costing of Sort under Gathe | 00:08:31
 2017-12-19 | 09a65f5a | Mark a few parallelism-related variables | 00:08:27

Both 8526bcb2 and 7d3583ad add Gather rescans.  Doesn't 8526bcb2 add a
query that forks 40 times?  I wonder how long it takes to start a
background worker on a Mac Cube.  From that commit:

+ ->  Gather (actual rows=9800 loops=10)
+   Workers Planned: 4
+   Workers Launched: 4
+   ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)

> I thought that the hash index test case was excessively expensive for
> what it covered, and I'm now thinking the same about hash joins.

Does join-test-shrink.patch (from my earlier message) help much, on
prairiedog?  It cut "check" time by ~1.5% on my low end machine.

-- 
Thomas Munro
http://www.enterprisedb.com
date|  commit  |   desc   |  check   
+--+--+--
 2018-01-22 | b3f84012 | Move handling of database properties fro | 00:09:20
 2018-01-22 | d6c84667 | Reorder code in pg_dump to dump comments | 
 2018-01-22 | f4987043 | PL/Python: Fix tests for older Python ve | 
 2018-01-22 | 2b792ab0 | Make pg_dump's ACL, sec label, and comme | 
 2018-01-22 | 8561e484 | Transaction control in PL procedures | 
 2018-01-22 | b9ff79b8 | Fix docs typo| 
 2018-01-21 | 1cc4f536 | Support huge pages on Windows| 00:09:20
 2018-01-21 | 5c15a54e | Fix wording of "hostaddrs"   | 00:09:19
 2018-01-21 | 815f84aa | doc:  update intermediate certificate in | 
 2018-01-20 | 918e02a2 | Improve type conversion of SPI_processed | 00:09:17
 2018-01-20 | 96102a32 | Suppress possibly-uninitialized-variable | 00:09:18
 2018-01-19 | eee50a8d | PL/Python: Simplify PLyLong_FromInt64| 00:09:19
 2018-01-19 | 2f178441 | Allow UPDATE to move rows between partit | 00:09:14
 2018-01-19 | 7f17fd6f | Fix CompareIndexInfo's attnum comparison | 
 2018-01-19 | 8b9e9644 | Replace AclObjectKind with 

Re: Remove utils/dsa.h from autovacuum.c

2018-01-23 Thread Masahiko Sawada
On Wed, Jan 24, 2018 at 3:39 AM, Alvaro Herrera  wrote:
> Masahiko Sawada wrote:
>> Hi,
>>
>> Attached a patch for $subject. The implementation of autovacuum
>> work-item has been changed by commit
>> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb but the loading of dsa.h
>> header file is remained.
>
> You're right -- pushed.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Missing wal_receiver_status_interval in Subscribers section

2018-01-23 Thread Masahiko Sawada
On Wed, Jan 24, 2018 at 9:34 AM, Bruce Momjian  wrote:
> On Tue, Jan 23, 2018 at 06:36:04PM -0500, Bruce Momjian wrote:
>>
>> Can someone confirm this so I can apply this patch?
>
> Never mind.  I see this was applied:
>

Thank you for your kind response.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 04:22:22PM +0100, Daniel Gustafsson wrote:
> On 23 Jan 2018, at 05:52, Michael Paquier  wrote:
>> 2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
>> core code when defining amproperty for an index AM. Well, with this
>> patch I think that for consistency with the core code that would involve
>> using strcmp instead, extension developers can of course still use
>> pg_strcasecmp.
> 
> That part I’m less sure about, the propname will not be casefolded by the
> parser so pg_strcasecmp() should still be the recommendation here no?

Yes, you are right. I had a brain fade here as all the option names here
go through SQL-callable functions.

>> Those are changed as well in the attached, which applies on top of your
>> v6. I have added as well in it the tests I spotted were missing. If this
>> looks right to you, I am fine to switch this patch as ready for
>> committer, without considering the issues with isCachable and isStrict
>> in CREATE FUNCTION of course.
> 
> Apart from the amproperty hunk, I’m definately +1 on adding your patch on top
> of my v6 one.  Thanks for all your help and review!

OK. Could you publish a v7? I will switch the entry as ready for
committer.
--
Michael




RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-23 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> Oh, incidentally -- in our internal testing, we found that
> wal_sync_method=open_datasync was significantly faster than
> wal_sync_method=fdatasync.  You might find that open_datasync isn't much
> different from pmem_drain, even though they're both faster than fdatasync.

That's interesting.  How fast was open_datasync in what environment (Linux 
distro/kernel version, HDD or SSD etc.)?

Is it now time to change the default setting to open_datasync on Linux, at 
least when O_DIRECT is not used (i.e. WAL archiving or streaming replication is 
used)?

[Current port/linux.h]
/*
 * Set the default wal_sync_method to fdatasync.  With recent Linux versions,
 * xlogdefs.h's normal rules will prefer open_datasync, which (a) doesn't
 * perform better and (b) causes outright failures on ext4 data=journal
 * filesystems, because those don't support O_DIRECT.
 */
#define PLATFORM_DEFAULT_SYNC_METHODSYNC_METHOD_FDATASYNC


pg_test_fsync showed open_datasync is slower on my RHEL6 VM:

ep
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  4276.373 ops/sec 234 usecs/op
fdatasync  4895.256 ops/sec 204 usecs/op
fsync  4797.094 ops/sec 208 usecs/op
fsync_writethrough  n/a
open_sync  4575.661 ops/sec 219 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync  2243.680 ops/sec 446 usecs/op
fdatasync  4347.466 ops/sec 230 usecs/op
fsync  4337.312 ops/sec 231 usecs/op
fsync_writethrough  n/a
open_sync  2329.700 ops/sec 429 usecs/op
ep

Regards
Takayuki Tsunakawa



Re: documentation is now XML

2018-01-23 Thread Bruce Momjian
On Thu, Nov 23, 2017 at 03:39:24PM -0500, Tom Lane wrote:
> Also, we're way overdue for getting out from under the creaky TeX-based
> toolchain for producing PDFs.  Every time we make releases, I worry
> whether we're going to get blindsided by its bugs with hotlinks that get
> split across pages, since page breaks tend to vary in position depending
> on exactly whose version of the toolchain and style sheets you build with.
> I've also been living in fear of the day we hit some hardwired TeX limit
> that we can't increase or work around.  We've had to hack around such
> limits repeatedly in the past (eg commit 944b41fc0).  Now, it's probably
> unlikely that growth of the release notes would be enough to put us over
> the top in any back branch --- but if it did happen, we'd find out about
> it at a point in the release cycle where there's very little margin for
> error.

I am coming in late here, but I am not aware of any open source
professional typesetting software that has output quality as good as
TeX.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [PATCH] fix for C4141 warning on MSVC

2018-01-23 Thread Thomas Munro
On Wed, Jan 24, 2018 at 1:16 PM, Michail Nikolaev
 wrote:
> Just very small fix for C4141 warning
> (https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-1-c4141).
>
> Also could be viewed on Github -
> https://github.com/michail-nikolaev/postgres/commit/38a590a00110a4ea870d625470e4c898e5ad79aa
>
> Tested both MSVC and gcc builds.

Thanks.  This is similar to the fix I proposed over here:

https://www.postgresql.org/message-id/CAEepm%3D2iTKvbebiK3CdoczQk4_FfDt1EeU4c%2BnGE340JH7gQ0g%40mail.gmail.com

Two differences:

1.  I also fixed a warning from antique GCC which didn't understand this magic.
2.  You also defined it as plain old "inline" for unknown compilers.

Perhaps we should combine these?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-23 Thread David Rowley
On 23 January 2018 at 23:22, Amit Langote  wrote:
> On 2018/01/23 15:44, David Rowley wrote:
>> Attached is what I had in mind about how to do this.
>
> Thanks for the delta patch.  I will start looking at it tomorrow.

Thanks. I've been looking more at this and I've made a few more
adjustments in the attached.

This delta patch should be applied against the
faster_partition_prune_v21_delta_drowley_v1.patch one I sent
yesterday. This changes a few comments, also now correctly passes the
context to get_partitions_excluded_by_ne_clauses and fixes a small
error where the patch was failing to record a notnull for the
partition key when it saw a strict <> clause. It was only doing this
for the opposite case, but both seem to be perfectly applicable. I
also made a small adjustment to the regression tests to ensure this is
covered.

I'm now going to start work on basing the partition pruning patch on
top of this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v21_delta_drowley_v1_delta.patch
Description: Binary data


Re: Missing wal_receiver_status_interval in Subscribers section

2018-01-23 Thread Bruce Momjian

Can someone confirm this so I can apply this patch?

---

On Fri, Nov 17, 2017 at 06:34:29PM +0900, Masahiko Sawada wrote:
> Hi,
> 
> I found that the doc says in 19.6.4. Subscribers section that "Note
> that wal_receiver_timeout and wal_retrieve_retry_interval
> configuration parameters affect the logical replication workers as
> well" but subscriber actually uses wal_receiver_status_interval as
> well. So I think we should mention it as well. Any thoughts?
> 
> Attached patch adds wal_receiver_status_interval to the doc.
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index 996e825..3c8c504 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -3410,7 +3410,8 @@ ANY  class="parameter">num_sync (

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] Planning counters in pg_stat_statements

2018-01-23 Thread Tom Lane
Thomas Munro  writes:
> One problem is that pgss_planner_hook doesn't have the source query
> text.  That problem could be solved by adding a query_string argument
> to the planner hook function type and also planner(),
> standard_planner(), pg_plan_query(), pg_plan_queries().  I don't know
> if that change would be acceptable or if there is a better way that
> doesn't change extern functions that will annoy extension owners.

Within the planner, I'd be inclined to store the query string pointer in
PlannerGlobal (which is accessible from PlannerInfo "root"), but I'm
not sure how many of the functions you mention would still need an
explicit signature change.  Anyway that doesn't particularly bother
me --- it's something that might need to happen anyway, if we ever
hope to throw errors with location pointers from inside the planner.

> Something I wondered about but didn't try: we could skip the above
> problem AND the extra pgss_store() by instead pushing (query ID,
> planning time) into a backend-private buffer and then later pushing it
> into shmem when we eventually call pgss_store() for the execution
> counters.

Meh.  Part of the reason I don't like what you submitted before is that
it supposes there's a mostly one-to-one relationship between planner calls
and executor calls; which there is not, when you start considering edge
cases like prepared statements.  A global would make that worse.

> Unfortunately I'm not going to have bandwidth to figure this out
> during this commitfest due to other priorities so I'm going to call
> this "returned with feedback".

OK.  There's still time to get it done in the March 'fest.

regards, tom lane



Re: pgsql: Add parallel-aware hash joins.

2018-01-23 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 22, 2018 at 6:53 PM, Tom Lane  wrote:
>> Here's a possibly more useful graph of regression test timings over
>> the last year.  I pulled this from the buildfarm database: it is the
>> reported runtime for the "installcheck-C" step in each successful
>> build of HEAD on dromedary, going back to Jan. 2017.

> Right, but this doesn't seem to show any big spike in the runtime at
> the time when parallel hash was committed, or when the preparatory
> patch to add test coverage for hash joins got committed.  Rather,
> there's a gradual increase over time.

Well, there's just too much noise in this chart.  Let's try another
machine: prairiedog, which is a lot slower so that the 1s resolution
isn't such a limiting factor, and it's also one that I know there hasn't
been much of any system change in.

The first attached PNG shows the "installcheck-C" runtime for as far
back as the buildfarm database has the data, and the second zooms in
on events since late 2016.  As before, I've dropped individual outlier
results (those significantly slower than any nearby run) on the grounds
that they probably represent interference from nightly backups.  I also
attached the raw data (including outliers) in case anyone wants to do
their own analysis.

There is a very clear secular trend up in the longer data series,
which indicates that we're testing more stuff, which doesn't bother
me in itself as long as the time is well spent.  However, the trend
over the last two months is very bad, and I do not think that we can
point to any large improvement in test coverage that someone committed
since November.

Looking more closely at the shorter series, there are four pretty obvious
step changes since 2016-09.  The PNG's x-axis doesn't have enough
resolution to match these up to commits, but looking at the underlying
data, they clearly correspond to:

Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
Improve regression test coverage for hash indexes.

Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
Speed up hash_index regression test.

Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
Add some regression tests that exercise hash join code.

Branch: master [180428404] 2017-12-21 00:43:41 -0800
Add parallel-aware hash joins.

I thought that the hash index test case was excessively expensive for
what it covered, and I'm now thinking the same about hash joins.

regards, tom lane

  snapshot   | stage_duration 
-+
 2014-03-19 20:09:45 | 00:03:36
 2014-03-20 03:14:59 | 00:03:36
 2014-03-20 20:18:16 | 00:03:37
 2014-03-21 15:14:58 | 00:03:37
 2014-03-22 08:13:01 | 00:03:33
 2014-03-23 17:31:09 | 00:03:36
 2014-03-24 03:15:41 | 00:03:40
 2014-03-24 16:25:33 | 00:03:40
 2014-03-25 17:31:06 | 00:03:41
 2014-03-26 07:40:27 | 00:03:42
 2014-03-26 15:15:26 | 00:03:39
 2014-03-28 05:31:19 | 00:03:42
 2014-03-28 15:15:16 | 00:03:41
 2014-03-29 03:15:25 | 00:03:41
 2014-03-29 20:11:03 | 00:03:40
 2014-03-30 04:25:56 | 00:03:40
 2014-03-31 03:15:13 | 00:03:40
 2014-03-31 15:15:26 | 00:03:40
 2014-04-01 05:30:55 | 00:03:39
 2014-04-01 18:27:27 | 00:03:41
 2014-04-02 06:27:47 | 00:03:41
 2014-04-03 06:27:32 | 00:03:40
 2014-04-03 20:11:41 | 00:03:39
 2014-04-04 08:11:19 | 00:03:41
 2014-04-04 20:11:51 | 00:03:40
 2014-04-05 07:22:58 | 00:03:40
 2014-04-06 08:10:55 | 00:03:37
 2014-04-07 03:15:40 | 00:03:38
 2014-04-07 15:15:37 | 00:03:38
 2014-04-08 05:31:15 | 00:03:40
 2014-04-08 19:19:25 | 00:03:40
 2014-04-09 03:15:51 | 00:03:39
 2014-04-09 19:24:29 | 00:03:39
 2014-04-10 06:08:41 | 00:03:38
 2014-04-10 15:15:44 | 00:03:39
 2014-04-11 03:15:42 | 00:03:39
 2014-04-13 03:15:38 | 00:03:40
 2014-04-13 15:15:48 | 00:03:40
 2014-04-14 03:15:38 | 00:03:41
 2014-04-14 15:15:49 | 00:03:40
 2014-04-15 03:15:43 | 00:03:41
 2014-04-15 15:15:39 | 00:03:42
 2014-04-16 03:15:48 | 00:03:42
 2014-04-16 20:11:45 | 00:03:41
 2014-04-17 07:06:24 | 00:03:42
 2014-04-17 16:25:58 | 00:03:42
 2014-04-18 08:11:54 | 00:03:41
 2014-04-18 15:15:40 | 00:03:41
 2014-04-19 15:15:44 | 00:03:41
 2014-04-20 03:15:38 | 00:03:41
 2014-04-20 15:15:44 | 00:03:41
 2014-04-22 05:31:27 | 00:03:42
 2014-04-22 15:15:49 | 00:03:41
 2014-04-23 03:15:41 | 00:03:40
 2014-04-23 18:28:37 | 00:03:40
 2014-04-24 06:28:45 | 00:03:39
 2014-04-24 15:15:44 | 00:03:41
 2014-04-25 08:12:47 | 00:03:42
 2014-04-26 03:15:44 | 00:03:42
 2014-04-27 16:26:23 | 00:03:41
 2014-04-28 04:26:14 | 00:03:42
 2014-04-28 20:13:16 | 00:03:41
 2014-04-29 15:15:50 | 00:03:41
 2014-04-30 05:32:21 | 00:03:43
 2014-04-30 20:13:44 | 00:03:42
 2014-05-01 05:57:23 | 00:03:40
 2014-05-02 08:13:47 | 00:03:43
 2014-05-04 15:15:53 | 00:03:44
 2014-05-05 15:16:21 | 00:03:44
 2014-05-06 08:13:57 | 00:03:42
 2014-05-06 19:23:42 | 00:03:43
 2014-05-07 08:14:54 | 00:03:43
 2014-05-08 08:12:04 | 00:03:40
 2014-05-08 20:12:25 | 00:03:41

Re: [HACKERS] Planning counters in pg_stat_statements

2018-01-23 Thread Thomas Munro
On Fri, Jan 19, 2018 at 12:14 AM, Thomas Munro
 wrote:
> On Sat, Jan 13, 2018 at 2:16 PM, Tom Lane  wrote:
>> What we could/should do instead, I think, is have pgss_planner_hook
>> make its own pgss_store() call to log the planning time results
>> (which would mean we don't need the added PlannedStmt field at all).
>> That would increase the overhead of this feature, which might mean
>> that it'd be worth having a pg_stat_statements GUC to enable it.
>> But it'd be a whole lot cleaner.
>
> Ok.  It seems a bit strange to have to go through the locking twice,
> but I don't have a better idea.  First attempt seems to be working.

One problem is that pgss_planner_hook doesn't have the source query
text.  That problem could be solved by adding a query_string argument
to the planner hook function type and also planner(),
standard_planner(), pg_plan_query(), pg_plan_queries().  I don't know
if that change would be acceptable or if there is a better way that
doesn't change extern functions that will annoy extension owners.
When I tried that it basically worked except for a problem with
"PREPARE ... " (what we want to show) vs "" (what my experimental patch now shows -- oops).

Something I wondered about but didn't try: we could skip the above
problem AND the extra pgss_store() by instead pushing (query ID,
planning time) into a backend-private buffer and then later pushing it
into shmem when we eventually call pgss_store() for the execution
counters.  If you think of that as using global variables to pass
state between functions just because we didn't structure our program
right it sounds terrible, but if you think of it as a dtrace-style
per-thread tracing event buffer that improves performance by batching
up collected data, it sounds great :-D

Unfortunately I'm not going to have bandwidth to figure this out
during this commitfest due to other priorities so I'm going to call
this "returned with feedback".

-- 
Thomas Munro
http://www.enterprisedb.com



For consideration - clarification of multi-dimensional arrays in our docs.

2018-01-23 Thread David G. Johnston
Hey all!

This doesn't come up that often, but often enough that hammering home
that multi-dimension <> array-of-arrays seems warranted.

The first and last chunks cover definition and iteration respectively.  The
second chunk removes the mention of "subarray" since that's what we don't
want people to think.

If the note seems a bit heavy-handed, working it in as a normal paragraph in
the same or nearby location would work too.

Thoughts?

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index f4d4a610ef..a2dfd4de0a 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -14,6 +14,12 @@
   or domain can be created.
  

+ 
+  A multi-dimensional array is not the same as an array-of-arrays.  In
+  particular it is not useful to access a two-dimensional array with
+  a single dimension - you will be given null instead of an array
+  containing the elements of the specified row.
+ 
  
   Declaration of Array Types

@@ -115,7 +121,7 @@ CREATE TABLE tictactoe (
 '{{1,2,3},{4,5,6},{7,8,9}}'
 
This constant is a two-dimensional, 3-by-3 array consisting of
-   three subarrays of integers.
+   nine integers.
   

   
@@ -374,6 +380,25 @@ SELECT cardinality(schedule) FROM sal_emp WHERE name =
'Carol';
 (1 row)
 
  
+
+ 
+  unnest, and other iteration access of arrays, is
performed
+  depth-first, for each lower dimension all higher dimension cells are
returned
+  before the next lower bound index.  For a two-dimensional array this is
termed
+  row-first.
+
+
+SELECT unnest(schedule) FROM sal_emp WHERE name = 'Bill';
+
+unnest
+--
+ meeting
+ lunch
+ training
+ presentation
+(4 rows)
+
+ 
  

  

David J.


Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 1:05 PM, Robert Haas  wrote:
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem.  I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attached to the error queue, this patch is
> sufficient to make that reliably turn into an error.  But what if it
> fails before that - e.g. fork() failure?  In that case, this process
> will catch the problem if the calling process calls
> WaitForParallelWorkersToFinish, but does that actually happen in any
> interesting cases?  Maybe not.
>
> In the case of Gather, for example, we won't
> WaitForParallelWorkersToFinish() until we've read all the tuples.  If
> there's a tuple queue that does not yet have a sender, then we'll just
> wait for it to get one, even if the sender fell victim to a fork
> failure and is never going to show up.
>
> We could *almost* fix this by having execParallel.c include a Barrier
> in the DSM, similar to what I proposed for parallel CREATE INDEX.
> When a worker starts, it attaches to the barrier; when it exits, it
> detaches.  Instead of looping until the leader is done and all the
> tuple queues are detached, Gather or Gather Merge could loop until the
> leader is done and everyone detaches from the barrier.  But that would
> require changes to the Barrier API, which isn't prepared to have the
> caller alternate waiting with other activity like reading the tuple
> queues, which would clearly be necessary in this case.  Moreover, it
> doesn't work at all when parallel_leader_participation=off, because in
> that case we'll again just wait forever for some worker to show up,
> and if none of them do, then we're hosed.

I'm relieved that we all finally seem to be on the same page on this issue.

> One idea to fix this is to add a new function in parallel.c with a
> name like CheckForFailedWorkers() that Gather and Gather Merge call
> just before they WaitLatch().  If a worker fails to start, the
> postmaster will send SIGUSR1 to the leader, which will set the latch,
> so we can count on the function being run after every worker death.
> The function would go through and check each worker for which
> pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
> see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
> BGWH_STOPPED.  If so, ERROR.

Sounds workable to me.

> This lets us *run* for arbitrary periods of time without detecting a
> fork failure, but before we *wait* we will notice.  Getting more
> aggressive than that sounds expensive, but I'm open to ideas.

Is that really that different to any other case? I imagine that in
practice most interrupt checks happen within the leader in a
WaitLatch() loop (or similar). Besides, the kinds of failures we're
talking about are incredibly rare in the real world. I think it's fine
if the leader is not very promptly aware of when this happens, or if
execution becomes slow in some other way.

> If we adopt this approach, then Peter's patch could probably make use
> of it, too.  It's a little tricky because ConditionVariableSleep()
> tries not to return spuriously, so we'd either have to change that or
> stick CheckForFailedWorkers() into that function's loop.  I suspect
> the former is a better approach.  Or maybe that patch should still use
> the Barrier approach and avoid all of this annoyance.

I have the same misgivings about using a barrier to solve this problem
for parallel CREATE INDEX as before. Specifically: why should
nbtsort.c solve this problem for itself? I don't think that it's in
any way special. One can imagine exactly the same problem with other
parallel DDL implementations, fixable using exactly the same solution.

I'm not saying that nbtsort.c shouldn't have to do anything
differently; just that it should buy into some general solution. That
said, it would be ideal if the current approach I take in nbtsort.c
didn't have to be changed *at all*, because if it was 100% correct
already then I think we'd have a more robust parallel.h interface (it
is certainly a "nice to have" for me). The fact that multiple hackers
at least tacitly assumed that nworkers_launched is a reliable
indicator of how many workers will show up tells us something.
Moreover, parallel_leader_participation=off is always going to be
wonky under a regime based on "see who shows up", because failure
becomes indistinguishable from high latency -- I'd rather avoid having
to think about nbtsort.c in distributed systems terms.

-- 
Peter Geoghegan



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Thomas Munro
On Fri, Jan 19, 2018 at 6:22 AM, Robert Haas  wrote:
>> (3)
>> erm, maybe it's a problem that errors occurring in workers while the
>> leader is waiting at a barrier won't unblock the leader (we don't
>> detach from barriers on abort/exit) -- I'll look into this.
>
> I think if there's an ERROR, the general parallelism machinery is
> going to arrange to kill every worker, so nothing matters in that case
> unless barrier waits ignore interrupts, which I'm pretty sure they
> don't.  (Also: if they do, I'll hit the ceiling; that would be awful.)

(After talking this through with Robert off-list).  Right, the
CHECK_FOR_INTERRUPTS() in ConditionVariableSleep() handles errors from
parallel workers.  There is no problem here.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Libpq support to connect to standby server as priority

2018-01-23 Thread Jing Wang
Hi All,

Recently I put forward a proposal to support a 'prefer-read' value for the
target_session_attrs parameter in libpq. I have now updated the patch, adding
content to the sgml documentation and a regression test case.
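
As a usage sketch (the host names are made up, and 'prefer-read' is the
proposed value, not one that exists in released libpq):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* prefer a server that only allows read-only transactions, if reachable */
	PGconn	   *conn = PQconnectdb(
		"host=node1,node2 dbname=postgres target_session_attrs=prefer-read");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}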

Some people may have noticed there is already another patch (
https://commitfest.postgresql.org/15/1148/ ) which looks similar to this.
But I would say that patch is more complex than my proposal.

It is better to consider these 2 patches separately.

Regards,
Jing Wang
Fujitsu Australia


libpq_support_perfer-read_002.patch
Description: Binary data


Tab complete after FROM ONLY

2018-01-23 Thread Vik Fearing
While reviewing the patch for tab completion after SELECT, I realized
that psql doesn't know how to complete after FROM if the ONLY keyword is
present.

This trivial patch fixes that, as well as JOIN ONLY and TABLE ONLY.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 8bc4a194a5..68b155c778 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3359,7 +3359,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_CONST("TRANSACTION");
 
 /* TABLE, but not TABLE embedded in other commands */
-	else if (Matches1("TABLE"))
+	else if (Matches1("TABLE") || Matches2("TABLE", "ONLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_relations, NULL);
 
 /* TABLESAMPLE */
@@ -3455,11 +3455,11 @@ psql_completion(const char *text, int start, int end)
 
 /* ... FROM ... */
 /* TODO: also include SRF ? */
-	else if (TailMatches1("FROM") && !Matches3("COPY|\\copy", MatchAny, "FROM"))
+	else if ((TailMatches1("FROM") || TailMatches2("FROM", "ONLY")) && !Matches3("COPY|\\copy", MatchAny, "FROM"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tsvmf, NULL);
 
 /* ... JOIN ... */
-	else if (TailMatches1("JOIN"))
+	else if (TailMatches1("JOIN") || TailMatches2("JOIN", "ONLY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tsvmf, NULL);
 
 /* Backslash commands */


Re: Using ProcSignal to get memory context stats from a running backend

2018-01-23 Thread Alvaro Herrera
Did this go anywhere?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: BRIN multi-range indexes

2018-01-23 Thread Tomas Vondra


On 01/23/2018 09:05 PM, Alvaro Herrera wrote:
> This stuff sounds pretty nice.  However, have a look at this report:
> 
> https://codecov.io/gh/postgresql-cfbot/postgresql/commit/2aa632dae3066900e15d2d42a4aad811dec11f08
> 
> it seems to me that the new code is not tested at all.  Shouldn't you
> add a few more tests?
> 

I have a hard time reading the report, but you're right I haven't added
any tests for the new opclasses (bloom and minmax_multi). I agree that's
something that needs to be addressed.

> I think 0004 should apply to unpatched master (except for the parts
> that concern files not in master); sounds like a good candidate for
> first apply. Then 0001, which seems mostly just refactoring. 0002 and
> 0003 are the really interesting ones (minus the code removed by
> 0004).
> 

That sounds like a reasonable plan. I'll reorder the patch series along
those lines in the next few days.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Robert Haas
On Tue, Jan 23, 2018 at 2:11 PM, Peter Geoghegan  wrote:
> Finally, it's still not clear to me why nodeGather.c's use of
> parallel_leader_participation=off doesn't suffer from similar problems
> [1].

Thomas and I just concluded that it does.  See my email on the other
thread just now.

I thought that I had the failure cases all nailed down here now, but I
guess not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Robert Haas
On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas  wrote:
> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila  wrote:
>> It does turn such cases into error but only at the end when someone
>> calls WaitForParallelWorkersToFinish.  If one tries to rely on it at
>> any other time, it won't work as I think is the case for Parallel
>> Create Index where Peter G. is trying to use it differently.  I am not
>> 100% sure whether Parallel Create index has this symptom as I have not
>> tried it with that patch, but I and Peter are trying to figure that
>> out.
>
> OK.  I've committed this patch and back-patched it to 9.6.
> Back-patching to 9.5 didn't looks simple because nworkers_launched
> doesn't exist on that branch, and it doesn't matter very much anyway
> since the infrastructure has no in-core users in 9.5.

I was just talking to Thomas, and one or the other of us realized that
this doesn't completely fix the problem.  I think that if a worker
fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
after the point where it attached to the error queue, this patch is
sufficient to make that reliably turn into an error.  But what if it
fails before that - e.g. fork() failure?  In that case, this process
will catch the problem if the calling process calls
WaitForParallelWorkersToFinish, but does that actually happen in any
interesting cases?  Maybe not.

In the case of Gather, for example, we won't
WaitForParallelWorkersToFinish() until we've read all the tuples.  If
there's a tuple queue that does not yet have a sender, then we'll just
wait for it to get one, even if the sender fell victim to a fork
failure and is never going to show up.

We could *almost* fix this by having execParallel.c include a Barrier
in the DSM, similar to what I proposed for parallel CREATE INDEX.
When a worker starts, it attaches to the barrier; when it exits, it
detaches.  Instead of looping until the leader is done and all the
tuple queues are detached, Gather or Gather Merge could loop until the
leader is done and everyone detaches from the barrier.  But that would
require changes to the Barrier API, which isn't prepared to have the
caller alternate waiting with other activity like reading the tuple
queues, which would clearly be necessary in this case.  Moreover, it
doesn't work at all when parallel_leader_participation=off, because in
that case we'll again just wait forever for some worker to show up,
and if none of them do, then we're hosed.

One idea to fix this is to add a new function in parallel.c with a
name like CheckForFailedWorkers() that Gather and Gather Merge call
just before they WaitLatch().  If a worker fails to start, the
postmaster will send SIGUSR1 to the leader, which will set the latch,
so we can count on the function being run after every worker death.
The function would go through and check each worker for which
pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
BGWH_STOPPED.  If so, ERROR.
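
A minimal sketch of that check, using the fields named above (this function
does not exist yet; the error wording and exact placement are assumptions):

#include "postgres.h"
#include "access/parallel.h"
#include "postmaster/bgworker.h"

void
CheckForFailedWorkers(ParallelContext *pcxt)
{
	int			i;

	for (i = 0; i < pcxt->nworkers_launched; ++i)
	{
		pid_t		pid;

		/* skip workers that have already attached to their error queue */
		if (pcxt->worker[i].error_mqh == NULL ||
			pcxt->any_message_received[i])
			continue;

		/* never attached and already stopped: it must have failed to start */
		if (GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) == BGWH_STOPPED)
			ereport(ERROR,
					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
					 errmsg("parallel worker failed to initialize")));
	}
}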

This lets us *run* for arbitrary periods of time without detecting a
fork failure, but before we *wait* we will notice.  Getting more
aggressive than that sounds expensive, but I'm open to ideas.

If we adopt this approach, then Peter's patch could probably make use
of it, too.  It's a little tricky because ConditionVariableSleep()
tries not to return spuriously, so we'd either have to change that or
stick CheckForFailedWorkers() into that function's loop.  I suspect
the former is a better approach.  Or maybe that patch should still use
the Barrier approach and avoid all of this annoyance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-01-23 Thread Peter Eisentraut
On 1/23/18 14:59, Daniel Gustafsson wrote:
> It’s not specific to the implementation per se, but it increases the
> likelihood of hitting it.  In order to load certificates from Keychains the
> cert common name must be specified in the connstr; when importing the test
> files into keychains I ran into it with, for example,
> src/test/ssl/client_ca.config.

The change is

- 'psql', '-X', '-A', '-t', '-c', "SELECT 'connected with $connstr'",
+ 'psql', '-X', '-A', '-t', '-c', "SELECT \$\$connected with $connstr\$\$",

So the problem must have been a single quote in the connstr.

That can surely happen, but then so can having a $$.  So without a
concrete example, I'm not sure how to proceed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: BRIN multi-range indexes

2018-01-23 Thread Alvaro Herrera
This stuff sounds pretty nice.  However, have a look at this report:

https://codecov.io/gh/postgresql-cfbot/postgresql/commit/2aa632dae3066900e15d2d42a4aad811dec11f08

it seems to me that the new code is not tested at all.  Shouldn't you
add a few more tests?

I think 0004 should apply to unpatched master (except for the parts that
concern files not in master); sounds like a good candidate for first
apply.  Then 0001, which seems mostly just refactoring.  0002 and 0003
are the really interesting ones (minus the code removed by 0004).

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-01-23 Thread Daniel Gustafsson
> On 23 Jan 2018, at 18:20, Peter Eisentraut  
> wrote:
> 
> On 1/21/18 18:08, Daniel Gustafsson wrote:
>> As per before, my patch for running tests against another set of binaries is
>> included as well as a fix for connstrings with spaces, but with the recent
>> hacking by Peter I assume this is superfluous.  It was handy for development 
>> so
>> I’ve kept it around though.
> 
> 0002-Allow-running-SSL-tests-against-different-binar-v4.patch should be
> obsoleted by f5da5683a86e9fc42fdf3eae2da8b096bda76a8a.

Yes.

> 0003-Allow-spaces-in-connectionstrings-v4.patch looks reasonable, but
> I'm not sure what circumstance is triggering that.  Is it specific to
> your implementation?

It’s not specific to the implementation per se, but it increases the
likelihood of hitting it.  In order to load certificates from Keychains the
cert common name must be specified in the connstr; when importing the test
files into keychains I ran into it with, for example,
src/test/ssl/client_ca.config.

cheers ./daniel


pg_rewind and replication slots

2018-01-23 Thread Laurenz Albe
When a primary with replication slots gets reset with pg_rewind,
it keeps the replication slots.

This does no harm per se, but when it gets promoted again,
the replication slots are still there and are in the way.
Won't they also hold back the xmin horizon?

I think that pg_rewind should delete all replication slots.

Yours,
Laurenz Albe



Re: pgsql: Add parallel-aware hash joins.

2018-01-23 Thread Robert Haas
On Mon, Jan 22, 2018 at 6:53 PM, Tom Lane  wrote:
> Here's a possibly more useful graph of regression test timings over
> the last year.  I pulled this from the buildfarm database: it is the
> reported runtime for the "installcheck-C" step in each successful
> build of HEAD on dromedary, going back to Jan. 2017.  I picked dromedary
> because I know that that machine hasn't gotten any software updates
> nor is there anything else very interesting going on on it.  I dropped
> three or four obvious outlier reports (possibly those ran during the
> machine's nightly backup cron job).  The reported runtime is only
> precise to 1s, and a couple seconds jitter is hardly surprising, so
> there's a good deal of noise.  Still, it's possible to discern when
> I put some effort into test runtime reduction back in April, and
> it can be seen that things have gotten notably slower since the
> beginning of November.

Right, but this doesn't seem to show any big spike in the runtime at
the time when parallel hash was committed, or when the preparatory
patch to add test coverage for hash joins got committed.  Rather,
there's a gradual increase over time.  Either we're making the server
slower (which would be bad) or we're adding proper test coverage for
all the new features that we're adding (which would be good).  We
can't expect every feature patch to preserve the runtime of the tests
absolutely unchanged; figuring out what can be optimized is a separate
exercise from adding test coverage either for new things or for things
that weren't previously covered.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-23 Thread Petr Jelinek
On 23/01/18 16:01, Nikhil Sontakke wrote:
>> I must be missing something in this discussion, cos I don't see any
>> problems with this "other option".
>>
>> Surely we prepare the 2PCxact and then it is persistent. Yes,
>> potentially for an unbounded period of time. And it is (usually) up to
>> the XA manager to resolve that. 2PC interacts with transaction
>> management and yes, it can be slow. But the choice is slow and
>> consistent, or not. This would only be used with the full choice of
>> the user, just like synchronous_commit.

It's not about the transaction being persistent, but about the abort command
being blocked.

>>
>> In this case, we call the decoding plugin's precommit hook which would
>> then prepare the 2PCxact and set a non-persistent flag saying it is
>> being decoded. If decoding completes normally we release the lock and
>> commit. If decoding fails or the DBA has another reason to do so, we
>> provide a function that allows the flag to be unlocked. While it is
>> locked the 2PCxact cannot be aborted or committed.

The output plugin can't deal with precommit; that has to be handled
elsewhere, but in principle this is true.

>>
>> There is no danger of accidental abort because the prepare has persistent 
>> state.
> 
> This concurrent abort handling while decoding is ongoing is turning
> out to be a complex affair.
> 
> Thinking more about this, just to provide an example, we have a
> decoding plugin hook to determine if a GID for a 2PC was decoded at
> PREPARE time or COMMIT time as part of the 2PC logical decoding patch.
> We need that to determine the *same* static answer every time we see a
> specific GID while decoding across restarts; the plugin should know
> what it had done the last time around and should tell us the same
> later as well. It just occurred to me that as Simon also mentioned, it
> could/should also be the decoding plugin's responsibility to indicate
> if it's ok to go ahead with the abort of the transaction.
> 
> So, we could consider adding a preAbort hook. That preAbort hook gets
> the GID, XID and other parameters as needed and tells us whether we
> can go ahead with the abort or if we need to wait out (maybe we pass
> in an ok_to_wait param as well). As an example, a plugin could lookup
> some shmem structure which points to the current transaction being
> decoded and does related processing to ensure that it stops decoding
> at a clean juncture, thus keeping the response time bounded to a
> maximum of one change record apply cycle. That passes the onus onto
> the plugin writers and keeps the core code around this concurrent
> abort handling clean.
> 

Having this as the plugin's responsibility sounds interesting. It certainly
narrows the scope for which we need to solve the abort issue. For 2PC
that may be okay, as we need to somehow interact with the transaction manager,
as Simon noted. I am not sure if this helps the streaming use case though, as
there is not going to be any external transaction management involved there.

In any case, all this interlocking could potentially be made less
impactful by only doing it when we know the transaction did catalog
changes prior to the currently decoded change (which we know during decoding),
since that's the only time we are interested in whether it aborted or not.

This all leads me to another idea. What if logical decoding provided an API
for "locking/unlocking" the currently decoded transaction against abort?
This function would then be called by both the decoding code and the output
plugin before any catalog read. The function can be smart enough to be a no-op
if the transaction is not running (i.e. we are not doing 2PC decoding or
streaming) or when the transaction didn't do any catalog modifications
(we already have that info easily accessible as a bool).

That would mean we'd never do any kind of heavy locking for prolonged
periods of time (i.e. network calls), but only during catalog access and
only when needed. It would also solve this for both 2PC and streaming,
and it would be easy for plugin authors to use. Just document that the
call should be made before catalog access when in an output plugin; it
could probably even be Asserted that the call was done.

Thoughts?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 10:50 AM, Peter Geoghegan  wrote:
> On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas  wrote:
>> I am going to repeat my previous suggestion that we use a Barrier here.
>> Given the discussion subsequent to my original proposal, this can be a
>> lot simpler than what I suggested originally.  Each worker does
>> BarrierAttach() before beginning to read tuples (exiting if the phase
>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>> sorting.  The leader does BarrierAttach() before launching workers and
>> BarrierArriveAndWait() when it's done sorting.  If we don't do this,
>> we're going to have to invent some other mechanism to count the
>> participants that actually initialize successfully, but that seems
>> like it's just duplicating code.
>
> I think that this closes the door to leader non-participation as
> anything other than a developer-only debug option, which might be
> fine. If parallel_leader_participation=off (or some way of getting the
> same behavior through a #define) is to be retained, then an artificial
> wait is required as a substitute for the leader's participation as a
> worker.

This idea of an artificial wait seems pretty grotty to me. If we made
it one second, would that be okay with Valgrind builds? And when it
wasn't sufficient, wouldn't we be back to waiting forever?

Finally, it's still not clear to me why nodeGather.c's use of
parallel_leader_participation=off doesn't suffer from similar problems
[1].

[1] 
https://postgr.es/m/CAH2-Wz=cAMX5btE1s=aTz7CLwzpEPm_NsUhAMAo5t5=1i9v...@mail.gmail.com
-- 
Peter Geoghegan



Re: using index or check in ALTER TABLE SET NOT NULL

2018-01-23 Thread Sergei Kornilov
Hello Stephen

I changed the suggested things and added some regression tests. I also wrote a
few words for the documentation. A new patch is attached.
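
For reference, the intended usage looks roughly like this (table and constraint
names are only examples):

    CREATE TABLE big (id integer, payload text);
    -- an already-validated CHECK constraint that implies NOT NULL ...
    ALTER TABLE big ADD CONSTRAINT big_id_not_null CHECK (id IS NOT NULL);
    -- ... lets SET NOT NULL skip the full-table scan with this patch
    ALTER TABLE big ALTER COLUMN id SET NOT NULL;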

Thank you for feedback!
regards, Sergei

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 286c7a8..cf2c56f 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -189,6 +189,13 @@ ALTER TABLE [ IF EXISTS ] name
  
 
  
+  A full table scan is performed to check that no existing row
+  in the table has null values in the given column.  It is possible to avoid
+  this scan by adding a valid CHECK constraint to
+  the table that allows only NOT NULL values for the given column.
+ 
+
+ 
   If this table is a partition, one cannot perform DROP NOT NULL
   on a column if it is marked NOT NULL in the parent
   table.  To drop the NOT NULL constraint from all the
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee..6f7ec4a 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1235,7 +1235,7 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 	 * not contain any row that would belong to the new partition, we can
 	 * avoid scanning the default partition.
 	 */
-	if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
+	if (ConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
 	{
 		ereport(INFO,
 (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
@@ -1279,8 +1279,8 @@ check_default_allows_bound(Relation parent, Relation default_rel,
 			 * that it will not contain any row that would belong to the new
 			 * partition, we can avoid scanning the child table.
 			 */
-			if (PartConstraintImpliedByRelConstraint(part_rel,
-	 def_part_constraints))
+			if (ConstraintImpliedByRelConstraint(part_rel,
+ def_part_constraints))
 			{
 ereport(INFO,
 		(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2e768dd..4f0916d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -162,7 +162,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
-	bool		new_notnull;	/* T if we added new NOT NULL constraints */
+	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
 	bool		chgPersistence; /* T if SET LOGGED/UNLOGGED is used */
@@ -377,6 +377,7 @@ static ObjectAddress ATExecDropNotNull(Relation rel, const char *colName, LOCKMO
 static void ATPrepSetNotNull(Relation rel, bool recurse, bool recursing);
 static ObjectAddress ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
  const char *colName, LOCKMODE lockmode);
+static bool NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr);
 static ObjectAddress ATExecColumnDefault(Relation rel, const char *colName,
 	Node *newDefault, LOCKMODE lockmode);
 static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
@@ -4368,7 +4369,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
 			 * Test the current data within the table against new constraints
 			 * generated by ALTER TABLE commands, but don't rebuild data.
 			 */
-			if (tab->constraints != NIL || tab->new_notnull ||
+			if (tab->constraints != NIL || tab->verify_new_notnull ||
 tab->partition_constraint != NULL)
 ATRewriteTable(tab, InvalidOid, lockmode);
 
@@ -4529,7 +4530,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	}
 
 	notnull_attrs = NIL;
-	if (newrel || tab->new_notnull)
+	if (newrel || tab->verify_new_notnull)
 	{
 		/*
 		 * If we are rebuilding the tuples OR if we added any new NOT NULL
@@ -5528,7 +5529,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		 * null, but since it's filled specially, there's no need to test
 		 * anything.)
 		 */
-		tab->new_notnull |= colDef->is_not_null;
+		tab->verify_new_notnull |= colDef->is_not_null;
 	}
 
 	/*
@@ -5930,8 +5931,17 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
 
		CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple);
 
-		/* Tell Phase 3 it needs to test the constraint */
-		tab->new_notnull = true;
+		/*
+		 * Verifying table data can be done in Phase 3 with a single table scan
+		 * for all constraints (both existing and new).
+		 * We can skip this check only if every new NOT NULL constraint is
+		 * implied by existing table constraints.
+		 */
+		if (!NotNullImpliedByRelConstraints(rel, (Form_pg_attribute) GETSTRUCT(tuple)))
+		{
+			

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Peter Geoghegan
On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas  wrote:
> As Amit says, what remains is the case where fork() fails or the
> worker dies before it reaches the line in ParallelWorkerMain that
> reads shm_mq_set_sender(mq, MyProc).  In those cases, no error will be
> signaled until you call WaitForParallelWorkersToFinish().  If you wait
> prior to that point for a number of workers equal to
> nworkers_launched, you will wait forever in those cases.

Another option might be to actually call
WaitForParallelWorkersToFinish() in place of a condition variable or
barrier, as Amit suggested at one point.

> I am going to repeat my previous suggestion that we use a Barrier here.
> Given the discussion subsequent to my original proposal, this can be a
> lot simpler than what I suggested originally.  Each worker does
> BarrierAttach() before beginning to read tuples (exiting if the phase
> returned is non-zero) and BarrierArriveAndDetach() when it's done
> sorting.  The leader does BarrierAttach() before launching workers and
> BarrierArriveAndWait() when it's done sorting.  If we don't do this,
> we're going to have to invent some other mechanism to count the
> participants that actually initialize successfully, but that seems
> like it's just duplicating code.

I think that this closes the door to leader non-participation as
anything other than a developer-only debug option, which might be
fine. If parallel_leader_participation=off (or some way of getting the
same behavior through a #define) is to be retained, then an artificial
wait is required as a substitute for the leader's participation as a
worker.

-- 
Peter Geoghegan



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-23 Thread Nikolay Shaplov
В письме от 23 января 2018 12:03:50 пользователь Peter Eisentraut написал:

> >>> This patch raises an error if the user tries to set or change a toast.* option
> >>> for a table that does not have a TOAST relation.
> > 
> > There might be a case for raising a warning in this situation,
> > but I would disagree with making it a hard error in any case.
> > All that's going to accomplish is to break people's scripts and
> > get them mad at you.
> 
> It might also be useful to set these reloptions before you add a new
> column to a table.
As Robert has just said, this will not work with current master.

If, as I've suggested in a previous letter, we force TOAST creation when a
toast.* option is set, then your use case will also work.
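
For illustration (table name and option are only examples), the case in
question is:

    CREATE TABLE narrow (id integer);                         -- no TOASTable column, so no TOAST table
    ALTER TABLE narrow SET (toast.autovacuum_enabled = off);  -- currently this setting is silently lost
    ALTER TABLE narrow ADD COLUMN payload text;               -- the TOAST table only appears here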

-- 
Do code for fun.

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-23 Thread Robert Haas
On Fri, Jan 19, 2018 at 9:42 AM, Robert Haas  wrote:
> That's not necessarily an argument against this patch, which by the
> way I have not reviewed.  Even a 5% speedup on this kind of workload
> is potentially worthwhile; everyone likes it when things go faster.
> I'm just not convinced you can get very much more than that on a
> realistic workload.  Of course, I might be wrong.

Oh, incidentally -- in our internal testing, we found that
wal_sync_method=open_datasync was significantly faster than
wal_sync_method=fdatasync.  You might find that open_datasync isn't
much different from pmem_drain, even though they're both faster than
fdatasync.
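
If anyone wants to try the comparison on their own hardware, wal_sync_method
can be switched without a restart (pg_test_fsync is also handy for the raw
numbers):

    ALTER SYSTEM SET wal_sync_method = 'open_datasync';
    SELECT pg_reload_conf();   -- the GUC is sighup-level, so a reload is enough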

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Remove utils/dsa.h from autovacuum.c

2018-01-23 Thread Alvaro Herrera
Masahiko Sawada wrote:
> Hi,
> 
> Attached a patch for $subject. The implementation of autovacuum
> work-item has been changed by commit
> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb, but the inclusion of the dsa.h
> header file has remained.

You're right -- pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-23 Thread Robert Haas
On Tue, Jan 23, 2018 at 12:03 PM, Peter Eisentraut
 wrote:
> It might also be useful to set these reloptions before you add a new
> column to a table.

I don't understand.  AIUI, it is currently the case that if you set
the options before the TOAST table exists, they are lost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-23 Thread Robert Haas
On Mon, Jan 22, 2018 at 10:13 PM, Peter Geoghegan  wrote:
> _bt_leader_heapscan() can detect when workers exit early, at least in
> the vast majority of cases. It can do this simply by processing
> interrupts and automatically propagating any error -- nothing special
> about that. It can also detect when workers have finished
> successfully, because of course, that's the main reason for its
> existence. What remains, exactly?

As Amit says, what remains is the case where fork() fails or the
worker dies before it reaches the line in ParallelWorkerMain that
reads shm_mq_set_sender(mq, MyProc).  In those cases, no error will be
signaled until you call WaitForParallelWorkersToFinish().  If you wait
prior to that point for a number of workers equal to
nworkers_launched, you will wait forever in those cases.

I am going to repeat my previous suggestion that we use a Barrier here.
Given the discussion subsequent to my original proposal, this can be a
lot simpler than what I suggested originally.  Each worker does
BarrierAttach() before beginning to read tuples (exiting if the phase
returned is non-zero) and BarrierArriveAndDetach() when it's done
sorting.  The leader does BarrierAttach() before launching workers and
BarrierArriveAndWait() when it's done sorting.  If we don't do this,
we're going to have to invent some other mechanism to count the
participants that actually initialize successfully, but that seems
like it's just duplicating code.

This proposal has some minor advantages even when no fork() failure or
similar occurs.  If, for example, one or more workers take a long time
to start, the leader doesn't have to wait for them before writing out
the index.  As soon as all the workers that attached to the Barrier
have arrived at the end of phase 0, the leader can build a new tape
set from all of the tapes that exist at that time.  It does not need
to wait for the remaining workers to start up and create empty tapes.
This is only a minor advantage since we probably shouldn't be doing
CREATE INDEX in parallel in the first place if the index build is so
short that this scenario is likely to occur, but we get it basically
for free, so why not?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Possible performance regression in version 10.1 with pgbench read-write tests.

2018-01-23 Thread Mithun Cy
Hi all,

While doing read-write pgbench benchmarking of PostgreSQL
9.6.6 vs 10.1, I found that PostgreSQL 10.1 regresses against 9.6.6 in some
cases.

Non Default settings and test
==
Server:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &

Pgbench:
CASE 1: when data fits shared buffers.
./pgbench -i -s 1000 postgres

CASE 2: when data exceeds shared buffers.
./pgbench -i -s 1000 postgres

./pgbench -c $threads -j $threads -T 1800 -M prepared postgres

Script "perf_buff_mgmt_write-2.sh" which is added below can be used to run same.


Machine : "cthulhu" 8 node numa machine with 128 hyper threads.
===
>numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB

CASE 1:
In 9.6.6 peak performance is achieved at 72 concurrent clients (TPS:
35554.573858), while in 10.1 at 72 clients TPS dips to 26882.828133, a
nearly 23% decrease in TPS.

CASE 2:
In 9.6.6 peak performance is achieved at 72 concurrent clients (TPS:
24861.074079), while in 10.1 at 72 clients TPS dips to 18372.565663, a
nearly 26% decrease in TPS.

Added "Postgresql_benchmarking_9.6vs10.ods" which gives more detailed
TPS numbers. And, TPS is median of 3 runs result.

I have not run bisect yet to find what has caused the issue.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Postgresql_benchmarking_9.6vs10.ods
Description: application/vnd.oasis.opendocument.spreadsheet


perf_buff_mgmt_write-2.sh
Description: Bourne shell script


Re: Doc tweak for huge_pages?

2018-01-23 Thread Catalin Iacob
On Mon, Jan 22, 2018 at 7:23 AM, Justin Pryzby  wrote:
> Consider this shorter, less-severe sounding alternative:
> "... (but note that this feature can degrade performance of some
> PostgreSQL workloads)."

I think the patch looks good now.

As Justin mentions, as far as I see the only arguable piece is how
strong the language should be against Linux THP.

On one hand it can be argued that warning about THP issues is not the
job of this patch. On the other hand this patch does say more about
THP and Googling does bring up a lot of trouble and advice to disable
THP, including:

https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=zfsol14rqnqa8hwz1r4rsndlr1b...@mail.gmail.com
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge

The RedHat article above says "However, THP is not recommended for
database workloads."

I'll leave this to the committer and switch this patch to Ready for Committer.

By the way, Fedora 27 does disable THP by default, they deviate from
upstream in this regard:

[catalin@fedie scripts]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
[catalin@fedie scripts]$ grep TRANSPARENT /boot/config-4.14.13-300.fc27.x86_64
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y

When I have some time I'll try to do some digging into the history of the
Fedora kernel package to see if they provide a rationale for changing
the default. That might hint whether it's likely that future RHEL will
change as well.



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-23 Thread Petr Jelinek
Hi,

On 23/01/18 18:47, Marco Nenciarini wrote:
> Il 23/01/18 18:25, Petr Jelinek ha scritto:
>> On 23/01/18 18:19, Marco Nenciarini wrote:
>>> Il 23/01/18 18:13, Petr Jelinek ha scritto:
 Hi,

 On 23/01/18 15:38, Marco Nenciarini wrote:
> Il 22/01/18 23:18, Petr Jelinek ha scritto:
>> On 22/01/18 19:45, Petr Jelinek wrote:
>>
>> Actually on second look, I don't like the new boolean parameter much.
>> I'd rather we didn't touch the input list and always close only
>> relations opened inside the ExecuteTruncateGuts().
>>
>> It may mean more list(s) but the current interface is still not clean.
>>
>
> Now ExecuteTruncateGuts unconditionally closes the relations that it
> opens. The caller now always has the responsibility to close the
> relations passed in the explicit_rels list.

 This looks good.

>
> Version 15 attached.
>

 I see you still do CASCADE on the subscriber though.

>>>
>>> No it doesn't. The following code in worker.c prevents that.
>>>
>>>
>>> +   /*
>>> +* Even if we used CASCADE on the upstream master we explicitly
>>> +* default to replaying changes without further cascading.
>>> +* This might be later changeable with a user specified option.
>>> +*/
>>> +   cascade = false;
>>
>> Ah, that's pretty ugly; why don't we just use DROP_RESTRICT always
>> instead of this (keeping the comment)? Unless you plan to make it an option
>> as part of this patch, the current coding is confusing.
>>
> 
> Ok, Removed.
> 

Great, thanks, I think this is ready now so marking as such.


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Improve scripting language in pgbench

2018-01-23 Thread Peter Eisentraut
On 1/10/18 07:36, Fabien COELHO wrote:
> I just noticed while rebasing stuff that there is some crust in 
> "pgbench/t/001_pgbench_with_server.pl" coming from this patch:
> 
>   +=head
>   +
>   +} });
>   +
>   +=cut
> 
> I cannot find any use for these lines which are ignored by perl execution 
> anyway. It may be some leftovers from debugging which got past everyone. 
> If so, I think that it is better removed, see the attached cleanup patch.

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-23 Thread Petr Jelinek
On 23/01/18 18:19, Marco Nenciarini wrote:
> Il 23/01/18 18:13, Petr Jelinek ha scritto:
>> Hi,
>>
>> On 23/01/18 15:38, Marco Nenciarini wrote:
>>> Il 22/01/18 23:18, Petr Jelinek ha scritto:
 On 22/01/18 19:45, Petr Jelinek wrote:

 Actually on second look, I don't like the new boolean parameter much.
 I'd rather we didn't touch the input list and always close only
 relations opened inside the ExecuteTruncateGuts().

 It may mean more list(s) but the current interface is still not clean.

>>>
>>> Now ExecuteTruncateGuts unconditionally closes the relations that it
>>> opens. The caller now always has the responsibility to close the
>>> relations passed in the explicit_rels list.
>>
>> This looks good.
>>
>>>
>>> Version 15 attached.
>>>
>>
>> I see you still do CASCADE on the subscriber though.
>>
> 
> No it doesn't. The following code in worker.c prevents that.
> 
> 
> + /*
> +  * Even if we used CASCADE on the upstream master we explicitly
> +  * default to replaying changes without further cascading.
> +  * This might be later changeable with a user specified option.
> +  */
> + cascade = false;

Ah, that's pretty ugly; why don't we just use DROP_RESTRICT always
instead of this (keeping the comment)? Unless you plan to make it an option
as part of this patch, the current coding is confusing.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-23 Thread Marco Nenciarini
Il 23/01/18 18:13, Petr Jelinek ha scritto:
> Hi,
> 
> On 23/01/18 15:38, Marco Nenciarini wrote:
>> Il 22/01/18 23:18, Petr Jelinek ha scritto:
>>> On 22/01/18 19:45, Petr Jelinek wrote:
>>>
>>> Actually on second look, I don't like the new boolean parameter much.
>>> I'd rather we didn't touch the input list and always close only
>>> relations opened inside the ExecuteTruncateGuts().
>>>
>>> It may mean more list(s) but the current interface is still not clean.
>>>
>>
>> Now ExecuteTruncateGuts unconditionally closes the relations that it
>> opens. The caller now always has the responsibility to close the
>> relations passed in the explicit_rels list.
> 
> This looks good.
> 
>>
>> Version 15 attached.
>>
> 
> I see you still do CASCADE on the subscriber though.
> 

No it doesn't. The following code in worker.c prevents that.


+   /*
+* Even if we used CASCADE on the upstream master we explicitly
+* default to replaying changes without further cascading.
+* This might be later changeable with a user specified option.
+*/
+   cascade = false;

There is also a test that check it works as intended:

+ # should not cascade on replica
+ $node_publisher->safe_psql('postgres', "TRUNCATE tab_rep CASCADE");
+
+ $node_publisher->wait_for_catchup($appname);
+
+ $result = $node_subscriber->safe_psql('postgres',
+   "SELECT count(*), min(a), max(a) FROM tab_notrep_fk");
+ is($result, qq(1|-1|-1),
+   'check replicated truncate does not cascade on replica');
+
+ $result = $node_subscriber->safe_psql('postgres',
+   "SELECT nextval('seq_notrep')");
+ is($result, qq(102),
+   'check replicated truncate does not restart identities');
+

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-01-23 Thread Peter Eisentraut
On 1/21/18 18:08, Daniel Gustafsson wrote:
> As per before, my patch for running tests against another set of binaries is
> included as well as a fix for connstrings with spaces, but with the recent
> hacking by Peter I assume this is superfluous.  It was handy for development
> so I’ve kept it around though.

0002-Allow-running-SSL-tests-against-different-binar-v4.patch should be
obsoleted by f5da5683a86e9fc42fdf3eae2da8b096bda76a8a.

0003-Allow-spaces-in-connectionstrings-v4.patch looks reasonable, but
I'm not sure what circumstance is triggering that.  Is it specific to
your implementation?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-23 Thread Petr Jelinek
Hi,

On 23/01/18 15:38, Marco Nenciarini wrote:
> Il 22/01/18 23:18, Petr Jelinek ha scritto:
>> On 22/01/18 19:45, Petr Jelinek wrote:
>>
>> Actually on second look, I don't like the new boolean parameter much.
>> I'd rather we didn't touch the input list and always close only
>> relations opened inside the ExecuteTruncateGuts().
>>
>> It may mean more list(s) but the current interface is still not clean.
>>
> 
> Now ExecuteTruncateGuts unconditionally closes the relations that it
> opens. The caller now always has the responsibility to close the
> relations passed in the explicit_rels list.

This looks good.

> 
> Version 15 attached.
> 

I see you still do CASCADE on the subscriber though.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Handling better supported channel binding types for SSL implementations

2018-01-23 Thread Peter Eisentraut
On 1/22/18 02:29, Michael Paquier wrote:
> However there is as well the argument that this list's contents are not
> directly used now, and based on what I saw from the MacOS SSL and GnuTLS
> patches that would not be the case after either.

Right, there is no facility for negotiating the channel binding type, so
a boolean result should be enough.

In which case we wouldn't actually need this for GnuTLS yet.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Test-cases for exclusion constraints is missing in alter_table.sql file

2018-01-23 Thread Peter Eisentraut
On 1/17/18 23:57, Ashutosh Sharma wrote:
> By elsewhere do you mean the contrib module 'sepgsql'
> (sepgsql/sql/ddl.sql)? Because other than that I didn't find any other
> places having ALTER TABLE ... ADD CONSTRAINT ... EXCLUDE ... sort of
> SQL queries, apart from the files which I mentioned in my earlier
> email. Thanks.

There is

ALTER TABLE partitioned ADD EXCLUDE USING gist (a WITH &&);

in alter_table.sql.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-23 Thread Peter Eisentraut
On 1/18/18 18:42, Tom Lane wrote:
> Robert Haas  writes:
>> On Wed, Jan 17, 2018 at 3:50 PM, Nikolay Shaplov  wrote:
>>> This patch raises an error if the user tries to set or change a toast.* option for a
>>> table that does not have a TOAST relation.

> There might be a case for raising a warning in this situation,
> but I would disagree with making it a hard error in any case.
> All that's going to accomplish is to break people's scripts and
> get them mad at you.

It might also be useful to set these reloptions before you add a new
column to a table.

So I think this change is not desirable.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-01-23 Thread Peter Eisentraut
On 1/23/18 09:33, David Steele wrote:
> What if I update pg_upgrade/test.sh to optionally allow group
> permissions and we configure an animal to test it if it gets committed?
> 
> It's not ideal, I know, but it would get the permissions patch over the
> line and is consistent with how we currently test pg_upgrade.

Basically, what you'd need is some way to pass some options to the
initdb invocations in the pg_upgrade test script, right?  That would
seem reasonable, and also useful for testing upgrading with other initdb
options.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-23 Thread Tom Lane
Bruce Momjian  writes:
> Oh, I see what you mean.  I was just worried that some code might expect
> template1 to always have an oid of 1, but we can just call that code
> broken.

Ever since we invented template0, it's been possible to drop and recreate
template1, so I'd say any expectation that it must have OID 1 has been
wrong for a long time.  This change will just make the situation more
common.

regards, tom lane



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-23 Thread Marco Nenciarini
Il 22/01/18 23:18, Petr Jelinek ha scritto:
> On 22/01/18 19:45, Petr Jelinek wrote:
> 
> Actually on second look, I don't like the new boolean parameter much.
> I'd rather we didn't touch the input list and always close only
> relations opened inside the ExecuteTruncateGuts().
> 
> It may mean more list(s) but the current interface is still not clean.
> 

Now ExecuteTruncateGuts unconditionally closes the relations that it
opens. The caller now always has the responsibility to close the
relations passed in the explicit_rels list.

Version 15 attached.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2e768dd5e4..bdce4164d6 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 1404,1419  ExecuteTruncate(TruncateStmt *stmt)
}
  
/*
!* Check foreign key references.  In CASCADE mode, this should be
!* unnecessary since we just pulled in all the references; but as a
!* cross-check, do it anyway if in an Assert-enabled build.
 */
  #ifdef USE_ASSERT_CHECKING
-   heap_truncate_check_FKs(rels, false);
- #else
-   if (stmt->behavior == DROP_RESTRICT)
heap_truncate_check_FKs(rels, false);
  #endif
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them
--- 1404,1427 
}
  
/*
!* Suppress foreign key references check if session replication role is
!* set to REPLICA.
 */
+   if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
+   {
+ 
+   /*
+* Check foreign key references.  In CASCADE mode, this should 
be
+* unnecessary since we just pulled in all the references; but 
as a
+* cross-check, do it anyway if in an Assert-enabled build.
+*/
  #ifdef USE_ASSERT_CHECKING
heap_truncate_check_FKs(rels, false);
+ #else
+   if (stmt->behavior == DROP_RESTRICT)
+   heap_truncate_check_FKs(rels, false);
  #endif
+   }
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them
diff --git a/src/test/regress/expected/truncate.out b/src/test/regress/expected/truncate.out
index d967e8dd21..86748430c5 100644
*** a/src/test/regress/expected/truncate.out
--- b/src/test/regress/expected/truncate.out
***
*** 68,73  HINT:  Truncate table "trunc_b" at the same time, or use 
TRUNCATE ... CASCADE.
--- 68,77 
  TRUNCATE TABLE truncate_a CASCADE;  -- ok
  NOTICE:  truncate cascades to table "trunc_b"
  NOTICE:  truncate cascades to table "trunc_e"
+ -- Ignore foreign-key checks with session_replication_role = replica
+ SET session_replication_role = replica;
+ TRUNCATE TABLE truncate_a;-- ok
+ RESET session_replication_role;
  -- circular references
  ALTER TABLE truncate_a ADD FOREIGN KEY (col1) REFERENCES trunc_c;
  -- Add some data to verify that truncating actually works ...
diff --git a/src/test/regress/sql/truncate.sql b/src/test/regress/sql/truncate.sql
index fbd1d1a8a5..0d0a3705d2 100644
*** a/src/test/regress/sql/truncate.sql
--- b/src/test/regress/sql/truncate.sql
***
*** 33,38  TRUNCATE TABLE trunc_c,trunc_d,trunc_e,truncate_a,trunc_b;   
-- ok
--- 33,43 
  TRUNCATE TABLE truncate_a RESTRICT; -- fail
  TRUNCATE TABLE truncate_a CASCADE;  -- ok
  
+ -- Ignore foreign-key checks with session_replication_role = replica
+ SET session_replication_role = replica;
+ TRUNCATE TABLE truncate_a;-- ok
+ RESET session_replication_role;
+ 
  -- circular references
  ALTER TABLE truncate_a ADD FOREIGN KEY (col1) REFERENCES trunc_c;
  
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 1e22c1eefc..d1feea4909 100644
*** a/contrib/test_decoding/expected/ddl.out
--- b/contrib/test_decoding/expected/ddl.out
***
*** 543,548  UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
--- 543,550 
  UPDATE table_with_unique_not_null SET id = -id;
  UPDATE table_with_unique_not_null SET id = -id;
  DELETE FROM table_with_unique_not_null WHERE data = 3;
+ TRUNCATE table_with_unique_not_null;
+ TRUNCATE table_with_unique_not_null, table_with_unique_not_null RESTART 
IDENTITY CASCADE;
  -- check toast support
  BEGIN;
  CREATE SEQUENCE toasttable_rand_seq START 79 INCREMENT 1499; -- portable 
"random"
***
*** 660,665  SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
--- 662,673 
   table public.table_with_unique_not_null: DELETE: id[integer]:4
   COMMIT
   BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: (no-flags)
+  COMMIT
+  BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: restart_seqs cascade
+  COMMIT
+  BEGIN
   table public.toasttable: INSERT: 

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-23 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> The most obvious hack is just to exclude PL objects with OIDs below 16384
>> from the dump.  An issue with that is that it'd lose any nondefault
>> changes to the ACL for plpgsql, which seems like a problem.  On the
>> other hand, I think the rule we have in place for the public schema is
>> preventing dumping local adjustments to that, and there have been no
>> complaints.  (Maybe I'm wrong and the initacl mechanism handles that
>> case?  If so, seems like we could get it to handle local changes to
>> plpgsql's ACL as well.)

> Both the public schema and plpgsql's ACLs should be handled properly
> (with local changes preserved) through the pg_init_privs system.  I'm
> not sure what you're referring to regarding a rule preventing dumping
> local adjustments to the public schema, as far as I can recall we've
> basically always supported that.

I went looking and realized that actually what we're interested in here
is the plpgsql extension, not the plpgsql language ... and in fact the
behavior I was thinking of is already there, except for some reason it's
only applied during binary upgrade.  So I think we should give serious
consideration to the attached, which just removes the binary_upgrade
condition (and adjusts nearby comments).  With this, I get empty dumps
for the default state of template1 and postgres, which is what one
really would expect.  And testing shows that if you modify the ACL
of language plpgsql, that does still come through as expected.
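
For example, with a made-up role name, a local adjustment like the following
still appears in the dump as a GRANT even though the built-in extension itself
is skipped:

    GRANT USAGE ON LANGUAGE plpgsql TO app_user;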

(Note: this breaks the parts of the pg_dump regression tests that
expect to see "CREATE LANGUAGE plpgsql".  I've not bothered to update
those, pending the decision on whether to do this.)

regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d65ea54..87b15a1 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** selectDumpableAccessMethod(AccessMethodI
*** 1617,1637 
   * selectDumpableExtension: policy-setting subroutine
   *		Mark an extension as to be dumped or not
   *
!  * Normally, we dump all extensions, or none of them if include_everything
!  * is false (i.e., a --schema or --table switch was given).  However, in
!  * binary-upgrade mode it's necessary to skip built-in extensions, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to
! 	 * change permissions on those objects, if they wish to, and have those
! 	 * changes preserved.
  	 */
! 	if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =
--- 1617,1637 
   * selectDumpableExtension: policy-setting subroutine
   *		Mark an extension as to be dumped or not
   *
!  * Built-in extensions should be skipped except for checking ACLs, since we
   * assume those will already be installed in the target database.  We identify
   * such extensions by their having OIDs in the range reserved for initdb.
+  * We dump all user-added extensions by default, or none of them if
+  * include_everything is false (i.e., a --schema or --table switch was given).
   */
  static void
  selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt)
  {
  	/*
! 	 * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to
! 	 * change permissions on their member objects, if they wish to, and have
! 	 * those changes preserved.
  	 */
! 	if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid)
  		extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL;
  	else
  		extinfo->dobj.dump = extinfo->dobj.dump_contains =


Re: PATCH: Configurable file mode mask

2018-01-23 Thread Peter Eisentraut
On 1/19/18 16:54, Alvaro Herrera wrote:
> If the only problem is that buildfarm would run tests twice, then I
> think we should just press forward with this regardless of that: it
> seems a chicken-and-egg problem, because buildfarm cannot be upgraded to
> use some new test method if the method doesn't exist yet.  As a
> solution, let's just live with some run duplication for a period until
> the machines are upgraded to a future buildfarm client code.

Well, people protested against that approach.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Fwd: Regarding ambulkdelete, amvacuumcleanup index methods

2018-01-23 Thread Abinaya k
-- Forwarded message --
From: "Abinaya k" 
Date: Jan 23, 2018 1:43 PM
Subject: Regarding ambulkdelete, amvacuumcleanup index methods
To: 
Cc:

Hai all,
  We are building In-memory index extension for postgres. We would
capture table inserts, updates, deletes using triggers. During vacuum
operation, postgres would give calls to ambulkdelete, amvacuumcleanup (as
part of index cleanup). As we handle all updates, deletes using triggers,
we don't have to do any index cleanup in ambulkdelete. But, what stats
should i return from ambulkdelete and amvacuumcleanup? Is that necessary to
return stats from ambulkdelete and amvacuumcleanup ?


Re: [HACKERS] taking stdbool.h into use

2018-01-23 Thread Peter Eisentraut
On 1/16/18 01:14, Michael Paquier wrote:
>> I don't see it either.  I think the actually useful parts of that patch
>> were to declare record_image_cmp's result correctly as int rather than
>> bool, and to cope with varlena fields of unequal size.  Unfortunately
>> there seems to be no contemporaneous mailing list discussion, so
>> it's not clear what Kevin based this change on.
> 
> This was a hotfix after a buildfarm breakage, the base commit being
> f566515.
> 
>> Want to try reverting the GET_X_BYTES() parts of it and see if the
>> buildfarm complains?
> 
> So, Peter, are you planning to do so?

Here is a proposed patch set to clean this up.  First, add some test
coverage for record_image_cmp.  (There wasn't any, only for
record_image_eq as part of MV testing.)  Then, remove the GET_ macros
from record_image_{eq,cmp}.  And finally remove all that byte-masking
stuff from postgres.h.

>> Note if that if we simplify the GetDatum macros, it's possible that
>> record_image_cmp would change behavior, since it might now see signed not
>> unsigned values depending on whether the casts do sign extension or not.
>> Not sure if that'd be a problem.

I wasn't able to construct such a case.  Everything still works unsigned
end-to-end, I think.  But if you can think of a case, we can throw it
into the tests.  The tests already contain cases of comparing positive
and negative integers.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 686b2a6f2c0a455dccbecf07d163af5d6f9c9e88 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 23 Jan 2018 10:13:45 -0500
Subject: [PATCH 1/3] Add tests for record_image_eq and record_image_cmp

record_image_eq was covered a bit by the materialized view code that it
is meant to support, but record_image_cmp was not tested at all.
---
 src/test/regress/expected/rowtypes.out | 161 +
 src/test/regress/sql/rowtypes.sql  |  53 +++
 2 files changed, 214 insertions(+)

diff --git a/src/test/regress/expected/rowtypes.out 
b/src/test/regress/expected/rowtypes.out
index a4bac8e3b5..e3c23a41cd 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -369,6 +369,167 @@ LINE 1: select * from cc order by f1;
   ^
 HINT:  Use an explicit ordering operator or modify the query.
 --
+-- Tests for record_image_{eq,cmp}
+--
+create type testtype1 as (a int, b int);
+-- all true
+select row(1, 2)::testtype1 *< row(1, 3)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 2)::testtype1 *<= row(1, 3)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 2)::testtype1 *= row(1, 2)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 2)::testtype1 *<> row(1, 3)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 3)::testtype1 *>= row(1, 2)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 3)::testtype1 *> row(1, 2)::testtype1;
+ ?column? 
+--
+ t
+(1 row)
+
+-- all false
+select row(1, -2)::testtype1 *< row(1, -3)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, -2)::testtype1 *<= row(1, -3)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, -2)::testtype1 *= row(1, -3)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, -2)::testtype1 *<> row(1, -2)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, -3)::testtype1 *>= row(1, -2)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, -3)::testtype1 *> row(1, -2)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+-- This returns the "wrong" order because record_image_cmp works on
+-- unsigned datums without knowing about the actual data type.
+select row(1, -2)::testtype1 *< row(1, 3)::testtype1;
+ ?column? 
+--
+ f
+(1 row)
+
+-- other types
+create type testtype2 as (a smallint, b bool);  -- byval different sizes
+select row(1, true)::testtype2 *< row(2, true)::testtype2;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(-2, true)::testtype2 *< row(-1, true)::testtype2;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(0, false)::testtype2 *< row(0, true)::testtype2;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(0, false)::testtype2 *<> row(0, true)::testtype2;
+ ?column? 
+--
+ t
+(1 row)
+
+create type testtype3 as (a int, b text);  -- variable length
+select row(1, 'abc')::testtype3 *< row(1, 'abd')::testtype3;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 'abc')::testtype3 *< row(1, 'abcd')::testtype3;
+ ?column? 
+--
+ t
+(1 row)
+
+select row(1, 'abc')::testtype3 *> row(1, 'abd')::testtype3;
+ ?column? 
+--
+ f
+(1 row)
+
+select row(1, 'abc')::testtype3 *<> row(1, 'abd')::testtype3;
+ ?column? 
+--
+ t
+(1 row)
+
+create type testtype4 as (a int, b point);  -- by ref, fixed length
+select row(1, '(1,2)')::testtype4 *< row(1, 

Re: [HACKERS] PoC: full merge join on comparison clause

2018-01-23 Thread Stephen Frost
Greetings Alexander,

* Alexander Kuzmenkov (a.kuzmen...@postgrespro.ru) wrote:
> Here is the patch rebased to a852cfe9.

Thanks for updating it.  This would definitely be nice to have.
Ashutosh, thanks for your previous review, would you have a chance to
look at it again?  Would be great to at least get this to ready for
committer before the end of this CF.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Allow UPDATE to move rows between partitions.

2018-01-23 Thread Robert Haas
On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapila  wrote:
> On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas  wrote:
>> Allow UPDATE to move rows between partitions.
>
> +If an UPDATE on a partitioned table causes a row to 
> move
> +to another partition, it will be performed as a DELETE
> +from the original partition followed by an INSERT into
> +the new partition. In this case, all row-level BEFORE
> +UPDATE triggers and all row-level
> +BEFORE DELETE triggers are fired on
> +the original partition.
>
> Do we need to maintain the trigger-related behavior for logical
> replication?  In logical replication, we use ExecSimpleRelationDelete
> to perform the Delete operation, which is not aware of this special
> behavior (executing the before update trigger in this case).

Hmm.  I don't think there's any way for the logical decoding
infrastructure to identify this case at present.  I suppose if we want
that behavior, we'd need to modify the WAL format, and the changes
might not be too straightforward because the after-image of the tuple
wouldn't be available in the DELETE record.  I think the only reason
we fire the UPDATE triggers is because we can't decide until after
they've finished executing that we really want to DELETE and INSERT
instead; by the time we are replicating the changes, we know what the
final shape of the operation ended up being, so it's not clear to me
that firing UPDATE triggers at that point would be useful.  I fear
that trying to change this is going to cost performance (and developer
time) for no real benefit, so my gut feeling is to leave it alone.
However, what do other people think?
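
For reference, the scenario being decoded here (names made up) is a
cross-partition UPDATE such as:

    CREATE TABLE t (k text, v int) PARTITION BY LIST (k);
    CREATE TABLE t_a PARTITION OF t FOR VALUES IN ('a');
    CREATE TABLE t_b PARTITION OF t FOR VALUES IN ('b');
    INSERT INTO t VALUES ('a', 1);
    -- executed as a DELETE from t_a plus an INSERT into t_b,
    -- which is also the shape the change stream ends up with
    UPDATE t SET k = 'b' WHERE v = 1;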

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-23 Thread Tom Lane
Haribabu Kommi  writes:
> On Tue, Jan 23, 2018 at 8:56 AM, Tom Lane  wrote:
>> I'm thinking we'd still need to do what I said in the previous message,
>> and change pg_dump so that the restore session will absorb ALTER DATABASE
>> settings before proceeding.  Otherwise, at minimum, we have a hazard of
>> differing behaviors in serial and parallel restores.  It might be that
>> lc_monetary is the only GUC that makes a real difference for this purpose,
>> but I haven't got much faith in that proposition at the moment.

> Yes, the restore session should also absorb all the ALTER DATABASE and ALTER ROLE
> IN DATABASE settings, to make sure that the target database is in the same
> state as when the dump was started.

I've pushed a patch to do that.

> currently "default_transaction_read_only" is the only GUC that affects the
> absorbing the restore of a database.

Well, that's exactly the $64 question.  Are we sure we have a complete,
reliable list of which GUCs do or do not affect data restoration?
I wouldn't count on it.  If nothing else, extensions might have custom
GUCs that we could not predict the behavior of.

> As you said upthread, how about splitting them into two _TOC entries and
> dumping them as the last commands of each database? Does that have any problem
> with parallel restore?

As I said yesterday, I'm not really eager to do that.  It's a lot of
complexity and a continuing maintenance headache to solve a use-case
that I find pretty debatable.  Anyone who's actually put in
"default_transaction_read_only = on" in a way that restricts their
maintenance roles must have created procedures for undoing it easily;
otherwise day-to-day maintenance would be a problem.  Further, I don't
find the original hack's distinction between pg_dump and pg_dumpall
to be really convincing.  Maybe that says we should resolve this by just
sticking "SET default_transaction_read_only = off" into regular pg_dump
output after all --- perhaps with a switch added to enable it.  But I'd
kind of like to see the argument why we should worry about this at all
made afresh.
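
For reference, the kind of setting at issue (database name made up) is:

    ALTER DATABASE proddb SET default_transaction_read_only = on;
    -- a plain pg_dump script restored into proddb then fails on its first CREATE,
    -- unless the restoring session first runs:
    SET default_transaction_read_only = off;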

regards, tom lane



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-23 Thread Robert Haas
On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila  wrote:
> It does turn such cases into error but only at the end when someone
> calls WaitForParallelWorkersToFinish.  If one tries to rely on it at
> any other time, it won't work as I think is the case for Parallel
> Create Index where Peter G. is trying to use it differently.  I am not
> 100% sure whether Parallel Create index has this symptom as I have not
> tried it with that patch, but I and Peter are trying to figure that
> out.

OK.  I've committed this patch and back-patched it to 9.6.
Back-patching to 9.5 didn't look simple because nworkers_launched
doesn't exist on that branch, and it doesn't matter very much anyway
since the infrastructure has no in-core users in 9.5.

I'll respond to the other thread about the issues specific to parallel
CREATE INDEX.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2018-01-23 Thread Pavel Stehule
2018-01-22 23:15 GMT+01:00 Stephen Frost :

> Pavel,
>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > here is a GUC based patch for plancache controlling. Looks so this code
> is
> > working.
>
> This really could use a new thread, imv.  This thread is a year old and
> about a completely different feature than what you've implemented here.
>

true, but now it is too late


> > It is hard to create regress tests. Any ideas?
>
> Honestly, my idea for this would be to add another option to EXPLAIN
> which would make it spit out generic and custom plans, or something of
> that sort.
>
>
I looked there, but it needs a cyclic dependency between CachedPlan and
PlannedStmt. It needs more code than this patch, and the code will not be nicer.
Instead, I tried an approach using prepared statements.
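
A rough sketch of what a prepared-statement-based test could look like,
assuming the plancache_mode GUC and values this patch adds:

    SET plancache_mode = force_generic_plan;
    PREPARE q(int) AS SELECT count(*) FROM pg_class WHERE relpages > $1;
    EXPLAIN (COSTS off) EXECUTE q(0);   -- should show the generic plan, with $1 still a parameter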



> Please review your patches before sending them and don't send in patches
> which have random unrelated whitespace changes.
>
> > diff --git a/src/backend/utils/cache/plancache.c
> b/src/backend/utils/cache/plancache.c
> > index ad8a82f1e3..cc99cf6dcc 100644
> > --- a/src/backend/utils/cache/plancache.c
> > +++ b/src/backend/utils/cache/plancache.c
> > @@ -1031,6 +1033,12 @@ choose_custom_plan(CachedPlanSource *plansource,
> ParamListInfo boundParams)
> >   if (IsTransactionStmtPlan(plansource))
> >   return false;
> >
> > + /* See if settings wants to force the decision */
> > + if (plancache_mode & PLANCACHE_FORCE_GENERIC_PLAN)
> > + return false;
> > + if (plancache_mode & PLANCACHE_FORCE_CUSTOM_PLAN)
> > + return true;
>
> Not a big deal, but I probably would have just expanded the conditional
> checks against cursor_options (just below) rather than adding
> independent if() statements.
>

I don't think the proposed change is better - my opinion is not strong - and a
committer can change it easily.

>
> > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> > index ae22185fbd..4ce275e39d 100644
> > --- a/src/backend/utils/misc/guc.c
> > +++ b/src/backend/utils/misc/guc.c
> > @@ -3916,6 +3923,16 @@ static struct config_enum ConfigureNamesEnum[] =
> >   NULL, NULL, NULL
> >   },
> >
> > + {
> > + {"plancache_mode", PGC_USERSET, QUERY_TUNING_OTHER,
> > + gettext_noop("Forces use of custom or generic
> plans."),
> > + gettext_noop("It can control query plan cache.")
> > + },
> > + &plancache_mode,
> > + PLANCACHE_DEFAULT, plancache_mode_options,
> > + NULL, NULL, NULL
> > + },
>
> That long description is shorter than the short description.  How about:
>
> short: Controls the planner's selection of custom or generic plan.
> long: Prepared queries have custom and generic plans and the planner
> will attempt to choose which is better; this can be set to override
> the default behavior.
>

changed - thank you for the text

>
> (either that, or just ditch the long description)
>
> > diff --git a/src/include/utils/plancache.h
> b/src/include/utils/plancache.h
> > index 87fab19f3c..962895cc1a 100644
> > --- a/src/include/utils/plancache.h
> > +++ b/src/include/utils/plancache.h
> > @@ -143,7 +143,6 @@ typedef struct CachedPlan
> >   MemoryContext context;  /* context containing this
> CachedPlan */
> >  } CachedPlan;
> >
> > -
> >  extern void InitPlanCache(void);
> >  extern void ResetPlanCache(void);
> >
>
> Random unrelated whitespace change...
>

fixed

>
> This is also missing documentation updates.
>

I wrote some basic documentation, but a native speaker should write more about
the strategies used.


> Setting to Waiting for Author, but with those changes I would think we
> could mark it ready-for-committer, it seems pretty straight-forward to
> me and there seems to be general agreement (now) that it's worthwhile to
> have.
>
> Thanks!
>

Attached is the updated patch.

Regards

Pavel


> Stephen
>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45b2af14eb..d6cef97ca9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4387,6 +4387,30 @@ SELECT * FROM parent WHERE key = 2400;
   
  
 
+ 
+
+
+ 
+ Plan Cache Options
+
+ 
+
+ 
+  plancache_mode (enum)
+  
+   plancache_mode configuration parameter
+  
+  
+  
+   
+Prepared queries have custom and generic plans and the planner 
+will attempt to choose which is better; this can be set to override 
+the default behavior. The allowed values are default,
+force_custom_plan and force_generic_plan.
+   
+  
+ 
+
  
 

diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 8d7d8e04c9..1f8884896b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -106,6 +106,8 @@ static void PlanCacheRelCallback(Datum arg, Oid relid);
 static void 

Re: Invalidation pg catalog cache in trigger functions

2018-01-23 Thread Константин Евтеев
Sorry, I forgot to mention the PostgreSQL version:
https://www.postgresql.org/message-id/CAAqA9PQXEmG%3Dk3WpDTmHZL-VKcMpDEA3ZC06Qr0ASO3oTA7bdw%40mail.gmail.com
"Invalidation pg catalog cache in trigger functions"
was detected on "PostgreSQL 9.2.18 on x86_64-unknown-linux-gnu, compiled by
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit"
Also it can be reproduced on PostgreSQL 9.2.24
On newer versions it can no longer be reproduced - PostgreSQL 9.3.20, 9.4.15, 9.4.12


But the issue
https://www.postgresql.org/message-id/20171030125345.1448.24...@wrigleys.postgresql.org

"BUG #14879: Bug connected with table structure modification and trigger
function query plan invalidation"
is still reproducible on PostgreSQL 9.2.24; 9.3.20; 9.4.12; 9.4.15.

In this thread I propose to find out the cause of the issue
https://www.postgresql.org/message-id/20171030125345.1448.24...@wrigleys.postgresql.org
"BUG #14879: Bug connected with table structure modification and trigger
function query plan invalidation".

--
Konstantin Evteev


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-23 Thread Bruce Momjian
On Sun, Jan 21, 2018 at 11:02:29AM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Jan 19, 2018 at 02:54:25PM -0500, Tom Lane wrote:
> >> Command was: DROP DATABASE "template1";
> 
> > Uh, the oid of the template1 database is 1, and I assume we would want
> > to preserve that too.
> 
> I don't feel any huge attachment to that.  In the first place, under
> this proposal recreating template1 is something you would only need to do
> if you weren't satisfied with its default properties as set by initdb.
> Which ought to be a minority of users.  In the second place, if you are
> changing those properties from the way initdb set it up, it's not really
> virgin template1 anymore, so why shouldn't it have a new OID?

Oh, I see what you mean.  I was just worried that some code might expect
template1 to always have an oid of 1, but we can just call that code
broken.
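
For reference, the OID in question is easy to check; on a stock installation
template1 currently reports OID 1:

    SELECT oid, datname FROM pg_database WHERE datname = 'template1';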

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


