Re: Regarding ambulkdelete, amvacuumcleanup index methods
On Wed, Jan 24, 2018 at 1:27 PM, Abinaya k wrote:
> Hai all,
> We are building an in-memory index extension for postgres. We would
> capture table inserts, updates, deletes using triggers. During vacuum
> operation, postgres would give calls to ambulkdelete, amvacuumcleanup (as
> part of index cleanup). As we handle all updates, deletes using triggers, we
> don't have to do any index cleanup in ambulkdelete. But, what stats should I
> return from ambulkdelete and amvacuumcleanup? Is it necessary to return
> stats from ambulkdelete and amvacuumcleanup?

Both ambulkdelete and amvacuumcleanup return an IndexBulkDeleteResult. If you return a non-NULL value, the fields of the returned IndexBulkDeleteResult are used to update the index statistics and to report the bulk-deletion statistics in lazy_cleanup_index. For example, num_pages and num_index_tuples are used to update pg_class.relpages and pg_class.reltuples. If you return NULL instead, these updates are skipped.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
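To make the two choices concrete, here is a minimal sketch. The struct below is a simplified stand-in for PostgreSQL's IndexBulkDeleteResult (only the two fields mentioned above are modeled), and calloc stands in for palloc0:

```c
#include <assert.h>
#include <stdlib.h>

/* simplified stand-in for IndexBulkDeleteResult; only the two fields
 * discussed above are modeled here */
typedef struct IndexBulkDeleteResultSketch
{
    unsigned int num_pages;        /* used to update pg_class.relpages */
    double       num_index_tuples; /* used to update pg_class.reltuples */
} IndexBulkDeleteResultSketch;

/* Returning non-NULL: the caller (lazy_cleanup_index in the real code)
 * uses these values to update the index statistics. */
IndexBulkDeleteResultSketch *
bulkdelete_with_stats(unsigned int pages, double tuples)
{
    IndexBulkDeleteResultSketch *stats = calloc(1, sizeof(*stats));

    stats->num_pages = pages;
    stats->num_index_tuples = tuples;
    return stats;
}

/* Returning NULL: the statistics updates are skipped entirely, which is
 * a valid choice when the AM has nothing to report. */
IndexBulkDeleteResultSketch *
bulkdelete_without_stats(void)
{
    return NULL;
}
```

So for an extension that does its cleanup elsewhere (via triggers, as above), returning NULL is legitimate; the only cost is that relpages/reltuples are not refreshed at vacuum time.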
Help needed in using 'on_dsm_detach' callback
Hello people,

We are trying to build an in-memory index in postgres using DSA. Here is how we implemented the DSA part:

- We have a PROC_DSA_AREA global variable (a process-specific DSA pointer).
- We have a piece of traditional postgres shared memory to store the dsa_handle.
- Each process that needs to use the DSA creates/attaches it (based on the dsa_handle stored in shmem).
- Once created/attached, the process stores the area pointer in its PROC_DSA_AREA variable.
- Subsequent DSA access in that process uses the PROC_DSA_AREA variable without trying to create/attach again.

The problem is, on DSA detach, I would like to reset the PROC_DSA_AREA variable to NULL. Otherwise subsequent DSA calls try to use the stale pointer and may end up in segmentation faults. How could I do that? I found that there is a callback for DSM detach, but that function requires a segment pointer as an argument. It should be as below:

on_dsm_detach(PROC_DSA_AREA->segment_maps[0].segment, detach_func);
/* detach_func will set the PROC_DSA_AREA variable to NULL */

But I couldn't access that segment, as the dsa_area struct is defined in dsa.c, so I am unable to include it. Are there other ways to get the DSA detach event, or to access the DSA segment pointer? I got stuck here; kindly help me to proceed further.

Thank you,
G. Sai Ram
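The pattern being asked about, a process-local cached pointer invalidated from a detach callback, can be sketched generically. Only the names PROC_DSA_AREA and detach_func come from the message above; the one-slot registry below is a simplification standing in for on_dsm_detach, since the real segment pointer is private to dsa.c:

```c
#include <assert.h>
#include <stddef.h>

/* process-local cached area pointer, as in the message above */
static void *PROC_DSA_AREA = NULL;

/* simplified one-slot stand-in for PostgreSQL's on_dsm_detach registry */
typedef void (*detach_callback)(void *arg);
static detach_callback on_detach_cb = NULL;
static void *on_detach_arg = NULL;

static void
register_detach_callback(detach_callback cb, void *arg)
{
    on_detach_cb = cb;
    on_detach_arg = arg;
}

/* the detach_func from the message: reset the cached pointer to NULL so
 * later lookups fall back to create/attach instead of using a stale map */
static void
detach_func(void *arg)
{
    (void) arg;
    PROC_DSA_AREA = NULL;
}

/* simulate attach: cache the pointer and arm the callback */
static void
attach_area(void *area)
{
    PROC_DSA_AREA = area;
    register_detach_callback(detach_func, NULL);
}

/* simulate detach: the registry fires the callback */
static void
detach_area(void)
{
    if (on_detach_cb)
        on_detach_cb(on_detach_arg);
}
```

The open question in the message is only how to reach the registration point, i.e. how to obtain the segment handle that the real on_dsm_detach wants; the invalidation itself is this simple.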
Re: [HACKERS] Subscription code improvements
On Tue, Jan 23, 2018 at 7:58 AM, Stephen Frost wrote:
> Greetings,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Fri, Aug 18, 2017 at 2:09 PM, Masahiko Sawada wrote:
>> > On Thu, Aug 17, 2017 at 8:17 PM, Masahiko Sawada wrote:
>> >> On Thu, Aug 17, 2017 at 9:10 AM, Peter Eisentraut wrote:
>> >>> On 8/8/17 05:58, Masahiko Sawada wrote:
>> >>>> Are you planning to work on the remaining patches 0005 and 0006 that
>> >>>> improve the subscription codes in the PG11 cycle? If not, I will take
>> >>>> over them and work on them in the next CF.
>> >>>
>> >>> Based on your assessment, the remaining patches were not required bug
>> >>> fixes. So I think preparing them for the next commit fest would be
>> >>> great.
>> >>
>> >> Thank you for the comment.
>> >>
>> >> After more thought, since the 0001 and 0003 patches on the first mail
>> >> also improve the subscription codes and are worth considering, I
>> >> picked up a total of 4 patches and updated them. I'm planning to work
>> >> on these patches in the next CF.
>> >
>> > Added this item to the next commit fest.
>>
>> The patch set fails to apply. Please provide a rebased version. I am
>> moving this entry to the next CF with waiting on author as status.
>
> Masahiko Sawada, this patch is still in Waiting on Author and hasn't
> progressed in a very long time. Is there any chance you'll be able to
> provide an updated patch soon for review? Or should this patch be
> closed out?

Thank you for the notification. Since it seems that no one is interested in this patch, it would be better to close it out. This patch is a refactoring patch, but the subscription code seems to work fine so far. If a problem appears around subscriptions, I might propose it again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
JIT compiling with LLVM v9.0
Hi,

I've spent the last weeks working on my LLVM compilation patchset. In the course of that I *heavily* revised it. While still a good bit away from committable, it's IMO definitely not a prototype anymore. There are too many small changes, so I'm only going to list the major things. A good bit of that is new. The actual LLVM IR emission itself hasn't changed that drastically. Since I've not described them in detail before, I'll describe a few cases from scratch, even if things haven't fully changed.

== JIT Interface ==

To avoid emitting code in very small increments (which increases mmap/mremap rw vs exec remapping, and compile/optimization time), code generation doesn't happen for every single expression individually, but in batches.

The basic object to emit code via is a JIT context, created with:

extern LLVMJitContext *llvm_create_context(bool optimize);

which in the case of expressions is stored on demand in the EState. For other use cases that might not be the right location.

To emit LLVM IR (i.e. the portable code that LLVM then optimizes and generates native code for), one gets a module from that with:

extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);

to which "arbitrary" numbers of functions can be added. In the case of expression evaluation, we get the module once for every expression, and emit one function for the expression itself, and one for every applicable/referenced deform function.

As explained above, we do not want to emit code immediately from within ExecInitExpr()/ExecReadyExpr(). To facilitate that, readying a JITed expression sets the function to a callback, which gets the actual native function on the first actual call. That allows batching together the generation of all native functions that are defined before the first expression is evaluated; in a lot of queries that'll be all.
Said callback then calls

extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);

which'll emit code for the "in progress" mutable module if necessary, and then searches all generated functions for the name. The names are currently "evalexpr" and "deform" with a generation and counter suffix.

Currently, expressions which do not have access to an EState (basically all "parent"-less expressions) aren't JIT compiled. That could be changed, but I so far do not see a huge need.

== Error handling ==

There are two aspects to error handling.

Firstly, generated (LLVM IR) and emitted functions (mmap()ed segments) need to be cleaned up both after a successful query execution and after an error. I've settled on a fairly boring resowner-based mechanism. On errors all expressions owned by a resowner are released; upon success expressions are reassigned to the parent / released on commit (unless executor shutdown has cleaned them up of course).

A second, less pretty and newly developed, aspect of error handling is OOM handling inside LLVM itself. The above resowner-based mechanism takes care of cleaning up emitted code upon ERROR, but there's also the chance that LLVM itself runs out of memory. LLVM by default does *not* use any C++ exceptions. Its allocations are primarily funneled through the standard "new" handlers, and some direct use of malloc() and mmap(). For the former a 'new handler' exists (http://en.cppreference.com/w/cpp/memory/new/set_new_handler); for the latter LLVM provides callbacks that get called upon failure (unfortunately mmap() failures are treated as fatal rather than OOM errors).

What I've chosen to do, and I'd be interested to get some input about that, is to have two functions that LLVM-using code must use:

extern void llvm_enter_fatal_on_oom(void);
extern void llvm_leave_fatal_on_oom(void);

Before interacting with LLVM code (i.e.
emitting IR, or using the above functions) llvm_enter_fatal_on_oom() needs to be called. When a libstdc++ new or LLVM error occurs, the handlers set up by the above functions trigger a FATAL error. We have to use FATAL rather than ERROR, as we *cannot* reliably throw ERROR inside a foreign library without risking corrupting its internal state. Users of the above sections do *not* have to use PG_TRY/CATCH blocks; the handlers instead are reset at the toplevel sigsetjmp() level.

Using a relatively small enter/leave-protected section of code, rather than setting up these handlers globally, avoids negative interactions with extensions that might use C++, like e.g. postgis. As LLVM code generation should never execute arbitrary code, just setting these handlers temporarily ought to suffice.

== LLVM Interface / patches ==

Unfortunately a bit of required LLVM functionality, particularly around error handling but also initialization, isn't currently fully exposed via LLVM's C API. A bit more *optional* API isn't exposed either. Instead of requiring a brand-new version of LLVM that has exposed this functionality I decided it's better to have a
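The enter/leave discipline described under error handling above can be sketched as follows. This is a simplification: a plain function pointer stands in for the C++ new handler that the real patch installs, and the FATAL path is modeled as a flag rather than an actual error:

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for the global OOM handler slot (in the real patch this is the
 * C++ new handler plus LLVM's fatal-error callback) */
typedef void (*oom_handler)(void);
static oom_handler current_handler = NULL;
static oom_handler saved_handler = NULL;

static int fatal_raised = 0;

/* handler installed while inside the protected section; in the real code
 * this raises FATAL, since throwing ERROR across a foreign library could
 * corrupt its internal state */
static void
fatal_on_oom(void)
{
    fatal_raised = 1;
}

void
llvm_enter_fatal_on_oom(void)
{
    saved_handler = current_handler;
    current_handler = fatal_on_oom;
}

void
llvm_leave_fatal_on_oom(void)
{
    current_handler = saved_handler;
}

/* simulate an allocation failure occurring inside LLVM */
void
simulate_oom(void)
{
    if (current_handler)
        current_handler();
}
```

The point of the save/restore pair is that the handler is only swapped in around LLVM interactions, so extensions that install their own new handler (the postgis case above) see their handler untouched outside those sections.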
Re: Doc tweak for huge_pages?
On Wed, Jan 24, 2018 at 07:46:41AM +0100, Catalin Iacob wrote:
> I see Peter assigned himself as committer, some more information below
> for him to decide on the strength of the anti-THP message.

Thanks for digging this up!

> And it would be good if somebody could run benchmarks on pre-4.6 and
> post-4.6 kernels. I would love to but have no access to big (or
> medium) hardware.

I should be able to do this, since I have a handful of kernel upgrades on my todo list. Can you recommend a test? Otherwise I'll come up with something for pgbench. But I think any test should be independent of, and not influence, the doc change (I don't know anywhere else in the docs that talks about behaviors of specific kernel versions, which often have vendor patches backpatched anyway).

> So maybe we should weaken the language against THP. Maybe present the
> known facts so far, even if the post-4.6 situation is vague/unknown:
> before Linux 4.6 there were repeated reports of THP problems with
> Postgres, Linux >= 4.6 might improve things but this isn't confirmed.
> And it would be good if somebody could run benchmarks on pre-4.6 and
> post-4.6 kernels. I would love to but have no access to big (or
> medium) hardware.

I think all the details should go elsewhere in the docs; config.sgml already references this:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-HUGE-PAGES

..but it doesn't currently mention "transparent" hugepages.

Justin
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila wrote:
> On Fri, Jan 12, 2018 at 11:43 AM, amul sul wrote:
[]
> I have asked to change the message "tuple to be updated .." after the
> heap_lock_tuple call, not in nodeModifyTable.c, so please revert the
> message in nodeModifyTable.c.

Understood, fixed in the attached version; sorry I'd missed it.

> Have you verified the changes in execReplication.c in some way? It is
> not clear to me how you ensure to set the special value
> (InvalidBlockNumber) in CTID for the delete operation via logical
> replication. The logical replication worker uses the function
> ExecSimpleRelationDelete to perform the Delete, and there is no way it
> can pass the correct value of row_moved in heap_delete. Am I missing
> something due to which we don't need to do this?

You are correct: from ExecSimpleRelationDelete, heap_delete will always receive row_moved = false, and InvalidBlockNumber will never be set. I couldn't find any test case that hits the changes in execReplication.c. I am not sure what we are supposed to do here, or how much worse having a LOG message is. What do you think, should we add an assert like EvalPlanQualFetch() here, or is the current LOG message fine?

Thanks for the review.

Regards,
Amul Sul

0001-Invalidate-ip_blkid-v5-wip.patch
Description: Binary data

0002-isolation-tests-v3.patch
Description: Binary data
Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.
Thank you for looking at this.

At Wed, 24 Jan 2018 00:13:51 +0300, Sergei Kornilov wrote in <348951516742...@web54j.yandex.ru>
> Hello
> I tested this patch and think it can be committed to master. Is there a CF
> record? I can not find one.

Not yet. I'm thinking of creating an entry in the next CF after waiting for someone to pick this up.

> Should we also make a backport to older versions? I tested on REL_10_STABLE -
> the patch builds and works ok, but "make check" fails on the new testcase
> with this error:
>> CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
>> + ERROR: missing support function 4 for attribute 1 of index "t_a_a1_idx"
> and with different explain results.

Thank you for checking that. d3a4f89 allowed that, and inet_gist_decompress was removed at the same time. Unfortunately I didn't find a type on master that has both decompress and fetch methods, so I prefer to split the regression patch among the target versions: master has its own one; 9.6 and 10 share the same patch; 9.5 has a different one, since the signature of the consistent method changed as of 9.6. This is not required for 9.4 and earlier since they don't check canreturn on an attribute basis. (And they don't try index-only scans on GiST.)

On the other hand, the fix patch applies to all affected versions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 0e221b3396215d67167622cbc07d04b5185d6f66 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 24 Jan 2018 15:25:48 +0900
Subject: [PATCH] Regression test for the failure of check_index_only.

check_index_only forgets the case where two or more index columns are on
the same table column but with different operator classes.
---
 src/test/regress/expected/create_index.out | 42 ++
 src/test/regress/sql/create_index.sql      | 27 +++
 2 files changed, 69 insertions(+)

diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 9c0f3e3..d95157c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2913,6 +2913,48 @@ explain (costs off)
 (2 rows)
 
 --
+-- Check for duplicate indexkey with different opclasses
+--
+-- opclass that doesn't have function 9 (GIST_FETCH_PROC) for GiST
+CREATE OPERATOR CLASS test_inet_ops FOR TYPE inet USING gist AS
+OPERATOR 3 &&,
+FUNCTION 1 inet_gist_consistent (internal, inet, integer, oid, internal),
+FUNCTION 2 inet_gist_union (internal, internal),
+FUNCTION 3 inet_gist_compress (internal),
+FUNCTION 4 inet_gist_decompress (internal),
+FUNCTION 5 inet_gist_penalty (internal, internal, internal),
+FUNCTION 6 inet_gist_picksplit (internal, internal),
+FUNCTION 7 inet_gist_same (inet, inet, internal);
+CREATE TABLE t (a inet);
+CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
+INSERT INTO t VALUES ('192.168.0.1');
+VACUUM t;
+SELECT * FROM t WHERE a && '192.168.0.1'; -- should retuan a value
+      a      
+-------------
+ 192.168.0.1
+(1 row)
+
+-- enfoce GiST
+SET enable_seqscan TO false;
+SET enable_bitmapscan TO false;
+EXPLAIN (COSTS OFF) SELECT * FROM t WHERE a && '192.168.0.1'; -- shoudn't be index only scan
+                QUERY PLAN                
+------------------------------------------
+ Index Scan using t_a_a1_idx on t
+   Index Cond: (a && '192.168.0.1'::inet)
+(2 rows)
+
+SELECT * FROM t WHERE a && '192.168.0.1'; -- also should return a value
+      a      
+-------------
+ 192.168.0.1
+(1 row)
+
+-- cleanup
+DROP TABLE t;
+DROP OPERATOR CLASS test_inet_ops USING gist;
+--
 -- REINDEX (VERBOSE)
 --
 CREATE TABLE reindex_verbose(id integer primary key);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index b199c23..95a5060 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -982,6 +982,33 @@ explain (costs off)
 select * from tenk1 where (thousand, tenthous) in ((1,1001), (null,null));
 
 --
+-- Check for duplicate indexkey with different opclasses
+--
+-- opclass that doesn't have function 9 (GIST_FETCH_PROC) for GiST
+CREATE OPERATOR CLASS test_inet_ops FOR TYPE inet USING gist AS
+OPERATOR 3 &&,
+FUNCTION 1 inet_gist_consistent (internal, inet, integer, oid, internal),
+FUNCTION 2 inet_gist_union (internal, internal),
+FUNCTION 3 inet_gist_compress (internal),
+FUNCTION 4 inet_gist_decompress (internal),
+FUNCTION 5 inet_gist_penalty (internal, internal, internal),
+FUNCTION 6 inet_gist_picksplit (internal, internal),
+FUNCTION 7 inet_gist_same (inet, inet, internal);
+CREATE TABLE t (a inet);
+CREATE INDEX
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Jan 24, 2018 at 6:43 PM, Amit Kapila wrote:
> On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro wrote:
>> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
>>>>> I am going to repeat my previous suggestion that we use a Barrier here.
>>>>> Given the discussion subsequent to my original proposal, this can be a
>>>>> lot simpler than what I suggested originally. Each worker does
>>>>> BarrierAttach() before beginning to read tuples (exiting if the phase
>>>>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>>>>> sorting. The leader does BarrierAttach() before launching workers and
>>>>> BarrierArriveAndWait() when it's done sorting.
>>>
>>> How does the leader detect if one of the workers does BarrierAttach and
>>> then fails (either exits or errors out) before doing
>>> BarrierArriveAndDetach?
>>
>> If you attach and then exit cleanly, that's a programming error and
>> would cause anyone who runs BarrierArriveAndWait() to hang forever.
>
> Right, but what if the worker dies due to something like proc_exit(1)
> before calling BarrierArriveAndWait? I think this is part of the
> problem we have solved in WaitForParallelWorkersToFinish such that if
> the worker exits abruptly at any point due to some reason, the system
> should not hang.

Actually what I said before is no longer true: after commit 2badb5af, if you exit unexpectedly then the new ParallelWorkerShutdown() exit hook delivers PROCSIG_PARALLEL_MESSAGE (apparently after detaching from the error queue) and the leader aborts when it tries to read the error queue.
I just hacked Parallel Hash like this:

         BarrierAttach(build_barrier);
+        if (ParallelWorkerNumber == 0)
+        {
+                pg_usleep(100);
+                proc_exit(1);
+        }

Now I see:

postgres=# select count(*) from foox r join foox s on r.a = s.a;
ERROR:  lost connection to parallel worker

Using a debugger I can see the leader raising that error with this stack:

  HandleParallelMessages at parallel.c:890
  ProcessInterrupts at postgres.c:3053
  ConditionVariableSleep(cv=0x00010a62e4c8, wait_event_info=134217737) at condition_variable.c:151
  BarrierArriveAndWait(barrier=0x00010a62e4b0, wait_event_info=134217737) at barrier.c:191
  MultiExecParallelHash(node=0x7ffcd9050b10) at nodeHash.c:312
  MultiExecHash(node=0x7ffcd9050b10) at nodeHash.c:112
  MultiExecProcNode(node=0x7ffcd9050b10) at execProcnode.c:502
  ExecParallelHashJoin [inlined] ExecHashJoinImpl(pstate=0x7ffcda01baa0, parallel='\x01') at nodeHashjoin.c:291
  ExecParallelHashJoin(pstate=0x7ffcda01baa0) at nodeHashjoin.c:582

--
Thomas Munro
http://www.enterprisedb.com
Re: Doc tweak for huge_pages?
On Tue, Jan 23, 2018 at 7:13 PM, Catalin Iacob wrote:
> By the way, Fedora 27 does disable THP by default; they deviate from
> upstream in this regard.
> When I have some time I'll try to do some digging into the history of
> the Fedora kernel package to see if they provide a rationale for
> changing the default. That might hint whether it's likely that future
> RHEL will change as well.

I see Peter assigned himself as committer; some more information below for him to decide on the strength of the anti-THP message.

commit 9a031d5070d9f8f5916c48637bd0c237cd52eaf9
Author: Josh Boyer
Date:   Thu Mar 27 18:31:16 2014 -0400

    Switch to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE instead of always on

    The benefit of THP has been somewhat questionable overall for a while,
    and it's been known to cause performance issues with some workloads.
    Upstream also considers it to be overly complicated and really not
    worth it on machines with memory in the amounts found on typical
    desktops/SMB servers.

    Switch to using it via madvise, which most applications that care
    about it should likely already be doing.

Debian 9 also seems to default to madvise instead of always.

Digging more into it, there were changes in the 4.6 kernel (released May 2016) that should improve THP, more precisely:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=444eb2a449ef36fe115431ed7b71467c4563c7f1

This also led Debian to change their default in September 2017 (so for the future Debian release) back to always, referencing the 444eb2a improvements:

https://anonscm.debian.org/cgit/kernel/linux.git/commit/debian/changelog?id=611a8e67260e8b8190ab991206a3867681d6df91

Ben Hutchings 2017-09-29 14:32:09 (GMT)

    thp: Enable TRANSPARENT_HUGEPAGE_ALWAYS instead of TRANSPARENT_HUGEPAGE_MADVISE

    As advised by Andrea Arcangeli - since commit 444eb2a449ef "mm: thp: set
    THP defrag by default to madvise and add a stall-free defrag option"
    this will generally be best for performance.
So maybe we should weaken the language against THP. Maybe present the known facts so far, even if the post 4.6 situation is vague/unknown: before Linux 4.6 there were repeated reports of THP problems with Postgres, Linux >= 4.6 might improve things but this isn't confirmed. And it would be good if somebody could run benchmarks on pre 4.6 and post 4.6 kernels. I would love to but have no access to big (or medium) hardware.
Re: [HACKERS] taking stdbool.h into use
On Tue, Jan 23, 2018 at 11:33:56AM -0500, Peter Eisentraut wrote:
> Here is a proposed patch set to clean this up. First, add some test
> coverage for record_image_cmp. (There wasn't any, only for
> record_image_eq as part of MV testing.) Then, remove the GET_ macros
> from record_image_{eq,cmp}. And finally remove all that byte-masking
> stuff from postgres.h.

Good catch. Coverage reports mark those areas as empty! Similarly, the functions for record_* are mostly not used; some tests could be added for them at the same time. The four error paths of those functions are tested as well, which is cool to see.

Even if the buildfarm explodes after 0002 and 0003, 0001 is still a good addition. The set looks good to me, by the way.

--
Michael

signature.asc
Description: PGP signature
Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.
Hi!

> On 24 Jan 2018, at 2:13, Sergei Kornilov wrote:
>
> Should we also make a backport to older versions? I tested on REL_10_STABLE -
> the patch builds and works ok, but "make check" fails on the new testcase
> with this error:
>> CREATE INDEX ON t USING gist (a test_inet_ops, a inet_ops);
>> + ERROR: missing support function 4 for attribute 1 of index "t_a_a1_idx"
> and with different explain results.

To be usable for 10, the patch must use a full-featured opclass in the tests: test_inet_ops has no decompress GiST function, which was required before 11. I think the explain results will then be identical.

Best regards, Andrey Borodin.
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Jan 23, 2018 at 9:38 PM, Amit Kapila wrote:
> On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan wrote:
>> The leader can go ahead and sort before calling something like a new
>> WaitForParallelWorkersToAttach() function (or even
>> WaitForParallelWorkersToFinish()). If we did add a
>> WaitForParallelWorkersToAttach() function, then the performance hit
>> would probably not be noticeable in practice. The
>> parallel_leader_participates case would still work without special
>> care (that's the main hazard that makes using a barrier seem
>> unappealing).
>
> +1. I also think so. Another advantage is that even if one of the
> workers is not able to start, we don't need to return an error at the
> end of the query; rather, we can detect that situation early and can
> execute the query successfully. OTOH, if we are not convinced about
> the performance implications, then WaitForParallelWorkersToFinish
> should serve the purpose, which can be called at a later point of time.

There is another disadvantage to just using WaitForParallelWorkersToFinish() to wait for nbtsort.c workers to finish (and not inventing a WaitForParallelWorkersToAttach() function, and calling that instead), which is: there can be no custom wait event that specifically mentions parallel CREATE INDEX.

--
Peter Geoghegan
Re: Incorrect comments in partition.c
(2018/01/24 13:06), Amit Langote wrote:
> On 2018/01/23 20:43, Etsuro Fujita wrote:
>> Here is a comment for get_qual_for_list in partition.c:
>>
>>  * get_qual_for_list
>>  *
>>  * Returns an implicit-AND list of expressions to use as a list partition's
>> - * constraint, given the partition key and bound structures.
>>
>> I don't think the part "given the partition key and bound structures."
>> is correct, because we pass the *parent relation* and partition bound
>> structure to that function. So I think we should change that part as
>> such. get_qual_for_range has the same issue, so I think we need this
>> change for that function as well.
>
> Yeah. It seems 6f6b99d1335 [1] changed their signatures to support
> handling default partitions.
>
>> Another one I noticed in comments in partition.c is:
>>
>>  * get_qual_for_hash
>>  *
>>  * Given a list of partition columns, modulus and remainder corresponding
>>  * to a partition, this function returns a CHECK constraint expression Node
>>  * for that partition.
>>
>> I think the part "Given a list of partition columns, modulus and
>> remainder corresponding to a partition" is wrong, because we pass to that
>> function the parent relation and partition bound structure the same way
>> as for get_qual_for_list/get_qual_for_range. So what about changing the
>> above to something like this, similarly to
>> get_qual_for_list/get_qual_for_range:
>>
>>  Returns a CHECK constraint expression to use as a hash partition's
>>  constraint, given the parent relation and partition bound structure.
>
> Makes sense.
>
>> Also:
>>
>>  * The partition constraint for a hash partition is always a call to the
>>  * built-in function satisfies_hash_partition(). The first two arguments
>>  * are the modulus and remainder for the partition; the remaining arguments
>>  * are the values to be hashed.
>>
>> I also think the part "The first two arguments are the modulus and
>> remainder for the partition;" is wrong (see satisfies_hash_partition).
>> But I don't think we need to correct that here, because we have described
>> the arguments in the comments for that function. So I'd like to
>> propose removing the latter comment entirely from the above.
>
> Here, too. The comment seems to have been obsoleted by f3b0897a121 [2].

Thanks for the review and the related git info!

Best regards,
Etsuro Fujita
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro wrote:
> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
>>>> I am going to repeat my previous suggestion that we use a Barrier here.
>>>> Given the discussion subsequent to my original proposal, this can be a
>>>> lot simpler than what I suggested originally. Each worker does
>>>> BarrierAttach() before beginning to read tuples (exiting if the phase
>>>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>>>> sorting. The leader does BarrierAttach() before launching workers and
>>>> BarrierArriveAndWait() when it's done sorting.
>>
>> How does the leader detect if one of the workers does BarrierAttach and
>> then fails (either exits or errors out) before doing
>> BarrierArriveAndDetach?
>
> If you attach and then exit cleanly, that's a programming error and
> would cause anyone who runs BarrierArriveAndWait() to hang forever.

Right, but what if the worker dies due to something like proc_exit(1) before calling BarrierArriveAndWait? I think this is part of the problem we have solved in WaitForParallelWorkersToFinish, such that if the worker exits abruptly at any point due to some reason, the system should not hang.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 10:38 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro wrote:
>>> Yes, this is what I am trying to explain on the parallel create index
>>> thread. I think there we need to either use
>>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>>> new API as proposed in that thread) if we don't want to use barriers.
>>> I see a minor disadvantage in using WaitForParallelWorkersToFinish
>>> which I will say on that thread.
>>
>> Ah, I see. So if you wait for them to attach you can detect
>> unexpected dead workers (via shm_mq_receive), at the cost of having
>> the leader wasting time waiting around for forked processes to say
>> hello when it could instead be sorting tuples.
>
> The leader can go ahead and sort before calling something like a new
> WaitForParallelWorkersToAttach() function (or even
> WaitForParallelWorkersToFinish()). If we did add a
> WaitForParallelWorkersToAttach() function, then the performance hit
> would probably not be noticeable in practice. The
> parallel_leader_participates case would still work without special
> care (that's the main hazard that makes using a barrier seem
> unappealing).

+1. I also think so. Another advantage is that even if one of the workers is not able to start, we don't need to return an error at the end of the query; rather, we can detect that situation early and can execute the query successfully. OTOH, if we are not convinced about the performance implications, then WaitForParallelWorkersToFinish should serve the purpose, which can be called at a later point of time.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Possible performance regression in version 10.1 with pgbench read-write tests.
On Wed, Jan 24, 2018 at 7:36 AM, Amit Kapila wrote:
> Both the cases look identical, but from the document attached, it
> seems that case 1 is for scale factor 300.

Oops, sorry, that was a typo. CASE 1 is scale factor 300, which will fit in shared_buffers = 8GB.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Jan 23, 2018 at 9:02 PM, Thomas Munro wrote:
>> Yes, this is what I am trying to explain on the parallel create index
>> thread. I think there we need to either use
>> WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a
>> new API as proposed in that thread) if we don't want to use barriers.
>> I see a minor disadvantage in using WaitForParallelWorkersToFinish
>> which I will say on that thread.
>
> Ah, I see. So if you wait for them to attach you can detect
> unexpected dead workers (via shm_mq_receive), at the cost of having
> the leader wasting time waiting around for forked processes to say
> hello when it could instead be sorting tuples.

The leader can go ahead and sort before calling something like a new WaitForParallelWorkersToAttach() function (or even WaitForParallelWorkersToFinish()). If we did add a WaitForParallelWorkersToAttach() function, then the performance hit would probably not be noticeable in practice. The parallel_leader_participates case would still work without special care (that's the main hazard that makes using a barrier seem unappealing).

--
Peter Geoghegan
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
>>> I am going to repeat my previous suggestion that we use a Barrier here.
>>> Given the discussion subsequent to my original proposal, this can be a
>>> lot simpler than what I suggested originally. Each worker does
>>> BarrierAttach() before beginning to read tuples (exiting if the phase
>>> returned is non-zero) and BarrierArriveAndDetach() when it's done
>>> sorting. The leader does BarrierAttach() before launching workers and
>>> BarrierArriveAndWait() when it's done sorting.
>
> How does the leader detect if one of the workers does BarrierAttach and
> then fails (either exits or errors out) before doing
> BarrierArriveAndDetach?

If you attach and then exit cleanly, that's a programming error and would cause anyone who runs BarrierArriveAndWait() to hang forever. If you attach and raise an error, the leader will receive an error message via CFI() and will then raise an error itself and terminate all workers during cleanup.

--
Thomas Munro
http://www.enterprisedb.com
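The protocol under discussion can be sketched with a toy dynamic barrier, modeled on the described semantics of BarrierAttach(), BarrierArriveAndDetach() and BarrierArriveAndWait(), but implemented here with pthreads purely for illustration. It deliberately has no handling for a participant that attaches and then dies without arriving; that is exactly the hazard this thread is about:

```c
#include <assert.h>
#include <pthread.h>

typedef struct ToyBarrier
{
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int participants;   /* currently attached */
    int arrived;        /* attached and waiting at this phase */
    int phase;
} ToyBarrier;

void
toy_barrier_init(ToyBarrier *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->participants = b->arrived = b->phase = 0;
}

/* returns the current phase; a worker seeing a non-zero phase is too late */
int
toy_barrier_attach(ToyBarrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->participants++;
    int phase = b->phase;
    pthread_mutex_unlock(&b->lock);
    return phase;
}

/* stop participating; if everyone still attached has already arrived,
 * advance the phase and release the waiters */
void
toy_barrier_detach(ToyBarrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->participants--;
    if (b->arrived == b->participants && b->arrived > 0)
    {
        b->phase++;
        b->arrived = 0;
        pthread_cond_broadcast(&b->cv);
    }
    pthread_mutex_unlock(&b->lock);
}

/* in this toy, arriving-and-detaching reduces to detaching */
void
toy_barrier_arrive_and_detach(ToyBarrier *b)
{
    toy_barrier_detach(b);
}

/* arrive and wait for everyone; the last arrival advances the phase.
 * NOTE: a participant that attaches and then exits without arriving
 * leaves everyone here waiting forever -- the hazard in this thread. */
void
toy_barrier_arrive_and_wait(ToyBarrier *b)
{
    pthread_mutex_lock(&b->lock);
    b->arrived++;
    if (b->arrived == b->participants)
    {
        b->phase++;
        b->arrived = 0;
        pthread_cond_broadcast(&b->cv);
    }
    else
    {
        int phase = b->phase;
        while (b->phase == phase)
            pthread_cond_wait(&b->cv, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static ToyBarrier sort_barrier;

static void *
worker_main(void *arg)
{
    (void) arg;
    if (toy_barrier_attach(&sort_barrier) != 0)
    {
        toy_barrier_detach(&sort_barrier);  /* sorting already done; exit */
        return NULL;
    }
    /* ... read and sort tuples ... */
    toy_barrier_arrive_and_detach(&sort_barrier);
    return NULL;
}

/* leader: attach before launching workers, wait after sorting; returns
 * the phase observed after the wait (1 once all arrivals happened) */
int
run_leader_with_workers(int nworkers)
{
    pthread_t tids[8];

    if (nworkers > 8)
        nworkers = 8;
    toy_barrier_init(&sort_barrier);
    toy_barrier_attach(&sort_barrier);          /* leader attaches first */
    for (int i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker_main, NULL);
    /* ... leader sorts its own share ... */
    toy_barrier_arrive_and_wait(&sort_barrier); /* returns once all done */
    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    return sort_barrier.phase;
}
```

The simplicity is the appeal Robert's proposal claims, and the comment on toy_barrier_arrive_and_wait() marks exactly the spot where an abruptly-exiting worker would wedge the leader, which is why the cleanup-hook behavior after commit 2badb5af matters.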
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 5:43 PM, Amit Kapila wrote: >> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just >> confused and managed to confuse Robert too.  If you make >> fork_process() fail randomly (see attached), I see that there are a >> couple of easily reachable failure modes (example session at bottom of >> message): >> > > In short, we are good with committed code. Right? Yep.  Sorry for the noise. > Yes, this is what I am trying to explain on parallel create index > thread. I think there we need to either use > WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a > new API as proposed in that thread) if we don't want to use barriers. > I see a minor disadvantage in using WaitForParallelWorkersToFinish > which I will say on that thread. Ah, I see.  So if you wait for them to attach you can detect unexpected dead workers (via shm_mq_receive), at the cost of having the leader wasting time waiting around for forked processes to say hello when it could instead be sorting tuples. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Wed, Jan 24, 2018 at 12:20 AM, Peter Geoghegan wrote: > On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote: >> As Amit says, what remains is the case where fork() fails or the >> worker dies before it reaches the line in ParallelWorkerMain that >> reads shm_mq_set_sender(mq, MyProc).  In those cases, no error will be >> signaled until you call WaitForParallelWorkersToFinish().  If you wait >> prior to that point for a number of workers equal to >> nworkers_launched, you will wait forever in those cases. > > Another option might be to actually call > WaitForParallelWorkersToFinish() in place of a condition variable or > barrier, as Amit suggested at one point. > Yes, the only thing that is slightly worrying about using WaitForParallelWorkersToFinish is that the backend leader needs to wait for workers to finish rather than just finishing sort-related work. I think there shouldn't be much difference between when the sort is done and the workers actually finish the remaining resource cleanup. However, OTOH, if we are not okay with this solution and want to go with some kind of usage of barriers to solve this problem, then we can evaluate that as well, but I feel it is better if we can use the method which is used in other parallelism code to solve this problem (which is to use WaitForParallelWorkersToFinish). >> I am going to repeat my previous suggestion that we use a Barrier here. >> Given the discussion subsequent to my original proposal, this can be a >> lot simpler than what I suggested originally.  Each worker does >> BarrierAttach() before beginning to read tuples (exiting if the phase >> returned is non-zero) and BarrierArriveAndDetach() when it's done >> sorting.  The leader does BarrierAttach() before launching workers and >> BarrierArriveAndWait() when it's done sorting. >> How does the leader detect if one of the workers does BarrierAttach and then fails (either exits or errors out) before doing BarrierArriveAndDetach? -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 9:55 AM, Thomas Munro wrote: > On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila wrote: >> On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote: >>> In the case of Gather, for example, we won't >>> WaitForParallelWorkersToFinish() until we've read all the tuples.  If >>> there's a tuple queue that does not yet have a sender, then we'll just >>> wait for it to get one, even if the sender fell victim to a fork >>> failure and is never going to show up. >>> >> >> Hmm, I think that case will be addressed because tuple queues can >> detect if the leader is not attached.  It does in code path >> shm_mq_receive->shm_mq_counterparty_gone.  In >> shm_mq_counterparty_gone, it can detect if the worker is gone by using >> GetBackgroundWorkerPid.  Moreover, I have manually tested this >> particular case before saying your patch is fine.  Do you have some >> other case in mind which I am missing? > > Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just > confused and managed to confuse Robert too.  If you make > fork_process() fail randomly (see attached), I see that there are a > couple of easily reachable failure modes (example session at bottom of > message): > In short, we are good with committed code. Right? > 1.  HandleParallelMessages() is reached and raises a "lost connection > to parallel worker" error because shm_mq_receive() returns > SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked > GetBackgroundWorkerPid() just as you said.  I guess that's happening > because some other process is (coincidentally) sending > PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a > process is unexpectedly stopped. > > 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel > worker failed to initialize" error.  TupleQueueReaderNext() set done > to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once > again, that is because shm_mq_counterparty_gone() returned true. 
This > is the bit Robert and I missed in our off-list discussion. > > As long as we always get our latch set by the postmaster after a fork > failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is > guaranteed to return BGWH_STOPPED after that, and as long as we only > ever use latch/CFI loops to wait, and as long as we try to read from a > shm_mq, then I don't see a failure mode that hangs. > > This doesn't help a parallel.c client that doesn't try to read from a > shm_mq though, like parallel CREATE INDEX (using the proposed design > without dynamic Barrier). > Yes, this is what I am trying to explain on parallel create index thread. I think there we need to either use WaitForParallelWorkersToFinish or WaitForParallelWorkersToAttach (a new API as proposed in that thread) if we don't want to use barriers. I see a minor disadvantage in using WaitForParallelWorkersToFinish which I will say on that thread. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 10:03 AM, Peter Geoghegan wrote: > On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro > wrote: >>> Hmm, I think that case will be addressed because tuple queues can >>> detect if the leader is not attached.  It does in code path >>> shm_mq_receive->shm_mq_counterparty_gone.  In >>> shm_mq_counterparty_gone, it can detect if the worker is gone by using >>> GetBackgroundWorkerPid.  Moreover, I have manually tested this >>> particular case before saying your patch is fine.  Do you have some >>> other case in mind which I am missing? >> >> Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just >> confused and managed to confuse Robert too.  If you make >> fork_process() fail randomly (see attached), I see that there are a >> couple of easily reachable failure modes (example session at bottom of >> message): >> >> 1.  HandleParallelMessages() is reached and raises a "lost connection >> to parallel worker" error because shm_mq_receive() returns >> SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked >> GetBackgroundWorkerPid() just as you said.  I guess that's happening >> because some other process is (coincidentally) sending >> PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a >> process is unexpectedly stopped. >> >> 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel >> worker failed to initialize" error.  TupleQueueReaderNext() set done >> to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once >> again, that is because shm_mq_counterparty_gone() returned true.  This >> is the bit Robert and I missed in our off-list discussion. >> >> As long as we always get our latch set by the postmaster after a fork >> failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is >> guaranteed to return BGWH_STOPPED after that, and as long as we only >> ever use latch/CFI loops to wait, and as long as we try to read from a >> shm_mq, then I don't see a failure mode that hangs. 
> > What about the parallel_leader_participation=off case? > There is nothing special about that case; there shouldn't be any problem as long as we can detect the worker failures appropriately. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Jan 23, 2018 at 8:25 PM, Thomas Munro wrote: >> Hmm, I think that case will be addressed because tuple queues can >> detect if the leader is not attached.  It does in code path >> shm_mq_receive->shm_mq_counterparty_gone.  In >> shm_mq_counterparty_gone, it can detect if the worker is gone by using >> GetBackgroundWorkerPid.  Moreover, I have manually tested this >> particular case before saying your patch is fine.  Do you have some >> other case in mind which I am missing? > > Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just > confused and managed to confuse Robert too.  If you make > fork_process() fail randomly (see attached), I see that there are a > couple of easily reachable failure modes (example session at bottom of > message): > > 1.  HandleParallelMessages() is reached and raises a "lost connection > to parallel worker" error because shm_mq_receive() returns > SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked > GetBackgroundWorkerPid() just as you said.  I guess that's happening > because some other process is (coincidentally) sending > PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a > process is unexpectedly stopped. > > 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel > worker failed to initialize" error.  TupleQueueReaderNext() set done > to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once > again, that is because shm_mq_counterparty_gone() returned true.  This > is the bit Robert and I missed in our off-list discussion. > > As long as we always get our latch set by the postmaster after a fork > failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is > guaranteed to return BGWH_STOPPED after that, and as long as we only > ever use latch/CFI loops to wait, and as long as we try to read from a > shm_mq, then I don't see a failure mode that hangs. What about the parallel_leader_participation=off case? -- Peter Geoghegan
Regarding ambulkdelete, amvacuumcleanup index methods
Hi all, We are building an in-memory index extension for Postgres. We capture table inserts, updates, and deletes using triggers. During a vacuum operation, Postgres calls ambulkdelete and amvacuumcleanup (as part of index cleanup). As we handle all updates and deletes using triggers, we don't have to do any index cleanup in ambulkdelete. But what stats should I return from ambulkdelete and amvacuumcleanup? Is it necessary to return stats from ambulkdelete and amvacuumcleanup?
Re: PATCH: Configurable file mode mask
On Tue, Jan 23, 2018 at 10:48:07PM -0500, David Steele wrote: > Sorry - it means "level of effort".  I was trying to get an idea if it > is something that could be available in the PG11 development cycle, or > if I should be looking at other ways to get the testing for this patch > completed. I don't think that it would be more than a couple of hours digging things out, but that may be optimistic. All that needs to be done is to extend get_new_node with an optional argument to define the bin directory of a PostgresNode and register the path. If the binary directory is undefined, just rely on PATH and call pg_config --bindir to get the information, then register it. What we need to be careful about is that all the binary calls in PostgresNode.pm need to be changed to use the binary path, which is not really complicated, but it can be easy to miss some. There is one small issue with TestLib::check_pg_config, but I think that we can live with the existing version relying on PATH. Once this refactoring is done, you need the patch set I sent previously here with some tiny modifications: https://www.postgresql.org/message-id/CAB7nPqRJXz0sEuUL36eBsF7iZtOQGMJoJPGFWxHLuS6TYPxf5w%40mail.gmail.com Those modifications involve just looking at the environment variables specified in pg_upgrade's TESTING and changing the node binaries if those are defined. It is also necessary to tweak probin's paths for the dump consistency checks, which is something that my last patch set was not doing. It is tiring to see this topic come up again and again, so you know what, let's bite the bullet. I commit myself to coding this thing, with a patch set for the next commit fest. The only thing I want to be sure about is that folks on this list are fine with the plan of this email. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 3:45 PM, Amit Kapila wrote: > On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote: >> In the case of Gather, for example, we won't >> WaitForParallelWorkersToFinish() until we've read all the tuples.  If >> there's a tuple queue that does not yet have a sender, then we'll just >> wait for it to get one, even if the sender fell victim to a fork >> failure and is never going to show up. >> > > Hmm, I think that case will be addressed because tuple queues can > detect if the leader is not attached.  It does in code path > shm_mq_receive->shm_mq_counterparty_gone.  In > shm_mq_counterparty_gone, it can detect if the worker is gone by using > GetBackgroundWorkerPid.  Moreover, I have manually tested this > particular case before saying your patch is fine.  Do you have some > other case in mind which I am missing? Hmm.  Yeah.  I can't seem to reach a stuck case and was probably just confused and managed to confuse Robert too.  If you make fork_process() fail randomly (see attached), I see that there are a couple of easily reachable failure modes (example session at bottom of message): 1.  HandleParallelMessages() is reached and raises a "lost connection to parallel worker" error because shm_mq_receive() returns SHM_MQ_DETACHED, I think because shm_mq_counterparty_gone() checked GetBackgroundWorkerPid() just as you said.  I guess that's happening because some other process is (coincidentally) sending PROCSIG_PARALLEL_MESSAGE at shutdown, causing us to notice that a process is unexpectedly stopped. 2.  WaitForParallelWorkersToFinish() is reached and raises a "parallel worker failed to initialize" error.  TupleQueueReaderNext() set done to true, because shm_mq_receive() returned SHM_MQ_DETACHED.  Once again, that is because shm_mq_counterparty_gone() returned true.  This is the bit Robert and I missed in our off-list discussion. 
As long as we always get our latch set by the postmaster after a fork failure (ie kill SIGUSR1) and after GetBackgroundWorkerPid() is guaranteed to return BGWH_STOPPED after that, and as long as we only ever use latch/CFI loops to wait, and as long as we try to read from a shm_mq, then I don't see a failure mode that hangs. This doesn't help a parallel.c client that doesn't try to read from a shm_mq though, like parallel CREATE INDEX (using the proposed design without dynamic Barrier). We can't rely on a coincidental PROCSIG_PARALLEL_MESSAGE like in (1), and it's not going to try to read from a queue like in (2). So wouldn't it need a way to perform its own similar check for unexpectedly BGWH_STOPPED processes whenever its latch is set? If there were some way for the postmaster to cause reason PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of just notification via kill(SIGUSR1) when it fails to fork a parallel worker, we'd get (1) for free in any latch/CFI loop code. But I understand that we can't do that by project edict. === postgres=# create table foox as select generate_series(1, 1) as a; SELECT 1 postgres=# alter table foox set (parallel_workers = 4); ALTER TABLE postgres=# set max_parallel_workers_per_gather = 4; SET postgres=# select count(*) from foox; ERROR: lost connection to parallel worker postgres=# select count(*) from foox; ERROR: parallel worker failed to initialize HINT: More details may be available in the server log. postgres=# select count(*) from foox; count --- 1 (1 row) postgres=# select count(*) from foox; ERROR: lost connection to parallel worker postgres=# select count(*) from foox; ERROR: lost connection to parallel worker postgres=# select count(*) from foox; ERROR: lost connection to parallel worker postgres=# select count(*) from foox; ERROR: lost connection to parallel worker -- Thomas Munro http://www.enterprisedb.com chaos-monkey-fork-process.patch Description: Binary data
Re: documentation is now XML
"David G. Johnston" writes: > On Tuesday, January 23, 2018, Bruce Momjian wrote: >> All agreed, but what alternatives are being developed? > I seem to recall a proposal a while back to gain margin on some of the > limits by pruning the release notes section down to at least this century > and archiving the older ones elsewhere. Yeah; I did and still do think that's a good idea.  But so far as the toolchain is concerned, that's just a band-aid. Anyway, we're on XML now, and it seems to be working fairly well.  I don't feel a need to revisit that. It's probably true that the TeX-based toolchain was potentially capable of producing finer typesetting results than the XML chain ... but, honestly, who's printing the PG manual on dead trees anymore?  I find the PDF output to be mostly a legacy thing in the first place. regards, tom lane
Re: [HACKERS] MERGE SQL Statement for PG11
On 24 January 2018 at 01:35, Peter Geoghegan wrote: > On Tue, Jan 23, 2018 at 5:51 AM, Simon Riggs wrote: >>> Not yet working >>> * Partitioning >>> * RLS >>> >>> Based on this successful progress I imagine I'll be looking to commit >>> this by the end of the CF, allowing us 2 further months to bugfix. >> >> This is complete and pretty clean now. 1200 lines of code, plus docs and >> tests. > > That timeline seems aggressive to me. Thank you for the review. > Also, the patch appears to have > bitrot. Please rebase, and post a new version. Will do, though I'm sure that's only minor since we rebased only a few days ago. > Some feedback based on a short read-through of your v11: > > * What's the postgres_fdw status? MERGE currently works with normal relations and materialized views only. > * Relatedly, when/where does applying the junkfilter happen, if not here?: A few lines later in the same file. Junkfilters are still used. > * Isn't "consider INSERT ... ON CONFLICT DO UPDATE" obsolete in these > doc changes?: > >> -F312 MERGE statement NO consider INSERT ... ON CONFLICT DO UPDATE >> -F313 Enhanced MERGE statementNO >> -F314 MERGE statement with DELETE branch NO >> +F312 MERGE statement YES consider INSERT ... ON CONFLICT DO UPDATE >> +F313 Enhanced MERGE statementYES >> +F314 MERGE statement with DELETE branch YES I think that it is still useful to highlight the existence of a non-standard feature of which people might not otherwise be aware. If you wish me to remove a reference to your work, I will bow to your wish. > * What's the deal with transition tables? Your docs say this: > >> + >> +> class="parameter">source_table_name >> + >> + >> + The name (optionally schema-qualified) of the source table, view or >> + transition table. 
>> + >> + >> + > > But the code says this: > >> + /* >> +* XXX if we support transition tables this would need to move >> earlier >> +* before ExecSetupTransitionCaptureState() >> +*/ >> + switch (action->commandType) Changes made by MERGE can theoretically be visible in a transition table. When I tried to implement that there were problems caused by the way INSERT ON CONFLICT has been implemented, so it will take a fair amount of study and lifting to make that work. For now, they are not supported and generate an error message. That behaviour was not documented, but I've done that now. SQL Std says "Transition tables are not generally updatable (and therefore not simply updatable) and their columns are not updatable." So MERGE cannot be a target of a transition table, but it can be the source of one. So the docs you cite are correct, as is the comment. > * Can UPDATEs themselves accept an UPDATE-style WHERE clause, in > addition to the WHEN quals, and the main ON join quals? > > I don't see any examples of this in your regression tests. No, they can't. Oracle allows it but it isn't SQL Standard. > * You added a new MERGE command tag. Shouldn't there be changes to > libpq's fe-exec.c, to go along with that? Nice catch. One liner patch and docs updated in repo for next patch. > * I noticed this: > >> + else if (node->operation == CMD_MERGE) >> + { >> + /* >> + * XXX Add more detailed instrumentation for MERGE changes >> + * when running EXPLAIN ANALYZE? >> + */ >> + } > > I think that doing better here is necessary. Please define better. It may be possible. 
> * I'm not a fan of stored rules, but I still think that this needs to > be discussed: > >> - product_queries = fireRules(parsetree, >> + /* >> +* First rule of MERGE club is we don't talk about rules >> +*/ >> + if (event == CMD_MERGE) >> + product_queries = NIL; >> + else >> + product_queries = fireRules(parsetree, >> result_relation, >> event, >> locks, It has been discussed, in my recent proposal, mentioned in the doc page for clarity. It has also been discussed previously in discussions around MERGE. It's not clear how they would work, but we already have an example of a statement type that doesn't support rules. > * This seems totally unnecessary at first blush: > >> + /* >> +* Lock tuple for update. >> +* >> +* XXX Is this really needed? I put this in >> +* just to get hold of the existing tuple. >> +* But if we do need, then we probably >> +* should be looking at the return value of >> +* heap_lock_tuple() and take appropriate >> +* action. >> +
Re: [PATCH] fix for C4141 warning on MSVC
Thomas Munro writes: > Here's one like that. Pushed; we'll soon see if the buildfarm likes it.  I added a tweak to prevent forced inlining at -O0, as discussed in the other thread; and worked on the comments a bit. regards, tom lane
Re: Incorrect comments in partition.c
On 2018/01/23 20:43, Etsuro Fujita wrote: > Here is a comment for get_qual_for_list in partition.c: > > * get_qual_for_list > * > * Returns an implicit-AND list of expressions to use as a list partition's > - * constraint, given the partition key and bound structures. > > I don't think the part "given the partition key and bound structures." > is correct because we pass the *parent relation* and partition bound > structure to that function. So I think we should change that part as > such. get_qual_for_range has the same issue, so I think we need this > change for that function as well. Yeah. It seems 6f6b99d1335 [1] changed their signatures to support handling default partitions. > Another one I noticed in comments in partition.c is: > > * get_qual_for_hash > * > * Given a list of partition columns, modulus and remainder > corresponding to a > * partition, this function returns CHECK constraint expression Node for > that > * partition. > > I think the part "Given a list of partition columns, modulus and > remainder corresponding to a partition" is wrong because we pass to that > function the parent relation and partition bound structure the same way > as for get_qual_for_list/get_qual_for_range. So what about changing the > above to something like this, similarly to > get_qual_for_list/get_qual_for_range: > > Returns a CHECK constraint expression to use as a hash partition's > constraint, given the parent relation and partition bound structure. Makes sense. > Also: > > * The partition constraint for a hash partition is always a call to the > * built-in function satisfies_hash_partition(). The first two > arguments are > * the modulus and remainder for the partition; the remaining arguments > are the > * values to be hashed. > > I also think the part "The first two arguments are the modulus and > remainder for the partition;" is wrong (see satisfies_hash_partition). 
> But I don't think we need to correct that here because we have described > about the arguments in comments for that function. So I'd like to > propose removing the latter comment entirely from the above. Here, too. The comment seems to have been obsoleted by f3b0897a121 [2]. Thanks, Amit [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f6b99d1335 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f3b0897a121
Re: PATCH: Configurable file mode mask
On 1/23/18 9:22 PM, Michael Paquier wrote: > On Tue, Jan 23, 2018 at 09:18:51AM -0500, David Steele wrote: >> On 1/20/18 5:47 PM, Michael Paquier wrote: >>> Making this possible would require first some >>> refactoring of PostgresNode.pm so as a node is aware of the binary paths >>> it uses to be able to manipulate multiple instances with different >>> install paths at the same time. >> >> Any idea of what the LOE would be for that? > > What does LOE means? Sorry - it means "level of effort". I was trying to get an idea if it is something that could be available in the PG11 development cycle, or if I should be looking at other ways to get the testing for this patch completed. -- -David da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: documentation is now XML
On Tuesday, January 23, 2018, Bruce Momjian wrote: > On Tue, Jan 23, 2018 at 10:22:53PM -0500, Tom Lane wrote: > (a) it's got hard > > limits we're approaching, > All agreed, but what alternatives are being developed? > > I seem to recall a proposal a while back to gain margin on some of the limits by pruning the release notes section down to at least this century and archiving the older ones elsewhere. David J.
Re: documentation is now XML
Bruce Momjian writes: > On Thu, Nov 23, 2017 at 03:39:24PM -0500, Tom Lane wrote: >> Also, we're way overdue for getting out from under the creaky TeX-based >> toolchain for producing PDFs. > I am coming in late here, but I am not aware of any open source > professional typesetting software that has output quality as good as > TeX. I like TeX as much as the next guy --- I wrote my thesis with it, a long time ago --- but there's no denying that (a) it's got hard limits we're approaching, (b) the downstream conversion to PDF is buggy, and (c) nobody is working on fixing it. regards, tom lane
Re: copy.c allocation constant
On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote: > > > While reading copy.c I noticed this line: > > > #define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ > > > Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as > 65535 so we palloc a power of 2? > > > I have no idea if this affects performance, but it did strike me as strange. Coming in late here, but it does seem very odd. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: pgsql: Allow UPDATE to move rows between partitions.
On Tue, Jan 23, 2018 at 9:59 PM, Robert Haas wrote: > On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapila wrote: >> On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas wrote: >>> Allow UPDATE to move rows between partitions. >> >> +If an UPDATE on a partitioned table causes a row to >> move >> +to another partition, it will be performed as a >> DELETE >> +from the original partition followed by an INSERT >> into >> +the new partition. In this case, all row-level BEFORE >> +UPDATE triggers and all row-level >> +BEFORE DELETE triggers are fired >> on >> +the original partition. >> >> Do we need to maintain trigger-related behavior for logical >> replication?  In logical replication, we use ExecSimpleRelationDelete >> to perform Delete operation which is not aware of this special >> behavior (execute before update trigger for this case). > > Hmm.  I don't think there's any way for the logical decoding > infrastructure to identify this case at present.  I suppose if we want > that behavior, we'd need to modify the WAL format, and the changes > might not be too straightforward because the after-image of the tuple > wouldn't be available in the DELETE record.  I think the only reason > we fire the UPDATE triggers is because we can't decide until after > they've finished executing that we really want to DELETE and INSERT > instead; by the time we are replicating the changes, we know what the > final shape of the operation ended up being, so it's not clear to me > that firing UPDATE triggers at that point would be useful.  I fear > that trying to change this is going to cost performance (and developer > time) for no real benefit, so my gut feeling is to leave it alone. > I think the problem is not only about trigger behavior but also about the pending patch which takes care of concurrent updates/deletes.  I think in the case of logical replication we won't be able to detect that the row has been moved to another partition by the logical worker as part of replaying the actions done on the master. 
See my response to Amul's patch [1]. [1] - https://www.postgresql.org/message-id/CAA4eK1LHVnNWYF53F1gUGx6CTxuvznozvU-Lr-dfE%3DQeu1gEcg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas wrote: > On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas wrote: >> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila >> wrote: >> >> OK.  I've committed this patch and back-patched it to 9.6. >> Back-patching to 9.5 didn't look simple because nworkers_launched >> doesn't exist on that branch, and it doesn't matter very much anyway >> since the infrastructure has no in-core users in 9.5. > > I was just talking to Thomas, and one or the other of us realized that > this doesn't completely fix the problem.  I think that if a worker > fails, even by some unorthodox mechanism like an abrupt proc_exit(1), > after the point where it attached to the error queue, this patch is > sufficient to make that reliably turn into an error.  But what if it > fails before that - e.g. fork() failure?  In that case, this process > will catch the problem if the calling process calls > WaitForParallelWorkersToFinish, but does that actually happen in any > interesting cases?  Maybe not. > > In the case of Gather, for example, we won't > WaitForParallelWorkersToFinish() until we've read all the tuples.  If > there's a tuple queue that does not yet have a sender, then we'll just > wait for it to get one, even if the sender fell victim to a fork > failure and is never going to show up. > Hmm, I think that case will be addressed because tuple queues can detect if the leader is not attached.  It does so in the code path shm_mq_receive->shm_mq_counterparty_gone.  In shm_mq_counterparty_gone, it can detect if the worker is gone by using GetBackgroundWorkerPid.  Moreover, I have manually tested this particular case before saying your patch is fine.  Do you have some other case in mind which I am missing? > We could *almost* fix this by having execParallel.c include a Barrier > in the DSM, similar to what I proposed for parallel CREATE INDEX. > When a worker starts, it attaches to the barrier; when it exits, it > detaches. 
Instead of looping until the leader is done and all the > tuple queues are detached, Gather or Gather Merge could loop until the > leader is done and everyone detaches from the barrier. But that would > require changes to the Barrier API, which isn't prepared to have the > caller alternate waiting with other activity like reading the tuple > queues, which would clearly be necessary in this case. Moreover, it > doesn't work at all when parallel_leader_participation=off, because in > that case we'll again just wait forever for some worker to show up, > and if none of them do, then we're hosed. > > One idea to fix this is to add a new function in parallel.c with a > name like CheckForFailedWorkers() that Gather and Gather Merge call > just before they WaitLatch(). If a worker fails to start, the > postmaster will send SIGUSR1 to the leader, which will set the latch, > so we can count on the function being run after every worker death. > The function would go through and check each worker for which > pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to > see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, ) == > BGWH_STOPPED. If so, ERROR. > > This lets us *run* for arbitrary periods of time without detecting a > fork failure, but before we *wait* we will notice. Getting more > aggressive than that sounds expensive, but I'm open to ideas. > > If we adopt this approach, then Peter's patch could probably make use > of it, too. It's a little tricky because ConditionVariableSleep() > tries not to return spuriously, so we'd either have to change that or > stick CheckForFailedWorkers() into that function's loop. I suspect > the former is a better approach. Or maybe that patch should still use > the Barrier approach and avoid all of this annoyance. > I think we can discuss your proposed solution once the problem is clear. As of now, it is not very clear to me whether there is any pending problem. -- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: Handling better supported channel binding types for SSL implementations
On Tue, Jan 23, 2018 at 12:08:37PM -0500, Peter Eisentraut wrote:
> On 1/22/18 02:29, Michael Paquier wrote:
>> However there is as well the argument that this list's contents are not
>> directly used now, and based on what I saw from the MacOS SSL and GnuTLS
>> patches that would not be the case after either.
>
> Right, there is no facility for negotiating the channel binding type, so
> a boolean result should be enough.

I am not completely convinced either that we need to complicate the
code to handle channel binding type negotiation.

> In which case we wouldn't actually need this for GnuTLS yet.

Sure. This depends mainly on how the patch for Mac's Secure Transport
moves forward.
--
Michael
Re: PATCH: Configurable file mode mask
On Tue, Jan 23, 2018 at 09:18:51AM -0500, David Steele wrote:
> On 1/20/18 5:47 PM, Michael Paquier wrote:
>> Making this possible would require first some
>> refactoring of PostgresNode.pm so as a node is aware of the binary paths
>> it uses to be able to manipulate multiple instances with different
>> install paths at the same time.
>
> Any idea of what the LOE would be for that?

What does LOE mean?
--
Michael
Re: [PATCH] fix for C4141 warning on MSVC
On Wed, Jan 24, 2018 at 1:42 PM, Tom Lane wrote:
> Thomas Munro writes:
>> On Wed, Jan 24, 2018 at 1:16 PM, Michail Nikolaev wrote:
>>> Just very small fix for C4141 warning
>
>> Thanks. This is similar to the fix I proposed over here:
>> https://www.postgresql.org/message-id/CAEepm%3D2iTKvbebiK3CdoczQk4_FfDt1EeU4c%2BnGE340JH7gQ0g%40mail.gmail.com
>
>> Perhaps we should combine these?
>
> +1. The previous thread seems to have died off with nothing happening,
> but we still have those issues to resolve. I like the idea of making
> the macro be a direct drop-in substitute for "inline" rather than
> something you use in addition to "inline". And if we do that, it
> definitely should expand to plain "inline" if we have no other idea.

Here's one like that.

--
Thomas Munro
http://www.enterprisedb.com

fix-warnings-about-always-inline-v2.patch
Re: Failed to request an autovacuum work-item in silence
On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello wrote:
>
> On Tue, Jan 23, 2018 at 03:36, Masahiko Sawada wrote:
>>
>> Hi all,
>>
>> While reading the code, I realized that the requesting an autovacuum
>> work-item could fail in silence if work-item array is full. So the
>> users cannot realize that work-item is never performed.
>> AutoVacuumRequestWork() seems to behave so from the initial
>> implementation but is there any reason of such behavior? It seems to
>> me that it can be a problem even now that there is only one kind of
>> work-item. Attached patch for fixing it.
>
> Seems reasonable but maybe you can use the word "worker" instead of "work
> item" for report message.

Thank you for the comment. Or can we use the word "work-item" since
the commit log and source code use this word?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: pgsql: Add parallel-aware hash joins.
On Wed, Jan 24, 2018 at 12:10 PM, Tom Lane wrote:
> There is a very clear secular trend up in the longer data series,
> which indicates that we're testing more stuff,

+1

> which doesn't bother
> me in itself as long as the time is well spent. However, the trend
> over the last two months is very bad, and I do not think that we can
> point to any large improvement in test coverage that someone committed
> since November.

I'm not sure if coverage.postgresql.org has a way to view historical
reports so we can see the actual percentage change, but as I recall it,
commit fa330f9ad "Add some regression tests that exercise hash join
code." pushed nodeHash.c and possibly nodeHashJoin.c into green
territory on here:

https://coverage.postgresql.org/src/backend/executor/index.html

> Looking more closely at the shorter series, there are four pretty obvious
> step changes since 2016-09. The PNG's x-axis doesn't have enough
> resolution to match these up to commits, but looking at the underlying
> data, they clearly correspond to:
>
> Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
> Improve regression test coverage for hash indexes.
>
> Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
> Speed up hash_index regression test.
>
> Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
> Add some regression tests that exercise hash join code.

Joining check runtimes with the commit log (see attached), I see:

2017-11-30 | fa330f9a | Add some regression tests that exercise | 00:08:30
2017-11-29 | 84940644 | New C function: bms_add_range           | 00:08:18

That's +2.4%.

> Branch: master [180428404] 2017-12-21 00:43:41 -0800
> Add parallel-aware hash joins.

2017-12-21 | cce1ecfc | Adjust assertion in GetCurrentCommandId. | 00:09:03
2017-12-21 | 6719b238 | Rearrange execution of PARAM_EXTERN Para |
2017-12-21 | 8a0596cb | Get rid of copy_partition_key            |
2017-12-21 | 9ef6aba1 | Fix typo                                 |
2017-12-21 | c98c35cd | Avoid putting build-location-dependent s |
2017-12-21 | 59d1e2b9 | Cancel CV sleep during subtransaction ab |
2017-12-21 | 18042840 | Add parallel-aware hash joins.           |
2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45

That's +3.4%. That's a bit more than I expected. I saw 2.5% on my
development box and hoped that'd be OK for a complex feature with a
lot of paths to test.

But hang on a minute -- how did we get to 08:45 from 08:30 between
those commits? Of course this is all noisy data and individual samples
are all over the place, but I think I see some signal here:

2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45
2017-12-19 | 7d3583ad | Test instrumentation of Hash nodes with  | 00:08:43
2017-12-19 | 8526bcb2 | Try again to fix accumulation of paralle |
2017-12-19 | 38fc5470 | Re-fix wrong costing of Sort under Gathe | 00:08:31
2017-12-19 | 09a65f5a | Mark a few parallelism-related variables | 00:08:27

Both 8526bcb2 and 7d3583ad add Gather rescans. Doesn't 8526bcb2 add a
query that forks 40 times? I wonder how long it takes to start a
background worker on a Mac Cube. From that commit:

+   -> Gather (actual rows=9800 loops=10)
+         Workers Planned: 4
+         Workers Launched: 4
+         -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)

> I thought that the hash index test case was excessively expensive for
> what it covered, and I'm now thinking the same about hash joins.

Does join-test-shrink.patch (from my earlier message) help much, on
prairiedog? It cut "check" time by ~1.5% on my low end machine.
--
Thomas Munro
http://www.enterprisedb.com

[Attachment: joined table of commit dates, hashes, subjects and "check" runtimes; truncated in the archive.]
Re: Remove utils/dsa.h from autovacuum.c
On Wed, Jan 24, 2018 at 3:39 AM, Alvaro Herrera wrote:
> Masahiko Sawada wrote:
>> Hi,
>>
>> Attached a patch for $subject. The implementation of autovacuum
>> work-item has been changed by commit
>> 31ae1638ce35c23979f9bcbb92c6bb51744dbccb but the loading of dsa.h
>> header file is remained.
>
> You're right -- pushed.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: Missing wal_receiver_status_interval in Subscribers section
On Wed, Jan 24, 2018 at 9:34 AM, Bruce Momjian wrote:
> On Tue, Jan 23, 2018 at 06:36:04PM -0500, Bruce Momjian wrote:
>>
>> Can someone confirm this so I can apply this patch?
>
> Never mind. I see this was applied:

Thank you for your kind response.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
On Tue, Jan 23, 2018 at 04:22:22PM +0100, Daniel Gustafsson wrote:
> On 23 Jan 2018, at 05:52, Michael Paquier wrote:
>> 2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
>> core code when defining amproperty for an index AM. Well, with this
>> patch I think that for consistency with the core code that would involve
>> using strcmp instead, extension developers can of course still use
>> pg_strcasecmp.
>
> That part I’m less sure about, the propname will not be casefolded by the
> parser so pg_strcasecmp() should still be the recommendation here no?

Yes, you are right. I had a brain fade here as all the option names
here go through SQL-callable functions.

>> Those are changed as well in the attached, which applies on top of your
>> v6. I have added as well in it the tests I spotted were missing. If this
>> looks right to you, I am fine to switch this patch as ready for
>> committer, without considering the issues with isCachable and isStrict
>> in CREATE FUNCTION of course.
>
> Apart from the amproperty hunk, I’m definitely +1 on adding your patch on
> top of my v6 one. Thanks for all your help and review!

OK. Could you publish a v7? I will switch the entry as ready for
committer.
--
Michael
RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
From: Robert Haas [mailto:robertmh...@gmail.com]
> Oh, incidentally -- in our internal testing, we found that
> wal_sync_method=open_datasync was significantly faster than
> wal_sync_method=fdatasync. You might find that open_datasync isn't much
> different from pmem_drain, even though they're both faster than fdatasync.

That's interesting. How fast was open_datasync in what environment
(Linux distro/kernel version, HDD or SSD etc.)?

Is it now time to change the default setting to open_datasync on
Linux, at least when O_DIRECT is not used (i.e. WAL archiving or
streaming replication is used)?

[Current port/linux.h]

/*
 * Set the default wal_sync_method to fdatasync. With recent Linux versions,
 * xlogdefs.h's normal rules will prefer open_datasync, which (a) doesn't
 * perform better and (b) causes outright failures on ext4 data=journal
 * filesystems, because those don't support O_DIRECT.
 */
#define PLATFORM_DEFAULT_SYNC_METHOD	SYNC_METHOD_FDATASYNC

pg_test_fsync showed open_datasync is slower on my RHEL6 VM:

5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync              4276.373 ops/sec     234 usecs/op
        fdatasync                  4895.256 ops/sec     204 usecs/op
        fsync                      4797.094 ops/sec     208 usecs/op
        fsync_writethrough                      n/a
        open_sync                  4575.661 ops/sec     219 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync              2243.680 ops/sec     446 usecs/op
        fdatasync                  4347.466 ops/sec     230 usecs/op
        fsync                      4337.312 ops/sec     231 usecs/op
        fsync_writethrough                      n/a
        open_sync                  2329.700 ops/sec     429 usecs/op

Regards
Takayuki Tsunakawa
Re: documentation is now XML
On Thu, Nov 23, 2017 at 03:39:24PM -0500, Tom Lane wrote:
> Also, we're way overdue for getting out from under the creaky TeX-based
> toolchain for producing PDFs. Every time we make releases, I worry
> whether we're going to get blindsided by its bugs with hotlinks that get
> split across pages, since page breaks tend to vary in position depending
> on exactly whose version of the toolchain and style sheets you build with.
> I've also been living in fear of the day we hit some hardwired TeX limit
> that we can't increase or work around. We've had to hack around such
> limits repeatedly in the past (eg commit 944b41fc0). Now, it's probably
> unlikely that growth of the release notes would be enough to put us over
> the top in any back branch --- but if it did happen, we'd find out about
> it at a point in the release cycle where there's very little margin for
> error.

I am coming in late here, but I am not aware of any open source
professional typesetting software that has output quality as good as
TeX.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [PATCH] fix for C4141 warning on MSVC
On Wed, Jan 24, 2018 at 1:16 PM, Michail Nikolaev wrote:
> Just very small fix for C4141 warning
> (https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warning-level-1-c4141).
>
> Also could be viewed on Github -
> https://github.com/michail-nikolaev/postgres/commit/38a590a00110a4ea870d625470e4c898e5ad79aa
>
> Tested both MSVC and gcc builds.

Thanks. This is similar to the fix I proposed over here:

https://www.postgresql.org/message-id/CAEepm%3D2iTKvbebiK3CdoczQk4_FfDt1EeU4c%2BnGE340JH7gQ0g%40mail.gmail.com

Two differences:

1. I also fixed a warning from antique GCC which didn't understand
this magic.
2. You also defined it as plain old "inline" for unknown compilers.

Perhaps we should combine these?

--
Thomas Munro
http://www.enterprisedb.com
Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
On 23 January 2018 at 23:22, Amit Langote wrote:
> On 2018/01/23 15:44, David Rowley wrote:
>> Attached is what I had in mind about how to do this.
>
> Thanks for the delta patch. I will start looking at it tomorrow.

Thanks. I've been looking more at this and I've made a few more
adjustments in the attached. This delta patch should be applied against
the faster_partition_prune_v21_delta_drowley_v1.patch one I sent
yesterday.

This changes a few comments, also now correctly passes the context to
get_partitions_excluded_by_ne_clauses and fixes a small error where the
patch was failing to record a notnull for the partition key when it saw
a strict <> clause. It was only doing this for the opposite case, but
both seem to be perfectly applicable. I also made a small adjustment to
the regression tests to ensure this is covered.

I'm now going to start work on basing the partition pruning patch on
top of this.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

faster_partition_prune_v21_delta_drowley_v1_delta.patch
Re: Missing wal_receiver_status_interval in Subscribers section
Can someone confirm this so I can apply this patch?

---

On Fri, Nov 17, 2017 at 06:34:29PM +0900, Masahiko Sawada wrote:
> Hi,
>
> I found that the doc says in 19.6.4. Subscribers section that "Note
> that wal_receiver_timeout and wal_retrieve_retry_interval
> configuration parameters affect the logical replication workers as
> well" but subscriber actually uses wal_receiver_status_interval as
> well. So I think we should mention it as well. Any thoughts?
>
> Attached patch adds wal_receiver_stats_interval to the doc.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index 996e825..3c8c504 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -3410,7 +3410,8 @@ ANY class="parameter">num_sync (
> [remainder of the quoted diff lost in the archive]

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +
Re: [HACKERS] Planning counters in pg_stat_statements
Thomas Munro writes:
> One problem is that pgss_planner_hook doesn't have the source query
> text. That problem could be solved by adding a query_string argument
> to the planner hook function type and also planner(),
> standard_planner(), pg_plan_query(), pg_plan_queries(). I don't know
> if that change would be acceptable or if there is a better way that
> doesn't change extern functions that will annoy extension owners.

Within the planner, I'd be inclined to store the query string pointer
in PlannerGlobal (which is accessible from PlannerInfo "root"), but I'm
not sure how many of the functions you mention would still need an
explicit signature change. Anyway that doesn't particularly bother me
--- it's something that might need to happen anyway, if we ever hope to
throw errors with location pointers from inside the planner.

> Something I wondered about but didn't try: we could skip the above
> problem AND the extra pgss_store() by instead pushing (query ID,
> planning time) into a backend-private buffer and then later pushing it
> into shmem when we eventually call pgss_store() for the execution
> counters.

Meh. Part of the reason I don't like what you submitted before is that
it supposes there's a mostly one-to-one relationship between planner
calls and executor calls; which there is not, when you start
considering edge cases like prepared statements. A global would make
that worse.

> Unfortunately I'm not going to have bandwidth to figure this out
> during this commitfest due to other priorities so I'm going to call
> this "returned with feedback".

OK. There's still time to get it done in the March 'fest.

			regards, tom lane
Re: pgsql: Add parallel-aware hash joins.
Robert Haas writes:
> On Mon, Jan 22, 2018 at 6:53 PM, Tom Lane wrote:
>> Here's a possibly more useful graph of regression test timings over
>> the last year. I pulled this from the buildfarm database: it is the
>> reported runtime for the "installcheck-C" step in each successful
>> build of HEAD on dromedary, going back to Jan. 2017.

> Right, but this doesn't seem to show any big spike in the runtime at
> the time when parallel hash was committed, or when the preparatory
> patch to add test coverage for hash joins got committed. Rather,
> there's a gradual increase over time.

Well, there's just too much noise in this chart. Let's try another
machine: prairiedog, which is a lot slower so that the 1s resolution
isn't such a limiting factor, and it's also one that I know there
hasn't been much of any system change in.

The first attached PNG shows the "installcheck-C" runtime for as far
back as the buildfarm database has the data, and the second zooms in on
events since late 2016. As before, I've dropped individual outlier
results (those significantly slower than any nearby run) on the grounds
that they probably represent interference from nightly backups. I also
attached the raw data (including outliers) in case anyone wants to do
their own analysis.

There is a very clear secular trend up in the longer data series, which
indicates that we're testing more stuff, which doesn't bother me in
itself as long as the time is well spent. However, the trend over the
last two months is very bad, and I do not think that we can point to
any large improvement in test coverage that someone committed since
November.

Looking more closely at the shorter series, there are four pretty
obvious step changes since 2016-09. The PNG's x-axis doesn't have
enough resolution to match these up to commits, but looking at the
underlying data, they clearly correspond to:

Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
Improve regression test coverage for hash indexes.
Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
Speed up hash_index regression test.

Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
Add some regression tests that exercise hash join code.

Branch: master [180428404] 2017-12-21 00:43:41 -0800
Add parallel-aware hash joins.

I thought that the hash index test case was excessively expensive for
what it covered, and I'm now thinking the same about hash joins.

			regards, tom lane

[Attachment: raw "installcheck-C" stage-duration data for prairiedog; truncated in the archive.]
Re: [HACKERS] Planning counters in pg_stat_statements
On Fri, Jan 19, 2018 at 12:14 AM, Thomas Munro wrote:
> On Sat, Jan 13, 2018 at 2:16 PM, Tom Lane wrote:
>> What we could/should do instead, I think, is have pgss_planner_hook
>> make its own pgss_store() call to log the planning time results
>> (which would mean we don't need the added PlannedStmt field at all).
>> That would increase the overhead of this feature, which might mean
>> that it'd be worth having a pg_stat_statements GUC to enable it.
>> But it'd be a whole lot cleaner.
>
> Ok. It seems a bit strange to have to go through the locking twice,
> but I don't have a better idea.

First attempt seems to be working. One problem is that
pgss_planner_hook doesn't have the source query text. That problem
could be solved by adding a query_string argument to the planner hook
function type and also planner(), standard_planner(), pg_plan_query(),
pg_plan_queries(). I don't know if that change would be acceptable or
if there is a better way that doesn't change extern functions that
will annoy extension owners. When I tried that it basically worked
except for a problem with "PREPARE ... " (what we want to show) vs ""
(what my experimental patch now shows -- oops).

Something I wondered about but didn't try: we could skip the above
problem AND the extra pgss_store() by instead pushing (query ID,
planning time) into a backend-private buffer and then later pushing it
into shmem when we eventually call pgss_store() for the execution
counters. If you think of that as using global variables to pass state
between functions just because we didn't structure our program right
it sounds terrible, but if you think of it as a dtrace-style
per-thread tracing event buffer that improves performance by batching
up collected data, it sounds great :-D

Unfortunately I'm not going to have bandwidth to figure this out
during this commitfest due to other priorities so I'm going to call
this "returned with feedback".

--
Thomas Munro
http://www.enterprisedb.com
For consideration - clarification of multi-dimensional arrays in our docs.
Hey all!

This doesn't come up that often but enough that it seems hammering home
that multi-dimension <> array-of-array seems warranted. The first and
last chunks cover definition and iteration respectively. The second
chunk removes the mention of "subarray" since that's what we don't want
people to think. If the note seems a bit heavy-handed, working it in as
a normal paragraph in the same or nearby location would work too.

Thoughts?

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index f4d4a610ef..a2dfd4de0a 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -14,6 +14,12 @@
 or domain can be created.
+
+ A multi-dimensional array is not the same as an array-of-arrays. In
+ particular it is not useful to access a two-dimensional array with
+ a single dimension - you will be given null instead of an array
+ containing the elements of the specified row.
+

 Declaration of Array Types

@@ -115,7 +121,7 @@ CREATE TABLE tictactoe (
 '{{1,2,3},{4,5,6},{7,8,9}}'
 This constant is a two-dimensional, 3-by-3 array consisting of
- three subarrays of integers.
+ nine integers.

@@ -374,6 +380,25 @@ SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
 (1 row)
+
+
+ unnest, and other iteration access of arrays, is performed
+ depth-first, for each lower dimension all higher dimension cells are
+ returned before the next lower bound index. For a two-dimensional
+ array this is termed row-first.
+
+
+SELECT unnest(schedule) FROM sal_emp WHERE name = 'Bill';
+
+unnest
+--
+ meeting
+ lunch
+ training
+ presentation
+(4 rows)
+
+

David J.
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Jan 23, 2018 at 1:05 PM, Robert Haas wrote:
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem. I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attached to the error queue, this patch is
> sufficient to make that reliably turn into an error. But what if it
> fails before that - e.g. fork() failure? In that case, this process
> will catch the problem if the calling process calls
> WaitForParallelWorkersToFinish, but does that actually happen in any
> interesting cases? Maybe not.
>
> In the case of Gather, for example, we won't
> WaitForParallelWorkersToFinish() until we've read all the tuples. If
> there's a tuple queue that does not yet have a sender, then we'll just
> wait for it to get one, even if the sender fell victim to a fork
> failure and is never going to show up.
>
> We could *almost* fix this by having execParallel.c include a Barrier
> in the DSM, similar to what I proposed for parallel CREATE INDEX.
> When a worker starts, it attaches to the barrier; when it exits, it
> detaches. Instead of looping until the leader is done and all the
> tuple queues are detached, Gather or Gather Merge could loop until the
> leader is done and everyone detaches from the barrier. But that would
> require changes to the Barrier API, which isn't prepared to have the
> caller alternate waiting with other activity like reading the tuple
> queues, which would clearly be necessary in this case. Moreover, it
> doesn't work at all when parallel_leader_participation=off, because in
> that case we'll again just wait forever for some worker to show up,
> and if none of them do, then we're hosed.

I'm relieved that we all finally seem to be on the same page on this
issue.
> One idea to fix this is to add a new function in parallel.c with a
> name like CheckForFailedWorkers() that Gather and Gather Merge call
> just before they WaitLatch(). If a worker fails to start, the
> postmaster will send SIGUSR1 to the leader, which will set the latch,
> so we can count on the function being run after every worker death.
> The function would go through and check each worker for which
> pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
> see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
> BGWH_STOPPED. If so, ERROR.

Sounds workable to me.

> This lets us *run* for arbitrary periods of time without detecting a
> fork failure, but before we *wait* we will notice. Getting more
> aggressive than that sounds expensive, but I'm open to ideas.

Is that really that different to any other case? I imagine that in
practice most interrupt checks happen within the leader in a
WaitLatch() loop (or similar). Besides, the kinds of failures we're
talking about are incredibly rare in the real world. I think it's fine
if the leader is not very promptly aware of when this happens, or if
execution becomes slow in some other way.

> If we adopt this approach, then Peter's patch could probably make use
> of it, too. It's a little tricky because ConditionVariableSleep()
> tries not to return spuriously, so we'd either have to change that or
> stick CheckForFailedWorkers() into that function's loop. I suspect
> the former is a better approach. Or maybe that patch should still use
> the Barrier approach and avoid all of this annoyance.

I have the same misgivings about using a barrier to solve this problem
for parallel CREATE INDEX as before. Specifically: why should
nbtsort.c solve this problem for itself? I don't think that it's in
any way special. One can imagine exactly the same problem with other
parallel DDL implementations, fixable using exactly the same solution.
I'm not saying that nbtsort.c shouldn't have to do anything differently; just that it should buy into some general solution. That said, it would be ideal if the current approach I take in nbtsort.c didn't have to be changed *at all*, because if it was 100% correct already then I think we'd have a more robust parallel.h interface (it is certainly a "nice to have" for me). The fact that multiple hackers at least tacitly assumed that nworkers_launched is a reliable indicator of how many workers will show up tells us something. Moreover, parallel_leader_participation=off is always going to be wonky under a regime based on "see who shows up", because failure becomes indistinguishable from high latency -- I'd rather avoid having to think about nbtsort.c in distributed systems terms. -- Peter Geoghegan
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Fri, Jan 19, 2018 at 6:22 AM, Robert Haas wrote: >> (3) >> erm, maybe it's a problem that errors occurring in workers while the >> leader is waiting at a barrier won't unblock the leader (we don't >> detach from barriers on abort/exit) -- I'll look into this. > > I think if there's an ERROR, the general parallelism machinery is > going to arrange to kill every worker, so nothing matters in that case > unless barrier waits ignore interrupts, which I'm pretty sure they > don't. (Also: if they do, I'll hit the ceiling; that would be awful.) (After talking this through with Robert off-list.) Right, the CHECK_FOR_INTERRUPTS() in ConditionVariableSleep() handles errors from parallel workers. There is no problem here. -- Thomas Munro http://www.enterprisedb.com
Re: Libpq support to connect to standby server as priority
Hi All, Recently I put up a proposal to support a 'prefer-read' parameter in target_session_attrs in libpq. Now I have updated the patch, adding content to the sgml documentation and a regression test case. Some people may have noticed there is already another patch ( https://commitfest.postgresql.org/15/1148/ ) which looks similar to this one, but I would say that patch is more complex than my proposal. It is better to consider these two patches separately. Regards, Jing Wang Fujitsu Australia libpq_support_perfer-read_002.patch Description: Binary data
Tab complete after FROM ONLY
While reviewing the patch for tab completion after SELECT, I realized that psql doesn't know how to complete after FROM if the ONLY keyword is present. This trivial patch fixes that, as well as JOIN ONLY and TABLE ONLY. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 8bc4a194a5..68b155c778 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3359,7 +3359,7 @@ psql_completion(const char *text, int start, int end) COMPLETE_WITH_CONST("TRANSACTION"); /* TABLE, but not TABLE embedded in other commands */ - else if (Matches1("TABLE")) + else if (Matches1("TABLE") || Matches2("TABLE", "ONLY")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_relations, NULL); /* TABLESAMPLE */ @@ -3455,11 +3455,11 @@ psql_completion(const char *text, int start, int end) /* ... FROM ... */ /* TODO: also include SRF ? */ - else if (TailMatches1("FROM") && !Matches3("COPY|\\copy", MatchAny, "FROM")) + else if ((TailMatches1("FROM") || TailMatches2("FROM", "ONLY")) && !Matches3("COPY|\\copy", MatchAny, "FROM")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tsvmf, NULL); /* ... JOIN ... */ - else if (TailMatches1("JOIN")) + else if (TailMatches1("JOIN") || TailMatches2("JOIN", "ONLY")) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tsvmf, NULL); /* Backslash commands */
Re: Using ProcSignal to get memory context stats from a running backend
Did this go anywhere? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: BRIN multi-range indexes
On 01/23/2018 09:05 PM, Alvaro Herrera wrote: > This stuff sounds pretty nice. However, have a look at this report: > > https://codecov.io/gh/postgresql-cfbot/postgresql/commit/2aa632dae3066900e15d2d42a4aad811dec11f08 > > it seems to me that the new code is not tested at all. Shouldn't you > add a few more tests? > I have a hard time reading the report, but you're right I haven't added any tests for the new opclasses (bloom and minmax_multi). I agree that's something that needs to be addressed. > I think 0004 should apply to unpatched master (except for the parts > that concern files not in master); sounds like a good candidate for > first apply. Then 0001, which seems mostly just refactoring. 0002 and > 0003 are the really interesting ones (minus the code removed by > 0004). > That sounds like a reasonable plan. I'll reorder the patch series along those lines in the next few days. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Jan 23, 2018 at 2:11 PM, Peter Geoghegan wrote: > Finally, it's still not clear to me why nodeGather.c's use of > parallel_leader_participation=off doesn't suffer from similar problems > [1]. Thomas and I just concluded that it does. See my email on the other thread just now. I thought that I had the failure cases all nailed down here now, but I guess not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas wrote: > On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila wrote: >> It does turn such cases into an error but only at the end when someone >> calls WaitForParallelWorkersToFinish. If one tries to rely on it at >> any other time, it won't work as I think is the case for Parallel >> Create Index where Peter G. is trying to use it differently. I am not >> 100% sure whether Parallel Create Index has this symptom as I have not >> tried it with that patch, but I and Peter are trying to figure that >> out. > > OK. I've committed this patch and back-patched it to 9.6. > Back-patching to 9.5 didn't look simple because nworkers_launched > doesn't exist on that branch, and it doesn't matter very much anyway > since the infrastructure has no in-core users in 9.5. I was just talking to Thomas, and one or the other of us realized that this doesn't completely fix the problem. I think that if a worker fails, even by some unorthodox mechanism like an abrupt proc_exit(1), after the point where it attached to the error queue, this patch is sufficient to make that reliably turn into an error. But what if it fails before that - e.g. fork() failure? In that case, this process will catch the problem if the calling process calls WaitForParallelWorkersToFinish, but does that actually happen in any interesting cases? Maybe not. In the case of Gather, for example, we won't WaitForParallelWorkersToFinish() until we've read all the tuples. If there's a tuple queue that does not yet have a sender, then we'll just wait for it to get one, even if the sender fell victim to a fork failure and is never going to show up. We could *almost* fix this by having execParallel.c include a Barrier in the DSM, similar to what I proposed for parallel CREATE INDEX. When a worker starts, it attaches to the barrier; when it exits, it detaches.
Instead of looping until the leader is done and all the tuple queues are detached, Gather or Gather Merge could loop until the leader is done and everyone detaches from the barrier. But that would require changes to the Barrier API, which isn't prepared to have the caller alternate waiting with other activity like reading the tuple queues, which would clearly be necessary in this case. Moreover, it doesn't work at all when parallel_leader_participation=off, because in that case we'll again just wait forever for some worker to show up, and if none of them do, then we're hosed. One idea to fix this is to add a new function in parallel.c with a name like CheckForFailedWorkers() that Gather and Gather Merge call just before they WaitLatch(). If a worker fails to start, the postmaster will send SIGUSR1 to the leader, which will set the latch, so we can count on the function being run after every worker death. The function would go through and check each worker for which pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) == BGWH_STOPPED. If so, ERROR. This lets us *run* for arbitrary periods of time without detecting a fork failure, but before we *wait* we will notice. Getting more aggressive than that sounds expensive, but I'm open to ideas. If we adopt this approach, then Peter's patch could probably make use of it, too. It's a little tricky because ConditionVariableSleep() tries not to return spuriously, so we'd either have to change that or stick CheckForFailedWorkers() into that function's loop. I suspect the former is a better approach. Or maybe that patch should still use the Barrier approach and avoid all of this annoyance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On 1/23/18 14:59, Daniel Gustafsson wrote: > It’s not specific to the implementation per se, but it increases the > likelihood > of hitting it. In order to load certificates from Keychains the cert common > name must be specified in the connstr; when importing the test files into > keychains I ran into it with, for example, src/test/ssl/client_ca.config. The change is - 'psql', '-X', '-A', '-t', '-c', "SELECT 'connected with $connstr'", + 'psql', '-X', '-A', '-t', '-c', "SELECT \$\$connected with $connstr\$\$", So the problem must have been a single quote in the connstr. That can surely happen, but then so can having a $$. So without a concrete example, I'm not sure how to proceed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: BRIN multi-range indexes
This stuff sounds pretty nice. However, have a look at this report: https://codecov.io/gh/postgresql-cfbot/postgresql/commit/2aa632dae3066900e15d2d42a4aad811dec11f08 it seems to me that the new code is not tested at all. Shouldn't you add a few more tests? I think 0004 should apply to unpatched master (except for the parts that concern files not in master); sounds like a good candidate for first apply. Then 0001, which seems mostly just refactoring. 0002 and 0003 are the really interesting ones (minus the code removed by 0004). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
> On 23 Jan 2018, at 18:20, Peter Eisentraut wrote: > > On 1/21/18 18:08, Daniel Gustafsson wrote: >> As per before, my patch for running tests against another set of binaries is >> included as well as a fix for connstrings with spaces, but with the recent >> hacking by Peter I assume this is superfluous. It was handy for development >> so >> I’ve kept it around though. > > 0002-Allow-running-SSL-tests-against-different-binar-v4.patch should be > obsoleted by f5da5683a86e9fc42fdf3eae2da8b096bda76a8a. Yes. > 0003-Allow-spaces-in-connectionstrings-v4.patch looks reasonable, but > I'm not sure what circumstance is triggering that. Is it specific to > your implementation? It’s not specific to the implementation per se, but it increases the likelihood of hitting it. In order to load certificates from Keychains the cert common name must be specified in the connstr; when importing the test files into keychains I ran into it with, for example, src/test/ssl/client_ca.config. cheers ./daniel
pg_rewind and replication slots
When a primary with replication slots gets reset with pg_rewind, it keeps the replication slots. This does no harm per se, but when it gets promoted again, the replication slots are still there and are in the way. Won't they also hold back the xmin horizon? I think that pg_rewind should delete all replication slots. Yours, Laurenz Albe
Re: pgsql: Add parallel-aware hash joins.
On Mon, Jan 22, 2018 at 6:53 PM, Tom Lane wrote: > Here's a possibly more useful graph of regression test timings over > the last year. I pulled this from the buildfarm database: it is the > reported runtime for the "installcheck-C" step in each successful > build of HEAD on dromedary, going back to Jan. 2017. I picked dromedary > because I know that that machine hasn't gotten any software updates > nor is there anything else very interesting going on on it. I dropped > three or four obvious outlier reports (possibly those ran during the > machine's nightly backup cron job). The reported runtime is only > precise to 1s, and a couple seconds jitter is hardly surprising, so > there's a good deal of noise. Still, it's possible to discern when > I put some effort into test runtime reduction back in April, and > it can be seen that things have gotten notably slower since the > beginning of November. Right, but this doesn't seem to show any big spike in the runtime at the time when parallel hash was committed, or when the preparatory patch to add test coverage for hash joins got committed. Rather, there's a gradual increase over time. Either we're making the server slower (which would be bad) or we're adding proper test coverage for all the new features that we're adding (which would be good). We can't expect every feature patch to preserve the runtime of the tests absolutely unchanged; figuring out what can be optimized is a separate exercise from adding test coverage either for new things or for things that weren't previously covered. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
On 23/01/18 16:01, Nikhil Sontakke wrote: >> I must be missing something in this discussion, cos I don't see any >> problems with this "other option". >> >> Surely we prepare the 2PCxact and then it is persistent. Yes, >> potentially for an unbounded period of time. And it is (usually) up to >> the XA manager to resolve that. 2PC interacts with transaction >> management and yes, it can be slow. But the choice is slow and >> consistent, or not. This would only be used with the full choice of >> the user, just like synchronous_commit. It's not about the transaction being persistent but about the abort command being blocked. >> >> In this case, we call the decoding plugin's precommit hook which would >> then prepare the 2PCxact and set a non-persistent flag saying it is >> being decoded. If decoding completes normally we release the lock and >> commit. If decoding fails or the DBA has another reason to do so, we >> provide a function that allows the flag to be unlocked. While it is >> locked the 2PCxact cannot be aborted or committed. The output plugin can't deal with precommit; that has to be handled elsewhere, but in principle this is true. >> >> There is no danger of accidental abort because the prepare has persistent >> state. > > This concurrent abort handling while decoding is ongoing is turning > out to be a complex affair. > > Thinking more about this, just to provide an example, we have a > decoding plugin hook to determine if a GID for a 2PC was decoded at > PREPARE time or COMMIT time as part of the 2PC logical decoding patch. > We need that to determine the *same* static answer every time we see a > specific GID while decoding across restarts; the plugin should know > what it had done the last time around and should tell us the same > later as well. It just occurred to me that as Simon also mentioned, it > could/should also be the decoding plugin's responsibility to indicate > if it's ok to go ahead with the abort of the transaction.
> > So, we could consider adding a preAbort hook. That preAbort hook gets > the GID, XID and other parameters as needed and tells us whether we > can go ahead with the abort or if we need to wait out (maybe we pass > in an ok_to_wait param as well). As an example, a plugin could lookup > some shmem structure which points to the current transaction being > decoded and does related processing to ensure that it stops decoding > at a clean juncture, thus keeping the response time bounded to a > maximum of one change record apply cycle. That passes the onus onto > the plugin writers and keeps the core code around this concurrent > abort handling clean. > Having this as responsibility of plugin sounds interesting. It certainly narrows the scope for which we need to solve the abort issue. For 2PC that may be okay as we need to somehow interact with transaction manager as Simon noted. I am not sure if this helps streaming use-case though as there is not going to be any external transaction management involved there. In any case all this interlocking could potentially be made less impact-full by only doing it when we know the transaction did catalog changes prior to currently decoded change (which we do during decoding) since that's the only time we are interested in if it aborted or not. This all leads me to another idea. What if logical decoding provided API for "locking/unlocking" the currently decoded transaction against abort. This function would then be called by both decoding and output plugin before any catalog read. The function can be smart enough to be NOOP if the transaction is not running (ie we are not doing 2PC decoding or streaming) or when the transaction didn't do any catalog modifications (we already have that info easily accessible as bool). That would mean we'd never do any kind of heavy locking for prolonged periods of time (ie network calls) but only during catalog access and only when needed. 
It would also solve this for both 2PC and streaming, and it would be easy for plugin authors to use. Just document that the call should be made before catalog access in an output plugin; we could probably even Assert that the call was done. Thoughts? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Jan 23, 2018 at 10:50 AM, Peter Geoghegan wrote: > On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote: >> I am going to repeat my previous suggestion that we use a Barrier here. >> Given the discussion subsequent to my original proposal, this can be a >> lot simpler than what I suggested originally. Each worker does >> BarrierAttach() before beginning to read tuples (exiting if the phase >> returned is non-zero) and BarrierArriveAndDetach() when it's done >> sorting. The leader does BarrierAttach() before launching workers and >> BarrierArriveAndWait() when it's done sorting. If we don't do this, >> we're going to have to invent some other mechanism to count the >> participants that actually initialize successfully, but that seems >> like it's just duplicating code. > > I think that this closes the door to leader non-participation as > anything other than a developer-only debug option, which might be > fine. If parallel_leader_participation=off (or some way of getting the > same behavior through a #define) is to be retained, then an artificial > wait is required as a substitute for the leader's participation as a > worker. This idea of an artificial wait seems pretty grotty to me. If we made it one second, would that be okay with Valgrind builds? And when it wasn't sufficient, wouldn't we be back to waiting forever? Finally, it's still not clear to me why nodeGather.c's use of parallel_leader_participation=off doesn't suffer from similar problems [1]. [1] https://postgr.es/m/CAH2-Wz=cAMX5btE1s=aTz7CLwzpEPm_NsUhAMAo5t5=1i9v...@mail.gmail.com -- Peter Geoghegan
Re: using index or check in ALTER TABLE SET NOT NULL
Hello Stephen I changed the suggested things and added some regression tests. Also I wrote a few words in the documentation. New patch is attached. Thank you for the feedback! regards, Sergei diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 286c7a8..cf2c56f 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -189,6 +189,13 @@ ALTER TABLE [ IF EXISTS ] name + A full table scan is performed to check that no existing row + in the table has null values in the given column. It is possible to avoid + this scan by adding a valid CHECK constraint to + the table that would allow only NOT NULL values for the given column. + + + If this table is a partition, one cannot perform DROP NOT NULL on a column if it is marked NOT NULL in the parent table. To drop the NOT NULL constraint from all the diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 8adc4ee..6f7ec4a 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1235,7 +1235,7 @@ check_default_allows_bound(Relation parent, Relation default_rel, * not contain any row that would belong to the new partition, we can * avoid scanning the default partition. */ - if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints)) + if (ConstraintImpliedByRelConstraint(default_rel, def_part_constraints)) { ereport(INFO, (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints", @@ -1279,8 +1279,8 @@ check_default_allows_bound(Relation parent, Relation default_rel, * that it will not contain any row that would belong to the new * partition, we can avoid scanning the child table.
*/ - if (PartConstraintImpliedByRelConstraint(part_rel, - def_part_constraints)) + if (ConstraintImpliedByRelConstraint(part_rel, + def_part_constraints)) { ereport(INFO, (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints", diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 2e768dd..4f0916d 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -162,7 +162,7 @@ typedef struct AlteredTableInfo /* Information saved by Phases 1/2 for Phase 3: */ List *constraints; /* List of NewConstraint */ List *newvals; /* List of NewColumnValue */ - bool new_notnull; /* T if we added new NOT NULL constraints */ + bool verify_new_notnull; /* T if we should recheck NOT NULL */ int rewrite; /* Reason for forced rewrite, if any */ Oid newTableSpace; /* new tablespace; 0 means no change */ bool chgPersistence; /* T if SET LOGGED/UNLOGGED is used */ @@ -377,6 +377,7 @@ static ObjectAddress ATExecDropNotNull(Relation rel, const char *colName, LOCKMO static void ATPrepSetNotNull(Relation rel, bool recurse, bool recursing); static ObjectAddress ATExecSetNotNull(AlteredTableInfo *tab, Relation rel, const char *colName, LOCKMODE lockmode); +static bool NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr); static ObjectAddress ATExecColumnDefault(Relation rel, const char *colName, Node *newDefault, LOCKMODE lockmode); static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName, @@ -4368,7 +4369,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode) * Test the current data within the table against new constraints * generated by ALTER TABLE commands, but don't rebuild data. 
*/ - if (tab->constraints != NIL || tab->new_notnull || + if (tab->constraints != NIL || tab->verify_new_notnull || tab->partition_constraint != NULL) ATRewriteTable(tab, InvalidOid, lockmode); @@ -4529,7 +4530,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) } notnull_attrs = NIL; - if (newrel || tab->new_notnull) + if (newrel || tab->verify_new_notnull) { /* * If we are rebuilding the tuples OR if we added any new NOT NULL @@ -5528,7 +5529,7 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel, * null, but since it's filled specially, there's no need to test * anything.) */ - tab->new_notnull |= colDef->is_not_null; + tab->verify_new_notnull |= colDef->is_not_null; } /* @@ -5930,8 +5931,17 @@ ATExecSetNotNull(AlteredTableInfo *tab, Relation rel, CatalogTupleUpdate(attr_rel, &tuple->t_self, tuple); - /* Tell Phase 3 it needs to test the constraint */ - tab->new_notnull = true; + /* + * Verifying table data can be done in Phase 3 with a single table scan + * for all constraints (both existing and new). + * We can skip this check only if every new NOT NULL constraint is + * implied by existing table constraints */ + if (!NotNullImpliedByRelConstraints(rel, (Form_pg_attribute) GETSTRUCT(tuple))) + { +
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote: > As Amit says, what remains is the case where fork() fails or the > worker dies before it reaches the line in ParallelWorkerMain that > reads shm_mq_set_sender(mq, MyProc). In those cases, no error will be > signaled until you call WaitForParallelWorkersToFinish(). If you wait > prior to that point for a number of workers equal to > nworkers_launched, you will wait forever in those cases. Another option might be to actually call WaitForParallelWorkersToFinish() in place of a condition variable or barrier, as Amit suggested at one point. > I am going to repeat my previous suggestion that we use a Barrier here. > Given the discussion subsequent to my original proposal, this can be a > lot simpler than what I suggested originally. Each worker does > BarrierAttach() before beginning to read tuples (exiting if the phase > returned is non-zero) and BarrierArriveAndDetach() when it's done > sorting. The leader does BarrierAttach() before launching workers and > BarrierArriveAndWait() when it's done sorting. If we don't do this, > we're going to have to invent some other mechanism to count the > participants that actually initialize successfully, but that seems > like it's just duplicating code. I think that this closes the door to leader non-participation as anything other than a developer-only debug option, which might be fine. If parallel_leader_participation=off (or some way of getting the same behavior through a #define) is to be retained, then an artificial wait is required as a substitute for the leader's participation as a worker. -- Peter Geoghegan
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
In a message of 23 January 2018 12:03:50, Peter Eisentraut wrote: > >>> This patch raises an error if the user tries to set or change a toast.* option for > >>> a table that does not have a TOAST relation. > > > > There might be a case for raising a warning in this situation, > > but I would disagree with making it a hard error in any case. > > All that's going to accomplish is to break people's scripts and > > get them mad at you. > > It might also be useful to set these reloptions before you add a new > column to a table. As Robert has just said, this will not work with current master. If, as I suggested in a previous letter, we force TOAST creation when a toast.* option is set, then your use case will also work. -- Do code for fun. signature.asc Description: This is a digitally signed message part.
Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
On Fri, Jan 19, 2018 at 9:42 AM, Robert Haas wrote: > That's not necessarily an argument against this patch, which by the > way I have not reviewed. Even a 5% speedup on this kind of workload > is potentially worthwhile; everyone likes it when things go faster. > I'm just not convinced you can get very much more than that on a > realistic workload. Of course, I might be wrong. Oh, incidentally -- in our internal testing, we found that wal_sync_method=open_datasync was significantly faster than wal_sync_method=fdatasync. You might find that open_datasync isn't much different from pmem_drain, even though they're both faster than fdatasync. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Remove utils/dsa.h from autovacuum.c
Masahiko Sawada wrote: > Hi, > > Attached a patch for $subject. The implementation of autovacuum > work-item has been changed by commit > 31ae1638ce35c23979f9bcbb92c6bb51744dbccb but the loading of the dsa.h > header file remained. You're right -- pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
On Tue, Jan 23, 2018 at 12:03 PM, Peter Eisentraut wrote: > It might also be useful to set these reloptions before you add a new > column to a table. I don't understand. AIUI, it is currently the case that if you set the options before the TOAST table exists, they are lost. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On Mon, Jan 22, 2018 at 10:13 PM, Peter Geoghegan wrote: > _bt_leader_heapscan() can detect when workers exit early, at least in > the vast majority of cases. It can do this simply by processing > interrupts and automatically propagating any error -- nothing special > about that. It can also detect when workers have finished > successfully, because of course, that's the main reason for its > existence. What remains, exactly? As Amit says, what remains is the case where fork() fails or the worker dies before it reaches the line in ParallelWorkerMain that reads shm_mq_set_sender(mq, MyProc). In those cases, no error will be signaled until you call WaitForParallelWorkersToFinish(). If you wait prior to that point for a number of workers equal to nworkers_launched, you will wait forever in those cases. I am going to repeat my previous suggestion that we use a Barrier here. Given the discussion subsequent to my original proposal, this can be a lot simpler than what I suggested originally. Each worker does BarrierAttach() before beginning to read tuples (exiting if the phase returned is non-zero) and BarrierArriveAndDetach() when it's done sorting. The leader does BarrierAttach() before launching workers and BarrierArriveAndWait() when it's done sorting. If we don't do this, we're going to have to invent some other mechanism to count the participants that actually initialize successfully, but that seems like it's just duplicating code. This proposal has some minor advantages even when no fork() failure or similar occurs. If, for example, one or more workers take a long time to start, the leader doesn't have to wait for them before writing out the index. As soon as all the workers that attached to the Barrier have arrived at the end of phase 0, the leader can build a new tape set from all of the tapes that exist at that time. It does not need to wait for the remaining workers to start up and create empty tapes.
This is only a minor advantage since we probably shouldn't be doing CREATE INDEX in parallel in the first place if the index build is so short that this scenario is likely to occur, but we get it basically for free, so why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Possible performance regression in version 10.1 with pgbench read-write tests.
Hi all, When I was trying to do read-write pgbench benchmarking of PostgreSQL 9.6.6 vs 10.1 I found PostgreSQL 10.1 regresses against 9.6.6 in some cases. Non-default settings and test ==
Server: ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &
Pgbench:
CASE 1: when data fits shared buffers. ./pgbench -i -s 1000 postgres
CASE 2: when data exceeds shared buffers. ./pgbench -i -s 1000 postgres
./pgbench -c $threads -j $threads -T 1800 -M prepared postgres
Script "perf_buff_mgmt_write-2.sh", which is added below, can be used to run the same.
Machine: "cthulhu", an 8-node NUMA machine with 128 hyperthreads. ===
> numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB
CASE 1: In 9.6.6 peak performance is achieved at 72 concurrent clients, TPS: 35554.573858, and in 10.1 at 72 clients TPS dips to 26882.828133, so nearly a 23% decrease in TPS.
CASE 2: In 9.6.6 peak performance is achieved at 72 concurrent clients, TPS: 24861.074079, and in 10.1 at 72 clients TPS dips to 18372.565663, so nearly a 26% decrease in TPS.
The attached "Postgresql_benchmarking_9.6vs10.ods" gives more detailed TPS numbers; each TPS figure is the median of 3 runs. I have not run a bisect yet to find what caused the issue. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com

Attachments:
- Postgresql_benchmarking_9.6vs10.ods (OpenDocument spreadsheet)
- perf_buff_mgmt_write-2.sh (Bourne shell script)
Re: Doc tweak for huge_pages?
On Mon, Jan 22, 2018 at 7:23 AM, Justin Pryzby wrote: > Consider this shorter, less-severe sounding alternative: > "... (but note that this feature can degrade performance of some > PostgreSQL workloads)." I think the patch looks good now. As Justin mentions, as far as I can see the only arguable piece is how strong the language should be against Linux THP. On one hand it can be argued that warning about THP issues is not the job of this patch. On the other hand, this patch does say more about THP, and Googling does bring up a lot of trouble reports and advice to disable THP, including: https://www.postgresql.org/message-id/CANQNgOrD02f8mR3Y8Pi=zfsol14rqnqa8hwz1r4rsndlr1b...@mail.gmail.com https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge The RedHat article above says "However, THP is not recommended for database workloads." I'll leave this to the committer and switch this patch to Ready for Committer. By the way, Fedora 27 does disable THP by default; they deviate from upstream in this regard:

[catalin@fedie scripts]$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
[catalin@fedie scripts]$ grep TRANSPARENT /boot/config-4.14.13-300.fc27.x86_64
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y

When I have some time I'll try to do some digging into the history of the Fedora kernel package to see if they provide a rationale for changing the default. That might hint at whether future RHEL is likely to change as well.
Re: [PATCH] Logical decoding of TRUNCATE
Hi, On 23/01/18 18:47, Marco Nenciarini wrote: > Il 23/01/18 18:25, Petr Jelinek ha scritto: >> On 23/01/18 18:19, Marco Nenciarini wrote: >>> Il 23/01/18 18:13, Petr Jelinek ha scritto: Hi, On 23/01/18 15:38, Marco Nenciarini wrote: > Il 22/01/18 23:18, Petr Jelinek ha scritto: >> On 22/01/18 19:45, Petr Jelinek wrote: >> >> Actually on second look, I don't like the new boolean parameter much. >> I'd rather we didn't touch the input list and always close only >> relations opened inside the ExecuteTruncateGuts(). >> >> It may mean more list(s) but the current interface is still not clean. >> > > Now ExecuteTruncateGuts unconditionally closes the relations that it > opens. The caller has now always the responsibility to close the > relations passed with the explicit_rels list. This looks good. > > Version 15 attached. > I see you still do CASCADE on the subscriber though. >>> >>> No it doesn't. The following code in worker.c prevents that. >>> >>> >>> + /* >>> +* Even if we used CASCADE on the upstream master we explicitly >>> +* default to replaying changes without further cascading. >>> +* This might be later changeable with a user specified option. >>> +*/ >>> + cascade = false; >> >> Ah, that's pretty ugly, why don't we just use DROP_RESTRICT always >> instead of this (keeping the comment). Unless you plan to make it option >> as part of this patch, the current coding is confusing. >> > > Ok, Removed. > Great, thanks, I think this is ready now so marking as such. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pgsql: Improve scripting language in pgbench
On 1/10/18 07:36, Fabien COELHO wrote: > I just noticed while rebasing stuff that there is some crust in > "pgbench/t/001_pgbench_with_server.pl" coming from this patch: > > +=head > + > +} }); > + > +=cut > > I cannot find any use for these lines which are ignored by perl execution > anyway. It may be some leftovers from debugging which got past everyone. > If so, I think that it is better removed, see the attached cleanup patch. fixed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 23/01/18 18:19, Marco Nenciarini wrote: > Il 23/01/18 18:13, Petr Jelinek ha scritto: >> Hi, >> >> On 23/01/18 15:38, Marco Nenciarini wrote: >>> Il 22/01/18 23:18, Petr Jelinek ha scritto: On 22/01/18 19:45, Petr Jelinek wrote: Actually on second look, I don't like the new boolean parameter much. I'd rather we didn't touch the input list and always close only relations opened inside the ExecuteTruncateGuts(). It may mean more list(s) but the current interface is still not clean. >>> >>> Now ExecuteTruncateGuts unconditionally closes the relations that it >>> opens. The caller has now always the responsibility to close the >>> relations passed with the explicit_rels list. >> >> This looks good. >> >>> >>> Version 15 attached. >>> >> >> I see you still do CASCADE on the subscriber though. >> > > No it doesn't. The following code in worker.c prevents that. > > > + /* > + * Even if we used CASCADE on the upstream master we explicitly > + * default to replaying changes without further cascading. > + * This might be later changeable with a user specified option. > + */ > + cascade = false; Ah, that's pretty ugly, why don't we just use DROP_RESTRICT always instead of this (keeping the comment). Unless you plan to make it option as part of this patch, the current coding is confusing. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
Il 23/01/18 18:13, Petr Jelinek ha scritto: > Hi, > > On 23/01/18 15:38, Marco Nenciarini wrote: >> Il 22/01/18 23:18, Petr Jelinek ha scritto: >>> On 22/01/18 19:45, Petr Jelinek wrote: >>> >>> Actually on second look, I don't like the new boolean parameter much. >>> I'd rather we didn't touch the input list and always close only >>> relations opened inside the ExecuteTruncateGuts(). >>> >>> It may mean more list(s) but the current interface is still not clean. >>> >> >> Now ExecuteTruncateGuts unconditionally closes the relations that it >> opens. The caller has now always the responsibility to close the >> relations passed with the explicit_rels list. > > This looks good. > >> >> Version 15 attached. >> > > I see you still do CASCADE on the subscriber though. > No it doesn't. The following code in worker.c prevents that. + /* +* Even if we used CASCADE on the upstream master we explicitly +* default to replaying changes without further cascading. +* This might be later changeable with a user specified option. +*/ + cascade = false; There is also a test that check it works as intended: + # should not cascade on replica + $node_publisher->safe_psql('postgres', "TRUNCATE tab_rep CASCADE"); + + $node_publisher->wait_for_catchup($appname); + + $result = $node_subscriber->safe_psql('postgres', + "SELECT count(*), min(a), max(a) FROM tab_notrep_fk"); + is($result, qq(1|-1|-1), + 'check replicated truncate does not cascade on replica'); + + $result = $node_subscriber->safe_psql('postgres', + "SELECT nextval('seq_notrep')"); + is($result, qq(102), + 'check replicated truncate does not restart identities'); + Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative
On 1/21/18 18:08, Daniel Gustafsson wrote: > As per before, my patch for running tests against another set of binaries is > included as well as a fix for connstrings with spaces, but with the recent > hacking by Peter I assume this is superfluous. It was handy for development > so > I’ve kept it around though. 0002-Allow-running-SSL-tests-against-different-binar-v4.patch should be obsoleted by f5da5683a86e9fc42fdf3eae2da8b096bda76a8a. 0003-Allow-spaces-in-connectionstrings-v4.patch looks reasonable, but I'm not sure what circumstance is triggering that. Is it specific to your implementation? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
Hi, On 23/01/18 15:38, Marco Nenciarini wrote: > Il 22/01/18 23:18, Petr Jelinek ha scritto: >> On 22/01/18 19:45, Petr Jelinek wrote: >> >> Actually on second look, I don't like the new boolean parameter much. >> I'd rather we didn't touch the input list and always close only >> relations opened inside the ExecuteTruncateGuts(). >> >> It may mean more list(s) but the current interface is still not clean. >> > > Now ExecuteTruncateGuts unconditionally closes the relations that it > opens. The caller has now always the responsibility to close the > relations passed with the explicit_rels list. This looks good. > > Version 15 attached. > I see you still do CASCADE on the subscriber though. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Handling better supported channel binding types for SSL implementations
On 1/22/18 02:29, Michael Paquier wrote: > However there is as well the argument that this list's contents are not > directly used now, and based on what I saw from the MacOS SSL and GnuTLS > patches that would not be the case after either. Right, there is no facility for negotiating the channel binding type, so a boolean result should be enough. In which case we wouldn't actually need this for GnuTLS yet. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Test-cases for exclusion constraints is missing in alter_table.sql file
On 1/17/18 23:57, Ashutosh Sharma wrote: > By elsewhere do you mean in the contrib module 'sepgsql' > (sepgsql/sql/ddl.sql) because other than that i didn't find any other > places having ALTER TABLE ... ADD CONSTRAINT ... EXCLUDE ... sort of > sql queries other than the files which i mentioned in my earlier > email. Thanks. There is ALTER TABLE partitioned ADD EXCLUDE USING gist (a WITH &&); in alter_table.sql. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist
On 1/18/18 18:42, Tom Lane wrote: > Robert Haas writes: >> On Wed, Jan 17, 2018 at 3:50 PM, Nikolay Shaplov wrote: >>> This patch raises an error if the user tries to set or change a toast.* option for a >>> table that does not have a TOAST relation. > There might be a case for raising a warning in this situation, > but I would disagree with making it a hard error in any case. > All that's going to accomplish is to break people's scripts and > get them mad at you. It might also be useful to set these reloptions before you add a new column to a table. So I think this change is not desirable. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PATCH: Configurable file mode mask
On 1/23/18 09:33, David Steele wrote: > What if I update pg_upgrade/test.sh to optionally allow group > permissions and we configure an animal to test it if it gets committed? > > It's not ideal, I know, but it would get the permissions patch over the > line and is consistent with how we currently test pg_upgrade. Basically, what you'd need is some way to pass some options to the initdb invocations in the pg_upgrade test script, right? That would seem reasonable, and also useful for testing upgrading with other initdb options. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
Bruce Momjian writes: > Oh, I see what you mean. I was just worried that some code might expect > template1 to always have an oid of 1, but we can just call that code > broken. Ever since we invented template0, it's been possible to drop and recreate template1, so I'd say any expectation that it must have OID 1 has been wrong for a long time. This change will just make the situation more common. regards, tom lane
Re: [PATCH] Logical decoding of TRUNCATE
Il 22/01/18 23:18, Petr Jelinek ha scritto: > On 22/01/18 19:45, Petr Jelinek wrote: > > Actually on second look, I don't like the new boolean parameter much. > I'd rather we didn't touch the input list and always close only > relations opened inside the ExecuteTruncateGuts(). > > It may mean more list(s) but the current interface is still not clean. > Now ExecuteTruncateGuts unconditionally closes the relations that it opens. The caller has now always the responsibility to close the relations passed with the explicit_rels list. Version 15 attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 2e768dd5e4..bdce4164d6 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 1404,1419 ExecuteTruncate(TruncateStmt *stmt) } /* !* Check foreign key references. In CASCADE mode, this should be !* unnecessary since we just pulled in all the references; but as a !* cross-check, do it anyway if in an Assert-enabled build. */ #ifdef USE_ASSERT_CHECKING - heap_truncate_check_FKs(rels, false); - #else - if (stmt->behavior == DROP_RESTRICT) heap_truncate_check_FKs(rels, false); #endif /* * If we are asked to restart sequences, find all the sequences, lock them --- 1404,1427 } /* !* Suppress foreign key references check if session replication role is !* set to REPLICA. */ + if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA) + { + + /* +* Check foreign key references. In CASCADE mode, this should be +* unnecessary since we just pulled in all the references; but as a +* cross-check, do it anyway if in an Assert-enabled build. 
+*/ #ifdef USE_ASSERT_CHECKING heap_truncate_check_FKs(rels, false); + #else + if (stmt->behavior == DROP_RESTRICT) + heap_truncate_check_FKs(rels, false); #endif + } /* * If we are asked to restart sequences, find all the sequences, lock them diff --git a/src/test/regress/expected/index d967e8dd21..86748430c5 100644 *** a/src/test/regress/expected/truncate.out --- b/src/test/regress/expected/truncate.out *** *** 68,73 HINT: Truncate table "trunc_b" at the same time, or use TRUNCATE ... CASCADE. --- 68,77 TRUNCATE TABLE truncate_a CASCADE; -- ok NOTICE: truncate cascades to table "trunc_b" NOTICE: truncate cascades to table "trunc_e" + -- Ignore foreign-key checks with session_replication_role = replica + SET session_replication_role = replica; + TRUNCATE TABLE truncate_a;-- ok + RESET session_replication_role; -- circular references ALTER TABLE truncate_a ADD FOREIGN KEY (col1) REFERENCES trunc_c; -- Add some data to verify that truncating actually works ... diff --git a/src/test/regress/sql/truncate.sqindex fbd1d1a8a5..0d0a3705d2 100644 *** a/src/test/regress/sql/truncate.sql --- b/src/test/regress/sql/truncate.sql *** *** 33,38 TRUNCATE TABLE trunc_c,trunc_d,trunc_e,truncate_a,trunc_b; -- ok --- 33,43 TRUNCATE TABLE truncate_a RESTRICT; -- fail TRUNCATE TABLE truncate_a CASCADE; -- ok + -- Ignore foreign-key checks with session_replication_role = replica + SET session_replication_role = replica; + TRUNCATE TABLE truncate_a;-- ok + RESET session_replication_role; + -- circular references ALTER TABLE truncate_a ADD FOREIGN KEY (col1) REFERENCES trunc_c; diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out index 1e22c1eefc..d1feea4909 100644 *** a/contrib/test_decoding/expected/ddl.out --- b/contrib/test_decoding/expected/ddl.out *** *** 543,548 UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2; --- 543,550 UPDATE table_with_unique_not_null SET id = -id; UPDATE table_with_unique_not_null SET id = -id; DELETE FROM 
table_with_unique_not_null WHERE data = 3; + TRUNCATE table_with_unique_not_null; + TRUNCATE table_with_unique_not_null, table_with_unique_not_null RESTART IDENTITY CASCADE; -- check toast support BEGIN; CREATE SEQUENCE toasttable_rand_seq START 79 INCREMENT 1499; -- portable "random" *** *** 660,665 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc --- 662,673 table public.table_with_unique_not_null: DELETE: id[integer]:4 COMMIT BEGIN + table public.table_with_unique_not_null: TRUNCATE: (no-flags) + COMMIT + BEGIN + table public.table_with_unique_not_null: TRUNCATE: restart_seqs cascade + COMMIT + BEGIN table public.toasttable: INSERT:
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Stephen Frostwrites: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> The most obvious hack is just to exclude PL objects with OIDs below 16384 >> from the dump. An issue with that is that it'd lose any nondefault >> changes to the ACL for plpgsql, which seems like a problem. On the >> other hand, I think the rule we have in place for the public schema is >> preventing dumping local adjustments to that, and there have been no >> complaints. (Maybe I'm wrong and the initacl mechanism handles that >> case? If so, seems like we could get it to handle local changes to >> plpgsql's ACL as well.) > Both the public schema and plpgsql's ACLs should be handled properly > (with local changes preserved) through the pg_init_privs system. I'm > not sure what you're referring to regarding a rule preventing dumping > local adjustments to the public schema, as far as I can recall we've > basically always supported that. I went looking and realized that actually what we're interested in here is the plpgsql extension, not the plpgsql language ... and in fact the behavior I was thinking of is already there, except for some reason it's only applied during binary upgrade. So I think we should give serious consideration to the attached, which just removes the binary_upgrade condition (and adjusts nearby comments). With this, I get empty dumps for the default state of template1 and postgres, which is what one really would expect. And testing shows that if you modify the ACL of language plpgsql, that does still come through as expected. (Note: this breaks the parts of the pg_dump regression tests that expect to see "CREATE LANGUAGE plpgsql". I've not bothered to update those, pending the decision on whether to do this.) 
regards, tom lane diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d65ea54..87b15a1 100644 *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** selectDumpableAccessMethod(AccessMethodI *** 1617,1637 * selectDumpableExtension: policy-setting subroutine * Mark an extension as to be dumped or not * ! * Normally, we dump all extensions, or none of them if include_everything ! * is false (i.e., a --schema or --table switch was given). However, in ! * binary-upgrade mode it's necessary to skip built-in extensions, since we * assume those will already be installed in the target database. We identify * such extensions by their having OIDs in the range reserved for initdb. */ static void selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt) { /* ! * Use DUMP_COMPONENT_ACL for from-initdb extensions, to allow users to ! * change permissions on those objects, if they wish to, and have those ! * changes preserved. */ ! if (dopt->binary_upgrade && extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid) extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL; else extinfo->dobj.dump = extinfo->dobj.dump_contains = --- 1617,1637 * selectDumpableExtension: policy-setting subroutine * Mark an extension as to be dumped or not * ! * Built-in extensions should be skipped except for checking ACLs, since we * assume those will already be installed in the target database. We identify * such extensions by their having OIDs in the range reserved for initdb. + * We dump all user-added extensions by default, or none of them if + * include_everything is false (i.e., a --schema or --table switch was given). */ static void selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt) { /* ! * Use DUMP_COMPONENT_ACL for built-in extensions, to allow users to ! * change permissions on their member objects, if they wish to, and have ! * those changes preserved. */ ! 
if (extinfo->dobj.catId.oid <= (Oid) g_last_builtin_oid) extinfo->dobj.dump = extinfo->dobj.dump_contains = DUMP_COMPONENT_ACL; else extinfo->dobj.dump = extinfo->dobj.dump_contains =
Re: PATCH: Configurable file mode mask
On 1/19/18 16:54, Alvaro Herrera wrote: > If the only problem is that buildfarm would run tests twice, then I > think we should just press forward with this regardless of that: it > seems a chicken-and-egg problem, because buildfarm cannot be upgraded to > use some new test method if the method doesn't exist yet. As a > solution, let's just live with some run duplication for a period until > the machines are upgraded to a future buildfarm client code. Well, people protested against that approach. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Fwd: Regarding ambulkdelete, amvacuumcleanup index methods
-- Forwarded message -- From: "Abinaya k" Date: Jan 23, 2018 1:43 PM Subject: Regarding ambulkdelete, amvacuumcleanup index methods To: Cc: Hi all, We are building an in-memory index extension for Postgres. We capture table inserts, updates, and deletes using triggers. During a vacuum operation, Postgres calls ambulkdelete and amvacuumcleanup (as part of index cleanup). As we handle all updates and deletes using triggers, we don't have to do any index cleanup in ambulkdelete. But what stats should I return from ambulkdelete and amvacuumcleanup? Is it necessary to return stats from ambulkdelete and amvacuumcleanup?
Re: [HACKERS] taking stdbool.h into use
On 1/16/18 01:14, Michael Paquier wrote: >> I don't see it either. I think the actually useful parts of that patch >> were to declare record_image_cmp's result correctly as int rather than >> bool, and to cope with varlena fields of unequal size. Unfortunately >> there seems to be no contemporaneous mailing list discussion, so >> it's not clear what Kevin based this change on. > > This was a hotfix after a buildfarm breakage, the base commit being > f566515. > >> Want to try reverting the GET_X_BYTES() parts of it and see if the >> buildfarm complains? > > So, Peter, are you planning to do so? Here is a proposed patch set to clean this up. First, add some test coverage for record_image_cmp. (There wasn't any, only for record_image_eq as part of MV testing.) Then, remove the GET_ macros from record_image_{eq,cmp}. And finally remove all that byte-masking stuff from postgres.h. >> Note if that if we simplify the GetDatum macros, it's possible that >> record_image_cmp would change behavior, since it might now see signed not >> unsigned values depending on whether the casts do sign extension or not. >> Not sure if that'd be a problem. I wasn't able to construct such a case. Everything still works unsigned end-to-end, I think. But if you can think of a case, we can throw it into the tests. The tests already contain cases of comparing positive and negative integers. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 686b2a6f2c0a455dccbecf07d163af5d6f9c9e88 Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Tue, 23 Jan 2018 10:13:45 -0500 Subject: [PATCH 1/3] Add tests for record_image_eq and record_image_cmp record_image_eq was covered a bit by the materialized view code that it is meant to support, but record_image_cmp was not tested at all. 
--- src/test/regress/expected/rowtypes.out | 161 + src/test/regress/sql/rowtypes.sql | 53 +++ 2 files changed, 214 insertions(+) diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out index a4bac8e3b5..e3c23a41cd 100644 --- a/src/test/regress/expected/rowtypes.out +++ b/src/test/regress/expected/rowtypes.out @@ -369,6 +369,167 @@ LINE 1: select * from cc order by f1; ^ HINT: Use an explicit ordering operator or modify the query. -- +-- Tests for record_image_{eq,cmp} +-- +create type testtype1 as (a int, b int); +-- all true +select row(1, 2)::testtype1 *< row(1, 3)::testtype1; + ?column? +-- + t +(1 row) + +select row(1, 2)::testtype1 *<= row(1, 3)::testtype1; + ?column? +-- + t +(1 row) + +select row(1, 2)::testtype1 *= row(1, 2)::testtype1; + ?column? +-- + t +(1 row) + +select row(1, 2)::testtype1 *<> row(1, 3)::testtype1; + ?column? +-- + t +(1 row) + +select row(1, 3)::testtype1 *>= row(1, 2)::testtype1; + ?column? +-- + t +(1 row) + +select row(1, 3)::testtype1 *> row(1, 2)::testtype1; + ?column? +-- + t +(1 row) + +-- all false +select row(1, -2)::testtype1 *< row(1, -3)::testtype1; + ?column? +-- + f +(1 row) + +select row(1, -2)::testtype1 *<= row(1, -3)::testtype1; + ?column? +-- + f +(1 row) + +select row(1, -2)::testtype1 *= row(1, -3)::testtype1; + ?column? +-- + f +(1 row) + +select row(1, -2)::testtype1 *<> row(1, -2)::testtype1; + ?column? +-- + f +(1 row) + +select row(1, -3)::testtype1 *>= row(1, -2)::testtype1; + ?column? +-- + f +(1 row) + +select row(1, -3)::testtype1 *> row(1, -2)::testtype1; + ?column? +-- + f +(1 row) + +-- This returns the "wrong" order because record_image_cmp works on +-- unsigned datums without knowing about the actual data type. +select row(1, -2)::testtype1 *< row(1, 3)::testtype1; + ?column? +-- + f +(1 row) + +-- other types +create type testtype2 as (a smallint, b bool); -- byval different sizes +select row(1, true)::testtype2 *< row(2, true)::testtype2; + ?column? 
+-- + t +(1 row) + +select row(-2, true)::testtype2 *< row(-1, true)::testtype2; + ?column? +-- + t +(1 row) + +select row(0, false)::testtype2 *< row(0, true)::testtype2; + ?column? +-- + t +(1 row) + +select row(0, false)::testtype2 *<> row(0, true)::testtype2; + ?column? +-- + t +(1 row) + +create type testtype3 as (a int, b text); -- variable length +select row(1, 'abc')::testtype3 *< row(1, 'abd')::testtype3; + ?column? +-- + t +(1 row) + +select row(1, 'abc')::testtype3 *< row(1, 'abcd')::testtype3; + ?column? +-- + t +(1 row) + +select row(1, 'abc')::testtype3 *> row(1, 'abd')::testtype3; + ?column? +-- + f +(1 row) + +select row(1, 'abc')::testtype3 *<> row(1, 'abd')::testtype3; + ?column? +-- + t +(1 row) + +create type testtype4 as (a int, b point); -- by ref, fixed length +select row(1, '(1,2)')::testtype4 *< row(1,
Re: [HACKERS] PoC: full merge join on comparison clause
Greetings Alexander, * Alexander Kuzmenkov (a.kuzmen...@postgrespro.ru) wrote: > Here is the patch rebased to a852cfe9. Thanks for updating it. This would definitely be nice to have. Ashutosh, thanks for your previous review; would you have a chance to look at it again? It would be great to at least get this to Ready for Committer before the end of this CF. Thanks! Stephen
Re: pgsql: Allow UPDATE to move rows between partitions.
On Tue, Jan 23, 2018 at 8:44 AM, Amit Kapilawrote: > On Sat, Jan 20, 2018 at 2:03 AM, Robert Haas wrote: >> Allow UPDATE to move rows between partitions. > > +If an UPDATE on a partitioned table causes a row to > move > +to another partition, it will be performed as a DELETE > +from the original partition followed by an INSERT into > +the new partition. In this case, all row-level BEFORE > +UPDATE triggers and all row-level > +BEFORE DELETE triggers are fired on > +the original partition. > > Do we need to maintain triggers related behavior for logical > replication? In logical replication, we use ExecSimpleRelationDelete > to perform Delete operation which is not aware of this special > behavior (execute before update trigger for this case). Hmm. I don't think there's any way for the logical decoding infrastructure to identify this case at present. I suppose if we want that behavior, we'd need to modify the WAL format, and the changes might not be too straightforward because the after-image of the tuple wouldn't be available in the DELETE record. I think the only reason we fire the UPDATE triggers is because we can't decide until after they've finished executing that we really want to DELETE and INSERT instead; by the time we are replicating the changes, we know what the final shape of the operation ended up being, so it's not clear to me that firing UPDATE triggers at that point would be useful. I fear that trying to change this is going to cost performance (and developer time) for no real benefit, so my gut feeling is to leave it alone. However, what do other people think? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum
Haribabu Kommi writes: > On Tue, Jan 23, 2018 at 8:56 AM, Tom Lane wrote: >> I'm thinking we'd still need to do what I said in the previous message, >> and change pg_dump so that the restore session will absorb ALTER DATABASE >> settings before proceeding. Otherwise, at minimum, we have a hazard of >> differing behaviors in serial and parallel restores. It might be that >> lc_monetary is the only GUC that makes a real difference for this purpose, >> but I haven't got much faith in that proposition at the moment. > Yes, the restore session should absorb all the ALTER DATABASE and ALTER ROLE > IN DATABASE settings to make sure that the target database is in the same > state as when the dump started. I've pushed a patch to do that. > Currently "default_transaction_read_only" is the only GUC that affects > restoring a database. Well, that's exactly the $64 question. Are we sure we have a complete, reliable list of which GUCs do or do not affect data restoration? I wouldn't count on it. If nothing else, extensions might have custom GUCs that we could not predict the behavior of. > As you said in upthread, I feel splitting them into two _TOC entries and > dump as last commands of each database? Does it have any problem with > parallel restore? As I said yesterday, I'm not really eager to do that. It's a lot of complexity and a continuing maintenance headache to solve a use-case that I find pretty debatable. Anyone who's actually put in "default_transaction_read_only = on" in a way that restricts their maintenance roles must have created procedures for undoing it easily; otherwise day-to-day maintenance would be a problem. Further, I don't find the original hack's distinction between pg_dump and pg_dumpall to be really convincing. Maybe that says we should resolve this by just sticking "SET default_transaction_read_only = off" into regular pg_dump output after all --- perhaps with a switch added to enable it.
But I'd kind of like to see the argument why we should worry about this at all made afresh. regards, tom lane
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila wrote: > It does turn such cases into an error but only at the end when someone > calls WaitForParallelWorkersToFinish. If one tries to rely on it at > any other time, it won't work, as I think is the case for Parallel > Create Index where Peter G. is trying to use it differently. I am not > 100% sure whether Parallel Create Index has this symptom as I have not > tried it with that patch, but Peter and I are trying to figure that > out. OK. I've committed this patch and back-patched it to 9.6. Back-patching to 9.5 didn't look simple because nworkers_launched doesn't exist on that branch, and it doesn't matter very much anyway since the infrastructure has no in-core users in 9.5. I'll respond to the other thread about the issues specific to parallel CREATE INDEX. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2018-01-22 23:15 GMT+01:00 Stephen Frost:
> Pavel,
>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > here is a GUC based patch for plancache controlling. Looks so this
> > code is working.
>
> This really could use a new thread, imv.  This thread is a year old and
> about a completely different feature than what you've implemented here.

true, but now it is too late

> > It is hard to create regress tests. Any ideas?
>
> Honestly, my idea for this would be to add another option to EXPLAIN
> which would make it spit out generic and custom plans, or something of
> that sort.

I looked there, but it needs cycle dependency between CachedPlan and
PlannedStmt.  It needs more code than this patch and the code will not be
nicer.  I try to do some game with prepared statements.

> Please review your patches before sending them and don't send in patches
> which have random unrelated whitespace changes.
>
> > diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
> > index ad8a82f1e3..cc99cf6dcc 100644
> > --- a/src/backend/utils/cache/plancache.c
> > +++ b/src/backend/utils/cache/plancache.c
> > @@ -1031,6 +1033,12 @@ choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
> >  	if (IsTransactionStmtPlan(plansource))
> >  		return false;
> >
> > +	/* See if settings wants to force the decision */
> > +	if (plancache_mode & PLANCACHE_FORCE_GENERIC_PLAN)
> > +		return false;
> > +	if (plancache_mode & PLANCACHE_FORCE_CUSTOM_PLAN)
> > +		return true;
>
> Not a big deal, but I probably would have just expanded the conditional
> checks against cursor_options (just below) rather than adding
> independent if() statements.
I don't think the proposed change is better - my opinion is not strong -
and a committer can change it simply.

> > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> > index ae22185fbd..4ce275e39d 100644
> > --- a/src/backend/utils/misc/guc.c
> > +++ b/src/backend/utils/misc/guc.c
> > @@ -3916,6 +3923,16 @@ static struct config_enum ConfigureNamesEnum[] =
> >  		NULL, NULL, NULL
> >  	},
> >
> > +	{
> > +		{"plancache_mode", PGC_USERSET, QUERY_TUNING_OTHER,
> > +			gettext_noop("Forces use of custom or generic plans."),
> > +			gettext_noop("It can control query plan cache.")
> > +		},
> > +		&plancache_mode,
> > +		PLANCACHE_DEFAULT, plancache_mode_options,
> > +		NULL, NULL, NULL
> > +	},
>
> That long description is shorter than the short description.  How about:
>
> short: Controls the planner's selection of custom or generic plan.
> long: Prepared queries have custom and generic plans and the planner
> will attempt to choose which is better; this can be set to override
> the default behavior.

changed - thank you for the text

> (either that, or just ditch the long description)
>
> > diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
> > index 87fab19f3c..962895cc1a 100644
> > @@ -143,7 +143,6 @@ typedef struct CachedPlan
> >  	MemoryContext context;		/* context containing this CachedPlan */
> >  } CachedPlan;
> >
> > -
> >  extern void InitPlanCache(void);
> >  extern void ResetPlanCache(void);
>
> Random unrelated whitespace change...

fixed

> This is also missing documentation updates.

I wrote some basic doc, but a native speaker should write more words
about the used strategies.

> Setting to Waiting for Author, but with those changes I would think we
> could mark it ready-for-committer, it seems pretty straight-forward to
> me and there seems to be general agreement (now) that it's worthwhile to
> have.
>
> Thanks!
attached updated patch

Regards

Pavel

> Stephen

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 45b2af14eb..d6cef97ca9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4387,6 +4387,30 @@ SELECT * FROM parent WHERE key = 2400;
+
+     Plan Cache Options
+
+      plancache_mode (enum)
+       plancache_mode configuration parameter
+
+       Prepared queries have custom and generic plans and the planner
+       will attempt to choose which is better; this can be set to override
+       the default behavior.  The allowed values are default,
+       force_custom_plan and force_generic_plan.
+
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 8d7d8e04c9..1f8884896b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -106,6 +106,8 @@ static void PlanCacheRelCallback(Datum arg, Oid relid);
 static void
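As a usage sketch of the GUC this patch proposes (the GUC name and values are taken from the patch itself; the behavior comments reflect the `choose_custom_plan()` hunk quoted above, not a tested build):

```sql
PREPARE q(int) AS SELECT relname FROM pg_class WHERE relpages > $1;

-- Default behavior: the plancache picks between custom and generic
-- plans on its own after a few executions.
EXECUTE q(10);

-- With the proposed GUC, the choice can be forced either way:
SET plancache_mode = force_generic_plan;
EXECUTE q(10);   -- choose_custom_plan() would return false here

SET plancache_mode = force_custom_plan;
EXECUTE q(10);   -- choose_custom_plan() would return true here

RESET plancache_mode;
```

This is also the shape a regression test could take once the EXPLAIN-based approach Stephen suggests (or the GUC itself) is settled.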
Re: Invalidation pg catalog cache in trigger functions
Sorry, I forgot to mention the version of PostgreSQL:

https://www.postgresql.org/message-id/CAAqA9PQXEmG%3Dk3WpDTmHZL-VKcMpDEA3ZC06Qr0ASO3oTA7bdw%40mail.gmail.com
"Invalidation pg catalog cache in trigger functions" was detected on
"PostgreSQL 9.2.18 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit".  It can also be reproduced on
PostgreSQL 9.2.24.  It is no longer reproducible on newer versions -
PostgreSQL 9.3.20, 9.4.15, 9.4.12.

But the issue
https://www.postgresql.org/message-id/20171030125345.1448.24...@wrigleys.postgresql.org
"BUG #14879: Bug connected with table structure modification and trigger
function query plan invalidation"
is still reproducible on PostgreSQL 9.2.24, 9.3.20, 9.4.12, and 9.4.15.

In this thread I propose to find out the cause of the issue reported in
BUG #14879.

-- 
Konstantin Evteev
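For context, the general class of problem in BUG #14879 can be sketched like this (table, function, and trigger names are hypothetical; the exact symptom on a given branch depends on how plan invalidation interacts with the trigger function's cached plan, as discussed above):

```sql
CREATE TABLE t (a int);
CREATE TABLE log (v bigint);

CREATE OR REPLACE FUNCTION t_trig() RETURNS trigger AS $$
BEGIN
    -- The query inside the trigger function gets a cached plan on
    -- first execution.
    INSERT INTO log SELECT count(*) FROM t;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg AFTER INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_trig();

INSERT INTO t VALUES (1);        -- trigger fires, plan is cached

ALTER TABLE t ADD COLUMN b int;  -- table structure modification

INSERT INTO t VALUES (2, 2);     -- on the affected 9.2.x builds the
                                 -- trigger may still use the stale plan
```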
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Sun, Jan 21, 2018 at 11:02:29AM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > On Fri, Jan 19, 2018 at 02:54:25PM -0500, Tom Lane wrote:
> >> Command was: DROP DATABASE "template1";
>
> > Uh, the oid of the template1 database is 1, and I assume we would want
> > to preserve that too.
>
> I don't feel any huge attachment to that.  In the first place, under
> this proposal recreating template1 is something you would only need to do
> if you weren't satisfied with its default properties as set by initdb.
> Which ought to be a minority of users.  In the second place, if you are
> changing those properties from the way initdb set it up, it's not really
> virgin template1 anymore, so why shouldn't it have a new OID?

Oh, I see what you mean.  I was just worried that some code might expect
template1 to always have an oid of 1, but we can just call that code
broken.

-- 
  Bruce Momjian                     http://momjian.us
  EnterpriseDB                      http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
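For reference, the assumption being discussed is easy to inspect with a plain catalog query (nothing version-specific here):

```sql
-- In a stock cluster, template1 is created first by initdb and carries
-- OID 1:
SELECT oid, datname, datistemplate
FROM pg_database
WHERE datname = 'template1';

-- A template1 that a restore script dropped and recreated would come
-- back with a freshly assigned OID instead -- which is the outcome Tom
-- is saying we can live with, treating any code that hardwires OID 1
-- as broken.
```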