[HACKERS] [parallel query] random server crash while running tpc-h query on power2
Hi All,

Recently while running TPC-H queries on the PostgreSQL master branch, I noticed random server crashes. Most of the time the crash comes while running TPC-H query number 3 (but it's quite random).

Call stack of the server crash:

(gdb) bt
#0  0x102aa9ac in ExplainNode (planstate=0x1001a7471d8, ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0, es=0x1001a65ad48) at explain.c:1516
#1  0x102aaeb8 in ExplainNode (planstate=0x1001a746b50, ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0, es=0x1001a65ad48) at explain.c:1599
#2  0x102aaeb8 in ExplainNode (planstate=0x1001a7468b8, ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0, es=0x1001a65ad48) at explain.c:1599
#3  0x102aaeb8 in ExplainNode (planstate=0x1001a745e48, ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0, es=0x1001a65ad48) at explain.c:1599
#4  0x102aaeb8 in ExplainNode (planstate=0x1001a745bb0, ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0, es=0x1001a65ad48) at explain.c:1599
#5  0x102aaeb8 in ExplainNode (planstate=0x1001a745870, ancestors=0x1001a745348, relationship=0x0, plan_name=0x0, es=0x1001a65ad48) at explain.c:1599
#6  0x102a81d8 in ExplainPrintPlan (es=0x1001a65ad48, queryDesc=0x1001a7442a0) at explain.c:599
#7  0x102a7e20 in ExplainOnePlan (plannedstmt=0x1001a744208, into=0x0, es=0x1001a65ad48, queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose) select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment = 'BUILD"..., params=0x0, planduration=0x3fffe17c6488) at explain.c:515
#8  0x102a7968 in ExplainOneQuery (query=0x1001a65ab98, into=0x0, es=0x1001a65ad48, queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose) select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment = 'BUILD"..., params=0x0) at explain.c:357
#9  0x102a74b8 in ExplainQuery (stmt=0x1001a6fe468, queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose) select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment = 'BUILD"..., params=0x0, dest=0x1001a65acb0) at explain.c:245

(gdb) p *w
$2 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p planstate
$3 = (PlanState *) 0x1001a7471d8
(gdb) p *planstate
$4 = {type = T_HashJoinState, plan = 0x1001a740730, state = 0x1001a745758, instrument = 0x1001a7514a8, worker_instrument = 0x1001af8a8c8, targetlist = 0x1001a747448, qual = 0x0, lefttree = 0x1001a74a0e8, righttree = 0x1001a74f2f8, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleSlot = 0x1001a750c10, ps_ExprContext = 0x1001a7472f0, ps_ProjInfo = 0x1001a751220, ps_TupFromTlist = 0 '\000'}
(gdb) p *planstate->worker_instrument
$5 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p n
$6 = 13055
(gdb) p n
$7 = 13055

Here it's clear that worker_instrument is either corrupted or uninitialized (note that num_workers = 2139062143 is 0x7F7F7F7F, the pattern CLOBBER_FREED_MEMORY writes over freed memory, so this looks like use of already-freed memory), and that is why it ends up with a server crash. With a bit more debugging and a look at the git history, I found that the issue started with commit af33039317ddc4a0e38a02e2255c2bf453115fd2: gather_readnext() calls ExecShutdownGatherWorkers() when nreaders == 0.
ExecShutdownGatherWorkers() calls ExecParallelFinish(), which collects the instrumentation before marking the ParallelExecutorInfo as finished. ExecParallelRetrieveInstrumentation() does the allocation of planstate->worker_instrument. With commit af33039317 we now call gather_readnext() in the per-tuple context, so when nreaders == 0 and ExecShutdownGatherWorkers() runs, we end up allocating planstate->worker_instrument in the per-tuple context, which is wrong: that memory is reset long before EXPLAIN reads the structure.

Now the fix can be:

1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and let ExecEndGather() do those things. But with this change, gather_readnext() and gather_getnext() depend on the planstate->reader structure to continue reading tuples, so either we change those conditions to depend on planstate->nreaders, or we just pfree(planstate->reader) in gather_readnext() instead of calling ExecShutdownGatherWorkers().

2) Teach ExecParallelRetrieveInstrumentation() to allocate planstate->worker_instrument in the proper memory context.

Attaching a patch which fixes the issue with approach 1).

Regards,
Rushabh Lathia
www.EnterpriseDB.com

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 438d1b2..bc56f0e 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -348,7 +348,8 @@ gather_readnext(GatherState *gatherstate)
 			--gatherstate->nreaders;
 			if
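[For context, approach 2) would amount to something like the following minimal sketch, not a tested patch: it assumes the query-lifetime context is reachable as planstate->state->es_query_cxt, and borrows the WorkerInstrumentation layout from instrument.h.]

#include "postgres.h"
#include "executor/instrument.h"
#include "nodes/execnodes.h"

/*
 * Sketch of approach 2): make sure worker_instrument is allocated in the
 * query-lifetime context (es_query_cxt), not in whatever possibly
 * per-tuple context we happen to be called in.
 */
static void
retrieve_worker_instrumentation(PlanState *planstate, int nworkers,
								Instrumentation *shared_instr)
{
	MemoryContext oldcontext;
	Size		ibytes;

	ibytes = offsetof(WorkerInstrumentation, instrument) +
		nworkers * sizeof(Instrumentation);

	/* switch out of the caller's (possibly per-tuple) context */
	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
	planstate->worker_instrument = palloc(ibytes);
	MemoryContextSwitchTo(oldcontext);

	planstate->worker_instrument->num_workers = nworkers;
	memcpy(planstate->worker_instrument->instrument, shared_instr,
		   nworkers * sizeof(Instrumentation));
}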
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Sat, Aug 13, 2016 at 8:26 AM, Thomas Munro wrote:
> On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane wrote:
>> amul sul writes:
>>> Calling dsm_create on Linux using the POSIX DSM implementation can succeed, but result in SIGBUS when we later try to access the memory. This happens because my system does not have enough shm space and the current allocation in dsm_impl_posix does not allocate disk blocks[1]. I wonder whether we can use the fallocate system call (i.e. zero-fill the file) to ensure that all the file space has really been allocated, so that we don't later seg fault when accessing the memory mapping. But here we would end up calling 'write' squillions of times in a loop.
>>
>> Wouldn't that just result in a segfault during dsm_create?
>>
>> I think probably what you are describing here is kernel misbehavior akin to memory overcommit. Maybe it *is* memory overcommit and can be turned off the same way. If not, you have material for a kernel bug fix/enhancement request.
>
> [...] But it looks like if we used fallocate or posix_fallocate in the dsm_impl_posix case we'd get a nice ENOSPC error, instead of success-but-later-SIGBUS-on-access.

Here's a simple test extension that creates jumbo dsm segments, and then accesses all pages. If you ask it to write cheques that your Linux 3.10 machine can't cash, on unpatched master it does this:

postgres=# create extension foo;
CREATE EXTENSION
postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
server closed the connection unexpectedly
...
LOG:  server process (PID 15105) was terminated by signal 7: Bus error

If I apply the attached experimental patch I get:

postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
ERROR:  could not resize shared memory segment "/PostgreSQL.1938734921" to 17179869184 bytes: No space left on device

It should probably be refactored a bit to separate the error messages for ftruncate and posix_fallocate, and we could possibly use the same approach for dsm_impl_mmap instead of that write() loop, but this at least demonstrates the problem Amul reported. Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com

posix_fallocate.patch
Description: Binary data

create-dsm-test.patch
Description: Binary data
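[For readers following along, the core of such a change would presumably look something like this. This is a minimal sketch, not Thomas's actual patch; the function name and its placement in dsm_impl.c are assumptions.]

#include <fcntl.h>
#include <sys/types.h>

/*
 * Sketch: after shm_open() and ftruncate() have sized the segment, force
 * actual block allocation so a later page fault cannot SIGBUS.  Note that
 * posix_fallocate() returns an errno value directly (it does not set
 * errno), and returns 0 on success.
 */
static int
reserve_dsm_space(int fd, off_t size)
{
	int			rc;

	rc = posix_fallocate(fd, 0, size);
	if (rc != 0)
	{
		/* e.g. ENOSPC: caller reports "No space left on device" */
		return rc;
	}
	return 0;
}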
Re: [HACKERS] Add hint for function named "is"
Greg Stark writes:
> On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane wrote:
>> pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL
>> is nonstandard, maybe dropping support for it would be feasible.

> Isn't ISNULL one of the lexical tricks we used to effectively give
> bison two token lookahead?

No, it's a SQL-exposed feature, though I think it's just there for compatibility with some random other DBMS. See also NOTNULL.

			regards, tom lane
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB 3.2)
On Sat, Aug 13, 2016 at 1:18 AM, Andrew Gierth wrote:
> Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to agree with this.
>
> (Inserting on the high leaf page is a special case, which is where the fillfactor logic kicks in; that's why sequentially filled indexes are (by default) 90% full rather than 100%. But other pages split into roughly equal halves.)

Hm, I was going from this lore. I didn't realize it was only for inserts near the end of the index. That's cleverer than I realized.

commit 1663f3383849968415d29965ef9bfdf5aac4d358
Author: Tom Lane
Date:   Sat Sep 29 23:49:51 2001 +

    Tweak btree page split logic so that when splitting a page that is
    rightmost on its tree level, we split 2/3 to the left and 1/3 to the
    new right page, rather than the even split we use elsewhere. The idea
    is that when faced with a steadily increasing series of inserted keys
    (such as sequence or timestamp values), we'll end up with a btree
    that's about 2/3ds full not 1/2 full, which is much closer to the
    desired steady-state load for a btree. Per suggestion from Ann
    Harrison of IBPhoenix.

-- 
greg
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB 3.2)
> "Greg" == Greg Starkwrites: >> No, because as the pages split, they fill more slowly (because there >> are now more pages). So on average in a large randomly filled index, >> pages spend more time nearer 50% full than 100% full. This is easy >> to demonstrate by creating a table with an indexed float8 column and >> adding batches of random() values to it, checking with pgstatindex >> at intervals - the average leaf density will rarely exceed 70%. >> >> However, worst case conditions can give lower leaf densities; >> obviously the worst case is if the data is loaded in an order and >> quantity that just happens to leave every leaf page recently split. Greg> btree pages don't split 50/50 either. They split biased to assume Greg> the greater side of the split will receive more inserts -- iirc Greg> 70/30. Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to agree with this. (Inserting on the high leaf page is a special case, which is where the fillfactor logic kicks in; that's why sequentially filled indexes are (by default) 90% full rather than 100%. But other pages split into roughly equal halves.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
> "Tom" == Tom Lanewrites: Tom> But we need to be clear in the documentation about what this Tom> property actually means. My objection to having it answer at the Tom> index or column level is basically that that encourages confusion Tom> as to what it means. OK. Here's new output with the scope changes, and some of the names changed in an attempt to make them clearer: cap | AM | Index | Column ++---+ asc|| | t desc || | f nulls_first|| | f nulls_last || | t orderable || | t distance_orderable || | f returnable || | t search_array || | t search_nulls || | t clusterable|| t | backward_scan || t | index_scan || t | bitmap_scan|| t | can_order | t | | can_unique | t | | can_multi_col | t | | can_exclude| t | | (17 rows) (The can_* names are reserved for the AM level where we're asking about the abstract capabilities of the AM.) Better? Worse? Any more improvements to the names? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB 3.2)
On Fri, Aug 12, 2016 at 8:13 PM, Andrew Gierth wrote:
> No, because as the pages split, they fill more slowly (because there are now more pages). So on average in a large randomly filled index, pages spend more time nearer 50% full than 100% full. This is easy to demonstrate by creating a table with an indexed float8 column and adding batches of random() values to it, checking with pgstatindex at intervals - the average leaf density will rarely exceed 70%.
>
> However, worst case conditions can give lower leaf densities; obviously the worst case is if the data is loaded in an order and quantity that just happens to leave every leaf page recently split.

btree pages don't split 50/50 either. They split biased to assume the greater side of the split will receive more inserts -- iirc 70/30. So if they're loaded sequentially you should get a btree that's 70% full, but the worst case is in theory closer to 30%, though I think the insert order would have to be pretty perverse to be worse than 50%.

-- 
greg
Re: [HACKERS] Add hint for function named "is"
On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane wrote:
> pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL is nonstandard, maybe dropping support for it would be feasible.

Isn't ISNULL one of the lexical tricks we used to effectively give bison two token lookahead?

-- 
greg
[HACKERS] Is tuplesort_heap_siftup() a misnomer?
Doesn't tuplesort_heap_siftup() actually shift-down? The Wikipedia article on heaps [1] lists "shift-down" (never "sift down", FWIW) as a common operation on a heap:

"shift-down: move a node down in the tree, similar to shift-up; used to restore heap condition after deletion or replacement."

This seems to be what tuplesort_heap_siftup() actually does (among other things; its job is to compact the heap after the caller returns the root, which leaves a "hole" at root position 0). As it happens, the node that this tuplesort.c function "moves down in the tree" is the memtuple that was previously positioned physically last in the heap portion of the memtuples array. This node is then considered as a "candidate root" (i) for each subheap as we actually shift down, starting from i = 0. Conceptually, we immediately "fill the hole" that the caller left in the root with this other tuple (the physically last tuple), and this new tuple/node is then shifted down as needed. (I say "conceptually" because we don't actually bother with an eager swap into the "hole" position, but that's just to avoid a potentially useless swap.)

Now, tuplesort_heap_insert() actually does sift (or shift up, if you prefer), which is necessary because it doesn't assume that there is an existing "hole" anywhere in the heap (it just assumes the availability of space to insert the caller's tuple, at the end of the heap's memtuple portion). Or, as Wikipedia describes it:

"shift-up: move a node up in the tree, as long as needed; used to restore heap condition after insertion. Called "sift" because node moves up the tree until it reaches the correct level, as in a sieve."

I think that tuplesort_heap_insert() is correct, because it doesn't claim to shift at all (or sift, or whatever). I'm just contrasting it with tuplesort_heap_siftup() to make a point. I think that tuplesort_heap_siftup() should probably be renamed to describe its purpose at a higher level, rather than how it shifts, which in any case is an implementation detail. Specifically, I suggest we rename tuplesort_heap_siftup() to tuplesort_heap_compact(), which suggests that it's roughly the opposite of tuplesort_heap_insert(). I think that the contrast between these two functions adds clarity. tuplesort_heap_siftup() comments will also need to be updated, since "sift up" is mentioned there.

Should I write a patch?

[1] https://en.wikipedia.org/wiki/Heap_(data_structure)

-- 
Peter Geoghegan
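[To make the terminology concrete, here is a generic shift-down (in Wikipedia's sense) for a binary min-heap stored in an array, followed by the "compact after removing the root" pattern described above. This is a generic illustration, not the tuplesort.c code itself.]

#include <stddef.h>

/* Restore the heap invariant for a binary min-heap in heap[0..n-1],
 * assuming only heap[i] may violate it: move heap[i] down until neither
 * child compares smaller.  This is "shift-down" in the Wikipedia sense. */
static void
sift_down(int *heap, size_t n, size_t i)
{
	for (;;)
	{
		size_t	left = 2 * i + 1;
		size_t	right = left + 1;
		size_t	smallest = i;
		int		tmp;

		if (left < n && heap[left] < heap[smallest])
			smallest = left;
		if (right < n && heap[right] < heap[smallest])
			smallest = right;
		if (smallest == i)
			break;				/* heap condition restored */

		tmp = heap[i];
		heap[i] = heap[smallest];
		heap[smallest] = tmp;
		i = smallest;
	}
}

/* Removing the root compacts the heap exactly as described above: fill the
 * root "hole" with the physically last element, then shift it down. */
static int
heap_pop(int *heap, size_t *n)
{
	int		root = heap[0];

	heap[0] = heap[--(*n)];
	sift_down(heap, *n, 0);
	return root;
}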
[HACKERS] Pluggable storage
Many have expressed their interest in this topic, but I haven't seen any design of how it should work. Here's my attempt; I've been playing with this for some time now and I think what I propose here is a good initial plan. This will allow us to write permanent table storage that works differently than heapam.c. At this stage, I haven't thought through whether this is going to allow extensions to define new storage modules; I am focusing on AMs that can coexist with heapam in core.

The design starts with a new row type in pg_am, of type "s" (for "storage"). The handler function returns a struct of node StorageAmRoutine. This contains functions for 1) scans (beginscan, getnext, endscan), 2) tuples (tuple_insert/update/delete/lock, as well as set_oid, get_xmin and the like), and operations on tuples that are part of slots (tuple_deform, materialize).

To support this, we introduce StorageTuple and StorageScanDesc. StorageTuples represent a physical tuple coming from some storage AM. It is necessary to have a pointer to a StorageAmRoutine in order to manipulate the tuple. For heapam.c, a StorageTuple is just a HeapTuple.

RelationData gains ->rd_stamroutine, which is a pointer to the StorageAmRoutine for the relation in question. Similarly, TupleTableSlot is augmented with a link to the StorageAmRoutine to handle the StorageTuple it contains (probably in most cases it's set at the same time as the tupdesc). This implies that routines such as ExecAssignScanType need to pass down the StorageAmRoutine from the relation to the slot.

The executor is modified so that instead of calling heap_insert etc. directly, it uses rel->rd_stamroutine to call these methods. The executor is still in charge of dealing with indexes, constraints, and any other thing that's not the tuple storage itself (this is one major point in which this differs from FDWs). This all looks simple enough, with one exception and a few notes:

exception a) ExecMaterializeSlot needs special consideration. This is used in two different ways: a1) is the stated "make tuple independent from any underlying storage" point, which is handled by ExecMaterializeSlot itself calling a method from the storage AM to do any byte copying as needed. ExecMaterializeSlot no longer returns a HeapTuple, because there might not be any. The second usage pattern, a2), is to create a HeapTuple that's passed to other modules which only deal with HT and not slots (triggers are the main case I noticed, but I think there are others, such as the executor itself wanting tuples as Datum for some reason). For the moment I'm handling this by having a new ExecHeapifyTuple which creates a HeapTuple from a slot, regardless of the original tuple format.

note b) EvalPlanQual currently maintains an array of HeapTuple in EState->es_epqTuple. I think it works to replace that with an array of StorageTuples; EvalPlanQualFetch needs to call the StorageAmRoutine methods in order to interact with it. Other than those changes, it seems okay.

note c) nodeSubplan has curTuple as a HeapTuple. It seems simple to replace this with an independent slot-based tuple.

note d) grp_firstTuple in nodeAgg / nodeSetOp. These are less simple than the above, but replacing the HeapTuple with a slot-based tuple seems doable too.

note e) nodeLockRows uses lr_curtuples to feed EvalPlanQual. TupleTableSlot also seems a good replacement. This has fallout in other users of EvalPlanQual, too.

note f) More widespread: MinimalTuples currently use a tweaked HeapTuple format.
In the long run, it may be possible to replace them with a separate storage module that's specifically designed to handle tuples meant for tuplestores etc. That may simplify TupleTableSlot and execTuples. For the moment we keep the tts_mintuple as it is. Whenever a tuple is not already in heap format, we heapify it in order to put it in the store.

The current heapam.c routines need some changes. Currently, practice is that heap_insert, heap_multi_insert, heap_fetch, and heap_update scribble on their input tuples to set the resulting ItemPointer in tuple->t_self. This is messy if we want StorageTuples to be abstract. I'm changing this so that the resulting ItemPointer is returned in a separate output argument; the tuple itself is left alone. This is somewhat messy in the case of heap_multi_insert because it returns several items; I think it's acceptable to return an array of ItemPointers in the same order as the input tuples. This works fine for the only caller, which is COPY in batch mode. For the other routines, they don't really care where the TID is returned, AFAICS.

Additional noteworthy items:

i) Speculative insertion: the speculative insertion token is no longer installed directly in the heap tuple by the executor (of course). Instead, the token becomes part of the slot. When the tuple_insert method is called, the insertion routine is in charge of setting the token from the slot into the storage tuple. Executor is
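[As a rough illustration of the StorageAmRoutine node described at the start of this proposal, the struct might look something like this. The member names and signatures are guesses from the prose above, since no patch exists yet; this is not an actual API.]

#include "postgres.h"
#include "access/sdir.h"
#include "access/skey.h"
#include "executor/tuptable.h"
#include "storage/itemptr.h"
#include "utils/rel.h"
#include "utils/snapshot.h"

typedef void *StorageTuple;		/* opaque; a HeapTuple for heapam */
typedef void *StorageScanDesc; /* opaque per-AM scan state */

typedef struct StorageAmRoutine
{
	NodeTag		type;

	/* 1) scans */
	StorageScanDesc (*scan_begin) (Relation rel, Snapshot snap,
								   int nkeys, ScanKey keys);
	StorageTuple (*scan_getnext) (StorageScanDesc scan, ScanDirection dir);
	void		(*scan_end) (StorageScanDesc scan);

	/* 2) tuple manipulation; TIDs come back via output arguments, per
	 * the note above about not scribbling on input tuples */
	void		(*tuple_insert) (Relation rel, StorageTuple tup,
								 CommandId cid, ItemPointer newtid);
	void		(*tuple_update) (Relation rel, ItemPointer otid,
								 StorageTuple tup, CommandId cid,
								 ItemPointer newtid);
	void		(*tuple_delete) (Relation rel, ItemPointer tid,
								 CommandId cid);
	void		(*tuple_set_oid) (StorageTuple tup, Oid oid);
	TransactionId (*tuple_get_xmin) (StorageTuple tup);

	/* operations on tuples held in slots */
	void		(*slot_deform) (TupleTableSlot *slot, int natts);
	void		(*slot_materialize) (TupleTableSlot *slot);
} StorageAmRoutine;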
[HACKERS] pg_dump with tables created in schemas created by extensions
Hi,

About a month or two ago I reported a pg_dump bug regarding tables (and other objects) created inside a schema from an extension. Objects created by the extension are not dumped, as they will be created once again by the CREATE EXTENSION call, but other objects which might live inside an object created by the extension should be dumped so they get created inside the same schema. The problem showed up when dumping a DB with PgQ installed as an extension.

Check here: https://www.postgresql.org/message-id/d86dd685-1870-cfa0-e5e4-def1f918bec9%402ndquadrant.com
and here: https://www.postgresql.org/message-id/409fe594-f4cc-89f5-c0d2-0a921987a864%402ndquadrant.com

Some discussion came up on the bugs list on how to fix the issue, and the fact that new tests were needed. I'm attaching a patch to provide such a test, which, if applied now, returns failures on a number of runs, all expected due to the bug we have at hand. I believe the fix will be simple after the back-and-forth mails with Michael, Stephen and Tom. I will work on that later, but preferred to first have the tests that show the problem, which will also make testing the fix easier.

Thoughts?

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
new file mode 100644
index fb4f573..6086317
*** a/src/test/modules/test_pg_dump/t/001_base.pl
--- b/src/test/modules/test_pg_dump/t/001_base.pl
*** my %tests = (
*** 283,288
--- 283,313
  			schema_only => 1,
  			section_pre_data => 1,
  			section_post_data => 1, }, },
+ 	'CREATE TABLE regress_test_schema_table' => {
+ 		create_order => 3,
+ 		create_sql => 'CREATE TABLE regress_pg_dump_schema.test_schema_table (
+ 						   col1 serial primary key,
+ 						   CHECK (col1 <= 1000)
+ 					   );',
+ 		regexp => qr/^
+ 			\QCREATE TABLE test_schema_table (\E
+ 			\n\s+\Qcol1 integer NOT NULL,\E
+ 			\n\s+\QCONSTRAINT test_table_col1_check CHECK \E
+ 			\Q((col1 <= 1000))\E
+ 			\n\);/xm,
+ 		like => {
+ 			binary_upgrade => 1,
+ 			clean => 1,
+ 			clean_if_exists => 1,
+ 			createdb => 1,
+ 			defaults => 1,
+ 			no_privs => 1,
+ 			no_owner => 1,
+ 			schema_only => 1,
+ 			section_pre_data => 1, },
+ 		unlike => {
+ 			pg_dumpall_globals => 1,
+ 			section_post_data => 1, }, },
  	'CREATE ACCESS METHOD regress_test_am' => {
  		regexp => qr/^
  			\QCREATE ACCESS METHOD regress_test_am TYPE INDEX HANDLER bthandler;\E
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
new file mode 100644
index c2fe90d..3f88e6c
*** a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
--- b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
*** CREATE TABLE regress_pg_dump_table (
*** 10,15
--- 10,24
  
  CREATE SEQUENCE regress_pg_dump_seq;
  
+ -- We want to test that schemas and objects created in the schema by the
+ -- extension are not dumped, yet other objects created afterwards will be
+ -- dumped.
+ CREATE SCHEMA regress_pg_dump_schema
+ 	CREATE TABLE regress_pg_dump_schema_table (
+ 		col1 serial,
+ 		col2 int
+ 	);
+ 
  GRANT USAGE ON regress_pg_dump_seq TO regress_dump_test_role;
  
  GRANT SELECT ON regress_pg_dump_table TO regress_dump_test_role;
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane wrote:
> amul sul writes:
>> Calling dsm_create on Linux using the POSIX DSM implementation can succeed, but result in SIGBUS when we later try to access the memory. This happens because my system does not have enough shm space and the current allocation in dsm_impl_posix does not allocate disk blocks[1]. I wonder whether we can use the fallocate system call (i.e. zero-fill the file) to ensure that all the file space has really been allocated, so that we don't later seg fault when accessing the memory mapping. But here we would end up calling 'write' squillions of times in a loop.
>
> Wouldn't that just result in a segfault during dsm_create?
>
> I think probably what you are describing here is kernel misbehavior akin to memory overcommit. Maybe it *is* memory overcommit and can be turned off the same way. If not, you have material for a kernel bug fix/enhancement request.

I think this may be different from overcommit. In dsm_impl_posix we do shm_open, then ftruncate. That creates a file with a hole. Based on an LKML discussion where someone tried to address this with a patch that was rejected[1], I believe that Linux implements POSIX shmem as a tmpfs file, and in this case the file has a hole, which is not the same phenomenon as unallocated virtual memory pages resulting from overcommit policy.

In dsm_impl_mmap it looks like we have code to deal with the same problem: we do open, then ftruncate, and then we explicitly write a bunch of zeros to the file, with this comment:

    /*
     * Zero-fill the file. We have to do this the hard way to ensure that
     * all the file space has really been allocated, so that we don't
     * later seg fault when accessing the memory mapping. This is pretty
     * pessimal.
     */

Maybe we didn't do that for dsm_impl_posix because you can't write to an fd created with shm_open like that; I don't know. But it looks like if we used fallocate or posix_fallocate in the dsm_impl_posix case we'd get a nice ENOSPC error, instead of success-but-later-SIGBUS-on-access. Whether there is *also* the possibility of overcommit biting you later I don't know, but I suspect that's an independent problem. The OOM killer kills you with SIGKILL, not SIGBUS.

[1] https://lkml.org/lkml/2013/7/31/64

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes:
> "Tom" == Tom Lane writes:
> Tom> +1 mostly, but I'm a bit bemused by can_order and can_backward
> Tom> having different scopes --- how come?

> That's where they were in the previous list, a couple of messages up in the thread...
> Currently can_backward is always the same for all indexes in the same AM, but I guess there's no guarantee that won't change. In the previous message you suggested it might have to be per-column, even, but I think that makes no sense (either the whole index is scannable in both directions or it is not, it can't be different per column).

OK, fuzzy thinking on my part --- I was conflating amcanbackward with whether you can get a descending-order scan. I can imagine an index type in which some opclasses can produce btree-ordered output and some can't, in which case "orderable" has to be per-index and maybe even per-column. (If, say, the leading column is orderable and the next is not, then you could expect to produce output that is sorted by the first column but not by the first two.) However, the ability to back up in a scan is entirely independent of that; see the hash AM, which sets amcanbackward even though its output has no definable order.

Right offhand it's pretty hard to see how amcanbackward wouldn't be dependent on AM only. However, we earlier posited that the only AM-level properties should be those needed to figure out what you can say in CREATE INDEX, so if we believe that then can_backward is properly positioned.

> Tom> In particular, I'm not sure what you intend to mean by applying
> Tom> can_unique at the index or column level. Is that supposed to mean
> Tom> that the index or column *is* unique?

> No. (We could add properties like is_unique, is_exclusion which clients currently query in pg_index; should we?)

No, I don't feel a need to replace that. But we need to be clear in the documentation about what this property actually means. My objection to having it answer at the index or column level is basically that that encourages confusion as to what it means.

			regards, tom lane
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
> "Tom" == Tom Lanewrites: >> This table shows what properties are exposed at the AM-wide level, >> the per-index level and the per-column level. Tom> +1 mostly, but I'm a bit bemused by can_order and can_backward Tom> having different scopes --- how come? That's where they were in the previous list, a couple of messages up in the thread... Currently can_backward is always the same for all indexes in the same AM, but I guess there's no guarantee that won't change. In the previous message you suggested it might have to be per-column, even, but I think that makes no sense (either the whole index is scannable in both directions or it is not, it can't be different per column). Tom> Also, not sure about allowing things like can_multi_col as index Tom> or column properties. That seems a bit silly: whether or not the Tom> AM can do multi columns, a specific index is what it is. I'd be a Tom> bit inclined to have those return null except at the AM level. I thought it would be cleaner to be able to query all properties at the most specific level. Tom> In particular, I'm not sure what you intend to mean by applying Tom> can_unique at the index or column level. Is that supposed to mean Tom> that the index or column *is* unique? No. (We could add properties like is_unique, is_exclusion which clients currently query in pg_index; should we?) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
> "Tom" == Tom Lanewrites: >> distance_orderable now returns true/false depending on the opclass, >> not just on the amcanorderbyop field. In order to do this, I've >> added an optional amproperty function to the AM api, which if it >> exists, gets first dibs on all property calls so it can override the >> result as it sees fit. Tom> Hmm, seems like for that case, it'd be easier to look into pg_amop Tom> and see if the opclass has any suitably-marked operators. I thought about that, but it seemed like it could get painful. The planner is working forwards from a known operator and matching it against the index column, whereas we'd have to work backwards from the opfamily, and there's no good existing index for this; in the presence of binary-compatible types, I don't think even amoplefttype can be assumed (e.g. a varchar column can be ordered by pg_trgm's <-> operator which is declared for text). So it'd have to be the equivalent of: get index column's opclass oid look it up in pg_opclass to get opfamily for r in select * from pg_amop where amopfamily=? and amoppurpose='o' if r.amoplefttype is binary-coercible from the index column's type then return true As opposed to what I have now, which is: get index column's opclass oid look it up in pg_opclass to get opfamily/opcintype result = SearchSysCacheExists4(AMPROCNUM, ...) (in theory this could produce a false positive if there's a distance function but no actual operators to reference it, but I think that's the opclass author's issue) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
Robert Haas writes:
> On Fri, Aug 12, 2016 at 3:22 PM, Tom Lane wrote:
>> Hm. I think that sounds a lot easier than it actually is. As an example, this would mean that we'd want such a search_path setting to apply during parse analysis of a function's body, but not during planning, because it should not apply during inlining or const-folding of another function. On the other hand, if someone tried to "SET enable_seqscan = off" with this new scope (a highly reasonable thing to do), that certainly should apply during planning.

> Mmm. Maybe this hypothetical new facility should confine itself to search_path specifically.

Well, that would be sad, because examples like enable_seqscan seem about as useful as search_path.

Assuming that the implementation involved something like calling guc.c to tell it to push and later pop a new semantic level (within which outer single-level GUC settings don't apply), it's possible that the problem mentioned above could be solved by having the planner push a semantic level during inline_function and simplify_function. In normal execution you could perhaps do it in fmgr.c's handlers. Not sure whether we'd need to have spi.c know about it in order to make PL functions work nicely.

			regards, tom lane
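[For orientation, guc.c already has a nest-level mechanism used today for per-function SET clauses, so "push a semantic level" might build on it roughly like this. This is a hypothetical sketch only; whether NewGUCNestLevel()/AtEOXact_GUC() are actually sufficient for the proposed lexical scoping is exactly the open question in this thread.]

#include "postgres.h"
#include "utils/guc.h"

/*
 * Hypothetical sketch: wrap a piece of planner work (e.g. parse analysis
 * of a function body during inlining) in its own GUC nest level, so that
 * settings applied within that scope are popped again afterwards.
 */
static void
with_function_guc_scope(void (*body) (void *arg), void *arg)
{
	int			save_nestlevel = NewGUCNestLevel();

	/* here one would apply the function's lexically-scoped settings */
	body(arg);

	/* pop the level, restoring the values saved when it was pushed */
	AtEOXact_GUC(true, save_nestlevel);
}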
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Fri, Aug 12, 2016 at 3:03 PM, Robert Haas wrote:
> On Thu, Aug 11, 2016 at 8:34 AM, David G. Johnston wrote:
>> I don't have a fundamental issue with saying "when turning auto-commit on you are also requesting that the open transaction, if there is one, is committed immediately." I'm more inclined to think an error is the correct solution - or to respond in a way conditional on the present usage (interactive vs. script). I disagree with Robert's unsubstantiated belief regarding ON_ERROR_STOP and think that it captures the relevant user intent for this behavior as well.
>
> I'll substantiate my belief by referring you to the documentation for ON_ERROR_STOP, which says:
>
> "By default, command processing continues after an error. When this variable is set to on, processing will instead stop immediately. In interactive mode, psql will return to the command prompt; otherwise, psql will exit, returning error code 3 to distinguish this case from fatal error conditions, which are reported using error code 1. In either case, any currently running scripts (the top-level script, if any, and any other scripts which it may have invoked) will be terminated immediately. If the top-level command string contained multiple SQL commands, processing will stop with the current command."
>
> In every existing case, ON_ERROR_STOP affects whether we continue to process further commands after an error has already occurred. Your proposal would involve changing things so that the value of ON_ERROR_STOP affects not only *how errors are handled* but *whether they happen in the first place* -- but only in this one really specific case, and no others.
>
> This isn't really an arguable point, even if you want to try to pretend otherwise. ON_ERROR_STOP should affect whether we stop on error, not whether we generate an error in the first place. The clue is in the name.

Changing AUTOCOMMIT to ON while in a transaction is a psql error - period. If ON_ERROR_STOP is on, we stop. This meets the current semantics for ON_ERROR_STOP. With ON_ERROR_STOP off, psql is going to continue on with the next command. I'd suggest changing things so that psql can, depending upon the error, invoke additional commands to bring the system into a known good state before the next user command is executed. In the case of "\set AUTOCOMMIT on" this additional command would be COMMIT. We can still report the error before continuing on, so this does not affect the "generating [of] an error in the first place".

Allowing command-specific responses to errors when faced with script continuation would be a change. I do not think it would be a bad one. Am I stretching a bit here? Sure. Is it worth stretching to avoid adding more knobs to the system? Maybe. I'll admit I haven't tried to find fault with the idea (or discover better alternatives) nor worked out how it would look in implementation. As a user, though, it would make sense if the system behaved in this way. That only AUTOCOMMIT needs this capability at the moment doesn't bother me.

I'm also fine with making it an error and moving on - but if you want to accommodate both possibilities, this seems like a cleaner solution than yet another environment variable that a user would need to consider.

David J.
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
On Fri, Aug 12, 2016 at 3:22 PM, Tom Lane wrote:
> Robert Haas writes:
>> Let's introduce a new variant of SET that only affects the lexical scope of the function to which it is attached, and then do what you said. That would be full of win, because actually I think in nearly every case that's the behavior people actually want.
>
> Hm. I think that sounds a lot easier than it actually is. As an example, this would mean that we'd want such a search_path setting to apply during parse analysis of a function's body, but not during planning, because it should not apply during inlining or const-folding of another function. On the other hand, if someone tried to "SET enable_seqscan = off" with this new scope (a highly reasonable thing to do), that certainly should apply during planning.

Mmm. Maybe this hypothetical new facility should confine itself to search_path specifically.

> It might be practical to make it work, but it will be ticklish to get the scope of the settings to be non-surprising.

Yeah, it's certainly not the sort of thing I'm going to crank out before breakfast some morning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future
On Wed, Aug 10, 2016 at 4:54 PM, Peter Geoghegan wrote:
> On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
>> My view on this - currently anyway - is that we shouldn't conflate the tuplesort with the subsequent index generation, but that we should try to use parallelism within the tuplesort itself to the greatest extent possible. If there is a single output stream that the leader uses to generate the final index, then none of the above problems arise. They only arise if you've got multiple processes actually writing to the index.
>
> I'm not sure if you're agreeing with my contention about parallel CREATE INDEX not being a good target for partitioning here. Are you?

No. I agree that writing to the index in parallel is bad, but I think it's entirely reasonable to try to set things up so that the leader does as little of the final merge work itself as possible, instead offloading that to workers. Unless, of course, we can prove that the overhead of the final merge pass is so low that it doesn't matter whether we offload it.

> While all this speculation about choice of algorithm is fun, realistically I'm not going to write the patch for a rainy day (nor for parallel CREATE INDEX, at least until we become very comfortable with all the issues I raise, which could never happen). I'd be happy to consider helping you improve parallel query by providing infrastructure like this, but I need someone else to write the client of the infrastructure (e.g. a parallel merge join patch), or to at least agree to meet me half way with an interdependent prototype of their own. It's going to be messy, and we'll have to do a bit of stumbling to get to a good place. I can sign up to that if I'm not the only one that has to stumble.

Fair enough.

> Serial merging still needs work, it seems.

At the risk of stating the obvious, improving serial execution performance is always superior to comparable gains originating from parallelism, so no complaints here about work in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
Robert Haas writes:
> Let's introduce a new variant of SET that only affects the lexical scope of the function to which it is attached, and then do what you said. That would be full of win, because actually I think in nearly every case that's the behavior people actually want.

Hm. I think that sounds a lot easier than it actually is. As an example, this would mean that we'd want such a search_path setting to apply during parse analysis of a function's body, but not during planning, because it should not apply during inlining or const-folding of another function. On the other hand, if someone tried to "SET enable_seqscan = off" with this new scope (a highly reasonable thing to do), that certainly should apply during planning.

It might be practical to make it work, but it will be ticklish to get the scope of the settings to be non-surprising.

			regards, tom lane
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB 3.2)
> "Jeff" == Jeff Janeswrites: Jeff> But shouldn't that still leave us with a 75% full index, rather Jeff> than slightly over 50% full? Average is usually about 67%-70%. (For capacity estimation I always assume 66% for a non-sequentially-filled btree.) Jeff> The leaf pages start at 50%, grow to 100%, then split back to Jeff> 50%, then grow back to 100%. So the average should be about 75%. No, because as the pages split, they fill more slowly (because there are now more pages). So on average in a large randomly filled index, pages spend more time nearer 50% full than 100% full. This is easy to demonstrate by creating a table with an indexed float8 column and adding batches of random() values to it, checking with pgstatindex at intervals - the average leaf density will rarely exceed 70%. However, worst case conditions can give lower leaf densities; obviously the worst case is if the data is loaded in an order and quantity that just happens to leave every leaf page recently split. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON
On Thu, Aug 11, 2016 at 8:34 AM, David G. Johnston wrote:
> I don't have a fundamental issue with saying "when turning auto-commit on you are also requesting that the open transaction, if there is one, is committed immediately." I'm more inclined to think an error is the correct solution - or to respond in a way conditional on the present usage (interactive vs. script). I disagree with Robert's unsubstantiated belief regarding ON_ERROR_STOP and think that it captures the relevant user intent for this behavior as well.

I'll substantiate my belief by referring you to the documentation for ON_ERROR_STOP, which says:

"By default, command processing continues after an error. When this variable is set to on, processing will instead stop immediately. In interactive mode, psql will return to the command prompt; otherwise, psql will exit, returning error code 3 to distinguish this case from fatal error conditions, which are reported using error code 1. In either case, any currently running scripts (the top-level script, if any, and any other scripts which it may have invoked) will be terminated immediately. If the top-level command string contained multiple SQL commands, processing will stop with the current command."

In every existing case, ON_ERROR_STOP affects whether we continue to process further commands after an error has already occurred. Your proposal would involve changing things so that the value of ON_ERROR_STOP affects not only *how errors are handled* but *whether they happen in the first place* -- but only in this one really specific case, and no others.

This isn't really an arguable point, even if you want to try to pretend otherwise. ON_ERROR_STOP should affect whether we stop on error, not whether we generate an error in the first place. The clue is in the name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
Andrew Gierth writes:
> This table shows what properties are exposed at the AM-wide level, the per-index level and the per-column level.

+1 mostly, but I'm a bit bemused by can_order and can_backward having different scopes --- how come?

Also, not sure about allowing things like can_multi_col as index or column properties. That seems a bit silly: whether or not the AM can do multi columns, a specific index is what it is. I'd be a bit inclined to have those return null except at the AM level. In particular, I'm not sure what you intend to mean by applying can_unique at the index or column level. Is that supposed to mean that the index or column *is* unique? If so, I'd prefer a different name for that, on the same principle that "orderable" is not the same thing as "can_order".

> distance_orderable now returns true/false depending on the opclass, not just on the amcanorderbyop field. In order to do this, I've added an optional amproperty function to the AM api, which if it exists, gets first dibs on all property calls so it can override the result as it sees fit.

Hmm, seems like for that case, it'd be easier to look into pg_amop and see if the opclass has any suitably-marked operators. The AMs that support this would have to duplicate such code anyway, so it seems better to just have one copy. Or we could leave it to the client to do that join, which is the sort of approach that Stephen was advocating anyway, IIUC. Adding an AM override method might be a good idea anyway, but I'm not convinced that it's an appropriate way to address this particular point.

			regards, tom lane
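[For what it's worth, the catalog join Tom describes can be expressed in plain SQL along these lines. This is a sketch that ignores the binary-coercibility wrinkle Andrew raises elsewhere in the thread; 'idx' and the column number are placeholders:]

-- Does the opclass of column 1 of index 'idx' have any ordering
-- ("distance") operators?  amoppurpose = 'o' marks ordering operators.
SELECT EXISTS (
    SELECT 1
      FROM pg_opclass c
      JOIN pg_amop o ON o.amopfamily = c.opcfamily
     WHERE c.oid = (SELECT indclass[0]
                      FROM pg_index
                     WHERE indexrelid = 'idx'::regclass)
       AND o.amoppurpose = 'o'
) AS distance_orderable;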
Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?
On Thu, Mar 10, 2016 at 11:48 AM, Tom Lane wrote:
> Robert Haas writes:
>> Hmm. The meaning of funcs.inline depends on the search_path, not just during dump restoration but all the time. So anything that uses it under a different search_path setting than the normal one will have this kind of problem; not just dump/restore.
>
> Yeah, I see no reason to claim that this is a dump/restore-specific problem.
>
>> I don't have a very good idea what to do about that.
>
> The safe way to write SQL-language functions to be search-path-proof is to schema-qualify the names in them, or to add a "SET search_path" clause to the function definition.
>
> The problem with the latter approach is that it defeats inlining. I thought for a minute that maybe we could teach the planner to do inlining anyway by parsing the function body with the adjusted search_path, but that doesn't really preserve the same semantics (a SET would change the environment for called functions too).

Let's introduce a new variant of SET that only affects the lexical scope of the function to which it is attached, and then do what you said. That would be full of win, because actually I think in nearly every case that's the behavior people actually want. There's a reason why (for example) Perl started out with dynamic scoping (local) and then eventually introduced lexical scoping (my): it's because lexical scoping makes writing correct programs easier to a greater degree than dynamic scoping. That's basically the same problem we're hitting here - and not only here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
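[To illustrate the two existing options Tom mentions; schema and table names here are invented for the example:]

-- Option 1: schema-qualify everything; search-path-proof and still
-- eligible for inlining.
CREATE FUNCTION app.active_users()
RETURNS SETOF app.users
LANGUAGE sql STABLE
AS $$ SELECT * FROM app.users WHERE active $$;

-- Option 2: pin search_path on the function; also search-path-proof,
-- but the SET clause means the planner will not inline the body into
-- calling queries.
CREATE FUNCTION app.active_users2()
RETURNS SETOF app.users
LANGUAGE sql STABLE
SET search_path = app, pg_temp
AS $$ SELECT * FROM users WHERE active $$;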
Re: [HACKERS] Add hint for function named "is"
I wrote:
> Robert Haas writes:
>> I think I experimented with this a while ago and found that even after removing postfix operators there was at least one other grammar problem that prevented us from accepting ColLabel there. I gave up and didn't dig further, but maybe we should.

> Yes, it would be good to find that out.

I poked at that a little bit, by dint of changing

-			| a_expr IDENT
+			| a_expr ColLabel

in the target_el production and then seeing what Bison complained about. The majority of the problems boil down to variants of this:

state 997

 1691 character: CHARACTER . opt_varying

    VARYING  shift, and go to state 1597

    VARYING    [reduce using rule 1698 (opt_varying)]
    $default   reduce using rule 1698 (opt_varying)

    opt_varying  go to state 1600

What this is telling us is that given input like, say,

	SELECT 'foo'::character varying

Bison is no longer sure whether "varying" is meant as a type name modifier or a ColLabel. And indeed there is *no* principled answer to that that doesn't involve giving up the ability for "varying" to be a ColLabel. Just promoting it to a fully reserved word (which it is not today) wouldn't be enough, because right now even fully reserved words can be ColLabels.

Another problem is here:

state 1846

 1762 a_expr: a_expr ISNULL .
 2418 type_func_name_keyword: ISNULL .

    $end  reduce using rule 1762 (a_expr)
    $end  [reduce using rule 2418 (type_func_name_keyword)]

pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL is nonstandard, maybe dropping support for it would be feasible.

There are some other problems that look probably fixable with refactoring, but AFAICS the ones above are fundamental. So we basically can't have "you can use anything at all as a ColLabel without AS". We could probably somewhat reduce the set of words you're not allowed to use that way, but we could not even promise that all words that are currently unreserved would work.

It's likely that by rejiggering the precedence of productions, we could resolve these ambiguities in favor of "it's not a ColLabel if it appears in a context like this". And probably that wouldn't be too surprising most of the time. But depending on precedence to resolve fundamentally ambiguous grammar is always gonna bite you sometimes. People understand it (... usually ...) in the context of parsing multi-operator expressions, but for other things it's not a great tool.

			regards, tom lane
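[Spelled out in SQL terms, the two conflicts above come down to pairs of statements that would become indistinguishable without AS; an illustration only:]

-- Ambiguity 1: is "varying" a type-name modifier or a column label?
SELECT 'foo'::character varying;      -- one column of type varchar...
SELECT 'foo'::character AS varying;   -- ...or a character column labeled "varying"

-- Ambiguity 2: is ISNULL the null-test operator or a column label?
SELECT 42 ISNULL;                     -- today: a boolean null test (false)...
SELECT 42 AS isnull;                  -- ...or an integer column labeled "isnull"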
Re: [HACKERS] No longer possible to query catalogs for index capabilities?
So I'm tidying up and doing docs for the next version of this patch, but here for comment is the current functionality:

select cap,
       pg_indexam_has_property(a.oid, cap) as "AM",
       pg_index_has_property('onek_hundred'::regclass, cap) as "Index",
       pg_index_column_has_property('onek_hundred'::regclass, 1, cap) as "Column"
  from pg_am a,
       unnest(array['asc', 'desc', 'nulls_first', 'nulls_last',
                    'orderable', 'distance_orderable', 'can_order',
                    'can_unique', 'can_multi_col', 'can_exclude',
                    'can_backward', 'can_cluster', 'index_scan',
                    'bitmap_scan', 'can_return', 'search_array',
                    'search_nulls']) with ordinality as u(cap, ord)
 where a.amname = 'btree'
 order by ord;

        cap         | AM | Index | Column 
--------------------+----+-------+--------
 asc                |    |       | t
 desc               |    |       | f
 nulls_first        |    |       | f
 nulls_last         |    |       | t
 orderable          |    |       | t
 distance_orderable |    |       | f
 can_order          | t  | t     | t
 can_unique         | t  | t     | t
 can_multi_col      | t  | t     | t
 can_exclude        | t  | t     | t
 can_backward       |    | t     | t
 can_cluster        |    | t     | t
 index_scan         |    | t     | t
 bitmap_scan        |    | t     | t
 can_return         |    |       | t
 search_array       |    |       | t
 search_nulls       |    |       | t
(17 rows)

This table shows what properties are exposed at the AM-wide level, the per-index level and the per-column level.

distance_orderable now returns true/false depending on the opclass, not just on the amcanorderbyop field. In order to do this, I've added an optional amproperty function to the AM api, which if it exists, gets first dibs on all property calls so it can override the result as it sees fit.

can_return likewise reflects the result of index_can_return.

-- 
Andrew (irc:RhodiumToad)
Re: [HACKERS] Add hint for function named "is"
Robert Haas writes:
> Half a percent for two productions is not bad, but I think the real win would be in removing ambiguity from the grammar. We get periodic complaints about the fact that things like "SELECT 3 cache" don't work because cache is an unreserved keyword, and postfix operators are one of the reasons why we can't do better.

Agreed, if postfix operators were the only thing standing between us and fixing that, it would be a pretty strong argument for removing them.

> I think I experimented with this a while ago and found that even after removing postfix operators there was at least one other grammar problem that prevented us from accepting ColLabel there. I gave up and didn't dig further, but maybe we should.

Yes, it would be good to find that out. I think there's a whole bunch of intertwined issues there, though; this isn't likely to be an easy change. The comment at gram.y lines 679ff lists several things that are relevant, and might or might not be simplifiable without postfix ops.

			regards, tom lane
Re: [HACKERS] Add hint for function named "is"
On Fri, Aug 12, 2016 at 12:57 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark wrote:
>>> I wonder whether it's really worth keeping postfix operators. They seem to keep causing these kinds of headaches and I wonder how much the grammar tables would be simplified by removing them.
>
>> I've wondered the same thing at other times. The only postfix operator we ship in core is numeric_fac, and frankly it doesn't seem worth it just for that. If we decided that factorial(n) was adequate notation for that case, and that we didn't care about any hypothetical user-defined postfix operators either, I think we could simplify or improve a number of things.
>
> [ quick experiment... ] Simply removing the two postfix-operator productions produces no meaningful savings (~0.5%), which is unsurprising because after all they're just two more productions to Bison. It's possible we could save more by simplifying the existing hacks elsewhere in the grammar that were needed to work around ambiguities induced by postfix operators. But that would take quite a bit of actual work, and I suspect we'd end up with a similar result that the tables don't actually get much smaller. You could argue for this on the grounds of some reduced intellectual complexity in gram.y, and more forcefully on the grounds of removing user surprise in cases like Jim's (especially if you can find some other cases where it matters). But I doubt that we'd get any kind of noticeable run-time or code-size win.

Half a percent for two productions is not bad, but I think the real win would be in removing ambiguity from the grammar. We get periodic complaints about the fact that things like "SELECT 3 cache" don't work because cache is an unreserved keyword, and postfix operators are one of the reasons why we can't do better:

			/*
			 * We support omitting AS only for column labels that aren't
			 * any known keyword. There is an ambiguity against postfix
			 * operators: is "a ! b" an infix expression, or a postfix
			 * expression and a column label? We prefer to resolve this
			 * as an infix expression, which we accomplish by assigning
			 * IDENT a precedence higher than POSTFIXOP.
			 */

I think I experimented with this a while ago and found that even after removing postfix operators there was at least one other grammar problem that prevented us from accepting ColLabel there. I gave up and didn't dig further, but maybe we should.

I sort of like the elegance of supporting user-defined prefix and postfix operators, but in practice this is a not-especially-infrequent problem for people migrating from other databases, a consideration that might be judged to outweigh that elegance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood wrote:
> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.

But shouldn't that still leave us with a 75% full index, rather than
slightly over 50% full? The leaf pages start at 50%, grow to 100%, then
split back to 50%, then grow back to 100%. So the average should be about
75%.

> However your index rebuild gets you from 5 to 3 GB - does that really
> help performance significantly?

It can make a big difference, depending on how much RAM you have.

Cheers,

Jeff
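[For what it's worth, the 75% figure above follows from a simple
steady-state argument; this is a sketch, assuming each leaf page's fill
fraction drifts uniformly between the post-split level and full:]

\[
\bar{d} \;\approx\; \frac{d_{\mathrm{post\text{-}split}} + d_{\mathrm{full}}}{2}
\;=\; \frac{50\% + 100\%}{2} \;=\; 75\%
\]

The classical result for uniformly random insertions is somewhat lower,
about \(\ln 2 \approx 69\%\), but either way it sits well above the 52%
avg_leaf_density reported upthread.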
Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries
Karan Sikka writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient.

What I'd really like to do is get rid of that cache in favor of having a
way to treat a precompiled regexp as a constant. I think this is probably
possible via inventing a "regexp" datatype, which we make the declared RHS
input type for the ~ operator, and give it an implicit cast from text so
that existing queries don't break. The compiled regexp tree structure
contains pointers so it could never go to disk, but now that we have the
"expanded datum" infrastructure you could imagine that the on-disk
representation is the same as text but we support adding a compiled tree
to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c. A cache like
that might be the only way to deal with a query using variable patterns
(e.g., pattern argument coming from a table column). But it seems like
basically the wrong approach for the common case of a constant pattern.

regards, tom lane
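[To make the per-row setup cost concrete: a minimal sketch using the
POSIX <regex.h> API as a stand-in for PostgreSQL's internal Spencer
engine, which this is not. The pattern and row format are made up; the
point is only that regcomp() is expensive, so compiling once per query
beats compiling, in effect, once per row.]

```
#include <regex.h>
#include <stdio.h>

#define NROWS 100000

int main(void)
{
    const char *pattern = "^user[0-9]+$";   /* hypothetical pattern */
    regex_t     re;
    int         matches = 0;
    char        row[32];

    /* Naive per-row approach: compile inside the loop. This is roughly
     * what every row costs when the pattern misses the cache. */
    for (int i = 0; i < NROWS; i++)
    {
        snprintf(row, sizeof(row), "user%d", i);
        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return 1;
        if (regexec(&re, row, 0, NULL, 0) == 0)
            matches++;
        regfree(&re);
    }

    /* Precompiled approach: one regcomp() for the whole scan, which is
     * what treating the regexp as a constant would buy. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 1;
    for (int i = 0; i < NROWS; i++)
    {
        snprintf(row, sizeof(row), "user%d", i);
        if (regexec(&re, row, 0, NULL, 0) == 0)
            matches++;
    }
    regfree(&re);

    printf("matches: %d\n", matches);
    return 0;
}
```

The "regexp datatype" idea above amounts to hoisting the regcomp()
permanently out of the loop, rather than hoping the cache does it.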
Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries
> Having said that, I've had a bee in my bonnet for a long time about
> removing per-row setup cost for repetitive regex matches, and
> whatever infrastructure that needs would work for this too.

What are the per-row setup costs for regex matches? I looked at
`regexp.c` and saw:

```
/*
 * We cache precompiled regular expressions using a "self organizing list"
 * structure, in which recently-used items tend to be near the front.
 * Whenever we use an entry, it's moved up to the front of the list.
 * Over time, an item's average position corresponds to its frequency of use.
 *
```

What proverbial bee did you have in your bonnet about the current regex
implementation? Which functions other than `strpos` and `LIKE` would
benefit from a similar cache, or perhaps a query-scoped cache?

In the meantime I'll look at other TODOs that catch my interest. Feel
free to point me in the direction of one that you think is both desirable
and easy enough for a beginner.

Thanks!
Karan
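[For readers unfamiliar with the term, here is a minimal sketch of the
"self organizing list" (move-to-front) lookup that the comment describes.
The identifiers are illustrative, not regexp.c's actual ones.]

```
#include <string.h>

struct cached_re
{
    char             *pattern;  /* key: the pattern text */
    /* ... the compiled form would live here ... */
    struct cached_re *next;
};

static struct cached_re *re_cache_head = NULL;

static struct cached_re *
re_cache_lookup(const char *pattern)
{
    struct cached_re *cur = re_cache_head,
                     *prev = NULL;

    while (cur != NULL)
    {
        if (strcmp(cur->pattern, pattern) == 0)
        {
            /* hit: splice the entry out and move it to the front, so
             * frequently-used patterns stay near the head */
            if (prev != NULL)
            {
                prev->next = cur->next;
                cur->next = re_cache_head;
                re_cache_head = cur;
            }
            return cur;
        }
        prev = cur;
        cur = cur->next;
    }
    return NULL;    /* miss: caller compiles and inserts at the front */
}
```

The inefficiency Tom alludes to is visible here: every lookup is a linear
strcmp() walk, and a working set larger than the cache evicts constantly.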
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
On Fri, Aug 12, 2016 at 1:55 PM, amul sul wrote:
> No segfault during dsm_create, mmap returns the memory address which is
> inaccessible.
>
> Let me see how I can disable kernel overcommit behaviour, but IMHO, we
> should prevent ourselves from crashing, shouldn't we?

Overcommit can be caught at dsm_create time with an mlock call (note that
MAP_LOCKED isn't the same). Even a transient lock (mlock + munlock) could
prevent sigbus due to overcommit.
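[A minimal sketch of that transient-lock idea, assuming a POSIX shared
memory fd already sized with ftruncate(). mlock() forces the kernel to
back the pages immediately, so an overcommitted mapping fails cleanly
with ENOMEM here instead of delivering SIGBUS on first touch. Note that
mlock() can also fail for unrelated reasons such as RLIMIT_MEMLOCK, so a
real implementation would need to distinguish those cases.]

```
#include <sys/mman.h>
#include <errno.h>
#include <stddef.h>

static void *
map_and_probe(int fd, size_t size)
{
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);

    if (addr == MAP_FAILED)
        return NULL;

    /* Probe: force the kernel to actually allocate backing pages. */
    if (mlock(addr, size) != 0)
    {
        int save_errno = errno;

        munmap(addr, size);
        errno = save_errno;
        return NULL;            /* report "out of memory" to the caller */
    }

    munlock(addr, size);        /* transient lock only; pages stay mapped */
    return addr;
}
```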
Re: [HACKERS] Add hint for function named "is"
Robert Haas writes:
> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark wrote:
>> I wonder whether it's really worth keeping postfix operators. They
>> seem to keep causing these kinds of headaches and I wonder how much
>> the grammar tables would be simplified by removing them.

> I've wondered the same thing at other times. The only postfix
> operator we ship in core is numeric_fac, and frankly it doesn't seem
> worth it just for that. If we decided that factorial(n) was adequate
> notation for that case, and that we didn't care about any hypothetical
> user-defined postfix operators either, I think we simplify or improve
> a number of things.

[ quick experiment... ] Simply removing the two postfix-operator
productions produces no meaningful savings (~0.5%), which is unsurprising
because after all they're just two more productions to Bison. It's
possible we could save more by simplifying the existing hacks elsewhere in
the grammar that were needed to work around ambiguities induced by postfix
operators. But that would take quite a bit of actual work, and I suspect
we'd end up with a similar result that the tables don't actually get much
smaller. You could argue for this on the grounds of some reduced
intellectual complexity in gram.y, and more forcefully on the grounds of
removing user surprise in cases like Jim's (especially if you can find
some other cases where it matters). But I doubt that we'd get any kind of
noticeable run-time or code-size win.

regards, tom lane
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
No segfault during dsm_create, mmap returns the memory address which is
inaccessible.

Let me see how I can disable kernel overcommit behaviour, but IMHO, we
should prevent ourselves from crashing, shouldn't we?

Regards,
Amul

Sent from a mobile device. Please excuse brevity and tpyos.

On Fri, 12 Aug, 2016 at 7:38 pm, Tom Lane wrote:
> amul sul writes:
>> A dsm_create call on Linux using the POSIX DSM implementation can
>> succeed, but result in SIGBUS when we later try to access the memory.
>> This happens because my system does not have enough shm space and the
>> current allocation in dsm_impl_posix does not allocate disk blocks[1].
>> I wonder whether we can use the fallocate system call (i.e. zero-fill
>> the file) to ensure that all the file space has really been allocated,
>> so that we don't later seg fault when accessing the memory mapping.
>> But that way we would end up calling 'write' in a loop squillions of
>> times.
>
> Wouldn't that just result in a segfault during dsm_create? I think
> probably what you are describing here is kernel misbehavior akin to
> memory overcommit. Maybe it *is* memory overcommit and can be turned
> off the same way. If not, you have material for a kernel bug
> fix/enhancement request.
>
> regards, tom lane
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
You're right. Reindex improves the performance of the benchmark workloads
dramatically. I'm gathering results and will announce them.

But I think we should note that the results before reindexing are poorer
than MongoDB's. This seems to be because of btree bloat (not an exact
expression). Lookup performance of the btree is most crucial for these
results, because the workload is selects on the primary key. So a larger
btree could mean fewer cache hits and slower query performance.

I think PG doesn't pack leaf nodes to 90% but only to about half, to leave
room for future insertions, and because of this PG's btree is larger than
MongoDB's (I turned off prefix compression of the WiredTiger index and
block compression for storage). But MongoDB (actually WiredTiger) seems to
do this differently. Is my speculation right? I'm not sure, because I
haven't reviewed the btree code of PG yet.

And I want to point out that the loading performance of MongoDB is better
than PG's. If PG leaves more space for future insertions, then could we at
least get faster loading performance? Can we then conclude that we have
more chances to improve PG's btree?

Best Regards,

On Fri, Aug 12, 2016 at 5:40 PM, Mark Kirkwood
<mark.kirkw...@catalyst.net.nz> wrote:
> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.
>
> However your index rebuild gets you from 5 to 3 GB - does that really
> help performance significantly?
>
> regards
>
> Mark
>
> On 11/08/16 16:08, Kisung Kim wrote:
>> Thank you for your information. Here is the result:
>>
>> After insertions:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
>> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>>        2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
>> (1 row)
>>
>> After rebuild:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
>> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>>        2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
>> (1 row)
>>
>> It seems like the rebuild reduces the number of internal and leaf pages
>> and makes the leaf pages more dense.

--
Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192
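[As a rough consistency check, the pgstatindex numbers quoted above do
support the density explanation:]

\[
\frac{5{,}488{,}721{,}920}{3{,}154{,}296{,}832} \approx 1.74
\qquad\text{and}\qquad
\frac{90.08}{52} \approx 1.73,
\]

so the shrinkage from roughly 5 GB to 3 GB is almost entirely accounted
for by avg_leaf_density rising from 52% to 90%, rather than by anything
structural in the tree.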
Re: [HACKERS] Add hint for function named "is"
On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark wrote:
> On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane wrote:
>> I think what is happening in the trouble case is that since IS has
>> lower precedence than Op, the grammar decides it ought to resolve ||
>> as a postfix operator, and then it effectively has
>>     ('x' ||) IS ...
>> which leaves noplace to go except IS NULL and other IS-something
>> syntaxes.
>
> I wonder whether it's really worth keeping postfix operators. They
> seem to keep causing these kinds of headaches and I wonder how much
> the grammar tables would be simplified by removing them.

I've wondered the same thing at other times. The only postfix operator we
ship in core is numeric_fac, and frankly it doesn't seem worth it just for
that. If we decided that factorial(n) was adequate notation for that
case, and that we didn't care about any hypothetical user-defined postfix
operators either, I think we simplify or improve a number of things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fwd: [BUG] Print timing statistics of trigger execution under turned off timing option of EXPLAIN ANALYZE
maksim writes:
> postgres=# EXPLAIN (ANALYZE, timing false) insert INTO foo values(101, '');
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Insert on foo  (cost=0.00..0.01 rows=1 width=36) (actual rows=0 loops=1)
>    ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual rows=1 loops=1)
>  Planning time: 0.038 ms
>  Trigger unique_foo_c1: time=0.000 calls=1
>  Execution time: 340.766 ms
> (5 rows)

Hmm, yeah, it shouldn't do that.

> My patch fixes this problem.

Will push, thanks.

regards, tom lane
Re: [HACKERS] Why is box <-> point missing, and box <-> box not indexable?
Andrew Gierth writes:
> point <-> point, circle <-> point and polygon <-> point all exist as
> orderable-by-operator operators (in fact they are the only ones by
> default). But there's no box <-> point operator at all, and no index
> support for box <-> box.

> Was this intentional, or just a strange oversight?

Seems like an oversight. None of these operators have commutators, but it
seems like they all should:

    <->(point,line)
    <->(point,lseg)
    <->(point,box)
    <->(lseg,line)
    <->(lseg,box)
    <->(point,path)
    <->(circle,polygon)
    <->(line,box)

Dunno about the index-support angle.

regards, tom lane
[HACKERS] Why is box <-> point missing, and box <-> box not indexable?
point <-> point, circle <-> point and polygon <-> point all exist as
orderable-by-operator operators (in fact they are the only ones by
default). But there's no box <-> point operator at all, and no index
support for box <-> box.

Was this intentional, or just a strange oversight?

--
Andrew (irc:RhodiumToad)
Re: [HACKERS] Add hint for function named "is"
Jim Nasby writes:
> Is there a place in the error reporting path where we'd still have
> access to the 'is' token, and have enough control to look for a
> relevant function?

No. The grammar can't assume that it's being run inside a transaction
(consider parsing START TRANSACTION, or ROLLBACK after a failure). So
catalog access is out, full stop.

regards, tom lane
Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
amul sul writes:
> A dsm_create call on Linux using the POSIX DSM implementation can
> succeed, but result in SIGBUS when we later try to access the memory.
> This happens because my system does not have enough shm space and the
> current allocation in dsm_impl_posix does not allocate disk blocks[1].
> I wonder whether we can use the fallocate system call (i.e. zero-fill
> the file) to ensure that all the file space has really been allocated,
> so that we don't later seg fault when accessing the memory mapping.
> But that way we would end up calling 'write' in a loop squillions of
> times.

Wouldn't that just result in a segfault during dsm_create? I think
probably what you are describing here is kernel misbehavior akin to memory
overcommit. Maybe it *is* memory overcommit and can be turned off the
same way. If not, you have material for a kernel bug fix/enhancement
request.

regards, tom lane
Re: [HACKERS] Add hint for function named "is"
On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane wrote:
> I think what is happening in the trouble case is that since IS has
> lower precedence than Op, the grammar decides it ought to resolve ||
> as a postfix operator, and then it effectively has
>     ('x' ||) IS ...
> which leaves noplace to go except IS NULL and other IS-something
> syntaxes.

I wonder whether it's really worth keeping postfix operators. They seem
to keep causing these kinds of headaches and I wonder how much the grammar
tables would be simplified by removing them.

--
greg
Re: [HACKERS] new autovacuum criterion for visible pages
On 12/08/16 15:15, Peter Eisentraut wrote:
> On 8/11/16 11:59 AM, Jeff Janes wrote:
>> Insertions and HOT-updates clear vm bits but don't increment the
>> counters that those existing parameters are compared to.
>>
>> Also, the relationship between number of updated/deleted rows and the
>> number of hint-bits cleared can be hard to predict due to possible
>> clustering of the updates into the same blocks. So it would be hard
>> to know what to set the values to.
>
> Well, the current threshold formulas aren't an exact science either.
> They just trigger autovacuum often enough relative to table size and
> activity. Just fudging in the insert and HOT update counters times some
> factor might be enough, and it would get this functionality out to all
> users without more effort.

I have a patch I wrote a while ago that does this. I haven't submitted it
yet because the documentation is lacking. I will post it over the weekend
(I had planned to do it before the commitfest anyway).

--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
[HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().
Hi All,

A dsm_create call on Linux using the POSIX DSM implementation can succeed,
but result in SIGBUS when we later try to access the memory. This happens
because my system does not have enough shm space and the current
allocation in dsm_impl_posix does not allocate disk blocks[1]. I wonder
whether we can use the fallocate system call (i.e. zero-fill the file) to
ensure that all the file space has really been allocated, so that we don't
later seg fault when accessing the memory mapping. But that way we would
end up calling 'write' in a loop squillions of times.

Thoughts/Suggestions?

Similar post:
[1] http://uk.comp.os.linux.narkive.com/Ve44sO4i/shared-memory-problem-no-space-at-dev-shm-causes-sigbus

Regards,
Amul Sul
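[A sketch of the zero-fill idea without the squillions of write() calls:
posix_fallocate() asks the kernel to reserve backing blocks for the whole
range up front, so a shortage of /dev/shm space surfaces here as an error
code rather than as SIGBUS on first access. Assumptions: a Linux tmpfs
with fallocate support; on kernels without it, glibc emulates by writing
into the file. Error handling abbreviated, names illustrative.]

```
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int
create_segment(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return -1;

    /* Reserve real blocks up front instead of leaving a sparse file;
     * posix_fallocate returns an error number, not -1/errno. */
    if (posix_fallocate(fd, 0, (off_t) size) != 0)
    {
        shm_unlink(name);
        close(fd);
        return -1;              /* out of shm space, caught early */
    }

    return fd;                  /* safe to ftruncate/mmap from here */
}
```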
Re: [HACKERS] new autovacuum criterion for visible pages
On 8/11/16 11:59 AM, Jeff Janes wrote:
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
>
> Also, the relationship between number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks. So it would be hard
> to know what to set the values to.

Well, the current threshold formulas aren't an exact science either. They
just trigger autovacuum often enough relative to table size and activity.
Just fudging in the insert and HOT update counters times some factor might
be enough, and it would get this functionality out to all users without
more effort.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
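[A hedged sketch of what "fudging in the counters times some factor"
might look like. The shape mirrors the existing autovacuum threshold
test, but the insert/HOT-update counters and the vm_clear_factor knob are
hypothetical, not actual PostgreSQL fields or GUCs.]

```
#include <stdbool.h>

typedef struct TableActivity
{
    float reltuples;        /* table size estimate */
    float n_dead_tuples;    /* from the existing stats counters */
    float n_inserted;       /* hypothetical: inserts since last vacuum */
    float n_hot_updated;    /* hypothetical: HOT updates since last vacuum */
} TableActivity;

static bool
needs_vacuum(const TableActivity *t,
             float base_thresh,      /* autovacuum_vacuum_threshold */
             float scale_factor,     /* autovacuum_vacuum_scale_factor */
             float vm_clear_factor)  /* hypothetical discount, e.g. 0.2 */
{
    float vacthresh = base_thresh + scale_factor * t->reltuples;

    /* Count activity that clears visibility-map bits without creating
     * dead tuples, discounted so it doesn't dominate the trigger. */
    float effective = t->n_dead_tuples +
        vm_clear_factor * (t->n_inserted + t->n_hot_updated);

    return effective > vacthresh;
}
```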
Re: [HACKERS] Relocation of tuple between release and re-acquire of tuple lock
On Fri, Aug 12, 2016 at 3:15 PM, Alexander Korotkov
<a.korot...@postgrespro.ru> wrote:
> I'm now exploring code working with heap tuples. The following code in
> heap_update() caught my eye.
>
>     if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
>                                 *lockmode))
>     {
>         LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
>         /* acquire tuple lock, if necessary */
>         heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
>                              LockWaitBlock, &have_tuple_lock);
>
>         /* wait for multixact */
>         MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
>                         relation, &oldtup.t_self, XLTW_Update,
>                         &remain);
>         checked_lockers = true;
>         locker_remains = remain != 0;
>         LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>
>         /*
>          * If xwait had just locked the tuple then some other xact
>          * could update this tuple before we get to this point. Check
>          * for xmax change, and start over if so.
>          */
>         if (xmax_infomask_changed(oldtup.t_data->t_infomask,
>                                   infomask) ||
>             !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
>                                  xwait))
>             goto l2;
>     }
>
> Is it safe to rely on the same oldtup.t_data pointer after release and
> re-acquire of the buffer content lock? Could the heap tuple be
> relocated between lock release and re-acquire? I know that we still
> hold a buffer pin and vacuum would wait for pin release. But other heap
> operations could still call heap_page_prune() which correspondingly can
> relocate tuples. Probably, I'm missing something...

Please, forget it. heap_page_prune_opt() does:

>     /* OK, try to get exclusive buffer lock */
>     if (!ConditionalLockBufferForCleanup(buffer))
>         return;

Nobody repairs buffer fragmentation while there is a pin. Everything is
right.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
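[The heap_page_prune_opt() behavior quoted above is an instance of a
general pattern: opportunistic maintenance takes the lock only if nobody
minds, and simply skips the work otherwise, so holders of the resource
never see it rearranged underneath them. A generic pthreads illustration,
not PostgreSQL's buffer-manager API, which additionally checks the
buffer's pin count:]

```
#include <pthread.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

static void
maybe_prune_page(void)
{
    /* Opportunistic: if anyone else is using the page, skip the
     * maintenance entirely and try again some other time. */
    if (pthread_mutex_trylock(&page_lock) != 0)
        return;

    /* ... defragment / prune while we exclusively hold the page ... */

    pthread_mutex_unlock(&page_lock);
}
```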
[HACKERS] Relocation of tuple between release and re-acquire of tuple lock
Hackers,

I'm now exploring code working with heap tuples. The following code in
heap_update() caught my eye.

    if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
                                *lockmode))
    {
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

        /* acquire tuple lock, if necessary */
        heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
                             LockWaitBlock, &have_tuple_lock);

        /* wait for multixact */
        MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
                        relation, &oldtup.t_self, XLTW_Update,
                        &remain);
        checked_lockers = true;
        locker_remains = remain != 0;
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

        /*
         * If xwait had just locked the tuple then some other xact
         * could update this tuple before we get to this point. Check
         * for xmax change, and start over if so.
         */
        if (xmax_infomask_changed(oldtup.t_data->t_infomask,
                                  infomask) ||
            !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                 xwait))
            goto l2;
    }

Is it safe to rely on the same oldtup.t_data pointer after release and
re-acquire of the buffer content lock? Could the heap tuple be relocated
between lock release and re-acquire? I know that we still hold a buffer
pin and vacuum would wait for pin release. But other heap operations
could still call heap_page_prune() which correspondingly can relocate
tuples. Probably, I'm missing something...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
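[The block quoted above is an instance of a classic drop-lock, wait,
relock, recheck pattern. A generic pthreads sketch of the same shape,
purely illustrative; the struct, fields, and the polling wait are
hypothetical, not the heapam code:]

```
#include <pthread.h>
#include <unistd.h>

struct guarded
{
    pthread_mutex_t lock;
    int             version;    /* bumped by every committed change */
    int             busy;       /* nonzero while another "xact" holds it */
};

static void
wait_for_other(struct guarded *g)
{
    /* stand-in for MultiXactIdWait(); the unlocked read is racy but
     * harmless for a sketch */
    while (g->busy)
        usleep(1000);
}

static void
update_with_recheck(struct guarded *g)
{
retry:
    pthread_mutex_lock(&g->lock);

    if (g->busy)
    {
        int seen = g->version;

        pthread_mutex_unlock(&g->lock);     /* never sleep holding the lock */
        wait_for_other(g);
        pthread_mutex_lock(&g->lock);

        /* someone may have changed the state while we slept */
        if (g->version != seen)
        {
            pthread_mutex_unlock(&g->lock);
            goto retry;                     /* the "goto l2" equivalent */
        }
    }

    /* safe: we hold the lock and the state is as we last saw it */
    g->version++;
    pthread_mutex_unlock(&g->lock);
}
```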
[HACKERS] Fwd: [BUG] Print timing statistics of trigger execution under turned off timing option of EXPLAIN ANALYZE
Hello, hackers!

At the moment, EXPLAIN ANALYZE with the timing option turned off still
prints the total execution time of the trigger function after executing a
query that fires a trigger:

postgres=# EXPLAIN (ANALYZE, timing false) insert INTO foo values(101, '');
                                QUERY PLAN
--------------------------------------------------------------------------
 Insert on foo  (cost=0.00..0.01 rows=1 width=36) (actual rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual rows=1 loops=1)
 Planning time: 0.038 ms
 Trigger unique_foo_c1: time=0.000 calls=1
 Execution time: 340.766 ms
(5 rows)

My patch fixes this problem.

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index dbd27e5..808e23e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -698,8 +698,11 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 			appendStringInfo(es->str, " for constraint %s", conname);
 		if (show_relname)
 			appendStringInfo(es->str, " on %s", relname);
-		appendStringInfo(es->str, ": time=%.3f calls=%.0f\n",
-						 1000.0 * instr->total, instr->ntuples);
+		if (es->timing)
+			appendStringInfo(es->str, ": time=%.3f calls=%.0f\n",
+							 1000.0 * instr->total, instr->ntuples);
+		else
+			appendStringInfo(es->str, ": calls=%.0f\n", instr->ntuples);
 	}
 	else
 	{
@@ -707,7 +710,8 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 		if (conname)
 			ExplainPropertyText("Constraint Name", conname, es);
 		ExplainPropertyText("Relation", relname, es);
-		ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
+		if (es->timing)
+			ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
 		ExplainPropertyFloat("Calls", instr->ntuples, 0, es);
 	}
Re: [HACKERS] [Patch] New psql prompt substitution %r (m = master, r = replica)
Thank you everyone for your replies!

I did some research and apparently there is no need for any patch. As
David pointed out, what I did could easily be done using \gset:

```
$ cat ~/.psqlrc
select (case when pg_is_in_recovery() then 'r' else 'm' end) as mor \gset
\set PROMPT1 '%p (%:mor:) =# '
```

Besides, I figured out that the replica promotion case can also be
properly handled without any patches. In case anyone is interested, here
is a brief description of a solution.

~/.bash_profile:

```
export PATH="/home/eax/bin:$PATH"
alias psql='psql_wrapper'
```

~/bin/psql_wrapper:

```
#!/usr/bin/env python3

import subprocess
import sys

arg_string = ""
idx = 1
maxidx = len(sys.argv) - 1
while idx <= maxidx:
    arg_string += "'" + sys.argv[idx] + "' "
    idx += 1

cmd = """USER_ARGS=$'{}' psql {}""".format(
    arg_string.replace("'", "\\'"), arg_string)

subprocess.call(cmd, shell = True)
```

~/.psqlrc:

```
\set PROMPT1 '%p <%`sh -c "psql $USER_ARGS -A -t -c $\'select case when pg_is_in_recovery() then \\\'replica\\\' else \\\'master\\\' end\'"`> =# '
```

--
Best regards,
Aleksander Alekseev
Re: [HACKERS] [PATCH] COPY vs \copy HINT
On 12 August 2016 at 16:34, Christoph Berg wrote:
>> postgres=# COPY x TO '/root/nopermissions';
>> ERROR: could not open file "/root/nopermissions" for writing: Permission denied
>> HINT: Paths for COPY are on the PostgreSQL server, not the client. You may
>> want psql's \copy or a driver COPY ... FROM STDIN wrapper
>
> TO STDOUT.

Ahem, whoops. It's the simple things sometimes... Attached amended.

> Also, I vaguely get what you wanted to say with "a driver ... wrapper",
> but it's pretty nonsensical if one doesn't know about the protocol
> details. I don't have a better suggestion now, but I think it needs
> rephrasing.

I don't like it either, but didn't come up with anything better. The
problem is that every driver calls it something different.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From e713a8a7c5373a137d417d0001b11e296b841f5a Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 12 Aug 2016 15:42:12 +0800
Subject: [PATCH] Emit a HINT when COPY can't find a file

Users often get confused between COPY and \copy and try to use
client-side paths with COPY. The server of course cannot find the file.
Emit a HINT in the most common cases to help users out.
---
 src/backend/commands/copy.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f45b330..c902e8b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1761,7 +1761,9 @@ BeginCopyTo(Relation rel,
 		if (!is_absolute_path(filename))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_NAME),
-					 errmsg("relative path not allowed for COPY to file")));
+					 errmsg("relative path not allowed for COPY to file"),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... TO STDOUT wrapper")));
 
 		oumask = umask(S_IWGRP | S_IWOTH);
 		cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
@@ -1770,7 +1772,9 @@ BeginCopyTo(Relation rel,
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not open file \"%s\" for writing: %m",
-							cstate->filename)));
+							cstate->filename),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... TO STDOUT wrapper")));
 
 		if (fstat(fileno(cstate->copy_file), &st))
 			ereport(ERROR,
@@ -2796,7 +2800,9 @@ BeginCopyFrom(Relation rel,
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not open file \"%s\" for reading: %m",
-							cstate->filename)));
+							cstate->filename),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 		if (fstat(fileno(cstate->copy_file), &st))
 			ereport(ERROR,
--
2.5.5
Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
After examining the benchmark design - I see we are probably not being
helped by the repeated insertion of keys all of form 'userxxx' leading to
some page splitting.

However your index rebuild gets you from 5 to 3 GB - does that really help
performance significantly?

regards

Mark

On 11/08/16 16:08, Kisung Kim wrote:
> Thank you for your information. Here is the result:
>
> After insertions:
>
> ycsb=# select * from pgstatindex('usertable_pkey');
>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>        2 |          3 | 5488721920 |         44337 |           4464 |     665545 |           0 |             0 |               52 |                 11
> (1 row)
>
> After rebuild:
>
> ycsb=# select * from pgstatindex('usertable_pkey');
>  version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
> ---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
>        2 |          3 | 3154296832 |         41827 |           1899 |     383146 |           0 |             0 |            90.08 |                  0
> (1 row)
>
> It seems like the rebuild reduces the number of internal and leaf pages
> and makes the leaf pages more dense.
Re: [HACKERS] [PATCH] COPY vs \copy HINT
Re: Craig Ringer 2016-08-12

> postgres=# COPY x TO '/root/nopermissions';
> ERROR: could not open file "/root/nopermissions" for writing: Permission denied
> HINT: Paths for COPY are on the PostgreSQL server, not the client. You may
> want psql's \copy or a driver COPY ... FROM STDIN wrapper

+1 on the idea.

> postgres=# COPY x TO '/root/nopermissions';
> ERROR: could not open file "/root/nopermissions" for writing: Permission denied
> HINT: Paths for COPY are on the PostgreSQL server, not the client. You may
> want psql's \copy or a driver COPY ... FROM STDIN wrapper

TO STDOUT.

Also, I vaguely get what you wanted to say with "a driver ... wrapper",
but it's pretty nonsensical if one doesn't know about the protocol
details. I don't have a better suggestion now, but I think it needs
rephrasing.

Christoph
Re: [HACKERS] new autovacuum criterion for visible pages
On Fri, Aug 12, 2016 at 9:01 AM, Tom Lane wrote:
> Michael Paquier writes:
>> In short, autovacuum will need to scan by itself the VM of each
>> relation and decide based on that.
>
> That seems like a worthwhile approach to pursue. The VM is supposed to
> be small, and if you're worried it isn't, you could sample a few pages
> of it. I do not think any of the ideas proposed so far for tracking the
> visibility percentage on-the-fly are very tenable.

One visibility map page can store the information of 32672 heap pages
(255MB), but it would be costly if autovacuum scanned the whole visibility
map for all tables. So I think it's better to provide
autovacuum_vacuum_pagevisible_factor as a relopt, and have autovacuum scan
or sample the visibility map only of tables for which
autovacuum_vacuum_pagevisible_factor is set.

Regards,

--
Masahiko Sawada
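[The 32672 figure checks out, assuming the usual 8 kB pages, a 24-byte
page header, and two visibility bits per heap page (all-visible and
all-frozen, as of 9.6):]

\[
(8192 - 24)\ \text{bytes} \times \frac{4\ \text{heap pages}}{\text{byte}}
= 8168 \times 4 = 32672\ \text{heap pages},
\]
\[
32672 \times 8\ \text{kB} \approx 255\ \text{MB of heap per VM page}.
\]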
[HACKERS] [PATCH] COPY vs \copy HINT
Hi all

I see this sort of question quite a bit:

http://stackoverflow.com/q/38903811/398670

where the user wonders why

    COPY gemeenten FROM 'D:\CBS_woningcijfers_2014.csv' DELIMITER ';' CSV

fails with

    ERROR: could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No such file or directory

and as usual, it's because the path is on their local host, not the Pg
server. I think we should emit a HINT here, something like:

    ERROR: could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No such file or directory
    HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper

as a usability improvement. Trivial patch attached.

I'm not sure how to avoid the need to translate the string multiple times,
or if the tooling will automatically flatten them. I haven't bothered with
the stat() failure or isdir cases, since they seem less likely to be the
cause of users being confused between client and server.

Sample output:

postgres=# COPY x FROM '/tmp/somepath';
ERROR: could not open file "/tmp/somepath" for reading: No such file or directory
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper

postgres=# COPY x TO '/root/nopermissions';
ERROR: could not open file "/root/nopermissions" for writing: Permission denied
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper

postgres=# COPY x TO 'relpath';
ERROR: relative path not allowed for COPY to file
HINT: Paths for COPY are on the PostgreSQL server, not the client. You may want psql's \copy or a driver COPY ... FROM STDIN wrapper

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From 8e9df8a8874e3ae7446c0e6ebbf4da75889d6027 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Fri, 12 Aug 2016 15:42:12 +0800
Subject: [PATCH] Emit a HINT when COPY can't find a file

Users often get confused between COPY and \copy and try to use
client-side paths with COPY. The server of course cannot find the file.
Emit a HINT in the most common cases to help users out.
---
 src/backend/commands/copy.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f45b330..69d2047 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1761,7 +1761,9 @@ BeginCopyTo(Relation rel,
 		if (!is_absolute_path(filename))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_NAME),
-					 errmsg("relative path not allowed for COPY to file")));
+					 errmsg("relative path not allowed for COPY to file"),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 		oumask = umask(S_IWGRP | S_IWOTH);
 		cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
@@ -1770,7 +1772,9 @@ BeginCopyTo(Relation rel,
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not open file \"%s\" for writing: %m",
-							cstate->filename)));
+							cstate->filename),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 		if (fstat(fileno(cstate->copy_file), &st))
 			ereport(ERROR,
@@ -2796,7 +2800,9 @@ BeginCopyFrom(Relation rel,
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not open file \"%s\" for reading: %m",
-							cstate->filename)));
+							cstate->filename),
+					 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+							 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 		if (fstat(fileno(cstate->copy_file), &st))
 			ereport(ERROR,
--
2.5.5