[HACKERS] [parallel query] random server crash while running tpc-h query on power2

2016-08-12 Thread Rushabh Lathia
Hi All,

Recently, while running TPC-H queries on the PostgreSQL master branch, I
noticed random server crashes. Most of the time the crash occurs while
running TPC-H query number 3 (but it is quite random).

Call Stack of server crash:

(gdb) bt
#0  0x102aa9ac in ExplainNode (planstate=0x1001a7471d8,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1516
#1  0x102aaeb8 in ExplainNode (planstate=0x1001a746b50,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#2  0x102aaeb8 in ExplainNode (planstate=0x1001a7468b8,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#3  0x102aaeb8 in ExplainNode (planstate=0x1001a745e48,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#4  0x102aaeb8 in ExplainNode (planstate=0x1001a745bb0,
ancestors=0x1001a745348, relationship=0x10913d10 "Outer", plan_name=0x0,
es=0x1001a65ad48)
at explain.c:1599
#5  0x102aaeb8 in ExplainNode (planstate=0x1001a745870,
ancestors=0x1001a745348, relationship=0x0, plan_name=0x0, es=0x1001a65ad48)
at explain.c:1599
#6  0x102a81d8 in ExplainPrintPlan (es=0x1001a65ad48,
queryDesc=0x1001a7442a0) at explain.c:599
#7  0x102a7e20 in ExplainOnePlan (plannedstmt=0x1001a744208,
into=0x0, es=0x1001a65ad48,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0, planduration=0x3fffe17c6488) at explain.c:515
#8  0x102a7968 in ExplainOneQuery (query=0x1001a65ab98, into=0x0,
es=0x1001a65ad48,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0) at explain.c:357
#9  0x102a74b8 in ExplainQuery (stmt=0x1001a6fe468,
queryString=0x1001a6a8c08 "explain (analyze,buffers,verbose)
select\n\tl_orderkey,\n\tsum(l_extendedprice * (1 - l_discount)) as
revenue,\n\to_orderdate,\n\to_shippriority\nfrom\n\tcustomer,\n\torders,\n\tlineitem\nwhere\n\tc_mktsegment
= 'BUILD"..., params=0x0, dest=0x1001a65acb0) at explain.c:245

(gdb) p *w
$2 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p planstate
$3 = (PlanState *) 0x1001a7471d8
(gdb) p *planstate
$4 = {type = T_HashJoinState, plan = 0x1001a740730, state = 0x1001a745758,
instrument = 0x1001a7514a8, worker_instrument = 0x1001af8a8c8,
  targetlist = 0x1001a747448, qual = 0x0, lefttree = 0x1001a74a0e8,
righttree = 0x1001a74f2f8, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0,
  ps_ResultTupleSlot = 0x1001a750c10, ps_ExprContext = 0x1001a7472f0,
ps_ProjInfo = 0x1001a751220, ps_TupFromTlist = 0 '\000'}
(gdb) p *planstate->worker_instrument
$5 = {num_workers = 2139062143, instrument = 0x1001af8a8d0}
(gdb) p n
$6 = 13055
(gdb) p n
$7 = 13055

Here it is clear that worker_instrument is either corrupted or
uninitialized, and that is the reason it ends up with a server crash.
(Note that num_workers = 2139062143 is 0x7F7F7F7F, the pattern that
CLOBBER_FREED_MEMORY builds write into freed memory, which suggests the
structure was allocated in a memory context that has since been reset.)

With a bit more debugging and a look at the git history, I found that the
issue started appearing with commit
af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext() calls
ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
calls ExecParallelFinish(), which collects the instrumentation before
marking the ParallelExecutorInfo as finished.
ExecParallelRetrieveInstrumentation() does the allocation of
planstate->worker_instrument.

Since commit af33039317, gather_readnext() runs in the per-tuple memory
context; so when nreaders reaches 0 and ExecShutdownGatherWorkers() is
called from there, we end up allocating planstate->worker_instrument in
the per-tuple context, which is wrong: that context is reset for every
tuple, while worker_instrument is read later during EXPLAIN output.

Now fix can be:

1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and
let ExecEndGather() do that instead. But with this change,
gather_readnext() and gather_getnext() depend on the planstate->reader
structure to continue reading tuples. So we can either change those
conditions to depend on planstate->nreaders, or just
pfree(planstate->reader) in gather_readnext() instead of calling
ExecShutdownGatherWorkers().

2) Teach ExecParallelRetrieveInstrumentation() to allocate
planstate->worker_instrument in the proper memory context.

Attaching a patch which fixes the issue using approach 1).


Regards.

Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 438d1b2..bc56f0e 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -348,7 +348,8 @@ gather_readnext(GatherState *gatherstate)
 			--gatherstate->nreaders;
 			if 

Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread Thomas Munro
On Sat, Aug 13, 2016 at 8:26 AM, Thomas Munro
 wrote:
> On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane  wrote:
>> amul sul  writes:
>>> A call to dsm_create on Linux using the POSIX DSM implementation can
>>> succeed, but result in SIGBUS when we later try to access the memory.
>>> This happens because my system does not have enough shm space and the
>>> current allocation in dsm_impl_posix does not allocate disk blocks[1].
>>> I wonder whether we can use the fallocate system call (i.e. zero-fill
>>> the file) to ensure that all the file space has really been allocated,
>>> so that we don't later seg fault when accessing the memory mapping.
>>> But then we would end up calling 'write' squillions of times.
>>
>> Wouldn't that just result in a segfault during dsm_create?
>>
>> I think probably what you are describing here is kernel misbehavior
>> akin to memory overcommit.  Maybe it *is* memory overcommit and can
>> be turned off the same way.  If not, you have material for a kernel
>> bug fix/enhancement request.
>
> [...] But it
> looks like if we used fallocate or posix_fallocate in the
> dsm_impl_posix case we'd get a nice ENOSPC error, instead of
> success-but-later-SIGBUS-on-access.

Here's a simple test extension that creates jumbo dsm segments, and
then accesses all pages.  If you ask it to write cheques that your
Linux 3.10 machine can't cash on unpatched master, it does this:

postgres=# create extension foo;
CREATE EXTENSION
postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
server closed the connection unexpectedly
...
LOG:  server process (PID 15105) was terminated by signal 7: Bus error

If I apply the attached experimental patch I get:

postgres=# select test_dsm(16::bigint * 1024 * 1024 * 1024);
ERROR:  could not resize shared memory segment
"/PostgreSQL.1938734921" to 17179869184 bytes: No space left on device

It should probably be refactored a bit to separate the error messages
for ftruncate and posix_fallocate, and we could possibly use the same
approach for dsm_impl_mmap instead of that write() loop, but this at
least demonstrates the problem Amul reported.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


posix_fallocate.patch
Description: Binary data


create-dsm-test.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Tom Lane
Greg Stark  writes:
> On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane  wrote:
>> pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
>> is nonstandard, maybe dropping support for it would be feasible.

> Isn't ISNULL one of the lexical tricks we used to effectively give
> bison two token lookahead?

No, it's a SQL-exposed feature, though I think it's just there for
compatibility with some random other DBMS.  See also NOTNULL.

regards, tom lane




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Greg Stark
On Sat, Aug 13, 2016 at 1:18 AM, Andrew Gierth
 wrote:
>
> Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
> agree with this.
>
> (Inserting on the high leaf page is a special case, which is where the
> fillfactor logic kicks in; that's why sequentially filled indexes are
> (by default) 90% full rather than 100%. But other pages split into
> roughly equal halves.)

Hm, I was going from this lore. I didn't realize it was only for
inserts near the end of the index. That's cleverer than I realized.

commit 1663f3383849968415d29965ef9bfdf5aac4d358
Author: Tom Lane 
Date:   Sat Sep 29 23:49:51 2001 +

Tweak btree page split logic so that when splitting a page that is
rightmost on its tree level, we split 2/3 to the left and 1/3 to the
new right page, rather than the even split we use elsewhere.  The idea
is that when faced with a steadily increasing series of inserted keys
(such as sequence or timestamp values), we'll end up with a btree that's
about 2/3ds full not 1/2 full, which is much closer to the desired
steady-state load for a btree.  Per suggestion from Ann Harrison of
IBPhoenix.


-- 
greg




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Greg" == Greg Stark  writes:

 >> No, because as the pages split, they fill more slowly (because there
 >> are now more pages). So on average in a large randomly filled index,
 >> pages spend more time nearer 50% full than 100% full. This is easy
 >> to demonstrate by creating a table with an indexed float8 column and
 >> adding batches of random() values to it, checking with pgstatindex
 >> at intervals - the average leaf density will rarely exceed 70%.
 >> 
 >> However, worst case conditions can give lower leaf densities;
 >> obviously the worst case is if the data is loaded in an order and
 >> quantity that just happens to leave every leaf page recently split.

 Greg> btree pages don't split 50/50 either. They split biased to assume
 Greg> the greater side of the split will receive more inserts -- iirc
 Greg> 70/30.

Hmm? The code in _bt_findsplitloc and _bt_checksplitloc doesn't seem to
agree with this.

(Inserting on the high leaf page is a special case, which is where the
fillfactor logic kicks in; that's why sequentially filled indexes are
(by default) 90% full rather than 100%. But other pages split into
roughly equal halves.)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> But we need to be clear in the documentation about what this
 Tom> property actually means.  My objection to having it answer at the
 Tom> index or column level is basically that that encourages confusion
 Tom> as to what it means.

OK. Here's new output with the scope changes, and some of the names
changed in an attempt to make them clearer:

        cap         | AM | Index | Column 
--------------------+----+-------+--------
 asc                |    |       | t
 desc               |    |       | f
 nulls_first        |    |       | f
 nulls_last         |    |       | t
 orderable          |    |       | t
 distance_orderable |    |       | f
 returnable         |    |       | t
 search_array       |    |       | t
 search_nulls       |    |       | t
 clusterable        |    | t     | 
 backward_scan      |    | t     | 
 index_scan         |    | t     | 
 bitmap_scan        |    | t     | 
 can_order          | t  |       | 
 can_unique         | t  |       | 
 can_multi_col      | t  |       | 
 can_exclude        | t  |       | 
(17 rows)

(The can_* names are reserved for the AM level where we're asking about
the abstract capabilities of the AM.)

Better? Worse? Any more improvements to the names?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Greg Stark
On Fri, Aug 12, 2016 at 8:13 PM, Andrew Gierth
 wrote:
> No, because as the pages split, they fill more slowly (because there are
> now more pages). So on average in a large randomly filled index, pages
> spend more time nearer 50% full than 100% full. This is easy to
> demonstrate by creating a table with an indexed float8 column and adding
> batches of random() values to it, checking with pgstatindex at intervals -
> the average leaf density will rarely exceed 70%.
>
> However, worst case conditions can give lower leaf densities; obviously
> the worst case is if the data is loaded in an order and quantity that
> just happens to leave every leaf page recently split.


btree pages don't split 50/50 either. They split biased to assume the
greater side of the split will receive more inserts -- iirc 70/30. So
if they're loaded sequentially you should get a btree that's 70% full,
but the worst case is in theory closer to 30%, though I think the
insert order would have to be pretty perverse to be worse than 50%.

-- 
greg




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Greg Stark
On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane  wrote:
> pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
> is nonstandard, maybe dropping support for it would be feasible.

Isn't ISNULL one of the lexical tricks we used to effectively give
bison two token lookahead?

-- 
greg




[HACKERS] Is tuplesort_heap_siftup() a misnomer?

2016-08-12 Thread Peter Geoghegan
Doesn't tuplesort_heap_siftup() actually shift-down?

The Wikipedia article on heaps [1] lists "shift-down" (never "sift
down", FWIW) as a common operation on a heap:

"shift-down: move a node down in the tree, similar to shift-up; used
to restore heap condition after deletion or replacement."

This seems to be what tuplesort_heap_siftup() actually does (among
other things; its job is to compact the heap after the caller removes
the root, leaving a "hole" at root position 0). As it happens, the node
that this tuplesort.c function "moves down in the tree" is the memtuple
that was previously positioned physically last in the heap portion of
the memtuples array. This node is then considered as a "candidate root"
(i) for each subheap as we shift down, starting from i = 0.
Conceptually, we immediately "fill the hole" that the caller left in
the root with this physically last tuple, which is then shifted down as
needed. (I say "conceptually" because we don't actually bother with an
eager swap into the "hole" position, but that's just to avoid a
potentially useless swap.)

Now, tuplesort_heap_insert() actually does sift (or shift up, if you
prefer), which is necessary because it doesn't assume that there is an
existing "hole" anywhere in the heap (it just assumes the availability
of space to insert the caller's tuple, at the end of the heap's
memtuple portion).

Or, as Wikipedia describes it:

"shift-up: move a node up in the tree, as long as needed; used to
restore heap condition after insertion. Called "sift" because node
moves up the tree until it reaches the correct level, as in a sieve."

I think that tuplesort_heap_insert() is correct, because it doesn't
claim to shift at all (or sift, or whatever). I'm just contrasting it
with tuplesort_heap_siftup() to make a point.

I think that tuplesort_heap_siftup() should probably be renamed to
describe its purpose at a higher level, rather than how it shifts,
which in any case is an implementation detail. Specifically, I suggest
we rename tuplesort_heap_siftup() to tuplesort_heap_compact(), which
suggests that it's roughly the opposite of tuplesort_heap_insert(). I
think that the contrast between these two functions adds clarity. The
tuplesort_heap_siftup() comments will also need to be updated, since
"sift up" is mentioned there.

Should I write a patch?

[1] https://en.wikipedia.org/wiki/Heap_(data_structure)
-- 
Peter Geoghegan




[HACKERS] Pluggable storage

2016-08-12 Thread Alvaro Herrera
Many have expressed their interest in this topic, but I haven't seen any
design of how it should work.  Here's my attempt; I've been playing with
this for some time now and I think what I propose here is a good initial
plan.  This will allow us to write permanent table storage that works
differently than heapam.c.  At this stage, I haven't thought through
whether this is going to allow extensions to define new storage modules;
I am focusing on AMs that can coexist with heapam in core.

The design starts with a new row type in pg_am, of type "s" (for "storage").
The handler function returns a struct of node StorageAmRoutine.  This
contains functions for 1) scans (beginscan, getnext, endscan) 2) tuples
(tuple_insert/update/delete/lock, as well as set_oid, get_xmin and the
like), and operations on tuples that are part of slots (tuple_deform,
materialize).

To support this, we introduce StorageTuple and StorageScanDesc.
StorageTuples represent a physical tuple coming from some storage AM.
It is necessary to have a pointer to a StorageAmRoutine in order to
manipulate the tuple.  For heapam.c, a StorageTuple is just a HeapTuple.

RelationData gains ->rd_stamroutine which is a pointer to the
StorageAmRoutine for the relation in question.  Similarly,
TupleTableSlot is augmented with a link to the StorageAmRoutine to
handle the StorageTuple it contains (probably in most cases it's set at
the same time as the tupdesc).  This implies that routines such as
ExecAssignScanType need to pass down the StorageAmRoutine from the
relation to the slot.

The executor is modified so that instead of calling heap_insert etc
directly, it uses rel->rd_stamroutine to call these methods.  The
executor is still in charge of dealing with indexes, constraints, and
any other thing that's not the tuple storage itself (this is one major
point in which this differs from FDWs).  This all looks simple enough,
with one exception and a few notes:

exception a) ExecMaterializeSlot needs special consideration.  This is
used in two different ways: a1) is the stated "make tuple independent
from any underlying storage" point, which is handled by
ExecMaterializeSlot itself and calling a method from the storage AM to
do any byte copying as needed.  ExecMaterializeSlot no longer returns a
HeapTuple, because there might not be any.  The second usage pattern a2)
is to create a HeapTuple that's passed to other modules which only deal
with HT and not slots (triggers are the main case I noticed, but I think
there are others such as the executor itself wanting tuples as Datum for
some reason).  For the moment I'm handling this by having a new
ExecHeapifyTuple which creates a HeapTuple from a slot, regardless of
the original tuple format.

note b) EvalPlanQual currently maintains an array of HeapTuple in
EState->es_epqTuple.  I think it works to replace that with an array of
StorageTuples; EvalPlanQualFetch needs to call the StorageAmRoutine
methods in order to interact with it.  Other than those changes, it
seems okay.

note c) nodeSubplan has curTuple as a HeapTuple.  It seems simple
to replace this with an independent slot-based tuple.

note d) grp_firstTuple in nodeAgg / nodeSetOp.  These are less
simple than the above, but replacing the HeapTuple with a slot-based
tuple seems doable too.

note e) nodeLockRows uses lr_curtuples to feed EvalPlanQual.
TupleTableSlot also seems a good replacement.  This has fallout in other
users of EvalPlanQual, too.

note f) More widespread, MinimalTuples currently use a tweaked HeapTuple
format.  In the long run, it may be possible to replace them with a
separate storage module that's specifically designed to handle tuples
meant for tuplestores etc.  That may simplify TupleTableSlot and
execTuples.  For the moment we keep the tts_mintuple as it is.  Whenever
a tuple is not already in heap format, we heapify it in order to put in
the store.


The current heapam.c routines need some changes.  Currently, practice is
that heap_insert, heap_multi_insert, heap_fetch, heap_update scribble on
their input tuples to set the resulting ItemPointer in tuple->t_self.
This is messy if we want StorageTuples to be abstract.  I'm changing
this so that the resulting ItemPointer is returned in a separate output
argument; the tuple itself is left alone.  This is somewhat messy in the
case of heap_multi_insert because it returns several items; I think it's
acceptable to return an array of ItemPointers in the same order as the
input tuples.  This works fine for the only caller, which is COPY in
batch mode.  For the other routines, they don't really care where the
TID is returned AFAICS.


Additional noteworthy items:

i) Speculative insertion: the speculative insertion token is no longer
installed directly in the heap tuple by the executor (of course).
Instead, the token becomes part of the slot.  When the tuple_insert
method is called, the insertion routine is in charge of setting the
token from the slot into the storage tuple.  Executor is 

[HACKERS] pg_dump with tables created in schemas created by extensions

2016-08-12 Thread Martín Marqués
Hi,

About a month or two ago I reported a pg_dump bug regarding tables (and
other objects) created inside a schema from an extension.

Objects created by the extension are not dumped, as they will be
created once again by the CREATE EXTENSION call, but other objects
which might live inside a schema created by the extension should be
dumped so they get created inside the same schema.

The problem showed up when dumping a DB with PgQ installed as an
extension. Check here:

https://www.postgresql.org/message-id/d86dd685-1870-cfa0-e5e4-def1f918bec9%402ndquadrant.com

and here:

https://www.postgresql.org/message-id/409fe594-f4cc-89f5-c0d2-0a921987a864%402ndquadrant.com

Some discussion came up on the bugs list on how to fix the issue, and
on the fact that new tests were needed.

I'm attaching a patch providing such a test which, if applied now,
fails on a number of runs, all expected due to the bug we have at hand.

I believe the fix will be simple after the back-and-forth mails with
Michael, Stephen and Tom. I will work on that later, but preferred to
first have tests that show the problem, which will also make testing
the fix easier.

Thoughts?

Regards,

-- 
Martín Marqués    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
new file mode 100644
index fb4f573..6086317
*** a/src/test/modules/test_pg_dump/t/001_base.pl
--- b/src/test/modules/test_pg_dump/t/001_base.pl
*** my %tests = (
*** 283,288 
--- 283,313 
  			schema_only=> 1,
  			section_pre_data   => 1,
  			section_post_data  => 1, }, },
+ 	'CREATE TABLE regress_test_schema_table' => {
+ 		create_order => 3,
+ 		create_sql   => 'CREATE TABLE regress_pg_dump_schema.test_schema_table (
+ 		   col1 serial primary key,
+ 		   CHECK (col1 <= 1000)
+ 	   );',
+ 		regexp => qr/^
+ 			\QCREATE TABLE test_schema_table (\E
+ 			\n\s+\Qcol1 integer NOT NULL,\E
+ 			\n\s+\QCONSTRAINT test_table_col1_check CHECK \E
+ 			\Q((col1 <= 1000))\E
+ 			\n\);/xm,
+ 		like => {
+ 			binary_upgrade => 1,
+ 			clean  => 1,
+ 			clean_if_exists=> 1,
+ 			createdb   => 1,
+ 			defaults   => 1,
+ 			no_privs   => 1,
+ 			no_owner   => 1,
+ 			schema_only=> 1,
+ 			section_pre_data   => 1, },
+ 		unlike => {
+ 			pg_dumpall_globals => 1,
+ 			section_post_data  => 1, }, },
  	'CREATE ACCESS METHOD regress_test_am' => {
  		regexp => qr/^
  			\QCREATE ACCESS METHOD regress_test_am TYPE INDEX HANDLER bthandler;\E
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
new file mode 100644
index c2fe90d..3f88e6c
*** a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
--- b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
*** CREATE TABLE regress_pg_dump_table (
*** 10,15 
--- 10,24 
  
  CREATE SEQUENCE regress_pg_dump_seq;
  
+ -- We want to test that schemas and objects created in the schema by the
+ -- extension are not dumped, yet other objects created afterwards will be
+ -- dumped.
+ CREATE SCHEMA regress_pg_dump_schema
+CREATE TABLE regress_pg_dump_schema_table (
+   col1 serial,
+   col2 int
+ 	);
+ 
  GRANT USAGE ON regress_pg_dump_seq TO regress_dump_test_role;
  
  GRANT SELECT ON regress_pg_dump_table TO regress_dump_test_role;



Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread Thomas Munro
On Sat, Aug 13, 2016 at 2:08 AM, Tom Lane  wrote:
> amul sul  writes:
>> A call to dsm_create on Linux using the POSIX DSM implementation can
>> succeed, but result in SIGBUS when we later try to access the memory.
>> This happens because my system does not have enough shm space and the
>> current allocation in dsm_impl_posix does not allocate disk blocks[1].
>> I wonder whether we can use the fallocate system call (i.e. zero-fill
>> the file) to ensure that all the file space has really been allocated,
>> so that we don't later seg fault when accessing the memory mapping.
>> But then we would end up calling 'write' squillions of times.
>
> Wouldn't that just result in a segfault during dsm_create?
>
> I think probably what you are describing here is kernel misbehavior
> akin to memory overcommit.  Maybe it *is* memory overcommit and can
> be turned off the same way.  If not, you have material for a kernel
> bug fix/enhancement request.

I think this may be different from overcommit.

In dsm_impl_posix we do shm_open, then ftruncate.  That creates a file
with a hole.  Based on an LKML discussion where someone tried to
address this with a patch that was rejected[1], I believe that Linux
implements POSIX shmem as a tmpfs file, and in this case the file has a
hole, which is not the same phenomenon as unallocated virtual memory
pages resulting from overcommit policy.

In dsm_impl_mmap it looks like we have code to deal with the same
problem: we do open, then ftruncate, and then we explicitly write a
bunch of zeros to the file, with this comment:

/*
 * Zero-fill the file. We have to do this the hard way to ensure that
 * all the file space has really been allocated, so that we don't
 * later seg fault when accessing the memory mapping.  This is pretty
 * pessimal.
 */

Maybe we didn't do that for dsm_impl_posix because maybe you can't
write to a fd created with shm_open like that, I don't know.  But it
looks like if we used fallocate or posix_fallocate in the
dsm_impl_posix case we'd get a nice ENOSPC error, instead of
success-but-later-SIGBUS-on-access.  Whether there is *also* the
possibility of overcommit biting you later I don't know, but I suspect
that's an independent problem.  The OOM killer kills you with SIGKILL,
not SIGBUS.

[1] https://lkml.org/lkml/2013/7/31/64

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> +1 mostly, but I'm a bit bemused by can_order and can_backward
>  Tom> having different scopes --- how come?

> That's where they were in the previous list, a couple of messages up in
> the thread...

> Currently can_backward is always the same for all indexes in the same
> AM, but I guess there's no guarantee that won't change. In the previous
> message you suggested it might have to be per-column, even, but I think
> that makes no sense (either the whole index is scannable in both
> directions or it is not, it can't be different per column).

OK, fuzzy thinking on my part --- I was conflating amcanbackward with
whether you can get a descending-order scan.  I can imagine an index
type in which some opclasses can produce btree-ordered output and some
can't, in which case "orderable" has to be per-index and maybe even
per-column.  (If, say, the leading column is orderable and the next
is not, then you could expect to produce output that is sorted by
the first column but not by the first two.)  However, the ability to
back up in a scan is entirely independent of that; see the hash AM,
which sets amcanbackward even though its output has no definable order.

Right offhand it's pretty hard to see how amcanbackward wouldn't be
dependent on AM only.  However, we earlier posited that the only AM-level
properties should be those needed to figure out what you can say in CREATE
INDEX, so if we believe that then can_backward is properly positioned.

>  Tom> In particular, I'm not sure what you intend to mean by applying
>  Tom> can_unique at the index or column level.  Is that supposed to mean
>  Tom> that the index or column *is* unique?

> No. (We could add properties like is_unique, is_exclusion which clients
> currently query in pg_index; should we?)

No, I don't feel a need to replace that.  But we need to be clear in the
documentation about what this property actually means.  My objection to
having it answer at the index or column level is basically that that
encourages confusion as to what it means.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> This table shows what properties are exposed at the AM-wide level,
 >> the per-index level and the per-column level.

 Tom> +1 mostly, but I'm a bit bemused by can_order and can_backward
 Tom> having different scopes --- how come?

That's where they were in the previous list, a couple of messages up in
the thread...

Currently can_backward is always the same for all indexes in the same
AM, but I guess there's no guarantee that won't change. In the previous
message you suggested it might have to be per-column, even, but I think
that makes no sense (either the whole index is scannable in both
directions or it is not, it can't be different per column).

 Tom> Also, not sure about allowing things like can_multi_col as index
 Tom> or column properties.  That seems a bit silly: whether or not the
 Tom> AM can do multi columns, a specific index is what it is.  I'd be a
 Tom> bit inclined to have those return null except at the AM level.

I thought it would be cleaner to be able to query all properties at the
most specific level.

 Tom> In particular, I'm not sure what you intend to mean by applying
 Tom> can_unique at the index or column level.  Is that supposed to mean
 Tom> that the index or column *is* unique?

No. (We could add properties like is_unique, is_exclusion which clients
currently query in pg_index; should we?)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> distance_orderable now returns true/false depending on the opclass,
 >> not just on the amcanorderbyop field. In order to do this, I've
 >> added an optional amproperty function to the AM api, which if it
 >> exists, gets first dibs on all property calls so it can override the
 >> result as it sees fit.

 Tom> Hmm, seems like for that case, it'd be easier to look into pg_amop
 Tom> and see if the opclass has any suitably-marked operators.

I thought about that, but it seemed like it could get painful. The
planner is working forwards from a known operator and matching it
against the index column, whereas we'd have to work backwards from the
opfamily, and there's no good existing index for this; in the presence
of binary-compatible types, I don't think even amoplefttype can be
assumed (e.g. a varchar column can be ordered by pg_trgm's <-> operator
which is declared for text). So it'd have to be the equivalent of:

  get index column's opclass oid
  look it up in pg_opclass to get opfamily
  for r in select * from pg_amop where amopfamily=? and amoppurpose='o'
if r.amoplefttype is binary-coercible from the index column's type
  then return true

As opposed to what I have now, which is:

  get index column's opclass oid
  look it up in pg_opclass to get opfamily/opcintype
  result = SearchSysCacheExists4(AMPROCNUM, ...)  

(in theory this could produce a false positive if there's a distance
function but no actual operators to reference it, but I think that's the
opclass author's issue)

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-08-12 Thread Tom Lane
Robert Haas  writes:
> On Fri, Aug 12, 2016 at 3:22 PM, Tom Lane  wrote:
>> Hm.  I think that sounds a lot easier than it actually is.  As an example,
>> this would mean that we'd want such a search_path setting to apply during
>> parse analysis of a function's body, but not during planning, because it
>> should not apply during inlining or const-folding of another function.
>> On the other hand, if someone tried to "SET enable_seqscan = off" with
>> this new scope (a highly reasonable thing to do), that certainly should
>> apply during planning.

> Mmm.  Maybe this hypothetical new facility should confine itself to
> search_path specifically.

Well, that would be sad, because examples like enable_seqscan seem about
as useful as search_path.

Assuming that the implementation involved something like calling guc.c to
tell it to push and later pop a new semantic level (within which outer
single-level GUC settings don't apply), it's possible that the problem
mentioned above could be solved by having the planner push a semantic
level during inline_function and simplify_function.  In normal execution
you could perhaps do it in fmgr.c's handlers.  Not sure whether we'd
need to have spi.c know about it in order to make PL functions work
nicely.
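The push/pop idea can be modeled roughly like this (a toy sketch; the GucStack name and structure are invented for illustration, and it deliberately hides *all* outer single-level settings, glossing over the search_path-vs-enable_seqscan distinction above):

```python
class GucStack:
    """Toy model of 'semantic levels' for settings: pushing a level
    hides single-level overrides made at outer levels, so one
    function's local SETs can't leak into the inlining or
    const-folding of another function."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.levels = [{}]              # one dict of overrides per level

    def set(self, name, value):
        self.levels[-1][name] = value   # applies at the current level only

    def get(self, name):
        # only the innermost level's overrides apply; outer overrides
        # are invisible while an inner level is pushed
        return self.levels[-1].get(name, self.defaults[name])

    def push_level(self):
        self.levels.append({})

    def pop_level(self):
        self.levels.pop()
```

With this model, a session-level override is visible until the planner pushes a level around inline_function, at which point lookups fall back to the defaults until the level is popped.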

regards, tom lane




Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-12 Thread David G. Johnston
On Fri, Aug 12, 2016 at 3:03 PM, Robert Haas  wrote:

> On Thu, Aug 11, 2016 at 8:34 AM, David G. Johnston
>  wrote:
> > I don't have a fundamental issue with saying "when turning auto-commit on
> > you are also requesting that the open transaction, if there is one, is
> > committed immediately."  I'm more inclined to think an error is the
> correct
> > solution - or to respond in a way conditional to the present usage
> > (interactive vs. script).  I disagree with  Robert's unsubstantiated
> belief
> > regarding ON_ERROR_STOP and think that it captures the relevant
> user-intent
> > for this behavior as well.
>
> I'll substantiate my belief by referring to you for the documentation
> for ON_ERROR_STOP, which says:
>
> "By default, command processing continues after an error. When this
> variable is set to on, processing will instead stop immediately. In
> interactive mode, psql will return to the command prompt; otherwise,
> psql will exit, returning error code 3 to distinguish this case from
> fatal error conditions, which are reported using error code 1. In
> either case, any currently running scripts (the top-level script, if
> any, and any other scripts which it may have invoked) will be
> terminated immediately. If the top-level command string contained
> multiple SQL commands, processing will stop with the current command."
>
> In every existing case, ON_ERROR_STOP affects whether we continue to
> process further commands after an error has already occurred.  Your
> proposal would involve changing things so that the value ON_ERROR_STOP
> affects not only *how errors are handled* but *whether they happen in
> the first place* -- but only in this one really specific case, and no
> others.
>
> This isn't really an arguable point, even if you want to try to
> pretend otherwise.  ON_ERROR_STOP should affect whether we stop on
> error, not whether we generate an error in the first place.  The clue is
> in the name.
>
>
Changing AUTOCOMMIT to ON while in a transaction is a psql error - period.

If ON_ERROR_STOP is on we stop.  This meets the current semantics for
ON_ERROR_STOP.

With ON_ERROR_STOP off psql is going to continue on with the next command.
I'd suggest changing things so that psql can, depending upon the error,
invoke additional commands to bring the system into a known good state
before the next user command is executed.  In the case of "\set AUTOCOMMIT
on" this additional command would be COMMIT.  We can still report the error
before continuing on - so there is no effect on the "generating [of] an
error in the first place."

Allowing command-specific responses to errors when faced with script
continuation would be a change.  I do not think it would be a bad one.  Am
I stretching a bit here?  Sure.  Is it worth stretching to avoid adding
more knobs to the system?  Maybe.

I'll admit I haven't tried to find fault with the idea (or discover better
alternatives) nor how it would look in implementation.  As a user, though,
it would make sense if the system behaved in this way.  That only
AUTOCOMMIT needs this capability at the moment doesn't bother me.  I'm also
fine with making it an error and moving on - but if you want to accommodate
both possibilities this seems like a cleaner solution than yet another
environment variable that a user would need to consider.
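The recovery-command idea might look roughly like this (a toy sketch; the Session class and all names are invented stand-ins for psql's internal state, not actual psql code):

```python
class Session:
    """Minimal stand-in for psql state: an open transaction flag,
    AUTOCOMMIT, and whether ON_ERROR_STOP halts processing."""

    def __init__(self, on_error_stop=False):
        self.in_transaction = False
        self.autocommit = False
        self.on_error_stop = on_error_stop
        self.stopped = False
        self.log = []

    def execute(self, sql):
        self.log.append(sql)
        if sql == "COMMIT":
            self.in_transaction = False

    def set_autocommit_on(self):
        if self.in_transaction:
            self.log.append("ERROR: cannot enable AUTOCOMMIT in a transaction")
            if self.on_error_stop:
                self.stopped = True      # honor ON_ERROR_STOP as usual
                return
            self.execute("COMMIT")       # recovery action: known good state
        self.autocommit = True
```

The error is always reported; ON_ERROR_STOP decides whether we stop there or run the recovery COMMIT and continue.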

David J.


Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-08-12 Thread Robert Haas
On Fri, Aug 12, 2016 at 3:22 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Let's introduce a new variant of SET that only affects the lexical
>> scope of the function to which it is attached, and then do what you
>> said.  That would be full of win, because actually I think in nearly
>> every case that's the behavior people actually want.
>
> Hm.  I think that sounds a lot easier than it actually is.  As an example,
> this would mean that we'd want such a search_path setting to apply during
> parse analysis of a function's body, but not during planning, because it
> should not apply during inlining or const-folding of another function.
> On the other hand, if someone tried to "SET enable_seqscan = off" with
> this new scope (a highly reasonable thing to do), that certainly should
> apply during planning.

Mmm.  Maybe this hypothetical new facility should confine itself to
search_path specifically.

> It might be practical to make it work, but it will be ticklish to
> get the scope of the settings to be non-surprising.

Yeah, it's certainly not the sort of thing I'm going to crank out
before breakfast some morning.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-12 Thread Robert Haas
On Wed, Aug 10, 2016 at 4:54 PM, Peter Geoghegan  wrote:
> On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas  wrote:
>> My view on this - currently anyway - is that we shouldn't conflate the
>> tuplesort with the subsequent index generation, but that we should try
>> to use parallelism within the tuplesort itself to the greatest extent
>> possible.  If there is a single output stream that the leader uses to
>> generate the final index, then none of the above problems arise.  They
>> only arise if you've got multiple processes actually writing to the
>> index.
>
> I'm not sure if you're agreeing with my contention about parallel
> CREATE INDEX not being a good target for partitioning here. Are you?

No.  I agree that writing to the index in parallel is bad, but I think
it's entirely reasonable to try to set things up so that the leader
does as little of the final merge work itself as possible, instead
offloading that to workers.  Unless, of course, we can prove that the
overhead of the final merge pass is so low that it doesn't matter
whether we offload it.

> While all this speculation about choice of algorithm is fun,
> realistically I'm not going to write the patch for a rainy day (nor for
> parallel CREATE INDEX, at least until we become very comfortable with
> all the issues I raise, which could never happen). I'd be happy to
> consider helping you improve parallel query by providing
> infrastructure like this, but I need someone else to write the client
> of the infrastructure (e.g. a parallel merge join patch), or to at
> least agree to meet me half way with an interdependent prototype of
> their own. It's going to be messy, and we'll have to do a bit of
> stumbling to get to a good place. I can sign up to that if I'm not the
> only one that has to stumble.

Fair enough.

> Serial merging still needs work, it seems.

At the risk of stating the obvious, improving serial execution
performance is always superior to comparable gains originating from
parallelism, so no complaints here about work in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-08-12 Thread Tom Lane
Robert Haas  writes:
> Let's introduce a new variant of SET that only affects the lexical
> scope of the function to which it is attached, and then do what you
> said.  That would be full of win, because actually I think in nearly
> every case that's the behavior people actually want.

Hm.  I think that sounds a lot easier than it actually is.  As an example,
this would mean that we'd want such a search_path setting to apply during
parse analysis of a function's body, but not during planning, because it
should not apply during inlining or const-folding of another function.
On the other hand, if someone tried to "SET enable_seqscan = off" with
this new scope (a highly reasonable thing to do), that certainly should
apply during planning.

It might be practical to make it work, but it will be ticklish to
get the scope of the settings to be non-surprising.

regards, tom lane




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Andrew Gierth
> "Jeff" == Jeff Janes  writes:

 Jeff> But shouldn't that still leave us with a 75% full index, rather
 Jeff> than slightly over 50% full?

Average is usually about 67%-70%. (For capacity estimation I always
assume 66% for a non-sequentially-filled btree.)

 Jeff> The leaf pages start at 50%, grow to 100%, then split back to
 Jeff> 50%, then grow back to 100%.  So the average should be about 75%.

No, because as the pages split, they fill more slowly (because there are
now more pages). So on average in a large randomly filled index, pages
spend more time nearer 50% full than 100% full. This is easy to
demonstrate by creating a table with an indexed float8 column and adding
batches of random() values to it, checking with pgstatindex at intervals -
the average leaf density will rarely exceed 70%.

However, worst case conditions can give lower leaf densities; obviously
the worst case is if the data is loaded in an order and quantity that
just happens to leave every leaf page recently split.
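That dynamic is easy to reproduce with a toy model (an illustrative sketch of random insertion with 50/50 page splits, not the actual nbtree algorithm):

```python
import bisect
import random

def avg_leaf_density(n_keys=50_000, capacity=100, seed=1):
    """Toy model of btree leaf pages under random insertion: a full
    page splits 50/50, so pages spend more time nearer half full than
    full, and the steady-state average density lands well below 75%."""
    rng = random.Random(seed)
    pages = [[]]        # each page is a sorted list of keys
    bounds = []         # bounds[i] = low key of pages[i + 1]
    for _ in range(n_keys):
        key = rng.random()
        i = bisect.bisect_right(bounds, key)     # find the page for this key
        bisect.insort(pages[i], key)
        if len(pages[i]) > capacity:             # page overflow: split 50/50
            mid = len(pages[i]) // 2
            pages.insert(i + 1, pages[i][mid:])
            pages[i] = pages[i][:mid]
            bounds.insert(i, pages[i + 1][0])
    return n_keys / (len(pages) * capacity)
```

Running it gives an average leaf density around ln 2 ≈ 69%, consistent with the 66-70% rule of thumb above.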

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-08-12 Thread Robert Haas
On Thu, Aug 11, 2016 at 8:34 AM, David G. Johnston
 wrote:
> I don't have a fundamental issue with saying "when turning auto-commit on
> you are also requesting that the open transaction, if there is one, is
> committed immediately."  I'm more inclined to think an error is the correct
> solution - or to respond in a way conditional to the present usage
> (interactive vs. script).  I disagree with  Robert's unsubstantiated belief
> regarding ON_ERROR_STOP and think that it captures the relevant user-intent
> for this behavior as well.

I'll substantiate my belief by referring to you for the documentation
for ON_ERROR_STOP, which says:

"By default, command processing continues after an error. When this
variable is set to on, processing will instead stop immediately. In
interactive mode, psql will return to the command prompt; otherwise,
psql will exit, returning error code 3 to distinguish this case from
fatal error conditions, which are reported using error code 1. In
either case, any currently running scripts (the top-level script, if
any, and any other scripts which it may have invoked) will be
terminated immediately. If the top-level command string contained
multiple SQL commands, processing will stop with the current command."

In every existing case, ON_ERROR_STOP affects whether we continue to
process further commands after an error has already occurred.  Your
proposal would involve changing things so that the value ON_ERROR_STOP
affects not only *how errors are handled* but *whether they happen in
the first place* -- but only in this one really specific case, and no
others.

This isn't really an arguable point, even if you want to try to
pretend otherwise.  ON_ERROR_STOP should affect whether we stop on
error, not whether we generate an error in the first place.  The clue is
in the name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Tom Lane
Andrew Gierth  writes:
> This table shows what properties are exposed at the AM-wide level, the
> per-index level and the per-column level.

+1 mostly, but I'm a bit bemused by can_order and can_backward having
different scopes --- how come?

Also, not sure about allowing things like can_multi_col as index or
column properties.  That seems a bit silly: whether or not the AM can
do multi columns, a specific index is what it is.  I'd be a bit inclined
to have those return null except at the AM level.

In particular, I'm not sure what you intend to mean by applying can_unique
at the index or column level.  Is that supposed to mean that the index or
column *is* unique?  If so, I'd prefer a different name for that, on the
same principle that "orderable" is not the same thing as "can_order".

> distance_orderable now returns true/false depending on the opclass, not
> just on the amcanorderbyop field. In order to do this, I've added an
> optional amproperty function to the AM api, which if it exists, gets
> first dibs on all property calls so it can override the result as it
> sees fit.

Hmm, seems like for that case, it'd be easier to look into pg_amop
and see if the opclass has any suitably-marked operators.  The AMs
that support this would have to duplicate such code anyway, so it
seems better to just have one copy.  Or we could leave it to the
client to do that join, which is the sort of approach that Stephen
was advocating anyway, IIUC.

Adding an AM override method might be a good idea anyway, but I'm not
convinced that it's an appropriate way to address this particular point.

regards, tom lane




Re: [HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-08-12 Thread Robert Haas
On Thu, Mar 10, 2016 at 11:48 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Hmm.  The meaning of funcs.inline depends on the search_path, not just
>> during dump restoration but all the time.  So anything uses it under a
>> different search_path setting than the normal one will have this kind
>> of problem; not just dump/restore.
>
> Yeah, I see no reason to claim that this is a dump/restore-specific
> problem.
>
>> I don't have a very good idea what to do about that.
>
> The safe way to write SQL-language functions to be search-path-proof
> is to schema-qualify the names in them, or to add a "SET search_path"
> clause to the function definition.
>
> The problem with the latter approach is that it defeats inlining.
> I thought for a minute that maybe we could teach the planner to do
> inlining anyway by parsing the function body with the adjusted
> search_path, but that doesn't really preserve the same semantics
> (a SET would change the environment for called functions too).

Let's introduce a new variant of SET that only affects the lexical
scope of the function to which it is attached, and then do what you
said.  That would be full of win, because actually I think in nearly
every case that's the behavior people actually want.  There's a reason
why (for example) Perl started out with dynamic scoping (local) and
then eventually introduced lexical scoping (my): it's because lexical
scoping makes writing correct programs easier to a greater degree than
dynamic scoping.  That's basically the same problem we're hitting here
- and not only here.
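The Perl local-vs-my distinction translates to something like this (illustrative only, using a variable named search_path to echo the GUC in question):

```python
search_path = "public"              # session-level setting

def make_lexical_fn():
    search_path = "app_schema"      # bound where the function is defined
    def fn():
        return search_path          # lexical scope: the definer's value wins
    return fn

def dynamic_fn():
    return search_path              # dynamic-style: whatever the caller's
                                    # environment says at call time

fn = make_lexical_fn()
search_path = "caller_schema"       # the caller changes the environment
```

fn() still sees "app_schema" no matter who calls it, while dynamic_fn() sees "caller_schema" - the latter is what SET-attached-to-a-function gives us today, and the former is the lexically scoped behavior proposed here.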

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> I think I experimented with this a while ago and found that even after
>> removing postfix operators there was at least one other grammar
>> problem that prevented us from accepting ColLabel there.  I gave up
>> and didn't dig further, but maybe we should.

> Yes, it would be good to find that out.

I poked at that a little bit, by dint of changing

-   | a_expr IDENT
+   | a_expr ColLabel

in the target_el production and then seeing what Bison complained about.
The majority of the problems boil down to variants of this:

state 997

  1691 character: CHARACTER . opt_varying

VARYING  shift, and go to state 1597

VARYING   [reduce using rule 1698 (opt_varying)]
$default  reduce using rule 1698 (opt_varying)

opt_varying  go to state 1600

What this is telling us is that given input like, say,

SELECT 'foo'::character varying

Bison is no longer sure whether "varying" is meant as a type name modifier
or a ColLabel.  And indeed there is *no* principled answer to that that
doesn't involve giving up the ability for "varying" to be a ColLabel.
Just promoting it to a fully reserved word (which it is not today)
wouldn't be enough, because right now even fully reserved words can be
ColLabels.

Another problem is here:

state 1846

  1762 a_expr: a_expr ISNULL .
  2418 type_func_name_keyword: ISNULL .

$end   reduce using rule 1762 (a_expr)
$end   [reduce using rule 2418 (type_func_name_keyword)]

pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
is nonstandard, maybe dropping support for it would be feasible.

There are some other problems that look probably fixable with
refactoring, but AFAICS the ones above are fundamental.

So we basically can't have "you can use anything at all as a ColLabel
without AS".  We could probably somewhat reduce the set of words you're
not allowed to use that way, but we could not even promise that all
words that are currently unreserved would work.

It's likely that by rejiggering the precedence of productions, we could
resolve these ambiguities in favor of "it's not a ColLabel if it appears
in a context like this".  And probably that wouldn't be too surprising
most of the time.  But depending on precedence to resolve fundamentally
ambiguous grammar is always gonna bite you sometimes.  People understand
it (... usually ...) in the context of parsing multi-operator expressions,
but for other things it's not a great tool.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-12 Thread Andrew Gierth
So I'm tidying up and doing docs for the next version of this patch, but
here for comment is the current functionality:

select cap,
       pg_indexam_has_property(a.oid, cap) as "AM",
       pg_index_has_property('onek_hundred'::regclass, cap) as "Index",
       pg_index_column_has_property('onek_hundred'::regclass, 1, cap) as "Column"
  from pg_am a,
       unnest(array['asc', 'desc', 'nulls_first', 'nulls_last',
                    'orderable', 'distance_orderable', 'can_order',
                    'can_unique', 'can_multi_col', 'can_exclude',
                    'can_backward', 'can_cluster', 'index_scan',
                    'bitmap_scan', 'can_return', 'search_array',
                    'search_nulls'])
       with ordinality as u(cap,ord)
 where a.amname='btree'
 order by ord;

        cap         | AM | Index | Column 
--------------------+----+-------+--------
 asc                |    |       | t
 desc               |    |       | f
 nulls_first        |    |       | f
 nulls_last         |    |       | t
 orderable          |    |       | t
 distance_orderable |    |       | f
 can_order          | t  | t     | t
 can_unique         | t  | t     | t
 can_multi_col      | t  | t     | t
 can_exclude        | t  | t     | t
 can_backward       |    | t     | t
 can_cluster        |    | t     | t
 index_scan         |    | t     | t
 bitmap_scan        |    | t     | t
 can_return         |    |       | t
 search_array       |    |       | t
 search_nulls       |    |       | t
(17 rows)

This table shows what properties are exposed at the AM-wide level, the
per-index level and the per-column level.

distance_orderable now returns true/false depending on the opclass, not
just on the amcanorderbyop field. In order to do this, I've added an
optional amproperty function to the AM api, which if it exists, gets
first dibs on all property calls so it can override the result as it
sees fit.

can_return likewise reflects the result of index_can_return.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Tom Lane
Robert Haas  writes:
> Half a percent for two productions is not bad, but I think the real
> win would be in removing ambiguity from the grammar.  We get periodic
> complaints about the fact that things like "SELECT 3 cache" don't work
> because cache is an unreserved keyword, and postfix operators are one
> of the reasons why we can't do better:

Agreed, if postfix operators were the only thing standing between us and
fixing that, it would be a pretty strong argument for removing them.

> I think I experimented with this a while ago and found that even after
> removing postfix operators there was at least one other grammar
> problem that prevented us from accepting ColLabel there.  I gave up
> and didn't dig further, but maybe we should.

Yes, it would be good to find that out.  I think there's a whole bunch of
intertwined issues there, though; this isn't likely to be an easy change.
The comment at gram.y lines 679ff lists several things that are relevant,
and might or might not be simplifiable without postfix ops.

regards, tom lane




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Robert Haas
On Fri, Aug 12, 2016 at 12:57 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark  wrote:
>>> I wonder whether it's really worth keeping postfix operators. They
>>> seem to keep causing these kinds of headaches and I wonder how much
>>> the grammar tables would be simplified by removing them.
>
>> I've wondered the same thing at other times.  The only postfix
>> operator we ship in core is numeric_fac, and frankly it doesn't seem
>> worth it just for that.  If we decided that factorial(n) was adequate
>> notation for that case, and that we didn't care about any hypothetical
>> user-defined postfix operators either, I think we simplify or improve
>> a number of things.
>
> [ quick experiment... ]  Simply removing the two postfix-operator
> productions produces no meaningful savings (~0.5%), which is unsurprising
> because after all they're just two more productions to Bison.  It's
> possible we could save more by simplifying the existing hacks elsewhere
> in the grammar that were needed to work around ambiguities induced by
> postfix operators.  But that would take quite a bit of actual work,
> and I suspect we'd end up with a similar result that the tables don't
> actually get much smaller.  You could argue for this on the grounds of
> some reduced intellectual complexity in gram.y, and more forcefully on
> the grounds of removing user surprise in cases like Jim's (especially if
> you can find some other cases where it matters).  But I doubt that we'd
> get any kind of noticeable run-time or code-size win.

Half a percent for two productions is not bad, but I think the real
win would be in removing ambiguity from the grammar.  We get periodic
complaints about the fact that things like "SELECT 3 cache" don't work
because cache is an unreserved keyword, and postfix operators are one
of the reasons why we can't do better:

/*
 * We support omitting AS only for column labels that aren't
 * any known keyword.  There is an ambiguity against postfix
 * operators: is "a ! b" an infix expression, or a postfix
 * expression and a column label?  We prefer to resolve this
 * as an infix expression, which we accomplish by assigning
 * IDENT a precedence higher than POSTFIXOP.
 */

I think I experimented with this a while ago and found that even after
removing postfix operators there was at least one other grammar
problem that prevented us from accepting ColLabel there.  I gave up
and didn't dig further, but maybe we should.  I sort of like the
elegance of supporting user-defined prefix and postfix operators, but
in practice this is a not-especially-infrequent problem for people
migrating from other databases, a consideration that might be judged
to outweigh that elegance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Jeff Janes
On Fri, Aug 12, 2016 at 1:40 AM, Mark Kirkwood
 wrote:
> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.

But shouldn't that still leave us with a 75% full index, rather than
slightly over 50% full?

The leaf pages start at 50%, grow to 100%, then split back to 50%,
then grow back to 100%.  So the average should be about 75%.


> However your index rebuild gets you from 5 to 3 GB - does that really help
> performance significantly?

It can make a big difference, depending on how much RAM you have.

Cheers,

Jeff




Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-12 Thread Tom Lane
Karan Sikka  writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient.  What I'd really like to do is get rid of that cache in favor
of having a way to treat a precompiled regexp as a constant.  I think this
is probably possible via inventing a "regexp" datatype, which we make the
declared RHS input type for the ~ operator, and give it an implicit cast
from text so that existing queries don't break.  The compiled regexp tree
structure contains pointers so it could never go to disk, but now that
we have the "expanded datum" infrastructure you could imagine that the
on-disk representation is the same as text but we support adding a
compiled tree to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c.  A cache
like that might be the only way to deal with a query using variable
patterns (e.g, pattern argument coming from a table column).  But it
seems like basically the wrong approach for the common case of a
constant pattern.

regards, tom lane




Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-12 Thread Karan Sikka
> Having said that, I've had a bee in my bonnet for a long time about
> removing per-row setup cost for repetitive regex matches, and
> whatever infrastructure that needs would work for this too.

What are the per-row setup costs for regex matches? I looked at
`regexp.c` and saw:

```
/*
 * We cache precompiled regular expressions using a "self organizing list"
 * structure, in which recently-used items tend to be near the front.
 * Whenever we use an entry, it's moved up to the front of the list.
 * Over time, an item's average position corresponds to its frequency of
use.
 *
```

What proverbial bee did you have in your bonnet about the current regex
implementation?
Which functions other than `strpos` and `LIKE` would benefit from a similar
cache, or perhaps a query-scoped cache?

In the mean time I'll look at other TODOs that catch my interest.
Feel free to point me in the direction of one that you think is
both desirable and easy enough for a beginner.

Thanks!
Karan


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread Claudio Freire
On Fri, Aug 12, 2016 at 1:55 PM, amul sul  wrote:
> No segfault during dsm_create; mmap returns a memory address which is
> inaccessible.
>
> Let me see how I can disable kernel overcommit behaviour, but IMHO we
> should prevent ourselves from crashing, shouldn't we?

Overcommit can be caught at dsm_create time with an mlock call (note
that MAP_LOCKED isn't the same). Even a transient lock (mlock +
munlock) could prevent the SIGBUS due to overcommit.




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Tom Lane
Robert Haas  writes:
> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark  wrote:
>> I wonder whether it's really worth keeping postfix operators. They
>> seem to keep causing these kinds of headaches and I wonder how much
>> the grammar tables would be simplified by removing them.

> I've wondered the same thing at other times.  The only postfix
> operator we ship in core is numeric_fac, and frankly it doesn't seem
> worth it just for that.  If we decided that factorial(n) was adequate
> notation for that case, and that we didn't care about any hypothetical
> user-defined postfix operators either, I think we could simplify or improve
> a number of things.

[ quick experiment... ]  Simply removing the two postfix-operator
productions produces no meaningful savings (~0.5%), which is unsurprising
because after all they're just two more productions to Bison.  It's
possible we could save more by simplifying the existing hacks elsewhere
in the grammar that were needed to work around ambiguities induced by
postfix operators.  But that would take quite a bit of actual work,
and I suspect we'd end up with a similar result that the tables don't
actually get much smaller.  You could argue for this on the grounds of
some reduced intellectual complexity in gram.y, and more forcefully on
the grounds of removing user surprise in cases like Jim's (especially if
you can find some other cases where it matters).  But I doubt that we'd
get any kind of noticeable run-time or code-size win.

regards, tom lane




Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread amul sul
No segfault during dsm_create; mmap returns a memory address which is
inaccessible.

Let me see how I can disable kernel overcommit behaviour, but IMHO we
should prevent ourselves from crashing, shouldn't we?

Regards,
Amul

Sent from a mobile device. Please excuse brevity and typos.

On Fri, 12 Aug 2016 at 7:38 pm, Tom Lane wrote:

amul sul writes:
> When I call dsm_create on Linux using the POSIX DSM implementation, it can
> succeed but result in SIGBUS when we later try to access the memory.  This
> happens because my system does not have enough shm space and the current
> allocation in dsm_impl_posix does not allocate disk blocks[1].  I wonder
> whether we can use the fallocate system call (i.e., zero-fill the file) to
> ensure that all the file space has really been allocated, so that we don't
> later get a segfault when accessing the memory mapping.  But then we would
> end up calling 'write' in a loop squillions of times.

Wouldn't that just result in a segfault during dsm_create?

I think probably what you are describing here is kernel misbehavior
akin to memory overcommit.  Maybe it *is* memory overcommit and can
be turned off the same way.  If not, you have material for a kernel
bug fix/enhancement request.

regards, tom lane


  


Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Kisung Kim
You're right. Reindexing improves the performance of the benchmark workloads
dramatically.
I'm gathering results and will announce them.

But I think we should note that the results before reindexing are poorer
than MongoDB's.
It seems that this is because of B-tree bloat (not the exact term).
Lookup performance on the B-tree is most crucial for the results
because the workload is selects by primary key,
so a larger B-tree could mean fewer cache hits and slower query performance.
I think PG packs leaf nodes only half full, rather than to 90%, to leave
room for future insertions, and because of this PG's B-tree is larger than
MongoDB's.
(I turned off prefix compression for the WiredTiger index and block
compression for storage.)
But MongoDB (actually WiredTiger) seems to do this differently.

Is my speculation right? I'm not sure because I haven't reviewed the B-tree
code of PG yet.

And I want to point out that the loading performance of MongoDB is better
than PG's.
If PG packed pages more densely, leaving less space for future insertions,
could we at least get faster loading performance?
Then can we conclude that we have more room to improve PG's B-tree?

Best Regards,



On Fri, Aug 12, 2016 at 5:40 PM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> After examining the benchmark design - I see we are probably not being
> helped by the repeated insertion of keys all of form 'userxxx' leading
> to some page splitting.
>
> However your index rebuild gets you from 5 to 3 GB - does that really help
> performance significantly?
>
> regards
>
> Mark
>
>
> On 11/08/16 16:08, Kisung Kim wrote:
>
>> Thank you for your information.
>> Here is the result:
>>
>> After insertions:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version            | 2
>>  tree_level         | 3
>>  index_size         | 5488721920
>>  root_block_no      | 44337
>>  internal_pages     | 4464
>>  leaf_pages         | 665545
>>  empty_pages        | 0
>>  deleted_pages      | 0
>>  avg_leaf_density   | 52
>>  leaf_fragmentation | 11
>> (1 row)
>>
>> After rebuild:
>>
>> ycsb=# select * from pgstatindex('usertable_pkey');
>>  version            | 2
>>  tree_level         | 3
>>  index_size         | 3154296832
>>  root_block_no      | 41827
>>  internal_pages     | 1899
>>  leaf_pages         | 383146
>>  empty_pages        | 0
>>  deleted_pages      | 0
>>  avg_leaf_density   | 90.08
>>  leaf_fragmentation | 0
>> (1 row)
>>
>> It seems that the rebuild reduces the number of internal and leaf
>> pages and makes the leaf pages denser.
>>
>>
>>
>>
>


-- 




Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Robert Haas
On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark  wrote:
> On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane  wrote:
>
>> I think what is happening
>> in the trouble case is that since IS has lower precedence than Op, the
>> grammar decides it ought to resolve || as a postfix operator, and then
>> it effectively has
>> ('x' ||) IS ...
>> which leaves noplace to go except IS NULL and other IS-something syntaxes.
>
> I wonder whether it's really worth keeping postfix operators. They
> seem to keep causing these kinds of headaches and I wonder how much
> the grammar tables would be simplified by removing them.

I've wondered the same thing at other times.  The only postfix
operator we ship in core is numeric_fac, and frankly it doesn't seem
worth it just for that.  If we decided that factorial(n) was adequate
notation for that case, and that we didn't care about any hypothetical
user-defined postfix operators either, I think we could simplify or improve
a number of things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fwd: [BUG] Print timing statistics of trigger execution under turned off timing option of EXPLAIN ANALYZE

2016-08-12 Thread Tom Lane
maksim  writes:
> postgres=# EXPLAIN (ANALYZE, timing false) insert INTO foo 
> values(101, '');
>  QUERY PLAN
> --
>   Insert on foo  (cost=0.00..0.01 rows=1 width=36) (actual rows=0 loops=1)
> ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual rows=1 loops=1)
>   Planning time: 0.038 ms
>   Trigger unique_foo_c1: time=0.000 calls=1
>   Execution time: 340.766 ms
> (5 rows)

Hmm, yeah, it shouldn't do that.

> My patch fixes this problem.

Will push, thanks.

regards, tom lane




Re: [HACKERS] Why is box <-> point missing, and box <-> box not indexable?

2016-08-12 Thread Tom Lane
Andrew Gierth  writes:
> point <-> point, circle <-> point and polygon <-> point all exist as
> orderable-by-operator operators (in fact they are the only ones by
> default). But there's no box <-> point operator at all, and no index
> support for box <-> box.

> Was this intentional, or just a strange oversight?

Seems like an oversight.  None of these operators have commutators,
but it seems like they all should:

 <->(point,line)
 <->(point,lseg)
 <->(point,box)
 <->(lseg,line)
 <->(lseg,box)
 <->(point,path)
 <->(circle,polygon)
 <->(line,box)

Dunno about the index-support angle.

regards, tom lane




[HACKERS] Why is box <-> point missing, and box <-> box not indexable?

2016-08-12 Thread Andrew Gierth
point <-> point, circle <-> point and polygon <-> point all exist as
orderable-by-operator operators (in fact they are the only ones by
default). But there's no box <-> point operator at all, and no index
support for box <-> box.

Was this intentional, or just a strange oversight?

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Tom Lane
Jim Nasby  writes:
> Is there a place in the error reporting path where we'd still have 
> access to the 'is' token, and have enough control to look for a relevant 
> function?

No.  The grammar can't assume that it's being run inside a transaction
(consider parsing START TRANSACTION, or ROLLBACK after a failure).
So catalog access is out, full stop.

regards, tom lane




Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread Tom Lane
amul sul  writes:
> When I call dsm_create on Linux using the POSIX DSM implementation, it can
> succeed but result in SIGBUS when we later try to access the memory.  This
> happens because my system does not have enough shm space and the current
> allocation in dsm_impl_posix does not allocate disk blocks[1].  I wonder
> whether we can use the fallocate system call (i.e., zero-fill the file) to
> ensure that all the file space has really been allocated, so that we don't
> later get a segfault when accessing the memory mapping.  But then we would
> end up calling 'write' in a loop squillions of times.

Wouldn't that just result in a segfault during dsm_create?

I think probably what you are describing here is kernel misbehavior
akin to memory overcommit.  Maybe it *is* memory overcommit and can
be turned off the same way.  If not, you have material for a kernel
bug fix/enhancement request.

regards, tom lane




Re: [HACKERS] Add hint for function named "is"

2016-08-12 Thread Greg Stark
On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane  wrote:

> I think what is happening
> in the trouble case is that since IS has lower precedence than Op, the
> grammar decides it ought to resolve || as a postfix operator, and then
> it effectively has
> ('x' ||) IS ...
> which leaves noplace to go except IS NULL and other IS-something syntaxes.

I wonder whether it's really worth keeping postfix operators. They
seem to keep causing these kinds of headaches and I wonder how much
the grammar tables would be simplified by removing them.



-- 
greg




Re: [HACKERS] new autovacuum criterion for visible pages

2016-08-12 Thread Vik Fearing
On 12/08/16 15:15, Peter Eisentraut wrote:
> On 8/11/16 11:59 AM, Jeff Janes wrote:
>> Insertions and HOT-updates clear vm bits but don't increment the
>> counters that those existing parameters are compared to.
>>
>> Also, the relationship between number of updated/deleted rows and the
>> number of hint-bits cleared can be hard to predict due to possible
>> clustering of the updates into the same blocks.  So it would be hard
>> to know what to set the values to.
> 
> Well, the current threshold formulas aren't an exact science either.
> They just trigger autovacuum often enough relative to table size and
> activity.  Just fudging in the insert and HOT update counters times some
> factor might be enough, and it would get this functionality out to all
> users without more effort.

I have a patch I wrote a while ago that does this.  I haven't submitted
it yet because the documentation is lacking.  I will post it over the
weekend (I had planned to do it before the commitfest anyway).
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




[HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2016-08-12 Thread amul sul
Hi All,
When I call dsm_create on Linux using the POSIX DSM implementation, it can
succeed but result in SIGBUS when we later try to access the memory.  This
happens because my system does not have enough shm space and the current
allocation in dsm_impl_posix does not allocate disk blocks[1].  I wonder
whether we can use the fallocate system call (i.e., zero-fill the file) to
ensure that all the file space has really been allocated, so that we don't
later get a segfault when accessing the memory mapping.  But then we would
end up calling 'write' in a loop squillions of times.

Thoughts/suggestions?

Similar post: [1]
http://uk.comp.os.linux.narkive.com/Ve44sO4i/shared-memory-problem-no-space-at-dev-shm-causes-sigbus

Regards,
Amul Sul

Re: [HACKERS] new autovacuum criterion for visible pages

2016-08-12 Thread Peter Eisentraut
On 8/11/16 11:59 AM, Jeff Janes wrote:
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
> 
> Also, the relationship between number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks.  So it would be hard
> to know what to set the values to.

Well, the current threshold formulas aren't an exact science either.
They just trigger autovacuum often enough relative to table size and
activity.  Just fudging in the insert and HOT update counters times some
factor might be enough, and it would get this functionality out to all
users without more effort.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Relocation of tuple between release and re-acquire of tuple lock

2016-08-12 Thread Alexander Korotkov
On Fri, Aug 12, 2016 at 3:15 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I'm now exploring the code working with heap tuples.  The following code
> in heap_update() caught my eye.
>
>> if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
>>                             *lockmode))
>> {
>>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>>     /* acquire tuple lock, if necessary */
>>     heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
>>                          LockWaitBlock, &have_tuple_lock);
>>     /* wait for multixact */
>>     MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
>>                     relation, &oldtup.t_self, XLTW_Update,
>>                     &remain);
>>     checked_lockers = true;
>>     locker_remains = remain != 0;
>>     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>>
>>     /*
>>      * If xwait had just locked the tuple then some other xact
>>      * could update this tuple before we get to this point.  Check
>>      * for xmax change, and start over if so.
>>      */
>>     if (xmax_infomask_changed(oldtup.t_data->t_infomask,
>>                               infomask) ||
>>         !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
>>                              xwait))
>>         goto l2;
>> }
>
>
> Is it safe to rely on the same oldtup.t_data pointer after release
> and re-acquisition of the buffer content lock?  Could the heap tuple be
> relocated between lock release and re-acquire?  I know that we still hold
> a buffer pin and vacuum would wait for pin release.  But other heap
> operations could still call heap_page_prune(), which can relocate tuples.
> Probably I'm missing something...
>
>

Please forget it. heap_page_prune_opt() does:


> /* OK, try to get exclusive buffer lock */
> if (!ConditionalLockBufferForCleanup(buffer))
> return;


Nobody repairs buffer fragmentation while there is a pin. Everything is
fine.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Relocation of tuple between release and re-acquire of tuple lock

2016-08-12 Thread Alexander Korotkov
Hackers,

I'm now exploring the code working with heap tuples.  The following code
in heap_update() caught my eye.

> if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
>                             *lockmode))
> {
>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>     /* acquire tuple lock, if necessary */
>     heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
>                          LockWaitBlock, &have_tuple_lock);
>     /* wait for multixact */
>     MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
>                     relation, &oldtup.t_self, XLTW_Update,
>                     &remain);
>     checked_lockers = true;
>     locker_remains = remain != 0;
>     LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>
>     /*
>      * If xwait had just locked the tuple then some other xact
>      * could update this tuple before we get to this point.  Check
>      * for xmax change, and start over if so.
>      */
>     if (xmax_infomask_changed(oldtup.t_data->t_infomask,
>                               infomask) ||
>         !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
>                              xwait))
>         goto l2;
> }


Is it safe to rely on the same oldtup.t_data pointer after release
and re-acquisition of the buffer content lock?  Could the heap tuple be
relocated between lock release and re-acquire?  I know that we still hold
a buffer pin and vacuum would wait for pin release.  But other heap
operations could still call heap_page_prune(), which can relocate tuples.
Probably I'm missing something...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Fwd: [BUG] Print timing statistics of trigger execution under turned off timing option of EXPLAIN ANALYZE

2016-08-12 Thread maksim

Hello, hackers!


At the moment, EXPLAIN ANALYZE with the timing option turned off still
prints the total execution time of a trigger function after executing a
query that fires the trigger:


postgres=# EXPLAIN (ANALYZE, timing false) insert INTO foo 
values(101, '');

QUERY PLAN
--
 Insert on foo  (cost=0.00..0.01 rows=1 width=36) (actual rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=36) (actual rows=1 loops=1)
 Planning time: 0.038 ms
 Trigger unique_foo_c1: time=0.000 calls=1
 Execution time: 340.766 ms
(5 rows)

My patch fixes this problem.

--
Maksim Milyutin
Postgres Professional:http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index dbd27e5..808e23e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -698,8 +698,11 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 appendStringInfo(es->str, " for constraint %s", conname);
 			if (show_relname)
 appendStringInfo(es->str, " on %s", relname);
-			appendStringInfo(es->str, ": time=%.3f calls=%.0f\n",
-			 1000.0 * instr->total, instr->ntuples);
+			if (es->timing)
+appendStringInfo(es->str, ": time=%.3f calls=%.0f\n",
+ 1000.0 * instr->total, instr->ntuples);
+			else
+appendStringInfo(es->str, ": calls=%.0f\n", instr->ntuples);
 		}
 		else
 		{
@@ -707,7 +710,8 @@ report_triggers(ResultRelInfo *rInfo, bool show_relname, ExplainState *es)
 			if (conname)
 ExplainPropertyText("Constraint Name", conname, es);
 			ExplainPropertyText("Relation", relname, es);
-			ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
+			if (es->timing)
+ExplainPropertyFloat("Time", 1000.0 * instr->total, 3, es);
 			ExplainPropertyFloat("Calls", instr->ntuples, 0, es);
 		}
 



Re: [HACKERS] [Patch] New psql prompt substitution %r (m = master, r = replica)

2016-08-12 Thread Aleksander Alekseev
Thank you everyone for your replies!

I did some research and apparently there is no need for any patch. As
David pointed out, what I did can easily be done using \gset:

```
$ cat ~/.psqlrc 
select (case when pg_is_in_recovery() then 'r' else 'm' end) as mor
\gset

\set PROMPT1 '%p (%:mor:) =# '
```

Besides I figured out that replica promotion case could also be properly
handled without any patches. In case anyone is interested here is a
brief description of a solution.

~/.bash_profile:

```
export PATH="/home/eax/bin:$PATH"
alias psql='psql_wrapper'
```

~/bin/psql_wrapper:

```
#!/usr/bin/env python3

import subprocess
import sys

arg_string = ""
idx = 1
maxidx = len(sys.argv) - 1

while idx <= maxidx:
  arg_string += "'" + sys.argv[idx] + "' "
  idx += 1

cmd = """USER_ARGS=$'{}' psql {}""".format(
  arg_string.replace("'","\\'"), arg_string)

subprocess.call(cmd, shell = True)
```

~/.psqlrc:

```
\set PROMPT1 '%p <%`sh -c "psql $USER_ARGS -A -t -c $\'select case when 
pg_is_in_recovery() then \\\'replica\\\' else \\\'master\\\' end\'"`> =# '
```

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] [PATCH] COPY vs \copy HINT

2016-08-12 Thread Craig Ringer
On 12 August 2016 at 16:34, Christoph Berg  wrote:


> > postgres=# COPY x TO '/root/nopermissions';
> > ERROR:  could not open file "/root/nopermissions" for writing: Permission
> > denied
> > HINT:  Paths for COPY are on the PostgreSQL server, not the client. You
> may
> > want psql's \copy or a driver COPY ... FROM STDIN wrapper
>
> TO STDOUT.
>

Ahem, whoops. It's the simple things sometimes...

Attached amended.


> Also, I vaguely get what you wanted to say with "a driver ...
> wrapper", but it's pretty nonsensical if one doesn't know about the
> protocol details. I don't have a better suggestion now, but I think it
> needs rephrasing.


I don't like it either, but didn't come up with anything better. The
problem is that every driver calls it something different.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e713a8a7c5373a137d417d0001b11e296b841f5a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 12 Aug 2016 15:42:12 +0800
Subject: [PATCH] Emit a HINT when COPY can't find a file

Users often get confused between COPY and \copy and try to use client-side
paths with COPY. The server of course cannot find the file.

Emit a HINT in the most common cases to help users out.
---
 src/backend/commands/copy.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f45b330..c902e8b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1761,7 +1761,9 @@ BeginCopyTo(Relation rel,
 			if (!is_absolute_path(filename))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_NAME),
-	  errmsg("relative path not allowed for COPY to file")));
+	  errmsg("relative path not allowed for COPY to file"),
+	  errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		  "You may want psql's \\copy or a driver COPY ... TO STDOUT wrapper")));
 
 			oumask = umask(S_IWGRP | S_IWOTH);
 			cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
@@ -1770,7 +1772,9 @@ BeginCopyTo(Relation rel,
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not open file \"%s\" for writing: %m",
-cstate->filename)));
+cstate->filename),
+		 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		 		 "You may want psql's \\copy or a driver COPY ... TO STDOUT wrapper")));
 
 			if (fstat(fileno(cstate->copy_file), &st))
 ereport(ERROR,
@@ -2796,7 +2800,9 @@ BeginCopyFrom(Relation rel,
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not open file \"%s\" for reading: %m",
-cstate->filename)));
+cstate->filename),
+		 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		 		 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 			if (fstat(fileno(cstate->copy_file), &st))
 ereport(ERROR,
-- 
2.5.5




Re: [HACKERS] Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

2016-08-12 Thread Mark Kirkwood
After examining the benchmark design, I see we are probably not being
helped by the repeated insertion of keys all of the form 'userxxx',
leading to some page splitting.


However, your index rebuild gets you from 5 to 3 GB - does that really
help performance significantly?


regards

Mark

On 11/08/16 16:08, Kisung Kim wrote:

Thank you for your information.
Here is the result:

After insertions:

ycsb=# select * from pgstatindex('usertable_pkey');
 version            | 2
 tree_level         | 3
 index_size         | 5488721920
 root_block_no      | 44337
 internal_pages     | 4464
 leaf_pages         | 665545
 empty_pages        | 0
 deleted_pages      | 0
 avg_leaf_density   | 52
 leaf_fragmentation | 11
(1 row)

After rebuild:

ycsb=# select * from pgstatindex('usertable_pkey');
 version            | 2
 tree_level         | 3
 index_size         | 3154296832
 root_block_no      | 41827
 internal_pages     | 1899
 leaf_pages         | 383146
 empty_pages        | 0
 deleted_pages      | 0
 avg_leaf_density   | 90.08
 leaf_fragmentation | 0
(1 row)

It seems that the rebuild reduces the number of internal and leaf
pages and makes the leaf pages denser.










Re: [HACKERS] [PATCH] COPY vs \copy HINT

2016-08-12 Thread Christoph Berg
Re: Craig Ringer 2016-08-12 

> I think we should emit a HINT here, something like:
> 
> ERROR:  could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No
> such file or directory'
> HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
> want psql's \copy or a driver COPY ... FROM STDIN wrapper

+1 on the idea.

> postgres=# COPY x TO '/root/nopermissions';
> ERROR:  could not open file "/root/nopermissions" for writing: Permission
> denied
> HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
> want psql's \copy or a driver COPY ... FROM STDIN wrapper

TO STDOUT.

Also, I vaguely get what you wanted to say with "a driver ...
wrapper", but it's pretty nonsensical if one doesn't know about the
protocol details. I don't have a better suggestion now, but I think it
needs rephrasing.

Christoph




Re: [HACKERS] new autovacuum criterion for visible pages

2016-08-12 Thread Masahiko Sawada
On Fri, Aug 12, 2016 at 9:01 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> In short, autovacuum will need to scan by itself the VM of each
>> relation and decide based on that.
>
> That seems like a worthwhile approach to pursue.  The VM is supposed to be
> small, and if you're worried it isn't, you could sample a few pages of it.
> I do not think any of the ideas proposed so far for tracking the
> visibility percentage on-the-fly are very tenable.
>

One visibility map page can store information for 32672 heap
pages (255MB), but it would be costly if autovacuum scanned the whole
visibility map for every table.
So I think it's better to provide
autovacuum_vacuum_pagevisible_factor as a relopt,
and have autovacuum scan or sample the visibility map of tables for which
autovacuum_vacuum_pagevisible_factor is set.

Regards,

--
Masahiko Sawada




[HACKERS] [PATCH] COPY vs \copy HINT

2016-08-12 Thread Craig Ringer
Hi all

I see this sort of question quite a bit:

http://stackoverflow.com/q/38903811/398670

where the user wonders why

COPY gemeenten
  FROM 'D:\CBS_woningcijfers_2014.csv'
  DELIMITER ';' CSV

fails with

ERROR:  could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No
such file or directory'

and as usual, it's because the path is on their local host not the Pg
server.

I think we should emit a HINT here, something like:

ERROR:  could not open file "D:\CBS_woningcijfers_2014.csv" for reading: No
such file or directory'
HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
want psql's \copy or a driver COPY ... FROM STDIN wrapper


as a usability improvement. Trivial patch attached.

I'm not sure how to avoid the need to translate the string multiple times,
or if the tooling will automatically flatten them. I haven't bothered with
the stat() failure or isdir cases, since they seem less likely to be the
cause of users being confused between client and server.

Sample output:

postgres=# COPY x FROM '/tmp/somepath';
ERROR:  could not open file "/tmp/somepath" for reading: No such file or
directory
HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
want psql's \copy or a driver COPY ... FROM STDIN wrapper

postgres=# COPY x TO '/root/nopermissions';
ERROR:  could not open file "/root/nopermissions" for writing: Permission
denied
HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
want psql's \copy or a driver COPY ... FROM STDIN wrapper

postgres=# COPY x TO 'relpath';
ERROR:  relative path not allowed for COPY to file
HINT:  Paths for COPY are on the PostgreSQL server, not the client. You may
want psql's \copy or a driver COPY ... FROM STDIN wrapper



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 8e9df8a8874e3ae7446c0e6ebbf4da75889d6027 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 12 Aug 2016 15:42:12 +0800
Subject: [PATCH] Emit a HINT when COPY can't find a file

Users often get confused between COPY and \copy and try to use client-side
paths with COPY. The server of course cannot find the file.

Emit a HINT in the most common cases to help users out.
---
 src/backend/commands/copy.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f45b330..69d2047 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1761,7 +1761,9 @@ BeginCopyTo(Relation rel,
 			if (!is_absolute_path(filename))
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_NAME),
-	  errmsg("relative path not allowed for COPY to file")));
+	  errmsg("relative path not allowed for COPY to file"),
+	  errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		  "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 			oumask = umask(S_IWGRP | S_IWOTH);
 			cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
@@ -1770,7 +1772,9 @@ BeginCopyTo(Relation rel,
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not open file \"%s\" for writing: %m",
-cstate->filename)));
+cstate->filename),
+		 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		 		 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 			if (fstat(fileno(cstate->copy_file), &st))
 ereport(ERROR,
@@ -2796,7 +2800,9 @@ BeginCopyFrom(Relation rel,
 ereport(ERROR,
 		(errcode_for_file_access(),
 		 errmsg("could not open file \"%s\" for reading: %m",
-cstate->filename)));
+cstate->filename),
+		 errhint("Paths for COPY are on the PostgreSQL server, not the client. "
+		 		 "You may want psql's \\copy or a driver COPY ... FROM STDIN wrapper")));
 
 			if (fstat(fileno(cstate->copy_file), &st))
 ereport(ERROR,
-- 
2.5.5

