Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-31 Thread Kyotaro HORIGUCHI
At Tue, 31 Jan 2017 15:07:55 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Push down more full joins in postgres_fdw

2017-01-31 Thread Etsuro Fujita

On 2017/01/30 21:05, Ashutosh Bapat wrote:

On Mon, Jan 30, 2017 at 5:00 PM, Etsuro Fujita
 wrote:

On 2017/01/27 21:25, Etsuro Fujita wrote:

Sorry, I started thinking we went in the wrong direction.  I added to
deparseSelectStmtForRel build_subquery_tlists, which creates a tlist for
each subquery present in a given join tree prior to deparsing a whole
remote query.  But that's nothing but an overhead; we need to create a
tlist for the top-level query because we use it as fdw_scan_tlist, but
not for subqueries, and we need to create retrieved_attrs for the
top-level query while deparsing the targetlist in
deparseExplicitTargetList, but not for subqueries.  So, we should need
some work to avoid such a useless overhead.



I think we can avoid the useless overhead by adding a new function,
deparseSubqueryTargetList, that deparses expressions in the given relation's
reltarget, not the tlist, as a SELECT clause of the subquery representing
the given relation.  That would also allow us to make the 1-to-1
relationship between the subquery output columns and their aliases more
explicit, which was your original comment.  Please find attached the new
version.  (The patch doesn't need the patch to avoid duplicate construction
of the tlist, discussed upthread.)



I have not looked at the patch, but the reason we use a tlist instead
of list of expressions is because fdw_scan_tlist is expected in that
form


Actually, we wouldn't need that to deparse a remote join query; a list 
of expressions would be enough.



and we don't want two different representations one to deparse
SELECT and one to interpret results from the foreign server. What you
describe above seems to introduce exactly that hazard.


I agree with you to some extent, BUT:
* I don't think it's a good idea to create a tlist for each base/join 
relation that is deparsed as a subquery, to just avoid that hazard.  As 
I said above, that's nothing but an overhead.
* I think we would need to have two different representations for at 
least base relations; we use fpinfo->attrs_used to deparse a simple 
foreign table scan query for a base relation, but when deparsing the 
base relation as a subquery, we would need to use the list of 
expressions in the base relation's reltarget, to deparse a SELECT clause 
of the subquery, because we need to deparse a whole-row reference to the 
base relation as ROW, not all the user columns expanded, as shown in 
this extracted from the regression tests in the patch:


+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 
50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 
between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 
and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) 
ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
+ 

  QUERY PLAN 




+ 


+  LockRows
+Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+->  Nested Loop
+  Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+  ->  Foreign Scan
+Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+Relations: (public.ft4) FULL JOIN (public.ft5)
+Remote SQL: SELECT s8.c1, s8.c2, s9.c1, s9.c2 FROM 
((SELECT c1, ROW(c1, c2, c3) FROM "S 1"."T 3" WHERE ((c1 >= 50)) AND 
((c1 <= 60))) s8(c1, c2) FULL JOIN (SELECT c1, ROW(c1, c2, c3) FROM "S 
1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) s9(c1, c2) ON (((s8.c1 = 
s9.c1 WHERE (((s8.c1 IS NULL) OR (s8.c1 IS NOT NULL))) ORDER BY 
s8.c1 ASC NULLS LAST, s9.c1 ASC NULLS LAST

+->  Hash Full Join
+  Output: ft4.c1, ft4.*, ft5.c1, ft5.*
+  Hash Cond: (ft4.c1 = ft5.c1)
+  Filter: ((ft4.c1 IS NULL) OR (ft4.c1 IS NOT NULL))
+  ->  Foreign Scan on public.ft4
+Output: ft4.c1, ft4.*
+Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 
3" WHERE ((c1 >= 50)) AND ((c1 <= 60))

+  ->  Hash
+Output: ft5.c1, ft5.*
+->  Foreign Scan on public.ft5
+  Output: ft5.c1, ft5.*
+  Remote SQL: SELECT c1, c2, c3 FROM "S 
1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))

+  ->  Materialize
+Output: "T 3".c1, "T 3".ctid
+->  Seq Scan on "S 1"."T 3"
+  Output: "T 3".c1, 

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 4:31 PM, Vladimir Borodin  wrote:
> 31 янв. 2017 г., в 9:50, Michael Paquier 
> написал(а):
>
>> I am marking this patch as returned with feedback.
>
> Michael, sorry, but why?

Because I have been through many patches today.

> If I understood everything right, the main question
> from Jeff was why is it implemented in such way? And Jeff wanted to see
> other ways of solving the problem. Andrew wrote about them above and it
> seems that implementing them would be quite expensive and without any
> obvious win. I would rather expect some reaction from Jeff or may be someone
> else, so shouldn’t it be marked as «Ready for committer» or at least «Moved
> to next CF»?

I have moved to to the next CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Etsuro Fujita

On 2017/01/31 18:24, Abbas Butt wrote:

Postgres_fdw optimizes remote queries by pushing down the where clause.
This feature does not work consistently when the query is executed from
within a pl/pgsql function. The optimization works when the function
executes the query for the first 5 times, and fails afterwards.



I understand that this is because PostgreSQL starts using generic plan
with pulled up where clause after the 5th invocation hoping that it
would be faster since we have skiped planning the query on each
invocation, but in this case this decision is causing the query to slow
down.

How should we fix this problem?


ANALYZE for the foreign table doesn't work?

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing initplans

2017-01-31 Thread Amit Kapila
On Wed, Dec 28, 2016 at 5:20 PM, Amit Kapila  wrote:
>  To start
> with let us see the plan of TPC-H query (Q-22) and understand how it
> can be improved.
>
> Limit
>InitPlan 1 (returns $0)
>  ->  Finalize Aggregate
>->  Gather
>  Workers Planned: 2
>  ->  Partial Aggregate
>->  Parallel Seq Scan on customer customer_1
>  Filter: (...)
>->  GroupAggregate
>  Group Key: ("substring"((customer.c_phone)::text, 1, 2))
>  ->  Sort
>Sort Key: ("substring"((customer.c_phone)::text, 1, 2))
>->  Nested Loop Anti Join
>  ->  Seq Scan on customer
>Filter: ((c_acctbal > $0) AND (...)))
>  ->  Index Only Scan using idx_orders_custkey on orders
>Index Cond: (o_custkey = customer.c_custkey)
>
>
> In the above plan, we can see that the join on customer and orders
> table (Nested Loop Anti Join) is not parallelised even though we have
> the capability to parallelize Nested Loop Joins. The reason for not
> choosing the parallel plan is that one of the nodes (Seq Scan on
> customer) is referring to initplan and we consider such nodes as
> parallel-restricted which means they can't be parallelised.  Now, I
> could see three ways of parallelizing such a query.  The first way is
> that we just push parallel-safe initplans to workers and allow them to
> execute it, the drawback of this approach is that it won't be able to
> push initplans in cases as shown above where initplan is
> parallel-unsafe (contains Gather node) and second is we will lose the
> expectation of single evaluation.  The second way is that we always
> execute the initplan in the master backend and pass the resultant
> value to the worker, this will allow above form of plans to push
> initplans to workers and hence can help in enabling parallelism for
> other nodes in plan tree.
>

I have used the second way to parallelize queries containing initplans
as that can help in cases where initplans in itself also uses
parallelism and it will also retain an existing expectation of single
evaluation for initplans. The basic idea as mentioned in above mail is
to evaluate the initplans at Gather node and pass the value to worker
backends which can use it as required. The patch has used
*plan->allParam* bitmapset to evaluate the initplans at Gather node
(we traverse the planstate tree to find params at each node and we
take care to avoid multiple evaluations of same initplan).  To
identify initplan params among other params in *allParams*, the patch
has added an additional bool variable (isinitplan) in ParamExecData.
We can do it in some other way as well if there is any better
suggestion.

The patch has also changed the explain output of queries where
initplan param is evaluated at Gather node. For ex.

postgres=# explain (costs off) select t1.i from t1, t2 where t1.j=t2.j
and t1.k < (select max(k) from t3) and t1.k < (select max(k) from t3);
   QUERY PLAN

 Hash Join
   Hash Cond: (t2.j = t1.j)
   InitPlan 1 (returns $0)
 ->  Finalize Aggregate
   ->  Gather
 Workers Planned: 1
 ->  Partial Aggregate
   ->  Parallel Seq Scan on t3
   InitPlan 2 (returns $1)
 ->  Finalize Aggregate
   ->  Gather
 Workers Planned: 1
 ->  Partial Aggregate
   ->  Parallel Seq Scan on t3 t3_1
   ->  Gather
 Workers Planned: 1
 ->  Parallel Seq Scan on t2
   ->  Hash
 ->  Gather
   Workers Planned: 1
   Params Evaluated: $0, $1
   ->  Parallel Seq Scan on t1
 Filter: ((k < $0) AND (k < $1))
(23 rows)


In the above plan, you can notice a line (Params Evaluated: $0, $1)
which indicates the params evaluated at Gather node.  As of now,
explain just uses the *allParam* params present at the Gather node,
but we need to traverse the planstate tree as we do during execution.
This patch gives 2.5~3x performance benefit for Q-22 of TPC-H.

>   The drawback of the second approach is
> that we need to evaluate the initplan before it is actually required
> which means that we might evaluate it even when it is not required.  I
> am not sure if it is always safe to assume that we can evaluate the
> initplan before pushing it to workers especially for the cases when it
> is far enough down in the plan tree which we are parallelizing,
>

I think we can always pull up un-correlated initplans at Gather node,
however, if there is a correlated initplan, then it is better not to
allow such initplans for being pushed below gather.  Ex. of correlated
initplans:

postgres=# explain (costs off) select * from t1 where t1.i in (select
t2.i from t2 where t1.k = 

Re: [HACKERS] Parallel bitmap heap scan

2017-01-31 Thread Dilip Kumar
On Tue, Jan 31, 2017 at 7:47 AM, Haribabu Kommi
 wrote:
> Thanks for the update. I have some comments
>
Thanks for the review.

>
> 0002-hash-support-alloc-free-v14.patch:
>
>
> + if (tb->alloc)
> + {
>
> The memory for tb->alloc is allocated always, is the if check still
> required?

In parallel case, only first worker will call SH_CREATE, other worker
will only do palloc for pagetable and copy the reference from main
worker, so they will not have allocator.


> 0003-parallel-bitmap-heap-scan-v14.patch:
>
>
> + * and set parallel flag in lower level bitmap index scan. Later
> + * bitmap index node will use this flag to indicate tidbitmap that
> + * it needs to create an shared page table.
> + */
>
> I feel better to mention, where this flag is used, so that it will be more
> clear.

Done
>
>
> + * Mark tidbitmap as shared and also store DSA area in it.
> + * marking tidbitmap as shared is indication that this tidbitmap
> + * should be created in shared memory. DSA area will be used for
>
> The flag of shared is set in tidbitmap structure itself, but the
> comment is mentioned that tidbitmpa should be created in shared memory.
> I think that is the page table that needs to be created in shared memory.
> Providing some more information here will be helpful.
>
Done

>
> - node->tbmres = tbmres = tbm_iterate(tbmiterator);
> + node->tbmres = tbmres = pbms_parallel_iterate(tbmiterator,
> + pbminfo ? >tbmiterator : NULL);
>
> Instead Passing both normal and paralle iterators to the functions
> and checking inside again for NULL, How about just adding check
> in the caller itself? Or if you prefer the current method, then try to
> explain the details of input in the function header and more description.
>
Done as you said.

>
> + /* Increase prefetch target if it's not yet at the max. */
> + if (node->prefetch_target < node->prefetch_maximum)
>
> I didn't evaluate all scenarios, but the above code may have a problem,
> In case of parallel mode the the prefetch_target is fetched from node
> and not from the pbminfo. Later it gets from the pminfo and update that.
> May be this code needs to rewrite.
>
Fixed it.

>
> + TBMIterator *iterator = node->prefetch_iterator;
>
> Why another variable? Why can't we use the prefetch_iterator itself?
> Currently node->tbmiterator and node->prefetch_iterator are initialized
> irrespective of whether is it a parallel one or not. But while using, there
> is a check to use the parallel one in case of parallel. if it is the case,
> why can't we avoid the initialization itself?

Fixed
>
>
> + else
> + needWait = false;
>
> By default needWait is false. Just set that to true only for the
> case of PBM_INPROGRESS
>

Actually inside the while loop, suppose first we set needWait = true,
if PBM_INPROGRESS and got for ConditionalSleep, After it come out of
sleep, we need to check that PBM_FINISHED is set or we need to sleep
again, So in such case we need to reset it to needWait=false. However
this can be done by directly returning if it's PBM_FINISHED. But I
want to keep below the logic common.
+ SpinLockRelease(>state_mutex);
+
+ /* If we are leader or leader has already created a TIDBITMAP */
+ if (leader || !needWait)
+ break;

> + * so that during free we can directly get the dsa_pointe and free it.
>
> dsa_pointe -> dsa_pointer
Done

>
>
> +typedef struct
> +{
> + TIDBitmap  *tbm; /* TIDBitmap we're iterating over */
> + int spageptr; /* next spages index */
> + int schunkptr; /* next schunks index */
> + int schunkbit; /* next bit to check in current schunk */
> + TBMIterateResult output; /* MUST BE LAST (because variable-size) */
> +} TBMIterator;
>
> I didn't find the need of moving this structure. Can you point me where it
> is used.

Because  pbms_parallel_iterate need to access this structure so I
moved it to tidbitmap.h

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0003-parallel-bitmap-heap-scan-v15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-01-31 Thread Craig Ringer
On 31 Jan. 2017 19:29, "Michael Paquier"  wrote:

On Fri, Jan 27, 2017 at 8:52 AM, Craig Ringer  wrote:
> Now, if it's simpler to just xlog the gid at COMMIT PREPARED time when
> wal_level >= logical I don't think that's the end of the world. But
> since we already have almost everything we need in memory, why not
> just stash the gid on ReorderBufferTXN?

I have been through this thread... And to be honest, I have a hard
time understanding for which purpose the information of a 2PC
transaction is useful in the case of logical decoding.


TL;DR: this lets us decode the xact after prepare but before commit so
decoding/replay outcomes can affect the commit-or-abort decision.


The prepare and
commit prepared have been received by a node which is at the root of
the cluster tree, a node of the cluster at an upper level, or a
client, being in charge of issuing all the prepare queries, and then
issue the commit prepared to finish the transaction across a cluster.
In short, even if you do logical decoding from the root node, or the
one at a higher level, you would care just about the fact that it has
been committed.


That's where you've misunderstood - it isn't committed yet. The point or
this change is to allow us to do logical decoding at the PREPARE
TRANSACTION point. The xact is not yet committed or rolled back.

This allows the results of logical decoding - or more interestingly results
of replay on another node / to another app / whatever to influence the
commit or rollback decision.

Stas wants this for a conflict-free logical semi-synchronous replication
multi master solution. At PREPARE TRANSACTION time we replay the xact to
other nodes, each of which applies it and PREPARE TRANSACTION, then replies
to confirm it has successfully prepared the xact. When all nodes confirm
the xact is prepared it is safe for the origin node to COMMIT PREPARED. The
other nodes then see hat the first node has committed and they commit too.

Alternately if any node replies "could not replay xact" or "could not
prepare xact" the origin node knows to ROLLBACK PREPARED. All the other
nodes see that and rollback too.

This makes it possible to be much more confident that what's replicated is
exactly the same on all nodes, with no after-the-fact MM conflict
resolution that apps must be aware of to function correctly.

To really make it rock solid you also have to send the old and new values
of a row, or have row versions, or send old row hashes. Something I also
want to have, but we can mostly get that already with REPLICA IDENTITY FULL.

It is of interest to me because schema changes in MM logical replication
are more challenging awkward and restrictive without it. Optimistic
conflict resolution doesn't work well for schema changes and once the
conflciting schema changes are committed on different nodes there is no
going back. So you need your async system to have a global locking model
for schema changes to stop conflicts arising. Or expect the user not to do
anything silly / misunderstand anything and know all the relevant system
limitations and requirements... which we all know works just great in
practice. You also need a way to ensure that schema changes don't render
committed-but-not-yet-replayed row changes from other peers nonsensical.
The safest way is a barrier where all row changes committed on any node
before committing the schema change on the origin node must be fully
replayed on every other node, making an async MM system temporarily sync
single master (and requiring all nodes to be up and reachable). Otherwise
you need a way to figure out how to conflict-resolve incoming rows with
missing columns / added columns / changed types / renamed tables  etc which
is no fun and nearly impossible in the general case.

2PC decoding lets us avoid all this mess by sending all nodes the proposed
schema change and waiting until they all confirm successful prepare before
committing it. It can also be used to solve the row compatibility problems
with some more lazy inter-node chat in logical WAL messages.

I think the purpose of having the GID available to the decoding output
plugin at PREPARE TRANSACTION time is that it can co-operate with a global
transaction manager that way. Each node can tell the GTM "I'm ready to
commit [X]". It is IMO not crucial since you can otherwise use a (node-id,
xid) tuple, but it'd be nice for coordinating with external systems,
simplifying inter node chatter, integrating logical deocding into bigger
systems with external transaction coordinators/arbitrators etc. It seems
pretty silly _not_ to have it really.

Personally I don't think lack of access to the GID justifies blocking 2PC
logical decoding. It can be added separately. But it'd be nice to have
especially if it's cheap.


Re: [HACKERS] logical decoding of two-phase transactions

2017-01-31 Thread Konstantin Knizhnik



On 31.01.2017 09:29, Michael Paquier wrote:

On Fri, Jan 27, 2017 at 8:52 AM, Craig Ringer  wrote:

Now, if it's simpler to just xlog the gid at COMMIT PREPARED time when
wal_level >= logical I don't think that's the end of the world. But
since we already have almost everything we need in memory, why not
just stash the gid on ReorderBufferTXN?

I have been through this thread... And to be honest, I have a hard
time understanding for which purpose the information of a 2PC
transaction is useful in the case of logical decoding. The prepare and
commit prepared have been received by a node which is at the root of
the cluster tree, a node of the cluster at an upper level, or a
client, being in charge of issuing all the prepare queries, and then
issue the commit prepared to finish the transaction across a cluster.
In short, even if you do logical decoding from the root node, or the
one at a higher level, you would care just about the fact that it has
been committed.

Sorry, may be I do not completely understand your arguments.
Actually our multimaster is completely based now on logical replication 
and 2PC (more precisely we are using 3PC now:)
State of transaction (prepared, precommitted, committed) should be 
persisted in WAL  to make it possible to perform recovery.
Recovery can involve transactions in any state. So there three records 
in the WAL: PREPARE, PRECOMMIT, COMMIT_PREPARED and
recovery can involve either all of them, either 
PRECOMMIT+COMMIT_PREPARED either just COMMIT_PREPARED.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-01-31 Thread Kyotaro HORIGUCHI
Hello, thank you for the revised patch.

The only comment from me is about comments on the new over*2D
funnctions.


At Mon, 30 Jan 2017 21:12:31 +0300, Nikita Glukhov  
wrote in <4450e7a6-01e7-0fb2-a01e-98fb5405d...@postgrespro.ru>
> On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> > At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov
> >  wrote
> > in <9ea5b157-478c-8874-bc9b-050076b7d...@postgrespro.ru>:
> >> Working on the tests for new SP-GiST opclasses for polygons and
> >> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some
> >> operators
> >> (&<, &>, $<|, |&>) have wrong tests in
> >> spg_box_quad_inner_consistent().
> >> This obviously leads to incorrect results of a SP-GiST index scan (see
> >> tests
> >> in the attached patch, their results were taken from a sequential
> >> scan).
> >
> > Your problem is not necessarily evident for others.  Perhaps you
> > have to provide an explanation and/or a test case that describes
> > what is wrong for you so that others can get a clue for this
> > problem. Simpler test is better.
> 
> The problem is that there are mixed low/high values for x/y axes.  For
> example,
> overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range
> >right,
> and also lower2D() here must use query->high instead of query->low.
> 
> The corresponding test is very simple: insert 1 nearly arbitrary
> boxes and
> see the difference between results of sequential scan and results of
> index scan.
> 
> regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box);
> 
> regression=# INSERT INTO quad_box_tbl
>   SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
>   FROM generate_series(1, 100) x,
>generate_series(1, 100) y;
> 
> regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING
> spgist(b);

Thank you for the explanation and the test with fixed numbers.
I clearly understand what is the problem.

> regression=# SET enable_seqscan = ON;
> regression=# SET enable_indexscan = OFF;
> regression=# SET enable_bitmapscan = OFF;
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box
> '((100,200),(300,500))';
>  count
> ---
>   2900
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box
> '((100,200),(300,500))';
>  count
> ---
>   9100
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box
> '((100,200),(300,500))';
>  count
> ---
>   4900
> (1 row)
> 
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box
> '((100,200),(300,500))';
>  count
> ---
>   8100
> (1 row)
> 
> regression=# SET enable_seqscan = OFF;
> regression=# SET enable_indexscan = ON;
> regression=# SET enable_bitmapscan = ON;
(different result for the same queies)

The minimal bad example for the '&<' case is the following.

=# create table t (b box);
=# create index on t using spgist(b);
=# insert into t values ('((215, 305),(210,300))'::box);
=# select * from t where b &< '((100,200),(300,400))'::box;
  b  
-
 (215,305),(210,300)
(1 row)

=# set enable_seqscan to false;
=# select * from t where b &< '((100,200),(300,400))'::box;
 b 
---
(0 rows)

Index scan excludes the row and it obviously is not a difference
comes from error nor tolerance.

The current overLeft4D is equivalent to the following.

| FPlt(rect_box->range_box_x->left.high,  query->right.high) &&
| FPlt(rect_box->range_box_x->right.high, query->right.high)

This is translated into the BOX context as the following.

| FPlt(range high(low.x), query->high.y) &&
| FPlt(range high(high.x), query->high.y)

Yes, it is obviously wrong as your report.


The code part of the v02 patch looks very good than the previous
one. It is exactly what I had in my mind. Thanks.

The following comment,

> /* Can any range from range_box to be overlower than this argument? */

This might be better to be using the same wording to its users,
for example the comment for overLeft4D is the following.

| /* Can any rectangle from rect_box does not extend the right of this 
argument? */

And the documentation uses the following sentsnce,

https://www.postgresql.org/docs/current/static/functions-geometry.html
| Does not extend to the right of?

So I think "Can any range from range_box does not extend the
upper of this argument?" is better than the overlower. What do
you think?


> > counting them in this way is unstable. Even though this were
> > stable due to a fixed initial, this would be unacceptable, I
> > think. This kind of test should use nonrandom numbers.
> 
> I have replaced random data in this test with stable uniformly
> distributed data.
> I would also like to use existing box data from rect.data that is
> loaded to
> slow_emp4000 table in copy.sql test, but box.sql test is executed
> before copy.sql.

Nonrandom data is good. I'd like to compare the result of
corresnponding queries but unfortunately we don't have 

Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2017-01-31 Thread Pavel Stehule
2017-01-31 11:10 GMT+01:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> At Tue, 31 Jan 2017 15:07:55 +0900, Michael Paquier <
> michael.paqu...@gmail.com> wrote in  8v0X6A4gKQb2Uc=mc+...@mail.gmail.com>
> > On Tue, Jan 31, 2017 at 2:58 PM, Pavel Stehule 
> wrote:
> > > I found a error - I sent mail only to author 2016-12-31 :( - It is my
> > > mistake. I am sorry
> >
> > Ah... Thanks for the update. No problem.
>
> Ouch. Sorry for missing you commnet. I'll dig my mail box for that.
>



I am doing a review of this set of patches:

1. There are no problem with patching, compiling - all regress tests passed

2. tab complete doesn't work well if I am manually writing "create index
on" - second "ON" is appended - it is a regression

I didn't find any other issues -

note: not necessary to implement (nice to have) - I miss a support for OR
REPLACE flag - it is related to LANGUAGE, TRANSFORMATION,  FUNCTION and
RULE.

Regards


>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
>


[HACKERS] An issue in remote query optimization

2017-01-31 Thread Abbas Butt
Hi,
Postgres_fdw optimizes remote queries by pushing down the where clause.
This feature does not work consistently when the query is executed from
within a pl/pgsql function. The optimization works when the function
executes the query for the first 5 times, and fails afterwards.

Example:
Step 1:
Create the table on the foreign server and insert some rows in it
create table numbers(a int, b varchar(255));
insert into numbers values(1, 'One');
insert into numbers values(2, 'Two');
insert into numbers values(3, 'Three');
insert into numbers values(4, 'Four');
insert into numbers values(5, 'Five');
insert into numbers values(6, 'Six');
insert into numbers values(7, 'Seven');
insert into numbers values(8, 'Eight');
insert into numbers values(9, 'Nine');

Step 2:
Create the following objects on the local server
CREATE SERVER pg_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'127.0.0.1', port '5552', dbname 'test');
CREATE USER MAPPING FOR abbasbutt SERVER pg_server OPTIONS (user
'abbasbutt', password 'abc123');
CREATE FOREIGN TABLE foreign_numbers(a int, b varchar(255)) SERVER
pg_server OPTIONS (table_name 'numbers');

create or replace function test_pg_fdw() returns void as $$
DECLARE
 n varchar;
BEGIN
 FOR x IN 1..9 LOOP
 select b into n from foreign_numbers where a=x;
 raise notice 'Found number %', n;
 end loop;
 return;
END
$$ LANGUAGE plpgsql;

Step 3:
Run the test:
select test_pg_fdw();

Step 4:
Check the output of auto_explain in the local server log

2017-01-31 00:39:25 PST LOG:  duration: 8.388 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
width=516)
  Output: b
  Remote SQL: SELECT b FROM public.numbers WHERE ((a = 1))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.315 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
width=516)
  Output: b
  Remote SQL: SELECT b FROM public.numbers WHERE ((a = 2))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.250 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
width=516)
  Output: b
  Remote SQL: SELECT b FROM public.numbers WHERE ((a = 3))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.257 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
width=516)
  Output: b
  Remote SQL: SELECT b FROM public.numbers WHERE ((a = 4))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.271 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..111.91 rows=1
width=516)
  Output: b
  Remote SQL: SELECT b FROM public.numbers WHERE ((a = 5))
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.251 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1
width=516)
  Output: b
  Filter: (foreign_numbers.a = $3)
  Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.246 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1
width=516)
  Output: b
  Filter: (foreign_numbers.a = $3)
  Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST CONTEXT:  SQL statement "select bfrom
foreign_numbers where a=x"
PL/pgSQL function test_pg_fdw() line 6 at SQL statement
2017-01-31 00:39:25 PST LOG:  duration: 0.226 ms  plan:
Query Text: select bfrom foreign_numbers where a=x
Foreign Scan on public.foreign_numbers  (cost=100.00..114.91 rows=1
width=516)
  Output: b
  Filter: (foreign_numbers.a = $3)
  Remote SQL: SELECT a, b FROM public.numbers
2017-01-31 00:39:25 PST 

Re: [HACKERS] Radix tree for character conversion

2017-01-31 Thread Kyotaro HORIGUCHI
At Tue, 31 Jan 2017 12:25:46 +0900, Michael Paquier  
wrote in 
> On Mon, Jan 30, 2017 at 3:37 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, this is the revised version of character conversion using radix tree.
> 
> Thanks for the new version, I'll look at it once I am done with the
> cleanup of the current CF. For now I have moved it to the CF 2017-03.

Agreed. Thank you.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-31 Thread Pavel Stehule
2017-01-24 21:38 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
>
> > * SELECT (xmltable(..)).* + regress tests
> > * compilation and regress tests without --with-libxml
>
> Thanks.  I just realized that this is doing more work than necessary --
> I think it would be simpler to have tableexpr fill a tuplestore with the
> results, instead of just expecting function execution to apply
> ExecEvalExpr over and over to obtain the results.  So evaluating a
> tableexpr returns just the tuplestore, which function evaluation can
> return as-is.  That code doesn't use the value-per-call interface
> anyway.
>
> I also realized that the expr context callback is not called if there's
> an error, which leaves us without shutting down libxml properly.  I
> added PG_TRY around the fetchrow calls, but I'm not sure that's correct
> either, because there could be an error raised in other parts of the
> code, after we've already emitted a few rows (for example out of
> memory).  I think the right way is to have PG_TRY around the execution
> of the whole thing rather than just row at a time; and the tuplestore
> mechanism helps us with that.
>
> I think it would be good to have a more complex test case in regress --
> let's say there is a table with some simple XML values, then we use
> XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> large document, and then XMLTABLE uses that document as input document.
>

I have a 16K lines long real XML 6.MB. Probably we would not to append it
to regress tests.

It is really fast - original customer implementation 20min, nested our
xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

I have a plan to create tests based on pg_proc and CTE - if all works, then
the query must be empty

with x as (select proname, proowner, procost, pronargs,
array_to_string(proargnames,',') as proargnames,
array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
lateral xmltable('/proc' passing proc columns proname name, proowner oid,
procost float, pronargs int, proargnames text, proargtypes text)) select *
from z except select * from x;



>
> Please fix.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Abbas Butt
On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita 
wrote:

> On 2017/01/31 18:24, Abbas Butt wrote:
>
>> Postgres_fdw optimizes remote queries by pushing down the where clause.
>> This feature does not work consistently when the query is executed from
>> within a pl/pgsql function. The optimization works when the function
>> executes the query for the first 5 times, and fails afterwards.
>>
>
> I understand that this is because PostgreSQL starts using generic plan
>> with pulled up where clause after the 5th invocation hoping that it
>> would be faster since we have skiped planning the query on each
>> invocation, but in this case this decision is causing the query to slow
>> down.
>>
>> How should we fix this problem?
>>
>
> ANALYZE for the foreign table doesn't work?
>

No.

analyze ts.tickets;
WARNING:  skipping "tickets" --- cannot analyze this foreign table
ANALYZE



>
> Best regards,
> Etsuro Fujita
>
>
>


-- 
-- 
*Abbas*
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.co m


*Follow us on Twitter*
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers
 and more



Re: [HACKERS] parallelize queries containing subplans

2017-01-31 Thread Amit Kapila
On Tue, Jan 24, 2017 at 10:59 AM, Amit Kapila  wrote:
> On Thu, Jan 19, 2017 at 4:51 PM, Dilip Kumar  wrote:
>> On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
>>> During debugging I found that subplan created for below part of the
>>> query is parallel_unsafe, Is it a problem or there is some explanation
>>> of why it's not parallel_safe,
>>
>> Okay, so basically we don't have any mechanism to perform parallel
>> scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
>> must come along with CTE scan. So I think we can avoid setting below
>> code because we will never be able to test its side effect, another
>> argument can be that if we don't consider the final effect, and just
>> see this subplan then by logic it should be marked parallel-safe or
>> unsafe as per it's path and it will not have any side effect, as it
>> will finally become parallel-unsafe. So it's your call to keep it
>> either way.
>>
>
> Yeah, actually setting parallel_safety information for subplan from
> corresponding is okay.  However, in this particular case as we know
> that it might not be of any use till we enable parallelism for CTE
> Scan (and doing that is certainly not essential for this project).
> So, I have set parallel_safe as false for CTE subplans.
>

Moved this patch to next CF.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Etsuro Fujita

On 2017/01/31 19:53, Abbas Butt wrote:

On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
> wrote:
On 2017/01/31 18:24, Abbas Butt wrote:



Postgres_fdw optimizes remote queries by pushing down the where
clause.
This feature does not work consistently when the query is
executed from
within a pl/pgsql function. The optimization works when the function
executes the query for the first 5 times, and fails afterwards.



I understand that this is because PostgreSQL starts using
generic plan
with pulled up where clause after the 5th invocation hoping that it
would be faster since we have skiped planning the query on each
invocation, but in this case this decision is causing the query
to slow
down.



How should we fix this problem?



ANALYZE for the foreign table doesn't work?



No.

analyze ts.tickets;
WARNING:  skipping "tickets" --- cannot analyze this foreign table
ANALYZE


How the foreign table ts.tickets is defined?

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:11 AM, Peter Eisentraut
 wrote:
> And here is a rebased patch for the original feature.  I think this
> addresses all raised concerns and suggestions now.

Thanks for the new version. That looks good to me after an extra lookup.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 6:49 PM, Masahiko Sawada  wrote:
> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
> use attached patch.

This patch has been moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_xlogdump follow into the future

2017-01-31 Thread Michael Paquier
On Fri, Dec 2, 2016 at 1:36 PM, Haribabu Kommi  wrote:
> Patch received feedback at the end of commitfest.
> Closed in 2016-11 commitfest with "moved to next CF".
> Please feel free to update the status once you submit the updated patch.

And the thread has died as well weeks ago. I am marking that as
"returned with feedback" as there are no new patches.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-01-31 Thread Michael Paquier
On Fri, Jan 13, 2017 at 12:23 PM, Amit Kapila  wrote:
> On Fri, Jan 13, 2017 at 1:04 AM, Jesper Pedersen
>  wrote:
>> On 12/27/2016 01:58 AM, Amit Kapila wrote:
>>>
>>> After recent commit's 7819ba1e and 25216c98, this patch requires a
>>> rebase.  Attached is the rebased patch.
>>>
>>
>> This needs a rebase after commit e898437.
>>
>
> Attached find the rebased patch.

The patch applies, and there have been no reviews since the last
version has been submitted. So moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 7:37 AM, Craig Ringer  wrote:
> Rebased series attached, on top of current master (which includes
> logical replicaiton).
>
> I'm inclined to think I should split out a few of the changes from
> 0005, roughly along the lines of the bullet points in its commit
> message. Anyone feel strongly about how granular this should be?
>
> This patch series is a pre-requisite for supporting logical
> replication using a physical standby as a source, but does not its
> self enable logical replication from a physical standby.

There are patches but no reviews yet so moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan  wrote:
> On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
>  wrote:
>> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
>>> Attached is V7 of the patch.
>>
>> I am doing some testing.  First, some superficial things from first pass:
>>
>> [Various minor cosmetic issues]
>
> Oops.

As this review is very recent, I have moved the patch to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:56 PM, Etsuro Fujita
 wrote:
> Other changes:
>
> * Also modified CreateLocalJoinPath so that we pass outer/inner paths, not
> outer/inner rels, because it would be more flexible for the FDW to build the
> local-join path from paths it chose.
> * Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.

The patch is moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 12:45 PM, Kyotaro HORIGUCHI
 wrote:
> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile
> conflicted.

The patches still apply, moved to CF 2017-03. Be aware of that:
$ git diff HEAD~6 --check
contrib/postgres_fdw/postgres_fdw.c:388: indent with spaces.
+   PendingAsyncRequest *areq,
contrib/postgres_fdw/postgres_fdw.c:389: indent with spaces.
+   bool reinit);
src/backend/utils/resowner/resowner.c:1332: new blank line at EOF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-01-31 Thread Michael Paquier
On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy  wrote:
>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>> force_refresh);
>>
>> If the cache is initialized and force_refresh is not true, then this
>> just returns the cached data, and the metabuf argument isn't used.
>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>> the metabuffer, pin and lock it, use it to set the cache, release the
>> lock, and return with the pin still held.  If *metabuf !=
>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>
> Thanks, Robert I have made a new patch which tries to do same. Now I
> think code looks less complicated.

Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-01-31 Thread Michael Paquier
On Fri, Jan 20, 2017 at 3:03 AM, Petr Jelinek
 wrote:
> Okay, here is v3 with some small fixes and rebased on top of rename.
> Also it's rebased without the
> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch as I
> don't expect that one to go further for now.
>
> Thanks for testing!

This patch needs a rebase, moved to next CF with "waiting on author".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 6:22 PM, Craig Ringer  wrote:
> That's where you've misunderstood - it isn't committed yet. The point or
> this change is to allow us to do logical decoding at the PREPARE TRANSACTION
> point. The xact is not yet committed or rolled back.

Yes, I got that. I was looking for a why or an actual use-case.

> Stas wants this for a conflict-free logical semi-synchronous replication
> multi master solution.

This sentence is hard to decrypt, less without "multi master" as the
concept applies basically only to only one master node.

> At PREPARE TRANSACTION time we replay the xact to
> other nodes, each of which applies it and PREPARE TRANSACTION, then replies
> to confirm it has successfully prepared the xact. When all nodes confirm the
> xact is prepared it is safe for the origin node to COMMIT PREPARED. The
> other nodes then see hat the first node has committed and they commit too.

OK, this is the argument I was looking for. So in your schema the
origin node, the one generating the changes, is itself in charge of
deciding if the 2PC should work or not. There are two channels between
the origin node and the replicas replaying the logical changes, one is
for the logical decoder with a receiver, the second one is used to
communicate the WAL apply status. I thought about something like
postgres_fdw doing this job with a transaction that does writes across
several nodes, that's why I got confused about this feature.
Everything goes through one channel, so the failure handling is just
simplified.

> Alternately if any node replies "could not replay xact" or "could not
> prepare xact" the origin node knows to ROLLBACK PREPARED. All the other
> nodes see that and rollback too.

The origin node could just issue the ROLLBACK or COMMIT and the
logical replicas would just apply this change.

> To really make it rock solid you also have to send the old and new values of
> a row, or have row versions, or send old row hashes. Something I also want
> to have, but we can mostly get that already with REPLICA IDENTITY FULL.

On a primary key (or a unique index), the default replica identity is
enough I think.

> It is of interest to me because schema changes in MM logical replication are
> more challenging awkward and restrictive without it. Optimistic conflict
> resolution doesn't work well for schema changes and once the conflicting
> schema changes are committed on different nodes there is no going back. So
> you need your async system to have a global locking model for schema changes
> to stop conflicts arising. Or expect the user not to do anything silly /
> misunderstand anything and know all the relevant system limitations and
> requirements... which we all know works just great in practice. You also
> need a way to ensure that schema changes don't render
> committed-but-not-yet-replayed row changes from other peers nonsensical. The
> safest way is a barrier where all row changes committed on any node before
> committing the schema change on the origin node must be fully replayed on
> every other node, making an async MM system temporarily sync single master
> (and requiring all nodes to be up and reachable). Otherwise you need a way
> to figure out how to conflict-resolve incoming rows with missing columns /
> added columns / changed types / renamed tables  etc which is no fun and
> nearly impossible in the general case.

That's one vision of things, FDW-like approaches would be a second,
but those are not able to pass down utility statements natively,
though this stuff can be done with the utility hook.

> I think the purpose of having the GID available to the decoding output
> plugin at PREPARE TRANSACTION time is that it can co-operate with a global
> transaction manager that way. Each node can tell the GTM "I'm ready to
> commit [X]". It is IMO not crucial since you can otherwise use a (node-id,
> xid) tuple, but it'd be nice for coordinating with external systems,
> simplifying inter node chatter, integrating logical deocding into bigger
> systems with external transaction coordinators/arbitrators etc. It seems
> pretty silly _not_ to have it really.

Well, Postgres-XC/XL save the 2PC GID for this purpose in the GTM,
this way the COMMIT/ABORT PREPARED can be issued from any nodes, and
there is a centralized conflict resolution, the latter being done with
a huge cost, causing much bottleneck in scaling performance.

> Personally I don't think lack of access to the GID justifies blocking 2PC
> logical decoding. It can be added separately. But it'd be nice to have
> especially if it's cheap.

I think it should be added reading this thread.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-01-31 Thread Michael Paquier
On Mon, Jan 30, 2017 at 10:46 AM, Tsunakawa, Takayuki
 wrote:
> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
>> I think it is better to document in some way if we decide to go-ahead with
>> the patch.
>
> Sure, I added these sentences.

Patch has been moved to CF 2017-03. There is a recent new patch, and
the discussion is moving on.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-31 Thread Michael Paquier
On Fri, Jan 27, 2017 at 10:26 AM, Haribabu Kommi
 wrote:
> Thanks for the review.
> Let's wait for the committer's opinion.

I have moved this patch to CF 2017-03 to wait for this to happen.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:31 PM, Michael Paquier
 wrote:
> On Tue, Nov 29, 2016 at 1:33 PM, Michael Paquier
>  wrote:
>> Patch moved to CF 2017-01.
>
> And nothing has happened since, the patch rotting a bit because of a
> conflict in pg_dump's TAP tests. Attached is a rebased version.

Moved to CF 2017-03..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Forbid use of LF and CR characters in database and role names

2017-01-31 Thread Michael Paquier
On Tue, Nov 29, 2016 at 1:33 PM, Michael Paquier
 wrote:
> On Fri, Nov 25, 2016 at 4:41 PM, Michael Paquier
>  wrote:
>> On Fri, Nov 25, 2016 at 4:02 PM, Ideriha, Takeshi
>>  wrote:
>>> I applied your fixed patch and new one, and confirmed the applied source 
>>> passed the tests successfully. And I also checked manually the error 
>>> messages were emitted successfully when cr/lf are included in dbname or 
>>> rolename or data_directory.
>>>
>>> AFAICT, this patch satisfies the concept discussed before. So I’ve switched 
>>> this patch “Ready for Committer”.
>>
>> Thanks for the review, Ideriha-san. (See you next week perhaps?)
>
> Patch moved to CF 2017-01.

And nothing has happened since, the patch rotting a bit because of a
conflict in pg_dump's TAP tests. Attached is a rebased version.
-- 
Michael


0001-Forbid-newline-and-carriage-return-characters-in-dat.patch
Description: Binary data


0002-Ensure-clean-up-of-data-directory-even-with-restrict.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 9:36 AM, Alvaro Herrera  wrote:
>> I propose that we should finish the job by inventing CatalogTupleDelete(),
>> which for the moment would be a trivial wrapper around
>> simple_heap_delete(), maybe just a macro for it.
>>
>> If there's no objections I'll go make that happen in a day or two.
>
> Sounds good.

As you are on it, I have moved the patch to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2017-01-31 Thread Michael Paquier
On Mon, Jan 30, 2017 at 8:30 PM, Etsuro Fujita
 wrote:
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery, which
> are flags to indicate whether to deparse the input relations as subqueries.
> is_subquery_rel would work well for handling the cases of full joins with
> restrictions on the input relations, but I noticed that that wouldn't work
> well when extending to handle the cases where we deparse the input relations
> as subqueries to evaluate PHVs remotely.
> * Since appendSubqueryAlias in the previous patch is pretty small, I
> included the code into deparseRangeTblRef.

The patch is very fresh, so moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: session server side variables

2017-01-31 Thread Pavel Stehule
2017-02-01 6:05 GMT+01:00 Michael Paquier :

> On Wed, Jan 11, 2017 at 10:42 PM, Craig Ringer 
> wrote:
> > There is no code yet. Code review and testing is where things get firmer.
> >
> > My personal stance right now is that I'd like to see catalog-decared
> typed
> > variables. I would prefer them to be transactional and would at least
> oppose
> > anything that didn't allow future room for that capability. I'd prefer
> that
> > non-transactional vars be clearly declared as such.
> >
> > In the end though... I'm not the one implementing it. I can have some
> > influence through the code review process. But it's whoever steps up
> with a
> > proposed implementation that has the biggest say. The rest of us can say
> yes
> > or no to some degree... but nobody can make someone else implement
> something
> > they don't want.
>
> The last patch is from the 6th of December and does not apply anymore:
> https://www.postgresql.org/message-id/CAFj8pRA9w_AujBAYdLR0UVfXwwoxhmn%
> 2BFbNHnD3_NL%3DJ9x3y8w%40mail.gmail.com
> I don't have a better idea than marking this patch as "returned with
> feedback" for now, as the thread has died 3 weeks ago as well.
>

There is not a agreement on the form of session variables.

Regards

Pavel


> --
> Michael
>


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 8:01 AM, Michael Paquier
 wrote:
> On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:

>> +/*
>> + * Leave if no masking functions defined, this is possible in the case
>> + * resource managers generating just full page writes, comparing an
>> + * image to itself has no meaning in those cases.
>> + */
>> +if (RmgrTable[rmid].rm_mask == NULL)
>> +return;
>>
>> ...and also...
>>
>> +/*
>> + * This feature is enabled only for the resource managers where
>> + * a masking function is defined.
>> + */
>> +for (i = 0; i <= RM_MAX_ID; i++)
>> +{
>> +if (RmgrTable[i].rm_mask != NULL)
>>
>> Why do we assume that the feature is only enabled for RMs that have a
>> mask function?  Why not instead assume that if there's no masking
>> function, no masking is required?
>
> Not all RMs work on full pages. Tracking in smgr.h the list of RMs
> that are no-ops when masking things is easier than having empty
> routines declared all over the code base. Don't you think so>
>
Robert's suggestion surely makes the approach more general. But,  the
existing approach makes it easier to decide the RM's for which we
support the consistency check facility. Surely, we can use a list to
track the RMs which should (/not) be masked. But, I think we always
have to mask the lsn of the pages at least. Isn't it?

>> +void(*rm_mask) (char *page, BlockNumber blkno);
>>
>> Could the page be passed as type "Page" rather than a "char *" to make
>> things more convenient for the masking functions?  If not, could those
>> functions at least do something like "Page page = (Page) pagebytes;"
>> rather than "Page page_norm = (Page) page;"?
>
> xlog_internal.h is used as well by frontends, this makes the header
> dependencies cleaner. (I have looked at using Page when hacking this
> stuff, but the header dependencies are not worth it, I don't recall
> all the details though this was a couple of months back).
+ 1



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
At Wed, 01 Feb 2017 12:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170201.121304.267734380.horiguchi.kyot...@lab.ntt.co.jp>
> > >  I tried a committed Logical Replication environment. I found
> > >  that replication between databases of different encodings did
> > >  not convert encodings in character type columns. Is this
> > >  behavior correct?
> > 
> > The output plugin for subscription is pgoutput and it currently
> > doesn't consider encoding but would easiliy be added if desired
> > encoding is informed.
> > 
> > The easiest (but somewhat seems fragile) way I can guess is,
> > 
> > - Subscriber connects with client_encoding specification and the
> >   output plugin pgoutput decide whether it accepts the encoding
> >   or not. If the subscriber doesn't, pgoutput send data without
> >   conversion.
> > 
> > The attached small patch does this and works with the following
> > CREATE SUBSCRIPTION.
> 
> Oops. It forgets to care conversion failure. It is amended in the
> attached patch.
> 
> > CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
> > client_encoding=EUC_JP' PUBLICATION pub1;
> > 
> > 
> > Also we may have explicit negotiation on, for example,
> > CREATE_REPLICATION_SLOT.
> > 
> >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> > 
> > Or output plugin may take options.
> > 
> >  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> > 
> > 
> > Any opinions?

This patch chokes replication when the publisher finds an
inconvertible character in a tuple to be sent. For the case,
dropping-then-recreating subscription is necessary to go forward.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Tom Lane wrote:

> BTW, the reason I think it's good cleanup is that it's something that my
> colleagues at Salesforce also had to do as part of putting PG on top of a
> different storage engine that had different ideas about index handling.
> Essentially it's providing a bit of abstraction as to whether catalog
> storage is exactly heaps or not (a topic I've noticed Robert is starting
> to take some interest in, as well).

Yeah, I remembered that too.  Of course, we'd need to change the whole
idea of mapping tuples to C structs too, but this seemed a nice step
forward.  (I renamed Pavan's proposed routine precisely to avoid the
word "Heap" in it.)

> However, the patch misses an
> important part of such an abstraction layer by not also converting
> catalog-related simple_heap_delete() calls into some sort of
> CatalogTupleDelete() operation.  It is certainly a peculiarity of
> PG heaps that deletions don't require any immediate index work --- most
> other storage engines would need that.

> I propose that we should finish the job by inventing CatalogTupleDelete(),
> which for the moment would be a trivial wrapper around
> simple_heap_delete(), maybe just a macro for it.
> 
> If there's no objections I'll go make that happen in a day or two.

Sounds good.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Kuntal Ghosh
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
Thanks Robert for taking your time for the review.  I'll update the
patch with the changes suggested by you.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2017-01-31 Thread Michael Paquier
On Mon, Jan 23, 2017 at 6:51 PM, Kuntal Ghosh
 wrote:
> On Wed, Jan 18, 2017 at 11:31 AM, Rushabh Lathia
>  wrote:
>>
> The patch needs a rebase after the commit 69f4b9c85f168ae006929eec4.

Is an update going to be provided? I have moved this patch to next CF
with "waiting on author" as status.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 4:02 PM, Craig Ringer  wrote:
> If we want to save the 4 bytes per xmin advance (probably not worth
> caring) we can instead skip setting it on the standby, in which case
> it'll be potentially wrong until the next checkpoint. I'd rather make
> sure it stays correct.

Those patches still apply and no reviews have come in yet, so moved to
CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-31 Thread Ashutosh Bapat
>
>> However, ExecHashIncreaseNumBatches() may change the
>> number of buckets; the patch does not seem to account for spaceUsed changes
>> because of that.
>
> That's what this hunk is intended to do:
>
> @@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
> TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
>
> hashtable->nbuckets_optimal);
>
> +   /* account for the increase in space that will be used by buckets */
> +   hashtable->spaceUsed += sizeof(HashJoinTuple) *
> +   (hashtable->nbuckets_optimal - hashtable->nbuckets);
> +   if (hashtable->spaceUsed > hashtable->spacePeak)
> +   hashtable->spacePeak = hashtable->spaceUsed;
> +

Sorry, I missed that hunk. You are right, it's getting accounted for.

>>
>> In ExecHashTableReset(), do we want to update spacePeak while setting
>> spaceUsed.
>
> I figured there was no way that the new spaceUsed value could be
> bigger than spacePeak, because we threw out all chunks and have just
> the bucket array, and we had that number of buckets before, so
> spacePeak must at least have been set to a number >= this number
> either when we expanded to this many buckets, or when we created the
> hashtable in the first place.  Perhaps I should
> Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help, better if you explain this with a comment before Assert.

>
>> While this patch tracks space usage more accurately, I am afraid we might be
>> overdoing it; a reason why we don't track space usage accurately now. But I
>> think I will leave it to be judged by someone who is more familiar with the
>> code and possibly has historical perspective.
>
> Well it's not doing more work; it doesn't make any practical
> difference whatsoever but it's technically doing less work than
> master, by doing memory accounting only when acquiring a new 32KB
> chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now, that happens only after the hash table is
built completely. But that's fine. I am not worried about whether the
it's less work or more.

> But if by overdoing it you mean that no one really cares about
> the tiny increase in accuracy so the patch on its own is a bit of a
> waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it better.

> Depending on tuple size, you
> could imagine something like 64 bytes of header and unused space per
> 32KB chunk that we're not accounting for, and that's only 0.2%.  So I
> probably wouldn't propose this refactoring just on accuracy grounds
> alone.
>
> This refactoring is intended to pave the way for shared memory
> accounting in the later patches, which would otherwise generate ugly
> IPC if done for every time a tuple is allocated.  I considered using
> atomic add to count space per tuple, or maintaining per-backend local
> subtotals and periodically summing.  Then I realised that switching to
> per-chunk accounting would fix the IPC problem AND be justifiable on
> theoretical grounds.  When we allocate a new 32KB chunk, we really are
> using 32KB more of your memory.

+1.

Thanks for considering the comments.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting planning time with EXPLAIN

2017-01-31 Thread Michael Paquier
On Wed, Jan 4, 2017 at 7:59 PM, Ashutosh Bapat
 wrote:
> Here are patches for following

Those patches have received no code-level reviews, so moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:53 PM, Michael Paquier
 wrote:
> The latest patch available still applies, one person has added his
> name (John G) in October though there have been no reviews. There have
> been a couple of arguments against this patch, and the thread has had
> no activity for the last month, so I would be incline to mark that as
> returned with feedback. Thoughts?

And done as such. Feel free to update if there is something fresh.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 6:32 PM, Peter Moser  wrote:
> [reviews and discussions]

The patch proposed has rotten. Please provide a rebase. By the way, I
am having a hard time applying your patches with patch or any other
methods... I am moving it to CF 2017-03 because of the lack of
reviews.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] asynchronous execution

2017-01-31 Thread Kyotaro HORIGUCHI
Thank you.

At Wed, 1 Feb 2017 14:11:58 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Amit Kapila
On Tue, Jan 31, 2017 at 5:53 PM, Rahila Syed  wrote:
> Hello,
>
>>Agreed, that it makes sense to consider only the number of pages to
>>scan for computation of parallel workers.  I think for index scan we
>>should consider both index and heap pages that need to be scanned
>>(costing of index scan consider both index and heap pages).   I thin
>>where considering heap pages matter more is when the finally selected
>>rows are scattered across heap pages or we need to apply a filter on
>>rows after fetching from the heap.  OTOH, we can consider just pages
>>in the index as that is where mainly the parallelism works
> IMO, considering just index pages will give a better estimate of work to be
> done
> in parallel. As the amount of work/number of pages divided amongst workers
> is irrespective of
> the number of heap pages scanned.
>

Yeah, I understand that point and I can see there is strong argument
to do that way, but let's wait and see what others including Robert
have to say about this point.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 8:16 AM, Tom Lane  wrote:
> [ in the service of closing out this thread... ]
>
> Peter Geoghegan  writes:
>> Finally, 0003-* is a Valgrind suppression borrowed from my parallel
>> CREATE INDEX patch. It's self-explanatory.
>
> Um, I didn't find it all that self-explanatory.  Why wouldn't we want
> to avoid writing undefined data?  I think the comment at least needs
> to explain exactly what part of the written data might be uninitialized.
> And I'd put the comment into valgrind.supp, too, not in the commit msg.
>
> Also, the suppression seems far too broad.  It would for instance
> block any complaint about a write() invoked via an elog call from
> any function invoked from any LogicalTape* function, no matter
> how far removed.

It seems like a new patch will be provided, so moved to next CF with
"waiting on author".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
At Wed, 01 Feb 2017 12:05:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170201.120540.183393194.horiguchi.kyot...@lab.ntt.co.jp>
> Hello,
> 
> At Tue, 31 Jan 2017 12:46:18 +, "Shinoda, Noriyoshi" 
>  wrote in 
> 
> >  I tried a committed Logical Replication environment. I found
> >  that replication between databases of different encodings did
> >  not convert encodings in character type columns. Is this
> >  behavior correct?
> 
> The output plugin for subscription is pgoutput and it currently
> doesn't consider encoding but would easiliy be added if desired
> encoding is informed.
> 
> The easiest (but somewhat seems fragile) way I can guess is,
> 
> - Subscriber connects with client_encoding specification and the
>   output plugin pgoutput decide whether it accepts the encoding
>   or not. If the subscriber doesn't, pgoutput send data without
>   conversion.
> 
> The attached small patch does this and works with the following
> CREATE SUBSCRIPTION.

Oops. It forgets to care conversion failure. It is amended in the
attached patch.

> CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
> client_encoding=EUC_JP' PUBLICATION pub1;
> 
> 
> Also we may have explicit negotiation on, for example,
> CREATE_REPLICATION_SLOT.
> 
>  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> 
> Or output plugin may take options.
> 
>  'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> 
> 
> Any opinions?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 1f30de6..3c457a2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,6 +443,20 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+		{
+			char *p;
+
+			p = pg_server_to_client(outputstr, strlen(outputstr));
+
+			/* Send the converted string on when succeeded */
+			if (p)
+			{
+pfree(outputstr);
+outputstr = p;
+			}
+		}
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
 		appendBinaryStringInfo(out, outputstr, len); /* data */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-01-31 Thread Michael Paquier
On Tue, Dec 13, 2016 at 10:32 AM, Petr Jelinek
 wrote:
> Okay, this version looks good to me, marked as RfC.

The patches still apply, moved to CF 2017-03 with same status: RfC.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Indirect indexes

2017-01-31 Thread Michael Paquier
On Sat, Dec 31, 2016 at 7:35 AM, Alvaro Herrera
 wrote:
> Attached is v4, which fixes a couple of relatively minor bugs.  There
> are still things to tackle before this is committable, but coding review
> of the new executor node would be welcome.

Moved to CF 2017-03 because of a lack of reviews. The patch fails to
apply and needs a rebase, so it is marked as "waiting on author".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2017-01-31 Thread Michael Paquier
On Mon, Dec 5, 2016 at 11:35 AM, Haribabu Kommi
 wrote:
> Moved to next CF with "needs review" status.

Same, this time to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: session server side variables

2017-01-31 Thread Michael Paquier
On Wed, Jan 11, 2017 at 10:42 PM, Craig Ringer  wrote:
> There is no code yet. Code review and testing is where things get firmer.
>
> My personal stance right now is that I'd like to see catalog-decared typed
> variables. I would prefer them to be transactional and would at least oppose
> anything that didn't allow future room for that capability. I'd prefer that
> non-transactional vars be clearly declared as such.
>
> In the end though... I'm not the one implementing it. I can have some
> influence through the code review process. But it's whoever steps up with a
> proposed implementation that has the biggest say. The rest of us can say yes
> or no to some degree... but nobody can make someone else implement something
> they don't want.

The last patch is from the 6th of December and does not apply anymore:
https://www.postgresql.org/message-id/CAFj8pRA9w_AujBAYdLR0UVfXwwoxhmn%2BFbNHnD3_NL%3DJ9x3y8w%40mail.gmail.com
I don't have a better idea than marking this patch as "returned with
feedback" for now, as the thread has died 3 weeks ago as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2017-01-31 Thread Michael Paquier
On Wed, Jan 11, 2017 at 11:55 PM, Pavel Stehule  wrote:
> I'll mark this patch as ready for commiter

Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-01-31 Thread Michael Paquier
On Tue, Jan 17, 2017 at 2:40 PM, Amit Langote
 wrote:
> Hi Amit,
>
> On 2016/12/23 14:21, Amit Khandekar wrote:
>> Currently an Append plan node does not execute its subplans in
>> parallel. There is no distribution of workers across its subplans. The
>> second subplan starts running only after the first subplan finishes,
>> although the individual subplans may be running parallel scans.
>>
>> Secondly, we create a partial Append path for an appendrel, but we do
>> that only if all of its member subpaths are partial paths. If one or
>> more of the subplans is a non-parallel path, there will be only a
>> non-parallel Append. So whatever node is sitting on top of Append is
>> not going to do a parallel plan; for example, a select count(*) won't
>> divide it into partial aggregates if the underlying Append is not
>> partial.
>>
>> The attached patch removes both of the above restrictions.  There has
>> already been a mail thread [1] that discusses an approach suggested by
>> Robert Haas for implementing this feature. This patch uses this same
>> approach.
>
> I was looking at the executor portion of this patch and I noticed that in
> exec_append_initialize_next():
>
> if (appendstate->as_padesc)
> return parallel_append_next(appendstate);
>
> /*
>  * Not parallel-aware. Fine, just go on to the next subplan in the
>  * appropriate direction.
>  */
> if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
> appendstate->as_whichplan++;
> else
> appendstate->as_whichplan--;
>
> which seems to mean that executing Append in parallel mode disregards the
> scan direction.  I am not immediately sure what implications that has, so
> I checked what heap scan does when executing in parallel mode, and found
> this in heapgettup():
>
> else if (backward)
> {
> /* backward parallel scan not supported */
> Assert(scan->rs_parallel == NULL);
>
> Perhaps, AppendState.as_padesc would not have been set if scan direction
> is backward, because parallel mode would be disabled for the whole query
> in that case (PlannerGlobal.parallelModeOK = false).  Maybe add an
> Assert() similar to one in heapgettup().

There have been some reviews, but the patch has not been updated in
two weeks. Marking as "returned with feedback".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 7:20 PM, Etsuro Fujita
 wrote:
> Attached is the new version of the patch.  I also addressed other comments
> from you: moved rewriting the fdw_scan_tlist to postgres_fdw.c,
> added/revised comments, and added regression tests for the case where a
> pushed down UPDATE/DELETE on a join has RETURNING.
>
> My apologies for having been late to work on this.

Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : Parallel Merge Join

2017-01-31 Thread Michael Paquier
On Thu, Jan 5, 2017 at 12:57 AM, Dilip Kumar  wrote:
> On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila  wrote:
>> Review comments:
>> 1.
>> + bool is_partial);
>> +
>>
>> Seems additional new line is not required.
> Fixed

This patch has a patch, no new reviews. Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 11:05 AM, Claudio Freire  wrote:
> Updated and rebased v7 attached.

Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-01-31 Thread Michael Paquier
On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
> Nikita Glukhov  writes:
>> On 25.01.2017 23:58, Tom Lane wrote:
>>> I think you need to take a second look at the code you're producing
>>> and realize that it's not so clean either.  This extract from
>>> populate_record_field, for example, is pretty horrid:
>
>> But what if we introduce some helper macros like this:
>
>> #define JsValueIsNull(jsv) \
>>  ((jsv)->is_json ? !(jsv)->val.json.str \
>>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>
>> #define JsValueIsString(jsv) \
>>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)
>
> Yeah, I was wondering about that too.  I'm not sure that you can make
> a reasonable set of helper macros that will fix this, but if you want
> to try, go for it.
>
> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
> to go back to the manual every darn time to convince myself whether
> that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
> the reader (... or the author) having memorized C's precedence rules
> in quite that much detail.  Extra parens help.

Moved to CF 2017-03 as discussion is going on and more review is
needed on the last set of patches.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index-only scan

2017-01-31 Thread Michael Paquier
On Thu, Jan 19, 2017 at 9:07 PM, Rafia Sabih
 wrote:
> Please find the attached file rebased patch of parallel index-only
> scan on the latest Parallel index-scan patch [1].

Moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 31 Jan 2017 12:46:18 +, "Shinoda, Noriyoshi" 
 wrote in 

>  I tried a committed Logical Replication environment. I found
>  that replication between databases of different encodings did
>  not convert encodings in character type columns. Is this
>  behavior correct?

The output plugin for subscription is pgoutput and it currently
doesn't consider encoding but would easiliy be added if desired
encoding is informed.

The easiest (but somewhat seems fragile) way I can guess is,

- Subscriber connects with client_encoding specification and the
  output plugin pgoutput decide whether it accepts the encoding
  or not. If the subscriber doesn't, pgoutput send data without
  conversion.

The attached small patch does this and works with the following
CREATE SUBSCRIPTION.

CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432 dbname=postgres 
client_encoding=EUC_JP' PUBLICATION pub1;


Also we may have explicit negotiation on, for example,
CREATE_REPLICATION_SLOT.

 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'

Or output plugin may take options.

 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'


Any opinions?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 1f30de6..6a235d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -16,6 +16,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_type.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_wchar.h"
 #include "replication/logicalproto.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -442,6 +443,9 @@ logicalrep_write_tuple(StringInfo out, Relation rel, HeapTuple tuple)
 		pq_sendbyte(out, 't');	/* 'text' data follows */
 
 		outputstr = OidOutputFunctionCall(typclass->typoutput, values[i]);
+		if (pg_get_client_encoding() != GetDatabaseEncoding())
+			outputstr = pg_server_to_client(outputstr, strlen(outputstr));
+
 		len = strlen(outputstr) + 1;	/* null terminated */
 		pq_sendint(out, len, 4);		/* length */
 		appendBinaryStringInfo(out, outputstr, len); /* data */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
 wrote:
> Six new syscaches in 665d1fa was conflicted and 3-way merge
> worked correctly. The new syscaches don't seem to be targets of
> this patch.

To be honest, I am not completely sure what to think about this patch.
Moved to next CF as there is a new version, and no new reviews to make
the discussion perhaps move on.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-01-31 Thread Michael Paquier
On Sun, Jan 29, 2017 at 11:09 AM, Ashutosh Sharma  wrote:
> okay. Thanks. I have done changes on top of this patch.

Moved to CF 2017-03 as there is a new patch, no reviews.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Measuring replay lag

2017-01-31 Thread Michael Paquier
On Sat, Jan 21, 2017 at 10:49 AM, Thomas Munro
 wrote:
> Ok.  I see that there is a new compelling reason to move the ring
> buffer to the sender side: then I think lag tracking will work
> automatically for the new logical replication that just landed on
> master.  I will try it that way.  Thanks for the feedback!

Seeing no new patches, marked as returned with feedback. Feel free of
course to refresh the CF entry once you have a new patch!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-01-31 Thread Michael Paquier
On Tue, Jan 17, 2017 at 9:18 PM, Amit Kapila  wrote:
> On Tue, Jan 17, 2017 at 11:39 AM, Dilip Kumar  wrote:
>> On Wed, Jan 11, 2017 at 10:55 AM, Dilip Kumar  wrote:
>>> I have reviewed the latest patch and I don't have any more comments.
>>> So if there is no objection from other reviewers I can move it to
>>> "Ready For Committer"?
>>
>> Seeing no objections, I have moved it to Ready For Committer.
>>
>
> Thanks for the review.

Moved to CF 2017-03, the 8th commit fest of this patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum support for Hash Index

2017-01-31 Thread Michael Paquier
On Sat, Jan 28, 2017 at 8:02 PM, Amit Kapila  wrote:
> On Fri, Jan 27, 2017 at 5:15 PM, Ashutosh Sharma  
> wrote:
>>>
>>> Don't you think we should try to identify the reason of the deadlock
>>> error reported by you up thread [1]?  I know that you and Ashutosh are
>>> not able to reproduce it, but still I feel some investigation is
>>> required to find the reason.  It is quite possible that the test case
>>> is such that the deadlock is expected in rare cases, if that is the
>>> case then it is okay.  I have not spent enough time on that to comment
>>> whether it is a test or code issue.
>>
>> I am finally able to reproduce the issue using the attached script
>> file (deadlock_report). Basically, once I was able to reproduce the
>> issue with hash index I also thought of checking it with a non unique
>> B-Tree index and was able to reproduce it with B-Tree index as well.
>> This certainly tells us that there is nothing wrong at the code level
>> rather there is something wrong with the test script which is causing
>> this deadlock issue. Well, I have ran pgbench with two different
>> configurations and my observations are as follows:
>>
>> 1) With Primary keys i.e. uinque values: I have never encountered
>> deadlock issue with this configuration.
>>
>> 2) With non unique indexes (be it hash or B-Tree): I have seen
>> deadlock many a times with this configuration. Basically when we have
>> non-unique values associated with a column then there is a high
>> probability that multiple records will get updated with a single
>> 'UPDATE' statement and when there are huge number of backends trying
>> to do so there is high chance of getting deadlock which i assume is
>> expected behaviour in database.
>>
>
> I agree with your analysis, surely trying to update multiple rows with
> same values from different backends can lead to deadlock.

Moved that to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-01-31 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:44 AM, Peter Eisentraut
 wrote:
> On 1/15/17 5:53 AM, Pavel Stehule wrote:
>> the regress test fails
>>
>>  Program received signal SIGSEGV, Segmentation fault.
>> 0x007bbc2b in pattern_char_isalpha (locale_is_c=0 '\000',
>> locale=0x1a73220, is_multibyte=1 '\001', c=97 'a') at selfuncs.c:5291
>> 5291return isalpha_l((unsigned char) c, locale->lt);
>
> Here is an updated patch that fixes this crash and is rebased on top of
> recent changes.

Patch that still applies + no reviews = moved to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera 
wrote:

>
> > > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> > > +do { \
> > > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> > > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> > > +} while (0)
> >
> > > Actually, I think this macro could just return the TID so that it can
> be
> > > used as struct assignment, just like ItemPointerCopy does internally --
> > > callers can do
> > >ctid = HeapTupleHeaderGetNextTid(tup);
> >
> > While I agree with your proposal, I wonder why we have ItemPointerCopy()
> in
> > the first place because we freely copy TIDs as struct assignment. Is
> there
> > a reason for that? And if there is, does it impact this specific case?
>
> I dunno.  This macro is present in our very first commit d31084e9d1118b.
> Maybe it's an artifact from the Lisp to C conversion.  Even then, we had
> some cases of iptrs being copied by struct assignment, so it's not like
> it didn't work.  Perhaps somebody envisioned that the internal details
> could change, but that hasn't happened in two decades so why should we
> worry about it now?  If somebody needs it later, it can be changed then.
>
>
May I suggest in that case that we apply the attached patch which removes
all references to ItemPointerCopy and its definition as well? This will
avoid confusion in future too. No issues noticed in regression tests.


> > There is one issue that bothers me. The current implementation lacks
> > ability to convert WARM chains into HOT chains. The README.WARM has some
> > proposal to do that. But it requires additional free bit in tuple header
> > (which we don't have) and of course, it needs to be vetted and
> implemented.
> > If the heap ends up with many WARM tuples, then index-only-scans will
> > become ineffective because index-only-scan can not skip a heap page, if
> it
> > contains a WARM tuple. Alternate ideas/suggestions and review of the
> design
> > are welcome!
>
> t_infomask2 contains one last unused bit,


Umm, WARM is using 2 unused bits from t_infomask2. You mean there is
another free bit after that too?


> and we could reuse vacuum
> full's bits (HEAP_MOVED_OUT, HEAP_MOVED_IN), but that will need some
> thinking ahead.  Maybe now's the time to start versioning relations so
> that we can ensure clusters upgraded to pg10 do not contain any of those
> bits in any tuple headers.
>

Yeah, IIRC old VACUUM FULL was removed in 9.0, which is good 6 year old.
Obviously, there still a chance that a pre-9.0 binary upgraded cluster
exists and upgrades to 10. So we still need to do something about them if
we reuse these bits. I'm surprised to see that we don't have any mechanism
in place to clear those bits. So may be we add something to do that.

I'd some other ideas (and a patch too) to reuse bits from t_ctid.ip_pos
given that offset numbers can be represented in just 13 bits, even with the
maximum block size. Can look at that if it comes to finding more bits.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_itempointercopy.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-31 Thread Michael Paquier
On Fri, Dec 30, 2016 at 12:55 PM, Haribabu Kommi
 wrote:
> Any Comments on the approach?

I have moved this patch to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of replication commands using printsimple

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 11:59 PM, Robert Haas  wrote:
> Sorry, I have a little more nitpicking.

Thanks for the input.

> How about having
> printsimple() use pq_sendcountedtext() instead of pq_sendint()
> followed by pq_sendbytes(), as it does for TEXTOID?
>
> Other than that, this looks fine to me now.

pq_sendcountedtext() does some encoding conversion, which is why I
haven't used because we deal only with integers in this patch... Now
if you wish to switch to that I have really no arguments against.
-- 
Michael


refactor-repl-cmd-output-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:17 AM, Robert Haas  wrote:
> Well, the question even for that case is whether it really costs
> anything.  My bet is that it is nearly free when it doesn't help, but
> that could be wrong.  My experience running pgbench tests is that
> prewarming all of pgbench_accounts on a scale factor that fits in
> shared_buffers using "dd" took just a few seconds, but when accessing
> the blocks in random order the cache took many minutes to heat up.

And benchmarks like dbt-1 have a pre-warming period added in the test
itself where the user can specify in a number of seconds to linearly
increase the load from 0% to 100%, just for filling in the OS and PG's
cache... This feature would be helpful.

> Now, I assume that this patch sorts the I/O (although I haven't
> checked that) and therefore I expect that the prewarm completes really
> fast.  If that's not the case, then that's bad.  But if it is the
> case, then it's not really hurting you even if the workload changes
> completely.

Having that working fast would be really nice.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Michael Paquier
On Wed, Feb 1, 2017 at 1:06 AM, Robert Haas  wrote:
> On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
>  wrote:
>> I've attached the patch with the modified changes. PFA.
>
> Can this patch check contrib/bloom?

Only full pages are applied at redo by the generic WAL facility. So
you would finish by comparing a page with itself in generic_redo.

> +/*
> + * For a speculative tuple, the content of t_ctid is conflicting
> + * between the backup page and current page. Hence, we set it
> + * to the current block number and current offset.
> + */
>
> Why does it differ?  Is that a bug?

This has been discussed twice in this thread, once by me, once by Alvaro:
https://www.postgresql.org/message-id/CAM3SWZQC8nUgp8SjKDY3d74VLpdf9puHc7-n3zf4xcr_bghPzg%40mail.gmail.com
https://www.postgresql.org/message-id/CAB7nPqQTLGvn_XePjS07kZMMw46kS6S7LfsTocK%2BgLpTN0bcZw%40mail.gmail.com

> +/*
> + * Leave if no masking functions defined, this is possible in the case
> + * resource managers generating just full page writes, comparing an
> + * image to itself has no meaning in those cases.
> + */
> +if (RmgrTable[rmid].rm_mask == NULL)
> +return;
>
> ...and also...
>
> +/*
> + * This feature is enabled only for the resource managers where
> + * a masking function is defined.
> + */
> +for (i = 0; i <= RM_MAX_ID; i++)
> +{
> +if (RmgrTable[i].rm_mask != NULL)
>
> Why do we assume that the feature is only enabled for RMs that have a
> mask function?  Why not instead assume that if there's no masking
> function, no masking is required?

Not all RMs work on full pages. Tracking in smgr.h the list of RMs
that are no-ops when masking things is easier than having empty
routines declared all over the code base. Don't you think so>

> +void(*rm_mask) (char *page, BlockNumber blkno);
>
> Could the page be passed as type "Page" rather than a "char *" to make
> things more convenient for the masking functions?  If not, could those
> functions at least do something like "Page page = (Page) pagebytes;"
> rather than "Page page_norm = (Page) page;"?

xlog_internal.h is used as well by frontends, this makes the header
dependencies cleaner. (I have looked at using Page when hacking this
stuff, but the header dependencies are not worth it, I don't recall
all the details though this was a couple of months back).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS][REVIEW] macaddr 64 bit (EUI-64) datatype support

2017-01-31 Thread Kuntal Ghosh
On Wed, Feb 1, 2017 at 12:57 AM, Vitaly Burovoy
 wrote:
> Hello,
>
> I've reviewed the patch[1].
>
Noting to add from my side as well.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: ParallelFinish-hook of FDW/CSP (Re: [HACKERS] Steps inside ExecEndGather)

2017-01-31 Thread Michael Paquier
On Fri, Dec 2, 2016 at 9:20 PM, Haribabu Kommi  wrote:
> Moved to next CF with "needs review" status.

There has not been much interest in this patch, moved again, this time
to CF 2017-03.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-01-31 Thread Kyotaro HORIGUCHI
Hello, thank you for moving this to the next CF.

At Wed, 1 Feb 2017 13:09:51 +0900, Michael Paquier  
wrote in 
> On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
>  wrote:
> > Six new syscaches in 665d1fa was conflicted and 3-way merge
> > worked correctly. The new syscaches don't seem to be targets of
> > this patch.
> 
> To be honest, I am not completely sure what to think about this patch.
> Moved to next CF as there is a new version, and no new reviews to make
> the discussion perhaps move on.

I'm thinking the following is the status of this topic.

- The patch stll is not getting conflicted.

- This is not a hollistic measure for memory leak but surely
  saves some existing cases.

- Shared catcache is another discussion (and won't really
  proposed in a short time due to the issue on locking.)

- As I mentioned, a patch that caps the number of negative
  entries is avaiable (in first-created - first-delete manner)
  but it is having a loose end of how to determine the
  limitation.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Peter Geoghegan
On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro
 wrote:
> 2.  All participants: parallel sequential scan, sort, spool to disk;
> barrier; leader: merge spooled tuples and build btree.
>
> This patch is doing the 2nd thing.  My understanding is that some
> systems might choose to do that if they don't have or don't like the
> table's statistics, since repartitioning for balanced load requires
> carefully chosen ranges and is highly sensitive to distribution
> problems.

The second thing here seems to offer comparable scalability to other
system implementation's of the first thing. They seem to have reused
"partitioning to sort in parallel" for B-Tree builds, at least in some
cases, despite this. WAL logging is the biggest serial bottleneck here
for other systems, I've heard -- that's still going to be pretty much
serial.

I think that the fact that some systems do partitioning for parallel
B-Tree builds might have as much to do with their ability to create
B-Tree indexes in place as anything else. Apparently, some systems
don't use temp files, instead writing out what is for all intents and
purposes part of a finished B-Tree as runs (no use of
temp_tablespaces). That may be a big part of what makes it worthwhile
to try to use partitioning. I understand that only the highest client
counts will see much direct performance benefit relative to the first
approach.

> It's pretty clear that approach 1 is a difficult project.  From my
> research into dynamic repartitioning in the context of hash joins, I
> can see that that infrastructure is a significant project in its own
> right: subproblems include super efficient tuple exchange, buffering,
> statistics/planning and dealing with/adapting to bad outcomes.  I also
> suspect that repartitioning operators might need to be specialised for
> different purposes like sorting vs hash joins, which may have
> differing goals.  I think it's probably easy to build a slow dynamic
> repartitioning mechanism that frequently results in terrible worst
> case scenarios where you paid a fortune in IPC overheads and still
> finished up with one worker pulling most of the whole load.  Without
> range partitioning, I don't believe you can merge the resulting
> non-disjoint btrees efficiently so you'd probably finish up writing a
> complete new btree to mash them together.  As for merging disjoint
> btrees, I assume there are ways to do a structure-preserving merge
> that just rebuilds some internal pages and incorporates the existing
> leaf pages directly, a bit like tree manipulation in functional
> programming languages; that'll take some doing.

I agree with all that. "Stitching together" disjoint B-Trees does seem
to have some particular risks, which users of other systems are
cautioned against in their documentation. You can end up with an
unbalanced B-Tree.

> So I'm in favour of this patch, which is relatively simple and give us
> faster index builds soon.  Eventually we might also be able to have
> approach 1.  From what I gather, it's entirely possible that we might
> still need 2 to fall back on in some cases.

Right. And it can form the basis of an implementation of 1, which in
any case seems to be much more compelling for parallel query, when a
great deal more can be pushed down, and we are not particularly likely
to be I/O bound (usually not much writing to the heap, or WAL
logging).

> Will you move the BufFile changes to a separate patch in the next revision?

That is the plan. I need to get set up with a new machine here, having
given back my work laptop to Heroku, but it shouldn't take too long.

Thanks for the review.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-01-31 Thread Kyotaro HORIGUCHI
At Tue, 31 Jan 2017 14:38:39 +0300, Nikita Glukhov  
wrote in <1622dc9f-cecf-cee3-b71e-b2bf34649...@postgrespro.ru>
> On 31.01.2017 13:04, Kyotaro HORIGUCHI wrote:
> > The following comment,
> >
> >> /* Can any range from range_box to be overlower than this argument? */
> >
> > This might be better to be using the same wording to its users,
> > for example the comment for overLeft4D is the following.
> >
> > | /* Can any rectangle from rect_box does not extend the right of this
> > | argument? */
> >
> > And the documentation uses the following sentence,
> >
> > https://www.postgresql.org/docs/current/static/functions-geometry.html
> > | Does not extend to the right of?
> >
> > So I think "Can any range from range_box does not extend the
> > upper of this argument?" is better than the overlower. What do
> > you think?
> 
> I think "does not extend the upper" is better than unclear "overlower"
> too.
> So I've corrected this comment in the attached v03 patch.

Thank you. I think this patch is already in a good shape. Let's
wait for a while for some commiter to pick up this. If you're
afraid that this might be forgotten, it might be better to
register this as an entry in the next commit fest.

> > I confirmed other functions in geo_spgist but found no bugs like this.
> 
> I found no bugs in other functions too.
> 
> 
> > The minimal bad example for the '&<' case is the following.
> >
> > =# create table t (b box);
> > =# create index on t using spgist(b);
> > =# insert into t values ('((215, 305),(210,300))'::box);
> > =# select * from t where b &< '((100,200),(300,400))'::box;
> >   b
> > -
> >  (215,305),(210,300)
> > (1 row)
> >
> > =# set enable_seqscan to false;
> > =# select * from t where b &< '((100,200),(300,400))'::box;
> >  b
> > ---
> > (0 rows)
> 
> I don't know how you were able to reproduce this bug in
> spg_box_quad_inner_consistent()
> using only one box because index tree must have at least one inner
> node (or 157 boxes)
> in order to spg_box_quad_inner_consistent() began to be called.
> That's why existing
> tests for box_ops didn't detect this bug: box_temp table has only 56
> rows.

GiST seems to split the space even for the first tuple. I saw
spg_box_quad_inner_consistent called several times on SELECT and
fortunately the scan for the tuple going into wrong node.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Thomas Munro
On Wed, Feb 1, 2017 at 5:37 PM, Michael Paquier
 wrote:
> On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan  wrote:
>> On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro
>>  wrote:
>>> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan  wrote:
 Attached is V7 of the patch.
>>>
>>> I am doing some testing.  First, some superficial things from first pass:
>>>
>>> [Various minor cosmetic issues]
>>
>> Oops.
>
> As this review is very recent, I have moved the patch to CF 2017-03.

 ParallelContext *
-CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
+CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers,
+ bool serializable_okay)
 {
MemoryContext oldcontext;
ParallelContext *pcxt;
@@ -143,7 +144,7 @@ CreateParallelContext(parallel_worker_main_type
entrypoint, int nworkers)
 * workers, at least not until somebody enhances that mechanism to be
 * parallel-aware.
 */
-   if (IsolationIsSerializable())
+   if (IsolationIsSerializable() && !serializable_okay)
nworkers = 0;

That's a bit weird but I can't think of a problem with it.  Workers
run with MySerializableXact == InvalidSerializableXact, even though
they may have the snapshot of a SERIALIZABLE leader.  Hopefully soon
the restriction on SERIALIZABLE in parallel queries can be lifted
anyway, and then this could be removed.

Here are some thoughts on the overall approach.  Disclaimer:  I
haven't researched the state of the art in parallel sort or btree
builds.  But I gather from general reading that there are a couple of
well known approaches, and I'm sure you'll correct me if I'm off base
here.

1.  All participants: parallel sequential scan, repartition on the fly
so each worker has tuples in a non-overlapping range, sort, build
disjoint btrees; barrier; leader: merge disjoint btrees into one.

2.  All participants: parallel sequential scan, sort, spool to disk;
barrier; leader: merge spooled tuples and build btree.

This patch is doing the 2nd thing.  My understanding is that some
systems might choose to do that if they don't have or don't like the
table's statistics, since repartitioning for balanced load requires
carefully chosen ranges and is highly sensitive to distribution
problems.

It's pretty clear that approach 1 is a difficult project.  From my
research into dynamic repartitioning in the context of hash joins, I
can see that that infrastructure is a significant project in its own
right: subproblems include super efficient tuple exchange, buffering,
statistics/planning and dealing with/adapting to bad outcomes.  I also
suspect that repartitioning operators might need to be specialised for
different purposes like sorting vs hash joins, which may have
differing goals.  I think it's probably easy to build a slow dynamic
repartitioning mechanism that frequently results in terrible worst
case scenarios where you paid a fortune in IPC overheads and still
finished up with one worker pulling most of the whole load.  Without
range partitioning, I don't believe you can merge the resulting
non-disjoint btrees efficiently so you'd probably finish up writing a
complete new btree to mash them together.  As for merging disjoint
btrees, I assume there are ways to do a structure-preserving merge
that just rebuilds some internal pages and incorporates the existing
leaf pages directly, a bit like tree manipulation in functional
programming languages; that'll take some doing.

So I'm in favour of this patch, which is relatively simple and give us
faster index builds soon.  Eventually we might also be able to have
approach 1.  From what I gather, it's entirely possible that we might
still need 2 to fall back on in some cases.

Will you move the BufFile changes to a separate patch in the next revision?

Still testing and reviewing, more soon.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-01-31 Thread Kyotaro HORIGUCHI
Hello, I'll add the rebased version to the next CF.

At Fri, 20 Jan 2017 11:07:29 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170120.110729.107284864.horiguchi.kyot...@lab.ntt.co.jp>
> > > > - Delaying recycling a segment until the last partial record on it
> > > >   completes. This seems doable in page-wise (coarse resolution)
> > > >   but would cost additional reading of past xlog files (page
> > > >   header of past pages is required).
> > > 
> > > Hm, yes. That looks like the least invasive way to go. At least that
> > > looks more correct than the others.
> > 
> > The attached patch does that. Usually it reads page headers only
> > on segment boundaries, but once continuation record found (or
> > failed to read the next page header, that is, the first record on
> > the first page in the next segment has not been replicated), it
> > becomes to happen on every page boundary until non-continuation
> > page comes.
> > 
> > I leave a debug info (at LOG level) in the attached file shown on
> > every state change of keep pointer. At least for pgbench, the
> > cost seems ignorable.
> 
> I revised it. It became neater and less invasive.
> 
>  - Removed added keep from struct WalSnd. It is never referrenced
>from other processes. It is static variable now.
> 
>  - Restore keepPtr from replication slot on starting.

keepPtr is renamed to a more meaningful name restartLSN.

>  - Moved the main part to more appropriate position.

- Removed the debug print code.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 47f9f867305934cbc5fdbd9629e61be65353fc9c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 1 Feb 2017 16:07:22 +0900
Subject: [PATCH] Fix a bug of physical replication slot.

A physical-replication standby can stop just at the boundary of WAL
segments. restart_lsn of the slot on the master can be assumed to be
the same location. The last segment on the master will be removed
after some checkpoints for the case. If the first record of the next
segment is a continuation record, it is only on the master and its
beginning is only on the standby so the standby cannot restart because
the record to read is scattered to two sources.

This patch detains restart_lsn in the last sgement when the first page
of the next segment is a continuation record.
---
 src/backend/replication/walsender.c | 107 +---
 1 file changed, 100 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 5909b7d..cfbe70e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -188,6 +188,12 @@ static volatile sig_atomic_t replication_active = false;
 static LogicalDecodingContext *logical_decoding_ctx = NULL;
 static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
+/*
+ * Segment keep pointer for physical slots. Has a valid value only when it
+ * differs from the current flush pointer.
+ */
+static XLogRecPtr	   keepPtr = InvalidXLogRecPtr;
+
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndXLogSendHandler(SIGNAL_ARGS);
@@ -220,7 +226,7 @@ static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, Tran
 static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
 
-static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+static bool XLogRead(char *buf, XLogRecPtr startptr, Size count, bool noutfoundok);
 
 
 /* Initialize walsender process before entering the main command loop */
@@ -541,6 +547,9 @@ StartReplication(StartReplicationCmd *cmd)
 			ereport(ERROR,
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 (errmsg("cannot use a logical replication slot for physical replication";
+
+		/* Restore keepPtr from replication slot */
+		keepPtr = MyReplicationSlot->data.restart_lsn;
 	}
 
 	/*
@@ -556,6 +565,10 @@ StartReplication(StartReplicationCmd *cmd)
 	else
 		FlushPtr = GetFlushRecPtr();
 
+	/* Set InvalidXLogRecPtr if catching up */
+	if (keepPtr == FlushPtr)
+		keepPtr = InvalidXLogRecPtr;
+
 	if (cmd->timeline != 0)
 	{
 		XLogRecPtr	switchpoint;
@@ -777,7 +790,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 		count = flushptr - targetPagePtr;
 
 	/* now actually read the data, we know it's there */
-	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ, false);
 
 	return count;
 }
@@ -1563,7 +1576,7 @@ static void
 ProcessStandbyReplyMessage(void)
 {
 	XLogRecPtr	writePtr,
-flushPtr,
+flushPtr, oldFlushPtr,
 applyPtr;
 	bool		replyRequested;
 
@@ -1592,6 +1605,7 @@ ProcessStandbyReplyMessage(void)
 		WalSnd	   *walsnd = MyWalSnd;
 
 		SpinLockAcquire(>mutex);
+		oldFlushPtr = walsnd->flush;
 		

Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Rahila Syed
Hello Robert,

>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE.  It seems like one of those two things has
>to be wrong.

btps_pageStatus is to be set to BTPARALLEL_DONE only by the first worker
which is
performing scan for latest array key and which has encountered end of scan.
This is ensured by
following check in _bt_parallel_done(),

   if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
   btscan->btps_pageStatus != BTPARALLEL_DONE)

Thus, BTPARALLEL_DONE marks end of scan only for latest array keys. This
ensures that when any worker reaches
_bt_advance_array_keys() it advances latest scan which is marked by
btscan->btps_arrayKeyCount only when latest scan
has ended by checking if(btps_pageStatus == BTPARALLEL_DONE) in
_bt_advance_array_keys(). Otherwise, the worker just
advances its local so->arrayKeyCount.

I hope this provides some clarification.

Thank you,
Rahila Syed





On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas  wrote:

> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas 
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila 
> wrote:
> >> In spite of being careful, I missed reorganizing the functions in
> >> genam.h which I have done in attached patch.
> >
> > Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.
>
> Reviewing parallel_index_scan_v6.patch:
>
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
>
> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
>
> The hunk in indexam.c looks like a leftover that can be omitted.
>
> +/*
> + * Below flags are used to indicate the state of parallel scan.
>
> They aren't really flags any more; they're members of an enum.  I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values.  But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
>
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel
> btree scan
>
> Initializing -> initialize
>
> + *  btparallelrescan() -- reset parallel scan
>
> Previous two prototypes have one dash, this one has two.  Make it
> consistent, please.
>
> + * Ideally, we don't need to acquire spinlock here, but being
> + * consistent with heap_rescan seems to be a good idea.
>
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
>
> + * _bt_parallel_seize() -- returns the next block to be scanned for
> forward
> + *  scans and latest block scanned for backward scans.
>
> I think the description should be more like "Begin the process of
> advancing the scan to a new page.  Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()."  And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page.  We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
>
> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE.  It seems like one of those two things has
> to be wrong.
>
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
>
> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument.  It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward?  That
> is a little odd.  Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
>
> /* nope, keep going */
> vs.
> /*
>  * For parallel scans, get the last page scanned as it is quite
>  * possible that 

Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Abbas Butt
On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita 
wrote:

> On 2017/01/31 19:53, Abbas Butt wrote:
>
>> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
>> > wrote:
>> On 2017/01/31 18:24, Abbas Butt wrote:
>>
>
> Postgres_fdw optimizes remote queries by pushing down the where
>> clause.
>> This feature does not work consistently when the query is
>> executed from
>> within a pl/pgsql function. The optimization works when the
>> function
>> executes the query for the first 5 times, and fails afterwards.
>>
>
> I understand that this is because PostgreSQL starts using
>> generic plan
>> with pulled up where clause after the 5th invocation hoping that
>> it
>> would be faster since we have skiped planning the query on each
>> invocation, but in this case this decision is causing the query
>> to slow
>> down.
>>
>
> How should we fix this problem?
>>
>
> ANALYZE for the foreign table doesn't work?
>>
>
> No.
>>
>> analyze ts.tickets;
>> WARNING:  skipping "tickets" --- cannot analyze this foreign table
>> ANALYZE
>>
>
> How the foreign table ts.tickets is defined?
>

test=# \d ts.tickets
 Foreign table "ts.tickets"
 Column |  Type   | Modifiers | FDW Options
+-+---+-
 id | integer | not null  |
Server: mysql_server
FDW Options: (dbname 'msql_test_db', table_name 'tickets')

Its a foreign table, referring to table 'tickets' defined on MySQL.


> Best regards,
> Etsuro Fujita
>
>
>


-- 
-- 
*Abbas*
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.co m


*Follow us on Twitter*
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers
 and more



Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-31 Thread Konstantin Knizhnik


On 30.01.2017 19:21, Heikki Linnakangas wrote:

On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote:
Interesting.. What should happen here is that for the backend's own 
insertion slot, the "insertingat" value should be greater than the 
requested flush point ('upto' variable). That's because before 
GetXLogBuffer() calls AdvanceXLInsertBuffer(), it updates the 
backend's insertingat value, to the position that it wants to insert 
to. And AdvanceXLInsertBuffer() only calls 
WaitXLogInsertionsToFinish() with value smaller than what was passed 
as the 'upto' argument.



The comment to WaitXLogInsertionsToFinish says:

  * Note: When you are about to write out WAL, you must call this 
function
  * *before* acquiring WALWriteLock, to avoid deadlocks. This 
function might

  * need to wait for an insertion to finish (or at least advance to next
  * uninitialized page), and the inserter might need to evict an old WAL
buffer
  * to make room for a new one, which in turn requires WALWriteLock.

Which contradicts to the observed stack trace.


Not AFAICS. In the stack trace you showed, the backend is not holding 
WALWriteLock. It would only acquire it after the 
WaitXLogInsertionsToFinish() call finished.





Hmmm, may be I missed something.
I am not telling about WALBufMappingLock which is required after return 
from XLogInsertionsToFinish.
But about lock obtained by WALInsertLockAcquire  at line 946 in 
XLogInsertRecord.
It will be release at line  1021 by  WALInsertLockRelease(). But 
CopyXLogRecordToWAL is invoked with this lock granted.




This line in the stack trace is suspicious:


WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"


AdvanceXLInsertBuffer() should only ever call 
WaitXLogInsertionsToFinish() with an xlog position that points to a 
page bounary, but that upto value points to the middle of a page.


Perhaps the value stored in the stack trace is not what the caller 
passed, but it was updated because it was past the 'reserveUpto' 
value? That would explain the "request to flush past end
of generated WAL" notices you saw in the log. Now, why would that 
happen, I have no idea.


If you can and want to provide me access to the system, I could have a 
look myself. I'd also like to see if the attached additional 
Assertions will fire.


I really get this assertion failed:

ExceptionalCondition(conditionName = "!(OldPageRqstPtr <= upto || 
opportunistic)", errorType = "FailedAssertion", fileName = "xlog.c", 
lineNumber = 1917), line 54 in "assert.c"

(dbx) up
unnamed block in AdvanceXLInsertBuffer(upto = 147439056632, 
opportunistic = '\0'), line 1917 in "xlog.c"

(dbx) p OldPageRqstPtr
147439058944
(dbx) p upto
147439056632
(dbx) p opportunistic
'\0'

Also , in another run, I encountered yet another assertion failure:

ExceptionalCondition(conditionName = "!NewPageBeginPtr) / 8192) % 
(XLogCtl->XLogCacheBlck + 1)) == nextidx)", errorType = 
"FailedAssertion", fileName = "xlog.c", lineNumber = 1950), line 54 in 
"assert.c"


nextidx equals to 1456, while expected value is 1457.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera 
> wrote:

> > The simple_heap_update + CatalogUpdateIndexes pattern is getting
> > obnoxious.  How about creating something like catalog_heap_update which
> > does both things at once, and stop bothering each callsite with the WARM
> > stuff?
> 
> What I realised that there are really 2 patterns:
> 1. simple_heap_insert, CatalogUpdateIndexes
> 2. simple_heap_update, CatalogUpdateIndexes
> 
> There are only couple of places where we already have indexes open or have
> more than one tuple to update, so we call CatalogIndexInsert directly. What
> I ended up doing in the attached patch is add two new APIs which combines
> the two steps of each of these patterns. It seems much cleaner to me and
> also less buggy for future users. I hope I am not missing a reason not to
> do combine these steps.

CatalogUpdateIndexes was just added as a convenience function on top of
a very common pattern.  If we now have a reason to create a second one
because there are now two very common patterns, it seems reasonable to
have two functions.  I think I would commit the refactoring to create
these functions ahead of the larger WARM patch, since I think it'd be
bulky and largely mechanical.  (I'm going from this description; didn't
read your actual code.)  

> > +#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
> > +do { \
> > + AssertMacro(!((tup)->t_infomask2 & HEAP_LATEST_TUPLE)); \
> > + ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
> > +} while (0)
> 
> > Actually, I think this macro could just return the TID so that it can be
> > used as struct assignment, just like ItemPointerCopy does internally --
> > callers can do
> >ctid = HeapTupleHeaderGetNextTid(tup);
> 
> While I agree with your proposal, I wonder why we have ItemPointerCopy() in
> the first place because we freely copy TIDs as struct assignment. Is there
> a reason for that? And if there is, does it impact this specific case?

I dunno.  This macro is present in our very first commit d31084e9d1118b.
Maybe it's an artifact from the Lisp to C conversion.  Even then, we had
some cases of iptrs being copied by struct assignment, so it's not like
it didn't work.  Perhaps somebody envisioned that the internal details
could change, but that hasn't happened in two decades so why should we
worry about it now?  If somebody needs it later, it can be changed then.

> There is one issue that bothers me. The current implementation lacks
> ability to convert WARM chains into HOT chains. The README.WARM has some
> proposal to do that. But it requires additional free bit in tuple header
> (which we don't have) and of course, it needs to be vetted and implemented.
> If the heap ends up with many WARM tuples, then index-only-scans will
> become ineffective because index-only-scan can not skip a heap page, if it
> contains a WARM tuple. Alternate ideas/suggestions and review of the design
> are welcome!

t_infomask2 contains one last unused bit, and we could reuse vacuum
full's bits (HEAP_MOVED_OUT, HEAP_MOVED_IN), but that will need some
thinking ahead.  Maybe now's the time to start versioning relations so
that we can ensure clusters upgraded to pg10 do not contain any of those
bits in any tuple headers.


I don't have any ideas regarding the estate passed to recheck yet --
haven't looked at the callsites in detail.  I'll give this another look
later.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

2017-01-31 Thread Rahila Syed
Hello,

>Agreed, that it makes sense to consider only the number of pages to
>scan for computation of parallel workers.  I think for index scan we
>should consider both index and heap pages that need to be scanned
>(costing of index scan consider both index and heap pages).   I thin
>where considering heap pages matter more is when the finally selected
>rows are scattered across heap pages or we need to apply a filter on
>rows after fetching from the heap.  OTOH, we can consider just pages
>in the index as that is where mainly the parallelism works
IMO, considering just index pages will give a better estimate of work to be
done
in parallel. As the amount of work/number of pages divided amongst workers
is irrespective of
the number of heap pages scanned.

Thank you,
Rahila Syed




On Mon, Jan 30, 2017 at 2:52 PM, Amit Kapila 
wrote:

> On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas 
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila 
> wrote:
> >> parallel_index_opt_exec_support_v6 - Removed the usage of
> >> GatherSupportsBackwardScan.  Expanded the comments in
> >> ExecReScanIndexScan.
> >
> > I looked through this and in general it looks reasonable to me.
> > However, I did notice one thing that I think is wrong.  In the
> > parallel bitmap heap scan patch, the second argument to
> > compute_parallel_worker() is the number of pages that the parallel
> > scan is expected to fetch from the heap.  In this patch, it's the
> > total number of pages in the index.  The former seems to me to be
> > better, because the point of having a threshold relation size for
> > parallelism is that we don't want to use a lot of workers to scan a
> > small number of pages -- the distribution of work won't be even, and
> > the potential savings are limited.  If we've got a big index but are
> > using a very selective qual to pull out only one or a small number of
> > rows on a single page or a small handful of pages, we shouldn't
> > generate a parallel path for that.
> >
>
> Agreed, that it makes sense to consider only the number of pages to
> scan for computation of parallel workers.  I think for index scan we
> should consider both index and heap pages that need to be scanned
> (costing of index scan consider both index and heap pages).   I thin
> where considering heap pages matter more is when the finally selected
> rows are scattered across heap pages or we need to apply a filter on
> rows after fetching from the heap.  OTOH, we can consider just pages
> in the index as that is where mainly the parallelism works.  In the
> attached patch (parallel_index_opt_exec_support_v7.patch), I have
> considered both index and heap pages, let me know if you think some
> other way is better.  I have also prepared a separate independent
> patch (compute_index_pages_v1) on HEAD to compute index pages which
> can be used by parallel index scan. There is no change in
> parallel_index_scan (parallel btree scan) patch, so I am not attaching
> its new version.
>
> > Now, against that theory, the GUC that controls the behavior of
> > compute_parallel_worker() is called min_parallel_relation_size, which
> > might make you think that the decision is supposed to be based on the
> > whole size of some relation.  But I think that just means we need to
> > rename the GUC to something like min_parallel_scan_size.  Possibly we
> > also ought consider reducing the default value somewhat, because it
> > seems like both sequential and index scans can benefit even when
> > scanning less than 8MB.
> >
>
> Agreed, but let's consider it separately.
>
>
> The order in which patches needs to be applied:
> compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
> parallel_index_opt_exec_support_v7.patch
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1J%
> 3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: [HACKERS] patch: function xmltable

2017-01-31 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera :

> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
> 
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
> 
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

That's great numbers, kudos for the hard work here.  That will make for
a nice headline in pg10 PR materials.  But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.

> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
> 
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;

Nice one :-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Logical Replication and Character encoding

2017-01-31 Thread Shinoda, Noriyoshi
Hi hackers,

 I tried a committed Logical Replication environment. I found that replication 
between databases of different encodings did not convert encodings in character 
type columns. Is this behavior correct?

I expected that the character 0xe6bca2 (UTF-8) would be converted to the same 
character 0xb4c1 (EUC_JP). The example below replicates from the encoding UTF-8 
database to the encoding EUC_JP database. You can see that the character 
0xe6bca2 (UTF-8) is stored intact in the SUBSCRIPTION side database.

- PUBLICATION side (encode=UTF-8)
postgres=> \l postgres
  List of databases
   Name   |  Owner   | Encoding | Collate | Ctype | Access privileges
--+--+--+-+---+---
 postgres | postgres | UTF8 | C   | C |
(1 row)

postgres=> CREATE TABLE encode1(col1 NUMERIC PRIMARY KEY, col2 VARCHAR(10)) ; 
CREATE TABLE 
postgres=> CREATE PUBLICATION pub1 FOR TABLE encode1 ; 
CREATE PUBLICATION 
postgres=> INSERT INTO encode1 VALUES (1, '漢') ;  -- UTF-8 Character 0xe6bca2
INSERT 0 1

- SUBSCRIPTION side (encode=EUC_JP)
postgres=> \l postgres
  List of databases
   Name   |  Owner   | Encoding | Collate | Ctype | Access privileges
--+--+--+-+---+---
 postgres | postgres | EUC_JP   | C   | C |
(1 row)

postgres=> CREATE TABLE encode1(col1 NUMERIC PRIMARY KEY, col2 VARCHAR(10)) ; 
CREATE TABLE 
postgres=# CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=localhost 
port=5432' PUBLICATION pub1 ;
NOTICE:  created replication slot "sub1" on publisher 
CREATE SUBSCRIPTION 
postgres=# SELECT * FROM encode1 ;
ERROR:  invalid byte sequence for encoding "EUC_JP": 0xa2 
postgres=# SELECT heap_page_items(get_raw_page('encode1', 0)) ;
  heap_page_items
---
 (1,8152,1,33,565,0,0,"(0,1)",2,2306,24,,,"\\x0b0080010009e6bca2")  <- 
stored UTF-8 char 0xe6bca2
(1 row)

Snapshot:
  https://ftp.postgresql.org/pub/snapshot/dev/postgresql-snapshot.tar.gz 
2017-01-31 00:29:07
Operating System:
  Red Hat Enterprise Linux 7 Update 2 (x86-64)

Regards.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-01-31 Thread Ashutosh Bapat
>
> 0003-hj-refactor-memory-accounting-v4.patch:
>
> Modify the existing hash join code to work in terms of chunks when
> estimating and later tracking memory usage.  This is probably more
> accurate than the current tuple-based approach, because it tries to
> take into account the space used by chunk headers and the wasted space
> in chunks.  In practice the difference is probably small, but it's
> arguably more accurate;  I did this because I need chunk-based
> accounting the later patches.  Also, make HASH_CHUNK_SIZE the actual
> size of allocated chunks (ie the header information is included in
> that size so we allocate exactly 32KB, not 32KB + a bit, for the
> benefit of the dsa allocator which otherwise finishes up allocating
> 36KB).
>
I looked at this patch. I agree that it accounts the memory usage more
accurately. Here are few comments.

spaceUsed is defined with comment
SizespaceUsed;/* memory space currently used by tuples */

In ExecHashTableCreate(), although the space is allocated for buckets, no
tuples are yet inserted, so no space is used by the tuples, so going strictly
by the comment, spaceUsed should be 0 in that function. But I think the patch
is accounting the spaceUsed more accurately. Without this patch, the actual
allocation might cross spaceAllowed without being noticed. With this patch
that's not possible. Probably we should change the comment to say memory space
currently allocated. However, ExecHashIncreaseNumBatches() may change the
number of buckets; the patch does not seem to account for spaceUsed changes
because of that.

Without this patch ExecHashTableInsert() used to account for the space used by
a single tuple inserted. The patch moves this calculation in dense_alloc() and
accounts for out-of-bound allocation for larger tuples. That's good.

The change in ExecChooseHashTableSize() too looks fine.

In ExecHashTableReset(), do we want to update spacePeak while setting
spaceUsed.

While this patch tracks space usage more accurately, I am afraid we might be
overdoing it; a reason why we don't track space usage accurately now. But I
think I will leave it to be judged by someone who is more familiar with the
code and possibly has historical perspective.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of replication commands using printsimple

2017-01-31 Thread Robert Haas
On Tue, Jan 31, 2017 at 12:19 AM, Michael Paquier
 wrote:
> This is a follow up of the refactoring that has been discussed in the
> thread to increase the default size of WAL segments:
> https://www.postgresql.org/message-id/cab7npqq4hynrlq+w1jrryvsysoxuqa40pyb2uw5uqkkag4h...@mail.gmail.com
>
> The discussion has resulted in the creation of a84069d9 that has
> introduced a new DestReceiver method called printsimple that does not
> need any catalog access. After some investigation, I have noticed that
> a couple of messages used in the replication protocol could be
> refactored as well:
> - IDENTIFY_SYSTEM
> - TIMELINE_HISTORY
> - CREATE_REPLICATION_SLOT
> This results in the following code reduction:
>  3 files changed, 115 insertions(+), 162 deletions(-)
>
> A commit fest entry has been created:
> https://commitfest.postgresql.org/13/978/

Sorry, I have a little more nitpicking.  How about having
printsimple() use pq_sendcountedtext() instead of pq_sendint()
followed by pq_sendbytes(), as it does for TEXTOID?

Other than that, this looks fine to me now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-01-31 Thread Nikita Glukhov

On 31.01.2017 13:04, Kyotaro HORIGUCHI wrote:

The following comment,


/* Can any range from range_box to be overlower than this argument? */


This might be better to be using the same wording to its users,
for example the comment for overLeft4D is the following.

| /* Can any rectangle from rect_box does not extend the right of this 
argument? */

And the documentation uses the following sentence,

https://www.postgresql.org/docs/current/static/functions-geometry.html
| Does not extend to the right of?

So I think "Can any range from range_box does not extend the
upper of this argument?" is better than the overlower. What do
you think?


I think "does not extend the upper" is better than unclear "overlower" too.
So I've corrected this comment in the attached v03 patch.



I confirmed other functions in geo_spgist but found no bugs like this.


I found no bugs in other functions too.



The minimal bad example for the '&<' case is the following.

=# create table t (b box);
=# create index on t using spgist(b);
=# insert into t values ('((215, 305),(210,300))'::box);
=# select * from t where b &< '((100,200),(300,400))'::box;
  b
-
 (215,305),(210,300)
(1 row)

=# set enable_seqscan to false;
=# select * from t where b &< '((100,200),(300,400))'::box;
 b
---
(0 rows)


I don't know how you were able to reproduce this bug in 
spg_box_quad_inner_consistent()
using only one box because index tree must have at least one inner node (or 157 
boxes)
in order to spg_box_quad_inner_consistent() began to be called.  That's why 
existing
tests for box_ops didn't detect this bug: box_temp table has only 56 rows.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/geo_spgist.c b/src/backend/utils/adt/geo_spgist.c
index aacb340..c95ec1c 100644
--- a/src/backend/utils/adt/geo_spgist.c
+++ b/src/backend/utils/adt/geo_spgist.c
@@ -286,6 +286,14 @@ lower2D(RangeBox *range_box, Range *query)
 		FPlt(range_box->right.low, query->low);
 }
 
+/* Can any range from range_box does not extend higher than this argument? */
+static bool
+overLower2D(RangeBox *range_box, Range *query)
+{
+	return FPle(range_box->left.low, query->high) &&
+		FPle(range_box->right.low, query->high);
+}
+
 /* Can any range from range_box to be higher than this argument? */
 static bool
 higher2D(RangeBox *range_box, Range *query)
@@ -294,6 +302,14 @@ higher2D(RangeBox *range_box, Range *query)
 		FPgt(range_box->right.high, query->high);
 }
 
+/* Can any range from range_box does not extend lower than this argument? */
+static bool
+overHigher2D(RangeBox *range_box, Range *query)
+{
+	return FPge(range_box->left.high, query->low) &&
+		FPge(range_box->right.high, query->low);
+}
+
 /* Can any rectangle from rect_box be left of this argument? */
 static bool
 left4D(RectBox *rect_box, RangeBox *query)
@@ -305,7 +321,7 @@ left4D(RectBox *rect_box, RangeBox *query)
 static bool
 overLeft4D(RectBox *rect_box, RangeBox *query)
 {
-	return lower2D(_box->range_box_x, >right);
+	return overLower2D(_box->range_box_x, >left);
 }
 
 /* Can any rectangle from rect_box be right of this argument? */
@@ -319,7 +335,7 @@ right4D(RectBox *rect_box, RangeBox *query)
 static bool
 overRight4D(RectBox *rect_box, RangeBox *query)
 {
-	return higher2D(_box->range_box_x, >right);
+	return overHigher2D(_box->range_box_x, >left);
 }
 
 /* Can any rectangle from rect_box be below of this argument? */
@@ -333,7 +349,7 @@ below4D(RectBox *rect_box, RangeBox *query)
 static bool
 overBelow4D(RectBox *rect_box, RangeBox *query)
 {
-	return lower2D(_box->range_box_y, >left);
+	return overLower2D(_box->range_box_y, >right);
 }
 
 /* Can any rectangle from rect_box be above of this argument? */
@@ -347,7 +363,7 @@ above4D(RectBox *rect_box, RangeBox *query)
 static bool
 overAbove4D(RectBox *rect_box, RangeBox *query)
 {
-	return higher2D(_box->range_box_y, >right);
+	return overHigher2D(_box->range_box_y, >right);
 }
 
 /*
diff --git a/src/test/regress/expected/box.out b/src/test/regress/expected/box.out
index 5f8b945..251df93 100644
--- a/src/test/regress/expected/box.out
+++ b/src/test/regress/expected/box.out
@@ -455,3 +455,108 @@ EXPLAIN (COSTS OFF) SELECT * FROM box_temp WHERE f1 ~= '(20,20),(40,40)';
 
 RESET enable_seqscan;
 DROP INDEX box_spgist;
+--
+-- Test the SP-GiST index on the larger volume of data
+--
+CREATE TEMPORARY TABLE quad_box_tbl (b box);
+INSERT INTO quad_box_tbl
+	SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
+	FROM generate_series(1, 100) x,
+		 generate_series(1, 100) y;
+-- insert repeating data to test allTheSame
+INSERT INTO quad_box_tbl
+	SELECT '((200, 300),(210, 310))'
+	FROM generate_series(1, 1000);
+INSERT INTO quad_box_tbl
+	VALUES
+		(NULL),
+		(NULL),
+		('((-infinity,-infinity),(infinity,infinity))'),
+		('((-infinity,100),(-infinity,500))'),
+		('((-infinity,-infinity),(700,infinity))');
+CREATE INDEX 

Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Alvaro Herrera
Pavan Deolasee wrote:
> On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera 
> wrote:

> > CatalogUpdateIndexes was just added as a convenience function on top of
> > a very common pattern.  If we now have a reason to create a second one
> > because there are now two very common patterns, it seems reasonable to
> > have two functions.  I think I would commit the refactoring to create
> > these functions ahead of the larger WARM patch, since I think it'd be
> > bulky and largely mechanical.  (I'm going from this description; didn't
> > read your actual code.)
> 
> Sounds good. Should I submit that as a separate patch on current master?

Yes, please.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Abbas Butt
Sorry for the confusion.
ANALYZE works for the foreign table 'foreign_numbers'.
test=# analyze foreign_numbers;
ANALYZE
test=#



On Tue, Jan 31, 2017 at 5:04 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Jan 31, 2017 at 5:23 PM, Abbas Butt 
> wrote:
> >
> >
> > On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita <
> fujita.ets...@lab.ntt.co.jp>
> > wrote:
> >>
> >> On 2017/01/31 19:53, Abbas Butt wrote:
> >>>
> >>> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
> >>> >
> wrote:
> >>> On 2017/01/31 18:24, Abbas Butt wrote:
> >>
> >>
> >>> Postgres_fdw optimizes remote queries by pushing down the where
> >>> clause.
> >>> This feature does not work consistently when the query is
> >>> executed from
> >>> within a pl/pgsql function. The optimization works when the
> >>> function
> >>> executes the query for the first 5 times, and fails afterwards.
> >>
> >>
> >>> I understand that this is because PostgreSQL starts using
> >>> generic plan
> >>> with pulled up where clause after the 5th invocation hoping
> that
> >>> it
> >>> would be faster since we have skiped planning the query on each
> >>> invocation, but in this case this decision is causing the query
> >>> to slow
> >>> down.
> >>
> >>
> >>> How should we fix this problem?
> >>
> >>
> >>> ANALYZE for the foreign table doesn't work?
> >>
> >>
> >>> No.
> >>>
> >>> analyze ts.tickets;
> >>> WARNING:  skipping "tickets" --- cannot analyze this foreign table
> >>> ANALYZE
> >>
> >>
> >> How the foreign table ts.tickets is defined?
> >
> >
> > test=# \d ts.tickets
> >  Foreign table "ts.tickets"
> >  Column |  Type   | Modifiers | FDW Options
> > +-+---+-
> >  id | integer | not null  |
> > Server: mysql_server
> > FDW Options: (dbname 'msql_test_db', table_name 'tickets')
> >
> > Its a foreign table, referring to table 'tickets' defined on MySQL.
> >
> Isn't your original complaint about postgres_fdw? You can not tickets,
> which is a mongo_fdw foreign table, is probably because mongo_fdw has
> not implemented analyze FDW routine.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
-- 
*Abbas*
Architect

Ph: 92.334.5100153
Skype ID: gabbasb
www.enterprisedb.co m


*Follow us on Twitter*
@EnterpriseDB

Visit EnterpriseDB for tutorials, webinars, whitepapers
 and more



Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Daniel Verite
Tom Lane wrote:

> Moreover, the committed patch is inconsistent in that it forbids
> only one of the above.  Why is it okay to treat unset as "off",
> but not okay to treat the default empty-string value as "on"?

Treating unset (NULL in the value) as "off" comes from the fact 
that except AUTOCOMMIT, our built-in variables are not initialized
with a default value. For instance we call this in EstablishVariableSpace();
  SetVariableAssignHook(pset.vars, "ON_ERROR_STOP", on_error_stop_hook);
then on_error_stop_hook is called with NULL as the value
then ParseVariableBool(NULL, "ON_ERROR_STOP", _error_stop)
is what initializes pset.on_error_stop to false.

The same happens if/when the variable is unset. Again the hook is called
with NULL, and it sets back the pset.* variable in a hardcoded default state,
which is false for all booleans.

Incidentally I want to suggest to change that, so that all variables
should be initialized with a non-NULL value right from the start, and the
value would possibly come to NULL only if it's unset.
This would allow the hook to distinguish between initialization and
unsetting, which in turn will allow it to deny the \unset in the
cases when it doesn't make any sense conceptually (like AUTOCOMMIT).

Forcing values for our built-in variables would also avoid the following:

=# \echo :ON_ERROR_STOP
:ON_ERROR_STOP

Even if the variable ON_ERROR_STOP does exist in the VariableSpace
and has a hook and there's an initialized corresponding pset.on_error_stop,
from the user's viewpoint, it's as if the variable doesn't exist
until they intialize it explicitly.
I suggest that it doesn't make much sense and it would be better
to have that instead right from the start:

=# \echo :ON_ERROR_STOP
off

> One possible compromise that would address your concern about display
> is to modify the hook API some more so that variable hooks could actually
> substitute new values.  Then for example the bool-variable hooks could
> effectively replace "\set AUTOCOMMIT" by "\set AUTOCOMMIT on" and
> "\unset AUTOCOMMIT" by "\set AUTOCOMMIT off", ensuring that interrogation
> of their values always produces sane results.

Agreed, that looks like a good compromise.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] One-shot expanded output in psql using \G

2017-01-31 Thread Daniel Verite
Stephen Frost wrote:

> That's not how '\dx' works, as I pointed out, so I don't see having the
> second character being 'x' to imply "\x mode" makes sense.

\gx means "like \g but output with expanded display"
It turns out that it's semantically close to "\g with \x" so
I refered to it like that as a shortcut upthread, my fault.
It was never meant to establish a precedent that combining
two letters would mean "do the first one-letter command and the
second as a sub-command" which indeed woud be inconsistent with
the existing \ef, \sf, \sv, \d[*] and so on.

> I can't recall ever using the other formatting toggles (aligned, HTML,
> and tuples only) before in interactive sessions, except (rarely) with
> \o.

\a is handy to read sizeable chunks of text in fields that 
contain newlines, and personally I need it on a regular basis
in interactive sessions. It depends on the kind of data you have to work
with.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Bug fix] PQsendQuery occurs error when target_session_attrs is set to read-write

2017-01-31 Thread Higuchi, Daisuke
Hello,

This this is my first posting to the mailing list.

I am interested in multiple hosts of libpq [1], then I found the bug in this 
feature. 
When I set "target_session_attrs" to "any" and call PQsendQuery, my application 
is succeeded.
However, when I set "target_session_attrs" to "read-write" and call 
PQsendQuery, "another command is already in progress" is occurred. 
I attached the test application to reproduce this problem. 

I think this is because PQgetResult is not called until PQgetResult has 
returned a null pointer. 
So, I attached the patch for fix this. 

[1] 
https://www.postgresql.org/message-id/flat/20150818041850.ga5...@wagner.pp.ru#20150818041850.ga5...@wagner.pp.ru

Regards, 
Daisuke Higuchi

/*
 * gcc multiple_hosts_test.c -lpq -I/home/postgres/work/pgsql10/include 
-L/home/postgres/work/pgsql10/lib
 * if target_session_attrs=read-write,  "another command is already in 
progress" is occured
 */

#include 
#include 
#include 
#ifdef WIN32
#include 
#endif
#include "libpq-fe.h"

int main( int argc, char *argv[])
{
int i=0,rtn;
PGconn *conn;
PGresult *r;
const char *keywords[5];
const char *values[5];

/*
** Connect to postgres
*/
keywords[0] = "dbname";
values[0] = "postgres";
keywords[1] = "port";
values[1] = (char *)getenv("PGPORT"); /* export PGPORT=5432,5433 */
keywords[1] = "host";
values[1] = (char *)getenv("PGHOST"); /* export 
PGHOST=localhost,localhost */
keywords[2] = "target_session_attrs";
values[2] = "read-write";
keywords[3] = NULL;
values[3] = NULL;
conn = PQconnectdbParams((const char * const*), (const char * 
const*), 1);
if (PQstatus(conn) != CONNECTION_OK)
{
printf("Connection to database failed: %s 
\n",PQerrorMessage(conn));
goto ERR;
}
printf("host=%s,port=%s\n",PQhost(conn),PQport(conn));

/*
** Asynchronous statement
*/
printf("-- SHOW transaction_read_only --\n");
rtn = PQsendQuery(conn, "SHOW transaction_read_only");
if (rtn != 1)
{
printf( "SHOW command failed: %s", PQerrorMessage(conn));
goto ERR;
}
r = PQgetResult(conn);
printf("SHOW transaction_read_only:%s\n",PQgetvalue(r, 0, 0));
r = PQgetResult(conn);
PQclear(r);

/*
** Disconnect
*/
PQfinish(conn);
printf("Success!\n");

return(0);

ERR:
PQfinish(conn);
printf("Failed!\n");

return(1);
}




PQsendQuery_for_target_session_attrs_v1.patch
Description: PQsendQuery_for_target_session_attrs_v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-31 Thread Haribabu Kommi
On Tue, Jan 31, 2017 at 4:13 PM, Vitaly Burovoy 
wrote:

> On 1/27/17, Haribabu Kommi  wrote:
> > Updated patches are attached.
>
>
> Hello,
>
> I'm almost ready to mark it as Ready for committer.
> The final round.


Thanks for the review.


> 1.
> >+DATA(insert OID = 774 ( macaddr8 ...
> >+#define MACADDR8OID 972
> What does this number (972) mean? I think it should be 774, the same
> number as was used in the type definition.
>

I think I might have missed to update during OID number changes.
Fixed.


>
> Since there is an editing required, please, fix whitespaces:
> 2.
> >+DATA(insert OID = 3371 (  403 macaddr8_ops
>   PGNSP PGUID ));
> >+DATA(insert OID = 3372 (  405 macaddr8_ops
>   PGNSP PGUID ));
>
> only one (not three) tab before "PGNSP" should be used (see other rows
> around yours: "OID = 1983", "OID = 1991" etc).
>

Corrected.


> 3.
> >diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
> some new rows have two tabs instead of eight spaces.
>

Corrected.


> 4.
> Some other files need to be fixed by pgindent (it helps supporting in
> the future):
> contrib/btree_gist/btree_macaddr8.c (a lot of rows)
> src/include/utils/inet.h  (I have no idea why it changes indentation
> to tabs for macaddr8 whereas it leaves spaces for macaddr)
>

Done.

5.
> src/backend/utils/adt/network.c
> pg_indent makes it uglier. I've just found how to write the line for it:
> res *= ((double) 256) * 256 * 256 * 256;


Done.

Updated patches are attached.


Regards,
Hari Babu
Fujitsu Australia


contrib_macaddr8_3.patch
Description: Binary data


mac_eui64_support_10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An issue in remote query optimization

2017-01-31 Thread Ashutosh Bapat
On Tue, Jan 31, 2017 at 5:23 PM, Abbas Butt  wrote:
>
>
> On Tue, Jan 31, 2017 at 3:15 AM, Etsuro Fujita 
> wrote:
>>
>> On 2017/01/31 19:53, Abbas Butt wrote:
>>>
>>> On Tue, Jan 31, 2017 at 2:25 AM, Etsuro Fujita
>>> > wrote:
>>> On 2017/01/31 18:24, Abbas Butt wrote:
>>
>>
>>> Postgres_fdw optimizes remote queries by pushing down the where
>>> clause.
>>> This feature does not work consistently when the query is
>>> executed from
>>> within a pl/pgsql function. The optimization works when the
>>> function
>>> executes the query for the first 5 times, and fails afterwards.
>>
>>
>>> I understand that this is because PostgreSQL starts using
>>> generic plan
>>> with pulled up where clause after the 5th invocation hoping that
>>> it
>>> would be faster since we have skiped planning the query on each
>>> invocation, but in this case this decision is causing the query
>>> to slow
>>> down.
>>
>>
>>> How should we fix this problem?
>>
>>
>>> ANALYZE for the foreign table doesn't work?
>>
>>
>>> No.
>>>
>>> analyze ts.tickets;
>>> WARNING:  skipping "tickets" --- cannot analyze this foreign table
>>> ANALYZE
>>
>>
>> How the foreign table ts.tickets is defined?
>
>
> test=# \d ts.tickets
>  Foreign table "ts.tickets"
>  Column |  Type   | Modifiers | FDW Options
> +-+---+-
>  id | integer | not null  |
> Server: mysql_server
> FDW Options: (dbname 'msql_test_db', table_name 'tickets')
>
> Its a foreign table, referring to table 'tickets' defined on MySQL.
>
Isn't your original complaint about postgres_fdw? You can not tickets,
which is a mongo_fdw foreign table, is probably because mongo_fdw has
not implemented analyze FDW routine.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-31 Thread Pavan Deolasee
On Tue, Jan 31, 2017 at 7:21 PM, Alvaro Herrera 
wrote:

> Pavan Deolasee wrote:
> > On Thu, Jan 26, 2017 at 2:38 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> > wrote:
>
> > > The simple_heap_update + CatalogUpdateIndexes pattern is getting
> > > obnoxious.  How about creating something like catalog_heap_update which
> > > does both things at once, and stop bothering each callsite with the
> WARM
> > > stuff?
> >
> > What I realised that there are really 2 patterns:
> > 1. simple_heap_insert, CatalogUpdateIndexes
> > 2. simple_heap_update, CatalogUpdateIndexes
> >
> > There are only couple of places where we already have indexes open or
> have
> > more than one tuple to update, so we call CatalogIndexInsert directly.
> What
> > I ended up doing in the attached patch is add two new APIs which combines
> > the two steps of each of these patterns. It seems much cleaner to me and
> > also less buggy for future users. I hope I am not missing a reason not to
> > do combine these steps.
>
> CatalogUpdateIndexes was just added as a convenience function on top of
> a very common pattern.  If we now have a reason to create a second one
> because there are now two very common patterns, it seems reasonable to
> have two functions.  I think I would commit the refactoring to create
> these functions ahead of the larger WARM patch, since I think it'd be
> bulky and largely mechanical.  (I'm going from this description; didn't
> read your actual code.)
>

Sounds good. Should I submit that as a separate patch on current master?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] sequence data type

2017-01-31 Thread Peter Eisentraut
And here is a rebased patch for the original feature.  I think this
addresses all raised concerns and suggestions now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From ce2680ef072a9a4dc2cb879a70610d71ad24 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 27 Jan 2017 23:52:53 -0500
Subject: [PATCH v5] Add CREATE SEQUENCE AS  clause

This stores a data type, required to be an integer type, with the
sequence.  The sequences min and max values default to the range
supported by the type, and they cannot be set to values exceeding that
range.  The internal implementation of the sequence is not affected.

Change the serial types to create sequences of the appropriate type.
This makes sure that the min and max values of the sequence for a serial
column match the range of values supported by the table column.  So the
sequence can no longer overflow the table column.

This also makes monitoring for sequence exhaustion/wraparound easier,
which currently requires various contortions to cross-reference the
sequences with the table columns they are used with.

This commit also effectively reverts the pg_sequence column reordering
in f3b421da5f4addc95812b9db05a24972b8fd9739, because the new seqtypid
column allows us to fill the hole in the struct and create a more
natural overall column ordering.
---
 doc/src/sgml/catalogs.sgml  |  14 +++-
 doc/src/sgml/information_schema.sgml|   4 +-
 doc/src/sgml/ref/alter_sequence.sgml|  30 ++--
 doc/src/sgml/ref/create_sequence.sgml   |  36 ++
 src/backend/catalog/information_schema.sql  |   4 +-
 src/backend/commands/sequence.c |  89 +---
 src/backend/parser/gram.y   |   6 +-
 src/backend/parser/parse_utilcmd.c  |   2 +-
 src/bin/pg_dump/pg_dump.c   | 103 +++-
 src/bin/pg_dump/t/002_pg_dump.pl|   2 +
 src/include/catalog/catversion.h|   2 +-
 src/include/catalog/pg_proc.h   |   2 +-
 src/include/catalog/pg_sequence.h   |   8 ++-
 src/test/modules/test_pg_dump/t/001_base.pl |   1 +
 src/test/regress/expected/sequence.out  |  49 +
 src/test/regress/sql/sequence.sql   |  13 
 16 files changed, 266 insertions(+), 99 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 086fafc694..f57937a17a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5774,10 +5774,11 @@ pg_sequence Columns
  
 
  
-  seqcycle
-  bool
+  seqtypid
+  oid
+  pg_type.oid
   
-  Whether the sequence cycles
+  Data type of the sequence
  
 
  
@@ -5814,6 +5815,13 @@ pg_sequence Columns
   
   Cache size of the sequence
  
+
+ 
+  seqcycle
+  bool
+  
+  Whether the sequence cycles
+ 
 

   
diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index c43e325d06..a3a19ce8ce 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -4653,9 +4653,7 @@ sequences Columns
   data_type
   character_data
   
-   The data type of the sequence.  In
-   PostgreSQL, this is currently always
-   bigint.
+   The data type of the sequence.
   
  
 
diff --git a/doc/src/sgml/ref/alter_sequence.sgml b/doc/src/sgml/ref/alter_sequence.sgml
index 3b52e875e3..252a668189 100644
--- a/doc/src/sgml/ref/alter_sequence.sgml
+++ b/doc/src/sgml/ref/alter_sequence.sgml
@@ -23,7 +23,9 @@
 
  
 
-ALTER SEQUENCE [ IF EXISTS ] name [ INCREMENT [ BY ] increment ]
+ALTER SEQUENCE [ IF EXISTS ] name
+[ AS data_type ]
+[ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
 [ START [ WITH ] start ]
 [ RESTART [ [ WITH ] restart ] ]
@@ -81,6 +83,26 @@ Parameters
  
 
  
+  data_type
+  
+   
+The optional
+clause AS data_type
+changes the data type of the sequence.  Valid types are
+are smallint, integer,
+and bigint.
+   
+
+   
+Note that changing the data type does not automatically change the
+minimum and maximum values.  You can use the clauses NO
+MINVALUE and NO MAXVALUE to adjust the
+minimum and maximum values to the range of the new data type.
+   
+  
+ 
+
+ 
   increment
   

@@ -102,7 +124,7 @@ Parameters
 class="parameter">minvalue determines
 the minimum value a sequence can generate. If NO
 MINVALUE is specified, the defaults of 1 and
--263 for ascending and descending sequences,
+the minimum value of the data type for ascending and descending sequences,
 respectively, will be used.  If neither option is 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2017-01-31 Thread Robert Haas
On Thu, Jan 26, 2017 at 5:53 AM, Emre Hasegeli  wrote:
> Though, I know the community is against behaviour changing GUCs.  I
> will not spend more time on this, before I get positive feedback from
> others.

As if on cue, let me say that a behavior-changing GUC sounds like a
terrible idea to me.  It's almost never good when a GUC can cause the
same queries to return answers in different sessions, and even worse,
it seems like the GUC might have the effect of letting us build
indexes that are only valid for the value of the GUC with which they
were built.

Backing up a bit here, have we lost track of the problem that we're
trying to solve?  Tom gave his opinion here:

https://www.postgresql.org/message-id/3895.1464791...@sss.pgh.pa.us

But I don't see that the latest patch I can find does anything to fix
that.  Now maybe that's not the problem that Emre is trying to solve,
but then it is not very clear to me what problem he IS trying to
solve.  And I think Kyotaro Horiguchi is trying to solve yet another
problem which is again different.  So IMHO the first task here is to
agree on a clear statement of what we'd like to fix, and then, given a
patch, we can judge whether it's fixed.

Maybe I'm being dumb here and it's clear to you guys what the issues
under discussion are.  If so, apologies for that, but the thread has
gotten too theoretical for me and I can't figure out what the
top-level concern is any more.  I believe we all agree these macros
are bad, but there seems to be no agreement that I can discern on what
would be better or why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2017-01-31 Thread Robert Haas
On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh
 wrote:
> I've attached the patch with the modified changes. PFA.

Can this patch check contrib/bloom?

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+/*
+ * For GIN_DELETED page, the page is initialized to empty.
+ * Hence mask everything.
+ */
+if (opaque->flags & GIN_DELETED)
+memset(page_norm, MASK_MARKER, BLCKSZ);
+else
+mask_unused_space(page_norm);

If the page is initialized to empty, why do we need to mask
anything/everything?  Anyway, it doesn't seem right to mask the
GinPageOpaque structure itself. Or at least it doesn't seem right to
mask the flags word.

+/*
+ * For gist leaf pages, mask some line pointer bits, particularly
+ * those marked as used on a master and unused on a standby.
+ */

Comment doesn't explain why we need to do this.

+if (!HeapTupleHeaderXminFrozen(page_htup))
+page_htup->t_infomask |= HEAP_XACT_MASK;
+else
+page_htup->t_infomask |= HEAP_XMAX_COMMITTED |
HEAP_XMAX_INVALID;

Comment doesn't address this logic.  Also, the "else" case shouldn't
exist at all, I think.

+/*
+ * For a speculative tuple, the content of t_ctid is conflicting
+ * between the backup page and current page. Hence, we set it
+ * to the current block number and current offset.
+ */

Why does it differ?  Is that a bug?

+/*
+ * Mask everything on a DELETED page since it will be re-initialized
+ * during replay.
+ */
+if ((maskopaq->btpo_flags & BTP_DELETED) != 0)
+{
+/* Mask Page Content */
+memset(page_norm + SizeOfPageHeaderData, MASK_MARKER,
+   BLCKSZ - SizeOfPageHeaderData);
+
+/* Mask pd_lower and pd_upper */
+memset(&((PageHeader) page_norm)->pd_lower, MASK_MARKER,
+   sizeof(uint16));
+memset(&((PageHeader) page_norm)->pd_upper, MASK_MARKER,
+   sizeof(uint16));

This isn't consistent with the GIN_DELETE case - it is more selective
about what it masks.  Probably that logic should be adapted to look
more like this.

+/*
+ * Mask some line pointer bits, particularly those marked as
+ * used on a master and unused on a standby.
+ */

Comment (still) doesn't explain why we need to do this.

+/*
+ * During replay of a btree page split, we don't set the BTP_SPLIT_END
+ * flag of the right sibling and initialize th cycle_id to 0 for the same
+ * page.
+ */

A reference to the name of the redo function might be helpful here
(and in some other places), unless it's just ${AMNAME}_redo.  "th" is
a typo for "the".

+appendStringInfoString(buf, " (full page
image, apply)");
+else
+appendStringInfoString(buf, " (full page image)");

How about "(full page image)" and "(full page image, for WAL verification)"?

Similarly in XLogDumpDisplayRecord, I think we should assume that
"FPW" means what it has always meant, and leave that output alone.
Instead, distinguish the WAL-consistency-checking case when it
happens, e.g. "(FPW for consistency check)".

+checkConsistency(XLogReaderState *record)

How about CheckXLogConsistency()?

+ * If needs_backup is true or wal consistency check is enabled for

...or WAL checking is enabled...

+ * If WAL consistency is enabled for the resource manager of

If WAL consistency checking is enabled...

+ * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag

with the BKPIMAGE_APPLY flag

- * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended
- * with all-zeroes pages up to the referenced block number.  In
- * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value
+ * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag
+ * is not set for the backup block, the relation is extended with all-zeroes
+ * pages up to the referenced block number.

OK, I'm puzzled by this.  Surely we don't want the WAL consistency
checking facility to cause the relation to be extended like this.  And
I don't see why it would, because the WAL consistency checking happens
after the page changes have already been applied.  I wonder if this
hunk is in error and should be dropped.

+PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX);
+phdr->pd_prune_xid = PG_UINT32_MAX;
+phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES;
+phdr->pd_flags |= PD_ALL_VISIBLE;
+#define MASK_MARKER0xFF
(and many others)

Why do we mask by setting bits rather than clearing bits?  My
intuition would have been to zero things we want to mask, rather than
setting them 

Re: [HACKERS] Improvements in psql hooks for variables

2017-01-31 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> One possible compromise that would address your concern about display
>> is to modify the hook API some more so that variable hooks could actually
>> substitute new values.  Then for example the bool-variable hooks could
>> effectively replace "\set AUTOCOMMIT" by "\set AUTOCOMMIT on" and
>> "\unset AUTOCOMMIT" by "\set AUTOCOMMIT off", ensuring that interrogation
>> of their values always produces sane results.

> Agreed, that looks like a good compromise.

Attached is a draft patch for that.  I chose to make a second hook rather
than complicate the assign hook API, mainly because it allows more code
sharing --- all the bool vars can share the same substitute hook, and
so can the three-way vars as long as "on" and "off" are the appropriate
substitutes.

I only touched the behavior for bool vars here, but if people like this
solution it could be fleshed out to apply to all the built-in variables.

Maybe we should merge SetVariableSubstituteHook and SetVariableAssignHook
into one function?

regards, tom lane

diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 0574b5b..985cfcb 100644
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*** showVersion(void)
*** 777,787 
  
  
  /*
!  * Assign hooks for psql variables.
   *
   * This isn't an amazingly good place for them, but neither is anywhere else.
   */
  
  static bool
  autocommit_hook(const char *newval)
  {
--- 777,804 
  
  
  /*
!  * Substitute hooks and assign hooks for psql variables.
   *
   * This isn't an amazingly good place for them, but neither is anywhere else.
   */
  
+ static char *
+ bool_substitute_hook(char *newval)
+ {
+ 	if (newval == NULL)
+ 	{
+ 		/* "\unset FOO" becomes "\set FOO off" */
+ 		newval = pg_strdup("off");
+ 	}
+ 	else if (newval[0] == '\0')
+ 	{
+ 		/* "\set FOO" becomes "\set FOO on" */
+ 		pg_free(newval);
+ 		newval = pg_strdup("on");
+ 	}
+ 	return newval;
+ }
+ 
  static bool
  autocommit_hook(const char *newval)
  {
*** EstablishVariableSpace(void)
*** 1002,1015 
--- 1019,1039 
  {
  	pset.vars = CreateVariableSpace();
  
+ 	SetVariableSubstituteHook(pset.vars, "AUTOCOMMIT", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "AUTOCOMMIT", autocommit_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ON_ERROR_STOP", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ON_ERROR_STOP", on_error_stop_hook);
+ 	SetVariableSubstituteHook(pset.vars, "QUIET", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "QUIET", quiet_hook);
+ 	SetVariableSubstituteHook(pset.vars, "SINGLELINE", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "SINGLELINE", singleline_hook);
+ 	SetVariableSubstituteHook(pset.vars, "SINGLESTEP", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "SINGLESTEP", singlestep_hook);
  	SetVariableAssignHook(pset.vars, "FETCH_COUNT", fetch_count_hook);
  	SetVariableAssignHook(pset.vars, "ECHO", echo_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ECHO_HIDDEN", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ECHO_HIDDEN", echo_hidden_hook);
+ 	SetVariableSubstituteHook(pset.vars, "ON_ERROR_ROLLBACK", bool_substitute_hook);
  	SetVariableAssignHook(pset.vars, "ON_ERROR_ROLLBACK", on_error_rollback_hook);
  	SetVariableAssignHook(pset.vars, "COMP_KEYWORD_CASE", comp_keyword_case_hook);
  	SetVariableAssignHook(pset.vars, "HISTCONTROL", histcontrol_hook);
diff --git a/src/bin/psql/variables.c b/src/bin/psql/variables.c
index 91e4ae8..61c4ccc 100644
*** a/src/bin/psql/variables.c
--- b/src/bin/psql/variables.c
*** CreateVariableSpace(void)
*** 52,57 
--- 52,58 
  	ptr = pg_malloc(sizeof *ptr);
  	ptr->name = NULL;
  	ptr->value = NULL;
+ 	ptr->substitute_hook = NULL;
  	ptr->assign_hook = NULL;
  	ptr->next = NULL;
  
*** ParseVariableBool(const char *value, con
*** 101,111 
  	size_t		len;
  	bool		valid = true;
  
  	if (value == NULL)
! 	{
! 		*result = false;		/* not set -> assume "off" */
! 		return valid;
! 	}
  
  	len = strlen(value);
  
--- 102,110 
  	size_t		len;
  	bool		valid = true;
  
+ 	/* Treat "unset" as an empty string, which will lead to error below */
  	if (value == NULL)
! 		value = "";
  
  	len = strlen(value);
  
*** ParseVariableNum(const char *value, cons
*** 152,159 
  	char	   *end;
  	long		numval;
  
  	if (value == NULL)
! 		return false;
  	errno = 0;
  	numval = strtol(value, , 0);
  	if (errno == 0 && *end == '\0' && end != value && numval == (int) numval)
--- 151,160 
  	char	   *end;
  	long		numval;
  
+ 	/* Treat "unset" as an empty string, which will lead to error below */
  	if (value == NULL)
! 		value = "";
! 
  	errno = 0;
  	numval = strtol(value, , 0);
  	if (errno == 0 && *end == '\0' && end != value && numval == (int) numval)
*** SetVariable(VariableSpace space, const c
*** 

  1   2   >