Re: [HACKERS] BLK_DONE state in XLogReadBufferForRedoExtended

2017-10-23 Thread Amit Kapila
On Tue, Oct 17, 2017 at 7:53 AM, Michael Paquier
 wrote:
> On Mon, Oct 16, 2017 at 9:50 PM, Amit Kapila  wrote:
>> If above analysis is correct, then I think we can say that row state
>> in a page will be same during recovery as it was when the original
>> operation was performed if the full_page_writes are enabled. I am not
>> sure how much this can help in current heap format, but this can help
>> in zheap (undo based heap).
>
> If I understood that correctly, that looks like a sane assumption. For
> REGBUF_NO_IMAGE you may need to be careful though with undo
> operations.
>

Right, but as of now, we don't need to use this assumption with REGBUF_NO_IMAGE.

>> In zheap, we are writing complete tuple for Delete operation in undo
>> so that we can reclaim the corresponding tuple space as soon as the
>> deleting transaction is committed.  Now, during recovery, we have to
>> generate the complete undo record (which includes the entire tuple)
>> and for that ideally, we should write the complete tuple in WAL, but
>> instead of that, I think we can regenerate it from the original page.
>> This is only applicable when full_page_writes are enabled, otherwise,
>> a complete tuple is required in WAL.
>
> Yeah, you should really try to support both modes as well.
>

I also think so.

> Fortunately, it is possible to know if full page writes are enforced
> at the moment a record is assembled and inserted, so you could rely on
> that.
>

Yeah, but we actually need to know whether full page writes are
enforced while forming the record (something like in log_heap_update).
Now, ideally, to read the flags XLogCtlInsert->fullPageWrites or
XLogCtlInsert->forcePageWrites, we need an insertion lock as we do in
XLogInsertRecord.  However, I think we don't need an insertion lock to
read the values for this purpose; rather, we can use
GetFullPageWriteInfo, which doesn't need a lock.  The reason is that if
the value of doPageWrites is true while forming and assembling the WAL
record, then we will include the copy of the page even if the value
changes in XLogInsertRecord.  OTOH, if it is false while forming and
assembling the WAL record, then we would have to include the undo tuple
in the WAL record anyway, which avoids the dependency on the full page
image, so even if doPageWrites changes to true in XLogInsertRecord, we
don't need to worry.
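
To make that concrete, the below is an untested sketch of what I have in
mind while forming the record; buffer, old_tuple, and old_tuple_len are
placeholder names from the hypothetical zheap delete routine:

    XLogRecPtr  RedoRecPtr;
    bool        doPageWrites;

    GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);

    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
    if (!doPageWrites)
    {
        /*
         * We can't count on a page image being available at redo time,
         * so log the complete old tuple; the undo record can then be
         * regenerated from WAL alone.
         */
        XLogRegisterBufData(0, (char *) old_tuple, old_tuple_len);
    }
    /* else: redo regenerates the undo tuple from the page contents */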

>> I am not sure how much above makes sense to anyone without a detailed
>> explanation, but I thought I should give some background on why I
>> asked this question.  However, if anybody needs more explanation or
>> sees any fault in above understanding, please let me know.
>
> Thanks for clarifying, I was wondering the reason behind the question
> as well. It is the second time that I see the word zheap on -hackers,
> and the first time was no longer than 2 days ago by Robert.
>

This is a big undertaking and it will take time to reach a stage where
the whole project can be shared, but some of the important design
points that are closely tied to existing technology can be discussed
earlier to avoid building on wrong assumptions.

Thanks for having a discussion on this topic.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in adjust_appendrel_attrs_mutator

2017-10-23 Thread Amit Langote
On 2017/10/24 0:22, Tom Lane wrote:
> Amit Langote  writes:
>> On 2017/10/23 2:07, Tom Lane wrote:
>>> Hmm.  adjust_appendrel_attrs() thinks it's only used after conversion
>>> of sublinks to subplans, but this is a counterexample.  I wonder if
>>> that assumption was ever correct?  Or maybe we need to rethink what
>>> it does when recursing into RTE subqueries.
> 
>> Looking at the backtrace, the crashing SubLink seems to have been
>> referenced from a PlaceHolderVar that is in turn referenced by
>> joinaliasvars of a JOIN rte in query->rtable.
> 
> Right.  The core of the issue is that joinaliasvars lists don't get run
> through preprocess_expression, so (among other things) any SubLinks in
> them don't get expanded to subplans.

Ah, I missed that.

> Probably the reason we've not
> heard field complaints about this is that in a non-assert build there
> would be no observable bad effects --- the scan would simply ignore
> the subquery, and whether the joinaliasvars entry has been correctly
> mutated doesn't matter at all because it will never be used again.

I see.

>> I wonder if we shouldn't just ignore those (joinaliasvars in JOIN rtes)
>> while adjust_appendrel_attrs() is doing its job on a Query, just like we
>> ask to ignore subqueries by passing QTW_IGNORE_RC_SUBQUERIES to
>> query_tree_mutator()?
> 
> I don't really like this fix, because ISTM it's fixing one symptom rather
> than the root of the problem.

That's true.

> The root is that joinaliasvars lists
> diverge from the representation of expressions elsewhere in the tree,
> and not only with respect to SubLinks --- another example is that function
> calls with named arguments won't have been rearranged into executable
> form.  That could easily be a dangerous thing, if we allow arbitrary
> expression processing to get done on them.  Moreover, your patch is
> letting the divergence get even bigger: now the joinaliasvars lists don't
> even have the right varnos, making them certainly unusable for anything.

Hmm, yes.  Although I only proposed that patch because I had convinced
myself that joinaliasvars lists weren't looked at anymore.

> So what I'm thinking is that we would be well advised to actually remove
> the untransformed joinaliasvars from the tree as soon as we're done with
> preprocessing expressions.  We'd drop them at the end of planning anyway
> (cf. add_rte_to_flat_rtable) so this is just getting rid of them a bit
> sooner, and it won't affect anything after the planner.
>
> In short, I'm thinking something more like the attached.

Yeah, that makes more sense.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Amit Langote
On 2017/10/24 1:15, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
>>  wrote:
>>> I started with Maksim's submitted code, and developed according to the
>>> ideas discussed in this thread.  Attached is a very WIP patch series for
>>> this feature.

Nice!

>>> Many things remain to be done before this is committable:  pg_dump
>>> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
>>> implemented.  No REINDEX support yet.  Docs not updated (but see the
>>> regression test as a guide for how this is supposed to work; see patch
>>> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>>>
>>> I'm now working on the ability to build unique indexes (and unique
>>> constraints) on top of this.
>>
>> Cool.  Are you planning to do that by (a) only allowing the special
>> case where the partition key columns/expressions are included in the
>> indexed columns/expressions, (b) trying to make every insert to any
>> index check all of the indexes for uniqueness conflicts, or (c)
>> implementing global indexes?  Because (b) sounds complex - think about
>> attach operations, for example - and (c) sounds super-hard.  I'd
>> suggest doing (a) first, just on the basis of complexity.
> 
> Yes, I think (a) is a valuable thing to have -- not planning on doing
> (c) at all because I fear it'll be a huge time sink.  I'm not sure about
> (b), but it's not currently on my plan.

+1 to proceeding with (a) first.

>> I hope that you don't get so involved in making this unique index
>> stuff work that we don't get the cascading index feature, at least,
>> committed to v11.  That's already a considerable step forward in terms
>> of ease of use, and I'd really like to have it.
> 
> Absolutely -- I do plan to get this one finished regardless of unique
> indexes.

+1

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-10-23 Thread Masahiko Sawada
On Mon, Oct 23, 2017 at 10:43 AM, Amit Langote
 wrote:
> On 2017/10/22 5:25, Thomas Munro wrote:
>> On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas  wrote:
>>> On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada  
>>> wrote:
> Down at the bottom of the build log in the regression diffs file you can 
> see:
>
> ! ERROR: cache lookup failed for relation 32893
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

 Thank you for letting me know.

 Hmm, it's an interesting failure. I'll investigate it and post the new 
 patch.
>>>
>>> Did you ever find out what the cause of this problem was?
>>
>> I wonder if it might have been the same issue that commit
>> 19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.
>
> Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation
> XXX" error in a different code path though (the code path handling manual
> vacuum).  Not sure if the commit could have prevented that error being
> emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in
> this case.  Maybe I'm missing something though.
>

Yeah, I was thinking the commit was relevant to this issue, but as
Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
I haven't found the cause of this issue yet. With the previous version
of the patch, autovacuum workers were working with one parallel worker,
but it never drops relations. So it's possible that the error was not
related to the patch, but anyway I'll continue to work on that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-23 Thread Alvaro Herrera
Tom Lane wrote:

> What I'm suspicious of as the actual bug cause is the comment in
> perform_work_item about how we need to be sure that we're allocating these
> strings in a long-lived context.  If, in fact, they were allocated in some
> context that could get reset during the PG_TRY (particularly in the
> error-cleanup path, which I bet we don't test), then the observed symptom
> that the pointers seem to be pointing at garbage could be explained.
> 
> So what I'm thinking is that you need an error during perform_work_item,
> and/or more than one work_item picked up in the calling loop, to make this
> bug manifest.  You would need to enter perform_work_item in a
> non-long-lived context, so the first time through the loop is probably
> safe anyway.

I created a reproducer for this bug, by 1) adding some sleeps to make
the summarization process take longer, 2) injecting errors randomly
during the summarization run, 3) inserting lots of rows using the
attached pgbench script running with 8 clients (creation script below).
Takes less than a second to crash.

What is happening is that we're freeing the strings (allocated in
TopTransactionContext) even in the case where the PG_CATCH block aborts
the transaction, which is obviously bogus.  I moved the frees to inside
the PG_TRY block and no crash occurs (I didn't like a 'goto' from
outside to inside the PG_TRY block, so I duplicated those lines
instead).  I propose the attached patch.

Before pushing, I'll give a look to the regular autovacuum path to see
if it needs a similar fix.

initialization:
CREATE TABLE t (a integer);
CREATE INDEX t_a_idx ON t USING brin (a) WITH (autosummarize='true', 
pages_per_range='16');

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 2543c9ee25245f63653bf342a0240eaa8a1dcd6f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 23 Oct 2017 18:55:12 +0200
Subject: [PATCH] Fix autovacuum work items

---
 src/backend/postmaster/autovacuum.c | 39 +
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c 
b/src/backend/postmaster/autovacuum.c
index 776b1c0a9d..f5aa520d39 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2525,12 +2525,15 @@ deleted:
continue;
if (workitem->avw_active)
continue;
+   if (workitem->avw_database != MyDatabaseId)
+   continue;
 
/* claim this one, and release lock while performing it */
workitem->avw_active = true;
LWLockRelease(AutovacuumLock);
 
perform_work_item(workitem);
+   /* we intentially do not set did_vacuum here */
 
/*
 * Check for config changes before acquiring lock for further
@@ -2601,16 +2604,22 @@ perform_work_item(AutoVacuumWorkItem *workitem)
/*
 * Save the relation name for a possible error message, to avoid a 
catalog
 * lookup in case of an error.  If any of these return NULL, then the
-* relation has been dropped since last we checked; skip it. Note: they
-* must live in a long-lived memory context because we call vacuum and
-* analyze in different transactions.
+* relation has been dropped since last we checked; skip it.
 */
-
+   MemoryContextSwitchTo(TopTransactionContext);
cur_relname = get_rel_name(workitem->avw_relation);
cur_nspname = 
get_namespace_name(get_rel_namespace(workitem->avw_relation));
cur_datname = get_database_name(MyDatabaseId);
if (!cur_relname || !cur_nspname || !cur_datname)
-   goto deleted2;
+   {
+   if (cur_datname)
+   pfree(cur_datname);
+   if (cur_nspname)
+   pfree(cur_nspname);
+   if (cur_relname)
+   pfree(cur_relname);
+   return;
+   }
 
autovac_report_workitem(workitem, cur_nspname, cur_datname);
 
@@ -2623,7 +2632,6 @@ perform_work_item(AutoVacuumWorkItem *workitem)
PG_TRY();
{
/* have at it */
-   MemoryContextSwitchTo(TopTransactionContext);
 
switch (workitem->avw_type)
{
@@ -2645,6 +2653,14 @@ perform_work_item(AutoVacuumWorkItem *workitem)
 * point.)
 */
QueryCancelPending = false;
+
+   /* be tidy */
+   if (cur_datname)
+   pfree(cur_datname);
+   if (cur_nspname)
+   pfree(cur_nspname);
+   if (cur_relname)
+   pfree(cur_relname);
}
PG_CATCH();
{
@@ -2667,17 +2683,6 @@ perform_work_item(AutoVacuumWorkItem 

Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Beena Emerson
On Mon, Oct 23, 2017 at 3:24 PM, Rajkumar Raghuwanshi
 wrote:
>
> On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote
>  wrote:
>>
>> The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
>> this typedef redefinition introduced by the 0002 patch, but it seems that
>> it's not needed anyway, so got rid of that line in the attached updated
>> patch.
>>
>> Fixed one more useless diff in 0002, but no changes in any other patch
>
>
> Thanks for updated patches, I am able to compile it on head.
>
> While testing this, I observed that pruning is not scanning the default
> partition, leading to wrong output. Below is a test to reproduce this.
>
> create table rp (a int, b varchar) partition by range (a);
> create table rp_p1 partition of rp default;
> create table rp_p2 partition of rp for values from (1) to (10);
> create table rp_p3 partition of rp for values from (10) to (maxvalue);
>
> insert into rp values (-1,'p1');
> insert into rp values (1,'p2');
> insert into rp values (11,'p3');
>
> postgres=# select tableoid::regclass,* from rp;
>  tableoid | a  | b
> ----------+----+----
>  rp_p2    |  1 | p2
>  rp_p3    | 11 | p3
>  rp_p1    | -1 | p1
> (3 rows)
>
> --with pruning
> postgres=# explain (costs off) select * from rp where a <= 1;
>         QUERY PLAN
> --------------------------
>  Append
>    ->  Seq Scan on rp_p2
>          Filter: (a <= 1)
> (3 rows)
>
> postgres=# select * from rp where a <= 1;
>  a | b
> ---+----
>  1 | p2
> (1 row)
>

I had noticed this, and also this crash:

tprt PARTITION BY RANGE(Col1)
   tprt_1 FOR VALUES FROM (1) TO (50001) PARTITION BY RANGE(Col1)
  tprt_11 FOR VALUES FROM (1) TO (1),
  tprt_1d DEFAULT
   tprt_2 FOR VALUES FROM (50001) TO (11)

EXPLAIN (COSTS OFF) SELECT * FROM tprt WHERE col1 BETWEEN 2 AND 7;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>



-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
>  wrote:
> > I started with Maksim's submitted code, and developed according to the
> > ideas discussed in this thread.  Attached is a very WIP patch series for
> > this feature.
> >
> > Many things remain to be done before this is committable:  pg_dump
> > support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> > implemented.  No REINDEX support yet.  Docs not updated (but see the
> > regression test as a guide for how this is supposed to work; see patch
> > 0005).  CREATE INDEX CONCURRENTLY not done yet.
> >
> > I'm now working on the ability to build unique indexes (and unique
> > constraints) on top of this.
> 
> Cool.  Are you planning to do that by (a) only allowing the special
> case where the partition key columns/expressions are included in the
> indexed columns/expressions, (b) trying to make every insert to any
> index check all of the indexes for uniqueness conflicts, or (c)
> implementing global indexes?  Because (b) sounds complex - think about
> attach operations, for example - and (c) sounds super-hard.  I'd
> suggest doing (a) first, just on the basis of complexity.

Yes, I think (a) is a valuable thing to have -- not planning on doing
(c) at all because I fear it'll be a huge time sink.  I'm not sure about
(b), but it's not currently on my plan.

> I hope that you don't get so involved in making this unique index
> stuff work that we don't get the cascading index feature, at least,
> committed to v11.  That's already a considerable step forward in terms
> of ease of use, and I'd really like to have it.

Absolutely -- I do plan to get this one finished regardless of unique
indexes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Robert Haas
On Mon, Oct 23, 2017 at 11:12 AM, Alvaro Herrera
 wrote:
> I started with Maksim's submitted code, and developed according to the
> ideas discussed in this thread.  Attached is a very WIP patch series for
> this feature.
>
> Many things remain to be done before this is committable:  pg_dump
> support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
> implemented.  No REINDEX support yet.  Docs not updated (but see the
> regression test as a guide for how this is supposed to work; see patch
> 0005).  CREATE INDEX CONCURRENTLY not done yet.
>
> I'm now working on the ability to build unique indexes (and unique
> constraints) on top of this.

Cool.  Are you planning to do that by (a) only allowing the special
case where the partition key columns/expressions are included in the
indexed columns/expressions, (b) trying to make every insert to any
index check all of the indexes for uniqueness conflicts, or (c)
implementing global indexes?  Because (b) sounds complex - think about
attach operations, for example - and (c) sounds super-hard.  I'd
suggest doing (a) first, just on the basis of complexity.

I hope that you don't get so involved in making this unique index
stuff work that we don't get the cascading index feature, at least,
committed to v11.  That's already a considerable step forward in terms
of ease of use, and I'd really like to have it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in adjust_appendrel_attrs_mutator

2017-10-23 Thread Tom Lane
Amit Langote  writes:
> On 2017/10/23 2:07, Tom Lane wrote:
>> Hmm.  adjust_appendrel_attrs() thinks it's only used after conversion
>> of sublinks to subplans, but this is a counterexample.  I wonder if
>> that assumption was ever correct?  Or maybe we need to rethink what
>> it does when recursing into RTE subqueries.

> Looking at the backtrace, the crashing SubLink seems to have been
> referenced from a PlaceHolderVar that is in turn referenced by
> joinaliasvars of a JOIN rte in query->rtable.

Right.  The core of the issue is that joinaliasvars lists don't get run
through preprocess_expression, so (among other things) any SubLinks in
them don't get expanded to subplans.  Probably the reason we've not
heard field complaints about this is that in a non-assert build there
would be no observable bad effects --- the scan would simply ignore
the subquery, and whether the joinaliasvars entry has been correctly
mutated doesn't matter at all because it will never be used again.

> I wonder if we shouldn't just ignore those (joinaliasvars in JOIN rtes)
> while adjust_appendrel_attrs() is doing its job on a Query, just like we
> ask to ignore subqueries by passing QTW_IGNORE_RC_SUBQUERIES to
> query_tree_mutator()?

I don't really like this fix, because ISTM it's fixing one symptom rather
than the root of the problem.  The root is that joinaliasvars lists
diverge from the representation of expressions elsewhere in the tree,
and not only with respect to SubLinks --- another example is that function
calls with named arguments won't have been rearranged into executable
form.  That could easily be a dangerous thing, if we allow arbitrary
expression processing to get done on them.  Moreover, your patch is
letting the divergence get even bigger: now the joinaliasvars lists don't
even have the right varnos, making them certainly unusable for anything.

So what I'm thinking is that we would be well advised to actually remove
the untransformed joinaliasvars from the tree as soon as we're done with
preprocessing expressions.  We'd drop them at the end of planning anyway
(cf. add_rte_to_flat_rtable) so this is just getting rid of them a bit
sooner, and it won't affect anything after the planner.

In short, I'm thinking something more like the attached.

Comments?

regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ecdd728..c0ce3c3 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** subquery_planner(PlannerGlobal *glob, Qu
*** 777,782 
--- 777,803 
  	}
  
  	/*
+ 	 * Now that we are done preprocessing expressions, and in particular done
+ 	 * flattening JOIN alias variables, get rid of the joinaliasvars lists.
+ 	 * They no longer match what expressions in the rest of the tree look
+ 	 * like, because we have not preprocessed expressions in those lists (and
+ 	 * do not want to; for example, expanding a SubLink there would result in
+ 	 * a useless unreferenced subplan).  Leaving them in place simply creates
+ 	 * a hazard for later scans of the tree.  We could try to prevent that by
+ 	 * using QTW_IGNORE_JOINALIASES in every tree scan done after this point,
+ 	 * but that doesn't sound very reliable.
+ 	 */
+ 	if (root->hasJoinRTEs)
+ 	{
+ 		foreach(l, parse->rtable)
+ 		{
+ 			RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
+ 
+ 			rte->joinaliasvars = NIL;
+ 		}
+ 	}
+ 
+ 	/*
  	 * In some cases we may want to transfer a HAVING clause into WHERE. We
  	 * cannot do so if the HAVING clause contains aggregates (obviously) or
  	 * volatile functions (since a HAVING clause is supposed to be executed

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-23 Thread Alvaro Herrera
Hello

I started with Maksim's submitted code, and developed according to the
ideas discussed in this thread.  Attached is a very WIP patch series for
this feature.

Many things remain to be done before this is committable:  pg_dump
support needs to be written.  ALTER INDEX ATTACH/DETACH not yet
implemented.  No REINDEX support yet.  Docs not updated (but see the
regression test as a guide for how this is supposed to work; see patch
0005).  CREATE INDEX CONCURRENTLY not done yet.

I'm now working on the ability to build unique indexes (and unique
constraints) on top of this.

The docs have not been updated yet, but the new regression test file
illustrates how this works.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8e8bf2f587188859eba1f61a44ee9ee08d960f51 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 16 Oct 2017 12:01:12 +0200
Subject: [PATCH v1 1/5] Tweak index_create/index_constraint_create APIs

I was annoyed by index_create and index_constraint_create respective
APIs.  It's too easy to get confused when adding a new behavioral flag
given the large number of boolean flags they already have.  Turning them
into a flags bitmask makes that code easier to read, as in the attached
patch.

I split the flags in two -- one set for index_create directly and
another related to constraints.  index_create() itself receives both,
and then passes down the constraint one to index_constraint_create.  It
is shorter in LOC terms to create a single set mixing all the values,
but it seemed to make more sense this way.  Also, I chose not to turn
the booleans 'allow_sytem_table_mods' and 'is_internal' into flag bits
because of the different way they interact with this code.
---
 src/backend/catalog/index.c  | 101 +++
 src/backend/catalog/toasting.c   |   3 +-
 src/backend/commands/indexcmds.c |  33 +
 src/backend/commands/tablecmds.c |  13 +++--
 src/include/catalog/index.h  |  29 ++-
 5 files changed, 100 insertions(+), 79 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c7b2f031f0..17faeffada 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -680,19 +680,25 @@ UpdateIndexRelation(Oid indexoid,
  * classObjectId: array of index opclass OIDs, one per index column
  * coloptions: array of per-index-column indoption settings
  * reloptions: AM-specific options
- * isprimary: index is a PRIMARY KEY
- * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint
- * deferrable: constraint is DEFERRABLE
- * initdeferred: constraint is INITIALLY DEFERRED
+ * flags: bitmask that can include any combination of these bits:
+ * INDEX_CREATE_IS_PRIMARY
+ * the index is a primary key
+ * INDEX_CREATE_ADD_CONSTRAINT:
+ * invoke index_constraint_create also
+ * INDEX_CREATE_SKIP_BUILD:
+ * skip the index_build() step for the moment; caller must 
do it
+ * later (typically via reindex_index())
+ * INDEX_CREATE_CONCURRENT:
+ * do not lock the table against writers.  The index will 
be
+ * marked "invalid" and the caller must take additional 
steps
+ * to fix it up.
+ * INDEX_CREATE_IF_NOT_EXISTS:
+ * do not throw an error if a relation with the same name
+ * already exists.
+ * constr_flags: flags passed to index_constraint_create
+ * (only if INDEX_CREATE_ADD_CONSTRAINT is set)
  * allow_system_table_mods: allow table to be a system catalog
- * skip_build: true to skip the index_build() step for the moment; caller
- * must do it later (typically via reindex_index())
- * concurrent: if true, do not lock the table against writers.  The index
- * will be marked "invalid" and the caller must take additional 
steps
- * to fix it up.
  * is_internal: if true, post creation hook for new index
- * if_not_exists: if true, do not throw an error if a relation with
- * the same name already exists.
  *
  * Returns the OID of the created index.
  */
@@ -709,15 +715,10 @@ index_create(Relation heapRelation,
 Oid *classObjectId,
 int16 *coloptions,
 Datum reloptions,
-bool isprimary,
-bool isconstraint,
-bool deferrable,
-bool initdeferred,
+uint16 flags,
+uint16 constr_flags,
 bool allow_system_table_mods,
-bool skip_build,
-bool concurrent,
-bool is_internal,
-   

[HACKERS] Re: [BUGS] BUG #14849: jsonb_build_object doesn't like VARIADIC calls very much

2017-10-23 Thread Michael Paquier
On Mon, Oct 23, 2017 at 7:03 AM, Michael Paquier
 wrote:
>> Looks good otherwise.
>
> My set of diffs for funcapi.h are actually that:
>   * funcapi.h
>   *   Definitions for functions which return composite type and/or sets
> + *   or work on VARIADIC inputs.
> [...]
> +/*--
> + * Support to ease writing of functions dealing with VARIADIC inputs
> + *--
> + *
> + * This function extracts a set of argument values, types and NULL markers
> + * for a given input function. This returns a set of data:
> + * - **values includes the set of Datum values extracted.
> + * - **types the data type OID for each element.
> + * - **nulls tracks if an element is NULL.
> + *
> + * convert_unknown set to true enforces conversion of unknown input type
> + * unknown to text.
> + * variadic_start tracks the argument number of the function call where the
> + * VARIADIC set of arguments begins.
> + *
> + * The return result is the number of elements stored. In the event of a
> + * NULL input, then the caller of this function ought to generate a NULL
> + * object as final result, and in this case a result value of -1 is used
> + * to be able to make the difference between an empty array or object.
> + */
> +extern int extract_variadic_args(FunctionCallInfo fcinfo, int variadic_start,
> +bool convert_unknown, Datum **values,
> Oid **types,
> +bool **nulls);
>
> Got this bit as well:
>   * funcapi.c
>   *   Utility and convenience functions for fmgr functions that return
> - *   sets and/or composite types.
> + *   sets and/or composite types, or deal with VARIADIC inputs.
>   *

Okay, attached is what I think a fully implemented patch should look
like. On top of what Andrew has done, I added and reworked the
following:
- removed duplicate error handling.
- documented the function in funcapi.h and funcapi.c.
- Added a new section in funcapi.h to outline that this is for support
of VARIADIC inputs.
I have added a commit message as well. Hope this helps.

format, concat, and potentially count_nulls could take advantage of
this new function, though that refactoring is left for later. I am fine
with doing the legwork on a different thread. Changing count_nulls would
also switch it to O(n^2), which is not cool either, so I think that it
could be left out. Still, let's discuss that on another thread.
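
For anyone skimming the thread, a call site would look roughly like the
sketch below (based only on the signature quoted above, with error
handling and the actual per-argument processing elided):

    Datum      *values;
    Oid        *types;
    bool       *nulls;
    int         nargs;
    int         i;

    /* all arguments of the SQL function are VARIADIC, starting at 0 */
    nargs = extract_variadic_args(fcinfo, 0, true, &values, &types, &nulls);
    if (nargs < 0)
        PG_RETURN_NULL();   /* a plain NULL was passed as VARIADIC input */

    for (i = 0; i < nargs; i++)
    {
        if (nulls[i])
            continue;
        /* ... process values[i], whose type OID is types[i] ... */
    }
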
-- 
Michael


json_variadic_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-23 Thread Amit Kapila
On Thu, Oct 19, 2017 at 1:16 AM, Robert Haas  wrote:
> On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund  wrote:
>
>>b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
>>   queue whenever it's not empty when a tuple is ready.
>
> Batching them with what?  I hate to postpone sending tuples we've got;
> that sounds nice in the case where we're sending tons of tuples at
> high speed, but there might be other cases where it makes the leader
> wait.
>

I think Rafia's latest patch on the thread [1] has done something
similar: tuples are sent as long as there is space in the shared
memory queue, and then it switches to batching the tuples using local
queues.


[1] - 
https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-23 Thread Amit Kapila
On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund  wrote:
> Hi Everyone,
>
> On 2017-05-19 17:25:38 +0530, Rafia Sabih wrote:
>> While analysing the performance of TPC-H queries for the newly developed
>> parallel-operators, viz, parallel index, bitmap heap scan, etc. we noticed
>> that the time taken by gather node is significant. On investigation, as per
>> the current method it copies each tuple to the shared queue and notifies
>> the receiver. Since, this copying is done in shared queue, a lot of locking
>> and latching overhead is there.
>>
>> So, in this POC patch I tried to copy all the tuples in a local queue thus
>> avoiding all the locks and latches. Once, the local queue is filled as per
>> it's capacity, tuples are transferred to the shared queue. Once, all the
>> tuples are transferred the receiver is sent the notification about the same.
>>
>> With this patch I could see significant improvement in performance for
>> simple queries,
>
> I've spent some time looking into this, and I'm not quite convinced this
> is the right approach.
>

As per my understanding, the patch in this thread is dead (not
required) after the patch posted by Rafia in the thread "Effect of
changing the value for PARALLEL_TUPLE_QUEUE_SIZE" [1].  There seem to
be two related problems in this area: the first is shm queue
communication overhead, and the second is that workers start to stall
when the shm queue gets full.  We can observe the first problem in
simple queries that use Gather, and the second in Gather Merge kind of
scenarios.  We are trying to resolve both problems with the patch
posted in the other thread.  I think there is some similarity with the
patch posted on this thread, but it is different.  I have mentioned
something similar upthread as well.


>  Let me start by describing where I see the
> current performance problems around gather stemming from.
>
> The observations here are made using
> select * from t where i < 3000 offset 2999 - 1;
> with Rafia's data. That avoids slowdowns on the leader due to too many
> rows printed out. Sometimes I'll also use
> SELECT * FROM lineitem WHERE l_suppkey > '5012' OFFSET 10 LIMIT 1;
> on tpch data to show the effects on wider tables.
>
> The precise query doesn't really matter, the observations here are more
> general, I hope.
>
> 1) nodeGather.c re-projects every row from workers. As far as I can tell
>that's now always exactly the same targetlist as it's coming from the
>worker. Projection capability was added in 8538a6307049590 (without
>checking whether it's needed afaict), but I think it in turn often
>obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7.  My
>measurement shows that removing the projection yields quite massive
>speedups in queries like yours, which is not too surprising.
>
>I suspect this just needs a tlist_matches_tupdesc check + an if
>around ExecProject(). And a test, right now tests pass unless
>force_parallel_mode is used even if just commenting out the
>projection unconditionally.
>

+1.  I think we should do something to avoid this.
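
Something like the following rough, untested sketch for ExecGather() is
what I imagine; "proj_needed" is a hypothetical field that would be
computed once at ExecInitGather() time, e.g. with a
tlist_matches_tupdesc-style check (that helper is currently static in
execScan.c):

    slot = gather_getnext(node);
    if (TupIsNull(slot))
        return NULL;

    /* skip projection when worker tuples already have the right shape */
    if (!node->proj_needed)
        return slot;

    econtext->ecxt_outertuple = slot;
    return ExecProject(node->ps.ps_ProjInfo);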

>
>Commenting out the memory barrier - which is NOT CORRECT - improves
>timing:
>before: 4675.626
>after: 4125.587
>
>The correct fix obviously is not to break latch correctness. I think
>the big problem here is that we perform a SetLatch() for every read
>from the latch.
>
>I think we should
>a) introduce a batch variant for receiving like:
>
> extern shm_mq_result shm_mq_receivev(shm_mq_handle *mqh,
>  shm_mq_iovec *iov, int *iovcnt,
>  bool nowait)
>
>   which then only does the SetLatch() at the end. And maybe if it
>   wrapped around.
>
>b) Use shm_mq_sendv in tqueue.c by batching up insertions into the
>   queue whenever it's not empty when a tuple is ready.
>

I think the patch posted in the other thread has tried to achieve such
batching by using local queues; maybe we should try some other way.

>
> Unfortunately the patch's "local worker queue" concept seems, to me,
> like it's not quite addressing the structural issues, instead opting to
> address them by adding another layer of queuing.
>

That is done to batch the tuples while sending them.  Sure, we can do
some of the other things as well, but I think the main advantage comes
from batching the tuples in a smart way while sending them, and once
that is done, we might not need many of the other optimizations.


[1] - 
https://www.postgresql.org/message-id/CAOGQiiNiMhq5Pg3LiYxjfi2B9eAQ_q5YjS%3DfHiBJmbSOF74aBQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

Alexander Korotkov wrote on 2017-10-23 13:19:

Despite the code cleanup, you still have some random removals of empty
lines in your patch.


I reconfigured my IDE to avoid this in the future.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+{
+
+	WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
new file mode 100644
index 000..db2f549
--- /dev/null
+++ b/src/backend/commands/waitlsn.c
@@ -0,0 +1,273 @@
+/*-
+ *
+ * waitlsn.c
+ *	  WaitLSN statment: WAITLSN
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Alexander Korotkov
On Mon, Oct 23, 2017 at 12:42 PM, Ivan Kartyshov  wrote:

> New little cleanup code changes
>

Despite the code cleanup, you still have some random removals of empty
lines in your patch.

@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
>   {NULL, 0, false}
>  };
>
> -
>  /*
>   * Although only "on", "off", and "always" are documented,
>   * we accept all the likely variants of "on" and "off".



@@ -141,7 +141,6 @@
>  #include "utils/timestamp.h"
>  #include "utils/tqual.h"
>
> -
>  /*
>   * Maximum size of a NOTIFY payload, including terminating NULL.  This
>   * must be kept small enough so that a notification message fits on one


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Rajkumar Raghuwanshi
On Mon, Oct 23, 2017 at 1:12 PM, Amit Langote  wrote:

> The compiler I have here (gcc (GCC) 6.2.0) didn't complain like that for
> this typedef redefinition introduced by the 0002 patch, but it seems that
> it's not needed anyway, so got rid of that line in the attached updated
> patch.
>
> Fixed one more useless diff in 0002, but no changes in any other patch


Thanks for updated patches, I am able to compile it on head.

While testing this, I observed that pruning is not scanning the default
partition, leading to wrong output. Below is a test to reproduce this.

create table rp (a int, b varchar) partition by range (a);
create table rp_p1 partition of rp default;
create table rp_p2 partition of rp for values from (1) to (10);
create table rp_p3 partition of rp for values from (10) to (maxvalue);

insert into rp values (-1,'p1');
insert into rp values (1,'p2');
insert into rp values (11,'p3');

postgres=# select tableoid::regclass,* from rp;
 tableoid | a  | b
----------+----+----
 rp_p2    |  1 | p2
 rp_p3    | 11 | p3
 rp_p1    | -1 | p1
(3 rows)

--with pruning
postgres=# explain (costs off) select * from rp where a <= 1;
        QUERY PLAN
--------------------------
 Append
   ->  Seq Scan on rp_p2
         Filter: (a <= 1)
(3 rows)

postgres=# select * from rp where a <= 1;
 a | b
---+----
 1 | p2
(1 row)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

New little cleanup code changes

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov 
Date:   Mon Oct 23 12:08:59 2017 +0300

Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+{
+
+	WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index 4a6c99e..0d10117 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -20,6 +20,6 @@ OBJS = amcmds.o aggregatecmds.o alter.o analyze.o async.o cluster.o comment.o \
 	policy.o portalcmds.o prepare.o proclang.o publicationcmds.o \
 	schemacmds.o seclabel.o sequence.o statscmds.o subscriptioncmds.o \
 	tablecmds.o tablespace.o trigger.o tsearchcmds.o typecmds.o user.o \
-	vacuum.o vacuumlazy.o variable.o view.o
+	vacuum.o vacuumlazy.o variable.o view.o waitlsn.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index f7de742..cdeddfc 100644
--- a/src/backend/commands/async.c
+++ 

Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Ivan Kartyshov

Alexander Korotkov wrote on 2017-09-26 12:07:

I propose following syntax options.

WAITLSN lsn;
WAITLSN lsn TIMEOUT delay;
WAITLSN lsn INFINITE;
WAITLSN lsn NOWAIT;

For me that looks rather better.  What do you think?


I agree with you; the syntax looks better now.
A new patch is attached to this mail.

Ants Aasma wrote on 2017-09-26 13:00:

Exposing this interface as WAITLSN will encode that visibility order
matches LSN order. This removes any chance of fixing for example
visibility order of async/vs/sync transactions. Just renaming it so
the token is an opaque commit visibility token that just happens to be
a LSN would still allow for progress in transaction management. For
example, making PostgreSQL distributed will likely want timestamp
and/or vector clock based visibility rules.


I'm sorry I did not understand exactly what you meant.
Please explain this in more detail.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 217f842726531edb1b0056a5c5727ab01bab7f9b
Author: i.kartyshov 
Date:   Mon Oct 23 12:08:59 2017 +0300

Cherry picked and ported 11dev

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 01acc2e..6792eb0 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -181,6 +181,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..6f389ca
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,144 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ INFINITELY ]
+WAITLSN 'LSN' TIMEOUT wait_time
+WAITLSN 'LSN' NOWAIT
+
+ 
+
+ 
+  Description
+
+  
+   interprocess communication mechanism to wait for the target log sequence
+   number (LSN) on standby in 
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAITLSN command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option, or check the target LSN
+   status immediately using the NOWAIT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+	  Specify the target log sequence number to wait for.
+ 
+
+   
+   
+	INFINITELY
+
+ 
+	  Wait until the target LSN is replayed on standby.
+	  This is an optional parameter reinforcing the default behavior.
+ 
+
+   
+  
+ 
+
+   
+	TIMEOUT wait_time
+	
+	 
+	  Limit the time to wait for the LSN to be replayed.
+	  The specified wait_time must be an integer
+	  and is measured in milliseconds.
+	 
+	
+   
+
+   
+	NOWAIT
+	
+	 
+	  Report whether the target LSN has been replayed already,
+	  without any waiting.
+	 
+	
+   
+  
+
+ 
+  Examples
+
+  
+   Run WAITLSN from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAITLSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAITLSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAITLSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 9000b3a..0c5951a 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -209,6 +209,7 @@



+   
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dd028a1..117cc9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
-
 /*
  * Although only "on", "off", and "always" are documented,
  * we accept all the likely variants of "on" and "off".
@@ -7312,6 +7312,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After update lastReplayedEndRecPtr set Latches in SHMEM array
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWaitLSN())
+{
+
+	WaitLSNSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = 

Re: [HACKERS] [sqlsmith] Failed assertion in adjust_appendrel_attrs_mutator

2017-10-23 Thread Amit Langote
On 2017/10/23 2:07, Tom Lane wrote:
> Andreas Seltenreich  writes:
>> testing master as of 7c981590c2, sqlsmith just triggered the following
>> assertion:
>> TRAP: FailedAssertion("!(!const Node*)(node))->type) == T_SubLink))", 
>> File: "prepunion.c", Line: 2231)
> 
> Hmm.  adjust_appendrel_attrs() thinks it's only used after conversion
> of sublinks to subplans, but this is a counterexample.  I wonder if
> that assumption was ever correct?  Or maybe we need to rethink what
> it does when recursing into RTE subqueries.

Looking at the backtrace, the crashing SubLink seems to have been
referenced from a PlaceHolderVar that is in turn referenced by
joinaliasvars of a JOIN rte in query->rtable.

I wonder if we shouldn't just ignore those (joinaliasvars in JOIN rtes)
while adjust_appendrel_attrs() is doing its job on a Query, just like we
ask to ignore subqueries by passing QTW_IGNORE_RC_SUBQUERIES to
query_tree_mutator()?  IOW, it doesn't look like anything critically
depends on the Vars referenced from joinaliasvars being translated at that
point in inheritance_planner(), but I may be missing something.

The attached patch does the same; while it didn't break any existing
tests, it fixed the crash reported by the OP.

Thoughts?

Thanks,
Amit
diff --git a/src/backend/optimizer/prep/prepunion.c 
b/src/backend/optimizer/prep/prepunion.c
index 1c84a2cb28..4bdfe73d29 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1964,7 +1964,8 @@ adjust_appendrel_attrs(PlannerInfo *root, Node *node, int 
nappinfos,
newnode = query_tree_mutator((Query *) node,
 
adjust_appendrel_attrs_mutator,
 (void 
*) ,
-
QTW_IGNORE_RC_SUBQUERIES);
+
QTW_IGNORE_RC_SUBQUERIES |
+
QTW_IGNORE_JOINALIASES);
for (cnt = 0; cnt < nappinfos; cnt++)
{
AppendRelInfo *appinfo = appinfos[cnt];

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Andres Freund
On 2017-10-22 23:04:50 -0400, Tom Lane wrote:
> John Lumby  writes:
> > I have a C function (a trigger function) which uses the PG_TRY() 
> > construct to handle certain ERROR conditions.
> > One example is where invoked as INSTEAD OF INSERT into a view.  It 
> > PG_TRYs INSERT into the real base table,
> > but this table may not yet exist  (it is a partitioned child of an 
> > inheritance parent).
> > If the error is  ERRCODE_UNDEFINED_TABLE,  then the CATCH issues 
> > FlushErrorState() and returns to caller who CREATes the table and 
> > re-issues the insert.
> > All works perfectly (on every release of 9.x).
> 
> If it works, it's only because you didn't try very hard to break it.
> In general you can't catch and ignore errors without a full-fledged
> subtransaction --- BeginInternalSubTransaction, then either
> ReleaseCurrentSubTransaction or RollbackAndReleaseCurrentSubTransaction,
> not just FlushErrorState.  See e.g. plpgpsql's exec_stmt_block.
> 
> There may well be specific scenarios where an error gets thrown without
> having done anything that requires transaction cleanup.  But when you
> hit a scenario where that's not true, or when a scenario that used to
> not require cleanup now does, nobody is going to consider that a PG bug.
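
For anyone following along, here is a minimal sketch of the
subtransaction-wrapped PG_TRY pattern Tom describes, modeled loosely on
plpgsql's exec_stmt_block.  The helper function, its name, and the SPI call
are illustrative only (and assume the caller has already done SPI_connect());
this is not John's trigger code:

    #include "postgres.h"

    #include "access/xact.h"
    #include "executor/spi.h"
    #include "utils/memutils.h"
    #include "utils/resowner.h"

    /* Run one command, absorbing only "table does not exist". */
    static bool
    try_insert(const char *sql)
    {
        MemoryContext oldcontext = CurrentMemoryContext;
        ResourceOwner oldowner = CurrentResourceOwner;
        bool          ok = true;

        /* The subtransaction is what makes the error safely ignorable. */
        BeginInternalSubTransaction(NULL);
        MemoryContextSwitchTo(oldcontext);

        PG_TRY();
        {
            if (SPI_execute(sql, false, 0) < 0)
                elog(ERROR, "SPI_execute failed");

            /* Success: commit the subtransaction and restore state. */
            ReleaseCurrentSubTransaction();
            MemoryContextSwitchTo(oldcontext);
            CurrentResourceOwner = oldowner;
        }
        PG_CATCH();
        {
            ErrorData  *edata;

            /* Save the error data, then clear the error state. */
            MemoryContextSwitchTo(oldcontext);
            edata = CopyErrorData();
            FlushErrorState();

            /* Roll back the subtransaction; this does the real cleanup. */
            RollbackAndReleaseCurrentSubTransaction();
            MemoryContextSwitchTo(oldcontext);
            CurrentResourceOwner = oldowner;

            if (edata->sqlerrcode == ERRCODE_UNDEFINED_TABLE)
            {
                FreeErrorData(edata);
                ok = false;         /* caller can CREATE the table and retry */
            }
            else
                ReThrowError(edata);    /* anything else is rethrown */
        }
        PG_END_TRY();

        return ok;
    }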

It'd probably be a good idea to expand on this in the sgml docs. This
has confused quite a number of people...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Andres Freund
On 2017-10-23 16:16:10 +0800, Craig Ringer wrote:
> On 23 October 2017 at 08:30, John Lumby  wrote:
> 
> > All works but not perfectly --  at COMMIT,  resource_owner issues
> > relcache reference leak messages about relation scans not closed
> > and also about  snapshot still active. I guess that the CREATE has
> > switched resource_owner and pushed a snapshot,  but I did not
> > debug in detail.
> 
> A lot more work is required than what's done by PG_CATCH to return to
> a queryable state. I've been down this path myself, and it's not fun.
> 
> Take a look at all the extra work done on the error handling path in
> PostgresMain.

That seems quite misleading - that's *not* what needs to be done
to catch an error inside a function. See Tom's response.


> At some point I'd really like to expose that in a more general way so
> it can be used from background workers. Right now AFAICS most
> background workers have to cope with errors with a proc_exit(1) and
> getting restarted to try again. Not ideal.

I agree that generalizing wouldn't be bad, but there's absolutely
nothing preventing you from handling errors in bgworkers without
restarting today.
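
As a rough illustration of handling errors in a bgworker without
restarting, here is a sketch patterned after the sigsetjmp() block in
PostgresMain(), but trimmed down.  All names are illustrative, database
connection and signal setup are omitted, and the exact cleanup calls a real
worker needs depend on what it touches between errors:

    #include "postgres.h"

    #include "access/xact.h"
    #include "miscadmin.h"
    #include "utils/memutils.h"

    static void
    worker_main_loop(void)
    {
        sigjmp_buf    local_sigjmp_buf;
        MemoryContext worker_ctx;

        worker_ctx = AllocSetContextCreate(TopMemoryContext,
                                           "example worker context",
                                           ALLOCSET_DEFAULT_SIZES);

        if (sigsetjmp(local_sigjmp_buf, 1) != 0)
        {
            /* We land here whenever elog/ereport(ERROR) is raised below. */
            error_context_stack = NULL;

            HOLD_INTERRUPTS();
            EmitErrorReport();      /* log the error */
            FlushErrorState();      /* clear it so we can continue */

            /* Abort any open transaction and reset our working memory. */
            AbortCurrentTransaction();
            MemoryContextSwitchTo(worker_ctx);
            MemoryContextResetAndDeleteChildren(worker_ctx);

            RESUME_INTERRUPTS();
        }

        /* Arm the jump target; errors below come back to the block above. */
        PG_exception_stack = &local_sigjmp_buf;

        for (;;)
        {
            CHECK_FOR_INTERRUPTS();

            StartTransactionCommand();
            /* ... one unit of work; an ERROR longjmps to the handler ... */
            CommitTransactionCommand();
        }
    }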

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 16:16, Craig Ringer  wrote:
> On 23 October 2017 at 08:30, John Lumby  wrote:
>
>> All works but not perfectly --  at COMMIT,  resource_owner issues
>> relcache reference leak messages about relation scans not closed
>> and also about  snapshot still active. I guess that the CREATE has
>> switched resource_owner and pushed a snapshot,  but I did not
>> debug in detail.
>
> A lot more work is required than what's done by PG_CATCH to return to
> a queryable state. I've been down this path myself, and it's not fun.

Ignore me, Tom's example is probably more relevant to you since it
applies to subtransactions, not top-level query state.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] legitimacy of using PG_TRY , PG_CATCH , PG_END_TRY in C function

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 08:30, John Lumby  wrote:

> All works but not perfectly --  at COMMIT,  resource_owner issues
> relcache reference leak messages about relation scans not closed
> and also about  snapshot still active. I guess that the CREATE has
> switched resource_owner and pushed a snapshot,  but I did not
> debug in detail.

A lot more work is required than what's done by PG_CATCH to return to
a queryable state. I've been down this path myself, and it's not fun.

Take a look at all the extra work done on the error handling path in
PostgresMain.

At some point I'd really like to expose that in a more general way so
it can be used from background workers. Right now AFAICS most
background workers have to cope with errors with a proc_exit(1) and
getting restarted to try again. Not ideal.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-10-23 Thread Andrey Borodin

> 22 окт. 2017 г., в 21:21, Tom Lane  написал(а):
> 
> Andrey Borodin  writes:
>> I was looking for a way to correctly drop compress/decompress functions from 
>> opclasses.
> 
> Making a new opclass seems like a pretty grotty answer; it won't
> help existing installations.

Unfortunately, in cube's case that's the only option: cube allows
100-dimensional cubes, while 94-dimensional cubes could already be toasted
(according to the code).

> I think what you need is to undo opclasscmds.c's decision that the
> dependencies should be INTERNAL.  I tried this:

Thanks Tom, that's exactly what I need. I'll push patches with this to November 
commitfest.

Best regards, Andrey Borodin.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [bug fix] ECPG: fails to recognize embedded parameters

2017-10-23 Thread Tsunakawa, Takayuki
Hello,

This is an actual problem that one of our customers hit.  In ECPG, opening a
cursor declared as follows fails:

EXEC SQL DECLARE cur CURSOR FOR
    SELECT oid, datname
    FROM pg_database
    WHERE datname LIKE 'post%' ESCAPE '\' AND datconnlimit = :connlimit;

sqlstate: 07001
sqlerrm.sqlerrmc: too many arguments on line 30

The cause is that next_insert() in ecpglib unconditionally skips the next 
character after the backslash.
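
To illustrate the failure mode, here is a simplified sketch of a placeholder
scanner; it is NOT the real next_insert() and NOT the attached patch, and the
names are made up.  With an unconditional backslash skip, the '\' in
ESCAPE '\' swallows the closing quote, the scanner thinks it is still inside
the string literal, and :connlimit is never counted as a parameter, hence
"too many arguments":

    /* Simplified illustration only, not ecpglib code. */
    static int
    find_next_param(const char *text, int pos)
    {
        bool        in_string = false;

        for (; text[pos] != '\0'; pos++)
        {
            if (text[pos] == '\\')
            {
                /*
                 * Problematic: unconditionally treating '\' as an escape
                 * also eats the quote that closes the literal '\', so
                 * in_string never flips back and the later :connlimit looks
                 * like it is inside a string.  With
                 * standard_conforming_strings on, a backslash inside a plain
                 * '...' literal is not an escape at all, so correct handling
                 * has to take the string syntax into account.
                 */
                pos++;
            }
            else if (text[pos] == '\'')
                in_string = !in_string;
            else if (!in_string && text[pos] == ':')
                return pos;         /* a :hostvar placeholder starts here */
        }
        return -1;                  /* no more placeholders */
    }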

Could you review and commit the attached patch?  I also attached the test 
program for convenience.

Regards
Takayuki Tsunakawa




ecpg_escape_string.patch
Description: ecpg_escape_string.patch


escape.ec
Description: escape.ec

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to determine that a TransactionId is really aborted?

2017-10-23 Thread Craig Ringer
On 23 October 2017 at 05:44, Eric Ridge  wrote:
>> On Oct 22, 2017, at 3:24 PM, Peter Geoghegan  wrote:

>> Again, you'll probably need to put this low level requirement into
>> context if you want sound advice from this list.
>
> I'm just thinking out loud here, but the context is likely something along 
> the lines of externally storing all transaction ids, and periodically asking 
> Postgres if they're known-to-be-aborted-by-all-transactions -- one at a time.

I think Peter is asking "why?".
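
For the narrow mechanical question of checking one xid at a time from SQL,
PostgreSQL 10's txid_status() is the closest built-in tool, though it only
reports the status of sufficiently recent transactions and by itself says
nothing about visibility to all sessions:

    -- requires PostgreSQL 10+; the txid value here is made up
    SELECT txid_status(1234567);   -- 'committed', 'aborted', 'in progress',
                                   -- or NULL once the xid is too old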

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-10-23 Thread Rajkumar Raghuwanshi
On Thu, Oct 19, 2017 at 12:16 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Description of the attached patches:
>
> 0001: add new tests for partition-pruning
>
> 0002: patch that makes all the changes needed in the planer (adds a stub
>   function in partition.c)
>
> 0003: patch that implements the aforementioned stub (significant amount of
>   code to analyze partition clauses and gin up bounding keys to
>   compare with the values in PartitionBoundInfo; the actual function
>   that will do the comparison is just a stub as of this patch)
>
> 0004: make some preparatory changes to partition_bound_cmp/bsearch, to be
>   able to pass incomplete partition keys (aka, prefix of a multi-
>   column key) for comparison with the values in PartitionBoundInfo
>  (just a refactoring patch)
>
> 0005: implements the stub mentioned in 0003 and finally gets the new
>   partition-pruning working (also disables constraint exclusion using
>   internal partition constraints by teaching get_relation_constraints
>   to not include those).
>
> Feedback greatly welcome.
>
Hi Amit,

I have tried to apply the attached patches. They applied cleanly on commit id
bf54c0f05c0a58db17627724a83e1b6d4ec2712c,
but make failed with the error below.

../../../../src/include/nodes/relation.h:2126: error: redefinition of
typedef ‘AppendRelInfo’
../../../../src/include/nodes/relation.h:584: note: previous declaration of
‘AppendRelInfo’ was here
make[4]: *** [gistbuild.o] Error 1
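
The failure itself is a plain C issue rather than anything in the patches'
logic: C89/C99 compilers (for example older gcc without -std=c11, which seems
to be in use here) reject a typedef that appears twice, even when both
declarations are identical.  A minimal illustration, with a hypothetical
prototype showing the usual header-side workaround of forward-declaring the
struct and keeping the typedef in exactly one place:

    /*
     * Under C89/C99 this is an error even though both lines are identical:
     *
     *     typedef struct AppendRelInfo AppendRelInfo;
     *     typedef struct AppendRelInfo AppendRelInfo;   redefinition of typedef
     *
     * A header that only needs the pointer type can instead do:
     */
    struct AppendRelInfo;                        /* plain forward declaration */
    extern void example_fn(struct AppendRelInfo *appinfo);   /* hypothetical */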