Re: [HACKERS] some review comments on logical rep code

2017-04-28 Thread Noah Misch
On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote:
> On 4/28/17 01:01, Noah Misch wrote:
> > On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
> >> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch  wrote:
> >>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>  Pushed. Thanks!
> >>>
> >>> Does this close the open item, or is there more to do?
> >>
> >> There is only one item remaining, and the patch is attached
> >> here [1].  I guess reviewing that patch is almost done.
> >>
> >> [1] 
> >> https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
> > 
> > Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
> > update.  Kindly send a status update within 24 hours, and include a date for
> > your subsequent status update.  Refer to the policy on open item ownership:
> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I think the patch that Fujii Masao has proposed has found general
> agreement.  I would recommend that he commits it as he sees fit.

This is not a conforming status update, because it does not specify a date for
your next update.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-28 Thread Tom Lane
Kevin Grittner  writes:
> Well, I was sort of hoping that the triggers that can now be defined
> but can never fire *did* fire at some point.

They will fire if you have an INSTEAD OF row-level trigger; the existence
of that trigger is what determines whether we implement DML on a view
through the view's own triggers or through translation to an action on
the underlying table.

I do not think it'd be reasonable to throw an error for creation of
a statement-level view trigger when there's no row-level trigger,
because that just imposes a hard-to-deal-with DDL ordering dependency.

You could make a case for having the updatable-view translation code
print a WARNING if it notices that there are statement-level triggers
that cannot be fired due to the translation.

regards, tom lane




Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Tom Lane
I wrote:
> David Rowley  writes:
> >> On 29 April 2017 at 02:26, Tom Lane  wrote:
>> Seems related to the unconditional setting of extra.inner_unique to
>> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()
>> Setting this based on the return value of innerrel_is_unique() as done
>> with the other join types seems to fix the issue.

> Yes, I think that's correct.

Well, "make check-world" disabused me of that notion: there are several
test cases in postgres_fdw that lost perfectly-valid inner_unique
markings.  The reason is that create_unique_path will create uniqueness
even when you couldn't prove it from the underlying rel itself.  So my
previous thought about comparing the outerrel to sjinfo->min_lefthand is
really necessary to avoid regressions from what we had before.

However, while that seems to be enough to generate correct plans, it
doesn't address Teodor's performance complaint: he's wishing the planner
would notice that the semijoin inner rel is effectively unique, even when
the best plan involves initially joining the semijoin inner rel to just
a subset of the semijoin outer --- against which that inner rel is *not*
unique.  Applying innerrel_is_unique() helps for some simpler cases, but
not this one.

Really, the way to fix Teodor's complaint is to recognize that the
semijoin inner rel is effectively unique against the whole outer rel,
and then strength-reduce the semijoin to a plain join.  The infrastructure
we built for unique joins is capable of proving that, we just weren't
applying it in the right way.

Attached are two alternative patches.  The first just does the minimum
necessary to fix the bug; the second adds some code to perform
strength-reduction of semijoins.  The second patch is capable of finding
the plan Teodor wanted for his test case --- in fact, left to its own
devices, it finds a *better* plan, both by cost and actual runtime.

I'm kind of strongly tempted to apply the second patch; but it would
be fair to complain that reduce_unique_semijoins() is new development
and should wait for v11.  Opinions?

regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 5aedcd1..39e2ddd 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*** add_paths_to_joinrel(PlannerInfo *root,
*** 126,133 
  	 *
  	 * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
  	 * matter since the executor can make the equivalent optimization anyway;
! 	 * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
! 	 * know we're going to force uniqueness of the innerrel below.  For
  	 * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
  	 * this module.
  	 */
--- 126,136 
  	 *
  	 * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
  	 * matter since the executor can make the equivalent optimization anyway;
! 	 * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, if
! 	 * the LHS covers all of the associated semijoin's min_lefthand, then it's
! 	 * appropriate to set inner_unique because the path produced by
! 	 * create_unique_path will be unique relative to the LHS.  (If we have an
! 	 * LHS that's only part of the min_lefthand, that is *not* true.)  For
  	 * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
  	 * this module.
  	 */
*** add_paths_to_joinrel(PlannerInfo *root,
*** 138,144 
  			extra.inner_unique = false; /* well, unproven */
  			break;
  		case JOIN_UNIQUE_INNER:
! 			extra.inner_unique = true;
  			break;
  		case JOIN_UNIQUE_OUTER:
  			extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
--- 141,148 
  			extra.inner_unique = false; /* well, unproven */
  			break;
  		case JOIN_UNIQUE_INNER:
! 			extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
! 			   outerrel->relids);
  			break;
  		case JOIN_UNIQUE_OUTER:
  			extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 69ce7aa..87ff365 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*** reset enable_sort;
*** 5634,5636 
--- 5634,5665 
  drop table j1;
  drop table j2;
  drop table j3;
+ -- check that semijoin inner is not seen as unique for a portion of the outerrel
+ explain (verbose, costs off)
+ select t1.unique1, t2.hundred
+ from onek t1, tenk1 t2
+ where exists (select 1 from tenk1 t3
+   where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred)
+   and t1.unique1 < 1;
+                 QUERY PLAN
+ --------------------------------------------
+  Nested Loop
+Output: t1.unique1, t2.hundred
+ 

Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 3:00 PM, Tom Lane  wrote:
> You are confusing number of tuples in the index, which we estimate from
> independent measurements such as the file size, with endpoint value,
> which is used for purposes like guessing whether a mergejoin will be
> able to stop early.  For purposes like that, we do NOT want to include
> dead tuples, because the merge join is never gonna see 'em.

I spent several hours today thinking about this and, more than once,
thought I'd come up with an example demonstrating why my idea was
better than yours (despite the fact that, as you point out, the merge
join is never gonna see 'em).  However, in each instance, I eventually
realized that I was wrong, so I guess I'll have to concede this point.

> Or put another way: the observed problem is planning time, not that the
> resulting estimates are bad.  You argue that we should cut planning
> time by changing the definition of the estimate, and claim that the
> new definition is better, but I think you have nothing but wishful
> thinking behind that.  I'm willing to try to cut planning time, but
> I don't want the definition to change any further than it has to.

OK, I guess that makes sense. There can be scads of dead tuples at the
end of the index, and there's no surety that the query actually
requires touching that portion of the index at all apart from
planning, so it seems a bit unfortunate to burden planning with the
overhead of cleaning them up.  But I guess with your new proposed
definition at least that can only happen once.  After that there may
be a bunch of pages to skip at the end of the index before we actually
find a tuple, but they should at least be empty.  Right now, you can
end up skipping many tuples repeatedly for every query.  I tested a
two-table join with a million committed deleted but not dead tuples at
the end of one index; that increased planning time from ~0.25ms to
~90ms; obviously, a two-and-a-half order of magnitude increase in CPU
time spent planning is not a welcome development on a production
system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] A design for amcheck heapam verification

2017-04-28 Thread Peter Geoghegan
It seems like a good next step for amcheck would be to add
functionality that verifies that heap tuples have matching index
tuples, and that heap pages are generally sane. I've been thinking
about a design for this for a while now, and would like to present
some tentative ideas before I start writing code.

Using a bloom filter built with hashed index tuples, and then matching
that against the heap is an approach to index/heap consistency
checking that has lots of upsides, but at least one downside. The
downside is pretty obvious: we introduce a low though quantifiable
risk of false negatives (failure to detect corruption due to the
actual absence of an appropriate index tuple). That's not great,
especially because it occurs non-deterministically, but the upsides to
this approach are considerable. Perhaps it's worth it.

Here is the general approach that I have in mind right now:

* Size the bloom filter based on the pg_class.reltuples of the index
to be verified, weighing a practical tolerance for false negatives,
capping with work_mem.
  - We might throw an error immediately if it's impossible to get a
reasonably low probability of false negatives -- for some value of
"reasonable".
* Perform existing verification checks for a B-Tree. While scanning
the index, hash index tuples, including their heap TID pointer. Build
the bloom filter with hashed values as we go.

As we scan the heap, we:

* Verify that HOT safety was correctly assessed with respect to the
index (or indexes) being verified.
* Test the visibility map, and sanity check MultiXacts [1].
* Probe index/heap match check (uses bloom filter):

   If a heap tuple meets the following conditions:

 - Is not a HOT update tuple.
 - Is committed, and committed before RecentGlobalXmin.
 - Satisfies any predicate (for partial index case).

   Then:

  - Build a would-be index tuple value, perhaps reusing CREATE INDEX code.
  - Hash that in the same manner as in the original index pass.
  - Raise an error if the bloom filter indicates it is not present in the index.

Seems like we should definitely go to the index first, because the
heap is where we'll find visibility information.

If I wanted to go about implementing the same index/heap checks in a
way that does not have the notable downside around false negatives, I
suppose I'd have to invent a new, internal mechanism that performed a
kind of special merge join of an index against the heap. That would be
complicated, and require a lot of code; in other words, it would be
bug-prone. I wouldn't say that amcheck is simple today, but at least
the complexity owes everything to how B-Trees already work, as opposed
to how a bunch of custom infrastructure we had to invent works. The
amount of code in amcheck's verify_nbtree.c file is pretty low, and
that's a good goal to stick with. The very detailed comment that
argues for the correctness of amcheck's bt_right_page_check_scankey()
function is, to a large degree, also arguing for the correctness of a
bunch of code within nbtree. amcheck verification should be
comprehensive, but also simple and minimal, which IMV is a good idea
for about the same reason that that's a good idea when writing unit
tests.

The merge join style approach would also make verification quite
expensive, particularly when one table has multiple indexes. A tool
whose overhead can only really be justified when we're pretty sure
that there is already a problem is *significantly* less useful. And,
it ties verification to the executor, which could become a problem if
we make the functionality into a library that is usable by backup
tools that don't want to go through the buffer manager (or any SQL
interface).

Apart from the low memory overhead of using a bloom filter, resource
management is itself made a lot easier. We won't be sensitive to
misestimations, because we only need an estimate of the number of
tuples in the index, which will tend to be accurate enough in the vast
majority of cases. reltuples is needed to size the bloom filter bitmap
up-front. It doesn't matter how wide individual index tuples turn out
to be, because we simply hash everything, including even the heap TID
contained within the index.

Using a bloom filter makes verification "stackable" in a way that
might become important later. For example, we can later improve
amcheck to verify a table in parallel, by having a parallel worker
verify one index each, with bloom filters built in fixed size shared
memory buffers. A parallel heap scan then has workers concurrently
verify heap pages, and concurrently probe each final bloom filter.
Similarly, everything works the same if we add the option of scanning
a B-Tree in physical order (at the expense of not doing cross-page
verification). And, while I'd start with nbtree, you can still pretty
easily generalize the approach to building the bloom filter across
AMs. All index AMs other than BRIN have index tuples that are
essentially some values that are either the original heap values

Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread David Fetter
On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote:
> On 2017/04/28 7:36, David Fetter wrote:
> > On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
> >> On 2017/04/27 1:52, Robert Haas wrote:
> >>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote
> >>>  wrote:
>  FWIW, I too prefer the latter, that is, fire only the parent's triggers.
>  In that case, applying only the patch 0001 will do.
> >>>
> >>> Do we need to update the documentation?
> >>
> >> Yes, I think we should.  How about as in the attached?
> >>
> >> By the way, code changes I made in the attached are such that a subsequent
> >> patch could implement firing statement-level triggers of all the tables in
> >> a partition hierarchy, which it seems we don't want to do.  Should then
> >> the code be changed to not create ResultRelInfos of all the tables but
> >> only the root table (the one mentioned in the command)?  You will see that
> >> the patch adds fields named es_nonleaf_result_relations and
> >> es_num_nonleaf_result_relations, whereas just es_root_result_relation
> >> would perhaps do, for example.
> > 
> > Did I notice correctly that there's no way to handle transition tables
> > for partitions in AFTER STATEMENT triggers?
> 
> Did you mean to ask about AFTER STATEMENT triggers defined on
> "partitioned" tables?  Specifying transition table for them is disallowed
> at all.
> 
> ERROR:  "p" is a partitioned table
> DETAIL:  Triggers on partitioned tables cannot have transition tables.

OK, I suppose.  It wasn't clear from the documentation.

> Triggers created on (leaf) partitions *do* allow specifying transition table.

That includes the upcoming "default" tables, I presume.

> Or are you asking something else altogether?

I was just fuzzy on the interactions among these features.

> > If not, I'm not suggesting that this be added at this late date, but
> > we might want to document that.
> 
> I don't see mentioned in the documentation that such triggers cannot be
> defined on partitioned tables.  Is that what you are saying should be
> documented?

Yes, but I'm biased toward documenting a lot, and this restriction could
go away in some future version, which would make things more confusing
in the long run.  I'm picturing a conversation in 2020 that goes
something like this:

"On 10, you could have AFTER STATEMENT triggers on tables, foreign
tables, and leaf partition tables which referenced transition tables,
but not on DEFAULT partitions.  On 11, you could on DEFAULT partition
tables.  From 12 onward, you can have transition tables on any
relation."

Kevin?  Thomas?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Tom Lane
David Rowley  writes:
> On 29 April 2017 at 02:26, Tom Lane  wrote:
>> It looks like in the case that's giving wrong answers, the mergejoin
>> is wrongly getting marked as "Inner Unique".  Something's a bit too
>> cheesy about that planner logic --- not sure what, yet.

> Seems related to the unconditional setting of extra.inner_unique to
> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()
> Setting this based on the return value of innerrel_is_unique() as done
> with the other join types seems to fix the issue.
> I don't know yet if that's the correct fix. It's pretty late 'round
> this side to be thinking too hard about it.

Yes, I think that's correct.  I'd jumped to the conclusion that we could
skip making the test in this case, but this example shows that that's
wrong.  The problem is that, in an example like this, create_unique_path
will create a path that's unique-ified for all the join keys of the
semijoin --- but we're considering joining against just a subset of the
semijoin's outer rels, so the inner path is NOT unique for that subset.

We could possibly skip making the test if the outerrel contains
sjinfo->min_lefthand, but I'm not sufficiently excited about shaving
cycles here to take any new risks.  Let's just call innerrel_is_unique()
and be done.

Will fix in a bit, once I've managed to create a smaller test case for
the regression tests.

regards, tom lane




Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Alexander Korotkov
On Fri, Apr 28, 2017 at 6:59 PM, Tom Lane  wrote:

> David Rowley  writes:
> > Did you mean to attach this?
>
> See the link in Teodor's original message (it's actually a .bz2 file
> not a .gz)
>

Yes, I didn't notice that Teodor had renamed it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Push limit to sort through a subquery

2017-04-28 Thread Douglas Doole
> If you add this to the commitfest app, more people might look at it when
> the next commitfest opens.


I have added it. https://commitfest.postgresql.org/14/1119/

> Also, it might help if you can provide a query/ies with numbers where
> this optimization shows improvement.

I can't provide the real queries where we encountered the problem because
they are internal. However I showed a simplified version of the queries in
my first post.

On our queries, the change made quite a difference: execution time dropped
from 31.4 seconds to 7.2 seconds.  EXPLAIN ANALYZE also shows that memory
use dropped significantly and we didn't have to spill the sort to disk.

From:

-> Sort (cost=989.95..1013.27 rows=9326 width=30)
   (node_startup_time/loop=31328.891, node_total_time/loop: 31329.756 rows=2001 loops=1)
   Buffers: temp read=772 written=11201 lsm_bufmgr hits=3392
   Sort Key: ***
   Sort Method: external merge  Sort Space Used: 89592  Sort Space Type: Disk

To:

-> Sort (cost=989.95..1013.27 rows=9326 width=30)
   (node_startup_time/loop=7123.275, node_total_time/loop: 7123.504 rows=2001 loops=1)
   Buffers: lsm_bufmgr hits=3387
   Sort Key: ***
   Sort Method: top-N heapsort  Sort Space Used: 3256  Sort Space Type: Memory


Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-28 Thread Kevin Grittner
On Fri, Apr 28, 2017 at 3:54 PM, Kevin Grittner  wrote:

> Not firing a table's DML because it was fired off a view would be crazy,

should read:

Not firing a table's trigger because the trigger is on DML run from a
view's INSTEAD OF trigger would be crazy,

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-28 Thread Kevin Grittner
On Fri, Apr 28, 2017 at 2:42 PM, Tom Lane  wrote:
> Kevin Grittner  writes:
>> On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro
>>  wrote:
>>> For views, aside from the question of transition tables, I noticed
>>> that statement triggers don't seem to fire at all with updatable
>>> views.  Surely they should -- isn't that a separate bug?
>
>> I checked out 25dc142a (before any of my commits for $subject),
>> built it, and tried the above -- with no warning generated.  I then
>> used an UPDATE and DELETE against the view, also with no trigger
>> fired (nor any error during trigger creation or DML).  Does anyone
>> know whether such trigger ever fired at any point in the commit
>> history?
>
> [ experiments... ]  They did, and do, fire if you do it the old-style
> way with an INSTEAD OF row trigger.

Here is the table from near the start of CREATE TRIGGER docs,
"folded" such that I hope it remains intelligible in a fixed-width
font after email systems get done with it:

When
  Event
Row-level  Statement-level

BEFORE
  INSERT/UPDATE/DELETE
Tables and foreign tables  Tables, views, and foreign tables
  TRUNCATE
—  Tables

AFTER
  INSERT/UPDATE/DELETE
Tables and foreign tables  Tables, views, and foreign tables
  TRUNCATE
—  Tables

INSTEAD OF
  INSERT/UPDATE/DELETE
Views  —
  TRUNCATE
—  —

The claim seems to be that you can create a { BEFORE | AFTER } {
event [ OR ... ] } ...  FOR EACH STATEMENT trigger where event is {
INSERT | UPDATE | DELETE } on an updateable view.  Well, you can
*create* it, but it will never fire.

> They don't fire if you're relying
> on an updatable view.  It seems we fire the table's statement triggers
> instead, ie, the update is fully translated into an update on the
> underlying table.

Well, certainly that should *also* happen.  Not firing a table's DML
because it was fired off a view would be crazy, or so it seems to
me.

> I'm not sure how intentional that was, but it's not a completely
> unreasonable definition on its face, and given the lack of field
> complaints since 9.3, I think we should probably stick to it.

Are you talking about being able to create INSERT, UPDATE, and
DELETE triggers on the view (which never fire), or about firing
triggers on the table when an INSTEAD OF trigger is fired?

> However, if you didn't understand that from the documentation,
> then we have a documentation problem.

I totally get that there are INSTEAD OF triggers and have no quibble
with how they behave.  I can't even argue that the above chart is
wrong in terms of what CREATE TRIGGER allows.  However, creating
triggers that can never fire seems entirely wrong.

>> If we do get these working, don't they deserve at least
>> one regression test?
>
> Are you sure there isn't one?

Well, I was sort of hoping that the triggers that can now be defined
but can never fire *did* fire at some point.  But if that were true,
and they subsequently were broken, it would mean we lacked
regression tests for that case.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] pgbench tap tests & minor fixes

2017-04-28 Thread Fabien COELHO


Here is a v3, with fewer files. I cannot say I find it better, but it 
still works.


The "command_likes" function has been renamed "command_checks".

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae36247..749b16d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -229,7 +229,7 @@ typedef struct SimpleStats
 typedef struct StatsData
 {
 	time_t		start_time;		/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions */
+	int64		cnt;			/* number of transactions, including skipped */
 	int64		skipped;		/* number of transactions skipped under --rate
  * and --latency-limit */
 	SimpleStats latency;
@@ -329,7 +329,7 @@ typedef struct
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
 	/* per client collected stats */
-	int64		cnt;			/* transaction count */
+	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
 } CState;
 
@@ -2045,7 +2045,9 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	if (INSTR_TIME_IS_ZERO(now))
 		INSTR_TIME_SET_CURRENT(now);
 	now_us = INSTR_TIME_GET_MICROSEC(now);
-	while (thread->throttle_trigger < now_us - latency_limit)
+	while (thread->throttle_trigger < now_us - latency_limit &&
+		   /* with -t, do not overshoot */
+		   (nxacts <= 0 || st->cnt < nxacts))
 	{
 		processXactStats(thread, st, , true, agg);
 		/* next rendez-vous */
@@ -2053,6 +2055,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		thread->throttle_trigger += wait;
 		st->txn_scheduled = thread->throttle_trigger;
 	}
+
+	if (nxacts > 0 && st->cnt >= nxacts)
+	{
+		st->state = CSTATE_FINISHED;
+		break;
+	}
 }
 
 st->state = CSTATE_THROTTLE;
@@ -2372,7 +2380,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	per_script_stats || use_log)
 	processXactStats(thread, st, , false, agg);
 else
-	thread->stats.cnt++;
+	thread->stats.cnt++, st->cnt++;
 
 if (is_connect)
 {
@@ -2381,7 +2389,6 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_ZERO(now);
 }
 
-++st->cnt;
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
 {
 	/* exit success */
@@ -2530,6 +2537,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 		lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
 	}
 
+	/* thread stats */
 	if (progress || throttle_delay || latency_limit)
 	{
 		accumStats(>stats, skipped, latency, lag);
@@ -2539,8 +2547,12 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 			thread->latency_late++;
 	}
 	else
+		/* just count */
 		thread->stats.cnt++;
 
+	/* client stat is just counting */
+	st->cnt ++;
+
 	if (use_log)
 		doLog(thread, st, agg, skipped, latency, lag);
 
@@ -3118,7 +3130,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	/* Save line */
 	my_command->line = expr_scanner_get_substring(sstate,
   start_offset,
-  end_offset);
+  end_offset + 1);
 
 	if (pg_strcasecmp(my_command->argv[0], "sleep") == 0)
 	{
@@ -3509,7 +3521,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	{
 		printf("number of transactions per client: %d\n", nxacts);
 		printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-			   total->cnt, nxacts * nclients);
+			   total->cnt - total->skipped, nxacts * nclients);
 	}
 	else
 	{
@@ -3525,12 +3537,12 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	if (throttle_delay && latency_limit)
 		printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n",
 			   total->skipped,
-			   100.0 * total->skipped / (total->skipped + total->cnt));
+			   100.0 * total->skipped / total->cnt);
 
 	if (latency_limit)
 		printf("number of transactions above the %.1f ms latency limit: %d (%.3f %%)\n",
 			   latency_limit / 1000.0, latency_late,
-			   100.0 * latency_late / (total->skipped + total->cnt));
+			   100.0 * latency_late / total->cnt);
 
 	if (throttle_delay || progress || latency_limit)
 		printSimpleStats("latency", >latency);
@@ -3580,7 +3592,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
 	   sql_script[i].stats.skipped,
 	   100.0 * sql_script[i].stats.skipped /
-	(sql_script[i].stats.skipped + sql_script[i].stats.cnt));
+	   sql_script[i].stats.cnt);
 
 			if (num_scripts > 1)
 printSimpleStats(" - latency", _script[i].stats.latency);
@@ -4106,6 +4118,12 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (progress_timestamp && progress <= 0)
+	{
+		fprintf(stderr, "--progress-timestamp is allowed only under --progress\n");
+		exit(1);
+	}
+
 	/*
 	 * save main process id in the global variable because process id will be
 	 * changed after fork.
diff 

Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-28 Thread Tom Lane
Kevin Grittner  writes:
> On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro
>  wrote:
>> For views, aside from the question of transition tables, I noticed
>> that statement triggers don't seem to fire at all with updatable
>> views.  Surely they should -- isn't that a separate bug?

> I checked out 25dc142a (before any of my commits for $subject),
> built it, and tried the above -- with no warning generated.  I then
> used an UPDATE and DELETE against the view, also with no trigger
> fired (nor any error during trigger creation or DML).  Does anyone
> know whether such trigger ever fired at any point in the commit
> history?

[ experiments... ]  They did, and do, fire if you do it the old-style
way with an INSTEAD OF row trigger.  They don't fire if you're relying
on an updatable view.  It seems we fire the table's statement triggers
instead, ie, the update is fully translated into an update on the
underlying table.

I'm not sure how intentional that was, but it's not a completely
unreasonable definition on its face, and given the lack of field
complaints since 9.3, I think we should probably stick to it.
However, if you didn't understand that from the documentation,
then we have a documentation problem.

> If we do get these working, don't they deserve at least
> one regression test?

Are you sure there isn't one?

regards, tom lane




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 1:18 AM, Ashutosh Bapat
 wrote:
> For two-way join this works and is fairly straight-forward. I am
> assuming that A an B are base relations and not joins. But making it
> work for N-way join is the challenge.

I don't think it's much different, is it?  Anyway, I'm going to
protest if your algorithm for merging bounds takes any more than
linear time, regardless of what else we decide.
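
For what it's worth, a linear-time pairing of two sorted bound lists could be
sketched as follows (a purely illustrative Python sketch; the function name and
list representation are invented here, not taken from the patch):

```python
def merge_bounds(a, b):
    """Pair up two sorted partition-bound lists in O(len(a) + len(b)).
    Equal bounds pair 1:1; a bound with no partner on the other side
    comes out paired with None (the 1:0 / 0:1 cases)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append((a[i], b[j])); i += 1; j += 1
        elif a[i] < b[j]:
            out.append((a[i], None)); i += 1
        else:
            out.append((None, b[j])); j += 1
    out.extend((x, None) for x in a[i:])   # leftover bounds on side a
    out.extend((None, y) for y in b[j:])   # leftover bounds on side b
    return out

print(merge_bounds([0, 10, 20], [0, 20, 30]))
```

A single pass over both lists keeps the merge linear in the total number of
bounds, which is the complexity bound being asked for above.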

>> Having said that I think we could make this work, I'm starting to
>> agree with you that it will add more complexity than it's worth.
>> Needing to keep track of the type of every partition bound
>> individually seems like a real nuisance, and it's not likely to win
>> very often because, realistically, people should and generally will
>> use the same type for the partitioning column in all of the relevant
>> tables.  So I'm going to revise my position and say it's fine to just
>> give up on partitionwise join unless the types match exactly, but I
>> still think we should try to cover the cases where the bounds don't
>> match exactly but only 1:1 or 1:0 or 0:1 mappings are needed (iow,
>> optimizations 1 and 2 from your list of 4).  I agree that ganging
>> partitions (optimization 4 from your list) is not something to tackle
>> right now.
>
> Good. I will have a more enjoyable vacation now.

Phew, what a relief.  :-)

> Do you still want the patition key type to be out of partition scheme?
> Keeping it there means we match it only once and save it only at a
> single place. Otherwise, it will have to be stored in RelOptInfo of
> the partitioned table and match it for every pair of joining
> relations.

The only reason for removing things from the PartitionScheme was if
they didn't need to be consistent across all tables.  Deciding that
the type is one of the things that has to match means deciding it
should be in the PartitionScheme, not the RelOptInfo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Tom Lane
Robert Haas  writes:
> On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane  wrote:
>> Maybe we need another type of snapshot that would accept any
>> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,

> I don't, in general, share your intuition that using SnapshotAny is
> the wrong thing.

I think you are mistaken.

> We're looking up the last value in the index for
> planning purposes.  It seems to me that, as far as making index scans
> more or less expensive to scan, a dead tuple is almost as good as a
> live one.

You are confusing number of tuples in the index, which we estimate from
independent measurements such as the file size, with endpoint value,
which is used for purposes like guessing whether a mergejoin will be
able to stop early.  For purposes like that, we do NOT want to include
dead tuples, because the merge join is never gonna see 'em.
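
The early-stop behavior being estimated can be illustrated with a toy
merge-join model (plain Python with invented names, not PostgreSQL code):

```python
# A merge join over sorted inputs can stop reading the outer side as soon
# as an outer key exceeds the inner side's maximum key.  The planner's
# endpoint estimate feeds exactly this guess, so dead index entries that
# inflate the apparent maximum make the guess badly pessimistic.
def outer_rows_read(outer_sorted, inner_max_key):
    read = 0
    for key in outer_sorted:
        read += 1
        if key > inner_max_key:
            break        # no later outer row can match: stop early
    return read

outer = list(range(100))
print(outer_rows_read(outer, 9))    # visible inner max 9: reads 11 rows
print(outer_rows_read(outer, 99))   # max inflated by dead entries: reads all 100
```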

In short, arguing about the cost of an indexscan per se is quite
irrelevant, because this statistic is not used when estimating the
cost of an indexscan.

Or put another way: the observed problem is planning time, not that the
resulting estimates are bad.  You argue that we should cut planning
time by changing the definition of the estimate, and claim that the
new definition is better, but I think you have nothing but wishful
thinking behind that.  I'm willing to try to cut planning time, but
I don't want the definition to change any further than it has to.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 2:49 PM, Peter Eisentraut
 wrote:
> On 4/27/17 16:10, Robert Haas wrote:
>> I still think you should consider improving the psql output, though.
>> Vitaly's examples upthread indicate that for a serial sequence,
>> there's psql output showing the linkage between the table and sequence
>> in both directions, but not when GENERATED is used.  Can we get
>> something morally equivalent for the GENERATED case?
>
> Done.  Good catch.

Cool.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transition tables for triggers on foreign tables and views

2017-04-28 Thread Kevin Grittner
On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro
 wrote:

> My colleague Prabhat Sahu reported off list that transition tables
> don't work for views.  I probably should have thought about that when
> I fixed something similar for partitioned tables, and after some
> experimentation I see that this is also broken for foreign tables.

Good spot.  Don't know why that didn't occur to me to look at.

> For foreign tables using postgres_fdw, I see that transition tables
> capture new rows for INSERT but capture nothing for DELETE and UPDATE.

Will dig into that in a bit, but first...

> For views, aside from the question of transition tables, I noticed
> that statement triggers don't seem to fire at all with updatable
> views.  Surely they should -- isn't that a separate bug?
>
> Example:
>
>   create table my_table (i int);
>   create view my_view as select * from my_table;
>   create function my_trigger_function() returns trigger language plpgsql as
> $$ begin raise warning 'hello world'; return null; end; $$;
>   create trigger my_trigger after insert or update or delete on my_view
> for each statement execute procedure my_trigger_function();
>   insert into my_view values (42);
>
> ... and the world remains ungreeted.

I checked out 25dc142a (before any of my commits for $subject),
built it, and tried the above -- with no warning generated.  I then
used an UPDATE and DELETE against the view, also with no trigger
fired (nor any error during trigger creation or DML).  Does anyone
know whether such trigger ever fired at any point in the commit
history?  Should we remove the documentation that anything except
INSTEAD OF triggers works on a view, and generate errors for attempts
to do otherwise, or is there something to salvage that has worked at
some point?  If we do get these working, don't they deserve at least
one regression test?

Will post again after I have a chance to review the FDW issue.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-04-28 Thread Peter Eisentraut
On 4/27/17 16:10, Robert Haas wrote:
> I still think you should consider improving the psql output, though.
> Vitaly's examples upthread indicate that for a serial sequence,
> there's psql output showing the linkage between the table and sequence
> in both directions, but not when GENERATED is used.  Can we get
> something morally equivalent for the GENERATED case?

Done.  Good catch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] some review comments on logical rep code

2017-04-28 Thread Peter Eisentraut
On 4/28/17 01:01, Noah Misch wrote:
> On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
>> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch  wrote:
>>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>>>> Pushed. Thanks!
>>>
>>> Does this close the open item, or is there more to do?
>>
>> There is only one item remaining, and the patch is attached on
>> here[1]. I guess reviewing that patch is almost done.
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
> 
> Thanks.  Peter, This PostgreSQL 10 open item is past due for your status
> update.  Kindly send a status update within 24 hours, and include a date for
> your subsequent status update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general
agreement.  I would recommend that he commits it as he sees fit.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dropping a partitioned table takes too long

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 6:12 AM, 高增琦  wrote:
> It seems that in 'load_relcache_init_file()', we forget to initialize
> 'rd_pdcxt' of relcache.

Fixed.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dropping a partitioned table takes too long

2017-04-28 Thread Robert Haas
On Wed, Apr 26, 2017 at 12:21 PM, Peter Eisentraut
 wrote:
> On 4/26/17 12:15, Robert Haas wrote:
>> On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
>>  wrote:
>>>> The attached patch tries to replace 'heap_open' with 'LockRelationOid' when
>>>> locking the parent table.
>>>> It improved dropping a table with 7000 partitions.
>>>
>>> Your patch seems to be a much better solution to the problem, thanks.
>>
>> Does anyone wish to object to this patch as untimely?
>>
>> If not, I'll commit it.
>
> Seems quite reasonable.

OK, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash when partition column specified twice

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 7:23 AM, Beena Emerson  wrote:
> Hello Amit,
>
> The extra n->is_from_type = false; seems to be added by mistake?
>
> @@ -11888,6 +11891,8 @@ TableFuncElement:   ColId Typename
> opt_collate_clause
> n->is_local = true;
> n->is_not_null = false;
> n->is_from_type = false;
> +   n->is_from_type = false;
> +   n->is_from_parent = false;
> n->storage = 0;
> n->raw_default = NULL;
> n->cooked_default = NULL;

Good catch.  Committed after fixing that issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Peter Eisentraut
On 4/27/17 21:20, Masahiko Sawada wrote:
> Isn't it better to use table_state != NIL instead as follows?
> 
>else if (table_state != NIL && last_start_times)

I'm not a fan of that in general, and it doesn't really add any clarity
here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Peter Eisentraut
On 4/27/17 15:33, Petr Jelinek wrote:
> On 27/04/17 21:00, Peter Eisentraut wrote:
>> On 4/27/17 06:47, Petr Jelinek wrote:
>>> One thing I am missing in your patch however is cleanup of entries for
>>> relations that finished sync. I wonder if it would be enough to just
>>> destroy the hash when we get to empty list.
>>
>> I had omitted that because the amount of memory "leaked" is not much,
>> but I guess it wouldn't hurt to clean it up.
>>
>> How about the attached?
> 
> Yes that's more or less what I meant. All good.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane  wrote:
>>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>>> the index, such that they'd be visited this way only once?  If that's
>>> not happening, maybe we should try to fix it.
>
>> Hmm.  Actually, I think the scenario I saw was where there was a large
>> number of tuples at the end of the index that weren't dead yet due to
>> an old snapshot held open.  That index was being scanned by lots of
>> short-running queries.  Those queries executed just fine, but they
> took a long time to plan because they had to step over all of the dead
>> tuples in the index one by one.
>
> But that was the scenario that we intended to fix by changing to
> SnapshotDirty, no?  Or I guess not quite, because
> dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty.

Yup.

> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough.  HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.

Maybe.  What I know is that several people have found SnapshotDirty to
be problematic, and in the case with which I am acquainted, using
SnapshotAny fixed it.  I do not know whether, if everybody in the
world were using SnapshotAny, somebody would have the problem you're
talking about, or some other one.  And if they did, I don't know
whether using the new kind of snapshot you are proposing would fix it.
I do know that giving SnapshotAny to people seems to have only
improved things according to the information currently available to
me.

I don't, in general, share your intuition that using SnapshotAny is
the wrong thing.  We're looking up the last value in the index for
planning purposes.  It seems to me that, as far as making index scans
more or less expensive to scan, a dead tuple is almost as good as a
live one.  Until that tuple is not only marked dead, but removed from
the index page, it contributes to the cost of an index scan.  To put
that another way, suppose the range of index keys is 0 to 2 million,
but the heap tuples for values 1 million and up are committed deleted.
All the index tuples remain (and may or may not be removable depending
on what other snapshots exist in the system).  Now, consider the
following three cases:

(a) index scan from 0 to 10,000
(b) index scan from 1,000,000 to 1,010,000
(c) index scan from 3,000,000 to 3,010,000

I contend that the cost of index scan (b) is a lot closer to the cost
of (a) than to the cost of (c).  (b) finds a whole bunch of index
tuples; (c) gives up immediately and goes home.  So I actually think
that using the actual last value in the index - 2,000,000 - is
conceptually *correct* regardless of whether it's marked dead and
regardless of whether the corresponding heap tuple is dead.  The cost
depends mostly on which tuples are present in the index, not which
table rows the user can see.
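
The (a)/(b)/(c) comparison can be modelled in a few lines (a hedged Python
sketch with invented names and a sampled key range, not PostgreSQL code):

```python
import bisect

# Toy model of the 0..2,000,000 example: every index entry is still
# physically present, but heap rows for keys >= 1,000,000 are deleted.
index_keys = list(range(0, 2_000_001, 100))   # one entry per 100 keys

def index_entries_touched(lo, hi):
    """Index tuples a range scan over [lo, hi] must examine, live or
    dead -- this is what drives scan cost until the dead entries are
    physically removed from the index pages."""
    return bisect.bisect_right(index_keys, hi) - bisect.bisect_left(index_keys, lo)

print(index_entries_touched(0, 10_000))             # (a) live range
print(index_entries_touched(1_000_000, 1_010_000))  # (b) all dead: same cost as (a)
print(index_entries_touched(3_000_000, 3_010_000))  # (c) beyond the index: 0
```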

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread David Rowley
On 29 April 2017 at 02:26, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> I've reproduced this bug on d981074c.
>> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
>> result is OK.
>> But with seqscan and hashjoin disabled, query returns 0 rows.
>
> Ah, thanks for the clue about enable_hashjoin, because it wasn't
> reproducing for me as stated.
>
> It looks like in the case that's giving wrong answers, the mergejoin
> is wrongly getting marked as "Inner Unique".  Something's a bit too
> cheesy about that planner logic --- not sure what, yet.

Seems related to the unconditional setting of extra.inner_unique to
true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()

Setting this based on the return value of innerrel_is_unique() as done
with the other join types seems to fix the issue.

I don't know yet if that's the correct fix. It's pretty late 'round
this side to be thinking too hard about it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication in the same cluster

2017-04-28 Thread Robert Haas
On Thu, Apr 27, 2017 at 4:08 AM, Petr Jelinek
 wrote:
> Back when writing the original patch set, I was also playing with the
> idea of having CREATE SUBSCRIPTION do multiple committed steps in
> similar fashion to CREATE INDEX CONCURRENTLY but that leaves mess behind
> on failure which also wasn't very popular outcome. I wonder how bad it
> would be if we created all the stuff for subscription but in disabled
> form, then committed, then created slot outside of tx (slot creation is
> not transactional anyway) and then switched the subscription to enabled
> (if needed) in next tx. It would still leave subscription behind on
> failure but a) user would see the failure, b) the subscription would be
> inactive so no active harm from it. We also already prevent running
> CREATE SUBSCRIPTION inside transaction block when automatic slot
> creation is chosen so there is no difference from that perspective.

Sounds like a solid approach.  There's no way to end up with a remote
object without also ending up with a logical object, which seems like
it greatly reduces the chances of confusion and error.
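
The sequence Petr outlines can be sketched as control flow (a toy Python model
with invented names, not PostgreSQL code): commit the catalog entry in disabled
state first, create the remote slot outside any transaction, then enable the
subscription in a second transaction.

```python
def create_subscription(catalog, remote_slots, name, enabled, slot_fails=False):
    catalog[name] = "disabled"        # tx 1: committed, visible, but inert
    try:
        if slot_fails:
            raise RuntimeError("slot creation failed")
        remote_slots.add(name)        # slot creation: not transactional anyway
    except RuntimeError:
        # Failure leaves a visible, disabled subscription and no remote slot.
        return catalog[name]
    if enabled:
        catalog[name] = "enabled"     # tx 2
    return catalog[name]

catalog, slots = {}, set()
print(create_subscription(catalog, slots, "sub1", enabled=True))   # -> enabled
```

The point of the ordering is that a mid-way failure leaves something the user
can see and drop, rather than an orphaned remote object.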

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread Robert Haas
On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote
 wrote:
> It seems to me that there is no difference in behavior between
> inheritance-based and declarative partitioning as far as statement-level
> triggers are concerned (at least currently).  In both the cases, we fire
> statement-level triggers only for the table specified in the command.

OK.

>>> By the way, code changes I made in the attached are such that a subsequent
>>> patch could implement firing statement-level triggers of all the tables in
>>> a partition hierarchy, which it seems we don't want to do.  Should then
>>> the code be changed to not create ResultRelInfos of all the tables but
>>> only the root table (the one mentioned in the command)?  You will see that
>>> the patch adds fields named es_nonleaf_result_relations and
>>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>>> would perhaps do, for example.
>>
>> It seems better not to create any ResultRelInfos that we don't
>> actually need, so +1 for such a revision to the patch.
>
> OK, done.  It took a bit more work than I thought.

So, this seems weird, because rootResultRelIndex is initialized even
when splan->partitioned_rels == NIL, but isn't actually valid in that
case.  ExecInitModifyTable seems to think it's always valid, though.

I think the way that you've refactored fireBSTriggers and
fireASTriggers is a bit confusing.  Instead of splitting out a
separate function, how about just having the existing function begin
with if (node->rootResultRelInfo) resultRelInfo =
node->rootResultRelInfo; else resultRelInfo = node->resultRelInfo; ?
I think the way you've coded it is a holdover from the earlier design
where you were going to call it multiple times, but now that's not
needed.

Looks OK, otherwise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Dmitriy Sarafannikov

> What I'm thinking of is the regular indexscan that's done internally
> by get_actual_variable_range, not whatever ends up getting chosen as
> the plan for the user query.  I had supposed that that would kill
> dead index entries as it went, but maybe that's not happening for
> some reason.


Really, this happens as you said. Index entries are marked as dead.
But after this, backends spend CPU time skipping these killed entries
in _bt_checkkeys:

	if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
	{
		/* return immediately if there are more tuples on the page */
		if (ScanDirectionIsForward(dir))
		{
			if (offnum < PageGetMaxOffsetNumber(page))
				return NULL;
		}
		else
		{
			BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

			if (offnum > P_FIRSTDATAKEY(opaque))
				return NULL;
		}

This is confirmed by the perf records and backtrace reported by Vladimir earlier.
root@pgload01e ~ # perf report | grep -v '^#' | head
56.67%  postgres   postgres[.] _bt_checkkeys
19.27%  postgres   postgres[.] _bt_readpage
 2.09%  postgres   postgres[.] pglz_decompress
 2.03%  postgres   postgres[.] LWLockAttemptLock
 1.61%  postgres   postgres[.] PinBuffer.isra.3
 1.14%  postgres   postgres[.] hash_search_with_hash_value
 0.68%  postgres   postgres[.] LWLockRelease
 0.42%  postgres   postgres[.] AllocSetAlloc
 0.40%  postgres   postgres[.] SearchCatCache
 0.40%  postgres   postgres[.] ReadBuffer_common
root@pgload01e ~ #
It seems like killing dead tuples does not solve this problem.
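
The behavior described above can be reduced to a toy scan loop (illustrative
Python with invented names, not the actual _bt_checkkeys code):

```python
# Entries marked dead (killed) are skipped by the scan, but each one
# still costs an iteration until VACUUM physically removes it.
def scan_page(items):
    """items: list of (key, is_dead).  Returns (live keys returned,
    total entries visited)."""
    returned, visited = [], 0
    for key, is_dead in items:
        visited += 1
        if is_dead:
            continue     # killed-tuple fast path: skipped, but still visited
        returned.append(key)
    return returned, visited

page = [(k, k >= 3) for k in range(10)]   # keys 3..9 marked dead
print(scan_page(page))   # -> ([0, 1, 2], 10): 3 live keys, 10 entries visited
```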

Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Tom Lane
Robert Haas  writes:
> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane  wrote:
>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>> the index, such that they'd be visited this way only once?  If that's
>> not happening, maybe we should try to fix it.

> Hmm.  Actually, I think the scenario I saw was where there was a large
> number of tuples at the end of the index that weren't dead yet due to
> an old snapshot held open.  That index was being scanned by lots of
> short-running queries.  Those queries executed just fine, but they
> took a long time to plan because they had to step over all of the dead
> tuples in the index one by one.

But that was the scenario that we intended to fix by changing to
SnapshotDirty, no?  Or I guess not quite, because
dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty.

Maybe we need another type of snapshot that would accept any
non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
but a tuple that was live more recently than the xmin horizon seems
like it's acceptable enough.  HeapTupleSatisfiesVacuum already
implements the right behavior, but we don't have a Snapshot-style
interface for it.
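
To make the proposed semantics concrete, here is a toy sketch (Python with
invented names; not the HeapTupleSatisfiesVacuum interface itself): accept
live tuples plus any dead tuple whose deleter is still at or above the xmin
horizon, i.e. anything VACUUM could not yet remove.

```python
def visible_to_nonvacuumable(deleting_xid, xmin_horizon):
    """deleting_xid is None for a live tuple; otherwise the xid that
    deleted it.  A deleted tuple remains of interest until its deleter
    falls below the xmin horizon (at which point it is vacuumable)."""
    if deleting_xid is None:
        return True                       # live tuple
    return deleting_xid >= xmin_horizon   # recently dead: still accepted

print(visible_to_nonvacuumable(None, 100))  # live          -> True
print(visible_to_nonvacuumable(120, 100))   # recently dead -> True
print(visible_to_nonvacuumable(50, 100))    # vacuumable    -> False
```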

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Tom Lane
David Rowley  writes:
> Did you mean to attach this?

See the link in Teodor's original message (it's actually a .bz2 file
not a .gz)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread David Rowley
On 29 April 2017 at 00:45, Alexander Korotkov  wrote:
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.

Hi,

Did you mean to attach this?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-28 Thread Robert Haas
On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane  wrote:
>>> But if we delete many rows from beginning or end of index, it would be
>>> very expensive too because we will fetch each dead row and reject it.
>
>> Yep, and I've seen that turn into a serious problem in production.
>
> How so?  Shouldn't the indexscan go back and mark such tuples dead in
> the index, such that they'd be visited this way only once?  If that's
> not happening, maybe we should try to fix it.

Hmm.  Actually, I think the scenario I saw was where there was a large
number of tuples at the end of the index that weren't dead yet due to
an old snapshot held open.  That index was being scanned by lots of
short-running queries.  Those queries executed just fine, but they
took a long time to plan because they had to step over all of the dead
tuples in the index one by one.  That increased planning time -
multiplied by the number of times it was incurred - was sufficient to
cripple the system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev



Ah, thanks for the clue about enable_hashjoin, because it wasn't
reproducing for me as stated.

I missed the tweaked config, sorry

--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXSITS to inner join gotcha and bug

2017-04-28 Thread Tom Lane
Alexander Korotkov  writes:
> I've reproduced this bug on d981074c.
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.
> But with seqscan and hashjoin disabled, query returns 0 rows.

Ah, thanks for the clue about enable_hashjoin, because it wasn't
reproducing for me as stated.

It looks like in the case that's giving wrong answers, the mergejoin
is wrongly getting marked as "Inner Unique".  Something's a bit too
cheesy about that planner logic --- not sure what, yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] vcregress support for single TAP tests

2017-04-28 Thread Andrew Dunstan


On 04/26/2017 10:32 PM, Peter Eisentraut wrote:
> On 4/23/17 17:09, Andrew Dunstan wrote:
>> Here's a patch that will allow calling vcregress.pl to run a single TAP
>> test set. It would work like this:
>>
>>
>> vcregress.pl src/test/recover true
>>
>>
>> The second argument if true (in the perl sense, of course) would trigger
>> a temp install before running the tests. It defaults to off, in an
>> attempt to minimize the unnecessary running of installs.
> Seems kind of weird to have that functionality only for the tap tests,
> though.
>



Yeah, you're right, I think we'd be much better off mimicking what the
makefiles do and honouring the NO_TEMP_INSTALL environment variable.
Revised patch that works that way attached.

So to run an arbitrary TAP test set, you would do (for example)

vcregress.pl taptest src/test/recover


cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 8933920..d387274 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -34,7 +34,7 @@ if (-e "src/tools/msvc/buildenv.pl")
 
 my $what = shift || "";
 if ($what =~
-/^(check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck)$/i
+/^(check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck|taptest)$/i
   )
 {
 	$what = uc $what;
@@ -54,13 +54,6 @@ copy("$Config/dummy_seclabel/dummy_seclabel.dll", "src/test/regress");
 
 $ENV{PATH} = "$topdir/$Config/libpq;$ENV{PATH}";
 
-my $schedule = shift;
-unless ($schedule)
-{
-	$schedule = "serial";
-	$schedule = "parallel" if ($what eq 'CHECK' || $what =~ /PARALLEL/);
-}
-
 if ($ENV{PERL5LIB})
 {
 	$ENV{PERL5LIB} = "$topdir/src/tools/msvc;$ENV{PERL5LIB}";
@@ -90,13 +83,14 @@ my %command = (
 	ISOLATIONCHECK => \&isolationcheck,
 	BINCHECK   => \&bincheck,
 	RECOVERYCHECK  => \&recoverycheck,
-	UPGRADECHECK   => \&upgradecheck,);
+	UPGRADECHECK   => \&upgradecheck,
+	TAPTEST        => \&taptest,);
 
 my $proc = $command{$what};
 
 exit 3 unless $proc;
 
-&$proc();
+&$proc(@_);
 
 exit 0;
 
@@ -104,6 +98,7 @@ exit 0;
 
 sub installcheck
 {
+	my $schedule = shift || 'serial';
 	my @args = (
 		"../../../$Config/pg_regress/pg_regress",
 		"--dlpath=.",
@@ -119,6 +114,7 @@ sub installcheck
 
 sub check
 {
+	my $schedule = shift || 'parallel';
 	InstallTemp();
 	chdir "${topdir}/src/test/regress";
 	my @args = (
@@ -145,7 +141,7 @@ sub ecpgcheck
 	exit $status if $status;
 	InstallTemp();
 	chdir "$topdir/src/interfaces/ecpg/test";
-	$schedule = "ecpg";
+	my $schedule = "ecpg";
 	my @args = (
 		"../../../../$Config/pg_regress_ecpg/pg_regress_ecpg",
 		"--bindir=",
@@ -219,6 +215,17 @@ sub bincheck
 	exit $mstat if $mstat;
 }
 
+sub taptest
+{
+	my $dir = shift;
+
+	die "no tests found!" unless -d "$topdir/$dir/t";
+
+	InstallTemp();
+	my $status = tap_check("$topdir/$dir");
+	exit $status if $status;
+}
+
 sub plcheck
 {
 	chdir "../../pl";
@@ -516,7 +523,6 @@ sub fetchRegressOpts
 	$m =~ s{\\\r?\n}{}g;
 	if ($m =~ /^\s*REGRESS_OPTS\s*\+?=(.*)/m)
 	{
-
 		# Substitute known Makefile variables, then ignore options that retain
 		# an unhandled variable reference.  Ignore anything that isn't an
 		# option starting with "--".
@@ -588,17 +594,21 @@ sub GetTests
 
 sub InstallTemp
 {
-	print "Setting up temp install\n\n";
-	Install("$tmp_installdir", "all", $config);
+	unless ($ENV{NO_TEMP_INSTALL})
+	{
+		print "Setting up temp install\n\n";
+		Install("$tmp_installdir", "all", $config);
+	}
 	$ENV{PATH} = "$tmp_installdir/bin;$ENV{PATH}";
 }
 
 sub usage
 {
 	print STDERR
-	  "Usage: vcregress.pl <mode> [ <schedule> ]\n\n",
+	  "Usage: vcregress.pl <mode> [ arg [ , arg] ... ]\n\n",
 	  "Options for <mode>:\n",
 	  "  bincheck   run tests of utilities in src/bin/\n",
+	  "  taptest   run an arbitrary TAP test\n",
 	  "  check  deploy instance and run regression tests on it\n",
 	  "  contribcheck   run tests of modules in contrib/\n",
 	  "  ecpgcheck  run regression tests of ECPG\n",
@@ -608,8 +618,10 @@ sub usage
 	  "  plcheckrun tests of PL languages\n",
 	  "  recoverycheck  run recovery test suite\n",
 	  "  upgradecheck   run tests of pg_upgrade\n",
-	  "\nOptions for <schedule>:\n",
+	  "\nOptions for arg: (used by check and installcheck)\n",
 	  "  serial serial mode\n",
-	  "  parallel   parallel mode\n";
+	  "  parallel   parallel mode\n",
+	  "\nOptions for args for taptest\n",
+	  "  test-dir   (required) directory where tests reside\n",
 	exit(1);
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Alexander Korotkov
On Fri, Apr 28, 2017 at 12:48 PM, Teodor Sigaev  wrote:

>> Both 9.6 and 10devel are affected by this dependence of the query result on
>> the enable_seqscan setting.
>>
> Oops, I was too hasty: 9.6 is not affected by the enable_seqscan setting.
> But it doesn't push down the condition either.


I've reproduced this bug on d981074c.
On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
result is OK.

# explain analyze SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=6.42..1924.71 rows=1 width=99) (actual
time=14.044..34.957 rows=*162* loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66)
(actual time=0.026..0.028 rows=1 loops=1)
 Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
 Heap Fetches: 0
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual
time=0.077..28.310 rows=70530 loops=1)
 Hash Cond: (t3.id2 = t2.id)
 ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66)
(actual time=0.005..6.433 rows=70530 loops=1)
 ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual
time=0.065..0.065 rows=184 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 20kB
   ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33)
(actual time=0.003..0.025 rows=184 loops=1)
 Planning time: 2.542 ms
 Execution time: 35.008 ms
(14 rows)

But with seqscan and hashjoin disabled, query returns 0 rows.

# set enable_seqscan = off;
# set enable_hashjoin = off;
# explain analyze SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.97..5265.82 rows=1 width=99) (actual
time=18.718..18.718 rows=*0* loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 163
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66)
(actual time=0.024..0.024 rows=1 loops=1)
 Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
 Heap Fetches: 0
   ->  Merge Join  (cost=0.69..5259.48 rows=163 width=66) (actual
time=0.033..18.670 rows=163 loops=1)
 Merge Cond: (t2.id = t3.id2)
 ->  Index Only Scan using t2i1 on t2  (cost=0.27..19.03 rows=184
width=33) (actual time=0.015..0.038 rows=184 loops=1)
   Heap Fetches: 0
 ->  Index Only Scan using t3i2 on t3  (cost=0.42..4358.37
rows=70530 width=66) (actual time=0.015..10.484 rows=70094 loops=1)
   Heap Fetches: 0
 Planning time: 2.571 ms
 Execution time: 18.778 ms
(14 rows)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] scram and \password

2017-04-28 Thread Heikki Linnakangas

On 04/28/2017 07:49 AM, Noah Misch wrote:

On Fri, Apr 21, 2017 at 11:04:14PM +0300, Heikki Linnakangas wrote:

I'll continue reviewing the rest of the patch on Monday, but [...]


This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Moreover, this open item has been in progress for 17 days, materially
longer than the 1-2 week RMT non-intervention period.  Refer to the policy on
open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


I just pushed some little cleanups that caught my eye while working on 
this. I need to rebase the patch set I sent earlier, and fix the little 
things that Michael pointed out, but that shouldn't take long. I plan to 
push a fix for this on Tuesday (Monday is a national holiday here).


- Heikki





Re: [HACKERS] Assertion failure in REL9_5_STABLE

2017-04-28 Thread Pavan Deolasee
On Wed, Apr 19, 2017 at 11:19 AM, Andrew Gierth  wrote:

> > "Pavan" == Pavan Deolasee  writes:
>
>  Pavan> I am attaching a patch that throws a similar ERROR during
>  Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
>  Pavan> always decide to use sort-based implementation for grouping, but
>  Pavan> do not check if the columns support ordering or not.
>
> Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
> deal with it shortly.
>

JFR this got fixed via 7be3678a8cfb55dcfca90fa586485f835ab912a5

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-04-28 Thread Craig Ringer
On 28 Apr. 2017 17:04, "Kang Yuzhe"  wrote:

Hello Simon,
The journey that caused and is causing me a lot of pain is finding my way
in PG development: reading complex code like PG's, and fully understanding
the science of DBMS engines: query processing, storage, transaction
management and so on...

Anyway, as you said, the rough estimate for reaching any expertise seems to
follow the 10,000 Hour Rule. I will strive based on this rule.


Start with not top-posting on the mailing list ;)


For now, would you please tell me how to know the exact PG version for which
a specific patch was developed?
Given patch x, how do I know the specific PG version it was developed for?


If it was created by git format-patch then the base git revision will be
shown. This may be a commit from the postgres public tree that you can find
with 'git branch --contains'.

Otherwise look at the proposed commit message if any, in the patch header.
Or the email it was attached to. If all else fails guess based on the date.


Re: [HACKERS] Crash when partition column specified twice

2017-04-28 Thread Beena Emerson
Hello Amit,

The extra n->is_from_type = false; seems to be added by mistake?

@@ -11888,6 +11891,8 @@ TableFuncElement:   ColId Typename
opt_collate_clause
n->is_local = true;
n->is_not_null = false;
n->is_from_type = false;
+   n->is_from_type = false;
+   n->is_from_parent = false;
n->storage = 0;
n->raw_default = NULL;
n->cooked_default = NULL;



On Fri, Apr 28, 2017 at 6:08 AM, Amit Langote  wrote:

> On 2017/04/27 12:36, Amit Langote wrote:
> > Noticed that a crash occurs if a column is specified twice when creating
> a
> > partition:
> >
> > create table p (a int) partition by list (a);
> >
> > -- crashes
> > create table p1 partition of parent (
> >   a not null,
> >   a default 1
> > ) for values in (1);
> >
> > The logic in MergeAttributes() that merged partition column options with
> > those of the parent didn't properly check for column being specified
> twice
> > and instead tried to delete the same ColumnDef from a list twice, causing
> > the crash.
> >
> > Attached fixes that.
>
> Patch rebased, because of a conflict with b9a3ef55b2.
>
> Thanks,
> Amit
>
>
>
>


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Dropping a partitioned table takes too long

2017-04-28 Thread 高增琦
It seems that in 'load_relcache_init_file()', we forget to initialize
'rd_pdcxt' of relcache.

2017-04-27 0:33 GMT+08:00 Robert Haas :

> On Wed, Apr 26, 2017 at 12:22 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote
> >>  wrote:
> >>> Your patch seems to be a much better solution to the problem, thanks.
> >
> >> Does anyone wish to object to this patch as untimely?
> >
> >> If not, I'll commit it.
> >
> > It's certainly not untimely to address such problems.  What I'm wondering
> > is if we should commit both patches.  Avoiding an unnecessary heap_open
> > is certainly a good thing, but it seems like the memory leak addressed
> > by the first patch might still be of concern in other scenarios.
>
> I will defer to you on that.  If you think that patch is a good idea,
> please have at it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


Re: [HACKERS] Cached plans and statement generalization

2017-04-28 Thread Konstantin Knizhnik



On 26.04.2017 13:46, Pavel Stehule wrote:


I attach a new patch which allows limiting the number of
autoprepared statements (the autoprepare_limit GUC variable).
I also did more measurements, now with several concurrent
connections and read-only statements.
Results of pgbench with 10 connections, scale 10, and read-only
statements are below:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   206k


As you can see, autoprepare provides more than 2 times speed
improvement.

Also I tried to measure the overhead of parsing (to be able to
substitute all literals, not only string literals).
I just added an extra call of pg_parse_query. Speed is reduced to 181k.
So the overhead is noticeable, but the optimization is still useful.
This is why I want to ask: is it better to implement a
slower but safer and more universal solution?


An unsafe solution makes no sense, and it is dangerous (80% of 
database users lack the necessary knowledge). If somebody needs the max 
possible performance, they use explicit prepared statements.




I attached a new patch to this mail. I have completely reimplemented my 
original approach and now use parse tree transformation.

New pgbench (-S -c 10) results are the following:

Protocol             TPS
extended             87k
prepared             209k
simple+autoprepare   185k


So there is some slowdown compared with my original implementation and with 
explicitly prepared statements, but it still provides a more than two times 
speed-up compared with unprepared queries. And it doesn't require changing 
existing applications.
Since most real production applications work with the DBMS 
through some connection pool (pgbouncer, ...), I think that such 
optimization will be useful.
Isn't it interesting if we can increase system throughput almost two 
times by just setting one parameter in the configuration file?


I also tried to enable autoprepare by default and run the regression tests. 
7 tests do not pass, for the following reasons:
1. Slightly different error reporting (for example, the error location is not 
always specified identically).
2. Differences in query behavior caused by changed local settings 
(Andres gave an example with search_path, and the date test fails 
because of a changed datestyle).
3. Problems with indirect dependencies (when a table is altered, only 
cached plans directly depending on this relation are invalidated, but 
not plans with indirect dependencies).

4. Domain checks are not performed for null values.

I do not think these issues can cause problems for real applications.

It is also possible to limit the number of autoprepared statements using the 
autoprepare_limit parameter, avoiding possible backend memory overflow in 
case of a large number of unique queries sent by an application. An LRU 
discipline is used to drop the least recently used plans.


Any comments and suggestions for future improvement of this patch are 
welcome.
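For readers who want to see the cache discipline concretely: the autoprepare_limit plus LRU eviction described above can be sketched as a toy model. This is not code from the patch; the AutoprepareCache class, its methods, and the plan placeholder are all hypothetical, and only the autoprepare_limit name comes from the proposal (Python is used purely for illustration).

```python
from collections import OrderedDict

class AutoprepareCache:
    """Toy model of a per-backend autoprepare cache with LRU eviction."""

    def __init__(self, autoprepare_limit=3):
        self.limit = autoprepare_limit
        self.plans = OrderedDict()  # generalized query text -> "plan"

    def lookup(self, query):
        # Generalizing the query (replacing literals with parameters) is
        # what the real patch does via parse tree transformation; here we
        # simply use the raw query text as the cache key.
        if query in self.plans:
            self.plans.move_to_end(query)   # mark as most recently used
            return self.plans[query]
        plan = f"plan({query})"             # stand-in for a prepared plan
        self.plans[query] = plan
        if len(self.plans) > self.limit:
            self.plans.popitem(last=False)  # drop least recently used plan
        return plan

cache = AutoprepareCache(autoprepare_limit=2)
cache.lookup("SELECT 1")
cache.lookup("SELECT 2")
cache.lookup("SELECT 1")    # refreshes SELECT 1
cache.lookup("SELECT 3")    # evicts SELECT 2, the LRU entry
print(sorted(cache.plans))  # ['SELECT 1', 'SELECT 3']
```

The point of the LRU discipline is exactly what the last two calls show: a bounded cache keeps the statements an application actually reuses.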


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index cd39167..4fbc8b7 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3610,6 +3610,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform a raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than expression_tree_mutator().
+ * The callback is given a pointer to the pointer to the current node and can update this field instead of returning a reference to a new node.
+ * This makes it possible to remember changes and easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never implicitly copies tree nodes, doing in-place updates.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration stops immediately and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+			bool (*mutator) (),
+			void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case 

Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev

Both 9.6 and 10devel are affected by this dependence of the query result on the
enable_seqscan setting.
Oops, I was too hasty: 9.6 is not affected by the enable_seqscan setting. But it 
doesn't push down the condition either.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Masahiko Sawada
On Fri, Apr 28, 2017 at 5:26 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada  
> wrote in 

Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread Amit Langote
On 2017/04/28 7:36, David Fetter wrote:
> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>> On 2017/04/27 1:52, Robert Haas wrote:
>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote
>>>  wrote:
 FWIW, I too prefer the latter, that is, fire only the parent's triggers.
 In that case, applying only the patch 0001 will do.
>>>
>>> Do we need to update the documentation?
>>
>> Yes, I think we should.  How about as in the attached?
>>
>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do.  Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)?  You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
> 
> Did I notice correctly that there's no way to handle transition tables
> for partitions in AFTER STATEMENT triggers?

Did you mean to ask about AFTER STATEMENT triggers defined on
"partitioned" tables?  Specifying a transition table for them is disallowed
entirely.

ERROR:  "p" is a partitioned table
DETAIL:  Triggers on partitioned tables cannot have transition tables.

Triggers created on (leaf) partitions *do* allow specifying a transition table.

Or are you asking something else altogether?

> If not, I'm not suggesting that this be added at this late date, but
> we might want to document that.

I don't see it mentioned in the documentation that such triggers cannot be
defined on partitioned tables.  Is that what you are saying should be
documented?

Thanks,
Amit





[HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-28 Thread Teodor Sigaev

Hi!

There seem to be two issues:

1) Sometimes conditions which at first glance could be pushed down to the scan 
are left as join quals. And that can be a ~30 times performance loss.


2) The number of rows in the query result depends on the enable_seqscan variable.

The query
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

produces following plan:
 Nested Loop  (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 
rows=162 loops=1)

   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 width=66) 
(actual time=0.100..0.103 rows=1 loops=1)

 Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
 Heap Fetches: 1
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual 
time=0.370..120.971 rows=70530 loops=1)

(1)  Hash Cond: (t3.id2 = t2.id)
(2)  ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual 
time=0.017..27.424 rows=70530 loops=1)
 ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual 
time=0.273..0.273 rows=184 loops=1)

   Buckets: 1024  Batches: 1  Memory Usage: 20kB
   ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual 
time=0.017..0.105 rows=184 loops=1)

 Planning time: 7.326 ms
 Execution time: 149.115 ms


Condition (1) is not pushed down to scan (2), although it seemingly could be 
safely moved. With enable_seqscan = off the condition is not pushed down either, 
but the query returns only one row instead of 162. The scan on t3 returns ~70k 
rows although only ~150 rows are really needed. I didn't find a combination of 
enable_* GUCs that pushes that down, and it seems to me there is either a reason 
for that which I don't see, or support is somehow missing.


If the pair (t3.id1, t3.id2) is unique (see the dump; there is a unique index on 
them), the query can be rewritten directly as an inner join, and its plan is:
 Nested Loop  (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 
rows=162 loops=1)
   ->  Nested Loop  (cost=9.43..292.77 rows=25 width=99) (actual 
time=0.316..0.645 rows=162 loops=1)
 ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 
width=66) (actual time=0.047..0.050 rows=1 loops=1)

   Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
   Heap Fetches: 1
 ->  Bitmap Heap Scan on t3  (cost=9.15..283.53 rows=94 width=66) 
(actual time=0.257..0.426 rows=162 loops=1)

   Recheck Cond: (id1 = t1.id)
   Heap Blocks: exact=3
   ->  Bitmap Index Scan on t3i1  (cost=0.00..9.12 rows=94 width=0) 
(actual time=0.186..0.186 rows=162 loops=1)

 Index Cond: (id1 = t1.id)
   ->  Index Only Scan using t2i1 on t2  (cost=0.27..0.29 rows=1 width=33) 
(actual time=0.024..0.024 rows=1 loops=162)

 Index Cond: (id = t3.id2)
 Heap Fetches: 162
 Planning time: 5.532 ms
 Execution time: 5.457 ms

The second plan is ~30 times faster. But with sequential scans turned off the 
first query does not work correctly, which points to some bug in the planner, I 
suppose. Both 9.6 and 10devel are affected by this dependence of the query result 
on the seqscan setting.
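Under that uniqueness assumption the equivalence can be checked on a toy dataset. The sketch below is not PostgreSQL-specific: it uses Python's sqlite3 with hypothetical data (table and column names follow the report) to show that the EXISTS form and the flattened join return the same rows.

```python
import sqlite3

# Tiny model of the reported schema, with a unique (id1, id2) pair in t3.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1(id INTEGER, name TEXT);
    CREATE TABLE t2(id INTEGER);
    CREATE TABLE t3(id1 INTEGER, id2 INTEGER, UNIQUE(id1, id2));
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO t2 VALUES (10), (20), (30);
    INSERT INTO t3 VALUES (1, 10), (1, 20), (2, 30);
""")

# Semijoin form: keep each (t1, t2) pair for which a matching t3 row EXISTS.
exists_form = con.execute("""
    SELECT t1.id, t2.id FROM t1, t2
    WHERE t1.name = 'x'
      AND EXISTS (SELECT 1 FROM t3
                  WHERE t3.id1 = t1.id AND t3.id2 = t2.id)
    ORDER BY 1, 2
""").fetchall()

# Flattened form: a plain join, valid only because (id1, id2) is unique,
# so each (t1, t2) pair can match at most one t3 row.
join_form = con.execute("""
    SELECT t1.id, t2.id FROM t1, t2, t3
    WHERE t1.name = 'x' AND t3.id1 = t1.id AND t3.id2 = t2.id
    ORDER BY 1, 2
""").fetchall()

print(exists_form == join_form)  # True: same multiset of rows
```

This is only a correctness illustration; the performance difference discussed above comes from the planner's pushdown decisions, which the toy example does not model.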



Dump to reproduce (subset of real data but obfuscated), queries are in attachment
http://sigaev.ru/misc/exists_to_nested.sql.gz
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
--query returns 162 rows
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

set enable_seqscan=off;

--the same query returns only one row!
explain analyze
SELECT
*
FROM
t1
INNER JOIN t2 ON (
EXISTS (
SELECT
true
FROM
t3
WHERE
t3.id1 = t1.id  AND
t3.id2 = t2.id
)
)
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd'
;

explain analyze
SELECT
t1.*
FROM
t1, t2, t3
WHERE
t1.name = '5c5fec6a41b8809972870abc154b3ecd' AND
t3.id1 = t1.id  AND
t3.id2 = t2.id;


Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-04-28 Thread Kang Yuzhe
Hello Simon,
The journey that caused and is causing me a lot of pain is finding my way
in PG development: reading complex code like PG's, and fully understanding
the science of DBMS engines: query processing, storage, transaction
management and so on...

Anyway, as you said, the rough estimate for reaching any expertise seems to
follow the 10,000 Hour Rule. I will strive based on this rule.

For now, would you please tell me how to know the exact PG version for which
a specific patch was developed?
Given patch x, how do I know the specific PG version it was developed for?

Regards,
Zeray



On Mon, Apr 17, 2017 at 7:33 PM, Simon Riggs  wrote:

> On 27 March 2017 at 13:00, Kang Yuzhe  wrote:
>
> > I have found PG source code reading and hacking to be one of the most
> > frustrating experiences in my life.  I believe that PG hacking should
> not be
> > a painful journey but an enjoyable one!
> >
> > It is my strong belief that out of my PG hacking frustrations, there may
> > come insights for the PG experts on ways to devise hands-on work with PG
> > internals so that newcomers will become great coders as quickly as possible.
>
> I'm here now because PostgreSQL has clear, well designed and
> maintained code with accurate docs, great comments and a helpful team.
>
> I'd love to see detailed cases where another project is better in a
> measurable way; I am willing to learn from that.
>
> Any journey to expertise takes 10,000 hours. There is no way to shorten
> that.
>
> What aspect of your journey caused you pain?
>
> --
> Simon Riggshttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread Amit Langote
Hi Rajkumar,

On 2017/04/28 17:11, Rajkumar Raghuwanshi wrote:
> On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <
>> Updated patch attached.
> 
> I have applied the given patch, and could see the below behaviour with a
> statement trigger.
> 
> When trying to delete a value within the partition range, triggers got fired
> (with zero rows affected) as expected, but when trying to delete a value that
> is outside the partition range (also with zero rows affected), no trigger
> fired. Is this expected?

Good catch.

I'm afraid though that this is not a defect of this patch, but some (maybe)
unrelated bug, which affects not only partitioned tables but inheritance in
general.

The problem is that the plan created is such that the executor has no
opportunity to fire the trigger in question, because the plan contains no
information about which table is affected by the statement.  You can see
that with inheritance.  See below:

create table foo ();
create table bar () inherits (foo);

create or replace function trig_notice() returns trigger as $$
  begin raise notice 'trigger fired'; return null; end;
$$ language plpgsql;

create trigger foo_del_before before delete on foo
  for each statement execute procedure trig_notice();

explain delete from foo where false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- no trigger fired
delete from foo where false;
DELETE 0

Trigger *is* fired though, if inheritance is not used.

explain delete from only foo where false;
                   QUERY PLAN
-------------------------------------------------
 Delete on foo  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=6)
 One-Time Filter: false
(3 rows)

delete from only foo where false;
NOTICE:  trigger fired
DELETE 0

I'm not sure how to go about fixing this, if at all.  Or to fix it at
least for partitioned tables.  Would like to hear what others think.

Thanks,
Amit





Re: [HACKERS] Interval for launching the table sync worker

2017-04-28 Thread Kyotaro HORIGUCHI
At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada  
wrote in 

Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread Rajkumar Raghuwanshi
On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Updated patch attached.
>

Hi Amit,

I have applied the given patch, and could see the below behaviour with a
statement trigger.

When trying to delete a value within the partition range, triggers got fired
(with zero rows affected) as expected, but when trying to delete a value that
is outside the partition range (also with zero rows affected), no trigger
fired. Is this expected?

Below are steps to reproduce.

CREATE TABLE trigger_test_table (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE trigger_test_table1 PARTITION OF trigger_test_table FOR VALUES
FROM (0) to (6);
INSERT INTO trigger_test_table (a,b) SELECT i,i FROM generate_series(1,3)i;

CREATE TABLE trigger_test_statatics(TG_NAME varchar,TG_TABLE_NAME
varchar,TG_LEVEL varchar,TG_WHEN varchar, TG_OP varchar);

CREATE FUNCTION trigger_test_procedure() RETURNS TRIGGER AS $ttp$
BEGIN
INSERT INTO trigger_test_statatics SELECT
TG_NAME,TG_TABLE_NAME,TG_LEVEL,TG_WHEN,TG_OP;
RETURN OLD;
END;
$ttp$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_test11 AFTER DELETE ON trigger_test_table FOR EACH
STATEMENT EXECUTE PROCEDURE trigger_test_procedure();
CREATE TRIGGER trigger_test12 AFTER DELETE ON trigger_test_table1 FOR EACH
STATEMENT EXECUTE PROCEDURE trigger_test_procedure();

postgres=# DELETE FROM trigger_test_table WHERE a = 5;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
    tg_name     |   tg_table_name    | tg_level  | tg_when | tg_op
----------------+--------------------+-----------+---------+--------
 trigger_test11 | trigger_test_table | STATEMENT | AFTER   | DELETE
(1 row)

TRUNCATE TABLE trigger_test_statatics;

--statement trigger NOT fired, when trying to delete data outside partition
range.
postgres=# DELETE FROM trigger_test_table WHERE a = 10;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
 tg_name | tg_table_name | tg_level | tg_when | tg_op
---------+---------------+----------+---------+-------
(0 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] subscription worker doesn't start immediately on enabled

2017-04-28 Thread Kyotaro HORIGUCHI
At Fri, 28 Apr 2017 06:43:19 +0900, Fujii Masao  wrote 
in 

Re: [HACKERS] .pgpass's behavior has changed

2017-04-28 Thread Michael Paquier
On Fri, Apr 28, 2017 at 4:54 PM, Kyotaro HORIGUCHI
 wrote:
> I noticed that the precedence between host and hostaddr in a
> connection string is reversed in regard to .pgpass lookup in
> devel.
>
> For example the following connection string uses a .pgpass entry
> with "127.0.0.1", not "hoge".
>
> "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"
>
>
> This change was introduced by the commit
> 274bb2b3857cc987cfa21d14775cae9b0dababa5 and the current behavior
> contradicts the documentation.
>
> https://www.postgresql.org/docs/devel/static/libpq-connect.html
>
>> hostaddr
>> ...
>>   Note that authentication is likely to fail if host is not the
>>   name of the server at network address hostaddr. Also, note that
>>   host rather than hostaddr is used to identify the connection in
>>   a password file (see Section 33.15, “The Password File”).
>
> I think this should be fixed for the same reason with the
> following commit.

I am planning to look at your patch more closely, but thinking how
painful it is going to be to check your patch, I think that it would
be a good idea to have a TAP test for pgpass in
src/test/authentication/ to be sure that this never breaks again in
the future. We cannot test all combinations as servers only listen on
unix domain sockets, but surely we can get some coverage with libpq.
-- 
Michael




[HACKERS] .pgpass's behavior has changed

2017-04-28 Thread Kyotaro HORIGUCHI
Hello.

I noticed that the precedence between host and hostaddr in a
connection string is reversed in regard to .pgpass lookup in
devel.

For example the following connection string uses a .pgpass entry
with "127.0.0.1", not "hoge".

"host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"


This change was introduced by the commit
274bb2b3857cc987cfa21d14775cae9b0dababa5 and the current behavior
contradicts the documentation.

https://www.postgresql.org/docs/devel/static/libpq-connect.html

> hostaddr
> ...
>   Note that authentication is likely to fail if host is not the
>   name of the server at network address hostaddr. Also, note that
>   host rather than hostaddr is used to identify the connection in
>   a password file (see Section 33.15, “The Password File”).

I think this should be fixed for the same reason with the
following commit.

> commit 11003eb55658df0caf183eef69c7a97d56a4f2d7
> Author: Robert Haas 
> Date:   Thu Dec 1 14:36:39 2016 -0500
> 
> libpq: Fix inadvertent change in PQhost() behavior.

But the above also leaves a bug, so I sent another patch to fix
it. The attached patch restores 9.6's behavior of looking up the
.pgpass file, in the same manner as the other patch.
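For readers unfamiliar with the password file, the lookup under discussion can be modeled as below. This is a simplified, hypothetical sketch of the host:port:database:username:password matching (it ignores the backslash escaping and file-permission checks the real libpq parser performs); it shows why keying on the host name rather than hostaddr selects a different entry.

```python
def pgpass_lookup(lines, host, port, dbname, user):
    """Find a password in .pgpass-style lines, or return None.

    Simplified sketch: '*' matches any value in the first four fields;
    backslash escaping of ':' is not handled here.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        fields = line.split(":")
        if len(fields) != 5:
            continue
        f_host, f_port, f_db, f_user, password = fields
        if all(pat == "*" or pat == val
               for pat, val in ((f_host, host), (f_port, port),
                                (f_db, dbname), (f_user, user))):
            return password
    return None

entries = [
    "hoge:5432:postgres:alice:secret-by-name",
    "127.0.0.1:5432:postgres:alice:secret-by-addr",
]
# Per the documented rule, the host *name* is the lookup key, so a
# "host=hoge hostaddr=127.0.0.1 ..." connection should match the first entry:
print(pgpass_lookup(entries, "hoge", "5432", "postgres", "alice"))
# -> secret-by-name
```

The bug described above amounts to performing this lookup with "127.0.0.1" instead of "hoge", which silently selects the wrong entry.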

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 978,986  connectOptions2(PGconn *conn)
  
  		for (i = 0; i < conn->nconnhost; i++)
  		{
! 			/* Try to get a password for this host from pgpassfile */
  			conn->connhost[i].password =
! passwordFromFile(conn->connhost[i].host,
   conn->connhost[i].port,
   conn->dbName,
   conn->pguser,
--- 978,995 
  
  		for (i = 0; i < conn->nconnhost; i++)
  		{
! 			/*
! 			 * Try to get a password for this host from pgpassfile. We use host
! 			 * name rather than host address in the same manner to PQhost().
! 			 */
! 			char *pwhost = conn->connhost[i].host;
! 
! 			if (conn->connhost[i].type == CHT_HOST_ADDRESS &&
! conn->pghost != NULL && conn->pghost[0] != '\0')
! pwhost = conn->pghost;
! 
  			conn->connhost[i].password =
! passwordFromFile(pwhost,
   conn->connhost[i].port,
   conn->dbName,
   conn->pguser,



[HACKERS] PQhost may return socket dir for network connection

2017-04-28 Thread Kyotaro HORIGUCHI
Hello.

As the subject says, PQhost() seems to be forgetting about the case
where only hostaddr is specified in a connection string.

PQhost() returns /var/run/postgresql or such for a connection
made from the connection string "hostaddr=127.0.0.1 port=5432
dbname=postgres". I don't believe this behavior is useful.

The first attached file is a fix for master. connhost[]->pghost
is properly set by connectOptions2() so we don't need to handle
the default case in PQhost.

The second is a fix for 9.6. connectOptions2() was setting
pgunixsocket and then clearing pghost regardless of the value of
pghostaddr. This behavior inhibited PQhost() from getting the
right name to return.

The behavior for several connection strings after this fix is as
follows. The first line is the case fixed by this patch.

1. PQhost = 127.0.0.1 <= "hostaddr=127.0.0.1 port=5432 dbname=postgres"
2. PQhost = /tmp <= "port=5432 dbname=postgres"
3. PQhost = /tmp <= "host=/tmp port=5432 dbname=postgres"
4. PQhost = /tmp <= "host=/tmp hostaddr=127.0.0.1 port=5432 dbname=postgres"
5. PQhost = localhost <= "host=localhost port=5432 dbname=postgres"
6. PQhost = localhost <= "host=localhost hostaddr=127.0.0.1 port=5432 
dbname=postgres"
7. PQhost = hoge <= "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"

The third case is somewhat confusing but it would be the user's
fault.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 5836,5854  PQhost(const PGconn *conn)
  {
  	if (!conn)
  		return NULL;
! 	if (conn->connhost != NULL &&
! 		conn->connhost[conn->whichhost].type != CHT_HOST_ADDRESS)
! 		return conn->connhost[conn->whichhost].host;
! 	else if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	else
! 	{
! #ifdef HAVE_UNIX_SOCKETS
! 		return DEFAULT_PGSOCKET_DIR;
! #else
! 		return DefaultHost;
! #endif
! 	}
  }
  
  char *
--- 5836,5857 
  {
  	if (!conn)
  		return NULL;
! 
! 	/*
! 	 * We should try to avoid returning an IP address.  connhost[]->host
! 	 * stores the IP address in the CHT_HOST_ADDRESS case, so try to use
! 	 * conn->pghost instead.
! 	 */
! 	if (conn->connhost == NULL ||
! 		(conn->connhost[conn->whichhost].type == CHT_HOST_ADDRESS &&
! 		 conn->pghost != NULL && conn->pghost[0] != '\0'))
  		return conn->pghost;
! 
! 	/*
! 	 * Otherwise we use this as host name even though it might be an IP
! 	 * address.
! 	 */
! 	return conn->connhost[conn->whichhost].host;
  }
  
  char *
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 819,827  connectOptions2(PGconn *conn)
  	}
  
  	/*
! 	 * Allow unix socket specification in the host name
  	 */
! 	if (conn->pghost && is_absolute_path(conn->pghost))
  	{
  		if (conn->pgunixsocket)
  			free(conn->pgunixsocket);
--- 819,828 
  	}
  
  	/*
! 	 * Allow unix socket specification in the host name if hostaddr is not set
  	 */
! 	if ((conn->pghostaddr == NULL || conn->pghostaddr[0] == '\0') &&
! 		conn->pghost && is_absolute_path(conn->pghost))
  	{
  		if (conn->pgunixsocket)
  			free(conn->pgunixsocket);
***
*** 5354,5370  PQhost(const PGconn *conn)
  		return NULL;
  	if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	else
! 	{
  #ifdef HAVE_UNIX_SOCKETS
! 		if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
! 			return conn->pgunixsocket;
! 		else
! 			return DEFAULT_PGSOCKET_DIR;
  #else
! 		return DefaultHost;
  #endif
- 	}
  }
  
  char *
--- 5355,5369 
  		return NULL;
  	if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
! 		return conn->pghostaddr;
  #ifdef HAVE_UNIX_SOCKETS
! 	if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
! 		return conn->pgunixsocket;
! 	return DEFAULT_PGSOCKET_DIR;
  #else
! 	return DefaultHost;
  #endif
  }
  
  char *



Re: [HACKERS] Partition-wise aggregation/grouping

2017-04-28 Thread Antonin Houska
Jeevan Chalke  wrote:

> On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska  wrote:
>
> > Robert Haas  wrote:
> > > Well, I'm confused. I see that there's a relationship between what
> > > Antonin is trying to do and what Jeevan is trying to do, but I can't
> > > figure out whether one is a subset of the other, whether they're both
> > > orthogonal, or something else. This plan looks similar to what I
> > > would expect Jeevan's patch to produce,

> >  The point is that the patch Jeevan wanted to work on is actually a subset 
> > of
> >  [1] combined with [2].

> Seems like, as you are targeting every relation whether or not it is
> partitioned.

Yes.

> With my patch, I am getting following plan where we push entire
> aggregation below append.
> 
> QUERY PLAN 
> --
> Append
> -> HashAggregate
> Group Key: b_1.j
> -> Hash Join
> Hash Cond: (b_1.j = c_1.k)
> -> Seq Scan on b_1
> -> Hash
> -> Seq Scan on c_1
> -> HashAggregate
> Group Key: b_2.j
> -> Hash Join
> Hash Cond: (b_2.j = c_2.k)
> -> Seq Scan on b_2
> -> Hash
> -> Seq Scan on c_2
> (15 rows)

I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.

> Antonin, I have tried applying your patch on master but it doesn't get
> apply. Can you please provide the HEAD and any other changes required
> to be applied first?

I've lost that information. I'll post a new version to the [1] thread asap.

> How the plan look like when GROUP BY key does not match with the
> partitioning key i.e. GROUP BY b.v ?

EXPLAIN (COSTS false)
SELECT  b.v, avg(b.v + c.v)
FROMb
JOIN
c ON b.j = c.k
GROUP BYb.v;

   QUERY PLAN   

 Finalize HashAggregate
   Group Key: b_1.v
   ->  Append
 ->  Partial HashAggregate
   Group Key: b_1.v
   ->  Hash Join
 Hash Cond: (b_1.j = c_1.k)
 ->  Seq Scan on b_1
 ->  Hash
   ->  Seq Scan on c_1
 ->  Partial HashAggregate
   Group Key: b_2.v
   ->  Hash Join
 Hash Cond: (b_2.j = c_2.k)
 ->  Seq Scan on b_2
 ->  Hash
   ->  Seq Scan on c_2


> >  [1] https://www.postgresql.org/message-id/9666.1491295317%40localhost
> >
> >  [2] https://commitfest.postgresql.org/14/994/


-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




[HACKERS] Bug in prepared statement cache invalidation?

2017-04-28 Thread Konstantin Knizhnik

Hi hackers,

I have found that Postgres now correctly invalidates prepared plans that
directly depend on an altered relation, but doesn't invalidate plans
with transitive (indirect) dependencies.

Is it a bug or feature?

postgres=# create table foo(x integer);
CREATE TABLE
postgres=# select * from foo;
 x
---
(0 rows)

postgres=# create function returnqueryf()returns setof foo  as $$ begin 
return query select * from foo; end; $$ language plpgsql;

CREATE FUNCTION
postgres=# select * from returnqueryf();
 x
---
(0 rows)

postgres=# create function returnqueryff()returns setof foo  as $$ begin 
return query select * from returnqueryf(); end; $$ language plpgsql;

CREATE FUNCTION
postgres=# select * from returnqueryff();
 x
---
(0 rows)

postgres=# alter table foo add column y integer;
ALTER TABLE
postgres=# select * from foo;
 x | y
---+---
(0 rows)

postgres=# select * from returnqueryf();
 x | y
---+---
(0 rows)

postgres=# select * from returnqueryff();
ERROR:  structure of query does not match function result type
DETAIL:  Number of returned columns (1) does not match expected column 
count (2).

CONTEXT:  PL/pgSQL function returnqueryff() line 1 at RETURN QUERY

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Merge join for GiST

2017-04-28 Thread Andrew Borodin
Hi, hackers!

I've adapted crossmatch join from pgSphere to cube for performance tests.
I've placed spatial join code here
https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/spatialjoin.c
and node code here
https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/joinnode.c

If you have any ideas for improving the performance of this code,
please do not hesitate to share them.
One of the performance bottlenecks is the code near heap_getattr() in
fetch_next_pair().

==Performance Tests==
I've tested performance on queries which aggregate result of the
spatial join. See cube_test.sql attached.

On 3d data, Spatial Join performs 3x faster than Nested Loop Join + GiST Scan

Nested Loop + Index Scan plan:
 HashAggregate  (cost=36841568.00..36841570.00 rows=200 width=40)
(actual time=206565.869..206738.307 rows=298731 loops=1)
   Group Key: r.nx
   ->  Nested Loop  (cost=0.41..25591568.00 rows=9 width=40)
(actual time=0.357..200838.416 rows=8464147 loops=1)
 ->  Seq Scan on regions r  (cost=0.00..6410.00 rows=30
width=40) (actual time=0.015..324.436 rows=30 loops=1)
 ->  Index Scan using idx on datatable a  (cost=0.41..55.28
rows=3000 width=64) (actual time=0.174..0.648 rows=28 loops=30)
   Index Cond: (r.c @> c)
 Planning time: 17.175 ms
 Execution time: 206806.926 ms

Time: 206852,635 ms (03:26,853)

Spatial Join plan:
HashAggregate  (cost=56250001.00..56250003.00 rows=200 width=40)
(actual time=67373.644..67553.118 rows=298731 loops=1)
   Group Key: r.nx
   ->  Custom Scan (SpatialJoin)  (cost=0.00..1.00 rows=45
width=40) (actual time=0.151..61718.804 rows=8464147 loops=1)
 Outer index: idx
 Inner index: idx1
 Planning time: 0.182 ms
 Execution time: 67630.742 ms

Time: 67631,557 ms (01:07,632)

But on more realistic 7D data, with queries emulating an OLAP system,
Spatial Join performs 2 times worse than Nested Loop Join + GiST Scan,
which comes as a complete surprise to me.
I do not see any algorithmic reason for Spatial Join to be slower.
Thus I strongly suspect that my implementation is inefficient, but as
of now I have no ideas for how to improve it.

Here are plans for 7D
Nested Loop + Index Scan
 HashAggregate  (cost=3425143.00..3425743.00 rows=6 width=72)
(actual time=122794.715..122822.893 rows=6 loops=1)
   Group Key: r.nx
   ->  Nested Loop  (cost=0.41..2075143.00 rows=6000 width=72)
(actual time=0.311..100478.710 rows=39817008 loops=1)
 ->  Seq Scan on r1 r  (cost=0.00..2419.00 rows=6
width=128) (actual time=0.043..60.579 rows=6 loops=1)
 ->  Index Scan using ix_a1_cube on a1 a  (cost=0.41..24.55
rows=1000 width=128) (actual time=0.110..1.266 rows=664 loops=6)
   Index Cond: (c <@ r.c)
 Planning time: 0.349 ms
 Execution time: 122831.353 ms
(8 rows)

Spatial Join
 HashAggregate  (cost=6750001.00..6750601.00 rows=6 width=72)
(actual time=241832.855..241889.360 rows=6 loops=1)
   Group Key: r.nx
   ->  Custom Scan (SpatialJoin)  (cost=0.00..1.00 rows=3
width=72) (actual time=0.140..216187.111 rows=39817008 loops=1)
 Outer index: ix_r1_cube
 Inner index: ix_a1_cube
 Planning time: 0.533 ms
 Execution time: 241907.569 ms
(7 rows)

Time: 241910,440 ms (04:01,910)

Any ideas will be highly appreciated.

Best regards, Andrey Borodin.
create extension if not exists cube;

begin transaction;
SELECT setseed(.43);

\timing

create table dataTable as select cube(array[random(),random(),random()]) c, 
1::float as x1, 1::float as x2, 1::float as x3, 1::float as x4 from 
generate_series(1,3e6,1) s;
create index idx on dataTable using gist(c);

create table regions as select cube(array[x,y,z],array[x+0.1,y+0.01,z+0.01]) c, 
row_number() over() as nx from (select random() x,random() y, random() z from 
generate_series(1,3e5,1) s) q;
create index idx1 on regions using gist(c);

set max_parallel_workers_per_gather = 0;

explain analyze select 
sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4
from dataTable a
join regions r
on 
r.c @> a.c
group by r.nx;

explain analyze select 
sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4
from dataTable a
join regions r
on 
r.c && a.c
group by r.nx;

commit;



Re: [HACKERS] Declarative partitioning - another take

2017-04-28 Thread Amit Langote
Thanks for taking a look.

On 2017/04/28 5:24, Robert Haas wrote:
> On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote
>  wrote:
>>> Do we need to update the documentation?
>>
>> Yes, I think we should.  How about as in the attached?
> 
> Looks reasonable, but I was thinking you might also update the section
> which contrasts inheritance-based partitioning with declarative
> partitioning.

It seems to me that there is no difference in behavior between
inheritance-based and declarative partitioning as far as statement-level
triggers are concerned (at least currently).  In both the cases, we fire
statement-level triggers only for the table specified in the command.

Maybe we will fix things someday so that statement-level triggers are
fired for all the tables in an inheritance hierarchy when the root parent
is updated or deleted, but that's currently not true.  We may never
implement that behavior for declarative partitioned tables though, so
there will be a difference if and when we implement the former.

Am I missing something?

>> By the way, code changes I made in the attached are such that a subsequent
>> patch could implement firing statement-level triggers of all the tables in
>> a partition hierarchy, which it seems we don't want to do.  Should then
>> the code be changed to not create ResultRelInfos of all the tables but
>> only the root table (the one mentioned in the command)?  You will see that
>> the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
> 
> It seems better not to create any ResultRelInfos that we don't
> actually need, so +1 for such a revision to the patch.

OK, done.  It took a bit more work than I thought.

Updated patch attached.

Thanks,
Amit
>From a8b584f09bc02b6c16c33e816fc12c7e443dd65c Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

The current implementation completely misses them, because it
assumes that no ResultRelInfos need to be created for partitioned
tables, whereas they *are* needed to fire the per-statement triggers,
which we *do* allow to be defined on partitioned tables.

Reported By: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml   | 26 +++
 src/backend/executor/execMain.c | 55 --
 src/backend/executor/nodeModifyTable.c  | 64 ++
 src/backend/nodes/copyfuncs.c   |  2 +
 src/backend/nodes/outfuncs.c|  3 ++
 src/backend/nodes/readfuncs.c   |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c|  3 ++
 src/backend/optimizer/plan/setrefs.c| 19 ++--
 src/include/nodes/execnodes.h   | 12 +
 src/include/nodes/plannodes.h   | 13 +-
 src/include/nodes/relation.h|  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql   | 70 
 14 files changed, 323 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@

 A trigger is a specification that the database should automatically
 execute a particular function whenever a certain type of operation is
-performed.  Triggers can be attached to tables, views, and foreign tables.
+performed.  Triggers can be attached to tables (partitioned or not),
+views, and foreign tables.
   
 
   
@@ -111,14 +112,21 @@
 Statement-level BEFORE triggers naturally fire before the
 statement starts to do anything, while statement-level AFTER
 triggers fire at the very end of the statement.  These types of
-triggers may be defined on tables or views.  Row-level BEFORE
-triggers fire immediately before a particular row is operated on,
-while row-level AFTER triggers fire at the end of the
-statement (but before any statement-level AFTER triggers).
-These types of triggers may only be defined on tables and foreign tables.
-Row-level INSTEAD OF triggers may only be defined on views,
-and fire immediately as each row in the view is identified as needing to
-be operated on.
+triggers may be defined on tables, views, or foreign tables.  Row-level
+BEFORE triggers fire immediately before a particular row is
+operated on, while row-level AFTER triggers fire at the end of
+the statement (but before any statement-level AFTER triggers).
+These types of triggers may only be defined on non-partitioned tables and
+foreign tables.