Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 11:06 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Jul 26, 2017 at 11:00 AM, Rafia Sabih
>  wrote:
> >
> >
> > On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat
> >  wrote:
> >>
> >> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
> >>  wrote:
> >>
> >> > Query plans for the above-mentioned queries are attached.
> >> >
> >>
> >> Can you please share plans for all the queries, even if they haven't
> >> chosen partition-wise join when run on partitioned tables with
> >> enable_partition_wise_join ON? Also, please include the query in
> >> explain analyze output using -a or -e flags to psql. That way we will
> >> have query and its plan in the same file for ready reference.
> >>
> > I didn't run the query not using partition-wise join, for now.
>
> Parse error, sorry. Do you mean you haven't run the queries
> which do not use partition-wise join?
>
> Yes, that's what I mean.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Ashutosh Bapat
On Wed, Jul 26, 2017 at 11:00 AM, Rafia Sabih
 wrote:
>
>
> On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat
>  wrote:
>>
>> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
>>  wrote:
>>
>> > Query plans for the above-mentioned queries are attached.
>> >
>>
>> Can you please share plans for all the queries, even if they haven't
>> chosen partition-wise join when run on partitioned tables with
>> enable_partition_wise_join ON? Also, please include the query in
>> explain analyze output using -a or -e flags to psql. That way we will
>> have query and its plan in the same file for ready reference.
>>
> I didn't run the query not using partition-wise join, for now.

Parse error, sorry. Do you mean you haven't run the queries
which do not use partition-wise join?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Rafia Sabih
On Wed, Jul 26, 2017 at 10:58 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
>  wrote:
>
> > Query plans for the above-mentioned queries are attached.
> >
>
> Can you please share plans for all the queries, even if they haven't
> chosen partition-wise join when run on partitioned tables with
> enable_partition_wise_join ON? Also, please include the query in
> explain analyze output using -a or -e flags to psql. That way we will
> have query and its plan in the same file for ready reference.
>
> I didn't run the query not using partition-wise join, for now.


-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Ashutosh Bapat
On Tue, Jul 25, 2017 at 11:01 AM, Rafia Sabih
 wrote:

> Query plans for the above-mentioned queries are attached.
>

Can you please share plans for all the queries, even if they haven't
chosen partition-wise join when run on partitioned tables with
enable_partition_wise_join ON? Also, please include the query in
explain analyze output using -a or -e flags to psql. That way we will
have query and its plan in the same file for ready reference.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-07-25 Thread Pavel Stehule
Hi

I wrote a special pager for psql. Naturally, this pager is not suitable
for paging man pages, so it is not a good choice as the global pager. We
could introduce a new environment variable, PSQL_PAGER, for this purpose.
It would work similarly to the PSQL_EDITOR variable.

Notes, comments?

Regards

Pavel


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Ashutosh Bapat
On Tue, Jul 25, 2017 at 9:39 PM, Dilip Kumar  wrote:
> On Tue, Jul 25, 2017 at 8:59 PM, Robert Haas  wrote:
>> On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
>>  wrote:
>>> - other queries show a good 20-30% improvement in performance. Performance
>>> numbers are as follows,
>>>
>>> Query | un_part_head (seconds) | part_head (seconds) | part_patch (seconds)
>>>     3 |                     76 |                 127 |                   88
>>>     4 |                     17 |                 244 |                   41
>>>     5 |                     52 |                 123 |                   84
>>>     7 |                     73 |                 134 |                  103
>>>    10 |                     67 |                 111 |                   89
>>>    12 |                     53 |                 114 |                   99
>>>    18 |                    447 |                 709 |                  551
>>
>> Hmm.  This certainly shows the benefit of the patch, although it's
>> rather sad that we're still slower than if we hadn't partitioned the
>> data in the first place.  Why does partitioning hurt performance so
>> much?
>
> I was analysing some of the plans (without and with partitioning). It
> seems one of the reasons the partitioned table performs worse is that
> we cannot use an index on the partitioned table.
>
> Q4 is taking 17s without partition whereas it's taking 244s with partition.
>
> Now if we analyze the plan
>
> Without partition, it can use parameterize index scan on lineitem
> table which is really big in size. But with partition, it has to scan
> this table completely.
>
>   ->  Nested Loop Semi Join
>  ->  Parallel Bitmap Heap Scan on orders
>->  Bitmap Index Scan on
> idx_orders_orderdate  (cost=0.00..24378.88 r
>->  Index Scan using idx_lineitem_orderkey on
> lineitem  (cost=0.57..29.29 rows=105 width=8) (actual
> time=0.031..0.031 rows=1 loops=1122364)
>Index Cond: (l_orderkey =
> orders.o_orderkey)
>Filter: (l_commitdate < l_receiptdate)
>Rows Removed by Filter: 1
>

If the partitions have the same indexes as the unpartitioned table, the
planner manages to create parameterized plans for each partition and
thus a parameterized plan for the whole partitioned table. Do we have
the same indexes on the unpartitioned table and on each of the
partitions? The difference between the two cases is that the
parameterized path on an unpartitioned table scans only one index,
whereas that on the partitioned table scans the indexes on all the
partitions. My guess is the planner thinks those many scans are
costlier than a hash/merge join and chooses those strategies over a
parameterized nest loop join. In case of partition-wise join, only one
index on the inner partition is involved, and thus partition-wise join
picks a parameterized nest loop join. Notice that this index is much
smaller than the index on the partitioned table, so the index scan will
be a bit faster. But only a bit, since the depth of an index doesn't
increase linearly with its size.

Run-time partition pruning will improve performance even without
partition-wise join, since partition pruning will be able to eliminate
all but one partition, and only one index needs to be scanned. If the
planner is smart enough to cost that effectively, it will choose a
parameterized nest loop join for the partitioned table, thus improving
performance similar to the unpartitioned case.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Inadequate infrastructure for NextValueExpr

2017-07-25 Thread Thomas Munro
On Wed, Jul 26, 2017 at 4:04 PM, Thomas Munro
 wrote:
> On Wed, Jul 26, 2017 at 6:35 AM, Robert Haas  wrote:
>> That's pretty cool.  Per long-standing precedent, anything we use in a
>> build needs to be in Perl, not Python, but I think it would be great
>> to fix all of these (or the script) and then add this to our standard
>> build process.  It would be *great* to stop making mistakes like this.
>
> Ok, here is a Perl version.  It is literally my first Perl program so
I probably got all those squiggles wrong.  Output as of today:

Argh, let me try that again with a copy-and-paste error corrected:

$ ./src/tools/check_node_support_code.pl
T_ObjectWithArgs (category PARSE TREE NODES) not handled in outfuncs.c
T_AccessPriv (category PARSE TREE NODES) not handled in outfuncs.c
T_CreateOpClassItem (category PARSE TREE NODES) not handled in outfuncs.c
T_FunctionParameter (category PARSE TREE NODES) not handled in outfuncs.c
T_InferClause (category PARSE TREE NODES) not handled in outfuncs.c
T_OnConflictClause (category PARSE TREE NODES) not handled in outfuncs.c
T_RoleSpec (category PARSE TREE NODES) not handled in outfuncs.c
T_PartitionCmd (category PARSE TREE NODES) not handled in outfuncs.c
T_NamedTuplestoreScan is written by outfuncs.c as NAMEDTUPLESTORESCAN
but that name is not recognized by readfuncs.c
T_Alias (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_RangeVar (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_Expr (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_CaseWhen (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_TargetEntry (category PRIMITIVE NODES) not handled in
ruleutils.c:get_rule_expr
T_RangeTblRef (category PRIMITIVE NODES) not handled in
ruleutils.c:get_rule_expr
T_JoinExpr (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_FromExpr (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr
T_OnConflictExpr (category PRIMITIVE NODES) not handled in
ruleutils.c:get_rule_expr
T_IntoClause (category PRIMITIVE NODES) not handled in ruleutils.c:get_rule_expr

-- 
Thomas Munro
http://www.enterprisedb.com


check-node-support-code-v2.patch
Description: Binary data



Re: [HACKERS] Inadequate infrastructure for NextValueExpr

2017-07-25 Thread Thomas Munro
On Wed, Jul 26, 2017 at 6:35 AM, Robert Haas  wrote:
> That's pretty cool.  Per long-standing precedent, anything we use in a
> build needs to be in Perl, not Python, but I think it would be great
> to fix all of these (or the script) and then add this to our standard
> build process.  It would be *great* to stop making mistakes like this.

Ok, here is a Perl version.  It is literally my first Perl program so
I probably got all those squiggles wrong.  Output as of today:

$ ./src/tools/check_node_support_code.pl
T_ObjectWithArgs (category PARSE TREE NODES) not handled in outfuncs.c
T_AccessPriv (category PARSE TREE NODES) not handled in outfuncs.c
T_CreateOpClassItem (category PARSE TREE NODES) not handled in outfuncs.c
T_FunctionParameter (category PARSE TREE NODES) not handled in outfuncs.c
T_InferClause (category PARSE TREE NODES) not handled in outfuncs.c
T_OnConflictClause (category PARSE TREE NODES) not handled in outfuncs.c
T_RoleSpec (category PARSE TREE NODES) not handled in outfuncs.c
T_PartitionCmd (category PARSE TREE NODES) not handled in outfuncs.c
T_NamedTuplestoreScan is written by outfuncs.c as NAMEDTUPLESTORESCAN
but that name is not recognized by readfuncs.c
T_Alias (category PRIMITIVE NODES) not handled in equalfuncs.c
T_RangeVar (category PRIMITIVE NODES) not handled in equalfuncs.c
T_Expr (category PRIMITIVE NODES) not handled in equalfuncs.c
T_CaseWhen (category PRIMITIVE NODES) not handled in equalfuncs.c
T_TargetEntry (category PRIMITIVE NODES) not handled in equalfuncs.c
T_RangeTblRef (category PRIMITIVE NODES) not handled in equalfuncs.c
T_JoinExpr (category PRIMITIVE NODES) not handled in equalfuncs.c
T_FromExpr (category PRIMITIVE NODES) not handled in equalfuncs.c
T_OnConflictExpr (category PRIMITIVE NODES) not handled in equalfuncs.c
T_IntoClause (category PRIMITIVE NODES) not handled in equalfuncs.c

-- 
Thomas Munro
http://www.enterprisedb.com


check-node-support-code-v1.patch
Description: Binary data



Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Tom Lane
Robert Haas  writes:
> Based on discussion downthread, it seems like what we actually need to
> do is update perl.m4 to extract CCFLAGS.  Turns out somebody proposed
> a patch for that back in 2002:
> https://www.postgresql.org/message-id/Pine.LNX.4.44.0211051045070.16317-20%40wotan.suse.de
> It seems to need a rebase.  :-)

Ah-hah, I *thought* we had considered the question once upon a time.
There were some pretty substantial compatibility concerns raised in that
thread, which is doubtless why it's still like that.

My beef about inter-compiler compatibility (if building PG with a
different compiler from that used for Perl) could probably be addressed by
absorbing only -D switches from the Perl flags.  But Peter seemed to feel
that even that could break things, and I worry that he's right for cases
like -D_FILE_OFFSET_BITS which affect libc APIs.  Usually we'd have made
the same decisions as Perl for that sort of thing, but if we didn't, it's
a mess.

I wonder whether we could adopt some rule like "absorb -D switches
for macros whose names do not begin with an underscore".  That's
surely a hack and three-quarters, but it seems safer than just
absorbing everything willy-nilly.

regards, tom lane




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-25 Thread Peter Geoghegan
On Tue, Jul 25, 2017 at 8:02 PM, Peter Geoghegan  wrote:
> Setup:
>
> Initialize pgbench (any scale factor).
> create index on pgbench_accounts (aid);

That "create index" was meant to be on "abalance", to make the UPDATE
queries always HOT-unsafe. (You'll want to *also* create this index
once the PK is dropped, to show how killing items on recent Postgres
versions hinges upon this being a unique index).

-- 
Peter Geoghegan




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 10:23 AM, Robert Haas  wrote:
> While I'm not sure of the details, I suspect that we need to use one
> of those methods to get the CCFLAGS used to build perl, and include
> those when SPI.o, Util.o, and plperl.o in src/pl/plperl.  Or at least
> the -D switches from those CCFLAGS.  Here's about the simplest thing
> that seems like it might work on Linux; Windows would need something
> equivalent:
>
> override CPPFLAGS += $(shell $(PERL) -MConfig -e 'print
> $$Config::Config{"ccflags"};')
>
> On my MacBook Pro, with the built-in switches, that produces:
>
> -fno-common -DPERL_DARWIN -mmacosx-version-min=10.12 -pipe -Os
> -fno-strict-aliasing -fstack-protector-strong -I/opt/local/include
> -DPERL_USE_SAFE_PUTENV
>
> Or we could try to extract just the -D switches:
>
> override CPPFLAGS += $(shell $(PERL) -MConfig -e 'print join " ", grep
> { /^-D/ } split /\s+/, $$Config::Config{"ccflags"};')

Based on discussion downthread, it seems like what we actually need to
do is update perl.m4 to extract CCFLAGS.  Turns out somebody proposed
a patch for that back in 2002:

https://www.postgresql.org/message-id/Pine.LNX.4.44.0211051045070.16317-20%40wotan.suse.de

It seems to need a rebase.  :-)

And maybe some other changes, too.  I haven't carefully reviewed that thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-25 Thread Peter Geoghegan
On Tue, Jul 25, 2017 at 3:02 PM, Peter Geoghegan  wrote:
> I've been thinking about this a lot, because this really does look
> like a pathological case to me. I think that this workload is very
> sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
> least, I can imagine that mechanism doing a lot better than it
> actually manages to do here. I wonder if it's possible that commit
> 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
> pages, should have considered workloads like this.

While the benchmark Alik came up with is non-trivial to reproduce, I
can show a consistent regression for a simple case with only one
active backend. I'm not sure whether or not this is considered an
acceptable trade-off -- I didn't look through the archives from around
March of 2015 just yet.

Setup:

Initialize pgbench (any scale factor).
create index on pgbench_accounts (aid);

The point of this example is to show a simple query that is never
quite HOT-safe, where we need to create a new index entry in an index
for that reason alone. Theoretically, only one index needs to be
updated, not all indexes, because only one index covers attributes
that have truly changed. For example, if we had WARM, I think that
many of these theoretically unnecessary index tuple insertions would
not happen. When they do happen, because we don't have something like
WARM, it seems important that the kill_prior_tuples/LP_DEAD stuff does
its job. I don't think that it will consistently do that, though.

Steps:

(Set debugger breakpoint from within _bt_killitems())

Test queries (run these from a single interactive psql session):

update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;
update pgbench_accounts set abalance = abalance + 1 where aid = 763;

We now see that no update ever kills items within _bt_killitems(),
because our own update to the index leaf page itself nullifies our
ability to kill anything, by changing the page LSN from the one
stashed in the index scan state variable. Fortunately, we are not
really "self-blocking" dead item cleanup here, because the
_bt_check_unique() logic for killing all_dead index entries saves the
day by not caring about LSNs. However, that only happens because the
index on "aid" is a unique index.

If we drop the primary key, and create a regular index on "aid" in its
place, then the same UPDATE queries really do self-block from killing
index tuples due to changes in 2ed5b87f9, which could be pretty bad if
there wasn't some selects to do the kill_prior_tuple stuff instead. I
verified that this regression against 9.4 exists, just to be sure that
the problem wasn't somehow there all along -- the regression is real.
 :-(

-- 
Peter Geoghegan




Re: [HACKERS] Mishandling of WCO constraints in direct foreign table modification

2017-07-25 Thread Etsuro Fujita

On 2017/07/25 5:35, Robert Haas wrote:

On Fri, Jul 21, 2017 at 6:21 AM, Etsuro Fujita
 wrote:

I mean constraints derived from WITH CHECK OPTIONs specified for parent
views.  We use the words "WITH CHECK OPTION constraints" in comments in
nodeModifyTable.c, so the expression "CHECK OPTION constraints" doesn't
sound that bad to me.  (I used "CHECK OPTION", not "WITH CHECK OPTION",
because we use "CHECK OPTION" a lot more in the documentation than "WITH
CHECK OPTION".)


Yeah, it seems OK to me, too; if the consensus is otherwise, we also
have the option to change it later.

Agreed.


Committed and back-patched as you
had it, but I removed a spurious comma.


Thanks for that, Robert!  Thanks for reviewing, Horiguchi-san!

Best regards,
Etsuro Fujita





Re: [HACKERS] Change in "policy" on dump ordering?

2017-07-25 Thread Tom Lane
I wrote:
> The main problem with Kevin's fix, after thinking about it more, is that
> it shoves matview refresh commands into the same final processing phase
> where ACLs are done, which means that in a parallel restore they will not
> be done in parallel.  That seems like a pretty serious objection, although
> maybe not so serious that we'd be willing to accept a major rewrite in the
> back branches to avoid it.

> I'm wondering at this point about having restore create a fake DO_ACLS
> object (fake in the sense that it isn't in the dump file) that would
> participate normally in the dependency sort, and which we'd give a
> priority before matview refreshes but after everything else.  "Restore"
> of that object would perform the same operation we do now of running
> through the whole TOC and emitting grants/revokes.  So it couldn't be
> parallelized in itself (at least not without an additional batch of work)
> but it could be treated as an indivisible parallelized task, and then the
> matview refreshes could be parallelizable tasks after that.

> There's also Peter's proposal of splitting up GRANTs from REVOKEs and
> putting only the latter at the end.  I'm not quite convinced that that's
> a good idea but it certainly merits consideration.

After studying things for awhile, I've concluded that that last option
is probably not workable.  ACL items contain a blob of SQL that would be
tricky to pull apart, and is both version and options dependent, and
contains ordering dependencies that seem likely to defeat any desire
to put the REVOKEs last anyway.

Instead, I've prepared the attached draft patch, which addresses the
problem by teaching pg_backup_archiver.c to process TOC entries in
three separate passes, "main" then ACLs then matview refreshes.
It's survived light testing but could doubtless use further review.

Another way we could attack this is to adopt something similar to
the PRE_DATA_BOUNDARY/POST_DATA_BOUNDARY mechanism; that is, invent more
dummy section boundary objects, add dependencies sufficient to constrain
all TOC objects to be before or after the appropriate boundaries, and
then let the dependency sort go at it.  But I think that way is probably
more expensive than this one, and it doesn't have any real advantage if
there's not a potential for circular dependencies that need to be broken.
If somebody else wants to try drafting a patch like that, I won't stand
in the way, but I don't wanna do so.

Not clear where we want to go from here.  Should we try to get this
into next month's minor releases, or review it in September's commitfest
and back-patch after that?

regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 6123859..3687687 100644
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
*** typedef enum
*** 203,208 
--- 203,230 
  	OUTPUT_OTHERDATA			/* writing data as INSERT commands */
  } ArchiverOutput;
  
+ /*
+  * For historical reasons, ACL items are interspersed with everything else in
+  * a dump file's TOC; typically they're right after the object they're for.
+  * However, we need to restore data before ACLs, as otherwise a read-only
+  * table (ie one where the owner has revoked her own INSERT privilege) causes
+  * data restore failures.  On the other hand, matview REFRESH commands should
+  * come out after ACLs, as otherwise non-superuser-owned matviews might not
+  * be able to execute.  (If the permissions at the time of dumping would not
+  * allow a REFRESH, too bad; we won't fix that for you.)  These considerations
+  * force us to make three passes over the TOC, restoring the appropriate
+  * subset of items in each pass.  We assume that the dependency sort resulted
+  * in an appropriate ordering of items within each subset.
+  */
+ typedef enum
+ {
+ 	RESTORE_PASS_MAIN = 0,		/* Main pass (most TOC item types) */
+ 	RESTORE_PASS_ACL,			/* ACL item types */
+ 	RESTORE_PASS_REFRESH		/* Matview REFRESH items */
+ 
+ #define RESTORE_PASS_LAST RESTORE_PASS_REFRESH
+ } RestorePass;
+ 
  typedef enum
  {
  	REQ_SCHEMA = 0x01,			/* want schema */
*** struct _archiveHandle
*** 329,334 
--- 351,357 
  	int			noTocComments;
  	ArchiverStage stage;
  	ArchiverStage lastErrorStage;
+ 	RestorePass restorePass;	/* used only during parallel restore */
  	struct _tocEntry *currentTE;
  	struct _tocEntry *lastErrorTE;
  };
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f461692..4cfb71c 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*** static ArchiveHandle *_allocAH(const cha
*** 58,64 
  		 SetupWorkerPtrType setupWorkerPtr);
  static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
  	  ArchiveHandle *AH);
! static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData, bool acl_pass);
  

Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Tom Lane
Stephen Frost  writes:
> On Tue, Jul 25, 2017 at 20:29 Thom Brown  wrote:
>> I should point out that this commit was made during the 9.6 cycle, and
>> I get the same issue with 9.6.

> Interesting that Tom didn't. Still, that does make more sense to me.

Yeah, it makes more sense to me too, but nonetheless that's the result
I get.  I suspect there is an element of indeterminacy here.

regards, tom lane




Re: [HACKERS] [PATCH] Make ExplainBeginGroup()/ExplainEndGroup() public.

2017-07-25 Thread Kyotaro HORIGUCHI
At Fri, 21 Jul 2017 10:09:25 -0400, Hadi Moshayedi  wrote 
in 
> Hello,
> 
> The attached patch moves declarations of
> ExplainBeginGroup()/ExplainEndGroup() from explain.c to explain.h.
> 
> This can be useful for extensions that need explain groups in their
> custom-scan explain output.
> 
> For example, Citus uses groups in its custom explain outputs [1]. But it
> achieves it by having a copy of
> ExplainBeginGroup()/ExplainEndGroup() in its source code, which is not the
> best way.
> 
> Please review.
> 
> [1]
> https://github.com/citusdata/citus/blob/master/src/backend/distributed/planner/multi_explain.c

The patch is straightforward and looks fine to me.

I think these functions ought to be public, like the many
ExplainProperty*() functions. On the other hand, this patch could cause
symbol conflicts with some external modules, but I think such breakage
doesn't matter much.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-25 Thread Andres Freund
Hi,

On 2017-07-24 13:27:58 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >> * I think the comments need more work.  Am willing to make a pass over
> >> that if you want.
> 
> > That'd be good, but let's wait till we have something more final.
> 
> Agreed, I'll wait till you produce another version.

Attached. Did a bunch of cleanup myself already.


I've moved the CHECK_FOR_INTERRUPTS() calls to the call sites. That
unsurprisingly ends up being somewhat verbose, and there are a bunch of
minor judgement calls about where exactly to place them. While doing so
I've also added a few extra ones.  I did this in a separate patch to
make it easier to review.

I'm pretty jetlagged right now, so I want to do another pass to make
sure I didn't forget any CFI()s, but the general shape looks right.

Tried to address the rest of your feedback too.

> >> * Can we redefine the ExecCustomScan function pointer as type
> >> ExecProcNodeCB, eliminating the unsightly cast in nodeCustom.c?
> 
> > That'd change an "extension API", which is why I skipped it at this
> > point of the release cycle. It's not like we didn't have this type of
> > cast all over before. Ok, with changing it, but that's where I came
> > down.
> 
> Is this patch really not changing anything else that a custom-scan
> extension would touch?  If not, I'm okay with postponing this bit
> of cleanup to v11.

FWIW, I've reintroduced ExecCustomScan() which I'd previously removed,
because it now contains a CHECK_FOR_INTERRUPTS(). So this seems moot.


Greetings,

Andres Freund
>From c70603ac35665ba78f0a83d0abbd080b05a9442d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 25 Jul 2017 17:37:17 -0700
Subject: [PATCH 1/2] Move interrupt checking from ExecProcNode() to callers.

---
 contrib/postgres_fdw/postgres_fdw.c   |  2 ++
 src/backend/executor/execMain.c   |  2 ++
 src/backend/executor/execProcnode.c   |  2 --
 src/backend/executor/nodeAgg.c|  2 ++
 src/backend/executor/nodeAppend.c |  3 +++
 src/backend/executor/nodeCtescan.c|  2 ++
 src/backend/executor/nodeCustom.c |  3 +++
 src/backend/executor/nodeForeignscan.c|  3 +++
 src/backend/executor/nodeGather.c |  2 ++
 src/backend/executor/nodeGatherMerge.c|  2 ++
 src/backend/executor/nodeGroup.c  |  6 +
 src/backend/executor/nodeHash.c   |  4 +++
 src/backend/executor/nodeHashjoin.c   | 21 +--
 src/backend/executor/nodeLimit.c  |  4 +++
 src/backend/executor/nodeLockRows.c   |  2 ++
 src/backend/executor/nodeMaterial.c   |  2 ++
 src/backend/executor/nodeMergeAppend.c|  6 -
 src/backend/executor/nodeMergejoin.c  |  3 +++
 src/backend/executor/nodeModifyTable.c|  2 ++
 src/backend/executor/nodeNestloop.c   |  3 +++
 src/backend/executor/nodeProjectSet.c |  5 
 src/backend/executor/nodeRecursiveunion.c |  2 ++
 src/backend/executor/nodeResult.c |  3 +++
 src/backend/executor/nodeSetOp.c  |  9 +++
 src/backend/executor/nodeSort.c   |  4 +++
 src/backend/executor/nodeSubplan.c| 44 +++
 src/backend/executor/nodeSubqueryscan.c   |  3 +++
 src/backend/executor/nodeUnique.c |  3 +++
 src/backend/executor/nodeWindowAgg.c  |  8 +-
 29 files changed, 128 insertions(+), 29 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d77c2a70e4..0b2093f34b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2079,6 +2079,8 @@ postgresRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
 
 	Assert(outerPlan != NULL);
 
+	CHECK_FOR_INTERRUPTS();
+
 	/* Execute a local join execution plan */
 	result = ExecProcNode(outerPlan);
 	if (TupIsNull(result))
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 78cbcd1a32..a58a70e3f5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1542,6 +1542,8 @@ ExecPostprocessPlan(EState *estate)
 		{
 			TupleTableSlot *slot;
 
+			CHECK_FOR_INTERRUPTS();
+
 			/* Reset the per-output-tuple exprcontext each time */
 			ResetPerTupleExprContext(estate);
 
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 294ad2cff9..20fd9f822e 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -399,8 +399,6 @@ ExecProcNode(PlanState *node)
 {
 	TupleTableSlot *result;
 
-	CHECK_FOR_INTERRUPTS();
-
 	if (node->chgParam != NULL) /* something changed */
 		ExecReScan(node);		/* let ReScan handle this */
 
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index de9a18e71c..f9073e79aa 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -675,6 +675,8 @@ fetch_input_tuple(AggState *aggstate)
 {
 	TupleTableSlot *slot;
 
+	CHECK_FOR_INTERRUPTS();
+
 	if 

[HACKERS] Log LDAP "diagnostic messages"?

2017-07-25 Thread Thomas Munro
Hi hackers,

Some LDAP error codes are a bit vague.  For example:

   LDAP_CONNECT_ERROR  Indicates a connection problem.
   LDAP_PROTOCOL_ERROR A protocol violation was detected.

To learn more, you have to call
ldap_get_option(LDAP_OPT_DIAGNOSTIC_MESSAGE).  Should we do that?  For
example, instead of:

  LOG:  could not start LDAP TLS session: Protocol error

... you could see:

  LOG:  could not start LDAP TLS session: Protocol error
  DETAIL:  LDAP diagnostic message: unsupported extended operation

Well, that may not be the most illuminating example, but that's a
message sent back by the LDAP server that we're currently throwing
away, and can be used to distinguish between unsupported TLS versions,
missing StartTLS extension and various other cases.  Perhaps that
particular message would also be available via your LDAP server's
logs, if you can access them, but in some cases we're throwing away
client-side messages that are not available anywhere else like "TLS:
unable to get CN from peer certificate", "TLS: hostname does not match
CN in peer certificate" and more.
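As a rough illustration of the proposed behaviour, here is a toy Python
mock-up of the message formatting (the actual patch calls
ldap_get_option(LDAP_OPT_DIAGNOSTIC_MESSAGE) in C; FakeLDAPConn and the
helper below are hypothetical stand-ins, not libldap API):

```python
class FakeLDAPConn:
    """Stand-in for an LDAP handle; diagnostic_message mimics what
    ldap_get_option(LDAP_OPT_DIAGNOSTIC_MESSAGE) would return."""
    def __init__(self, err, diag):
        self.error = err
        self.diagnostic_message = diag

def format_ldap_error(conn, action):
    # Emit the primary error as before, and append the server/client
    # diagnostic message as a DETAIL line when one is available.
    msg = f"LOG:  {action}: {conn.error}"
    if conn.diagnostic_message:
        msg += (f"\nDETAIL:  LDAP diagnostic message: "
                f"{conn.diagnostic_message}")
    return msg

conn = FakeLDAPConn("Protocol error", "unsupported extended operation")
print(format_ldap_error(conn, "could not start LDAP TLS session"))
```

When the diagnostic message is empty, the output is unchanged from today's
single LOG line.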

Something like the attached.

-- 
Thomas Munro
http://www.enterprisedb.com


ldap-diagnostic-message.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-25 Thread Craig Ringer
(Re-added hackers to Cc as this doesn't seem private, just accidentally
didn't reply-all?)

On 24 July 2017 at 23:50, Mat Arye  wrote:

>
>
>
>> Issue 1: Preloading the right shared library.
>> When preloading libraries (either via local_preload_libraries, or
>> session_preload_libraries, shared_preload_libraries), it would be nice to
>> preload the shared library according to its version. But, I looked through
>> the code and found no logic for adding version numbers to shared library
>> names.
>>
>>
>> You can't do that for shared_preload_libraries, because at
>> shared_preload_libraries time we don't have access to the DB and can't look
>> up the installed extension version(s). There might be different ones in
>> different DBs too.
>>
>>
> Yeah shared_preload_libraries is a special case I guess. Something like
> that could work with  local_preload_libraries or session_preload_libraries
> right?
>

It could work, but since it doesn't offer a complete solution I don't think
it's especially compelling.


>
>> Solution 1: Set session_preload_libraries on the database via ALTER
>> DATABASE SET. This can be done in the sql and the sql version-migration
>> scripts can change this value as you change extensions versions. I think
>> this would work, but it seems very hack-ish and less-than-ideal.
>>
>>
>> Won't work for some hooks, right?
>>
>> I've faced this issue with pglogical and BDR. If the user tries to update
>> the extension before a new enough .so is loaded we ERROR due to failure to
>> load missing C functions.
>>
>
> This is a good point. Thanks for bringing it to my attention. I guess if
> the CREATE FUNCTION call contained the name of the new .so then it would
> force a load, right? But that means you need to be safe with regard to
> having both .so file loaded at once (not sure that's possible). I think
> this is the biggest unknown in terms of whether a stub-loader /can/ work.
>

Unless both .so's have different filenames, you can't have both loaded in
the same backend. Though if you unlink and replace the .so with the same
file name while Pg is running, different backends could have different
versions loaded.

If you do give them different names and they both get linked into one
backend, whether it works will depend on details of linker options, etc. I
wouldn't want to do it personally, at least not unless I prefixed all the
.so's exported symbols. If you're not worried about being portable it's
less of a concern.

Personally I just make sure to retain stub functions in the C extension for
anything removed. It's trivial clutter, easily swept into a corner in a
backward compat file.



>
>
>> If the .so is updated first the old extension function definitions can
>> fail at runtime if funcs are removed or change signature, but won't fail at
>> startup or load.
>>
>> So we let the C extension detect when it's newer than the loaded SQL ext
>> during its startup and run an ALTER EXTENSION to update it.
>>
>
> Yeah that's very similar to what we do now. It doesn't work for multiple
> dbs having different extension versions, though (at least for us).
>

Makes sense. Not a case I have ever cared to support.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-07-25 Thread Craig Ringer
On 26 July 2017 at 07:12, Alvaro Herrera  wrote:

> Chapman Flack wrote:
>
> > Any takers if I propose this amendment in the form of a patch?
> >
> > Relying on the perl idiom instead of a $node->isa() test shortens
> > the patch; does that ameliorate at all the concern about complicating
> > core for the benefit of modules?
>
> Yeah, I like this (I also got word from Abhijit Menon-Sen that this is a
> saner way to do it, which counts as +1000 at least.)  This is how we
> should have built the method in the first place, rather than creating an
> exported function.  Not sure how we missed that.
>
> I changed the POD docs so that the class method version is the preferred
> form, and the exported function usage "is just backwards compatibility".
> All current usage uses that form, but hey, we can update these later
> (if at all).
>
> Pushed to 9.6 and HEAD.
>

Thanks.

An upvote from our resident Perl wizard certainly does help :)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2017-07-25 Thread Craig Ringer
On 26 July 2017 at 00:16, Thom Brown  wrote:

> On 8 April 2016 at 07:13, Craig Ringer  wrote:
> > On 6 April 2016 at 22:17, Andres Freund  wrote:
> >
> >>
> >> Quickly skimming 0001 in [4] there appear to be a number of issues:
> >> * LWLockHeldByMe() is only for debugging, not functional differences
> >> * ReplicationSlotPersistentData is now in an xlog related header
> >> * The code and behaviour around name conflicts of slots seems pretty
> >>   raw, and not discussed
> >> * Taking spinlocks dependant on InRecovery() seems like a seriously bad
> >>   idea
> >> * I doubt that the archive based switches around StartupReplicationSlots
> >>   do what they intend. Afaics that'll not work correctly for basebackups
> >>   taken with -X, without recovery.conf
> >>
> >
> > Thanks for looking at it. Most of those are my errors. I think this is
> > pretty dead at least for 9.6, so I'm mostly following up in the hopes of
> > learning about a couple of those mistakes.
> >
> > Good catch with -X without a recovery.conf. Since it wouldn't be
> recognised
> > as a promotion and wouldn't increment the timeline, copied non-failover
> > slots wouldn't get removed. I've never liked that logic at all anyway, I
> > just couldn't think of anything better...
> >
> > LWLockHeldByMe() has a comment to the effect of: "This is meant as debug
> > support only." So that's just a dumb mistake on my part, and I should've
> > added "alreadyLocked" parameters. (Ugly, but works).
> >
> > But why would it be a bad idea to conditionally take a code path that
> > acquires a spinlock based on whether RecoveryInProgress()? It's not
> testing
> > RecoveryInProgress() more than once and doing the acquire and release
> based
> > on separate tests, which would be a problem. I don't really get the
> problem
> > with:
> >
> > if (!RecoveryInProgress())
> > {
> > /* first check whether there's something to write out */
> > SpinLockAcquire(&slot->mutex);
> > was_dirty = slot->dirty;
> > slot->just_dirtied = false;
> > SpinLockRelease(&slot->mutex);
> >
> > /* and don't do anything if there's nothing to write */
> > if (!was_dirty)
> > return;
> > }
> >
> > ... though I think what I really should've done there is just always
> dirty
> > the slot in the redo functions.
>
> Are there any plans to submit a new design/version for v11?


No. The whole approach seems to have been bounced from core. I don't agree
and continue to think this functionality is desirable but I don't get to
make that call.

If time permits I will attempt to update the logical decoding on standby
patchset instead, and if possible add support for fast-forward logical
decoding that does the minimum required to correctly maintain a slot's
catalog_xmin and restart_lsn when advanced. But this won't be usable
directly for failover like failover slots are, it'll require each
application to keep track of standbys and maintain slots on them too. Or
we'll need some kind of extension/helper to sync slot state.

In the mean time, 2ndQuadrant maintains an on-disk-compatible version of
failover slots that's available for support customers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Stephen Frost
Thom,

On Tue, Jul 25, 2017 at 20:29 Thom Brown  wrote:

> On 26 July 2017 at 00:52, Stephen Frost  wrote:
> > Thom,
> >
> > * Thom Brown (t...@linux.com) wrote:
> >> This is the culprit:
> >>
> >> commit 23f34fa4ba358671adab16773e79c17c92cbc870
> >> Author: Stephen Frost 
> >> Date:   Wed Apr 6 21:45:32 2016 -0400
> >
> > Thanks!  I'll take a look tomorrow.
>
> I should point out that this commit was made during the 9.6 cycle, and
> I get the same issue with 9.6.


Interesting that Tom didn't. Still, that does make more sense to me.

Thanks!

Stephen

>
>


Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Thom Brown
On 26 July 2017 at 00:52, Stephen Frost  wrote:
> Thom,
>
> * Thom Brown (t...@linux.com) wrote:
>> This is the culprit:
>>
>> commit 23f34fa4ba358671adab16773e79c17c92cbc870
>> Author: Stephen Frost 
>> Date:   Wed Apr 6 21:45:32 2016 -0400
>
> Thanks!  I'll take a look tomorrow.

I should point out that this commit was made during the 9.6 cycle, and
I get the same issue with 9.6.

Thom




Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Stephen Frost
Thom,

* Thom Brown (t...@linux.com) wrote:
> This is the culprit:
> 
> commit 23f34fa4ba358671adab16773e79c17c92cbc870
> Author: Stephen Frost 
> Date:   Wed Apr 6 21:45:32 2016 -0400

Thanks!  I'll take a look tomorrow.

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] TAP: allow overriding PostgresNode in get_new_node

2017-07-25 Thread Alvaro Herrera
Chapman Flack wrote:

> Any takers if I propose this amendment in the form of a patch?
> 
> Relying on the perl idiom instead of a $node->isa() test shortens
> the patch; does that ameliorate at all the concern about complicating
> core for the benefit of modules?

Yeah, I like this (I also got word from Abhijit Menon-Sen that this is a
saner way to do it, which counts as +1000 at least.)  This is how we
should have built the method in the first place, rather than creating an
exported function.  Not sure how we missed that.  

I changed the POD docs so that the class method version is the preferred
form, and the exported function usage "is just backwards compatibility".
All current usage uses that form, but hey, we can update these later
(if at all).

Pushed to 9.6 and HEAD.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Thom Brown
On 25 July 2017 at 21:47, Stephen Frost  wrote:
> Tom,
>
> On Tue, Jul 25, 2017 at 16:43 Tom Lane  wrote:
>>
>> AFAICT, pg_dump has no notion that it needs to be careful about the order
>> in which permissions are granted.  I did
>>
>> regression=# create user joe;
>> CREATE ROLE
>> regression=# create user bob;
>> CREATE ROLE
>> regression=# create user alice;
>> CREATE ROLE
>> regression=# \c - joe
>> You are now connected to database "regression" as user "joe".
>> regression=> create table joestable(f1 int);
>> CREATE TABLE
>> regression=> grant select on joestable to alice with grant option;
>> GRANT
>> regression=> \c - alice
>> You are now connected to database "regression" as user "alice".
>> regression=> grant select on joestable to bob;
>> GRANT
>>
>> and then pg_dump'd that.  The ACL entry for joestable looks like this:
>>
>> --
>> -- TOC entry 5642 (class 0 OID 0)
>> -- Dependencies: 606
>> -- Name: joestable; Type: ACL; Schema: public; Owner: joe
>> --
>>
>> SET SESSION AUTHORIZATION alice;
>> GRANT SELECT ON TABLE joestable TO bob;
>> RESET SESSION AUTHORIZATION;
>> GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
>>
>> Unsurprisingly, that fails to restore:
>>
>> db2=# SET SESSION AUTHORIZATION alice;
>> SET
>> db2=> GRANT SELECT ON TABLE joestable TO bob;
>> ERROR:  permission denied for relation joestable
>>
>>
>> I am not certain whether this is a newly introduced bug or not.
>> However, I tried it in 9.2-9.6, and all of them produce the
>> GRANTs in the right order:
>>
>> GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
>> SET SESSION AUTHORIZATION alice;
>> GRANT SELECT ON TABLE joestable TO bob;
>> RESET SESSION AUTHORIZATION;
>>
>> That might be just chance, but my current bet is that Stephen
>> broke it sometime in the v10 cycle --- especially since we
>> haven't heard any complaints like this from the field.
>
>
> Hmmm. I'll certainly take a look when I get back to a laptop, but I can't
> recall offhand any specific code for handling that, nor what change I might
> have made in the v10 cycle which would have broken it (if anything, I would
> have expected an issue from the rework in 9.6...).
>
> I should be able to look at this tomorrow though.

This is the culprit:

commit 23f34fa4ba358671adab16773e79c17c92cbc870
Author: Stephen Frost 
Date:   Wed Apr 6 21:45:32 2016 -0400

In pg_dump, include pg_catalog and extension ACLs, if changed

Now that all of the infrastructure exists, add in the ability to
dump out the ACLs of the objects inside of pg_catalog or the ACLs
for objects which are members of extensions, but only if they have
been changed from their original values.

The original values are tracked in pg_init_privs.  When pg_dump'ing
9.6-and-above databases, we will dump out the ACLs for all objects
in pg_catalog and the ACLs for all extension members, where the ACL
has been changed from the original value which was set during either
initdb or CREATE EXTENSION.

This should not change dumps against pre-9.6 databases.

Reviews by Alexander Korotkov, Jose Luis Tallon

Thom




LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-25 Thread Peter Geoghegan
On Fri, Jul 14, 2017 at 5:06 PM, Peter Geoghegan  wrote:
> I think that what this probably comes down to, more than anything
> else, is that you have leftmost hot/bloated leaf pages like this:
>
>
>   idx  | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size | highkey
> -------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+---------
>  ...
>  pgbench_accounts_pkey |     0 |      1 |     1 |         0 |      2751 |         65 | l    |        100 |         41 |            16 |      8192 |      5328 | 11 00 00 00 00 00 00 00
>  pgbench_accounts_pkey |     0 |      2 |  2751 |         1 |      2746 |         65 | l    |         48 |         90 |            16 |      8192 |      5388 | 32 00 00 00 00 00 00 00
> ...
>
> The high key for the leftmost shows that only values below 0x11 belong
> on the first page. This is about 16 or 17 possible distinct values,
> and yet the page has 100 live items, and 41 dead items; in total,
> there is room for 367 items. It's only slightly better with other
> nearby pages. This is especially bad because once the keyspace gets
> split up this finely, it's *impossible* to reverse it -- it's more or
> less a permanent problem, at least until a REINDEX.

I've been thinking about this a lot, because this really does look
like a pathological case to me. I think that this workload is very
sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at
least, I can imagine that mechanism doing a lot better than it
actually manages to do here. I wonder if it's possible that commit
2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf
pages, should have considered workloads like this.

The whole point of that commit was to reduce blocking by VACUUM on
index scans, and I think that it was effective in doing that in many
important cases. My concern is that the commit added this, which could
make LP_DEAD hinting occur less frequently:

+ * If the pin was released after reading the page, then we re-read it.  If it
+ * has been modified since we read it (as determined by the LSN), we dare not
+ * flag any entries because it is possible that the old entry was vacuumed
+ * away and the TID was re-used by a completely different heap tuple.
  */
 void
-_bt_killitems(IndexScanDesc scan, bool haveLock)
+_bt_killitems(IndexScanDesc scan)

Even if I'm basically right about this making LP_DEAD hinting occur
less frequently in cases like the one from Alik, it might have
actually been worth it even from the point of view of index bloat,
because VACUUM operations complete more frequently, and so there is an
indirect boost. It's not as if we've been that great at assessing
problems like this before now, and so it bears considering if we could
do better here without introducing much additional complexity. I just
don't know.

I would like to hear opinions on:

How much of a difference might this have actually made?

If it's a significant factor for any important workload, what can be
done about it? Does anyone have any ideas about a less restrictive
strategy for avoiding spuriously killing an index tuple whose heap TID
was just recycled?

ISTM that by doing this LSN check, _bt_killitems() is least likely to
work when and where it is most needed.

The most obvious way to correctly proceed even when LSN has changed
would be to revisit the heap page when the leaf page LSN check doesn't
work out, and revalidate the safety of proceeding with killing tuples.
At least with unique indexes. It's probably extremely unlikely that
the problem that we must avoid here will actually be observed in the
real world, so it seems likely that an approach like this will be
worth it.
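The guard in question can be modeled abstractly: every hint gathered during
a scan is discarded whenever the page LSN moved after the pin was released,
so a constantly-modified leaf page may never receive its LP_DEAD hints. A
toy Python model (not the backend code) of that behaviour:

```python
class LeafPage:
    """Toy model of a btree leaf page."""
    def __init__(self):
        self.lsn = 0            # bumped on every modification
        self.items = {}         # tid -> {"dead": bool}

    def modify(self):
        self.lsn += 1

def kill_items(page, lsn_at_read, tids_to_kill):
    # Mirrors the guard added to _bt_killitems(): once the pin has been
    # released, hints may only be applied if the page LSN is unchanged,
    # because a vacuumed-away TID could have been reused by an
    # unrelated heap tuple.
    if page.lsn != lsn_at_read:
        return 0                # dare not flag anything
    killed = 0
    for tid in tids_to_kill:
        if tid in page.items:
            page.items[tid]["dead"] = True
            killed += 1
    return killed

# A single concurrent modification between reading the page and
# applying the hints discards every hint for that page.
page = LeafPage()
page.items = {1: {"dead": False}, 2: {"dead": False}}
lsn_seen = page.lsn
page.modify()
assert kill_items(page, lsn_seen, [1, 2]) == 0
```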

-- 
Peter Geoghegan




Re: [HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

2017-07-25 Thread Alvaro Herrera
Kunshchikov Vladimir wrote:

> Errors should be like this:
> pg_restore: [compress_io] could not read from input file: d3/2811.dat.gz: 
> invalid distance too far back
> 
> Attached small fix for this issue.

After looking at this patch, I don't like it very much.  In particular,
I don't like the way you've handled the WRITE_ERROR_EXIT macro in
pg_backup_directory.c by undef'ing the existing one and creating it
anew.  The complete and correct definition should reside in one place
(pg_backup_archiver.h), instead of being split in two and being defined
differently.

Another point is that you broke the comment on the definition of struct
cfp "this is opaque to callers" by moving it to the header file
precisely with the point of making it transparent to callers.  We need
some better idea there.  I think it can be done by making compress_io.c
responsible of handing over the error message through some new function
(something very simple like "if compressedfp then return get_gz_error
else strerror" should suffice).

Also, I needed to rebase your patch over a recent pgindent run, though
that's a pretty small change.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] UPDATE of partition key

2017-07-25 Thread Robert Haas
On Thu, Jul 13, 2017 at 1:09 PM, Amit Khandekar  wrote:
> Attached is a WIP patch (make_resultrels_ordered.patch) that generates
> the result rels in canonical order. This patch is kept separate from
> the update-partition-key patch, and can be applied on master branch.

Hmm, I like the approach you've taken here in general, but I think it
needs cleanup.

+typedef struct ParentChild

This is a pretty generic name.  Pick something more specific and informative.

+static List *append_rel_partition_oids(List *rel_list, Relation rel);

One could be forgiven for thinking that this function was just going
to append OIDs, but it actually appends ParentChild structures, so I
think the name needs work.

+List *append_rel_partition_oids(List *rel_list, Relation rel)

Style.  Please pgindent your patches.

+#ifdef DEBUG_PRINT_OIDS
+print_oids(*leaf_part_oids);
+#endif

I'd just rip out this debug stuff once you've got this working, but if
we keep it, it certainly can't have a name as generic as print_oids()
when it's actually doing something with a list of ParentChild
structures.  Also, it prints names, not OIDs.  And DEBUG_PRINT_OIDS is
no good for the same reasons.

+if (RelationGetPartitionDesc(rel))
+walker->rels_list = append_rel_partition_oids(walker->rels_list, rel);

Every place that calls append_rel_partition_oids guards that call with
if (RelationGetPartitionDesc(...)).  It seems to me that it would be
simpler to remove those tests and instead just replace the
Assert(partdesc) inside that function with if (!partdesc) return;

Is there any real benefit in this "walker" interface?  It looks to me
like it might be simpler to just change things around so that it
returns a list of OIDs, like find_all_inheritors, but generated
differently.  Then if you want bound-ordering rather than
OID-ordering, you just do this:

list_free(inhOids);
inhOids = get_partition_oids_in_bound_order(rel);

That'd remove the need for some if/then logic as you've currently got
in get_next_child().

+is_partitioned_resultrel =
+(oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE
+ && rti == parse->resultRelation);

I suspect this isn't correct for a table that contains wCTEs, because
there would in that case be multiple result relations.

I think we should always expand in bound order rather than only when
it's a result relation. I think for partition-wise join, we're going
to want to do it this way for all relations in the query, or at least
for all relations in the query that might possibly be able to
participate in a partition-wise join.  If there are multiple cases
that are going to need this ordering, it's hard for me to accept the
idea that it's worth the complexity of trying to keep track of when we
expanded things in one order vs. another.  There are other
applications of having things in bound order too, like MergeAppend ->
Append strength-reduction (which might not be legal anyway if there
are list partitions with multiple, non-contiguous list bounds or if
any NULL partition doesn't end up in the right place in the order, but
there will be lots of cases where it can work).

On another note, did you do anything about the suggestion Thomas made
in 
http://postgr.es/m/CAEepm=3sc_j1zwqdyrbu4dtfx5rhcamnnuaxrkwzfgt9m23...@mail.gmail.com
?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-25 Thread Tom Lane
Victor Wagner  writes:
> PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
> (because it searches for libicu using pkgconfig, and pgkconfig support
> significantly changed in ICU  version 4.6)

> However, there are some reasons to build PostgreSQL with older versions
> of ICU. 

No doubt, but making that happen seems like a feature, and IMO we're too
late in the v10 beta test cycle to be taking new features on board,
especially ones with inherently-large portability risks.  We could maybe
consider patches for this for v11 ... but will anyone still care by then?

(Concretely, my concern is that the alpha and beta testing that's happened
so far hasn't covered pre-4.6 ICU at all.  I'd be surprised if the
incompatibility you found so far is the only one.)

regards, tom lane




Re: [HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Stephen Frost
Tom,

On Tue, Jul 25, 2017 at 16:43 Tom Lane  wrote:

> AFAICT, pg_dump has no notion that it needs to be careful about the order
> in which permissions are granted.  I did
>
> regression=# create user joe;
> CREATE ROLE
> regression=# create user bob;
> CREATE ROLE
> regression=# create user alice;
> CREATE ROLE
> regression=# \c - joe
> You are now connected to database "regression" as user "joe".
> regression=> create table joestable(f1 int);
> CREATE TABLE
> regression=> grant select on joestable to alice with grant option;
> GRANT
> regression=> \c - alice
> You are now connected to database "regression" as user "alice".
> regression=> grant select on joestable to bob;
> GRANT
>
> and then pg_dump'd that.  The ACL entry for joestable looks like this:
>
> --
> -- TOC entry 5642 (class 0 OID 0)
> -- Dependencies: 606
> -- Name: joestable; Type: ACL; Schema: public; Owner: joe
> --
>
> SET SESSION AUTHORIZATION alice;
> GRANT SELECT ON TABLE joestable TO bob;
> RESET SESSION AUTHORIZATION;
> GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
>
> Unsurprisingly, that fails to restore:
>
> db2=# SET SESSION AUTHORIZATION alice;
> SET
> db2=> GRANT SELECT ON TABLE joestable TO bob;
> ERROR:  permission denied for relation joestable
>
>
> I am not certain whether this is a newly introduced bug or not.
> However, I tried it in 9.2-9.6, and all of them produce the
> GRANTs in the right order:
>
> GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
> SET SESSION AUTHORIZATION alice;
> GRANT SELECT ON TABLE joestable TO bob;
> RESET SESSION AUTHORIZATION;
>
> That might be just chance, but my current bet is that Stephen
> broke it sometime in the v10 cycle --- especially since we
> haven't heard any complaints like this from the field.


Hmmm. I'll certainly take a look when I get back to a laptop, but I can't
recall offhand any specific code for handling that, nor what change I might
have made in the v10 cycle which would have broken it (if anything, I would
have expected an issue from the rework in 9.6...).

I should be able to look at this tomorrow though.

Thanks!

Stephen


[HACKERS] pg_dump does not handle indirectly-granted permissions properly

2017-07-25 Thread Tom Lane
AFAICT, pg_dump has no notion that it needs to be careful about the order
in which permissions are granted.  I did

regression=# create user joe;
CREATE ROLE
regression=# create user bob;
CREATE ROLE
regression=# create user alice;
CREATE ROLE
regression=# \c - joe
You are now connected to database "regression" as user "joe".
regression=> create table joestable(f1 int);
CREATE TABLE
regression=> grant select on joestable to alice with grant option;
GRANT
regression=> \c - alice
You are now connected to database "regression" as user "alice".
regression=> grant select on joestable to bob;
GRANT

and then pg_dump'd that.  The ACL entry for joestable looks like this:

--
-- TOC entry 5642 (class 0 OID 0)
-- Dependencies: 606
-- Name: joestable; Type: ACL; Schema: public; Owner: joe
--

SET SESSION AUTHORIZATION alice;
GRANT SELECT ON TABLE joestable TO bob;
RESET SESSION AUTHORIZATION;
GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;

Unsurprisingly, that fails to restore:

db2=# SET SESSION AUTHORIZATION alice;
SET
db2=> GRANT SELECT ON TABLE joestable TO bob;
ERROR:  permission denied for relation joestable


I am not certain whether this is a newly introduced bug or not.
However, I tried it in 9.2-9.6, and all of them produce the
GRANTs in the right order:

GRANT SELECT ON TABLE joestable TO alice WITH GRANT OPTION;
SET SESSION AUTHORIZATION alice;
GRANT SELECT ON TABLE joestable TO bob;
RESET SESSION AUTHORIZATION;

That might be just chance, but my current bet is that Stephen
broke it sometime in the v10 cycle --- especially since we
haven't heard any complaints like this from the field.
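The task here is essentially a topological ordering problem: a GRANT issued
under SET SESSION AUTHORIZATION can only succeed once that grantor has
itself received the privilege WITH GRANT OPTION. A toy Python sketch of the
required ordering (hypothetical; pg_dump works with ACL items, not dicts):

```python
def order_grants(owner, grants):
    """Order (grantor, grantee, grant_option) tuples so that every
    grantor already holds the privilege before granting it onward.
    `grants` maps grantee -> (grantor, has_grant_option)."""
    ordered = []
    can_grant = {owner}          # the object's owner can always grant
    pending = dict(grants)
    while pending:
        progressed = False
        for grantee, (grantor, grant_opt) in list(pending.items()):
            if grantor in can_grant:
                ordered.append((grantor, grantee, grant_opt))
                if grant_opt:
                    can_grant.add(grantee)
                del pending[grantee]
                progressed = True
        if not progressed:
            raise ValueError("circular or unsatisfiable grants")
    return ordered

# joe owns the table; alice got SELECT WITH GRANT OPTION from joe and
# then granted SELECT to bob.  Whatever order the ACL entries arrive
# in, joe's grant to alice must be emitted first.
grants = {"bob": ("alice", False), "alice": ("joe", True)}
assert order_grants("joe", grants) == [
    ("joe", "alice", True), ("alice", "bob", False)]
```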

regards, tom lane




[HACKERS] expand_dbname in postgres_fdw

2017-07-25 Thread Arseny Sher
Hi,

Is there any particular reason why postgres_fdw forbids usage of
connection strings by passing expand_dbname = false to
PQconnectdbParams? Attached patch shows where this happens.

From 6bf3741976b833379f5bb370aa41f73a51b99afd Mon Sep 17 00:00:00 2001
From: Arseny Sher 
Date: Tue, 25 Jul 2017 21:36:45 +0300
Subject: [PATCH] expand_dbname=true in postgres_fdw

---
 contrib/postgres_fdw/connection.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index be4ec07cf9..bfcd9ed2e3 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -263,7 +263,7 @@ connect_pg_server(ForeignServer *server, UserMapping *user)
 		/* verify connection parameters and make connection */
 		check_conn_params(keywords, values);
 
-		conn = PQconnectdbParams(keywords, values, false);
+		conn = PQconnectdbParams(keywords, values, true);
 		if (!conn || PQstatus(conn) != CONNECTION_OK)
 			ereport(ERROR,
 	(errcode(ERRCODE_SQLCLIENT_UNABLE_TO_ESTABLISH_SQLCONNECTION),
-- 
2.11.0


--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] PostgreSQL 10 (latest beta) and older ICU

2017-07-25 Thread Victor Wagner
Collegues,

PostgreSQL 10 when build with --with-icu requires ICU 4.6 and above.
(because it searches for libicu using pkg-config, and pkg-config support
significantly changed in ICU version 4.6)

However, there are some reasons to build PostgreSQL with older versions
of ICU. 

For instance, Linux distributions such as RHEL 6 ship ICU 4.2, and SLES
11 also ships an older ICU. Sometimes it is not feasible to compile a newer
ICU from source on servers running these (and derived) distributions.

It is possible to compile PostgreSQL 10 with these versions of ICU
by specifying ICU_CFLAGS and ICU_LIBS explicitly.

However, backend startup fails with errors on some Spanish and
Sinhalese locales.

It turns out that PostgreSQL enumerates collations for all ICU locales
and passes each of them to the uloc_toLanguageTag function with its
strict argument set to TRUE. But for some locales
(es*@collation=traditional, si*@collation=dictionary) only a call with
strict=FALSE succeeds in ICU 4.2. In newer versions of ICU all locales
can be resolved with strict=TRUE.

ICU docs state that if strict=FALSE, some locale fields can be
incomplete.

Which solution is better:

1. Just skip locale/collation combinations which cannot be resolved
with strict=TRUE, and issue a warning instead of an error if
uloc_toLanguageTag fails on some locale.

This would cause some ICU collations to be missing from databases
running on systems with an old ICU.

2. If we see an ICU version earlier than 4.6, pass strict=FALSE to
uloc_toLanguageTag.  This would cause databases on such systems to use a
fallback collation sequence for these collations.

Personally I prefer the first solution, because I doubt that any of my
users will ever need the Sinhalese collation, and they can probably
survive without traditional Spanish too.

-- 
   Victor Wagner 




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 01:41 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 07/25/2017 11:25 AM, Tom Lane wrote:
>>> Oh.  So when was the last time it worked?  And why do the buildfarm
>>> archives contain apparently-successful log_stage entries for pg_ctl-check
>>> on jacana, up through yesterday when I looked?
>> There was a buildfarm bug, since corrected, which was not properly
>> picking up errors in a corner case. You will see the errors if you look
>> at all the logs for pg_ctl-check from 12 days back if they exist. The
>> last known actual good run of this test was 33 days ago, i.e. 2017-06-22
> Hm.  This failure presumably started with commit f13ea95f9, which
> introduced use of command_like() in the pg_ctl TAP test; but that
> didn't go in until 2017-06-28.  Was jacana not running in the
> interim?


Looks like not, possibly because one of the other machines on this box
was hung (probably lorikeet).


>
> Anyway, if we believe that it broke with f13ea95f9, then it's plausible
> that CMD.EXE has been sharing pg_ctl's stdout/stderr all along, and we
> just had not noticed before.  Could you check what happens if we
> change the bInheritHandles parameter as I suggested upthread?
>
>   


I'll try when I get all the animals caught up.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Inadequate infrastructure for NextValueExpr

2017-07-25 Thread Robert Haas
On Thu, Jul 13, 2017 at 9:46 PM, Thomas Munro
 wrote:
> I wrote a script to cross-check various node handling functions and it told 
> me:
>
> T_NextValueExpr not handled in outfuncs.c
> T_ObjectWithArgs not handled in outfuncs.c
> T_AccessPriv not handled in outfuncs.c
> T_CreateOpClassItem not handled in outfuncs.c
> T_FunctionParameter not handled in outfuncs.c
> T_InferClause not handled in outfuncs.c
> T_OnConflictClause not handled in outfuncs.c
> T_RoleSpec not handled in outfuncs.c
> T_PartitionCmd not handled in outfuncs.c
> T_NamedTuplestoreScan can be produced by outfuncs.c with tagname
> NAMEDTUPLESTORESCAN but that tagname is not recognized by readfuncs.c
> T_Alias not handled in ruleutils.c
> T_RangeVar not handled in ruleutils.c
> T_Expr not handled in ruleutils.c
> T_CaseWhen not handled in ruleutils.c
> T_TargetEntry not handled in ruleutils.c
> T_RangeTblRef not handled in ruleutils.c
> T_JoinExpr not handled in ruleutils.c
> T_FromExpr not handled in ruleutils.c
> T_OnConflictExpr not handled in ruleutils.c
> T_IntoClause not handled in ruleutils.c
> T_NextValueExpr not handled in ruleutils.c

That's pretty cool.  Per long-standing precedent, anything we use in a
build needs to be in Perl, not Python, but I think it would be great
to fix all of these (or the script) and then add this to our standard
build process.  It would be *great* to stop making mistakes like this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 5:47 AM, Petr Jelinek
 wrote:
> As a side note, the ConditionVariablePrepareToSleep()'s comment could be
> improved because currently it says the only advantage is that we skip
> double-test in the beginning of ConditionVariableSleep(). But that's not
> true, it's essential for preventing race conditions like the one above
> because it puts the current process into waiting list so we can be sure
> it will be signaled on broadcast once ConditionVariablePrepareToSleep()
> has been called.

But if you don't call ConditionVariablePrepareToSleep() before calling
ConditionVariableSleep(), then the first call to the latter will call
the former and return without doing anything else.  So I don't see how
this can ever go wrong if you're using these primitives as documented.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:42 PM, Alvaro Herrera
 wrote:
> As I see it, we need to backpatch at least parts of this patch.  I've
> received reports that in earlier branches pglogical and BDR can
> sometimes leave slots behind when removing nodes, and I have a hunch
> that it can be explained by the bugs being fixed here.  Now we cannot
> use condition variables in back-branches, so we'll need to figure out
> how to actually do it ...

If all you had to back-patch was the condition variable code itself,
that might not really be all that bad, but it depends on the
WaitEventSet stuff, which I think is far too dangerous to back-patch.
However, you might be able to create a dumber, slower version that
only uses WaitLatch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:50 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> I'm not sure what you're arguing for here.
>
> Robert wants perfection, of course ;-).  As do we all.  But there are
> only so many hours in the day, and rejiggering pg_dump's rules about
> how to dump PLs is just way down the to-do list.  I'm going to go do
> something with more tangible benefit, like see if we can make its
> REFRESH MATERIALIZED VIEW commands come out at the right time.

+1 to all of that.  I'm only arguing that there's a difference between
the things that are worth fixing and the things that are formally
bugs.  This may not be worth fixing, but I think it's formally a bug,
because you could easily expect it to work and there's no user-facing
documentation anywhere that says it doesn't.  However, I have no doubt
about the relative priority of this vs. the other issue you mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Tom Lane
"Joshua D. Drake"  writes:
> Isn't the simplest solution just to actually document this?

Given that it's behaved this way since 8.1 (maybe earlier, I'm not sure),
and nobody has complained before, I'm not sure it's worth documenting.
There are lots of undocumented behaviors in PG; if we tried to explain
every one of them, the docs would become unusably bloated.

regards, tom lane




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Tom Lane
Andres Freund  writes:
> I'm not sure what you're arguing for here.

Robert wants perfection, of course ;-).  As do we all.  But there are
only so many hours in the day, and rejiggering pg_dump's rules about
how to dump PLs is just way down the to-do list.  I'm going to go do
something with more tangible benefit, like see if we can make its
REFRESH MATERIALIZED VIEW commands come out at the right time.

regards, tom lane




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Joshua D. Drake

On 07/25/2017 10:24 AM, Andres Freund wrote:
> On 2017-07-25 13:18:25 -0400, Robert Haas wrote:
>> On Tue, Jul 25, 2017 at 1:13 PM, Andres Freund  wrote:
>>> On 2017-07-25 13:10:11 -0400, Robert Haas wrote:
>>>> On Tue, Jul 25, 2017 at 1:06 PM, Tom Lane  wrote:
>>>>> Is this assumption, like, documented someplace?
>
> Yes, and?  We can try to address countless intentionally unsupported
> edge-cases, but it's going to cost code, complexity and time. And very
> likely it's going to add hard to find, test and address bugs. pg_dump is
> complicated as is, I don't think trying to address every conceivable
> weirdness is a good idea. There's plenty more fundamental things wrong
> (e.g. DDL concurrent with a dump sometimes breaking that dump).
>
> I'm not sure what you're arguing for here.


Isn't the simplest solution just to actually document this? Code is not 
documentation except for those reading code. End users, sql developers, 
DBAs etc, should never have to open doxygen.postgresql.org to figure 
this stuff out.


JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 25/07/17 01:33, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >> I'm back at looking into this again, after a rather exhausting week.  I
> >> think I have a better grasp of what is going on in this code now,
> > 
> > Here's an updated patch, which I intend to push tomorrow.
> 
> I think the ConditionVariablePrepareToSleep() call in
> ReplicationSlotAcquire() needs to be done inside the mutex lock
> otherwise there is tiny race condition where the process which has slot
> acquired might release the slot between the mutex unlock and the
> ConditionVariablePrepareToSleep() call which means we'll never get
> signaled and ConditionVariableSleep() will wait forever.

Hmm, yeah, that's not good.  However, I didn't like the idea of putting
it inside the locked area, as it's too much code.  Instead I added it
just before acquiring the spinlock.  We cancel the sleep unconditionally
later on if we didn't need to sleep after all.

As I see it, we need to backpatch at least parts of this patch.  I've
received reports that in earlier branches pglogical and BDR can
sometimes leave slots behind when removing nodes, and I have a hunch
that it can be explained by the bugs being fixed here.  Now we cannot
use condition variables in back-branches, so we'll need to figure out
how to actually do it ...

> As a side note, the ConditionVariablePrepareToSleep()'s comment could be
> improved because currently it says the only advantage is that we skip
> double-test in the beginning of ConditionVariableSleep(). But that's not
> true, it's essential for preventing race conditions like the one above
> because it puts the current process into waiting list so we can be sure
> it will be signaled on broadcast once ConditionVariablePrepareToSleep()
> has been called.

Hmm.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/25/2017 11:25 AM, Tom Lane wrote:
>> Oh.  So when was the last time it worked?  And why do the buildfarm
>> archives contain apparently-successful log_stage entries for pg_ctl-check
>> on jacana, up through yesterday when I looked?

> There was a buildfarm bug, since corrected, which was not properly
> picking up errors in a corner case. You will see the errors if you look
> at all the logs for pg_ctl-check from 12 days back if they exist. The
> last known actual good run of this test was 33 days ago, i.e. 2017-06-22

Hm.  This failure presumably started with commit f13ea95f9, which
introduced use of command_like() in the pg_ctl TAP test; but that
didn't go in until 2017-06-28.  Was jacana not running in the
interim?

Anyway, if we believe that it broke with f13ea95f9, then it's plausible
that CMD.EXE has been sharing pg_ctl's stdout/stderr all along, and we
just had not noticed before.  Could you check what happens if we
change the bInheritHandles parameter as I suggested upthread?

regards, tom lane




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Andres Freund
On 2017-07-25 13:18:25 -0400, Robert Haas wrote:
> On Tue, Jul 25, 2017 at 1:13 PM, Andres Freund  wrote:
> > On 2017-07-25 13:10:11 -0400, Robert Haas wrote:
> >> On Tue, Jul 25, 2017 at 1:06 PM, Tom Lane  wrote:
> >> >> Is this assumption, like, documented someplace?
> >> >
> >> > Uh, right there?
> >>
> >> I don't think we can expect end-users to read the code comments to
> >> determine whether their apparently-legal SQL is fully supported.
> >
> > I don't think plain end-users are going to create differently named PLs
> > using builtin handlers. There's plenty special casing of system object
> > in pg_dump and elsewhere. Dependency tracking doesn't quite work right
> > if you refer to system objects either, etc.  This is superuser only
> > stuff, for a reason.
> 
> But superuser != developer.  Superusers aren't obliged to read the
> code comments any more than any other user.

And yet we tell them that they're to blame if they do a CREATE FUNCTION
with the wrong signature, or a DELETE FROM pg_class; or ...


> I think the only reason we don't get people whining about stuff like
> this more than we do is that it's pretty obscure.  But I bet if we
> look through the pgsql-bugs archives we can find people complaining
> about various cases where they did assorted seemingly-legal things
> that turned out not to be supported by pg_dump.  Whether this
> particular thing has been discovered by anyone before, I dunno.  But
> there's certainly a whole category of bug reports along the line of
> "pg_dump works mostly, except when I do X".

Yes, and?  We can try to address countless intentionally unsupported
edge-cases, but it's going to cost code, complexity and time. And very
likely it's going to add hard to find, test and address bugs. pg_dump is
complicated as is, I don't think trying to address every conceivable
weirdness is a good idea. There's plenty more fundamental things wrong
(e.g. DDL concurrent with a dump sometimes breaking that dump).

I'm not sure what you're arguing for here.

- Andres




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:13 PM, Andres Freund  wrote:
> On 2017-07-25 13:10:11 -0400, Robert Haas wrote:
>> On Tue, Jul 25, 2017 at 1:06 PM, Tom Lane  wrote:
>> >> Is this assumption, like, documented someplace?
>> >
>> > Uh, right there?
>>
>> I don't think we can expect end-users to read the code comments to
>> determine whether their apparently-legal SQL is fully supported.
>
> I don't think plain end-users are going to create differently named PLs
> using builtin handlers. There's plenty special casing of system object
> in pg_dump and elsewhere. Dependency tracking doesn't quite work right
> if you refer to system objects either, etc.  This is superuser only
> stuff, for a reason.

But superuser != developer.  Superusers aren't obliged to read the
code comments any more than any other user.

I think the only reason we don't get people whining about stuff like
this more than we do is that it's pretty obscure.  But I bet if we
look through the pgsql-bugs archives we can find people complaining
about various cases where they did assorted seemingly-legal things
that turned out not to be supported by pg_dump.  Whether this
particular thing has been discovered by anyone before, I dunno.  But
there's certainly a whole category of bug reports along the line of
"pg_dump works mostly, except when I do X".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Andres Freund
On 2017-07-25 13:10:11 -0400, Robert Haas wrote:
> On Tue, Jul 25, 2017 at 1:06 PM, Tom Lane  wrote:
> >> Is this assumption, like, documented someplace?
> >
> > Uh, right there?
> 
> I don't think we can expect end-users to read the code comments to
> determine whether their apparently-legal SQL is fully supported.

I don't think plain end-users are going to create differently named PLs
using builtin handlers. There's plenty of special-casing of system objects
in pg_dump and elsewhere. Dependency tracking doesn't quite work right
if you refer to system objects either, etc.  This is superuser only
stuff, for a reason.

- Andres




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:06 PM, Tom Lane  wrote:
>> Is this assumption, like, documented someplace?
>
> Uh, right there?

I don't think we can expect end-users to read the code comments to
determine whether their apparently-legal SQL is fully supported.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 25, 2017 at 10:31 AM, Tom Lane  wrote:
>> pg_dump doesn't really support that scenario, and I don't feel any
>> great need to make it do so.  Per the comment in dumpProcLang:

> Is this assumption, like, documented someplace?

Uh, right there?

> I would be on board with the idea that you (or anyone, really) doesn't
> want to fix this because it's a fairly unimportant issue, but I balk
> at the notion that nothing is wrong here, because to me that looks
> busted.

Well, it's not just unimportant but smack in the middle of code that
is treading a very narrow path to avoid assorted version dependencies.
I don't want to risk breaking cases that are actually important in the
field to support something that's obviously a toy test case.

We might be able to make some simplification/rationalization here
whenever we desupport pg_dump from < 8.1 servers (ie pre pg_pltemplate).
But right now I'm afraid to touch it.

regards, tom lane




Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-07-25 Thread Tomas Vondra




On 7/25/17 5:04 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 7/25/17 12:55 AM, Tom Lane wrote:
>>> I think the planner basically assumes that reltuples is the live
>>> tuple count, so maybe we'd better change VACUUM to get in step.
>>
>> Attached is a patch that (I think) does just that. The disagreement
>> was caused by VACUUM treating recently dead tuples as live, while
>> ANALYZE treats both of those as dead.
>>
>> At first I was worried that this will negatively affect plans in
>> the long-running transaction, as it will get underestimates (due
>> to reltuples not including rows it can see). But that's a problem
>> we already have anyway, you just need to run ANALYZE in the other
>> session.
>
> This definitely will have some impact on plans, at least in cases
> where there's a significant number of unvacuumable dead tuples. So I
> think it's a bit late for v10, and I wouldn't want to back-patch at
> all. Please add to the next commitfest.


I dare to disagree here, for two reasons.

Firstly, the impact *is* already there, it only takes running ANALYZE. 
Or VACUUM ANALYZE. In both those cases we already end up with 
reltuples=n_live_tup.


Secondly, I personally strongly prefer stable predictable behavior over 
intermittent oscillations between two values. That's a major PITA on 
production, both to investigate and fix.


So people already have this issue, although it only strikes randomly. 
And no way to fix it (well, except for fixing the cleanup, but that may 
not be possible).


It is true we tend to run VACUUM more often than ANALYZE, particularly 
in situations where the cleanup can't proceed - ANALYZE will do its 
work and VACUUM will be triggered over and over again, so it "wins" this 
way. But I'm not sure that's something we should rely on.



FWIW I personally see this as a fairly annoying bug, and would vote to 
backpatch it, although I understand people might object. But I don't 
quite see a reason not to fix this in v10.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 5:57 AM, Teodor Sigaev  wrote:
>> It's not clear from the web site in question that the relevant code is
>> released under the PostgreSQL license.
>
> As the author, I allow this code to be used in PostgreSQL under its license.

Great.  As long as all authors feel that way, we're fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 10:31 AM, Tom Lane  wrote:
> tushar  writes:
>> postgres=# CREATE LANGUAGE alt_lang1 HANDLER plpgsql_call_handler;
>> CREATE LANGUAGE
>
> pg_dump doesn't really support that scenario, and I don't feel any
> great need to make it do so.  Per the comment in dumpProcLang:
>
>  * Try to find the support function(s).  It is not an error if we 
> don't
>  * find them --- if the functions are in the pg_catalog schema, as is
>  * standard in 8.1 and up, then we won't have loaded them. (In this 
> case
>  * we will emit a parameterless CREATE LANGUAGE command, which will
>  * require PL template knowledge in the backend to reload.)
>
> So the assumption is basically that PLs that don't have pg_pltemplate
> entries will not keep their support functions in pg_catalog.

Is this assumption, like, documented someplace?

I tend to think it's pretty bad when users can execute SQL and then
pg_dump doesn't work.  I mean, if you play with the contents of
pg_catalog, then I'm not sure how much we can guarantee, but I don't
see how a user would know that there's anything wrong with Tushar's
SQL command.

I would be on board with the idea that you (or anyone, really) doesn't
want to fix this because it's a fairly unimportant issue, but I balk
at the notion that nothing is wrong here, because to me that looks
busted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Failover Slots

2017-07-25 Thread Thom Brown
On 8 April 2016 at 07:13, Craig Ringer  wrote:
> On 6 April 2016 at 22:17, Andres Freund  wrote:
>
>>
>> Quickly skimming 0001 in [4] there appear to be a number of issues:
>> * LWLockHeldByMe() is only for debugging, not functional differences
>> * ReplicationSlotPersistentData is now in an xlog related header
>> * The code and behaviour around name conflicts of slots seems pretty
>>   raw, and not discussed
>> * Taking spinlocks dependant on InRecovery() seems like a seriously bad
>>   idea
>> * I doubt that the archive based switches around StartupReplicationSlots
>>   do what they intend. Afaics that'll not work correctly for basebackups
>>   taken with -X, without recovery.conf
>>
>
> Thanks for looking at it. Most of those are my errors. I think this is
> pretty dead at least for 9.6, so I'm mostly following up in the hopes of
> learning about a couple of those mistakes.
>
> Good catch with -X without a recovery.conf. Since it wouldn't be recognised
> as a promotion and wouldn't increment the timeline, copied non-failover
> slots wouldn't get removed. I've never liked that logic at all anyway, I
> just couldn't think of anything better...
>
> LWLockHeldByMe() has a comment to the effect of: "This is meant as debug
> support only." So that's just a dumb mistake on my part, and I should've
> added "alreadyLocked" parameters. (Ugly, but works).
>
> But why would it be a bad idea to conditionally take a code path that
> acquires a spinlock based on whether RecoveryInProgress()? It's not testing
> RecoveryInProgress() more than once and doing the acquire and release based
> on separate tests, which would be a problem. I don't really get the problem
> with:
>
> if (!RecoveryInProgress())
> {
> /* first check whether there's something to write out */
> SpinLockAcquire(&slot->mutex);
> was_dirty = slot->dirty;
> slot->just_dirtied = false;
> SpinLockRelease(&slot->mutex);
>
> /* and don't do anything if there's nothing to write */
> if (!was_dirty)
> return;
> }
>
> ... though I think what I really should've done there is just always dirty
> the slot in the redo functions.

Are there any plans to submit a new design/version for v11?

Thanks

Thom




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Dilip Kumar
On Tue, Jul 25, 2017 at 8:59 PM, Robert Haas  wrote:
> On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
>  wrote:
>> - other queries show a good 20-30% improvement in performance. Performance
>> numbers are as follows,
>>
>> Query | un_part_head (seconds) | part_head (seconds) | part_patch (seconds)
>>     3 |                     76 |                 127 |                   88
>>     4 |                     17 |                 244 |                   41
>>     5 |                     52 |                 123 |                   84
>>     7 |                     73 |                 134 |                  103
>>    10 |                     67 |                 111 |                   89
>>    12 |                     53 |                 114 |                   99
>>    18 |                    447 |                 709 |                  551
>
> Hmm.  This certainly shows that benefit of the patch, although it's
> rather sad that we're still slower than if we hadn't partitioned the
> data in the first place.  Why does partitioning hurt performance so
> much?

I was analysing some of the plans (without partitioning and with
partitioning). It seems one of the reasons for performing worse with the
partitioned table is that we cannot use an index on the partitioned
table.

Q4 takes 17s without partitioning, whereas it takes 244s with partitioning.

Now if we analyze the plan

Without partitioning, it can use a parameterized index scan on the
lineitem table, which is really big. But with partitioning, it has to
scan this table completely.

  ->  Nested Loop Semi Join
 ->  Parallel Bitmap Heap Scan on orders
   ->  Bitmap Index Scan on
idx_orders_orderdate  (cost=0.00..24378.88 r
   ->  Index Scan using idx_lineitem_orderkey on
lineitem  (cost=0.57..29.29 rows=105 width=8) (actual
time=0.031..0.031 rows=1 loops=1122364)
   Index Cond: (l_orderkey =
orders.o_orderkey)
   Filter: (l_commitdate < l_receiptdate)
   Rows Removed by Filter: 1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 11:32 AM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Jul 25, 2017 at 11:00 AM, Tom Lane  wrote:
>>> Hm, I had the idea that we were already asking ExtUtils::Embed for that,
>>> but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
>>> like a promising avenue to pursue.
>>>
>>> It would be useful to see the results of
>>> perl -MExtUtils::Embed -e ccopts
>>> on one of the affected installations, and compare that to the problematic
>>> field(s).
>> Why ccopts rather than ccflags?
> I was looking at the current code which fetches ldopts, and analogizing.
> Don't know the difference between ccflags and ccopts.
>
>   


per docs:

ccopts()
    This function combines "perl_inc()", "ccflags()" and
    "ccdlflags()" into one.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 11:32 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jul 25, 2017 at 11:00 AM, Tom Lane  wrote:
>>> Hm, I had the idea that we were already asking ExtUtils::Embed for that,
>>> but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
>>> like a promising avenue to pursue.
>>>
>>> It would be useful to see the results of
>>> perl -MExtUtils::Embed -e ccopts
>>> on one of the affected installations, and compare that to the problematic
>>> field(s).
>
>> Why ccopts rather than ccflags?
>
> I was looking at the current code which fetches ldopts, and analogizing.
> Don't know the difference between ccflags and ccopts.

Oh, here I was thinking you were 3 steps ahead of me.  :-)

Per "perldoc ExtUtils::Embed", ccopts() = perl_inc() plus ccflags()
plus ccdlflags().

On my system:

[rhaas pgsql]$ perl -MExtUtils::Embed -e 'for (qw(ccopts ccflags
ccdlflags perl_inc)) { print "==$_==\n"; eval "$_()"; print "\n\n";
};'
==ccopts==
 -fno-common -DPERL_DARWIN -mmacosx-version-min=10.12 -pipe -Os
-fno-strict-aliasing -fstack-protector-strong -I/opt/local/include
-DPERL_USE_SAFE_PUTENV
-I/opt/local/lib/perl5/5.24/darwin-thread-multi-2level/CORE

==ccflags==
 -fno-common -DPERL_DARWIN -mmacosx-version-min=10.12 -pipe -Os
-fno-strict-aliasing -fstack-protector-strong -I/opt/local/include
-DPERL_USE_SAFE_PUTENV

==ccdlflags==


==perl_inc==
 -I/opt/local/lib/perl5/5.24/darwin-thread-multi-2level/CORE

I don't have a clear sense of whether ccopts() or ccflags() is what we
want here, but FWIW ccopts() is more inclusive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 11:25 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 07/25/2017 11:11 AM, Tom Lane wrote:
>>> What I'm on about is that jacana still succeeds entirely, more than half
>>> the time.  If this is a handle-duplication problem, how could it ever
>>> succeed?
>> No, it hangs 100% of the time. The only time you see a result at all is
>> if I manually intervene. The pg_ctl test is thus currently disabled  on
>> jacana.
> Oh.  So when was the last time it worked?  And why do the buildfarm
> archives contain apparently-successful log_stage entries for pg_ctl-check
> on jacana, up through yesterday when I looked?
>
>   


There was a buildfarm bug, since corrected, which was not properly
picking up errors in a corner case. You will see the errors if you look
at all the logs for pg_ctl-check from 12 days back if they exist. The
last known actual good run of this test was 33 days ago, i.e. 2017-06-22

cheers

andrew



-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jul 25, 2017 at 11:00 AM, Tom Lane  wrote:
>> Hm, I had the idea that we were already asking ExtUtils::Embed for that,
>> but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
>> like a promising avenue to pursue.
>> 
>> It would be useful to see the results of
>> perl -MExtUtils::Embed -e ccopts
>> on one of the affected installations, and compare that to the problematic
>> field(s).

> Why ccopts rather than ccflags?

I was looking at the current code which fetches ldopts, and analogizing.
Don't know the difference between ccflags and ccopts.

regards, tom lane




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 1:31 AM, Rafia Sabih
 wrote:
> - other queries show a good 20-30% improvement in performance. Performance
> numbers are as follows,
>
> Query | un_part_head (seconds) | part_head (seconds) | part_patch (seconds) |
> 3  |  76 | 127 |  88 |
> 4  |  17 | 244 |  41 |
> 5  |  52 | 123 |  84 |
> 7  |  73 | 134 | 103 |
> 10 |  67 | 111 |  89 |
> 12 |  53 | 114 |  99 |
> 18 | 447 | 709 | 551 |

Hmm.  This certainly shows the benefit of the patch, although it's
rather sad that we're still slower than if we hadn't partitioned the
data in the first place.  Why does partitioning hurt performance so
much?

Maybe things would be better at a higher scale factor.

When reporting results of this sort, it would be good to make a habit
of reporting the number of partitions along with the other details you
included.
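For reference, the per-query improvement implied by the quoted timings
(part_head vs. part_patch) can be checked with a quick throwaway script;
the numbers below are copied verbatim from Rafia's table.

```shell
# Back-of-the-envelope check: relative improvement of part_patch over
# part_head for each query listed upthread.
awk 'BEGIN {
  n = split("3 4 5 7 10 12 18", q, " ")
  split("127 244 123 134 111 114 709", head, " ")
  split("88 41 84 103 89 99 551", patch, " ")
  for (i = 1; i <= n; i++)
    printf "Query %s: %.0f%%\n", q[i], 100 * (head[i] - patch[i]) / head[i]
}'
```

Queries 4 and 12 fall outside the 20-30% band mentioned upthread, but the
overall picture matches: partition-wise join recovers a good part of, though
not all, the partitioning overhead.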

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Robert Haas
On Tue, Jul 25, 2017 at 11:00 AM, Tom Lane  wrote:
> Hm, I had the idea that we were already asking ExtUtils::Embed for that,
> but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
> like a promising avenue to pursue.
>
> It would be useful to see the results of
>
> perl -MExtUtils::Embed -e ccopts
>
> on one of the affected installations, and compare that to the problematic
> field(s).

Why ccopts rather than ccflags?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/25/2017 11:11 AM, Tom Lane wrote:
>> What I'm on about is that jacana still succeeds entirely, more than half
>> the time.  If this is a handle-duplication problem, how could it ever
>> succeed?

> No, it hangs 100% of the time. The only time you see a result at all is
> if I manually intervene. The pg_ctl test is thus currently disabled  on
> jacana.

Oh.  So when was the last time it worked?  And why do the buildfarm
archives contain apparently-successful log_stage entries for pg_ctl-check
on jacana, up through yesterday when I looked?

regards, tom lane




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 11:11 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 07/24/2017 09:33 PM, Tom Lane wrote:
>>> Seems like the TAP test should fail every time, if this is a full
>>> description.  But maybe it's part of the answer, so it seems worth
>>> experimenting in this direction.
>> The test that hangs is the only case where we call pg_ctl via
>> command_like. If we use other mechanisms such as command_ok that don't
>> try to read the output there is no problem.
> What I'm on about is that jacana still succeeds entirely, more than half
> the time.  If this is a handle-duplication problem, how could it ever
> succeed?
>



No, it hangs 100% of the time. The only time you see a result at all is
if I manually intervene. The pg_ctl test is thus currently disabled  on
jacana.

cheers

andrew

-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 11:00 AM, Tom Lane wrote:
> Robert Haas  writes:
>> Perl also has a mechanism for flags added to Configure to be passed
>> along when building loadable modules; if it didn't, not just plperl
>> but every Perl module written in C would have this issue if any such
>> flags were used.
>> ...
>> While I'm not sure of the details, I suspect that we need to use one
>> of those methods to get the CCFLAGS used to build perl, and include
>> those when SPI.o, Util.o, and plperl.o in src/pl/plperl.  Or at least
>> the -D switches from those CCFLAGS.
> Hm, I had the idea that we were already asking ExtUtils::Embed for that,
> but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
> like a promising avenue to pursue.
>
> It would be useful to see the results of
>
> perl -MExtUtils::Embed -e ccopts
>
> on one of the affected installations, and compare that to the problematic
> field(s).

  -s -O2 -DWIN32 -DWIN64 -DCONSERVATIVE -DPERL_TEXTMODE_SCRIPTS
-DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fwrapv
-fno-strict-aliasing -mms-bitfields  -I"C:\Perl64\lib\CORE"

cheers

andrew

-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/24/2017 09:33 PM, Tom Lane wrote:
>> Seems like the TAP test should fail every time, if this is a full
>> description.  But maybe it's part of the answer, so it seems worth
>> experimenting in this direction.

> The test that hangs is the only case where we call pg_ctl via
> command_like. If we use other mechanisms such as command_ok that don't
> try to read the output there is no problem.

What I'm on about is that jacana still succeeds entirely, more than half
the time.  If this is a handle-duplication problem, how could it ever
succeed?

regards, tom lane




[HACKERS] Re: [SQL] Postgresql “alter column type” creates an event which contains “temp_table_xxx”

2017-07-25 Thread Craig Ringer
On 25 July 2017 at 22:18, Tom Lane  wrote:

> Zehra Gül Çabuk  writes:
> >  => ALTER TABLE test ALTER COLUMN x TYPE integer USING
> > (trim(x)::integer);ALTER TABLE
> > Last command I've executed to alter column data type creates an event
> like
> > this:
> > BEGIN 500913table public.pg_temp_1077668: INSERT: x[integer]:14table
> > public.pg_temp_1077668: INSERT: x[integer]:42COMMIT 500913
> > How could I find "real" table name using this record? Is there any way to
> > see real table name in fetched record?
>
> That is the real name --- table rewrites create a table with a temporary
> name and the desired new column layout, then fill it with data, then
> exchange the data area with the old table, then drop the temp table.
>
> Evidently logical decoding is exposing some of this infrastructure
> to you.  I bet it isn't exposing the critical "swap data" step though,
> so I wonder how exactly a logical decoding plugin is supposed to make
> sense of what it can see here.


IMO, table rewrite support is less than ideal in logical decoding, and it's
something I'd love to tackle soon. Currently make_new_heap and
finish_heap_swap appear to be completely unaware of logical
decoding/replication. (I'm not sure that's the right level at which to
handle table rewrites yet, but it's a potential starting point).

Rather than emitting normal-looking insert change events for some fake
table name pg_temp_xxx, we should probably invoke a table-rewrite(start)
callback with the original table info, stream the new contents, and call a
table-rewrite(finished) callback. That'd likely just mean one new callback
for the output plugin, a rewrite(oid, bool start|finished)) callback.

Making this work sanely on the apply side might require some work too, but
it's one of the things that's needed to make logical replication more
transparent. The apply side should probably mirror what the originating txn
did, making a new temporary heap, populating it, and swapping it in. This
could result in FK violations if downstream-side tables have extra rows not
present in upstream tables, but that's no worse than regular logical
replication with session_replication_role='replica', and currently falls
into the "don't do that then" category.

We should probably support TRUNCATE the same way. The current mechanism
used in pglogical, capturing truncates with triggers, is a hack
necessitated by logical decoding's lack of support for telling output
plugins about relation truncation. AFAIK in-core logical rep doesn't
natively handle truncation yet, and this is one of the things it'd be good
to do for pg11, especially if more people get interested in contributing.

In the mean time, logical decoding clients can special case "pg_temp_"
relation names in their output plugins, extracting the oid and looking up
the table being rewritten and handling it that way. Not beautiful but it
offers a workaround.
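That workaround can be sketched in a few lines; the helper name below is
made up, and it relies on the (undocumented, but current) convention that
the transient heap is named pg_temp_<oid-of-the-table-being-rewritten>:

```shell
# Hypothetical helper: recover the OID encoded in a pg_temp_NNN relation
# name seen in the decoded stream. The output plugin can then resolve the
# real table, e.g. with: SELECT relname FROM pg_class WHERE oid = NNN;
rewritten_oid() {
  case "$1" in
    pg_temp_*) printf '%s\n' "${1#pg_temp_}" ;;
    *)         return 1 ;;   # not a rewrite temp table
  esac
}
rewritten_oid "pg_temp_1077668"
# 1077668
```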

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-07-25 Thread Tom Lane
Tomas Vondra  writes:
> On 7/25/17 12:55 AM, Tom Lane wrote:
>> I think the planner basically assumes that reltuples is the live
>> tuple count, so maybe we'd better change VACUUM to get in step.

> Attached is a patch that (I think) does just that. The disagreement was 
> caused by VACUUM treating recently dead tuples as live, while ANALYZE 
> treats both of those as dead.

> At first I was worried that this will negatively affect plans in the 
> long-running transaction, as it will get underestimates (due to 
> reltuples not including rows it can see). But that's a problem we 
> already have anyway, you just need to run ANALYZE in the other session.

This definitely will have some impact on plans, at least in cases where
there's a significant number of unvacuumable dead tuples.  So I think
it's a bit late for v10, and I wouldn't want to back-patch at all.
Please add to the next commitfest.

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Tom Lane
Robert Haas  writes:
> Perl also has a mechanism for flags added to Configure to be passed
> along when building loadable modules; if it didn't, not just plperl
> but every Perl module written in C would have this issue if any such
> flags were used.
> ...
> While I'm not sure of the details, I suspect that we need to use one
> of those methods to get the CCFLAGS used to build perl, and include
> those when SPI.o, Util.o, and plperl.o in src/pl/plperl.  Or at least
> the -D switches from those CCFLAGS.

Hm, I had the idea that we were already asking ExtUtils::Embed for that,
but now I see we only inquire about LDFLAGS not CCFLAGS.  Yes, this sounds
like a promising avenue to pursue.

It would be useful to see the results of

perl -MExtUtils::Embed -e ccopts

on one of the affected installations, and compare that to the problematic
field(s).

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Tom Lane
Andrew Dunstan  writes:
> No amount of checking with the Perl community is likely to resolve this
> quickly w.r.t. existing releases of Perl.

Yes, but if they are shipping broken perl builds that cannot support
building of extension modules, they need to be made aware of that.
If that *isn't* the explanation, then we need to find out what is.

regards, tom lane




Re: [HACKERS] Testlib.pm vs msys

2017-07-25 Thread Andrew Dunstan


On 07/24/2017 09:33 PM, Tom Lane wrote:
>
> [good theory about why pg_ctl hangs in TAP test]
>
> Now, this theory still has a Mack-truck-sized hole in it, which is
> if that's the failure mechanism then why isn't it 100% effective?
> Seems like the TAP test should fail every time, if this is a full
> description.  But maybe it's part of the answer, so it seems worth
> experimenting in this direction.


The test that hangs is the only case where we call pg_ctl via
command_like. If we use other mechanisms such as command_ok that don't
try to read the output there is no problem.

Another data point is that this test doesn't hang on bowerbird, which is
an animal on the same machine but doing an MSVC build. Thus my thesis
that it's probably to do with the interaction with the MSys perl and shell.

A simple workaround might be to provide two flavors of command_like, one
that uses files it then slurps in and one that uses direct scalars and
EOF detection.

cheers

andrew


>
> A bit of googling Microsoft's documentation suggests that maybe all
> we have to do is pass CreateProcessAsUser's bInheritHandles parameter
> as FALSE not TRUE in pg_ctl.c.  It's not apparent to me that there
> are any handles we do need the CMD process to inherit.
>
>   


Maybe.

cheers

andrew


-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] estimation for join results cardinality is sometimes more than the product of the downstream nodes'

2017-07-25 Thread Tom Lane
Alexey Bashtanov  writes:
> Is there a reason that join cardinality estimates are not limited by the 
> product of the joined parts cardinalities like in the 
> join-card-est.patch attached?

Because that would be giving an unfair advantage to some paths over
others based on nothing except estimation errors.  I do not think we'd
get a net benefit in plan quality.

If we could do this earlier and adjust the join relation's overall
cardinality estimate, it might be something to consider.

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Tom Lane
Amit Kapila  writes:
> I think the real question is where do we go from here.  Ashutosh has
> proposed a patch up-thread based on a suggestion from Andrew, but it
> is not clear if we want to go there as that seems to be bypassing
> handshake mechanism.

That definitely seems like the wrong route to me.  If the resulting
code works, it would at best be accidental.

> The other tests and analysis seem to indicate
> that the new version Perl binaries on Windows are getting built with
> flags that are incompatible with what we use for plperl and it is not
> clear why perl is using those flags.  Do you think we can do something
> at our end to make it work or someone should check with Perl community
> about it?

It would be a good idea to find somebody who knows more about how these
Perl distros are built.

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Andrew Dunstan


On 07/25/2017 08:58 AM, Amit Kapila wrote:
>
> I think the real question is where do we go from here.  Ashutosh has
> proposed a patch up-thread based on a suggestion from Andrew, but it
> is not clear if we want to go there as that seems to be bypassing
> handshake mechanism.  The other tests and analysis seem to indicate
> that the new version Perl binaries on Windows are getting built with
> flags that are incompatible with what we use for plperl and it is not
> clear why perl is using those flags.  Do you think we can do something
> at our end to make it work or someone should check with Perl community
> about it?
>


No amount of checking with the Perl community is likely to resolve this
quickly w.r.t. existing releases of Perl.

We seem to have a choice either to abandon, at least temporarily, perl
support on Windows for versions of perl >= 5.20 (I think) or adopt the
suggestion I made. It's not a complete hack - it's something at least
somewhat blessed in perl's XSUB.h:

/* dXSBOOTARGSNOVERCHK has no API in xsubpp to choose it so do
#undef dXSBOOTARGSXSAPIVERCHK
#define dXSBOOTARGSXSAPIVERCHK dXSBOOTARGSNOVERCHK */


Note that on earlier versions of perl we got by without this check for
years without any issue or complaint I recall hearing of. If we don't
adopt this I would not be at all surprised to see Windows packagers
adopt it anyway.

cheers

andrew



-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Tom Lane
tushar  writes:
> postgres=# CREATE LANGUAGE alt_lang1 HANDLER plpgsql_call_handler;
> CREATE LANGUAGE

pg_dump doesn't really support that scenario, and I don't feel any
great need to make it do so.  Per the comment in dumpProcLang:

 * Try to find the support function(s).  It is not an error if we don't
 * find them --- if the functions are in the pg_catalog schema, as is
 * standard in 8.1 and up, then we won't have loaded them. (In this case
 * we will emit a parameterless CREATE LANGUAGE command, which will
 * require PL template knowledge in the backend to reload.)

So the assumption is basically that PLs that don't have pg_pltemplate
entries will not keep their support functions in pg_catalog.  I think
there are exceptions to handle languages+support functions that are
wrapped into extensions ... but you didn't do that either.

regards, tom lane




Re: Fwd: [HACKERS] Syncing sql extension versions with shared library versions

2017-07-25 Thread Mat Arye
On Mon, Jul 24, 2017 at 6:16 PM, Tom Lane  wrote:

> Mat Arye  writes:
> > On Mon, Jul 24, 2017 at 1:38 PM, Tom Lane  wrote:
> >> I'm not really sure why planner hooks would have anything to do with
> your
> >> exposed SQL API?
>
> > Sorry what I meant was i'd like to package different versions of my
> > extension -- both .sql and .c --
> > and have the extension act consistently for any version until I do a
> ALTER
> > EXTENSION UPDATE.
> > So, I'd prefer a DB with an older extension to have the logic/code in the
> > hook not change even if I install a newer version's .so for use in
> another
> > database
> > (but don't update the extension to the newer version).  Does that make
> any
> > sense?
>
> The newer version's .so simply is not going to load into the older
> version; we intentionally prevent that from happening.  It's not necessary
> anyway because versions do not share library directories.  Therefore,
> you can have foo.so for 9.5 in your 9.5 version's library directory,
> and foo.so for 9.6 in your 9.6 version's library directory, and the
> filesystem will keep them straight for you.  It is not necessary to
> call them foo-9.5.so and foo-9.6.so.
>

I meant the extension version, not the PG version. Let me try to explain:
If version 0.1.0 has optimization A in the planner hook, and 0.2.0 has
optimization B,
I'd like the property that even if I install foo-0.2.0.so (and also have
foo-0.1.0.so) in the
cluster, any database that has not done an ALTER EXTENSION UPDATE will
still do A
while any databases that have updated the extension will do B. I'd also
like to avoid doing a bunch
of if/else statements to make this happen. But that's the ideal, not sure
if I can make this happen.



>
> As for the other point, the usual idea is that if you have a
> SQL-accessible C function xyz() that needs to behave differently after an
> extension version update, then you make the extension update script point
> the SQL function to a different library entry point.  If your 1.0
> extension version originally had
>
> CREATE FUNCTION xyz(...) RETURNS ...
>   LANGUAGE C AS 'MODULE_PATHNAME', 'xyz';
>
> (note that the second part of the AS clause might have been implicit;
> no matter), then your update script for version 1.1 could do
>
> CREATE OR REPLACE FUNCTION xyz(...) RETURNS ...
>   LANGUAGE C AS 'MODULE_PATHNAME', 'xyz_1_1';
>
> Then in the 1.1 version of the C code, the xyz_1_1() C function provides
> the new behavior, while the xyz() C function provides the old behavior,
> or maybe just throws an error if you conclude it's impractical to emulate
> the old behavior exactly.  As I mentioned earlier, you can usually set
> things up so that you can share much of the code between two such
> functions.
>

Thanks for that explanation. It's clear now.


>
> The pgstattuple C function in contrib/pgstattuple is one example of
> having changed a C function's behavior in this way over multiple versions.
>
> regards, tom lane
>


[HACKERS] [PATCH] Make ExplainOpenGroup()/ExplainCloseGroup() available for extensions.

2017-07-25 Thread Hadi Moshayedi
Hello Hackers,

The attached patch moves declarations of
ExplainOpenGroup()/ExplainCloseGroup() from explain.c to explain.h.

This can be useful for extensions that need explain groups in their
custom-scan explain output.

For example, Citus uses groups in its custom explain outputs [1]. But it
achieves it by having a copy of ExplainOpenGroup()/ExplainCloseGroup() in
its source code, which is not the best way.

Please review.

[1] https://github.com/citusdata/citus/blob/master/src/backend/distributed/planner/multi_explain.c

Thanks,
Hadi
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7648201218..46467e1045 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -124,10 +124,6 @@ static void ExplainCustomChildren(CustomScanState *css,
 	  List *ancestors, ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *value,
 bool numeric, ExplainState *es);
-static void ExplainOpenGroup(const char *objtype, const char *labelname,
- bool labeled, ExplainState *es);
-static void ExplainCloseGroup(const char *objtype, const char *labelname,
-  bool labeled, ExplainState *es);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
   ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
@@ -3213,7 +3209,7 @@ ExplainPropertyBool(const char *qlabel, bool value, ExplainState *es)
  * If labeled is true, the group members will be labeled properties,
  * while if it's false, they'll be unlabeled objects.
  */
-static void
+void
 ExplainOpenGroup(const char *objtype, const char *labelname,
  bool labeled, ExplainState *es)
 {
@@ -3276,7 +3272,7 @@ ExplainOpenGroup(const char *objtype, const char *labelname,
  * Close a group of related objects.
  * Parameters must match the corresponding ExplainOpenGroup call.
  */
-static void
+void
 ExplainCloseGroup(const char *objtype, const char *labelname,
   bool labeled, ExplainState *es)
 {
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 78822b766a..543b2bb0c6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -101,4 +101,9 @@ extern void ExplainPropertyFloat(const char *qlabel, double value, int ndigits,
 extern void ExplainPropertyBool(const char *qlabel, bool value,
 	ExplainState *es);
 
+extern void ExplainOpenGroup(const char *objtype, const char *labelname,
+ bool labeled, ExplainState *es);
+extern void ExplainCloseGroup(const char *objtype, const char *labelname,
+  bool labeled, ExplainState *es);
+
 #endif			/* EXPLAIN_H */



Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Robert Haas
On Wed, Jul 19, 2017 at 5:01 PM, Tom Lane  wrote:
> I am really suspicious that this means your libperl was built in an unsafe
> fashion, that is, by injecting configuration choices as random -D switches
> in the build process rather than making sure the choices were recorded in
> perl's config.h.  As an example, looking at the perl 5.24.1 headers on
> a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined
> if PERL_COPY_ON_WRITE were not defined, and the only way that that can
> happen is if PERL_NO_COW is defined, and there are no references to the
> latter anyplace except in this particular #if defined test in perl.h.

Hmm, it might not be so random as all that.  Have a look at this
commit log entry:

commit 1a904fc88069e249a4bd0ef196a3f1a7f549e0fe
Author: Father Chrysostomos 
Date:   Sun Nov 25 12:57:04 2012 -0800

Disable PL_sawampersand

PL_sawampersand actually causes bugs (e.g., perl #4289), because the
behaviour changes.  eval '$&' after a match will produce different
results depending on whether $& was seen before the match.

Using copy-on-write for the pre-match copy (preceding patches do that)
alleviates the slowdown caused by mentioning $&.  The copy doesn’t
happen unless the string is modified after the match.  It’s now a
post-match copy.  So we no longer need to do things differently
depending on whether $& has been seen.

PL_sawampersand is now #defined to be equal to what it would be if
every program began with $',$&,$`.

I left the PL_sawampersand code in place, in case this commit proves
immature.  Running Configure with -Accflags=PERL_SAWAMPERSAND will
reënable the PL_sawampersand mechanism.

Based on a bit of experimentation, that last bit contains a typo: it
should say -Accflags=-DPERL_SAWAMPERSAND; as written, the -D is
missing.[1] Anyway, the point is that at least in this case, there
seems to have been some idea that somebody might want to reenable this
in their own build even after it was disabled by default.

Perl also has a mechanism for flags added to Configure to be passed
along when building loadable modules; if it didn't, not just plperl
but every Perl module written in C would have this issue if any such
flags were used.  Normally, you compile perl modules by running "perl
Makefile.PL" to generate a makefile, and then building from the
makefile.  If you do that, then the Makefile ends up with a section in
it that looks like this:

# --- MakeMaker cflags section:

CCFLAGS = -fno-strict-aliasing -pipe -fstack-protector-strong
-I/opt/local/include -DPERL_USE_SAFE_PUTENV -DPERL_SAWAMPERSAND -Wall
-Werror=declaration-after-statement -Wextra -Wc++-compat
-Wwrite-strings
OPTIMIZE = -O3
PERLTYPE =
MPOLLUTE =

...and lo-and-behold, the -DPERL_SAWAMPERSAND flag which I passed to
Configure is there.   After a bit of time deciphering how MakeMaker
actually works, I figured out that it gets the value for CFLAGS by
doing "use Config;" and then referencing $Config::Config{'ccflags'};
an alternative way to get it, from the shell, is to run perl
-V:ccflags.

While I'm not sure of the details, I suspect that we need to use one
of those methods to get the CCFLAGS used to build perl, and include
those when SPI.o, Util.o, and plperl.o in src/pl/plperl.  Or at least
the -D switches from those CCFLAGS.  Here's about the simplest thing
that seems like it might work on Linux; Windows would need something
equivalent:

override CPPFLAGS += $(shell $(PERL) -MConfig -e 'print
$$Config::Config{"ccflags"};')

On my MacBook Pro, with the built-in switches, that produces:

-fno-common -DPERL_DARWIN -mmacosx-version-min=10.12 -pipe -Os
-fno-strict-aliasing -fstack-protector-strong -I/opt/local/include
-DPERL_USE_SAFE_PUTENV

Or we could try to extract just the -D switches:

override CPPFLAGS += $(shell $(PERL) -MConfig -e 'print join " ", grep
{ /^-D/ } split /\s+/, $$Config::Config{"ccflags"};')
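
The filtering that second one-liner does is just a whitespace split plus a
prefix test; the same logic sketched in Python for illustration (not part of
the build itself):

```python
def extract_defines(ccflags):
    # Keep only the preprocessor -D switches from a ccflags string
    return [flag for flag in ccflags.split() if flag.startswith("-D")]

flags = ("-fno-common -DPERL_DARWIN -mmacosx-version-min=10.12 -pipe -Os "
         "-fno-strict-aliasing -fstack-protector-strong -I/opt/local/include "
         "-DPERL_USE_SAFE_PUTENV")
# extract_defines(flags) keeps -DPERL_DARWIN and -DPERL_USE_SAFE_PUTENV
```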

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Arguably, the umlaut over "reënable" is also a typo, but that's
sort of in a different category.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [SQL] Postgresql “alter column type” creates an event which contains “temp_table_xxx”

2017-07-25 Thread Tom Lane
Zehra Gül Çabuk  writes:
>  => ALTER TABLE test ALTER COLUMN x TYPE integer USING (trim(x)::integer);
>  ALTER TABLE
> Last command I've executed to alter column data type creates an event like
> this:
> BEGIN 500913
> table public.pg_temp_1077668: INSERT: x[integer]:14
> table public.pg_temp_1077668: INSERT: x[integer]:42
> COMMIT 500913
> How could I find "real" table name using this record? Is there any way to
> see real table name in fetched record?

That is the real name --- table rewrites create a table with a temporary
name and the desired new column layout, then fill it with data, then
exchange the data area with the old table, then drop the temp table.
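
If it helps, the transient table's name appears to embed the OID of the
relation being rewritten (the rewrite code names it pg_temp_<oid>), so the
original table can be recovered from the name; a rough sketch of the mapping
(an assumption about the naming scheme, not documented API):

```python
import re

def rewritten_table_oid(transient_name):
    # Table rewrites name the transient heap "pg_temp_<oid>", where <oid>
    # is assumed to be the OID of the relation being rewritten.
    m = re.fullmatch(r"pg_temp_(\d+)", transient_name)
    return int(m.group(1)) if m else None
```

The recovered OID can then be resolved to a name on the decoding side with
something like SELECT 1077668::regclass.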

Evidently logical decoding is exposing some of this infrastructure
to you.  I bet it isn't exposing the critical "swap data" step though,
so I wonder how exactly a logical decoding plugin is supposed to make
sense of what it can see here.

regards, tom lane




Re: [HACKERS] cache lookup failed error for partition key with custom opclass

2017-07-25 Thread Tom Lane
Rushabh Lathia  writes:
> On Mon, Jul 24, 2017 at 7:23 PM, Tom Lane  wrote:
>> Some looking around says that this *isn't* the only place that just
>> blithely assumes that it will find an opfamily entry.  But I agree
>> that not checking for that isn't up to project standards.

> I went through the get_opfamily_proc() calls in the system and added
> the check for InvalidOid.

Think I did that already, please compare your results with
278cb4341103e967189997985b09981a73e23a34

regards, tom lane




[HACKERS] estimation for join results cardinality is sometimes more than the product of the downstream nodes'

2017-07-25 Thread Alexey Bashtanov

Hello,

Postgres can produce a plan with a nested loop node whose row estimate is
much larger than the product of the underlying nodes' estimates, because it
relies only on the outer relation size:


alexey=# explain
SELECT oid, relname
FROM (
   SELECT m.oid, m.relname
   FROM pg_class m
UNION ALL
   SELECT m.oid, m.relname
   FROM pg_class m
) m
WHERE oid IN (VALUES (162456317), (162456310));
 QUERY PLAN

 Nested Loop  (cost=0.31..33.24 rows=*341* width=68)
   ->  Unique  (cost=0.04..0.04 rows=*2* width=4)
 ->  Sort  (cost=0.04..0.04 rows=2 width=4)
   Sort Key: (("*VALUES*".column1)::oid)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 
width=4)

   ->  Append  (cost=0.27..16.58 rows=*2* width=68)
 ->  Index Scan using pg_class_oid_index on pg_class m 
(cost=0.27..8.29 rows=1 width=68)

   Index Cond: (oid = ("*VALUES*".column1)::oid)
 ->  Index Scan using pg_class_oid_index on pg_class m_1 
(cost=0.27..8.29 rows=1 width=68)

   Index Cond: (oid = ("*VALUES*".column1)::oid)
(10 rows)

Why?
Is there a reason that join cardinality estimates are not limited by the
product of the joined parts' cardinalities, as in the attached
join-card-est.patch?
An example of a query working faster as a result of this change is in
join-card-est.sql; the result is in join-card-est.result


Best Regards,
  Alexey
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b35acb7..fa4bebc 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2185,15 +2185,6 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 	else
 		path->path.rows = path->path.parent->rows;
 
-	/* For partial paths, scale row estimate. */
-	if (path->path.parallel_workers > 0)
-	{
-		double		parallel_divisor = get_parallel_divisor(&path->path);
-
-		path->path.rows =
-			clamp_row_est(path->path.rows / parallel_divisor);
-	}
-
 	/*
 	 * We could include disable_cost in the preliminary estimate, but that
 	 * would amount to optimizing for the case where the join method is
@@ -2321,6 +2312,19 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
 		ntuples = outer_path_rows * inner_path_rows;
 	}
 
+	/* We cannot emit more rows than we process. */
+	if (path->path.rows > ntuples)
+		path->path.rows = clamp_row_est(ntuples);
+
+	/* For partial paths, scale row estimate. */
+	if (path->path.parallel_workers > 0)
+	{
+		double		parallel_divisor = get_parallel_divisor(&path->path);
+
+		path->path.rows =
+			clamp_row_est(path->path.rows / parallel_divisor);
+	}
+
 	/* CPU costs */
 	cost_qual_eval(&restrict_qual_cost, path->joinrestrictinfo, root);
 	startup_cost += restrict_qual_cost.startup;
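
Conceptually, the patch clamps the row estimate against the number of tuple
pairs actually processed before applying the parallel divisor; a sketch of
that ordering (with a simplified clamp_row_est, not the real costsize.c
code):

```python
def clamp_row_est(rows):
    # Simplified version of PostgreSQL's clamp: at least one row
    return max(1, round(rows))

def nestloop_rows(est_rows, outer_rows, inner_rows, parallel_divisor=1.0):
    # Clamp first: the join cannot emit more rows than the tuple pairs
    # it processes (outer_rows * inner_rows).
    rows = min(est_rows, outer_rows * inner_rows)
    # Only then scale partial paths down by the parallel divisor,
    # mirroring the reordering the patch performs.
    if parallel_divisor > 1.0:
        rows = rows / parallel_divisor
    return clamp_row_est(rows)
```

With the example plan's numbers, an estimate of 341 rows over a 2 x 2 input
would be clamped to 4.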


join-card-est.sql
Description: application/sql
===MASTER:
 Hash Join  (cost=3085.45..23948.76 rows=100 width=12) (actual time=36.048..36.432 rows=4 loops=1)
   Hash Cond: (aa1.b = bb.b)
   ->  Nested Loop  (cost=1.46..34.85 rows=100 width=8) (actual time=0.048..0.091 rows=4 loops=1)
         ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.015..0.018 rows=2 loops=1)
               ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.015..0.016 rows=2 loops=1)
                     Sort Key: ai.a
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on ai  (cost=0.00..1.02 rows=2 width=4) (actual time=0.008..0.009 rows=2 loops=1)
         ->  Append  (cost=0.42..16.89 rows=2 width=8) (actual time=0.019..0.033 rows=2 loops=2)
               ->  Index Scan using aa1_pkey on aa1  (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1 loops=2)
                     Index Cond: (a = ai.a)
               ->  Index Scan using aa2_pkey on aa2  (cost=0.42..8.44 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=2)
                     Index Cond: (a = ai.a)
   ->  Hash  (cost=1443.00..1443.00 rows=10 width=8) (actual time=35.871..35.871 rows=10 loops=1)
         Buckets: 131072  Batches: 2  Memory Usage: 2976kB
         ->  Seq Scan on bb  (cost=0.00..1443.00 rows=10 width=8) (actual

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-25 Thread Amit Kapila
On Mon, Jul 24, 2017 at 11:58 AM, Noah Misch  wrote:
> On Wed, Jul 19, 2017 at 05:01:31PM -0400, Tom Lane wrote:
>> Ashutosh Sharma  writes:
>> > Here are the list of macros and variables from 'intrpvar.h' file that
>> > are just defined in perl module but not in plperl on Windows,
>>
>> > #ifdef PERL_USES_PL_PIDSTATUS
>> > PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid 
>> > */
>> > #endif
>>
>> > #ifdef PERL_SAWAMPERSAND
>> > PERLVAR(I, sawampersand, U8)/* must save all match strings */
>> > #endif
>>
>> I am really suspicious that this means your libperl was built in an unsafe
>> fashion, that is, by injecting configuration choices as random -D switches
>> in the build process rather than making sure the choices were recorded in
>> perl's config.h.  As an example, looking at the perl 5.24.1 headers on
>> a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined
>> if PERL_COPY_ON_WRITE were not defined, and the only way that that can
>> happen is if PERL_NO_COW is defined, and there are no references to the
>> latter anyplace except in this particular #if defined test in perl.h.
>>
>> Where did your perl installation come from, anyway?  Are you sure the .h
>> files match up with the executables?
>
> I see corresponding symptoms with the following Perl distributions:
>
> strawberry-perl-5.26.0.1-64bit.msi:
>   src/pl/plperl/Util.c: loadable library and perl binaries are mismatched 
> (got handshake key 11800080, needed 11c00080)
> ActivePerl-5.24.1.2402-MSWin32-x64-401627.exe:
>   src/pl/plperl/Util.c: loadable library and perl binaries are mismatched 
> (got handshake key 11500080, needed 11900080)
>
> So, this affects each of the two prominent families of Perl Windows binaries.
>

I think the real question is where do we go from here.  Ashutosh has
proposed a patch up-thread based on a suggestion from Andrew, but it
is not clear if we want to go there as that seems to be bypassing
handshake mechanism.  The other tests and analysis seem to indicate
that the newer Perl binaries on Windows are getting built with
flags that are incompatible with what we use for plperl, and it is not
clear why perl is using those flags.  Do you think we can do something
at our end to make it work, or should someone check with the Perl
community about it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread Thom Brown
On 25 July 2017 at 12:09, tushar  wrote:
> v9.6
>
> postgres=# CREATE LANGUAGE alt_lang1 HANDLER plpgsql_call_handler;
> CREATE LANGUAGE
> postgres=# \q
>
> v10 , run pg_upgrade - failing with this error -
>
> pg_restore: creating pg_largeobject_metadata "pg_largeobject_metadata"
> pg_restore: creating COMMENT "postgres"
> pg_restore: creating SCHEMA "public"
> pg_restore: creating COMMENT "SCHEMA "public""
> pg_restore: creating PROCEDURAL LANGUAGE "alt_lang1"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 560; 2612 16384 PROCEDURAL
> LANGUAGE alt_lang1 edb
> pg_restore: [archiver (db)] could not execute query: ERROR: unsupported
> language "alt_lang1"
> HINT:  The supported languages are listed in the pg_pltemplate system
> catalog.
> Command was: CREATE OR REPLACE PROCEDURAL LANGUAGE "alt_lang1";
>
>
> take the cluster dump using pg_dumpall
> "
> --
> -- Name: alt_lang1; Type: PROCEDURAL LANGUAGE; Schema: -; Owner: edb
> --
>
> CREATE OR REPLACE PROCEDURAL LANGUAGE alt_lang1;
>
>
> ALTER PROCEDURAL LANGUAGE alt_lang1 OWNER TO edb;
> "
>
The handler part is missing, and because of that it throws an error if we
try to execute it in a psql session.

I think this is because the handler function is in the pg_catalog
schema, whose objects aren't dumped.  However, it seems they need to be
if a non-pg_catalog language is created that uses them.

This seems odd though:

/* Skip if not to be dumped */
if (!plang->dobj.dump || dopt->dataOnly)
return;

...

funcInfo = findFuncByOid(plang->lanplcallfoid);
if (funcInfo != NULL && !funcInfo->dobj.dump && plang->dobj.dump)
funcInfo = NULL;/* treat not-dumped same as not-found */


The reason I think it's odd is because, if it finds that the language
needs to be dumped, it checks whether the functions referenced for the
handler, inline and validator are in the pg_catalog schema too.  If
they are, it doesn't output them, but if we know the language isn't in
pg_catalog, and the functions for these options are, then surely we
*have* to output them?  Should there be any checks for these functions
at all?

Thom




Re: [HACKERS] UPDATE of partition key

2017-07-25 Thread Rajkumar Raghuwanshi
On Tue, Jul 25, 2017 at 3:54 PM, Amit Khandekar 
wrote:

> On 25 July 2017 at 15:02, Rajkumar Raghuwanshi
>  wrote:
> > On Mon, Jul 24, 2017 at 11:23 AM, Amit Khandekar  >
> > wrote:
> >>
> >>
> >> Attached update-partition-key_v13.patch now contains this
> >> make_resultrels_ordered.patch changes.
> >>
> >
> > I have applied attach patch and got below observation.
> >
> > Observation :  if join producing multiple output rows for a given row to
> be
> > modified. I am seeing here it is updating a row and also inserting rows
> in
> > target table. hence after update total count of table got incremented.
>
> Thanks for catching this Rajkumar.
>
> So after the row to be updated is already moved to another partition,
> when the next join output row corresponds to the same row which is
> moved, that row is now deleted, so ExecDelete()=>heap_delete() gets
> HeapTupleSelfUpdated, and this is not handled. So even when
> ExecDelete() finds that the row is already deleted, we still call
> ExecInsert(), so a new row is inserted.  In ExecDelete(), we should
> indicate that the row is already deleted. In the existing patch, there
> is a parameter concurrently_deleted for ExecDelete() which indicates
> that the row is concurrently deleted. I think we can make this
> parameter for both of these purposes so as to avoid ExecInsert() for
> both these scenarios. Will work on a patch.
>

Thanks Amit.

Got one more observation: update ... returning is not working with a
whole-row reference. Please take a look.

postgres=# create table part (a int, b int) partition by range(a);
CREATE TABLE
postgres=# create table part_p1 partition of part for values from
(minvalue) to (0);
CREATE TABLE
postgres=# create table part_p2 partition of part for values from (0) to
(maxvalue);
CREATE TABLE
postgres=# insert into part values (10,1);
INSERT 0 1
postgres=# insert into part values (20,2);
INSERT 0 1
postgres=# update part t1 set a = b returning t1;
ERROR:  unexpected whole-row reference found in partition key


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-07-25 Thread Tomas Vondra

On 7/25/17 12:55 AM, Tom Lane wrote:
> Tomas Vondra  writes:
>> It seems to me that VACUUM and ANALYZE somewhat disagree on what
>> exactly reltuples means. VACUUM seems to be thinking that reltuples
>> = live + dead while ANALYZE apparently believes that reltuples =
>> live
>>
>> The question is - which of the reltuples definitions is the right
>> one? I've always assumed that "reltuples = live + dead" but perhaps
>> not?
>
> I think the planner basically assumes that reltuples is the live
> tuple count, so maybe we'd better change VACUUM to get in step.



Attached is a patch that (I think) does just that. The disagreement was 
caused by VACUUM treating recently dead tuples as live, while ANALYZE 
treats both of those as dead.


At first I was worried that this will negatively affect plans in the 
long-running transaction, as it will get underestimates (due to 
reltuples not including rows it can see). But that's a problem we 
already have anyway, you just need to run ANALYZE in the other session.
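
In terms of simple counts, the disagreement and the fix amount to something
like this (a sketch of the accounting only, not the vacuumlazy.c code):

```python
def reltuples_vacuum_old(live, recently_dead):
    # Pre-patch VACUUM: recently-dead tuples were folded into reltuples
    return live + recently_dead

def reltuples_vacuum_patched(num_tuples, nkeep):
    # Patched VACUUM: subtract the kept (recently-dead) tuples, so the
    # stored count matches ANALYZE's live-only definition
    return num_tuples - nkeep

def reltuples_analyze(live, recently_dead):
    # ANALYZE counts only tuples visible to its snapshot
    return live
```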


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c5c194a..574ca70 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1261,7 +1261,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
 		 nblocks,
  vacrelstats->tupcount_pages,
-		 num_tuples);
+		 num_tuples - nkeep);
 
 	/*
 	 * Release any remaining pin on visibility map page.



[HACKERS] Create language syntax is not proper in pg_dumpall and not working using pg_upgrade

2017-07-25 Thread tushar

v9.6

postgres=# CREATE LANGUAGE alt_lang1 HANDLER plpgsql_call_handler;
CREATE LANGUAGE
postgres=# \q

v10 , run pg_upgrade - failing with this error -

pg_restore: creating pg_largeobject_metadata "pg_largeobject_metadata"
pg_restore: creating COMMENT "postgres"
pg_restore: creating SCHEMA "public"
pg_restore: creating COMMENT "SCHEMA "public""
pg_restore: creating PROCEDURAL LANGUAGE "alt_lang1"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 560; 2612 16384 
PROCEDURAL LANGUAGE alt_lang1 edb
pg_restore: [archiver (db)] could not execute query: ERROR: unsupported 
language "alt_lang1"
HINT:  The supported languages are listed in the pg_pltemplate system 
catalog.

Command was: CREATE OR REPLACE PROCEDURAL LANGUAGE "alt_lang1";


take the cluster dump using pg_dumpall
"
--
-- Name: alt_lang1; Type: PROCEDURAL LANGUAGE; Schema: -; Owner: edb
--

CREATE OR REPLACE PROCEDURAL LANGUAGE alt_lang1;


ALTER PROCEDURAL LANGUAGE alt_lang1 OWNER TO edb;
"

The handler part is missing, and because of that it throws an error if we
try to execute it in a psql session.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] UPDATE of partition key

2017-07-25 Thread Amit Khandekar
On 25 July 2017 at 15:02, Rajkumar Raghuwanshi
 wrote:
> On Mon, Jul 24, 2017 at 11:23 AM, Amit Khandekar 
> wrote:
>>
>>
>> Attached update-partition-key_v13.patch now contains this
>> make_resultrels_ordered.patch changes.
>>
>
> I have applied attach patch and got below observation.
>
> Observation :  if join producing multiple output rows for a given row to be
> modified. I am seeing here it is updating a row and also inserting rows in
> target table. hence after update total count of table got incremented.

Thanks for catching this Rajkumar.

So after the row to be updated is already moved to another partition,
when the next join output row corresponds to the same row which is
moved, that row is now deleted, so ExecDelete()=>heap_delete() gets
HeapTupleSelfUpdated, and this is not handled. So even when
ExecDelete() finds that the row is already deleted, we still call
ExecInsert(), so a new row is inserted.  In ExecDelete(), we should
indicate that the row is already deleted. In the existing patch, there
is a parameter concurrently_deleted for ExecDelete() which indicates
that the row is concurrently deleted. I think we can make this
parameter for both of these purposes so as to avoid ExecInsert() for
both these scenarios. Will work on a patch.
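
The intended control flow can be sketched like this, with exec_delete and
move_row as hypothetical stand-ins for ExecDelete() and the row-movement
path (a sketch of the logic, not the patch's actual code):

```python
def exec_delete(row, table):
    # Returns True only if this call actually deleted the row; False
    # means it was already gone (e.g. already moved by an earlier join
    # output row for the same source tuple, or concurrently deleted).
    if row in table:
        table.remove(row)
        return True
    return False

def move_row(row, source_partition, target_partition):
    # Only re-insert into the new partition if we deleted the row
    # ourselves; otherwise skip the insert to avoid duplicating it.
    if exec_delete(row, source_partition):
        target_partition.append(row)
```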




Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-25 Thread Masahiko Sawada
On Tue, Jul 25, 2017 at 4:43 AM, Michael Paquier
 wrote:
> On Mon, Jul 24, 2017 at 6:45 PM, Stephen Frost  wrote:
>> What the change would do is make the pg_stop_backup() caller block until
>> the last WAL is archvied, and perhaps that ends up taking hours, and
>> then the connection is dropped for whatever reason and the backup fails
>> where it otherwise what?  wouldn't have been valid anyway at that
>> point, since it's not valid until the last WAL is actually archived.
>> Perhaps eventually it would be archived and the caller was planning for
>> that and everything is fine, but, well, that feels like an awful lot of
>> wishful thinking.
>
> Letting users taking unconsciously inconsistent backups is worse than
> potentially breaking scripts that were actually not working as
> Postgres would expect. So I am +1 for back-patching a lighter version
> of the proposed patch that makes the wait happen on purpose.
>
>>> > I'd hate to have to do it, but we could technically add a GUC to address
>>> > this in the back-branches, no?  I'm not sure that's really worthwhile
>>> > though..
>>>
>>> That would be mighty ugly.
>>
>> Oh, absolutely agreed.
>
> Yes, let's avoid that. We are talking about a switch aimed at making
> backups potentially inconsistent.

Thank you for the review comments!
Attached is the updated patch. The note in the pg_basebackup docs will
be necessary for back branches if we decide not to back-patch this to
them. So it's not contained in this patch for now.

Regarding back-patching this, I also vote for back-patching to the
back branches. Alternatively, we can fix the docs of the back branches
and fix the code only in PG10. I expect that a user who wrote a backup
script has done enough functional testing and dealt with this issue
somehow, but since the current docs clearly say that pg_stop_backup()
waits for all WAL to be archived, we have to consider that there are
users who wrote an incorrect backup script. So I think we should at
least mention it in the minor release notes.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pg_stop_backup_on_standby_v5.patch
Description: Binary data



Re: [HACKERS] POC: Sharing record typmods between backends

2017-07-25 Thread Andres Freund
On 2017-07-10 21:39:09 +1200, Thomas Munro wrote:
> Here's a new version that introduces a per-session DSM segment to hold
> the shared record typmod registry (and maybe more things later).

You like to switch it up. *.patchset.tgz??? ;)


It does concern me that we're growing yet another somewhat different
hashtable implementation. Yet I don't quite see how we could avoid
it. dynahash relies on proper pointers, simplehash doesn't do locking
(and shouldn't) and also relies on pointers, although to a much lesser
degree.  All the open coded tables aren't a good match either.  So I
don't quite see an alternative, but I'd love one.


Regards,

Andres

diff --git a/src/backend/lib/dht.c b/src/backend/lib/dht.c
new file mode 100644
index 000..2fec70f7742
--- /dev/null
+++ b/src/backend/lib/dht.c

FWIW, not a big fan of dht as a filename (nor of dsa.c). For one, DHT
usually refers to distributed hash tables, which this is not, and for
another the abbreviation is so short it's not immediately
understandable, and likely to conflict further.  I think it'd possibly
be ok to have dht as a symbol prefix, but rename the file to be longer.

+ * To deal with currency, it has a fixed size set of partitions, each of which
+ * is independently locked.

s/currency/concurrency/ I presume.


+ * Each bucket maps to a partition; so insert, find
+ * and iterate operations normally only acquire one lock.  Therefore, good
+ * concurrency is achieved whenever they don't collide at the lock partition

s/they/operations/?


+ * level.  However, when a resize operation begins, all partition locks must
+ * be acquired simultaneously for a brief period.  This is only expected to
+ * happen a small number of times until a stable size is found, since growth is
+ * geometric.

I'm a bit doubtful that we need partitioning at this point, and that it
doesn't actually *degrade* performance for your typmod case.
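
For readers following along, the partition-locking scheme under discussion
boils down to roughly this (a toy model for illustration, not dht.c's actual
layout):

```python
import threading

class PartitionedHashTable:
    """Each bucket maps to a partition with its own lock, so concurrent
    inserts/lookups normally take just one lock; a resize must briefly
    hold every partition lock at once."""

    def __init__(self, npartitions=16, nbuckets=1024):
        self.nbuckets = nbuckets
        self.buckets = [[] for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(npartitions)]

    def _lock_for(self, bucket):
        # bucket -> partition mapping; one lock per operation
        return self.locks[bucket % len(self.locks)]

    def insert(self, key, value):
        b = hash(key) % self.nbuckets
        with self._lock_for(b):
            self.buckets[b].append((key, value))

    def find(self, key):
        b = hash(key) % self.nbuckets
        with self._lock_for(b):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None

    def resize(self):
        # All partition locks are acquired simultaneously for the
        # duration of the rebuild.
        for lock in self.locks:
            lock.acquire()
        try:
            pass  # rebuild/split buckets here
        finally:
            for lock in self.locks:
                lock.release()
```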


+ * Resizing is done incrementally so that no individual insert operation pays
+ * for the potentially large cost of splitting all buckets.

I'm not sure this is a reasonable tradeoff for the use-case suggested so
far, it doesn't exactly make things simpler. We're not going to grow
much.


+/* The opaque type used for tracking iterator state. */
+struct dht_iterator;
+typedef struct dht_iterator dht_iterator;

Isn't it actually the iterator state? Rather than tracking it? Also, why
is it opaque given you're actually defining it below? Guess you'd moved
it at some point.


+/*
+ * The set of parameters needed to create or attach to a hash table.  The
+ * members tranche_id and tranche_name do not need to be initialized when
+ * attaching to an existing hash table.
+ */
+typedef struct
+{
+   Size key_size;  /* Size of the key (initial bytes of entry) */
+   Size entry_size;/* Total size of entry */

Let's use size_t, like we kind of concluded in the thread you started:
http://archives.postgresql.org/message-id/25076.1489699457%40sss.pgh.pa.us
:)

+   dht_compare_function compare_function;  /* Compare function */
+   dht_hash_function hash_function;/* Hash function */

Might be worth explaining that these need to provided when attaching
because they're possibly process local. Did you test this with
EXEC_BACKEND?

+   int tranche_id; /* The tranche ID to use for locks. */
+} dht_parameters;


+struct dht_iterator
+{
+   dht_hash_table *hash_table; /* The hash table we are iterating over. */
+   bool exclusive; /* Whether to lock buckets exclusively. */
+   Size partition; /* The index of the next partition to visit. */
+   Size bucket;/* The index of the next bucket to visit. */
+   dht_hash_table_item *item;  /* The most recently returned item. */
+   dsa_pointer last_item_pointer; /* The last item visited. */
+   Size table_size_log2;   /* The table size when we started iterating. */
+   bool locked;/* Whether the current partition is locked. */

Haven't gotten to the actual code yet, but this kinda suggests we leave a
partition locked when iterating? Hm, that seems likely to result in a
fair bit of pain...


+/* Iterating over the whole hash table. */
+extern void dht_iterate_begin(dht_hash_table *hash_table,
+ dht_iterator *iterator, bool exclusive);
+extern void *dht_iterate_next(dht_iterator *iterator);
+extern void dht_iterate_delete(dht_iterator *iterator);

s/delete/delete_current/? Otherwise it looks like it's part of
manipulating just the iterator.

+extern void dht_iterate_release(dht_iterator *iterator);

I'd add lock to to the name.


+/*
+ * An item in the hash table.  This wraps the user's entry object in an
+ * envelop that holds a pointer back to the bucket and a pointer to the next
+ * item in the bucket.
+ */
+struct 

Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-25 Thread Teodor Sigaev


It's not clear from the web site in question that the relevant code is
released under the PostgreSQL license.



As author I allow to use this code in PostgreSQL and under its license.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] More race conditions in logical replication

2017-07-25 Thread Petr Jelinek
On 25/07/17 01:33, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> I'm back at looking into this again, after a rather exhausting week.  I
>> think I have a better grasp of what is going on in this code now,
> 
> Here's an updated patch, which I intend to push tomorrow.
> 

I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done inside the mutex lock
otherwise there is tiny race condition where the process which has slot
acquired might release the slot between the mutex unlock and the
ConditionVariablePrepareToSleep() call which means we'll never get
signaled and ConditionVariableSleep() will wait forever.
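
The safe ordering is what Python's threading.Condition gives for free, since
registration on the wait queue happens while the lock is held; a rough
analogy of the pattern (not the slot code itself):

```python
import threading

lock = threading.Lock()
cv = threading.Condition(lock)
slot = {"active_pid": 42}          # 42 = some other process owns the slot

def acquire_when_free():
    # The check and the sleep happen under the lock, so a concurrent
    # release cannot slip between them unnoticed -- the ordering that
    # moving ConditionVariablePrepareToSleep() inside the mutex achieves.
    with cv:                       # analogous to holding the slot mutex
        while slot["active_pid"] != 0:
            cv.wait()              # atomically releases the lock and sleeps
        slot["active_pid"] = 99    # we now own the slot

waiter = threading.Thread(target=acquire_when_free)
waiter.start()
with cv:
    slot["active_pid"] = 0         # the old owner releases the slot...
    cv.notify_all()                # ...and broadcasts
waiter.join(timeout=5)
```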

Otherwise the patch looks good to me.

As a side note, ConditionVariablePrepareToSleep()'s comment could be
improved: currently it says the only advantage is that we skip the
double test at the beginning of ConditionVariableSleep(). But that's
not true; it's essential for preventing race conditions like the one
above, because it puts the current process into the wait list, so we
can be sure it will be signaled on broadcast once
ConditionVariablePrepareToSleep() has been called.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UPDATE of partition key

2017-07-25 Thread Rajkumar Raghuwanshi
On Mon, Jul 24, 2017 at 11:23 AM, Amit Khandekar 
wrote:

>
> Attached update-partition-key_v13.patch now contains this
> make_resultrels_ordered.patch changes.
>
>
I have applied attach patch and got below observation.

Observation: if the join produces multiple output rows for a given row to
be modified, I am seeing that it both updates a row and also inserts rows
in the target table; hence after the update the total row count of the
table has incremented.

below are steps:
postgres=# create table part_upd (a int, b int) partition by range(a);
CREATE TABLE
postgres=# create table part_upd1 partition of part_upd for values from
(minvalue) to (-10);
CREATE TABLE
postgres=# create table part_upd2 partition of part_upd for values from
(-10) to (0);
CREATE TABLE
postgres=# create table part_upd3 partition of part_upd for values from (0)
to (10);
CREATE TABLE
postgres=# create table part_upd4 partition of part_upd for values from
(10) to (maxvalue);
CREATE TABLE
postgres=# insert into part_upd select i,i from generate_series(-30,30,3)i;
INSERT 0 21





postgres=# select count(*) from part_upd;
 count
-------
    21
(1 row)
postgres=#
postgres=# create table non_part_upd (a int);
CREATE TABLE
postgres=# insert into non_part_upd select i%2 from
generate_series(-30,30,5)i;
INSERT 0 13
postgres=# update part_upd t1 set a = (t2.a+10) from non_part_upd t2 where
t2.a = t1.b;
UPDATE 7





postgres=# select count(*) from part_upd;
 count
-------
    27
(1 row)
postgres=# select tableoid::regclass,* from part_upd;
 tableoid  |  a  |  b
---+-+-
 part_upd1 | -30 | -30
 part_upd1 | -27 | -27
 part_upd1 | -24 | -24
 part_upd1 | -21 | -21
 part_upd1 | -18 | -18
 part_upd1 | -15 | -15
 part_upd1 | -12 | -12
 part_upd2 |  -9 |  -9
 part_upd2 |  -6 |  -6
 part_upd2 |  -3 |  -3
 part_upd3 |   3 |   3
 part_upd3 |   6 |   6
 part_upd3 |   9 |   9
 part_upd4 |  12 |  12
 part_upd4 |  15 |  15
 part_upd4 |  18 |  18
 part_upd4 |  21 |  21
 part_upd4 |  24 |  24
 part_upd4 |  27 |  27
 part_upd4 |  30 |  30







 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
 part_upd4 |  10 |   0
(27 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] asynchronous execution

2017-07-25 Thread Kyotaro HORIGUCHI
Hello,

8bf58c0d9bd33686 badly conflicts with this patch, so I'll rebase
this and add a patch to refactor the function that Antonin
pointed out. This would be merged into the 0002 patch.

At Tue, 18 Jul 2017 16:24:52 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170718.162452.221576658.horiguchi.kyot...@lab.ntt.co.jp>
> I'll put an upper limit to the number of waiters processed at
> once. Then add a comment like that.
> 
> > Actually the reason I thought of simplification was that I noticed small
> > inefficiency in the way you do the compaction. In particular, I think it's 
> > not
> > always necessary to swap the tail and head entries. Would something like 
> > this
> > make sense?
> 
> I'm not sure, but I suppose that it is rare that all of the first
> many elements in the array are not COMPLETE. In most cases the
> first element gets a response first.
...
> Yeah, but maybe the "head" is still confusing even if reversed
> because it is still not a head of something.  It might be less
> confusing by rewriting it in more verbose-but-straightforwad way.
> 
> 
> |  int npending = 0;
> | 
> |  /* Skip over the pending (not-completed) items at the beginning */
> |  while (npending < estate->es_num_pending_async &&
> | estate->es_pending_async[npending]->state != ASYNCREQ_COMPLETE)
> |npending++;
> | 
> |  /* Scan the rest for pending items and swap them forward */
> |  for (i = npending + 1 ; i < estate->es_num_pending_async; ++i)
> |  {
> |PendingAsyncRequest *tmp;
> |PendingAsyncRequest *curr = estate->es_pending_async[i];
> |
> |if (curr->state == ASYNCREQ_COMPLETE)
> |  continue;
> |
> |/* Swap the pending item into the tail of the leading chunk */
> |tmp = estate->es_pending_async[npending];
> |estate->es_pending_async[npending] = curr;
> |estate->es_pending_async[i] = tmp;
> |++npending;
> |  }

The last patch does something like this (with the apparent bugs
fixed).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 41ad9a7518c066da619363e6cdf8574fa00ee1e5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 22 Feb 2017 09:07:49 +0900
Subject: [PATCH 1/5] Allow wait event set to be registered to resource owner

A WaitEventSet needs to be released via a resource owner in certain
cases. This change adds a resource owner field to WaitEventSet and
allows the creator of a WaitEventSet to specify a resource owner.
---
 src/backend/libpq/pqcomm.c|  2 +-
 src/backend/storage/ipc/latch.c   | 18 ++-
 src/backend/storage/lmgr/condition_variable.c |  2 +-
 src/backend/utils/resowner/resowner.c | 68 +++
 src/include/storage/latch.h   |  4 +-
 src/include/utils/resowner_private.h  |  8 
 6 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 4452ea4..ed71e7c 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -220,7 +220,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 07b1364..9543397 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -51,6 +51,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "utils/resowner_private.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -77,6 +78,8 @@ struct WaitEventSet
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
+	ResourceOwner	resowner;	/* Resource owner */
+
 	/*
 	 * Array, of nevents_space length, storing the definition of events this
 	 * set is waiting for.
@@ -359,7 +362,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -518,12 +521,15 @@ ResetLatch(volatile Latch *latch)
  * WaitEventSetWait().
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
 	Size		sz = 0;
 
+	if (res)
+		ResourceOwnerEnlargeWESs(res);
+
 	/*
 	 * Use MAXALIGN size/alignment to guarantee that later uses of memory are
 	 * aligned correctly. E.g. 

Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-07-25 Thread Oleg Bartunov
On Mon, Jul 24, 2017 at 11:38 PM, Robert Haas  wrote:
> On Fri, Jul 21, 2017 at 8:05 AM, Alexey Chernyshov
>  wrote:
>> the following patch transfers functionality from gevel module
>> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
>> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
>> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
>> GIN indexes.
>
> It's not clear from the web site in question that the relevant code is
> released under the PostgreSQL license.

git clone git://sigaev.ru/gevel

from README.gevel

License

Stable version, included into PostgreSQL distribution, released under
BSD license. Development version, available from this site, released under
the GNU General Public License, version 2 (June 1991)

We would be happy to write anything the community likes :)

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] pg_dump issues

2017-07-25 Thread Victor Drobny

We can't dump a schema under another (user-defined) name. E.g. we dump
schema test and want to save its dump under the name test2, in any
format. The same applies to database dumps.


Hello,

Do you expect to have some flag like '--rename=test->test2'?
Would a dump with test replaced by test2 (of course only in the
related places) be a valid dump in this case?
What is the intended scenario for the renaming option? Is it going to
apply to a dump of a single schema only?
Or could it apply to a dump of a whole database? In that case pg_dump
would also need to support multiple renaming rules.
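For what it's worth, a rough workaround today is to post-process a plain-format dump with a text substitution. The schema names below are hypothetical, and a naive substitution like this can clobber user data that happens to contain the old name — a real renaming feature would have to touch only the relevant identifiers:

```shell
# Sketch: rename schema "test" to "test2" in a plain-format dump.
# A sample dump stands in for real pg_dump output here.
printf 'CREATE SCHEMA test;\nCREATE TABLE test.t (a int);\n' > dump.sql
sed -e 's/SCHEMA test;/SCHEMA test2;/' -e 's/test\./test2./g' \
    dump.sql > dump_renamed.sql
cat dump_renamed.sql
```

This fragility is exactly why multiple explicit renaming rules (and identifier-aware rewriting) would be needed for the database-wide case.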


Thank you for your attention!

--
--
Victor Drobny
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING

2017-07-25 Thread Kyotaro HORIGUCHI
At Mon, 24 Jul 2017 10:23:07 +0530, Ashutosh Bapat 
 wrote in 

> On Fri, Jul 21, 2017 at 10:39 PM, Tom Lane  wrote:
> > Ashutosh Bapat  writes:
> >> On Fri, Jul 21, 2017 at 10:55 AM, Kyotaro HORIGUCHI
> >>  wrote:
> >>> The attached patch differs only in this point.
> >
> >> +1. The patch looks good to me.
> >
> > Pushed with a couple additional changes: we'd all missed that the header
> > comment for GetConnection was obsoleted by this change, and the arguments
> > for GetSysCacheHashValue really need to be coerced to Datum.  (I think
> > OID to Datum is the same as what the compiler would do anyway, but best
> > not to assume that.)
> 
> Thanks and sorry for not noticing the prologue.

Ditto.

> >
> > Back-patching was more exciting than I could wish.  It seems that
> > before 9.6, we did not have struct UserMapping storing the OID of the
> > pg_user_mapping row it had been made from.  I changed GetConnection to
> > re-look-up that row and get the OID.  But that's ugly, and there's a
> > race condition: if user mappings are being added or deleted meanwhile,
> > we might locate a per-user mapping when we're really using a PUBLIC
> > mapping or vice versa, causing the ConnCacheEntry to be labeled with
> > the wrong hash value so that it might not get invalidated properly later.
> > Still, it's significantly better than it was, and that corner case seems
> > unlikely to get hit in practice --- for one thing, you'd have to then
> > revert the mapping addition/deletion before the ConnCacheEntry would be
> > found and used again.  I don't want to take the risk of modifying struct
> > UserMapping in stable branches, so it's hard to see a way to make that
> > completely bulletproof before 9.6.
> 
> +1.

Agreed.

> -- 
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


